Over my career, I’ve worked with several programming languages:
C# - This was my first compiled language
Python - My daily driver
C - for when I want to better appreciate how CPUs actually work
Rust - Relatively new; have about 6 months of experience with it now
Go - Same as Rust; have about 6 months of experience with it now…maybe a little more
A common problem I encounter when writing pipelines is processing data, files, or something of that nature in parallel. Python makes this pretty simple with its ProcessPoolExecutor; below is a simple example:
This example was pulled from a prior post I did showing how to generate data from scratch. That post can be found here. But have you ever tried running that in a notebook?
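Pro Tip: Avoid running Python’s ProcessPoolExecutor directly in a notebook; on platforms that start workers with spawn (Windows and macOS by default), the worker processes cannot import functions defined in notebook cells, and the pool typically fails with a BrokenProcessPool error.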
If I can just use Python to get the job done, then that’s great. But in situations where I need a compiled language, I now need to choose between Go and Rust (I stopped using C# a while back and haven’t looked back 😁). When debating between these two languages, I really like how Go has implemented its concurrency model with goroutines.
To understand that further, let’s take a look at the following Go program:
This program runs 5 web requests concurrently against various websites. For goroutines, there are some key things we need to pay attention to:
sync.WaitGroup - this lets us wait for a group of goroutines until all of them complete.
the go keyword - this tells Go to run a function call concurrently in a new goroutine, in the background
defer wg.Done() - this tells Go to decrement the wait group's count of in-flight goroutines once the code block completes. I really like the “defer” keyword, as it guarantees that step will fire before the function exits; IMO it's a better alternative to try/catch/finally.
wg.Wait() - this tells the Go program not to proceed until all goroutines have completed. This allows us to effectively synchronize our parallel processing.
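To make the four pieces above concrete, here is a minimal sketch of the pattern. It is not the exact program from this post: fakeFetch is a hypothetical stand-in that sleeps instead of making a real web request, the example.com URLs are placeholders, and fetchAll is a helper name I made up for illustration.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// fakeFetch is a hypothetical stand-in for a real web request.
// It sleeps briefly to simulate network latency.
func fakeFetch(url string) string {
	time.Sleep(50 * time.Millisecond)
	return "fetched " + url
}

// fetchAll calls fetch once per URL, each in its own goroutine, and
// returns the results in the same order as urls.
func fetchAll(urls []string, fetch func(string) string) []string {
	results := make([]string, len(urls))
	var wg sync.WaitGroup

	for i, url := range urls {
		wg.Add(1) // one more goroutine in flight
		go func(i int, url string) {
			defer wg.Done()         // runs when this function returns
			results[i] = fetch(url) // each goroutine writes only its own slot
		}(i, url)
	}

	wg.Wait() // block until every goroutine has called Done
	return results
}

func main() {
	// Placeholder URLs; nothing is actually requested.
	urls := []string{
		"https://example.com/a",
		"https://example.com/b",
		"https://example.com/c",
		"https://example.com/d",
		"https://example.com/e",
	}
	for _, r := range fetchAll(urls, fakeFetch) {
		fmt.Println(r)
	}
}
```

Because all 5 calls sleep at the same time, the whole run should take roughly as long as one call rather than five. Swapping fakeFetch for a function that performs a real HTTP request would not change the shape of fetchAll.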
You will also notice in the code above that I have what is known as a semaphore. This creates bounded parallelism, similar to max_workers in Python. The semaphore is not required, as Go does a good job managing everything it has in flight, but I usually implement bounded parallelism when I’m doing things such as downloading or uploading a massive number of files to and from cloud storage, so I don’t overwhelm the pipes or get errors from having too many concurrent operations in flight.
The example above is also a very simple one. You can get a lot more thorough with goroutines and truly unlock their power when you use channels. Here’s an example of that, where I wrote a Go program that leverages channels to write 1B rows of integers in parallel in about 3 seconds:
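As a much smaller sketch of both ideas (this is not the 1B-row program), the code below uses a buffered channel as a semaphore to cap how many goroutines run at once, and a second channel to hand results back. The names runBounded and slowSquare, the limit of 4, and the sleep that simulates I/O are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runBounded calls work(i) for every i in [0, n), with at most limit calls
// running at once. It returns the sum of the results and the highest number
// of calls that were in flight at the same moment.
func runBounded(n, limit int, work func(int) int) (int, int) {
	sem := make(chan struct{}, limit) // buffered channel used as a counting semaphore
	results := make(chan int, n)      // buffered so senders never block
	var wg sync.WaitGroup

	var mu sync.Mutex // guards inFlight and peak
	inFlight, peak := 0, 0

	for i := 0; i < n; i++ {
		wg.Add(1)
		sem <- struct{}{} // acquire a slot; blocks while limit goroutines are running
		go func(i int) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when this goroutine finishes

			mu.Lock()
			inFlight++
			if inFlight > peak {
				peak = inFlight
			}
			mu.Unlock()

			r := work(i)

			mu.Lock()
			inFlight--
			mu.Unlock()

			results <- r // hand the result back over the channel
		}(i)
	}

	wg.Wait()
	close(results) // lets the range loop below terminate

	sum := 0
	for r := range results {
		sum += r
	}
	return sum, peak
}

func main() {
	// slowSquare is a hypothetical stand-in for an upload or download.
	slowSquare := func(i int) int {
		time.Sleep(10 * time.Millisecond)
		return i * i
	}
	sum, peak := runBounded(20, 4, slowSquare)
	fmt.Println("sum:", sum, "peak in flight:", peak)
}
```

The peak counter is only there to show that the limit holds; in a real pipeline you would likely drop it and keep just the sem and results channels.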
And this is where I had a change of heart on Rust
In all other compiled programming languages I’ve used besides Go, to run async operations you have to resort to function coloring, with the “async” and “await” keywords on every function your async program touches. This means that if you have a main function calling a series of sub-functions, all of them (including the main function) need async/await if you want the program to behave that way. Here’s an example of the same Go code I showed above written in Rust with the async/await keywords:
Normally I’d scoff at this and get annoyed, saying “Why do I have to put all this async/await junk all over the place?” But after staring at both the Go and Rust code side-by-side for a few hours, I had a revelation:
Both Go and Rust are pretty much doing the same thing. Where I see “go” in my Go code, just replace that with “async” in Rust. Where I see “wg.Done()”, just replace that with “await” in Rust.
Do I still get a little annoyed by having to apply the “async” keyword everywhere? Sure; but other than that, I think Rust is just fine. As a programming language, Rust definitely has a steeper learning curve than Go, but over time, and with good practice, I can see myself writing more and more Rust programs.
Here’s a link to both the Go and the Rust code from this article:
So as a data engineer, what do I still see as key strengths and weaknesses in Go and Rust?
Go
Strength - Easy learning curve
Strength - Easy concurrency model (goroutines) for parallel tasks
Strength - Simpler language (fewer data types) - some can argue that’s a con
Strength - fast compile times
Rust
Strength - Crazy fast out of the box
Strength - Many new dataframe programs written in it - Polars as an example
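Strength - Memory Safety - it makes it very hard for you to produce memory bugs such as use-after-free or data races (leaks are still possible, just less likely)
Weakness - Steep learning curve; you will spend time fighting the compiler over how you pass variables from function to function (ownership and borrowing)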
I will continue to invest time in both languages and use each one where it is fit for purpose.
Thanks for reading,
Matt
Welcome to the dark side you Hobbit
Great benchmarking. We recently talked with John Arundel about Rust from Gopher perspective - https://packagemain.tech/p/rust-for-gophers