Intro To Terraform
Infrastructure as Code is a Necessary Evil...Also I thought Terraforming was a thing we wanted to do to Mars one day? 😆
I’ll admit…for a large part of my career working on cloud services, I avoided Infrastructure as Code (IaC) like the plague. I found it not as exciting as building data pipelines, and I more or less viewed it as “Oh, that’s another team’s job”. However, as time went on, I came to realize that I could get things done exponentially faster if I went ahead and bit the bullet and learned IaC. With IaC, when you build out POCs, your work is much easier to reproduce for someone else who wants to follow along later. They don’t have to find the test account you were using to create various cloud assets. They don’t have to chase down the IAM roles and permissions you built out. They can just take the IaC code as-is, make some minor modifications, and run it (usually). Thankfully, IaC, in my humble opinion, is one of the easiest languages there is to learn.
So What Exactly is “IaC”?
IaC (Infrastructure as Code) is what it sounds like. You use code to create assets/resources in the cloud. This can be things such as:
Cloud Storage Buckets
Cloud Data Warehouses
IAM roles
IAM Policies
Cloud Bucket Replication Rules
Data Pipeline Jobs
And that’s just to name a handful. When you look at the Terraform registry providers for AWS and Google Cloud (GCP), the majority of their services are covered, which allows you to create them with code.
But Why Should I Bother With It?
Why not just log into the cloud console via the web browser and create your stuff there with the nice little GUI? Simply put, a few things:
You want your assets to be reproducible by others
You want to scale asset creation and management across multiple departments in an organization
You want backups and audit trails of the assets you created
Those are just a few of the benefits off the top of my head. Additionally, IT security and governance requirements are usually a big reason why IaC is required these days.
Enough of the Intro…Let’s Get Crackin’
Alright, so far we’ve given a high level overview of IaC, but let’s actually see this in action. For the remainder of this article, I will walk you through writing Terraform code that creates 2 GCS buckets in separate regions. This is a typical pattern these days for HADR (High Availability and Disaster Recovery): a bucket in a primary region plus a replicated bucket in a secondary region. This article, however, will not set up any replication, as we want to keep it simple, and DR can get very complex.
The Folder Structure
I have found through working with various Terraform projects that the following folder structure works pretty well:
You have a parent folder called “terraform”. In that folder, you typically have 3 files:
main.tf - this is the entry point/main script that gets fired when you tell terraform to go do things like create assets
locals.tf - this is where you can store static variables
provider.tf - this is where you specify the provider for terraform to use to create assets (e.g. Google Cloud/AWS/etc)
Additionally, a lot of terraform projects will have a fourth file called “variables.tf”. I don’t have that in my project today, but that file allows you to create non-static (i.e. mutable) variables whose values can be passed in when the terraform scripts run.
After those files, we have a subfolder that I call “modules”. You can call it whatever you want, such as “Franks-Pizza-Shop”. This folder contains template scripts for the various assets you want to create, which is great when you want to scale out your IaC code and not repeat blocks of code over and over again.
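As a rough sketch, the layout described above ends up looking like this (the gcs subfolder and its two files are covered later in the article):

terraform/
    main.tf
    locals.tf
    provider.tf
    modules/
        gcs/
            gcs.tf
            variables.tf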
Now that we have the folder structure squared away, let’s look at these individual files…
The Provider File
Let’s take a peek at the provider file first.
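Here’s a minimal sketch of what that provider.tf can look like. The two google provider listings and their regions match what’s described below; the project ID and version pin are just placeholders I’m assuming for the example:

# provider.tf - minimal sketch; project ID and version pin are placeholders
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
  }
}

# default provider - us-east1
provider "google" {
  project = "my-demo-project"
  region  = "us-east1"
}

# secondary provider - aliased as "west" for us-west1
provider "google" {
  alias   = "west"
  project = "my-demo-project"
  region  = "us-west1"
}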
This file specifies that we are using Google (GCP) as our provider for Terraform to create assets in. You will notice that I’ve listed the “google” provider twice - once for the default us-east1 region and a second time for the us-west1 region. The second listing has an alias called “west”. If you do not alias a provider, it is assumed to be the default provider.
You might ask - why do we need multiple listings of the same “provider” here, i.e. Google? That is because in order for us to create buckets or other assets in multiple regions, we need to pass our code a provider that has the corresponding region preset. If this project were only focused on creating assets in one region, then the secondary listing of the provider with the us-west1 region would not be necessary.
The Locals File
Now, let’s take a look at our locals.tf file.
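Here’s a sketch of the kind of thing that lives in locals.tf - the bucket prefix and label values are made-up placeholders I’ll keep reusing in the later sketches:

# locals.tf - static values referenced later in main.tf (placeholder values)
locals {
  bucket_prefix = "my-demo-bucket"

  # labels (GCP) / tags (AWS) applied to every asset we create
  labels = {
    environment = "dev"
    team        = "data-engineering"
    managed_by  = "terraform"
  }
}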
This file is where I can create static variables that I can reference in the main.tf script later. From an organizational perspective, something important to consider is labeling/tagging the various assets you create via Terraform. This helps with cost management, knowing who created various things in the cloud, and overall FinOps monitoring/reporting. In GCP, they call it “labels”. In AWS, they call it “tags”. You will see later in the terraform scripts where I add this list of labels to each asset I create.
Side Note - Google’s “labels” are very finicky about what characters can go in them. You can’t use symbols other than hyphens or underscores from what I've noticed. This means for things like contacts, you can’t put an “@” symbol in them and provide an email address. In AWS, that’s not the case; AWS allows you to put more informative characters in your tags.
The Main File
This is where things start to get exciting. Now that we have our initial plumbing out of the way, let’s see what the main.tf file looks like:
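Here’s a sketch of what that main.tf can look like. The module and variable names are assumptions I’m using to keep the examples consistent, but the two module blocks and the providers they reference match what’s described below:

# main.tf - one module block per region, both pointing at the same template
module "gcs_bucket_east" {
  source      = "./modules/gcs"
  bucket_name = "${local.bucket_prefix}-us-east1"
  region      = "us-east1"
  labels      = local.labels

  # default provider (us-east1)
  providers = {
    google = google
  }
}

module "gcs_bucket_west" {
  source      = "./modules/gcs"
  bucket_name = "${local.bucket_prefix}-us-west1"
  region      = "us-west1"
  labels      = local.labels

  # aliased provider (us-west1)
  providers = {
    google = google.west
  }
}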
Alright, in this file, you will see I have effectively 2 code blocks, a.k.a. modules. Each block invokes a template file, which for our example lives in the modules/gcs subfolder. This allows us to reuse our template code and not have to repeat it multiple times. It also allows us to easily create our GCS buckets in multiple regions, since we can specify the provider in each module block. You will see in the first module, I simply specify “google” as the provider, but in the second module, I specify “google.west”, which indicates I want to create stuff in the west region. You will also see how I’m leveraging my locals.tf static variables; I’m passing in the region for each module to assemble my bucket name variable. I’m also passing in the tags that I want on the buckets.
Pro Tip - When you execute the main.tf script in Terraform, it will run as many things as it can in parallel. Terraform is usually smart enough to sniff out dependencies and the order in which things should be created. If you want to explicitly force an ordering instead, you can add the depends_on meta-argument to a module block, as sketched below.
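As a hypothetical example, if we wanted the west bucket to wait until the east bucket exists, the second module block from the sketch above could be written like this:

module "gcs_bucket_west" {
  source      = "./modules/gcs"
  bucket_name = "${local.bucket_prefix}-us-west1"
  region      = "us-west1"
  labels      = local.labels

  providers = {
    google = google.west
  }

  # force Terraform to create the east bucket before this one
  depends_on = [module.gcs_bucket_east]
}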
Now Let’s Look at Our Template Code for GCS
Now we will look at our gcs.tf template file. And this is where things get fun:
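Here’s a sketch of that gcs.tf template. The exact argument and variable names are assumptions carried over from the earlier sketches, and blocking public access via public_access_prevention is one way to do it, but it covers the pieces walked through below (standard storage class, a 30-day delete rule, and no public access):

# modules/gcs/gcs.tf - reusable bucket template driven by input variables
resource "google_storage_bucket" "bucket" {
  name          = var.bucket_name
  location      = var.region     # region/location to create the bucket in
  storage_class = "STANDARD"
  labels        = var.labels

  # delete objects older than 30 days
  lifecycle_rule {
    condition {
      age = 30
    }
    action {
      type = "Delete"
    }
  }

  # disable public access to the bucket
  public_access_prevention = "enforced"
}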
This is our “template” file, a.k.a. our reusable code. In this file, Terraform creates a GCS bucket based on the arguments passed in from our main.tf file. You will notice in this file that those variables are prefixed with “var.”. In order to make this work, each module folder will also require a file called “variables.tf”. That file declares the variables that get passed down to the template script. Our variables.tf file in our GCS folder is as simple as this:
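A minimal version of it might look like the following (again, the variable names are assumptions kept consistent with the earlier sketches):

# modules/gcs/variables.tf - declares the inputs the template expects
variable "bucket_name" {
  type        = string
  description = "Name of the GCS bucket to create"
}

variable "region" {
  type        = string
  description = "Region/location to create the bucket in"
}

variable "labels" {
  type        = map(string)
  description = "Labels to attach to the bucket"
}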
Ok, so back to the gcs.tf file. We have a few things going on here:
We are passing in our bucket name, the tags, and our region.
We are setting the GCS storage class to standard
We are setting a lifecycle rule to delete items older than 30 days
We are disabling public access to the bucket (sorry cryptobros)
Side Note - Google’s provider for Terraform makes it much easier to specify a region to create your bucket in, as that is an actual parameter (location) in the google_storage_bucket terraform code. In AWS, the region is not an available parameter, and thus the provider passed via the module kicks in. I left the provider pass-through in here anyway, as it does no harm to the google code and can be useful for those needing to create buckets in multiple regions in AWS.
Now that we have walked through the Terraform code, how does one actually run it?
Running Terraform Code
If you don’t have terraform installed, I recommend you do so 😁. I used homebrew on my mac to install it. Once you have it installed, from the terminal, navigate to your project’s terraform folder and run this command:
terraform init
This tells terraform to scan the terraform folder plus all corresponding subfolders and get itself geared up to create terraform assets. When you run this, terraform will download the necessary provider plugins based on the provider you specified in the provider.tf file.
Next, we will run this command in the terminal:
terraform plan
This is basically our dry run, where Terraform checks that our code is syntactically sound and shows us exactly what it intends to create or change (and it will often surface permission problems at this point too).
Pro Tip - I’m using my GCP application default credentials to authenticate to Google’s cloud, which you can set up via the gcloud CLI. For an example of getting your default credentials loaded locally, you can see this post and look for the section running the gcloud login part. This avoids me hardcoding credentials somewhere. Terraform is smart enough to detect these creds in my environment and will use them to check and execute the terraform code.
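For reference, the usual gcloud command that loads application default credentials locally is:

gcloud auth application-default login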
Assuming the terraform plan runs and throws no errors, it’s time to actually go create the assets. To do this, we will run this command:
terraform apply -auto-approve
When we execute that code in our terminal, Terraform will run our main script, which calls the gcs subfolder template code twice via the module blocks to create our 2 buckets. The flag “-auto-approve” tells the script to fully run without waiting for our permission. If we did not put that flag in, once the code is actually ready to run, our terminal would prompt us to type the word “yes” to approve it making the changes.
So how does this look when we run it?
…Hot Diggity Dog! Alright, it says our buckets have been created. Let’s go take a peek in GCS:
Well there you go. We have our 2 buckets in separate regions. Now for the fun part. How do we undo this stuff quickly and easily? Terraform has a command for that called “terraform destroy”. We will run this command to nuke the buckets:
terraform destroy -auto-approve
Alright, now those buckets are gone. Usually though, you don’t want to destroy these assets in production, as you are creating them for yourself or others to use. The only time I use the terraform destroy command is if I’m building stuff in a test account that I later need to clean up, so I don’t cause unnecessary spend for unused assets just hanging around out there in the cloud.
Conclusion
This article walked you through an introduction to Terraform and IaC. IMO, IaC can significantly up your game in the data engineering world and get you closer to that ever-elusive full stack rainbow rocket powered unicorn of a developer.
Here’s a link to the terraform code we covered: TF Code
Thanks for reading,
Matt