Building Data Pipeline - Part 4 - CI/CD

Build CI/CD pipeline with Github actions
CI/CD
Github Actions
Terraform
AWS
Docker
Author

Deepak Ramani

Published

June 13, 2023

Introduction

The Continuous Integration and Continuous Delivery abbreviated as CI/CD is an important part of software development cycle. This part provides the essential step in checking everything before the final product is ready to delivered to the customer. In our case, this step should create infrastructure to deploy the web application into the cloud. In other words, automate the tasks we did in part 3.

There are many tools for CI/CD. We will be using Github Actions as our code is in Github and make sense to automate the build, test and deploy pipeline with it. Our workflow will consist of running CI upon a pull request and CD when this pull request is merged to a particular branch.

Workflow

Github actions uses yaml syntax to define the workflow. Each workflow is stored as a separate YAML file in the code repository, in a directory named .github/workflows. 1

Since we’re doing both two workflows - pull and push, we will have two yaml files - ci-test.yml and cd-deploy.yml. This post will not explain syntax used in these files. It will explain how the tasks are accomplished through these syntaxes. For thorough understanding of syntax, read this document.

CI/CD Pipeline

The source code is present at my github repo.

Triggers

We want to setup event based triggers to activate our workflows. Since we don’t to distrub main branch code base, we create two branches - develop and feature/ci-cd. We switch to the feature/ci-cd branch and define our workflows.

In ci-test.yml the trigger is set on pull_request on the branch develop. Meaning we commit our changes to the feature/ci-cd branch and then do a pull_request for develop branch. A pull in Github speak means “pulling/updating” new contents from another remote branch. Pull requests are usually done in UI as it is easy to review, clarify, approve changes and merge request.

However, in cd-deploy.yml we need our infrastructure setup deployed. So it makes sense to put a trigger upon commits being pushed into the develop branch. SO the trigger is push. This action is activated when the pull_request is merged into the develop branch.

Jobs

ci-test.yml

In CI, it is critical to check if all the parts are integrated together correctly. In our case, we need to confirm whether tf-plan works with our Terraform configuration.

In Github Actions, jobs are can be run on several hosted systems through actions. We will use ubuntu-latest with action version 3. That will in run calls aws-actions for credentials. We provide our AWS secrets to our Github secrets page. Github says the secrets are encrypted before reaching them. They also take measures to prevents secrets appearing in logs. 2

The actions access the AWS secrets to setup our final task to run our infrastructure plan.

Github Secrets
env:
  AWS_DEFAULT_REGION: "us-east-1"
  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}

jobs:
  tf-plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          aws-access-key-id: ${{ env.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ env.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_DEFAULT_REGION }} 
  1. `${{ secrets.AWS_ACCESS_KEY_ID }} can also be directly used

Since the pipeline is used in production environment, we configure our terraform backend with our new production tfstate and use the plan command to check if our plan is valid, error-free and suitable to our needs.

TF Plan
- name: TF plan
   id: plan
    working-directory: "infrastructure"
     run: |
      terraform init -backend-config="key=mlops-grocery-sales-prod.tfstate" --reconfigure && terraform plan --var-file vars/prod.tfvars
  1. Change to infrastructure directory to execute the terraform command

cd-deploy.yml

CI in this case does only tf-plan but usually it runs several unit and integration tests. We could combine these two files into one if we want but we will leave things as it is.

Similarly, the cd-deply.yml workflow file, check the tf-plan. Upon its success triggers the tf-apply job.

tf-apply
- name: TF Apply
    id: tf-apply
    working-directory: "infrastructure"
    if: ${{ steps.tf-plan.outcome }} == "success"
    run: |
      terraform apply -auto-approve -var-file=vars/prod.tfvars
      echo "name=rest_api_url::$(terraform output rest_api_url | xargs)" >> $GITHUB_OUTPUT
      echo "name=ecr_repo::$(terraform output run_id | xargs)" >> $GITHUB_OUTPUT
      echo "name=run_id::$(terraform output run_id | xargs)" >> $GITHUB_OUTPUT
      echo "name=lambda_function::$(terraform output lambda_function | xargs)" >> $GITHUB_OUTPUT

$GITHUB_OUTPUT takes Terraform outputs and displays them in the logs.

Deleting the infrastructure

If for some reason you wish to delete the resources created in the AWS Cloud, please follow these steps -

  1. Goto infrastructure’s main.tf file. Comment out all the modules. Also comment out contents in outputs.tf

  2. Then commit the changes, push, pull the request and merge the changes.

This should delete all the resources. Confirm with AWS Console.

Alternatively we can use terraform destroy ourselves if we know which tfstate and tfvars were used.

tf-destroy
terraform init -backend-config="key=mlops-grocery-sales-prod.tfstate" --reconfigure
terraform destroy -var-file=vars/prod.tfvars
  1. Since we know which Terraform statefile is used for production
  2. Which production variable file is used

Conclusion

There it is our data pipeline that started with local development is fully integrated and automated into the cloud environment. Any changes made can be seamlessly tested and deployed. This concludes our “building data pipeline” series. Feel free to contact me for any questions or clarifications.

Want to support my blog?

Buy Me A Coffee