This blog now has a deployment pipeline!

18 Dec 2021

This site was created mostly as a learning experience. Setting it up was a complete point-and-click adventure following detailed instructions laid down by a friend of mine (more about it here). That didn’t feel like enough practice. Plus, every time I write a new post, I need to scp it to the AWS instance and ssh in to move stuff around. I decided to take my learning process up a notch and write a GitHub Action to deploy the site. Moreover, this time I tried to figure things out myself instead of asking for instructions beforehand.

My first idea was to recreate in a GitHub workflow exactly what I had been doing manually: scp the freshly built site to the running instance and ssh in to move the files into place.

This was the right approach in principle - do in your script what you would do manually - but apparently what I was doing manually wasn’t really the proper way. With AWS autoscaling you have an autoscaling group, responsible for making sure you always have the specified number of instances running. The autoscaling group launches instances from a launch template, and the launch template points to an Amazon Machine Image (AMI) that was created from an instance with the correct configuration (in my case an nginx server plus the contents of my static website).

AWS Autoscaling group

So when I update only the instance, the change is not propagated to the other parts of the setup. And so if/when the instance crashes, a new instance is launched from the old AMI with the old version of the website.
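You can see this chain from the AWS CLI. A rough sketch - the group name and launch template ID below are made-up placeholders:

# which launch template (and version) the autoscaling group is using
aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names my-website-asg \
    | jq '.AutoScalingGroups[0].LaunchTemplate'

# which AMI the default version of that launch template points to
aws ec2 describe-launch-template-versions --launch-template-id lt-0123456789abcdef0 --versions '$Default' \
    | jq -r '.LaunchTemplateVersions[0].LaunchTemplateData.ImageId'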

And that meant I actually needed to go through almost the full setup of the website that I had originally done manually:

  1. Launch a new EC2 instance
  2. Configure nginx on it correctly
  3. Create a new AMI based on that instance
  4. Assign that new AMI to the launch template as a new template version
  5. Make that version the default so it would be used by the autoscaling group the next time it tries to do something.
  6. Trigger instance refresh on the autoscaling group.

The last part is meant to be a safe way to roll out a new version of the system. But since I’m a cheapskate and have only one instance running, I still had a few minutes of downtime every time I experimented with this.

I wanted to do this in the simplest way possible, not relying too much on ready-made GitHub Actions. So I used the AWS CLI, jq, ssh/scp and a couple of Bash scripts, plus a handful of existing actions for checkout, Ruby setup, AWS CLI installation and credentials configuration.

All of this wasn’t easy, as I was not familiar with the tooling. I struggled with simple things quite a lot:

GitHub Actions

Using ssh-agent. My idea was to have nice little self-contained steps in my workflow - one step to add my key to the ssh-agent, then separate steps to scp files and ssh into the instance.

But apparently, if you add your key to the ssh-agent in one workflow step, that information is completely lost in the next one - each step runs in a fresh shell. Fortunately, you can hack around that by binding the agent to a socket.
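Roughly, the trick looks like this (the key variable is a placeholder; the real workflow further down does the same thing by setting SSH_AUTH_SOCK through an env: entry on each step):

# step 1 runs in its own shell: start the agent bound to a fixed socket path,
# instead of relying on the environment variables ssh-agent normally prints
ssh-agent -a /tmp/ssh_agent.sock > /dev/null
SSH_AUTH_SOCK=/tmp/ssh_agent.sock ssh-add - <<< "$SSH_PRIVATE_KEY"

# step 2 is a brand new shell, but the agent process is still alive, so pointing
# SSH_AUTH_SOCK at the same socket makes the key available again
SSH_AUTH_SOCK=/tmp/ssh_agent.sock scp -r ./_site "ec2-user@${NEW_INSTANCE_PUBLIC_DNS}:"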

Testing the action. Apparently this is not a simple thing to do, as workflow files are only picked up from the main branch. I ended up using this hack, which requires creating a dummy action on the main branch.

Bash language

Using variables. The preparation for the instance refresh requires getting and passing a lot of IDs around. You need to create an instance and pass its ID for the AMI creation. Then you need to create a new launch template version with the AMI ID and use the new version number to set it as the default. And you need to know some values like the autoscaling group name and the launch template ID to be able to access all of that. So naturally I tried to put a lot of things into variables to make the script more readable. But I struggled with such a simple task quite a lot.

As a friend put it mildly: ‘Welcome to the weirdness of a scripting language invented before usability was a thing’.
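For the record, this is roughly the pattern that ended up working (the template ID here is a placeholder): no spaces around the =, command output captured with $(...), and expansions quoted.

TEMPLATE_ID="lt-0123456789abcdef0"
# capture the command's stdout into a variable; jq -r extracts the field we need
LATEST_TEMPLATE_VERSION=$(aws ec2 describe-launch-templates --launch-template-ids "$TEMPLATE_ID" \
    | jq -r '.LaunchTemplates[].LatestVersionNumber')
# bail out early if the lookup returned nothing
if test -z "${LATEST_TEMPLATE_VERSION}"; then exit 1; fi
echo "latest launch template version: ${LATEST_TEMPLATE_VERSION}"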

Specifying the interpreter. When I started writing the script, I just added #!/bin/sh to the top, as a friend had shown me previously, and forgot about it. That worked fine until I needed an until loop to poll for the state of my instance refresh. I used Bash syntax for it and started getting errors like [[: not found. It turns out /bin/sh on Linux is not necessarily Bash. To make the script use Bash I needed to say so explicitly: #!/usr/bin/env bash
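A tiny example of the difference: the snippet below only behaves as intended with the Bash shebang; under plain /bin/sh it can fail with exactly that [[: not found error.

#!/usr/bin/env bash
# [[ ... ]] is a bashism; a POSIX /bin/sh (often dash on Linux) does not know it
STATUS="Successful"
if [[ "$STATUS" == "Successful" ]]; then
  echo "instance refresh finished"
fi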

Parsing JSON outputs correctly

So my variables are set and working, the until loop is running. But unfortunately it is running forever. It was set to stop when the instance refresh status changes to ‘Successful’. And I knew for sure that the status was correct by then, but the loop just didn’t seem to care. The culprit was a nuance of how I retrieved the status from the AWS CLI response: I used jq to get the value of the status field from the JSON response, and apparently that value still included the quotes. So I was comparing "Successful" against Successful. To get the raw output from jq you need to ask for it explicitly with the -r option.
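The difference in two lines:

echo '{"Status": "Successful"}' | jq '.Status'     # prints "Successful" - quotes included
echo '{"Status": "Successful"}' | jq -r '.Status'  # prints Successful - the raw string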


As you can see, sometimes very little things can get in your way, especially when you are working with unfamiliar tooling. It took me three months, 77 commits and 22 AWS launch template versions to finally get things right! But I guess the time invested (and the fact that I didn’t give up) just makes it feel like a bigger accomplishment. I feel like I deserve some blogger gin now. Too bad they don’t seem to sell it anymore.

chin-chin

P. S. If you want to see what the end result looks like, it is something like this:

Workflow

name: CI/CD

on:
  push:
    branches: [ main ]

  workflow_dispatch:

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v2

      - name: Set up Ruby
        uses: ruby/setup-ruby@v1
        with:
          bundler-cache: true
      - name: Build Site
        run: bundle exec jekyll build
        env:
          JEKYLL_ENV: production

      - name: Setup AWS CLI
        uses: unfor19/install-aws-cli-action@v1
        with:
          version: 2 # default
          verbose: false # default
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          # secret names here are assumptions - the original expressions are not shown in the post
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: eu-west-1

      - name: Create new AWS instance
        run: |
            INSTANCE_ID=$(aws ec2 run-instances --image-id ***** --count 1 --instance-type t4g.micro --key-name '*****' --security-group-ids ***** | jq -r .Instances[0].InstanceId)
            if test -z "${INSTANCE_ID}"; then exit 1; fi
            echo "INSTANCE_ID=${INSTANCE_ID}" >> $GITHUB_ENV
      - name: Get public DNS of the new instance
        run: |
            NEW_INSTANCE_PUBLIC_DNS=$(aws ec2 describe-instances --instance-ids $INSTANCE_ID --query "Reservations[].Instances[].PublicDnsName[]" --output text)
            if test -z "${NEW_INSTANCE_PUBLIC_DNS}"; then exit 1; fi
            echo "NEW_INSTANCE_PUBLIC_DNS=${NEW_INSTANCE_PUBLIC_DNS}" >> $GITHUB_ENV
      
      - name: Wait for all running instances to be in ok state
        run: aws ec2 wait instance-status-ok --instance-ids $INSTANCE_ID

      - name: Setup ssh credentials
        env:
            SSH_AUTH_SOCK: /tmp/ssh_agent.sock
        run: |
            ssh-agent -a $SSH_AUTH_SOCK > /dev/null
            # the secret name is an assumption - the original expression is not shown in the post
            ssh-add - <<< "${{ secrets.SSH_PRIVATE_KEY }}"

      - name: SCP files
        env:
            SSH_AUTH_SOCK: /tmp/ssh_agent.sock
        run: scp -o StrictHostKeyChecking=no -r ./_site "ec2-user@${NEW_INSTANCE_PUBLIC_DNS}:"

      - name: Setup instance
        env:
            SSH_AUTH_SOCK: /tmp/ssh_agent.sock
        run: |
          # the secret name is an assumption - the original expression is not shown in the post
          ssh ec2-user@${NEW_INSTANCE_PUBLIC_DNS} "echo \"${{ secrets.SSH_PUBLIC_KEY }}\" >> .ssh/authorized_keys"
          ssh ec2-user@${NEW_INSTANCE_PUBLIC_DNS} 'bash -s' < ./scripts/fresh_instance_setup.sh
          ssh ec2-user@${NEW_INSTANCE_PUBLIC_DNS} "sudo mv /home/ec2-user/_site/* /data/www && rm -rf ./_site"

      - name: Trigger instance refresh
        # the second argument feeds the script's RUN_IN variable (used to name the new AMI and
        # template version); the original expression is not shown in the post, github.run_number is an assumption
        run: ./scripts/instance_refresh.sh $INSTANCE_ID ${{ github.run_number }}

New instance setup script

#!/usr/bin/env bash

sudo yum -y update
sudo yum -y install yum-utils

# add the official nginx package repositories for Amazon Linux 2
sudo touch /etc/yum.repos.d/nginx.repo
sudo tee -a /etc/yum.repos.d/nginx.repo > /dev/null <<EOT
[nginx-stable]
name=nginx stable repo
baseurl=http://nginx.org/packages/amzn2/\$releasever/\$basearch/
gpgcheck=1
enabled=1
gpgkey=https://nginx.org/keys/nginx_signing.key
module_hotfixes=true

[nginx-mainline]
name=nginx mainline repo
baseurl=http://nginx.org/packages/mainline/amzn2/\$releasever/\$basearch/
gpgcheck=1
enabled=0
gpgkey=https://nginx.org/keys/nginx_signing.key
module_hotfixes=true
EOT

sudo yum -y install nginx

sudo rm /etc/nginx/conf.d/default.conf

# minimal nginx config serving the static site from /data/www
sudo touch /etc/nginx/conf.d/my-website.conf
sudo tee -a /etc/nginx/conf.d/my-website.conf > /dev/null <<EOT
server { 
        location / { 
                root /data/www;
        }
}
EOT

sudo mkdir -p /data/www
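
# (assumption - not part of the original post) the nginx package does not enable itself;
# something like this is needed so instances launched from the AMI serve the site on boot
sudo systemctl enable --now nginx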

AWS instance refresh

#!/usr/bin/env bash

INSTANCE_ID=$1   # the freshly configured instance the new AMI will be created from
RUN_IN=$2        # a unique run identifier, used to name the new image and template version
AMI_OWNER_ID="*****"
TEMPLATE_ID="*****"
IMAGE_NAME="ieva.dev image ${RUN_IN}"
VERSION_DESCRIPTION="version ${RUN_IN}"
GROUP_NAME="*****"

# remember the currently registered image so it can be cleaned up at the end
OLD_IMAGE_ID=$(aws ec2 describe-images --owners "$AMI_OWNER_ID" --no-include-deprecated | jq -r '.Images[].ImageId')
if test -z "${OLD_IMAGE_ID}"; then exit 1; fi

# create a new AMI from the freshly configured instance
IMAGE_ID=$(aws ec2 create-image --instance-id "$INSTANCE_ID" --name "$IMAGE_NAME" | jq -r '.ImageId')
if test -z "${IMAGE_ID}"; then exit 1; fi

LATEST_TEMPLATE_VERSION=$(aws ec2 describe-launch-templates --launch-template-ids "$TEMPLATE_ID" | jq -r '.LaunchTemplates[].LatestVersionNumber')
if test -z "${LATEST_TEMPLATE_VERSION}"; then exit 1; fi

# add a launch template version that points to the new AMI and make it the default
NEW_TEMPLATE_VERSION=$(aws ec2 create-launch-template-version --launch-template-id "$TEMPLATE_ID" --version-description "$VERSION_DESCRIPTION" --source-version "$LATEST_TEMPLATE_VERSION" --launch-template-data "ImageId=${IMAGE_ID}" | jq -r '.LaunchTemplateVersion.VersionNumber')
if test -z "${NEW_TEMPLATE_VERSION}"; then exit 1; fi

aws ec2 modify-launch-template --launch-template-id "$TEMPLATE_ID" --default-version "$NEW_TEMPLATE_VERSION"

# replace the running instances with ones launched from the new default version
INSTANCE_REFRESH_ID=$(aws autoscaling start-instance-refresh --auto-scaling-group-name "$GROUP_NAME" | jq -r '.InstanceRefreshId')
if test -z "${INSTANCE_REFRESH_ID}"; then exit 1; fi

# poll until the refresh reports success; -r strips the quotes so the comparison works
until [[ $(aws autoscaling describe-instance-refreshes --auto-scaling-group-name "$GROUP_NAME" --instance-refresh-id "$INSTANCE_REFRESH_ID" | jq -r '.InstanceRefreshes[0].Status') == 'Successful' ]]
do
  sleep 15
  echo "Waiting for instance refresh"
done

# clean up: the temporary instance, the previous AMI and the previous template version
aws ec2 terminate-instances --instance-ids "$INSTANCE_ID"
aws ec2 deregister-image --image-id "$OLD_IMAGE_ID"
aws ec2 delete-launch-template-versions --launch-template-id "$TEMPLATE_ID" --versions "$LATEST_TEMPLATE_VERSION"
