Using Packer, systemd and a deployment script to build a Rails application into an AMI and launch it into production as a Spot Instance - saving up to 90% of the on-demand price.

This post is the next installment in the series on Immutable Servers. In the previous post we looked at the infrastructure needed to launch web applications into production as spot instances. In this post we will bake a Rails application into an AMI and write a deployment script that allows our build process to compile an AMI and launch it into production.

In the first post in the series we used Packer to build our base server images. We'll take the rails-base image and build a new image on top of it that includes our application and a systemd service to automatically launch the web application on boot.

Packaging with Packer and systemd

I'll assume that you have an existing Rails application that you want to bundle as an immutable server, and that you've been following the previous posts in the series.

Baking the Rails App

I'll also assume the root of your application has a structure that looks a little like this:

.
├── app
├── bin
├── config
├── config.ru
├── db
├── Gemfile
├── Gemfile.lock
├── lib
├── package.json
├── public
├── Rakefile
├── README.md
├── test
└── vendor

In the root of your application we'll create a build folder. In this folder, we'll put everything that we need for Packer and the script that will deploy our application.

All the files we create from here on will live in the build folder.
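
For reference, by the end of this post the build folder will contain roughly the following files:

build
├── build.sh
├── deploy.rb
├── deploy.sh
├── Gemfile
├── packer-configure.sh
├── packer-init.sh
├── packer.json
├── run.sh
├── system.d
│   └── demowebapp.service
└── user-data.sh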

Let's first create the outline of our Packer template (packer.json) that will use the rails-base image we produced previously.

{
  "builders": [
    {
      "type": "amazon-ebs",
      "access_key": "{{user `aws_access_key`}}",
      "secret_key": "{{user `aws_secret_key`}}",
      "region": "eu-west-1",
      "instance_type": "t3.small",
      "ssh_username": "ubuntu",
      "source_ami_filter": {
        "filters": {
          "name": "rails-base*"
        },
        "owners": ["self"],
        "most_recent": true
      },
      "ami_name": "demo-web-app {{timestamp}}",
      "associate_public_ip_address": true,
      "tags": {
        "Name": "demo-web-app",
        "Project": "demo-web-app",
        "Commit": "{{user `git_commit`}}"
      }
    }
  ],
  "provisioners": [
    ...
  ],
  "post-processors": [
    {
      "output": "manifest.json",
      "strip_path": true,
      "type": "manifest"
    }
  ]
}

The first thing that we'll want to do is create a folder for our web app to reside in. To do this, we'll create a packer-init.sh script:

#!/bin/bash
set -e
mkdir -p /srv/demowebapp
chown ubuntu: /srv/demowebapp

We can now update the provisioners in our packer.json template to run packer-init.sh and then copy our entire application into this folder:

{
  "builders": [
    ...
  ],
  "provisioners": [
    {
      "type": "shell",
      "execute_command": "echo 'ubuntu' | {{.Vars}} sudo -S -E bash '{{.Path}}'",
      "scripts": [
        "packer-init.sh"
      ]
    },
    {
      "type": "file",
      "source": "../",
      "destination": "/srv/demowebapp"
    },
    ...
  ],
  "post-processors": [
    [
      {
        "output": "manifest.json",
        "strip_path": true,
        "type": "manifest"
      }
    ]
  ]
}

The file provisioner will copy the entire root of our application into /srv/demowebapp (the source is ../ because our template lives in the build folder).

Installing dependencies, testing and running with systemd

Copying our code into our server image isn't enough to get it to run:

  • We'll need to install our dependencies
  • We should run our tests to ensure that the image will do what it's supposed to do when we launch it into production
  • We need to set up systemd to run the application on boot

We'll add another script (packer-configure.sh) which will run after we've copied the code into the server image. This script will take care of the dependency installation, test execution and systemd setup.

Create a packer-configure.sh script file and add another script provisioner:

{
  "builders": [
    ...
  ],
  "provisioners": [
    {
      "type": "shell",
      "execute_command": "echo 'ubuntu' | {{.Vars}} sudo -S -E bash '{{.Path}}'",
      "scripts": [
        "packer-init.sh"
      ]
    },
    {
      "type": "file",
      "source": "../",
      "destination": "/srv/demowebapp"
    },
    {
      "type": "shell",
      "execute_command": "echo 'ubuntu' | {{.Vars}} sudo -S -E bash '{{.Path}}'",
      "scripts": [
        "packer-configure.sh"
      ]
    }
  ],
  "post-processors": [
    [
      {
        "output": "manifest.json",
        "strip_path": true,
        "type": "manifest"
      }
    ]
  ]
}

Installing Dependencies

The first thing we'll do in our packer-configure.sh script is install our dependencies. When Packer copied our code over using the file provisioner, the files in the server image weren't created with our webapp user's ownership, so we should change that.

#!/bin/bash
set -e

gem install bundler
sudo chown -R webapp:webapp /srv/demowebapp

(
    cd /srv/demowebapp
    sudo -u webapp bundle install --path /srv/demowebapp/.bundle
)

Running Tests

Running tests in our server image as we build it will help us reduce the number of potential environmental issues and give us a high degree of confidence that our application will function as we'd expect.

Once the dependencies have been installed, we'll run the tests in a separate shell executed as the webapp user, which keeps any environment variables used for testing isolated to that shell:

#!/bin/bash
set -e

gem install bundler
sudo chown -R webapp:webapp /srv/demowebapp

(
    cd /srv/demowebapp
    sudo -u webapp bundle install --path /srv/demowebapp/.bundle
)

sudo -u webapp bash <<"EOF"
cd /srv/demowebapp
export RAILS_ENV=test
bundle exec rails db:environment:set
bundle exec rake db:drop db:create db:migrate
bundle exec rspec --format documentation --format RspecJunitFormatter --out rspec.xml
git rev-parse HEAD > REVISION
EOF

If any of our tests fail during the build process, the failure will fail the entire build, preventing us from shipping broken code.

Systemd

Systemd will be used to run our application as a service and will automatically launch the service whenever the server image is booted. This will require us to create a service manifest and a run script that systemd can invoke.

Let's first create a run script that will be responsible for pre-compiling assets, running migrations and starting puma. We'll put this script in the build folder.

#!/bin/bash
set -e
# This script lives in build/, so step up to the application root first
cd "$(dirname "$0")/.." || exit
./bin/bundle exec rails assets:precompile
./bin/bundle exec rake db:migrate
./bin/bundle exec puma -C config/puma.rb

Within the build folder, create a system.d folder that will contain our service manifest (demowebapp.service):

[Unit]
Description=Demo Web App
Requires=network.target

[Service]
Type=simple
User=webapp
Group=webapp
Environment=RAILS_ENV=production
WorkingDirectory=/srv/demowebapp
ExecStart=/bin/bash -lc '/srv/demowebapp/build/run.sh'
TimeoutSec=30
RestartSec=15s
Restart=always

[Install]
WantedBy=multi-user.target

Note that the ExecStart command uses bash to invoke our run script at the path where it will live in the server image.

The final step of the packer-configure.sh script should be to add the demowebapp service to systemd:

#!/bin/bash
set -e

gem install bundler
sudo chown -R webapp:webapp /srv/demowebapp

(
    cd /srv/demowebapp
    sudo -u webapp bundle install --path /srv/demowebapp/.bundle
)

sudo -u webapp bash <<"EOF"
cd /srv/demowebapp
export RAILS_ENV=test
bundle exec rails db:environment:set
bundle exec rake db:drop db:create db:migrate
bundle exec rspec --format documentation --format RspecJunitFormatter --out rspec.xml
git rev-parse HEAD > REVISION
EOF

sudo mkdir -p /usr/lib/systemd/system
cp /srv/demowebapp/build/system.d/demowebapp.service /usr/lib/systemd/system/demowebapp.service
cp /srv/demowebapp/build/user-data.sh /etc/rc.local
chmod +x /etc/rc.local
systemctl enable demowebapp.service

You'll notice that in the final step we also add a user-data.sh script as /etc/rc.local. This is an optional step: I'm using /etc/rc.local (which runs on boot) to customise the hostname into a standard format for this specific server image:

#!/bin/bash
set -e

INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
HOSTNAME="demo-web-app-${INSTANCE_ID}"

# Hostname
echo -n "${HOSTNAME}" > /etc/hostname
hostname -F /etc/hostname

Whenever I launch an instance of this server image, the hostname will match the pattern: demo-web-app-*, e.g.:

demo-web-app-i-0804f202925fd084a
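
Before moving on to the build process, it's worth knowing how to sanity-check the service on an instance launched from this image. Over SSH, a few commands along these lines will confirm it came up (the port Puma listens on depends on your config/puma.rb; 3000 is the usual Rails default):

systemctl status demowebapp.service          # should report active (running)
journalctl -u demowebapp.service -n 50       # asset precompile, migration and puma output
curl -I http://localhost:3000/               # adjust the port to match your puma config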

Build Process

Now that we have a fully configured Packer template, we can put together a simple build.sh in our build folder that will invoke Packer for us.

There is little value in baking logs, code coverage results or other transient files into the image, so build.sh will clean these up before invoking Packer.

When we looked at building our base server images with Packer we baked the git commit hash into the AMI as a tag. Knowing exactly what code went into any kind of build is always useful, so we'll do this again. This time round we'll do it without jq, using Packer user variables instead.

You will notice that when we created our Packer template this time, the Commit tag was set to {{user `git_commit`}}. This pulls in the user variable git_commit, which can be passed to Packer as a command line argument (user variables also need to be declared in the template's variables section, even if only with an empty default).

{
  "builders": [
    {
      ...
      "tags": {
        "Name": "demo-web-app",
        "Project": "demo-web-app",
        "Commit": "{{user `git_commit`}}"
      }
    }
  ],
  ...
}

Our build.sh will look as follows:

#!/bin/bash
set -e
cd "$(dirname "$0")" || exit

rm -f ../REVISION
rm -rf ../coverage/
rm -rf ../log/*.log

rm -f manifest.json
packer build -var "git_commit=$(git rev-parse HEAD)" packer.json
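
The template also references aws_access_key and aws_secret_key user variables. The amazon-ebs builder will fall back to the usual credential sources (environment variables, shared credentials file or an instance profile) if these are left empty, but you could equally pass them explicitly in the same way as the commit hash. A hypothetical invocation, assuming the standard AWS environment variables are set:

packer build \
  -var "aws_access_key=${AWS_ACCESS_KEY_ID}" \
  -var "aws_secret_key=${AWS_SECRET_ACCESS_KEY}" \
  -var "git_commit=$(git rev-parse HEAD)" \
  packer.json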

We also use a manifest post-processor in our Packer template, which produces a manifest.json that will look something like this:

{
  "builds": [
    {
      "name": "amazon-ebs",
      "builder_type": "amazon-ebs",
      "build_time": 1553103589,
      "files": null,
      "artifact_id": "eu-west-1:ami-12e0bca147d8846e3",
      "packer_run_uuid": "fa5e8a22-0f35-6c83-7381-a04b15f8917b"
    }
  ],
  "last_run_uuid": "fa5e8a22-0f35-6c83-7381-a04b15f8917b"
}

The manifest.json file will be important when we put together our deploy script, as we'll be able to extract the AMI ID of the image we've just built from the artifact_id attribute.
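
If you want a quick look at the AMI ID from the shell, the same extraction can be done with jq (assuming you have it installed):

jq -r '.builds[0].artifact_id' manifest.json | cut -d':' -f2
# => ami-12e0bca147d8846e3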

We now have a fully fledged build process that can package our application as an immutable server.

Deployment Script

As I'm writing an example Rails app to demonstrate this concept, I'll stick with Ruby for my deployment script.

When we defined our immutable infrastructure, we created a Target Group on an Application Load Balancer that we can attach our spot instances to. Our script will discover the Target Group and other relevant infrastructure, and deploy our application's AMI using a spot fleet request.

Whenever our script needs to reference our infrastructure (such as target groups and security groups), we will determine the IDs of the resources based on how they are named. For example, we'll look for the target group named "website", rather than the target group with a specific ARN. This decouples our deployment script from the specific instance of each resource. In the future, we may need to go back to our Terraform infrastructure and change a resource attribute that causes Terraform to re-create the resource with a new ID. If we hardcode IDs everywhere, this will break our script; instead we'll derive IDs and ARNs from resource names.
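
To make that concrete, the name-based lookup our script performs for the Target Group is equivalent to this AWS CLI call, which resolves the target group named "website" to its ARN:

aws elbv2 describe-target-groups \
  --names website \
  --query 'TargetGroups[0].TargetGroupArn' \
  --output text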

At a high level, our deployment script will:

  1. Identify existing spot fleet requests that are being used to host the current in-production version of our application
  2. Launch a new spot fleet request against the Target Group using the new AMI that we want to deploy
  3. If the AMI deploys successfully, we'll retire any old versions of the application that were identified in Step 1. Alternatively, if the deploy fails, we'll just cancel the new spot fleet request we tried to launch

We'll place our deploy.rb script in our build folder and declare a separate Gemfile specifically for the deploy process.

source 'https://rubygems.org'

git_source(:github) do |repo_name|
  repo_name = "#{repo_name}/#{repo_name}" unless repo_name.include?('/')
  "https://github.com/#{repo_name}.git"
end

gem 'aws-sdk'

We'll invoke the deploy.rb script we're about to write with a lightweight deploy.sh wrapper:

#!/bin/bash
set -e
cd "$(dirname "$0")" || exit

bundle install --path ./bundle
bundle exec ruby deploy.rb

Now let's make a start on the deploy.rb script:

require 'aws-sdk'
require 'logger'
require 'json'
require 'net/http'

$stdout.sync = true
logger = Logger.new($stdout)

aws_region = begin
  JSON.parse(Net::HTTP.get(URI("http://169.254.169.254/latest/dynamic/instance-identity/document")))["region"]
rescue Errno::EHOSTUNREACH
  logger.info("No route to host for AWS meta-data (169.254.169.254), assuming running as localhost and defaulting to eu-west-1 region")
  'eu-west-1'
end

In this first snippet, I've done some very basic setup. We declare the dependencies we need, which are largely built-in libraries with the exception of the AWS SDK. I've set up a logger and used $stdout.sync to force the stdout buffer to flush immediately whenever it receives any new data. This can be really helpful when running the script through a CI/CD tool, as these tools typically require the buffer to flush before they can show you logs.

Using a rescue block, I attempt to determine the region of the EC2 instance using the AWS Instance Metadata. I've done this in case you end up running your deployment process on a CI server within your AWS account. In a future blog post I plan to cover how you can set up an entire Jenkins environment running on spot instances. As a fallback, if the GET request fails we'll assume we're in the eu-west-1 region. This can be useful when you're testing or using the deploy script locally or outside of AWS.
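
For reference, the same region lookup can be made from a shell on an EC2 instance (assuming jq is available):

curl -s http://169.254.169.254/latest/dynamic/instance-identity/document | jq -r '.region'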

Next, we'll create a load of API clients which will be important for determining the IDs and ARNs of resources. We will also get the AMI ID from the manifest.json that's created as part of our build process.

ec2 = Aws::EC2::Client.new(region: aws_region)
elbv2 = Aws::ElasticLoadBalancingV2::Client.new(region: aws_region)
iam = Aws::IAM::Client.new(region: aws_region)

packer_manifest = JSON.parse(File.read('manifest.json'))
ami_id = packer_manifest['builds'][0]['artifact_id'].split(':')[1]
logger.info("AMI ID: #{ami_id}")

WEBSITE_TARGET_GROUP_ARN = elbv2.describe_target_groups({names: ['website']}).target_groups[0].target_group_arn

logger.info("Using ELB target group: #{WEBSITE_TARGET_GROUP_ARN}")

iam_fleet_role = iam.get_role({role_name: 'aws-ec2-spot-fleet-tagging-role'}).role.arn

default_sg_id = ec2.describe_security_groups({
  filters: [
    {
      name: "description",
      values: ["default VPC security group"],
    },
  ],
}).security_groups[0].group_id

rails_app_sg_id = ec2.describe_security_groups({
  filters: [
    {
      name: "tag:Name",
      values: ["Rails App"],
    },
  ],
}).security_groups[0].group_id


logger.info("IAM Fleet Role ARN: #{iam_fleet_role}")
logger.info("Default Security Group: #{default_sg_id}")
logger.info("Rails App Security Group: #{rails_app_sg_id}")

Before we start deploying our application we should use the EC2 API to determine the IDs of the spot fleet requests that are running the current in-production version of our application.

existing_website_spot_fleet_request_ids = []

ec2.describe_spot_fleet_requests.each do |resps|
  resps.spot_fleet_request_configs.each do |fleet_request|
    if fleet_request.spot_fleet_request_state == 'active' || fleet_request.spot_fleet_request_state == 'modifying'
      target_groups_config = fleet_request.spot_fleet_request_config.load_balancers_config.target_groups_config
      if target_groups_config.target_groups.all? {|tg| tg.arn == WEBSITE_TARGET_GROUP_ARN}
        existing_website_spot_fleet_request_ids << fleet_request.spot_fleet_request_id
      end
    end
  end
end


logger.info("Existing website fleet requests: #{existing_website_spot_fleet_request_ids}")

We can then create our new spot fleet request that will launch our newly built application AMI.

response = ec2.request_spot_fleet({
  spot_fleet_request_config: {
    allocation_strategy: 'lowestPrice',
    on_demand_allocation_strategy: "lowestPrice",
    excess_capacity_termination_policy: "noTermination",
    fulfilled_capacity: 1.0,
    on_demand_fulfilled_capacity: 1.0,
    iam_fleet_role: iam_fleet_role,
    launch_specifications: [
      {
        security_groups: [
          {
            group_id: default_sg_id
          },
          {
            group_id: rails_app_sg_id
          }
        ],
        iam_instance_profile: {
          name: "website",
        },
        image_id: ami_id,
        instance_type: "t3.micro",
        key_name: "demo",
        tag_specifications: [
          {
            resource_type: "instance",
            tags: [
              {
                key: "Name",
                value: "demo-web-app",
              },
              {
                key: "Project",
                value: "demo-web-app",
              },
            ],
          }
        ],
      },
    ],
    target_capacity: 2,
    type: 'maintain',
    valid_from: Time.now,
    replace_unhealthy_instances: false,
    instance_interruption_behavior: 'terminate',
    load_balancers_config: {
      target_groups_config: {
        target_groups: [
          {
            arn: WEBSITE_TARGET_GROUP_ARN
          },
        ],
      },
    },
  },
})

logger.info("Launching spot instance request: '#{response.spot_fleet_request_id}'")

We will then want to wait for the spot fleet request to be provisioned, and for the instances to become available.

spot_provisioned = false
begin
  ec2.describe_spot_fleet_requests({spot_fleet_request_ids: [response.spot_fleet_request_id]}).each do |resps|
    resps.spot_fleet_request_configs.each do |fleet_request|
      if fleet_request.activity_status == 'fulfilled'
        spot_provisioned = true
      end
      if fleet_request.activity_status == 'error'
        logger.error("Provisioning spot instance request '#{response.spot_fleet_request_id}' has failed!")
        exit 1
      end
      logger.info("Spot instance request '#{response.spot_fleet_request_id}' has activity status: '#{fleet_request.activity_status}'")
      sleep 10
    end
  end
end until spot_provisioned
logger.info("Launched spot instance request: '#{response.spot_fleet_request_id}' !")
sleep 10

When the spot fleet request launches our instances, they will initially have a target health state of initial. Once the load balancer's health checks have called out to the new instances, they'll transition to healthy or unhealthy.

The next stage of our script will wait for all instances to become healthy, or abort if any instance moves to an unhealthy state. By using a spot fleet request, cleaning up instances is really easy: we just have to cancel the spot fleet request and AWS will take care of terminating the instances and removing them from the target group.
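
If you ever need to back a deploy out by hand, the same cleanup is a single AWS CLI call, substituting in the spot fleet request ID in question:

aws ec2 cancel-spot-fleet-requests \
  --spot-fleet-request-ids sfr-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx \
  --terminate-instances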

target_group_resp = elbv2.describe_target_health({target_group_arn: WEBSITE_TARGET_GROUP_ARN})
until target_group_resp.target_health_descriptions.all? {|thd| thd.target_health.state == 'healthy'}
  total_instances = target_group_resp.target_health_descriptions.size
  healthy_instances = target_group_resp.target_health_descriptions.count {|thd| thd.target_health.state == 'healthy'}
  unhealthy_instances = target_group_resp.target_health_descriptions.count {|thd| thd.target_health.state == 'unhealthy'}

  logger.info("#{total_instances} total instances in target group. #{healthy_instances} healthy instances...")

  if unhealthy_instances > 0
    logger.error("#{unhealthy_instances} unhealthy instances! aborting...")
    ec2.cancel_spot_fleet_requests(spot_fleet_request_ids: [response.spot_fleet_request_id], terminate_instances: true)
    logger.error("Cancelled new fleet request (id: #{response.spot_fleet_request_id})")
    # Stop here so we don't go on to retire the existing fleet requests below
    exit 1
  end

  if total_instances != healthy_instances
    sleep 10
    target_group_resp = elbv2.describe_target_health({target_group_arn: WEBSITE_TARGET_GROUP_ARN})

    total_instances = target_group_resp.target_health_descriptions.size
    healthy_instances = target_group_resp.target_health_descriptions.count {|thd| thd.target_health.state == 'healthy'}
    unhealthy_instances = target_group_resp.target_health_descriptions.count {|thd| thd.target_health.state == 'unhealthy'}

    logger.info("#{total_instances} total instances in target group. #{healthy_instances} healthy instances...")
  end
end
sleep 10

If our new AMI instances launched into the load balancer successfully with a healthy state, we can now terminate any spot fleet requests that were serving up older versions of our application:

unless existing_website_spot_fleet_request_ids.empty?
  logger.info("Cancelling old spot instances: #{existing_website_spot_fleet_request_ids}")
  ec2.cancel_spot_fleet_requests(spot_fleet_request_ids: existing_website_spot_fleet_request_ids, terminate_instances: true)
end

logger.info("Deployed!")

And that's it! We have a deploy script that will replace existing in-production instances with our new AMI.

Producing a new build of our application is as simple as:

./build/build.sh

We can then release it with:

./build/deploy.sh
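
In a CI pipeline the two steps chain together naturally, so a failed bake (including failing tests) never reaches the deploy step:

./build/build.sh && ./build/deploy.sh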

The deployment and build scripts used here are simple and fairly basic. You could certainly make many improvements to the process to allow for more complex development practices such as being able to deploy branched builds to development and test environments. In this post, as with my other posts in the immutable series, I've focused on the simplest example to prove and explain the concept.

Conclusion

Over the course of my series on Immutable Servers we've looked at building base server images with Packer, creating the infrastructure needed to run web applications as spot instances, and now baking an application into an AMI and deploying it into production.

EC2 Spot Instances can provide a cost saving of up to 90% off the on-demand price. Transitioning to an immutable model has its challenges, but with some well placed tooling and architecture design you can leverage additional benefits and avoid common issues such as configuration drift.