EC2 AMI creation without magic

Magic

While I enjoy the fact that there are people out there maintaining EC2 AMIs for other people to use, I was faced with two problems. First, there were no AMIs maintained for the Linux distribution I wanted to use (Arch). Second, I don’t like the idea of relying on something magical that is out of my hands, that I don’t understand and cannot affect – in this case, the kernel AKIs that were traditionally not under the control of an average EC2 user (I believe only Amazon itself and select partners were able to provide these kernels). Using one of those AKIs, I would essentially be relying on someone else to release-engineer kernels compatible with my userland.

In short, given a Linux kernel I have built, and a userland I know how to prepare, I want to create an EC2 bootable disk image/AMI.

As it turns out, this is possible nowadays, but the details were a bit hard to find (for me, anyway). So, here is a short guide on how to create an AMI from scratch, relying only on the kernel, your distribution, and a host system on which to create the image (which can be virtualized, such as with VirtualBox). It is assumed that you’re already familiar with things like boot loaders, building a kernel and such.

arch4ec2

If you wish, you can look at arch4ec2 as an example of the process briefly described below. arch4ec2 is a small tool that automates the creation of an Arch Linux system (with a btrfs root file system). It must be run from a host Arch Linux system, such as one installed from the Arch Linux installation ISO into a VirtualBox VM. Alternatively, if you just want to play, you can use one of the AMIs I have built and listed in the README.

Doing without the magic (almost)

EC2 supports something called user specified kernels. Without it, as mentioned above, you choose which kernel to boot by selecting a so-called AKI to boot your image with. The AKI was provided by Amazon or (I believe) one of a few select partners, and you had to run an image that was compatible with that kernel.

With user specified kernels, the AKI you choose is instead pv-grub (which I assume stands for “paravirtualized grub”). As a result, all you have to do is create a disk image which is accessible by grub (i.e., correct partitioning/filesystem layout) and which has a grub configuration that points to a kernel which is compiled with the necessary support for paravirtualization (i.e., it has to be Xen compatible). The only significant difference from installing grub locally is that grub itself is provided by Amazon (through the AKI chosen) rather than being installed in the boot record of your image (this is where there is still a small bit of magic).
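
For reference, the grub configuration pv-grub reads is an old-style menu.lst. A minimal sketch might look like the following; the kernel file name, root device and partition numbering are placeholders that depend on your disk layout and on the AKI you pick in step 1:

# boot/grub/menu.lst (example values only; the kernel path is relative to
# the partition grub reads from)
default 0
timeout 1

title Arch Linux on EC2
root (hd0,0)
kernel /boot/vmlinuz-ec2 root=/dev/xvda2 ro console=hvc0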

Step 1: Selecting an AKI

In the user specified kernel documentation (NOTE: do not cut’n’paste the “hyphen” from this PDF, as it is not actually a hyphen and ec2-register will fail) there is a list of AKIs to use depending on whether you intend to run a 32 bit or a 64 bit kernel, which region you intend to run in, and whether your image will be EBS or S3 based. I have only tested EBS, and I don’t know what might be different for S3 based images. I have also only tested 32 bit as of this writing.
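
If you would rather query the API than read the PDF, something along these lines should list the available pv-grub AKIs in a given region (assuming the EC2 API tools are installed and configured; the region is just an example):

ec2-describe-images -o amazon --region eu-west-1 | grep pv-grub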

Step 2: Paravirtualization support in the kernel

In order to enable the appropriate support (for a 32 bit kernel; 64 bit is not yet tested by me), the following options are needed:

CONFIG_HIGHMEM64G=y
CONFIG_HIGHMEM=y
CONFIG_PARAVIRT_GUEST=y
CONFIG_XEN=y
CONFIG_PARAVIRT=y
CONFIG_PARAVIRT_CLOCK=y
CONFIG_XEN_BLKDEV_FRONTEND=y
CONFIG_XEN_NETDEV_FRONTEND=y
CONFIG_HVC_XEN=y
CONFIG_XEN_BALLOON=y
CONFIG_XEN_SCRUB_PAGES=y

(It is possible some variation is acceptable.)
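
If you want to sanity-check that the options actually ended up in the configuration you built from, a simple grep against the build tree’s .config does the job (the path is just an example):

grep -E '^CONFIG_(XEN|PARAVIRT|HVC_XEN|HIGHMEM)' /path/to/linux/.config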

Step 3: Use pv-grub compatible kernel compression

If you’re using a sufficiently new kernel, the kernel build might produce a kernel compressed with XZ/LZMA2 instead of GZIP. Such a kernel will not boot on EC2, so you need to use GZIP instead:

CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set

Step 4: Populate a boot partition and root file system

Your disk image should be partitioned and its file systems initialized (in the case of arch4ec2, I use a small ext3 boot partition and a btrfs root partition). If you have a separate boot partition mounted under /boot, do not forget to put a boot directory in it and symlink grub to boot/grub, so that pv-grub can find the grub configuration at boot/grub within the partition.
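
As a rough sketch, preparing such an image on the host might look something like the following (the image size, loop device, partition device names and mount points are all just examples; use whatever your host setup gives you):

# Create a raw disk image and expose its partitions on the host
dd if=/dev/zero of=disk.img bs=1M count=4096
losetup /dev/loop0 disk.img
# ... create a small boot partition and a root partition with fdisk or parted ...
kpartx -av /dev/loop0

mkfs.ext3 /dev/mapper/loop0p1      # boot partition
mkfs.btrfs /dev/mapper/loop0p2     # root partition

mount /dev/mapper/loop0p2 /mnt
mkdir /mnt/boot
mount /dev/mapper/loop0p1 /mnt/boot

# With a separate boot partition, make sure grub is reachable as boot/grub
# on that partition (where pv-grub looks for it) as well as /boot/grub in
# the installed system; one way of doing that:
mkdir -p /mnt/boot/boot/grub
ln -s boot/grub /mnt/boot/grub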

How best to populate the system mostly depends on which distribution you use. In the case of Arch Linux, the mkarchroot tool is helpful for scripting it (this is what arch4ec2 uses). But in most cases, if you are doing this manually as a one-off, you can just install your system as you normally would in a virtualized environment and take whatever steps are necessary to switch to a properly configured kernel.

Step 5: Make an EBS snapshot of your disk image

In order to register an AMI, you must have an EBS snapshot containing the contents to be used when spawning an instance from the AMI. If you did the original setup on EC2, you may already have an EBS volume and can simply snapshot it. Otherwise, you are going to have to get your disk image onto EC2 in some way. For example, you can boot a system from an existing AMI (such as one of the Alestic AMIs), attach an EBS volume you have created, and ‘dd’ your device image to it over ssh.
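
For example, something along these lines (volume size, availability zone, instance ID, host name and device names are all placeholders):

# Create a volume and attach it to a running helper instance, then stream
# the disk image onto it from the build host. Inside the instance the
# device may show up as /dev/xvdf rather than /dev/sdf.
ec2-create-volume --size 10 -z eu-west-1a
ec2-attach-volume vol-XXXXXXX -i i-XXXXXXX -d /dev/sdf
dd if=disk.img bs=1M | ssh root@helper-instance 'dd of=/dev/xvdf bs=1M'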

In any case, once you have an EBS volume containing the image as you want it to appear in the AMI, snapshot it:

ec2-create-snapshot -d 'my-ami-snapshot' vol-XXXXXXX

Step 6: Register an AMI based on your snapshot

You will first have to recall the AKI you chose in step 1, and the snapshot id that was emitted by ec2-create-snapshot in step 5. Then, register the AMI:

ec2-register --debug -s snap-XXXXXXXX --root-device-name /dev/sda -n my-arch-ami --kernel AKI

The AMI registration will not succeed until the EBS snapshot has completed, so you will have to wait for that first.
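
The snapshot status can be checked with ec2-describe-snapshots until it reports as completed:

ec2-describe-snapshots snap-XXXXXXXX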

Done

At this point your AMI is ready and you should be able to spawn instances from it.

Pipeline stalls and the human brain

It’s amazing how often I am struck by how closely, in some ways, the human brain mimics the behavior of a CPU in day-to-day work and interaction with other people or organizations. The analogy works particularly well for optimizing productivity in the workplace (at least I find that to be the case as a developer).

Consider a productive day of programming, spent in the zone. Very likely, this involves little interaction with others and few interruptions. In fact, it seems most programmers feel most productive when they are able to sit down and just work continuously, with focus, for an extended period. In my analogy, this is executing a tight loop with all instructions and memory fetches in cache.

An example of the opposite is when you are blocking on an external entity. Having to interrupt your work to wait for something else (outside of your control) to complete is analogous to a pipeline stall. Examples include:

  • Waiting for the compiler to build your software.
  • Waiting for a deployment step to complete.
  • Waiting for a slow web page to load when searching for documentation.
  • Waiting for someone to answer a question for you.
  • Waiting for someone to give you access to something so that you can complete testing or deployment.
  • Waiting for someone to review something.

It is worth noting that blockages can be technical, human, or a combination of the two.

Given a fixed amount of time spent working, and assuming you wish to be productive, clearly you do not want to twiddle your thumbs while waiting for various things that you’re blocking on. For example, if you have to wait 3 hours for someone to review your code prior to merging to the production branch, you do not want to sit there idling for 3 hours. The solution to this is typically to context switch to something else (in other words, you are engaged in concurrent execution of multiple tasks). Whenever you reach a blocking point, you context switch to an appropriately non-blocked (runnable) task.

What happens when everything you are working on is blocking on something? You start working on something else, increasing concurrency. As is generally recognized, context switching is expensive for the human mind (just as it is for an operating system on a CPU). So, as you increase concurrency, you will either be context switching more often (each context switch carrying some overhead with it) or context switching to tasks that you, on average, last worked on longer ago. In other words, all the caches relevant to the switched-to task will be colder.

Either way, all forms of blocking are expensive in the sense that overall productivity decreases.

A way to mitigate the problem is to predict future choices or over-commit on options (think branch prediction), thereby causing would-be blocking points to block for less time, or not at all, by the time you reach them (think pre-fetching).

While the analogy is fun to draw, the main point here is that blocking on external entities is very expensive, and in order to remain productive such blockages should be minimized. It follows that whenever you absolutely do have to block, you want to do so for as short a time as possible. In other words, the latency of external dependencies is crucial to productivity.

Interestingly, this implies that while a single individual or team may be considered most productive when they focus only on the task at hand, reality complicates matters. Suppose you have people or teams that have dependencies on other people or teams in a working environment. Every time you interrupt your work to help someone else, you lose some productivity. But someone else is gaining because you are decreasing latency for them.

I believe that in reality, a reasonable balance has to be kept between quickly servicing external requests and focusing for productivity. I also believe that this is typically ignored completely when arguments are put forth that you should, for example, only check your e-mail once per day or refuse to do anything other than your main task for an entire day.

People, scrum teams, whatever the unit may be – they don’t exist in a vacuum, and the greater good has to be considered in order to maximize global productivity.