Beginner Tutorial: cloud-init in AWS


When we use Amazon EC2 instances to scale applications or to run Docker hosts, provisioning the required software and configuration files on each instance is essential to keep the system consistent and maintainable. There are two approaches to provisioning such instances: 1) installing and configuring software when an EC2 instance is launched, or 2) creating a golden image with a tool such as HashiCorp Packer.

The difference between these approaches is that 1) runs when an instance comes up, which can be flexible in cases where you want to fetch the latest packages and information from the web at launch time. 2) is safer when the image is used for members of an ALB target group or for Docker hosts, because we want the setup and configuration of every EC2 instance to be consistent and idempotent.

cloud-init can be used for approach 1), installing and configuring software when an EC2 instance is launched.

Cloud-init is the industry standard multi-distribution method for cross-platform cloud instance initialization. It is supported across all major public cloud providers, provisioning systems for private cloud infrastructure, and bare-metal installations.

cloud-init Documentation

I’d like to cover how to use cloud-init in AWS, how it is implemented there, and finally some frequently used directives. TL;DR:

  • User data can contain either a shell script or cloud-init directives.
  • User data scripts and cloud-init directives run only during the boot cycle when you first launch an instance; the official documentation explains how to change the configuration so that they run every time you restart your instance.
  • cloud-init consists of four services, cloud-init-local.service, cloud-init.service, cloud-config.service and cloud-final.service, and their systemd unit files define the order in which they are invoked.
  • cloud-init running on AWS is not exactly the same as upstream cloud-init; it is customized, so some directives might not be supported.

User data and cloud-init

Let’s first take a look at how cloud-init is implemented in AWS. In AWS, we put our configuration in the user data of an instance, and the user data field accepts both shell scripts and cloud-init directives.

We can pass user data to an instance as plain text, as a file (this is useful for launching instances using the command line tools), or as base64-encoded text (for API calls).
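As a minimal sketch of these forms, the script below writes a hypothetical shell-script user data file, produces the base64 text that a raw API call expects, and checks that it decodes back unchanged. The file names are my own choice, `base64 -w0` assumes GNU coreutils, and the `run-instances` line is left as a comment with a placeholder AMI ID.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical user data in the "#!" shell-script form.
cat > user-data.sh << 'EOF'
#!/bin/bash
yum install -y httpd
systemctl start httpd
systemctl enable httpd
EOF

# Base64 form, as expected by raw API calls such as RunInstances.
# "-w0" (GNU coreutils) keeps the output on a single line.
base64 -w0 user-data.sh > user-data.b64

# Round trip: decoding must reproduce the original file byte for byte.
base64 --decode user-data.b64 | cmp -s - user-data.sh && echo "round trip OK"

# File form with the AWS CLI (placeholder AMI ID, not executed here):
#   aws ec2 run-instances --image-id <ami-id> --instance-type t2.micro \
#       --user-data file://user-data.sh
```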

If you’re familiar with shell scripting, you can use a shell script as user data: start it with the #! characters followed by the path to the interpreter that should read the script (commonly /bin/bash). This is the simpler way to give instructions to an instance.

If the user data starts with the line #cloud-config, the instance recognizes it as cloud-init instructions, and you can use cloud-init directives in the user data field. Either way, you can check the script output in the /var/log/cloud-init-output.log file.
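To make the header check concrete, here is a simplified sketch that classifies a user data file by its first line, roughly the way cloud-init picks a handler for each part. The function name `userdata_type` is hypothetical, and real cloud-init recognizes more formats (for example gzip-compressed user data) than this sketch does.

```shell
#!/usr/bin/env bash
# Simplified sketch: classify user data by its first line.
userdata_type() {
  local first=""
  IFS= read -r first < "$1" || true   # tolerate a file without a trailing newline
  case "$first" in
    '#!'*)                            echo "shell script" ;;
    '#cloud-config'*)                 echo "cloud-config" ;;
    '#cloud-boothook'*)               echo "boothook" ;;
    '#include'*)                      echo "include" ;;
    'Content-Type: multipart/mixed'*) echo "mime multi-part" ;;
    *)                                echo "unknown" ;;
  esac
}

# Two tiny sample files, one of each common form.
printf '#!/bin/bash\necho hi\n' > ud1.txt
printf '#cloud-config\npackages:\n - httpd\n' > ud2.txt
userdata_type ud1.txt   # shell script
userdata_type ud2.txt   # cloud-config
```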

Here’s an example from the official AWS documentation. This cloud-init configuration performs the following installation and setup:

User data and cloud-init directives

  • The distribution software packages are updated.
  • The necessary web server, php, and mariadb packages are installed.
  • The httpd service is started and turned on via systemctl.
  • The ec2-user is added to the apache group.
  • The appropriate ownership and file permissions are set for the web directory and the files contained within it.
  • A simple web page is created to test the web server and PHP engine.
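A cloud-config along these lines could look like the following sketch, which writes it to a file you can pass as user data. It is modeled on the bullets above rather than copied verbatim from AWS, it assumes Amazon Linux 2 (where PHP comes from the `amazon-linux-extras` topics `lamp-mariadb10.2-php7.2` and `php7.2`), and the output file name is my own choice.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Sketch of a LAMP-style cloud-config for Amazon Linux 2 (not the verbatim AWS example).
cat > lamp-user-data.yaml << 'EOF'
#cloud-config
repo_update: true
repo_upgrade: all

packages:
 - httpd
 - mariadb-server

runcmd:
 - [ sh, -c, "amazon-linux-extras install -y lamp-mariadb10.2-php7.2 php7.2" ]
 - systemctl start httpd
 - systemctl enable httpd
 - usermod -a -G apache ec2-user
 - chown -R ec2-user:apache /var/www
 - chmod 2775 /var/www
 - find /var/www -type d -exec chmod 2775 {} \;
 - find /var/www -type f -exec chmod 0664 {} \;
 - [ sh, -c, 'echo "<?php phpinfo(); ?>" > /var/www/html/phpinfo.php' ]
EOF
head -n 1 lamp-user-data.yaml   # #cloud-config
```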

If you launch an instance with this user data, you’ll be able to see the phpinfo page in your browser at the instance’s public IP address or DNS name. If something went wrong or the result differs from what you intended, check these two log files: cloud-init.log contains the log messages of the cloud-init software itself, while cloud-init-output.log contains the output you would have seen on the console if you had run the instructions on the instance yourself.

  • /var/log/cloud-init.log – the log of cloud-init’s own execution
  • /var/log/cloud-init-output.log – the output of the commands and scripts that cloud-init ran
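When cloud-init.log is long, a quick filter helps. The sketch below prints only warning and error lines (with line numbers); the helper name `cloudinit_problems` is hypothetical, and the pattern assumes the usual `<file>[<LEVEL>]:` layout of cloud-init log lines, so adjust it if your distribution formats the log differently.

```shell
#!/usr/bin/env bash
# Sketch: show warning/error lines from cloud-init's log, with line numbers.
# Assumes log lines look like "... - <file>[<LEVEL>]: <message>".
cloudinit_problems() {
  local log="${1:-/var/log/cloud-init.log}"
  if ! grep -nE '\[(WARNING|ERROR|CRITICAL)\]|Traceback' "$log"; then
    echo "no warnings or errors found in $log"
  fi
}

# On an instance:
#   cloudinit_problems /var/log/cloud-init.log
#   tail -n 50 /var/log/cloud-init-output.log   # then look at the command output
```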

You can modify the user data of an instance while it is stopped, using either the console or the AWS CLI. Here’s a good reference from the official documentation:

User data and the AWS CLI

User data is base64-encoded. To view the decoded user data with the AWS CLI, here’s the command I usually run:

$ aws ec2 describe-instance-attribute --instance-id <instance id> --attribute userData --output text --query "UserData.Value" | base64 --decode

Cloud-init implementation in EC2

As far as I have confirmed on Amazon Linux 2, cloud-init in AWS consists of four services on the target Linux system. These four services start the cloud-init software, which takes the user data provided by AWS and installs and configures software when an EC2 instance is launched.

By the way, user data scripts and cloud-init directives run only during the boot cycle when you first launch an instance. You can update your configuration so that they run every time you restart your instance; see the how-to in the official documentation.
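The approach AWS documents for this uses MIME multi-part user data: a cloud-config part sets `cloud_final_modules` to `[scripts-user, always]`, and a shell-script part holds the commands. The sketch below writes such a file; the `//` boundary, the file name and the `echo` payload are illustrative choices of mine.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Sketch: MIME multi-part user data whose shell-script part re-runs on every
# boot, because scripts-user is listed with the "always" frequency.
cat > every-boot-user-data.txt << 'EOF'
Content-Type: multipart/mixed; boundary="//"
MIME-Version: 1.0

--//
Content-Type: text/cloud-config; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="cloud-config.txt"

#cloud-config
cloud_final_modules:
- [scripts-user, always]

--//
Content-Type: text/x-shellscript; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="userdata.txt"

#!/bin/bash
echo "booted at $(date)" >> /tmp/boot-times.txt
--//--
EOF
```

Another option is to place an executable script in /var/lib/cloud/scripts/per-boot/ on the instance, which the scripts-per-boot module runs on every boot.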

$ systemctl list-unit-files --type=service | grep '^cloud'
cloud-config.service                          enabled
cloud-final.service                           enabled
cloud-init-local.service                      enabled
cloud-init.service                            enabled

The order in which systemd starts these services matters, because the cloud-init services depend on one another. The systemctl list-dependencies command shows the dependencies as a tree.

$ systemctl list-dependencies

What are the dependencies of the cloud-init services, and what does each of them do? The systemctl cat command shows a service’s systemd unit file.

$ systemctl cat cloud-init-local.service

Alternatively, you can find the location of the unit file and cat it on your console.

$ find /etc/systemd/system -name cloud-init-local.service

$ sudo cat /etc/systemd/system/
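Unit files contain more than we need here. As a small sketch, the hypothetical helper below keeps only the ordering and dependency lines plus `ExecStart`, which are what decide when each stage runs and what it executes; the loop in the comment feeds it the output of `systemctl cat` through bash process substitution.

```shell
#!/usr/bin/env bash
# Sketch: print only ordering/dependency and ExecStart lines of a unit file.
# unit_ordering is a hypothetical helper name.
unit_ordering() {
  grep -E '^(After|Before|Wants|Requires|WantedBy|ExecStart)=' "$1"
}

# On an instance:
#   for u in cloud-init-local cloud-init cloud-config cloud-final; do
#     echo "== $u.service"
#     unit_ordering <(systemctl cat "$u.service")
#   done
```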

As a result, we can see that the four services start in a specific order, as configured in their systemd unit files. As far as I checked, systemd starts the cloud-init services in the following order, which the official cloud-init documentation also describes as “Boot Stages”.

  1. cloud-init-local.service, runs cloud-init init --local
  2. cloud-init.service, runs cloud-init init
  3. cloud-config.service, runs cloud-init modules --mode=config
  4. cloud-final.service, runs cloud-init modules --mode=final

The cloud-init configuration file /etc/cloud/cloud.cfg determines which modules each service runs. With it, you can see the cloud-init services, their dependencies, and the modules that run inside each service. This information was taken from Amazon Linux 2, so the result might differ on other distributions or cloud providers.

$ sudo cat /etc/cloud/cloud.cfg
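To answer “which service runs module X?” without reading the whole file, here is a sketch that looks a module up in the cloud_init_modules, cloud_config_modules and cloud_final_modules lists and prints the matching service. The helper name `module_service` is hypothetical, and it assumes the block-list layout of the stock file (entries such as ` - runcmd` or ` - [scripts-user, always]`).

```shell
#!/usr/bin/env bash
# Sketch: report which systemd service runs a cloud-init module, based on
# the cloud_*_modules lists in cloud.cfg. module_service is hypothetical.
module_service() {
  awk -v want="$2" '
    /^cloud_init_modules:/   { svc = "cloud-init.service";   next }
    /^cloud_config_modules:/ { svc = "cloud-config.service"; next }
    /^cloud_final_modules:/  { svc = "cloud-final.service";  next }
    /^[^[:space:]#-]/        { svc = "" }   # any other top-level key ends the list
    svc != "" && /^[[:space:]]*- / {
      name = $0
      sub(/^[[:space:]]*-/, "", name)       # drop the list dash
      gsub(/\[/, " ", name)                 # unwrap the "[module, frequency]" form
      gsub(/]/, " ", name)
      gsub(/,/, " ", name)
      split(name, field, " ")               # first word is the module name
      if (field[1] == want) { print svc; exit }
    }
  ' "$1"
}

# On Amazon Linux 2:  module_service /etc/cloud/cloud.cfg runcmd
```

Keep in mind that the runcmd module only writes the commands out as a script during its stage; the scripts-user module runs that script later in cloud-final.service.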

The user data is stored under /var/lib/cloud/instance and interpreted by cloud-init, which executes its instructions in the corresponding modules. So the order of the instructions in your user data (cloud-config) is not necessarily the order in which they are executed on the instance: that depends on the order of the four services and on which modules each service runs.

$ sudo cat /var/lib/cloud/instance/user-data.txt # user data

cloud-init directives

cloud-init has many directives, and you can find examples of them in the official documentation. In this section I’d like to introduce the ones I’ve used recently. Please note that cloud-init on AWS might not support every upstream directive, because it is customized for AWS.

For example, the users directive might not be available in AWS’s cloud-init; in that case you have to use the runcmd directive to create a new user and set that user up properly.
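A sketch of that runcmd-based fallback is below. The user name `deploy` and the key string are hypothetical placeholders, and the commands assume root privileges, which is how cloud-init runs them.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Sketch: create a user with runcmd instead of the users directive.
# "deploy" and "<public key>" are hypothetical placeholders.
cat > add-user-data.yaml << 'EOF'
#cloud-config
runcmd:
 - useradd -m -s /bin/bash deploy
 - install -d -m 700 -o deploy -g deploy /home/deploy/.ssh
 - echo "ssh-rsa <public key> deploy@example" > /home/deploy/.ssh/authorized_keys
 - chown deploy:deploy /home/deploy/.ssh/authorized_keys
 - chmod 600 /home/deploy/.ssh/authorized_keys
EOF
```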

Cloud config examples

repository update and upgrade packages
The repo_update directive updates the distribution’s package database when it is set to true. The repo_upgrade directive upgrades installed packages based on their update classification; the value all applies all available updates, regardless of classification. See the security updates section of the documentation for details on how repo_upgrade works.

repo_update: true
repo_upgrade: all

additional yum repository
The yum_repos directive adds an additional yum repository. Set enabled to true to make the added repository active. The yum repolist all command shows whether the repository has been added successfully.

yum_repos:
  # The name of the repository
  epel-testing:
    baseurl: <repository base URL>
    enabled: true
    failovermethod: priority
    gpgcheck: true
    gpgkey: file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL
    name: Extra Packages for Enterprise Linux 5 - Testing

Install arbitrary packages
As we saw in the example, the packages directive installs the required packages on an instance. If a required package is not available in the configured repositories, add the repository first with the yum_repos directive.


packages:
# Each entry is a package name, or a [<package>, <version>] pair
# in which case that specific package version will be installed.
 - httpd
 - mariadb-server
 - [libpython2.7, 2.7.3-0ubuntu3.1]

Run commands on boot
There are two directives for running commands while an instance boots. The bootcmd directive runs its commands on every boot, while the runcmd directive runs its commands only on the first boot.

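Also note that the bootcmd module runs in cloud-init.service, while runcmd belongs to cloud-config.service, so bootcmd commands always run before runcmd commands. Strictly speaking, the runcmd module only writes the commands into a script, which the scripts-user module then executes in cloud-final.service.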

bootcmd:
  - echo <ip-address> <hostname> >> /etc/hosts

runcmd:
  - systemctl start httpd
  - systemctl enable httpd
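If a bootcmd command should not repeat on every boot, cloud-init ships a `cloud-init-per` helper (`cloud-init-per once|instance|always <name> <command>`) for exactly that. The function below is only a simplified imitation of the idea using a marker file, not the real helper; `run_once` and `MARKER_DIR` are hypothetical names.

```shell
#!/usr/bin/env bash
# Simplified imitation of "run once" semantics with a marker file.
# run_once and MARKER_DIR are hypothetical, not part of cloud-init.
MARKER_DIR="${MARKER_DIR:-./sem}"

run_once() {
  local name="$1"; shift
  local marker="$MARKER_DIR/$name"
  if [ -e "$marker" ]; then
    return 0                                         # already ran: skip it
  fi
  "$@" && mkdir -p "$MARKER_DIR" && touch "$marker"  # record success only
}

# Simulate two boots from a clean state: the command runs only the first time.
rm -rf "$MARKER_DIR" ./run-once.log
run_once hosts-entry sh -c 'echo ran >> ./run-once.log'
run_once hosts-entry sh -c 'echo ran >> ./run-once.log'
```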

writing out arbitrary files
The write_files directive writes a file to the system. Here’s an example that overwrites the login banner script with write_files. File content can also be supplied base64-encoded, gzip-compressed, or both, via the encoding key.

write_files:
 - content: |
      #!/bin/sh
      version=$(rpm -q --qf '%{version}' system-release)
      cat << 'EOF'
           __     ______       __     ______
          /\ \   /\  __ \     /\ \   /\  __ \
         _\_\ \  \ \ \/\ \   _\_\ \  \ \ \/\ \
        /\_____\  \ \_____\ /\_____\  \ \_____\
        \/_____/   \/_____/ \/_____/   \/_____/
      EOF
      echo "$version-release-notes/"
   owner: root:root
   path: /etc/update-motd.d/30-banner
   permissions: '0755'
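For the base64 option, here is a sketch that embeds a local file in a write_files entry with `encoding: b64`. The file `banner.sh`, the target path `/etc/update-motd.d/40-welcome` and the output file name are hypothetical, and `base64 -w0` assumes GNU coreutils.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Sketch: embed a local file in write_files using base64 encoding.
# banner.sh and the target path are hypothetical.
printf '#!/bin/sh\necho "Welcome"\n' > banner.sh

{
  echo '#cloud-config'
  echo 'write_files:'
  echo ' - encoding: b64'
  echo "   content: $(base64 -w0 banner.sh)"
  echo '   owner: root:root'
  echo '   path: /etc/update-motd.d/40-welcome'
  echo "   permissions: '0755'"
} > write-files-user-data.yaml
```

Encoding the content avoids YAML quoting and indentation pitfalls when the file contains characters like backslashes, as the banner above does.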

configure ssh keys
There are two directives for configuring SSH keys on an instance. The ssh_authorized_keys directive adds public keys to the default user’s ~/.ssh/authorized_keys file (ec2-user on Amazon Linux 2). The ssh_keys directive installs pre-generated SSH host keys on the instance instead of letting it generate new random ones.


# add each entry to ~/.ssh/authorized_keys
ssh_authorized_keys:
  - ssh-rsa <ssh-rsa key>
  - ssh-rsa <ssh-rsa key> smoser@brickies

# Send pre-generated SSH private keys to the server
# If these are present, they will be written to /etc/ssh and
# new random keys will not be generated
ssh_keys:
  rsa_private: |
    -----BEGIN RSA PRIVATE KEY-----
    <private key>
    -----END RSA PRIVATE KEY-----

  rsa_public: ssh-rsa <public key> smoser@localhost

Disable EC2 metadata
The disable_ec2_metadata directive disables access to the EC2 instance metadata service. cloud-init does this by configuring a blackhole (reject) route to the metadata service IP address, 169.254.169.254.

disable_ec2_metadata: true

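The netstat -rn command shows the system’s routing table with numeric addresses, without resolving host names. In the output below, the entry with the !H (reject) flag is that blackhole route.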

$ netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
…               …               …               UG        0 0          0 eth0
…               -               …               !H        - -          - -
…               …               …               UH        0 0          0 eth0
…               …               …               U         0 0          0 eth0
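To check this from a script instead of by eye, here is a sketch that reads `netstat -rn` output on stdin and succeeds only if a reject (`!`) route exists for 169.254.169.254. The function name `imds_blackholed` is hypothetical, and it assumes the column layout shown above (destination first, flags fourth).

```shell
#!/usr/bin/env bash
# Sketch: succeed if "netstat -rn" output (on stdin) has a reject route
# for the metadata address. imds_blackholed is a hypothetical name.
imds_blackholed() {
  awk '$1 == "169.254.169.254" && $4 ~ /!/ { found = 1 } END { exit !found }'
}

# On an instance:
#   if netstat -rn | imds_blackholed; then echo "metadata route rejected"; fi
```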

To wrap up, cloud-init is the widely used industry standard for cloud instance initialization. It is useful when you need to install and configure software or run some operations when launching an instance.

An alternative way to provision instances is to create a golden machine image with an automation tool such as HashiCorp Packer. This tends to be safer for keeping instances consistent and maintainable.
