Introduction
In the first part of this series, we set up our workstation so that it could communicate with the Amazon Web Services APIs, and prepared our AWS account so that it was ready to provision EC2 compute infrastructure. In this part, we’ll start building the Ansible playbook that will provision our immutable infrastructure.
Ansible Dynamic Inventory
When working with cloud resources, Ansible can use a dynamic inventory to discover and configure all of your instances within AWS, or any other cloud provider. For this to work properly, we need to set up the EC2 external inventory script in our playbook.
- First, create the playbook folder (I named mine ~/immutable) and the inventory folder within it:
mkdir -p ~/immutable/inventory;cd ~/immutable/inventory
- Next, download the EC2 external inventory script from Ansible:
wget https://raw.github.com/ansible/ansible/devel/contrib/inventory/ec2.py
- Make the script executable by typing:
chmod +x ec2.py
- Configure the EC2 external inventory script by creating a new file called ec2.ini in the inventory folder alongside the ec2.py script. If you specify the region you are working with, you will significantly decrease the execution time because the script will not need to scan every EC2 region for instances. Here is a copy of my ec2.ini file, configured to use the us-east-1 region:
# Ansible EC2 external inventory script settings
#

[ec2]

# to talk to a private eucalyptus instance uncomment these lines
# and edit eucalyptus_host to be the host name of your cloud controller
#eucalyptus = True
#eucalyptus_host = clc.cloud.domain.org

# AWS regions to make calls to. Set this to 'all' to make request to all regions
# in AWS and merge the results together. Alternatively, set this to a comma
# separated list of regions. E.g. 'us-east-1,us-west-1,us-west-2'
regions = us-east-1
regions_exclude =

# When generating inventory, Ansible needs to know how to address a server.
# Each EC2 instance has a lot of variables associated with it. Here is the list:
#   http://docs.pythonboto.org/en/latest/ref/ec2.html#module-boto.ec2.instance
# Below are 2 variables that are used as the address of a server:
#   - destination_variable
#   - vpc_destination_variable

# This is the normal destination variable to use. If you are running Ansible
# from outside EC2, then 'public_dns_name' makes the most sense. If you are
# running Ansible from within EC2, then perhaps you want to use the internal
# address, and should set this to 'private_dns_name'. The key of an EC2 tag
# may optionally be used; however the boto instance variables hold precedence
# in the event of a collision.
destination_variable = public_dns_name
#destination_variable = private_dns_name

# For server inside a VPC, using DNS names may not make sense. When an instance
# has 'subnet_id' set, this variable is used. If the subnet is public, setting
# this to 'ip_address' will return the public IP address. For instances in a
# private subnet, this should be set to 'private_ip_address', and Ansible must
# be run from within EC2. The key of an EC2 tag may optionally be used; however
# the boto instance variables hold precedence in the event of a collision.
# WARNING: - instances that are in the private vpc, _without_ public ip address
# will not be listed in the inventory until You set:
#   vpc_destination_variable = private_ip_address
vpc_destination_variable = ip_address

# To tag instances on EC2 with the resource records that point to them from
# Route53, uncomment and set 'route53' to True.
route53 = False

# To exclude RDS instances from the inventory, uncomment and set to False.
#rds = False

# To exclude ElastiCache instances from the inventory, uncomment and set to False.
#elasticache = False

# Additionally, you can specify the list of zones to exclude looking up in
# 'route53_excluded_zones' as a comma-separated list.
# route53_excluded_zones = samplezone1.com, samplezone2.com

# By default, only EC2 instances in the 'running' state are returned. Set
# 'all_instances' to True to return all instances regardless of state.
all_instances = False

# By default, only RDS instances in the 'available' state are returned. Set
# 'all_rds_instances' to True to return all RDS instances regardless of state.
all_rds_instances = False

# By default, only ElastiCache clusters and nodes in the 'available' state
# are returned. Set 'all_elasticache_clusters' and/or 'all_elastic_nodes'
# to True to return all ElastiCache clusters and nodes, regardless of state.
#
# Note that all_elasticache_nodes only applies to listed clusters. That means
# if you set all_elastic_clusters to false, no node will be returned from
# unavailable clusters, regardless of the state and to what you set for
# all_elasticache_nodes.
all_elasticache_replication_groups = False
all_elasticache_clusters = False
all_elasticache_nodes = False

# API calls to EC2 are slow. For this reason, we cache the results of an API
# call. Set this to the path you want cache files to be written to. Two files
# will be written to this directory:
#   - ansible-ec2.cache
#   - ansible-ec2.index
cache_path = ~/.ansible/tmp

# The number of seconds a cache file is considered valid. After this many
# seconds, a new API call will be made, and the cache file will be updated.
# To disable the cache, set this value to 0
cache_max_age = 0

# Organize groups into a nested/hierarchy instead of a flat namespace.
nested_groups = False

# The EC2 inventory output can become very large. To manage its size,
# configure which groups should be created.
group_by_instance_id = True
group_by_region = True
group_by_availability_zone = True
group_by_ami_id = True
group_by_instance_type = True
group_by_key_pair = True
group_by_vpc_id = True
group_by_security_group = True
group_by_tag_keys = True
group_by_tag_none = True
group_by_route53_names = True
group_by_rds_engine = True
group_by_rds_parameter_group = True
group_by_elasticache_engine = True
group_by_elasticache_cluster = True
group_by_elasticache_parameter_group = True
group_by_elasticache_replication_group = True

# If you only want to include hosts that match a certain regular expression
# pattern_include = staging-*

# If you want to exclude any hosts that match a certain regular expression
# pattern_exclude = staging-*

# Instance filters can be used to control which instances are retrieved for
# inventory. For the full list of possible filters, please read the EC2 API
# docs: http://docs.aws.amazon.com/AWSEC2/latest/APIReference/ApiReference-query-DescribeInstances.html#query-DescribeInstances-filters
# Filters are key/value pairs separated by '=', to list multiple filters use
# a list separated by commas. See examples below.

# Retrieve only instances with (key=value) env=staging tag
# instance_filters = tag:env=staging

# Retrieve only instances with role=webservers OR role=dbservers tag
# instance_filters = tag:role=webservers,tag:role=dbservers

# Retrieve only t1.micro instances OR instances with tag env=staging
# instance_filters = instance-type=t1.micro,tag:env=staging

# You can use wildcards in filter values also. Below will list instances which
# tag Name value matches webservers1*
# (ex. webservers15, webservers1a, webservers123 etc)
# instance_filters = tag:Name=webservers1*

- Next, create the default Ansible configuration for this playbook by editing a file named ansible.cfg in the root of the playbook directory (in our example, ~/immutable). This file should contain the following text:
[defaults]
remote_user = ubuntu
host_key_checking = False
inventory = ./inventory/ec2.py

[ssh_connection]
control_path = %(directory)s/%%C
To test that this script is working properly (and that your boto credentials are set up properly), execute the following command:
./ec2.py --list
You should see the script print a JSON document describing your EC2 inventory (if you have no instances running yet, most of the groups will be empty).
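The output maps group names (region, availability zone, security group, tags, and so on) to lists of hosts, plus a _meta block of per-host variables. As a rough, hypothetical illustration (your group names, instance IDs, and hostnames will differ):

{
  "_meta": {
    "hostvars": {
      "ec2-52-90-220-60.compute-1.amazonaws.com": {
        "ec2_id": "i-0123456789abcdef0",
        "ec2_region": "us-east-1",
        "ec2_state": "running"
      }
    }
  },
  "us-east-1": ["ec2-52-90-220-60.compute-1.amazonaws.com"],
  "tag_Name_test": ["ec2-52-90-220-60.compute-1.amazonaws.com"]
}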
Ansible Roles
Roles are a way to automatically load certain variables and tasks into an Ansible playbook, and they allow you to reuse your tasks in a modular way. We will heavily (ab)use roles to make the tasks in our playbook reusable, since many of our infrastructure provisioning operations will repeat the same tasks.
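For reference, here is the layout the playbook will have by the end of this part; we’ll create each piece step by step:

immutable/
├── ansible.cfg
├── deployworkstation.yml
├── group_vars/
│   └── all.yml
├── inventory/
│   ├── ec2.ini
│   └── ec2.py
└── roles/
    ├── launch/
    │   └── tasks/
    │       └── main.yml
    └── workstation/
        ├── files/
        │   └── aws-swap-init
        └── tasks/
            └── main.yml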
Group Variables
The following group variables will apply to every task in our playbook unless we override them at the task level. This lets us specify a set of sensible defaults that work for most provisioning use cases, while retaining the flexibility to change them when we need to. Change into the playbook folder you created earlier (mine is ~/immutable), then create a group_vars folder underneath it:
cd ~/immutable; mkdir group_vars
Now, edit the file all.yml in the group_vars folder we just created, giving it the following contents. Please note that indentation is important in YAML syntax:
---
# group_vars/all.yml

region: us-east-1
zone: us-east-1a # zone that the master AMI will be configured in
keypair: immutable
security_groups: ['default']
instance_type: t2.micro

# specify group_name on the command line with -e group_name=devX
group_name: test

instances_min: 1 # minimum number of instances in the auto scaling group
instances_max: 3 # maximum number of instances in the auto scaling group
iam_profile: "noaccess"

volumes:
  - device_name: /dev/sda1
    device_type: gp2
    volume_size: 8 # size of the root disk
    delete_on_termination: true
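Because these are ordinary Ansible group variables, any of them can be overridden at run time with -e, which is how we’ll set group_name later. For example (the dev1 group name and the larger instance type here are just hypothetical values):

ansible-playbook -e group_name=dev1 -e instance_type=m4.large deployworkstation.yml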
Now that our group_vars are set up, we can move on to creating our first role.
The Launch Role
The launch role performs an important first step: it searches for the latest Ubuntu 14.04 LTS (long-term support) AMI (Amazon Machine Image) published by Canonical, the creators of Ubuntu, then launches a new EC2 compute instance in the region and availability zone specified in our group_vars file. Note that the launch role launches a very small compute instance (t2.micro), because this instance will only live for a short time while it is configured by subsequent tasks and then baked into a golden master AMI snapshot that lives in S3 object storage.
A quick note about Availability Zones: if you comment out the zone variable in our group_vars file, your instances will be launched in a random zone within the specified region. This can be useful to ensure that an outage in a single AZ doesn’t take down every instance in your auto-scaling group. There is a trade-off, though: data transfer between zones incurs a charge, so if your database, for example, is in another zone, you’ll pay a small network bandwidth fee to access it.
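For example, to let EC2 choose the zone for you, comment out that one line in group_vars/all.yml:

# zone: us-east-1a # commented out: each instance lands in a random zone within the region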
Create a new folder under your playbook directory called roles, a launch folder within it, and a tasks folder under that; then edit a file called main.yml in the tasks folder:
mkdir -p roles/launch/tasks
Now, put the following contents in the main.yml file:
---
# roles/launch/tasks/main.yml

- name: Search for the latest Ubuntu 14.04 AMI
  ec2_ami_find:
    region: "{{ region }}"
    name: "ubuntu/images/hvm-ssd/ubuntu-trusty-14.04-amd64-server-*"
    owner: 099720109477
    sort: name
    sort_order: descending
    sort_end: 1
    no_result_action: fail
  register: ami_result

- name: Launch new instance
  ec2:
    region: "{{ region }}"
    keypair: "{{ keypair }}"
    zone: "{{ zone }}"
    group: "{{ security_groups }}"
    image: "{{ ami_result.results[0].ami_id }}"
    # No point in creating an expensive instance just during the AMI
    # configuration phase - let the ASG do that
    # instance_type: "{{ instance_type }}"
    instance_type: "t2.micro"
    instance_tags:
      Name: "{{ name }}"
    volumes: "{{ volumes }}"
    wait: yes
  register: ec2

- name: Add new instances to host group
  add_host:
    name: "{{ item.public_dns_name }}"
    groups: "{{ name }}"
    ec2_id: "{{ item.id }}"
  with_items: ec2.instances

- name: Wait for instance to boot
  wait_for:
    host: "{{ item.public_dns_name }}"
    port: 22
    delay: 30
    timeout: 300
    state: started
  with_items: ec2.instances
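If you’d like to sanity-check the AMI search outside of Ansible, roughly the same query can be run with the AWS CLI we configured in part one (a sketch; the JMESPath expression simply selects the newest image by name):

aws ec2 describe-images --region us-east-1 --owners 099720109477 \
  --filters "Name=name,Values=ubuntu/images/hvm-ssd/ubuntu-trusty-14.04-amd64-server-*" \
  --query "sort_by(Images, &Name)[-1].[ImageId,Name]"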
You’ll notice that this launch role also waits for the instance to boot by waiting for port 22 (ssh) to be available on the host. This is useful because subsequent tasks will use an ssh connection to configure the system, so we want to ensure the system is completely booted before we proceed.
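One caveat: an open port 22 doesn’t always mean sshd is ready to accept logins. If you ever hit early connection failures, a slightly stricter variant of that final task (a sketch, not part of the original role) waits for the SSH banner itself:

- name: Wait for instance to boot
  wait_for:
    host: "{{ item.public_dns_name }}"
    port: 22
    delay: 30
    timeout: 300
    search_regex: OpenSSH # wait until the SSH daemon answers with its banner
  with_items: ec2.instances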
The Workstation Role
Now that we have a role that can launch a brand-new t2.micro instance, our next role will configure that instance as a workstation. The workstation configuration is fairly simple, but you can easily customize it as much as you want later; it mainly illustrates how you would configure the golden image.
We need to create two directories for this role: the tasks directory and the files directory. The files directory holds an init script that we’ll copy onto the workstation so a swapfile is created on first boot:
mkdir -p roles/workstation/tasks roles/workstation/files
Next, we’ll create the tasks file:
---
# roles/workstation/tasks/main.yml

- name: add timezone configuration
  command: bash -c 'echo US/Eastern > /etc/timezone'

- name: add kernel parameters to /etc/sysctl.d/60-custom.conf
  blockinfile: |
    dest=/etc/sysctl.d/60-custom.conf
    create=yes
    content="# Auto-reboot linux 10 seconds after a kernel panic
    kernel.panic = 10
    kernel.panic_on_oops = 10
    kernel.unknown_nmi_panic = 10
    kernel.panic_on_unrecovered_nmi = 10
    kernel.panic_on_io_nmi = 10
    # Controls whether core dumps will append the PID to the core filename, useful for debugging multi-threaded applications
    kernel.core_uses_pid = 1
    # Turn on address space randomization - security is super important
    kernel.randomize_va_space = 2
    vm.swappiness = 0
    vm.dirty_ratio = 80
    vm.dirty_background_ratio = 5
    vm.dirty_expire_centisecs = 12000
    vm.overcommit_memory = 1
    # ------ VM ------
    fs.file-max = 204708
    #fs.epoll.max_user_instances = 4096
    fs.suid_dumpable = 0
    # ------ NETWORK SECURITY ------
    # Turn on protection for bad icmp error messages
    net.ipv4.icmp_ignore_bogus_error_responses = 1
    # Turn on syncookies for SYN flood attack protection
    net.ipv4.tcp_syncookies = 1
    net.ipv4.tcp_max_syn_backlog = 8096
    net.ipv4.tcp_synack_retries = 2
    net.ipv4.tcp_syn_retries = 2
    # Log suspicious packets, such as spoofed, source-routed, and redirect
    net.ipv4.conf.all.log_martians = 1
    net.ipv4.conf.default.log_martians = 1
    # Disables these ipv4 features, not very legitimate uses
    net.ipv4.conf.all.accept_source_route = 0
    net.ipv4.conf.default.accept_source_route = 0
    # ------ NETWORK PERFORMANCE ------
    # Netflix 2014 recommendations
    net.core.netdev_max_backlog = 5000
    net.core.rmem_max = 16777216
    net.core.wmem_max = 16777216
    net.ipv4.tcp_wmem = 4096 12582912 16777216
    net.ipv4.tcp_rmem = 4096 12582912 16777216
    # Allow reusing sockets in TIME_WAIT state for new connections
    net.ipv4.tcp_tw_reuse = 1
    # Socket max connections waiting to get accepted; the listen() backlog.
    # Default is 128.
    net.core.somaxconn = 4096
    # Decrease fin timeout. After telling the client we are closing, how long to wait for a FIN, ACK?
    # Default is 60.
    net.ipv4.tcp_fin_timeout = 10
    # Avoid falling back to slow start after a connection goes idle
    # keeps our cwnd large with the keep alive connections
    net.ipv4.tcp_slow_start_after_idle = 0"

- name: reload sysctl kernel parameter settings
  command: bash -c 'sysctl -p /etc/sysctl.d/60-custom.conf'

- name: copy swapfile init script
  copy: src=aws-swap-init dest=/etc/init.d/aws-swap-init mode=0755

- name: register swapfile init script
  command: bash -c 'update-rc.d aws-swap-init defaults'

- name: add entries to sudoers
  lineinfile: |
    dest=/etc/sudoers.d/90-cloud-init-users
    line="luke ALL=(ALL) NOPASSWD:ALL"

- name: add luke user account
  user: name=luke uid=1001 comment="Luke Youngblood" createhome=yes shell=/bin/bash

- name: accept Oracle license part 1
  command: bash -c 'echo debconf shared/accepted-oracle-license-v1-1 select true | /usr/bin/debconf-set-selections'

- name: accept Oracle license part 2
  command: bash -c 'echo debconf shared/accepted-oracle-license-v1-1 seen true | /usr/bin/debconf-set-selections'

- name: Update apt cache
  apt: update_cache=yes

- name: Add apt key for ubuntu
  apt_key: keyserver=keyserver.ubuntu.com id=E56151BF

- name: Determine Linux distribution distributor
  shell: lsb_release -is | tr '[:upper:]' '[:lower:]'
  register: release_distributor

- name: Determine Linux distribution codename
  command: lsb_release -cs
  register: release_codename

- name: Add Mesosphere repository to sources list
  lineinfile: >
    dest=/etc/apt/sources.list.d/mesosphere.list
    line="deb http://repos.mesosphere.io/{{ release_distributor.stdout }} {{ release_codename.stdout }} main"
    mode=0644
    create=yes

- name: Add java apt repository
  apt_repository: repo='ppa:webupd8team/java'

- name: Add Google Chrome key
  shell: wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -

- name: Add Google Chrome repo
  copy: >
    content="deb http://dl.google.com/linux/chrome/deb/ stable main"
    dest="/etc/apt/sources.list.d/google-chrome.list"
    owner=root
    group=root
    mode=0644

- name: Update apt cache
  apt: update_cache=yes

- name: Install packages
  apt: pkg={{ item }} state=installed
  with_items:
    - nfs-common
    - oracle-java8-installer
    - liblapack-dev
    - libblas-dev
    - gfortran
    - ntp
    - vnc4server
    - xfce4
    - xfce4-goodies
    - emacs
    - firefox
    - git
    - maven
    - npm
    - python-pip
    - clusterssh
    - graphviz
    - google-chrome-stable
    - python-numpy
    - python-scipy
    - python-dev
    - python-nose
    - g++
    - libopenblas-dev

- name: Install boto and run setup
  command: bash -c 'git clone git://github.com/boto/boto.git;cd /home/ubuntu/boto;python setup.py install'
  args:
    chdir: /home/ubuntu

- name: Install bower
  command: bash -c 'npm install -g bower'

- name: Install Theano
  command: bash -c 'pip install Theano'

- name: Update all packages to the latest version
  apt: upgrade=dist

- name: Create nodejs symlink
  file: src=/usr/bin/nodejs dest=/usr/bin/node state=link

- name: execute dpkg-reconfigure
  command: "dpkg-reconfigure -f noninteractive tzdata"
Initializing a swap file automatically
When you provision an Ubuntu 14.04 LTS instance, it won’t have a swapfile by default. This is a bit risky: if you run out of memory, your system could become unstable. The following script should be placed in roles/workstation/files/aws-swap-init; the copy task above will install it on your workstation during the configuration process, so that a swap file is created when the system is booted for the first time.
#!/bin/sh
#
# aws-swap-init
#
# chkconfig: 2345 99 10
# description: Check to see if ephemeral disk is mounted. If it is, and swap doesn't exist, \
#              create swap equivalent to memory on the ephemeral drive. If it isn't, and swap \
#              doesn't exist, create 4096MB of swap on /.
# processname: aws-swap-init
#
### BEGIN INIT INFO
# Provides:          aws-swap-init
# Required-Start:    $remote_fs $syslog
# Required-Stop:     $remote_fs $syslog
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Ensure swap is created and enabled
# Description:       Check to see if ephemeral disk is mounted. If it is, and swap doesn't exist,
#                    create swap equivalent to memory on the ephemeral drive. If it isn't, and swap
#                    doesn't exist, create 4096MB of swap on /.
### END INIT INFO

# Copyright 2012 Corsis
# http://www.corsis.com/
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

DEFAULTSIZE=4194304

start () {
    # Check to see if there is swap mounted right now
    # If there is, we're done here
    ISSWAP=`/bin/cat /proc/meminfo | /bin/grep SwapTotal | /usr/bin/awk '{print $2}'`
    if [ $ISSWAP -ne 0 ]
    then
        exit 0
    fi

    # What OS are we running?
    if [ -f /etc/system-release -o -f /etc/redhat-release ]
    then
        OSTYPE="amzn"
    elif [ -f /etc/lsb-release ]
    then
        OSTYPE="ubuntu"
    fi

    # Set the target directory. If unsupported platform, use root
    case "$OSTYPE" in
        amzn)
            TARGET="/media/ephemeral0"
            ;;
        ubuntu)
            TARGET="/mnt"
            ;;
        *)
            TARGET="/"
            ;;
    esac

    # Does a swapfile already exist? If so, activate and be done
    if [ -f $TARGET/swapfile00 ]
    then
        /sbin/swapon $TARGET/swapfile00
        exit 0
    fi

    # OK, so there's no existing swapfile. Let's make one and activate it.
    # If we're on an unsupported OS, or ephemeral disk isn't mounted, use a
    # safe default size. Otherwise, use RAM size
    if [ $TARGET = "/" ]
    then
        SIZE=$DEFAULTSIZE
    else
        /bin/mount | grep -q " on $TARGET type "
        if [ $? -eq 0 ]
        then
            SIZE=`/bin/cat /proc/meminfo | /bin/grep "^MemTotal" | /usr/bin/awk '{print $2}'`
        else
            SIZE=$DEFAULTSIZE
            TARGET="/"
        fi
    fi

    # OK, time to get down to business.
    /bin/dd if=/dev/zero of=$TARGET/swapfile00 bs=1024 count=$SIZE
    /sbin/mkswap $TARGET/swapfile00
    /sbin/swapon $TARGET/swapfile00
}

stop () {
    exit 0
}

case "$1" in
    start)
        $1
        ;;
    stop)
        $1
        ;;
    *)
        echo $"Usage: $0 {start|stop}"
        exit 2
esac
exit $?
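After the first boot of an instance built with this role, you can confirm the swapfile was created and enabled by running these standard diagnostics on the instance:

swapon -s   # should list /mnt/swapfile00 (or /swapfile00 on an unsupported layout)
free -m     # the Swap: row should show a non-zero total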
The DeployWorkstation Play
Now, we’ll create a play that calls these roles in the right order to provision our workstation and configure it. Create this file in the root of your playbook; I named it deployworkstation.yml.
---
# deployworkstation.yml

- hosts: localhost
  connection: local
  gather_facts: no
  roles:
    - role: launch
      name: "{{ group_name }}amibuild"

- hosts: "{{ group_name }}amibuild"
  become: True
  become_user: root
  become_method: sudo
  roles:
    - yaegashi.blockinfile
    - workstation
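A note on the roles list: at the time this series targets, blockinfile was not yet a core Ansible module, so the play pulls it in as the third-party Galaxy role yaegashi.blockinfile. If you don’t already have that role installed, fetch it before running the playbook:

ansible-galaxy install yaegashi.blockinfile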
Testing our Playbook
To test our work so far, we simply need to execute it with Ansible and see if we are successful:
ansible-playbook -vv -e group_name=test deployworkstation.yml
At the end of the playbook run, you should see a PLAY RECAP for each host with no failed tasks.
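The recap looks roughly like this (the task counts here are hypothetical and will vary with your run):

PLAY RECAP *********************************************************************
ec2-52-90-220-60.compute-1.amazonaws.com : ok=34   changed=31   unreachable=0   failed=0
localhost                                : ok=5    changed=4    unreachable=0   failed=0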
Next, connect to your instance with ssh by typing the following command:
ssh ubuntu@ec2-52-90-220-60.compute-1.amazonaws.com
(hint: the hostname above is just an example; copy and paste the public DNS name from your own playbook output)
After you connect with ssh, you should see the standard Ubuntu 14.04 welcome message.
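Once logged in, a few quick checks will confirm that the workstation role actually ran (the expected values follow from the tasks above):

sysctl kernel.panic   # should print kernel.panic = 10
java -version         # the Oracle Java 8 runtime installed by the role
node --version        # works via the nodejs symlink the role created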
That’s it! You’ve now created a workstation in the Amazon public cloud. Be sure to terminate the instance you’ve created so that you don’t incur unexpected fees. You can do this by navigating to the EC2 dashboard (top left) in the AWS console, then selecting any running instances and choosing to terminate them:
After choosing Terminate from the Instance State menu, you’ll need to confirm the action:
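If you prefer the command line, the AWS CLI from part one can terminate instances as well (the instance ID below is hypothetical; substitute the one from your playbook output or the console):

aws ec2 terminate-instances --region us-east-1 --instance-ids i-0123456789abcdef0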
Now that you’ve terminated any running instances, in the next part, we’ll learn how to create snapshots, launch configurations, and auto-scaling groups from our immutable golden master images.