In a previous post I talked about how it’s possible to programmatically capture an architecture and store it in a graph database.

Why would you want to do that? Well, if you’ve ever had to capture, document or explain an architecture you will know that it’s a difficult task. It can be tedious, time consuming and ultimately winds up stale and misleading. Anything we can do to automate this process is valuable.

Capturing an architecture in a graph also allows you to explore it and ask interesting questions, ones that might not be possible, or would be very tedious, to answer otherwise.

This post will give an example of how you can obtain information about your AWS infrastructure through Clojure.

Clojure is one of my favourite environments for experimentation and there is a great library called Amazonica for working with AWS.

What we have is an arrangement like this:

There are various representations of an architecture that you may wish to capture. We’ll start simply with the ‘physical’ - in this case regions, availability zones and physical instances.

First create a working directory, navigate to it, and add the following to a project.clj file there:

(defproject arch-graph-1 "0.0.1-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [clojurewerkz/neocons "3.0.0"]
                 [amazonica "0.3.28"]]
  :source-paths ["src/clj"]
  :test-paths ["test/clj"])

Give your environment some AWS credentials; in this case we’ll use an access key and secret. You can obtain one from your AWS console; this page provides details on how to create some.

bash$ export AWS_ACCESS_KEY_ID={your access key id here}
bash$ export AWS_SECRET_KEY={your secret key here}
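
As an aside, if you’d rather not use environment variables, Amazonica can also be given credentials directly once you have a REPL running (we’ll start one shortly). A minimal sketch using its defcredential macro; the key values and region below are placeholders:

user=> (require '[amazonica.core :refer [defcredential]])
user=> (defcredential "your-access-key-id" "your-secret-key" "us-east-1")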

Ensure that you have lein installed; you can find instructions here. lein is a Clojure build tool which makes working with Clojure easier by automating a lot of the more mundane tasks.

Next let’s run up a repl:

bash$ lein repl

And you should see it download the dependencies and put you in a repl ready to explore.

user=> (require '[amazonica.aws.ec2 :as ec2])
user=> (def regions (ec2/describe-regions))
user=> (pprint regions)

You should see something like the following:

{:regions
 [{:endpoint "ec2.eu-central-1.amazonaws.com",
   :region-name "eu-central-1"}
  {:endpoint "ec2.sa-east-1.amazonaws.com", :region-name "sa-east-1"}
  {:endpoint "ec2.ap-northeast-1.amazonaws.com",
   :region-name "ap-northeast-1"}
  {:endpoint "ec2.eu-west-1.amazonaws.com", :region-name "eu-west-1"}
  {:endpoint "ec2.us-east-1.amazonaws.com", :region-name "us-east-1"}
  {:endpoint "ec2.us-west-1.amazonaws.com", :region-name "us-west-1"}
  {:endpoint "ec2.us-west-2.amazonaws.com", :region-name "us-west-2"}
  {:endpoint "ec2.ap-southeast-2.amazonaws.com",
   :region-name "ap-southeast-2"}
  {:endpoint "ec2.ap-southeast-1.amazonaws.com",
   :region-name "ap-southeast-1"}]}

This shows nine AWS regions and their associated endpoints (which are needed in other calls to specify which region you mean). These are the AWS regions that are visible to your account; you won’t see regions that you don’t have access to, for example China or AWS GovCloud (US).

Now you can start to explore the regions. For a start, let’s look at the availability zones in a region:

user=> (pprint
         (ec2/describe-availability-zones { :endpoint "ec2.us-east-1.amazonaws.com" }))

This will give us something like:

{:availability-zones
 [{:state "available",
   :region-name "us-east-1",
   :zone-name "us-east-1a",
   :messages []}
  {:state "available",
   :region-name "us-east-1",
   :zone-name "us-east-1b",
   :messages []}
  {:state "available",
   :region-name "us-east-1",
   :zone-name "us-east-1c",
   :messages []}
  {:state "available",
   :region-name "us-east-1",
   :zone-name "us-east-1d",
   :messages []}
  {:state "available",
   :region-name "us-east-1",
   :zone-name "us-east-1e",
   :messages []}]}

Now, as an aside, note that as far as I understand it, the availability zones you see are not necessarily the availability zones I see: different accounts have different mappings to the underlying physical data centres. This helps stop the problem of everyone grabbing an instance in availability zone ‘a’, consuming its resources and leaving the others under-utilised.

Let’s have a look at what instances there are in a given region.

WARNING: this output could be very large as it will list a lot of detail about every instance in the region.

user=> (pprint (ec2/describe-instances {:endpoint "ec2.us-east-1.amazonaws.com"}))

Some details below have been redacted or removed, but it gives you an idea of the information available. Instances are grouped into their reservations:

{:reservations
 [{:requester-id "xxxxxxxxxx",
   :reservation-id "r-xxxxxxx",
   :group-names [],
   :groups [],
   :instances
   [{:monitoring {:state "disabled"},
     :tags
     [{:value "foo", :key "Component"}
      {:value "live", :key "Environment"}],
     :root-device-type "ebs",
     :private-dns-name "ip-xx-xx-xx-xx.ec2.internal",
     :hypervisor "xen",
     :subnet-id "subnet-xxxxx",
     :key-name "xxxxxxxxx",
     :architecture "x86_64",
     :security-groups
     [{:group-name "xxxxxx", :group-id "xxxxxxx"}],
     :virtualization-type "hvm",
     :instance-type "c3.large",
     :image-id "ami-xxxxxxx",
     :state {:name "running", :code 16},
     :state-transition-reason "",
     :network-interfaces
     [{:description "",
       :private-dns-name "ip-x-x-x-x.ec2.internal",
       :subnet-id "subnet-xxxxxx",
       :vpc-id "vpc-xxxxx",
       :instance-id "i-xxxxxxx",
       :iam-instance-profile
       {:id "XXXXXXXXXXX",
        :arn "arn:aws:iam::xxxxxxx:instance-profile/xxxxxxxxx"},
       :public-dns-name "",
       :private-ip-address "x.x.x.x",
       :placement
       {:tenancy "default",
        :group-name "",
        :availability-zone "us-east-1a"},
       :launch-time #<DateTime 2015-06-26T12:54:37.000-04:00>,
       :block-device-mappings
       [{:ebs
         {:attach-time #<DateTime 2015-06-26T12:54:40.000-04:00>,
          :delete-on-termination true,
          :status "attached",
          :volume-id "vol-xxxxxxx"},
         :device-name "/dev/sda1"}]}],
   :owner-id "440474553311"}
  ...
  ]

Now, using what we have learnt above, let’s put together a collection of instances:

user=> (def endpoints (for [region (:regions (ec2/describe-regions))] (:endpoint region)))
user=> (def reservations (flatten (for [endpoint endpoints] (:reservations (ec2/describe-instances {:endpoint endpoint})))))
user=> (def instances (flatten (for [reservation reservations] (:instances reservation))))
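
(As an aside, the same collection can be built a little more compactly with mapcat rather than flatten and for, if you prefer; a sketch, assuming the endpoints defined above:)

user=> (def instances (mapcat :instances (mapcat #(:reservations (ec2/describe-instances {:endpoint %})) endpoints)))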

Now you are able to extract some information about all your instances.

Want the number of instances?

user=> (count instances)

Want the types of instances that you are using?

user=> (pprint (distinct (for [instance instances] (:instance-type instance))))
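
Want a breakdown by instance type, or by availability zone? A couple more sketches along the same lines, assuming the instances collection defined above (each instance’s :placement map holds its availability zone, as in the sample output earlier):

user=> (pprint (frequencies (map :instance-type instances)))
user=> (pprint (frequencies (map #(get-in % [:placement :availability-zone]) instances)))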

Now Amazonica gives you access to the full AWS API; however its documentation is relatively poor, and if you are like me you will spend significant time looking at the AWS CLI and AWS Javadocs working out what to try in order to get the results you are after. The repl, however, makes this bearable.

This post gave a very brief intro to using Amazonica to query AWS. In the next post we’ll look at putting this and other data in a more useful form: a graph.

My thanks to Mark Needham and Rik Van Bruggen for reviewing an early draft of this post.

In the past I’ve worked on several projects to capture the ‘As-Is’ architecture of a system. It’s generally pretty tedious work, and involves the use of either specialist, difficult-to-use tools or handcrafted diagrams and spreadsheets.

One of the most depressing things is that the moment you finish (and typically well before), it’s out of date. During the time it’s taken to do the exercise, something in the architecture has changed. Even if it hasn’t, you know you’ve created a fragile thing that will break very easily and will likely be ignored by the people who need it most, as they’ll treat it as most likely incorrect or out of date, because they know from their own experience these things always are.

Today we work in the cloud, we work with immutable servers, and we actively seek out and destroy snowflakes, ensuring everything is automated and reproducible. It’s time to do that with our architectural documentation too, and to stop treating it as a set of static documents.

One of the great things about moving into the cloud, or any large-scale virtualised environment, is that it forces us to think about things like discovery and management. We cannot rely on things being there all of the time, or being in the exact same place.

This means that in some of these environments we can actually capture the architecture of our system by programmatic means. Now this is only part of the problem: you can find it, but how do you then track it?

You could throw it into a database, which is perfectly valid; I chose to put it into a graph database, because after all an architecture is really just a graph.

From that graph it’s possible to answer a lot of questions, from how many components we have, to how many instances we have running, to what component a given DNS entry relates to.

You can also use this to help shape your architecture and control costs. How many load balancers are running that have no connected instances?
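
To give a flavour of that kind of query, here is a minimal sketch using Clojure and Neocons against a local Neo4j instance. The node labels (LoadBalancer, Instance), the ROUTES_TO relationship and the connection URL are assumptions for illustration only; your graph model may well differ:

(require '[clojurewerkz.neocons.rest :as nr]
         '[clojurewerkz.neocons.rest.cypher :as cy])

;; connect to a Neo4j server assumed to be running locally on the default port
(def conn (nr/connect "http://localhost:7474/db/data/"))

;; count load balancer nodes with no ROUTES_TO relationship to any instance
(cy/tquery conn
           "MATCH (lb:LoadBalancer)
            WHERE NOT (lb)-[:ROUTES_TO]->(:Instance)
            RETURN count(lb) AS orphaned")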

I will put together some blog posts on how you can do this yourself if you are using AWS.

Recently I tweeted that “Interesting things happen as deployment costs approach zero”. From that I got a few questions about what I meant and why that may be.

Firstly, a distinction: I said deployment, not release. They are different things. A release requires a deployment in order to deliver a feature or bundle of features that are needed in a particular environment. But a deployment may in fact be of the existing artifacts or of a set of previous artifacts (for example a roll-back).

What do I mean by the cost of a deployment? I am not talking so much in monetary terms, but more about the energy expended by the members of your team. How mentally challenging is a deployment? Does it:

  • evoke a sense of fear?
  • require a lot of box ticking?
  • result in a scramble to get signatures and approvals?
  • need hours of planning and scheduling?
  • require a lot of typing?

A deployment should not result in any of those things. Deployments often do, however, and it typically boils down to fear: fear of making mistakes because it is hard.

The problem with fear of deployment is that it becomes self-reinforcing: because we are afraid, because it is hard, we make mistakes, which in turn makes us more afraid, which makes us hesitant to do it again, and makes it likely that we will add even more sign-offs to make it more difficult to make mistakes next time.

Who has worked somewhere where a release has gone wrong and all of a sudden all releases need to be approved by the CTO/VP/Director or even CEO? Most of us, I am betting. Now what happens when that doesn’t happen, but rather the process of deployment is made simple, safe and transparent? This is what I mean by cost approaching zero.

We have observed the following:

  • More deployments happen, as there is little penalty incurred.
  • The difference between the active development code base and the code base actually running in production shrinks.
  • Time is saved as engineers spend less time considering and debating if a deployment is worth doing. (Or complaining about the deployment process).
  • The decision point as to whether a deployment can be done or not moves from Senior Management down to the group, then down to the team lead, and ultimately down to the engineers working on the changes themselves. (This does enormous things for team morale and feelings of empowerment.)

You find that as confidence increases, all of a sudden a deployment for a one-line change (and a proper one, not a hack) becomes possible and actually begins to happen. This opens the door to faster iterations; it enables more experiments, and allows the developers themselves to begin driving incremental improvements, such as taking on refactorings that otherwise might not happen.

All of these things add up to a better product, a less fragile product and happier teams.

Well technically it has already started, but the conference proper kicks off tomorrow and there are some awesome sessions lined up.

My talk is coming along, a challenge, but a fun one.

Personally I am looking forward to both talks by Michael T Nygard; however, choosing other talks will be difficult with so many interesting parallel tracks.

See you there.

I will be speaking at QCon London again this year, this time on the London Startup track on the topic of Lean.

It will be a big challenge to do justice to both the subject and what I’ve learnt over the last few years but I am looking forward to it.