In the previous posts I looked at using Amazonica to get details out of AWS and then at putting those details into a graph. Why would you want to do this? There is a lot you can learn by having your architecture, or even just part of it, in a graph: you can ask questions that would otherwise be difficult, expensive or even impossible to answer.

In this post we’ll extend the ideas of the previous two by pulling in more details from AWS and use our graph to build a slightly more complete model of our architecture.

When we refer to architecture we often mean one particular aspect of the whole. What we mean by a system's architecture depends on the context at hand: it may be the network, logical, physical or data flow view, and so on. One of the good things about using a graph is that these different aspects can all live together in the same graph, enabling you to move between them easily and uncover more insight by doing so.

First let's start with a question I might want to ask.

  • given a particular FQDN (fully qualified domain name), what are the components underlying it, and how are they reached?

This is a typical use case when investigating a problem. What do we need to be able to answer this question?

  • some representation of dns entries
  • some representation of IP addresses
  • some representation of Amazon Load Balancers (ELBs)
  • some relationship between dns entries and IP addresses (e.g. A-Records)
  • some relationship between dns entries and other dns entries (e.g. CNAME)
  • some representation of relationships between ELBs and Instances
  • some representation of Instances and components

This translates to a graph model like:

In this context :ALIAS_FOR is an AWS feature from Route53 which enables A-Record aliases to AWS systems and services.
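
If it helps, you can read the model as Cypher-style patterns rather than a diagram. The :DNS and :Instance labels and the :ALIAS_FOR, :BALANCES and :POINTS_TO relationship names are the ones used in the queries below; the :IP and :ELB labels and the :CNAME_FOR name are illustrative stand-ins for the remaining pieces:

(:DNS)-[:CNAME_FOR]->(:DNS)        // CNAME records between DNS entries
(:DNS)-[:POINTS_TO]->(:IP)         // A-Records resolving a name to an address
(:DNS)-[:ALIAS_FOR]->(:ELB)        // Route53 aliases onto AWS resources
(:ELB)-[:BALANCES]->(:Instance)    // load balancers fronting instances
(:IP)-[:POINTS_TO]->(:Instance)    // addresses belonging to instances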

Clone the graphing your architecture repo from GitHub.

Ensure that you have neo4j running as per the last post, and that your environment has suitable AWS access and then:

bash$ lein repl

graph-your-arch.core=> (go)

You will then see some information displayed and then it will attempt to open your browser to the neo4j web console so you can explore the graph interactively.

Let's find the instances on a path from a given FQDN using the browser:

MATCH p=(dns:DNS)-[*0..3]-()-[:BALANCES|:POINTS_TO]->(i:Instance)
  WHERE dns.dns = '{your fqdn}' RETURN p;

This should give you the sub-graph showing the paths from the FQDN node through to the instances that are reachable from it. The query says: starting from the node labelled :DNS whose dns property is {your fqdn}, follow up to three relationships in any direction to reach nodes that balance or point to one or more instances.

There are a lot of questions you can ask of even this simple graph, and I'd encourage you to try it out, click around on the results and actually see what your architecture looks like.
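
For example, reusing the same pattern you can rank your DNS entries by how many instances ultimately sit behind each one:

MATCH (dns:DNS)-[*0..3]-()-[:BALANCES|:POINTS_TO]->(i:Instance)
  RETURN dns.dns, count(DISTINCT i) AS instances
  ORDER BY instances DESC;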

Now this is only the beginning; there is a lot more you can add to this graph using the AWS APIs. You can make use of AWS's ability to add tags to entities in order to augment the AWS information with your own domain-specific detail, and it's of course also possible to use other APIs to expand the graph yet further and provide much deeper insights.
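
As a sketch of what that might look like: Amazonica returns EC2 tags as a list of {:key ... :value ...} maps on each instance, so you could, say, create a node per "Component" tag value and link instances to it. This reuses the connection, cy alias and instances collection from the earlier posts; the :Component label and :PART_OF relationship are just illustrative names, not part of the repo's model:

;; sketch only - :Component and :PART_OF are illustrative names
(def tag-cypher (str "MATCH (instance:Instance {id:{id}})"
                     "  MERGE (c:Component {name:{component}})"
                     "  MERGE (instance)-[:PART_OF]->(c)"))

;; reuses connection, cy and instances from the previous posts
(doseq [instance instances
        tag (:tags instance)
        :when (= "Component" (:key tag))]
  (cy/query connection tag-cypher {:id (:instance-id instance)
                                   :component (:value tag)}))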

I’ve used this approach to help understand and describe architectures, as well as to help identify wasted AWS resources. For example, how many ELBs are you running with no balanced instances? With a graph that is a simple query.
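
As a sketch of that query, assuming the load balancer nodes carry an :ELB label and a name property (check the labels in your own graph), it is roughly:

// assumes load balancer nodes are labelled :ELB with a name property
MATCH (elb:ELB)
  WHERE NOT (elb)-[:BALANCES]->(:Instance)
  RETURN elb.name;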

My thanks to Mark Needham and Rik Van Bruggen for reviewing an early draft of this post

In the last post we used Amazonica to extract some information from AWS about our instances, and using Clojure we can do some cool things with that. However, more value can be gained by storing that data in a database, in our case the graph database Neo4j.

We are going to extend what we did last time to this:

First you need a working Neo4j installation, so follow the instructions from the download page.

If you are running Neo4j locally you should verify it's working by visiting the web interface.

Before we can do anything we’ll need a model for our graph. We could just replicate the AWS model, but that's not really the domain of interest; we are thinking about our architecture, so let's create our own super simple one for now.

A very basic model which simply relates instances to their type. Let us now take the AWS info that we delved into last time and populate the graph using the Neocons library.

Note that at the time of writing, when using the current version of Neocons you need to disable server authentication: update conf/neo4j-server.properties and set:

dbms.security.auth_enabled=false

And then restart the database.

First let's use that library to create a couple of nodes and relationships:

user=> (require '[clojurewerkz.neocons.rest :as nr])
user=> (require '[clojurewerkz.neocons.rest.cypher :as cy])
user=> (def connection (nr/connect "http://localhost:7474/db/data/" "neo4j" "neo4j"))

You can check this works by running:

user=> (pprint (cy/query connection "MATCH (n) RETURN count(n)"))

Which should give you:

{:data [[0]], :columns ["count(n)"]}

Now let's take our model and translate it into some Cypher to create our database. Cypher is the query language that Neo4j uses; think of it as SQL for graphs, but better. Check out the refcard if you'd like to know more about it.

MERGE (instance:Instance {name:$name})
  SET instance.state=$state
  WITH instance
    MERGE (iType:InstanceType {name:$instanceType})
    WITH instance, iType
      MERGE (instance)-[:OF_TYPE]->(iType)

Effectively this Cypher says:

  1. if the instance with this name doesn’t exist then create it, otherwise use it
  2. then set its state property to this value
  3. then if this instance-type doesn’t exist create it otherwise use it
  4. then if it doesn’t already exist create an :OF_TYPE relationship from the instance to the instance type.

Note that you could use CREATE, however then you would end up with loads of duplicates which is not what we want. You will find that MERGE is a very common and useful operation in Cypher.
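
If you want to see the difference for yourself, a quick experiment in the browser (using a throwaway :Demo label so it doesn't touch the real graph, running each statement on its own) goes something like:

CREATE (:Demo {name:'a'})
CREATE (:Demo {name:'a'})
MATCH (d:Demo) RETURN count(d)   // 2 - CREATE always adds a new node
MERGE (:Demo {name:'a'})
MATCH (d:Demo) RETURN count(d)   // still 2 - MERGE matched an existing node instead
MATCH (d:Demo) DELETE d          // tidy up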

Continuing our repl from last time, let's define the above Cypher (this time using named parameters to make it easier):

user=> (def cypher (str "MERGE (instance:Instance {id:{id}})"
                        "  SET instance.state={state}"
                        "  WITH instance"
                        "    MERGE (iType:InstanceType {name:{type}})"
                        "    WITH instance, iType"
                        "      MERGE (instance)-[:OF_TYPE]->(iType)"))

Now let's use that to build a graph:

user=> (doseq [instance instances]
         (let [id (:instance-id instance)
               state (-> instance :state :name)
               type (:instance-type instance)]
           (cy/query connection cypher {:id id
                                        :state state
                                        :type type})))

Now we can look at this with the web console and run a simple query:

MATCH (n) RETURN n LIMIT 6;

You will see something like:

Now we should be able to verify that we have written the right number of instance nodes by:

user=> (== (count instances) (-> (cy/query
                                   connection
                                   "MATCH (n:Instance) RETURN count(n)")
                                 :data
                                 first
                                 first))

It should report true; if it doesn't, something has gone wrong.

This post has shown how we can take information about AWS from Amazonica and put that information into a graph using Neocons. However, our graph model here is very simple and not yet very interesting; there is a lot more we need before we can start describing an architecture.

The next post will take us a step closer to being able to describe an architecture by extending our model.

My thanks to Mark Needham and Rik Van Bruggen for reviewing an early draft of this post

In a previous post I talked about how it's possible to programmatically capture an architecture and store it in a graph database.

Why would you want to do that? Well, if you've ever had to capture, document or explain an architecture you will know that it's a difficult task. It can be tedious and time consuming, and the result ultimately winds up stale and misleading. Anything we can do to automate this process is valuable.

Capturing an architecture in a graph also allows you to explore it and ask interesting questions, ones that would otherwise be tedious or impossible to answer.

This post will give an example of how you can obtain information about your AWS infrastructure through Clojure.

Clojure is one of my favourite environments for experimentation and there is a great library called Amazonica for working with AWS.

What we have is an arrangement like this:

There are various representations of an architecture that you may wish to capture; we'll start simply with the 'physical' - in this case regions, availability zones and physical instances.

First create a working directory, navigate to it and add the following to a project.clj file there:

(defproject arch-graph-1 "0.0.1-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [clojurewerkz/neocons "3.0.0"]
                 [amazonica "0.3.28"]]
  :source-paths ["src/clj"]
  :test-paths ["test/clj"])

Give your environment some AWS credentials; in this case we'll use an access key and secret. You can obtain these from your AWS console; this page provides details on how to create them.

bash$ export AWS_ACCESS_KEY_ID={your access key id here}
bash$ export AWS_SECRET_KEY={your secret key here}

Ensure that you have lein installed; you can find instructions here. lein is a Clojure build tool which makes working with Clojure easier by automating a lot of the more mundane tasks.

Next let’s run up a repl

bash$ lein repl

And you should see it download the dependencies and put you in a repl ready to explore.

user=> (require '[amazonica.aws.ec2 :as ec2])
user=> (def regions (ec2/describe-regions))
user=> (pprint regions)

You should see something like the following:

{:regions
 [{:endpoint "ec2.eu-central-1.amazonaws.com",
   :region-name "eu-central-1"}
  {:endpoint "ec2.sa-east-1.amazonaws.com", :region-name "sa-east-1"}
  {:endpoint "ec2.ap-northeast-1.amazonaws.com",
   :region-name "ap-northeast-1"}
  {:endpoint "ec2.eu-west-1.amazonaws.com", :region-name "eu-west-1"}
  {:endpoint "ec2.us-east-1.amazonaws.com", :region-name "us-east-1"}
  {:endpoint "ec2.us-west-1.amazonaws.com", :region-name "us-west-1"}
  {:endpoint "ec2.us-west-2.amazonaws.com", :region-name "us-west-2"}
  {:endpoint "ec2.ap-southeast-2.amazonaws.com",
   :region-name "ap-southeast-2"}
  {:endpoint "ec2.ap-southeast-1.amazonaws.com",
   :region-name "ap-southeast-1"}]}

This shows nine AWS regions and their associated endpoints (which are needed in other calls to specify which region you mean). These are the AWS regions that are visible to your account; you won't see regions that you don't have access to, for example China or AWS GovCloud (US).

Now you can start to explore the regions; for a start let's look at the availability zones in a region:

user=> (pprint
         (ec2/describe-availability-zones { :endpoint "ec2.us-east-1.amazonaws.com" }))

This will give us something like:

{:availability-zones
 [{:state "available",
   :region-name "us-east-1",
   :zone-name "us-east-1a",
   :messages []}
  {:state "available",
   :region-name "us-east-1",
   :zone-name "us-east-1b",
   :messages []}
  {:state "available",
   :region-name "us-east-1",
   :zone-name "us-east-1c",
   :messages []}
  {:state "available",
   :region-name "us-east-1",
   :zone-name "us-east-1d",
   :messages []}
  {:state "available",
   :region-name "us-east-1",
   :zone-name "us-east-1e",
   :messages []}]}

As an aside, note that as far as I understand it, the availability zones you see are not necessarily the availability zones I see: different accounts have different mappings to the underlying physical data centres. This helps avoid everyone grabbing instances in availability zone 'a', consuming its resources and leaving the others under-utilised.

Let’s have a look at what instances there are in a given region.

WARNING: this output could be very large as it will list a lot of detail about every instance in the region.

user=> (pprint (ec2/describe-instances {:endpoint "ec2.us-east-1.amazonaws.com"}))

Some details below have been redacted or removed, but it gives you an idea of the information available; instances are grouped into their reservations:

{:reservations
 [{:requester-id "xxxxxxxxxx",
   :reservation-id "r-xxxxxxx",
   :group-names [],
   :groups [],
   :instances
   [{:monitoring {:state "disabled"},
     :tags
     [{:value "foo",:key "Component"}
      {:value "live", :key "Environment"}],
     :root-device-type "ebs",
     :private-dns-name "ip-xx-xx-xx-xx.ec2.internal",
     :hypervisor "xen",
     :subnet-id "subnet-xxxxx",
     :key-name "xxxxxxxxx",
     :architecture "x86_64",
     :security-groups
     [{:group-name "xxxxxx", :group-id "xxxxxxx"}],
     :virtualization-type "hvm",
     :instance-type "c3.large",
     :image-id "ami-xxxxxxx",
 	     :state {:name "running", :code 16},
 	     :state-transition-reason "",
 	     :network-interfaces
 	     [{:description "",
   	     :private-dns-name "ip-x-x-x-x.ec2.internal",
   	     :subnet-id "subnet-xxxxxx",
 	     :vpc-id "vpc-xxxxx",
 	     :instance-id "i-xxxxxxx",
 	     :iam-instance-profile
 	     {:id "XXXXXXXXXXX",
  	     :arn
  	     "arn:aws:iam::xxxxxxx:instance-profile/xxxxxxxxx"},
 	     :public-dns-name "",
 	     :private-ip-address "x.x.x.x",
 	     :placement
 	     {:tenancy "default",
  	     :group-name "",
  	     :availability-zone "us-east-1a"},
 	     :launch-time #<DateTime 2015-06-26T12:54:37.000-04:00>,
 	     :block-device-mappings
 	     [{:ebs
   	     {:attach-time #<DateTime 2015-06-26T12:54:40.000-04:00>,
    	:delete-on-termination true,
    	:status "attached",
    	:volume-id "vol-xxxxxxx"},
   	     :device-name "/dev/sda1"}]}],
   	     :owner-id "440474553311"}
   	...
   	]

Now, using what we have learnt above, let's put together a collection of instances:

user=> (def endpoints (for [region (:regions (ec2/describe-regions))] (:endpoint region)))
user=> (def reservations (flatten (for [endpoint endpoints] (:reservations (ec2/describe-instances {:endpoint endpoint})))))
user=> (def instances (flatten (for [reservation reservations] (:instances reservation))))

Now you are able to extract some information about all your instances.

Want the number of instances?

user=> (count instances)

Want the types of instances that you are using?

user=> (pprint (distinct (for [instance instances] (:instance-type instance))))
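
Want to know how many of each type? clojure.core's frequencies will give you a count per instance type:

user=> (pprint (frequencies (map :instance-type instances)))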

Amazonica gives you access to the full AWS API; however, its documentation is relatively poor, and if you are like me you will spend significant time looking at the AWS CLI docs and AWS Javadocs working out what to try in order to get the results you are after - the repl, however, makes this bearable.

This post gave a very brief intro to using Amazonica to query AWS; in the next post we'll look at putting this and other data into a more useful form, a graph.

My thanks to Mark Needham and Rik Van Bruggen for reviewing an early draft of this post

In the past I’ve worked on several projects to capture the ‘As-Is’ architecture of a system. It's generally pretty tedious work, and involves the use of either specialist, difficult-to-use tools or handcrafted diagrams and spreadsheets.

One of the most depressing things is that the moment you finish (and typically well before), it's out of date. During the time it's taken to do the exercise, something in the architecture has changed. Even if it hasn't, you know you've created a fragile thing that will break very easily, and it will likely be ignored by the people who need it most: they'll treat it as probably incorrect or out of date, because they know from their own experience that these things always are.

Today we work in the cloud, we work with immutable servers, and we actively seek out and destroy snowflakes, ensuring everything is automated and reproducible. It's time to do the same with our architectural documentation and stop treating it as a static document.

One of the great things about moving into the cloud, or any large-scale virtualised environment, is that it forces us to think about things like discovery and management. We cannot rely on things being there all of the time or being in the exact same place.

This means that in these environments we can actually capture the architecture of our system by programmatic means. That is only part of the problem though: you can find it, but how do you then track it?

You could throw it into any database, which is perfectly valid; I chose to put it into a graph database because, after all, an architecture is really just a graph.

From that graph it's possible to answer a lot of questions, from how many components we have, to how many instances we have running, to which component a given DNS entry relates to.

You can also use this to help shape your architecture and control costs. How many load balancers are running with no connected instances?

I will put together some blog posts on how you can do this yourself if you are using AWS.

Recently I tweeted that “Interesting things happen as deployment costs approach zero”. From that I got a few questions about what I meant and why that may be.

Firstly a distinction: I said deployment and not release. They are different things. A release requires a deployment in order to deliver a feature or bundle of features that are needed in a particular environment. But a deployment may in fact be of the existing artifacts or a set of previous artifacts (for example a roll-back).

What do I mean by the cost of a deployment? I am not talking so much in monetary terms, but more in the energy expended by the members of your team. How mentally challenging is a deployment? Does it:

  • evoke a sense of fear?
  • require a lot of box ticking?
  • result in a scramble to get signatures and approvals?
  • need hours of planning and scheduling?
  • require a lot of typing?

A deployment should not result in any of those things. Deployments often do, however, and it typically boils down to fear: fear of making mistakes because it is hard.

The problem with fear of deployment is that it becomes self-reinforcing: because we are afraid, and because it is hard, we make mistakes, which in turn makes us more afraid, which makes us hesitant to do it again and likely to add even more sign-offs to make it harder to make mistakes next time.

Who has worked somewhere where a release has gone wrong and all of a sudden every release needs to be approved by the CTO/VP/Director or even CEO? Most of us, I am betting. Now what happens when that doesn't happen, and instead the deployment process is made simple, safe and transparent? This is what I mean by the cost approaching zero.

We have observed the following:

  • More deployments happen, as there is little penalty incurred.
  • The difference between active developer code base and the code base actually running in production drops.
  • Time is saved as engineers spend less time considering and debating if a deployment is worth doing. (Or complaining about the deployment process).
  • The decision point as to whether a deployment can be done or not moves from senior management down to the group, then down to the team lead and ultimately down to the engineers working on the changes themselves. (This does enormous things for team morale and feelings of empowerment.)

You find that as confidence increases, all of a sudden a deployment for a one-line change (a proper one, not a hack) becomes possible and actually begins to happen. This opens the door to faster iterations, enables more experiments, and lets the developers themselves begin driving incremental improvements, such as taking on refactorings that otherwise might not happen.

All of these things add up to a better product, a less fragile product and happier teams.