As a DevOps consultancy, we spend a lot of time thinking about, and evaluating DevOps tools.
There are a number of different tools that form part of our DevOps workbench, and we base our evaluation on years of experience in IT, working with complex, heterogeneous technology stacks.
We’ve found that DevOps tooling has become a key part of our tech and operations. We take a lot of time to select and improve our DevOps toolset. The vast majority of tools that we use are open source. By sharing the tools that we use and like, we hope to start a discussion within the DevOps community about what further improvements can be made.
We hope that you enjoy browsing through the list below.
You may already be well acquainted with some of the tools, and some may be newer to you.
What is it? Puppet is designed to provide a standard way of delivering and operating software, no matter where it runs. Puppet has been around since 2005 and has a large and mature ecosystem, which has evolved to become one of the best in breed Infrastructure Automation tools that can scale. It is backed up and supported by a highly active Open Source community.
Why use Puppet? Planning ahead and using config management tools like Puppet can cut down on the amount of time you spend repeating basic tasks, and help ensure that your configurations are consistent, accurate and repeatable across your infrastructure. Puppet is one of the most mature tools in this area and has an excellent support backbone
What are the problems with Puppet? The learning curve is quite high for those who are unfamiliar with puppet, and the ruby DSL may seem unfamiliar for users who have no development experience.
What is it? Vagrant – another tool from Hashicorp – provides easy to configure, easily reproducible and portable work environments that are built on top of industry standard technology. Vagrant helps enforce a single consistent workflow to maximise the flexibility of you and your team.
Why use Vagrant? Vagrant provides operations engineers with a disposable environment and consistent workflow for developing and testing infrastructure management scripts. Vagrant can be downloaded and installed within minutes on Mac OS X, Linux and Windows.
Vagrant allows you to create a single file for your project, to define the kind of machine you want to create, the software that needs to be installed, and the way you want to access the machine.
Are there any problems with Vagrant? Vagrant has been criticised as being painfully, troublingly slow.
3. ELK Stack
What is ELK? The ELK stack actually refers to three technologies – Elasticsearch, Logstash and Kibana. Elasticsearch is a NoSQL database that is based on the Lucene search engine, Logstash is a log pipeline tool that accepts inputs from different sources and exports the data to various targets, and Kibana is a visualisation layer for Elasticsearch. And they work very well together.
What are its use cases? Together they’re often used in log analysis in IT environments (although you can also use the ELK stack for BI, security and compliance & analytics.)
Why is it popular? ELK is incredibly popular. The stack is downloaded 500,000 times every month. This makes it the world’s most popular log management platform. SaaS and web startups in particular are not overly keen to stump up for enterprise products such as Splunk. In fact, there’s an increasing amount of discussion as to whether open source products are overtaking Splunk, with many seeing 2014 as a tipping point.
What is Consul.io? Consul is a tool for discovering and configuring services in your infrastructure. It can be used to present nodes and services in a flexible interface, allowing clients to have an up-to-date view of the infrastructure they’re part of.
Why use Consul.io? Consul.io comes with a number of features for providing consistent information about your infrastructure. Consul provides service and node discovery, tagging, health checks, consensus based election routines, key value storage and more. Consul allows you to build awareness into your applications and services.
Anything else I should know? Hashicorp have a really strong reputation within the developer community for releasing strong documentation with their products, and Consul.io is no exception. Consul is distributed, highly available, and datacentre aware.
What is Jenkins? Everyone loves Jenkins! Jenkins is an open source CI tool, written in Java. CI is the practice of running tests on a non-developer machine, automatically, every time someone pushes code into a source repo. Jenkins is considered a prerequisite for Continuous Integration.
Why would I want to use Jenkins? Jenkins helps automate a lot of the work of frequent builds, allows you to resolve and detect issues quickly, and also reduce integration costs because serious integration issues become less likely.
Any problems with Jenkins? Jenkins configuration can be tricky. Jenkins UI has evolved over many years without a guiding vision – and it’s arguably got more complex. It has been compared unfavourably to more modern tools such as Travis CI (which of course isn’t open source).
What is it? There was a time last year, when it seemed that all anyone wanted to talk about was Docker. Docker provides a portable application environment which enables you to package an application in a unit for application development.
Should I use it? Depending on who you ask, Docker is either the next big thing in software development or a case of the emperor’s new clothes. Docker has some neat features, including DockerHub, a public repository of Docker containers, and docker-compose, a tool for managing multiple containers as a unit on a single machine.
It’s been suggested that Docker can be a way of reducing server footprint by packing containers on physical tin without running physical kernels – but equally Docker’s security story is a hot topic. Docker’s UI also continues to improve – Docker has just released a new Mac and Windows client.
What’s the verdict? Docker can be a very useful technology – particularly in development and QA – but you should think carefully about whether you need or want to run it in production. Not everyone needs to operate at Google scale.
What is it? Ansible is a free platform for configuring and managing servers. It combines multi-node software deployment, task execution and configuration management.
Why use Ansible? Configuration management tools such as Ansible are designed to automate away much of the work of configuring machines.
Manually configuring machines via SSH, and running the commands you need to install your application stack, editing config files, and copying application code can be tedious work, and can lead to each machine being its own ‘special snowflake’ depending on who configured it. This can compound if you are setting up tens, or thousands of machines.
What are the problems with using Ansible? Ansible is considered to have a fairly weak UI. Tools such as Ansible Tower exist, but many consider them a work in progress, and using Ansible Tower drives up the TCO of using Ansible.
Ansible also has no notion of state – it just executes a series of tasks, stopping when it finishes, fails, or encountering an error. Ansible has also been around for less time than Chef and Puppet, meaning that it has a smaller developer community than some of its more mature competitors.
What is it? Saltstack, much like Ansible, is a configuration management tool and remote execution engine. It is primarily designed to allow the management of infrastructure in a predictable and repeatable way. Saltstack was designed to manage large infrastructures with thousands of servers – the kind seen at LinkedIn, Wikipedia and Google.
What are the benefits of using Salt? Because Salt uses the ZeroMQ framework, and serialises messages using msgpack, Salt is able to achieve severe speed and bandwidth gains over traditional transport layers, and is thus able to fit far more data more quickly through a given pipe. Getting set up is very simple, and someone new to configuration management can be productive before lunchtime.
Any problems with using Saltstack? Saltstack is considered to have weaker Web UI and reporting capabilities than some of its more mature competitors. It also lacks deep reporting capabilities. Some of these issues have been addressed in Saltstack Enterprise, but this may be out of budget for you.
What is it? Kubernetes is an open-source container cluster manager by Google. It aims to provide a platform for automating deployment, scaling and operations of container clusters across hosts.
Why should I use it? Kubernetes is a system for managing containerised applications across a cluster of nodes. Kubernetes was designed to address some of the disconnect between the way that modern, clustered applications work, and the assumptions they make about some of their environments.
On the one hand, users shouldn’t have to care too much about where work is scheduled – the unit is presented at the service level, and can be accomplished by any of the member nodes. On the other hand, it is important because a sysadmin will want to make sure that not all instances of a service are assigned to the same host. Kubernetes is designed to make these scheduling decisions easier.
What is it? Collectd is a daemon that collects statistics on system performance, and provides mechanisms to store the values in different ways.
Why should I use collectd? Collectd helps you collect and visualise data about your servers, and thus make informed decisions. It’s useful for working with tools like Graphite, which can render the data that collectd collects.
Collectd is an incredibly simple tool, and requires very few resources. It can even run on a Raspberry Pi! It’s also popular because of its pervasive modularity. It’s written in C, contains almost no code that would be specific to any operating system, and will therefore run on any Unix-like operating system.
What is Git? Git is the most widely used version control system in the world today.
An incredibly large number of products use Git for version control: from hobbyist projects to large enterprises, from commercial products to open source. Git is designed with speed, flexibility and security in mind, and is an example of a distributed version control system.
Should I use Git? Git is an incredibly impressive tool – combining speed, functionality, performance and security. When compared side by side to other SCM tools, Git often comes out ahead. Git has also emerged as a de facto standard, meaning that vast numbers of developers already have Git experience.
Why shouldn’t I use Git? Git has an initially steep learning curve. Its terminology can seem a little arcane and new to novices. Revert, for instance, has a very different meaning in Git than it does in SCM and CVS. However, it rewards that investment curve with increased development speed once mastered.
What is Rudder? Rudder is (yet another!) open source audit and configuration management tool that’s designed to help automate system config across large IT infrastructures.
What are the benefits of Rudder? Rudder allows users (even non-experts) to define parameters in a single console, and check that IT services are installed, running and in good health. Rudder is useful for keeping configuration drift low. Managers are also able to access compliance reports and access audit logs. Rudder is built in Scala.
What is it? Gradle is an open source build automation tool that builds upon the concepts of Apache Ant and Apache Maven and introduces a Groovy-based DSL instead of the XML form used by Maven.
Why use Gradle instead of Ant or Maven? For many years, build tools were simply about compiling and packaging software. Today, projects tend to involve larger and more complex software stacks, have multiple programming languages, and incorporate many different testing strategies. It’s now really important (particularly with the rise of Agile) that build tools support early integration of code as well as easy delivery to test and prod.
Gradle allows you to map out your problem domain using a domain specific language, which is implemented in Groovy rather than XML. Writing code in Groovy rather than XML cuts down on the size of a build, and is far more readable.
What is Chef? Chef is a config management tool designed to automate machine setup on physical servers, VMs and in the cloud. Many companies use Chef software to manage and control their infrastructure – including Facebook, Etsy and Indiegogo. Chef is designed to define Infrastructure as Code.
What is infrastructure as code? Infrastructure as Code means that, rather than manually changing and setting up machines, the machine setup is defined in a Chef recipe. Leveraging Chef allows you to easily recreate your environment in a predictable manner by automating the entire system configuration.
What are the next steps for Chef? Chef has released Chef Delivery, a tool for creating automated workflows around enterprise software development and establishing a pipeline from creation to production. Chef Delivery establishes a pipeline that every new piece of software should go through in order to prepare it for production use. Chef Delivery works in a similar way to Jenkins, but offers greater reporting and auditing capabilities.
What is it? Cobbler is a Linux provisioning server that facilitates a network-based system installation of multiple OSes from a central point using services such as DHCP, TFTP and DNS.
Cobbler can be configured for PXE, reinstallations and virtualised guests using Xen, KVM and Xenware. Cobbler also comes with a lightweight configuration management system, as well as support for integrating with Puppet.
What is it? SimianArmy is a suite of tools designed by Netflix to support cloud operations. ChaosMonkey is part of SimianArmy, and is described as a ‘resiliency tool that helps applications tolerate random instance failures.’
What does it do? The SimianArmy suite of tools are designed to help engineers test the reliability, resiliency and recoverability of their cloud services running on AWS.
Netflix began the process of creating the SimianArmy suite of tools soon after they moved to AWS. Each ‘monkey’ is decided to help Netflix make its service less fragile, and better able to support continuous service.
The SimianArmy includes:
- Chaos Monkey – randomly shuts down virtual machines (VMs) to ensure that small disruptions will not affect the overall service.
- Latency Monkey – simulates a degradation of service and checks to make sure that upstream services react appropriately.
- Conformity Monkey – detects instances that aren’t coded to best-practices and shuts them down, giving the service owner the opportunity to re-launch them properly.
- Security Monkey – searches out security weaknesses, and ends the offending instances. It also ensures that SSL and DRM certificates are not expired or close to expiration.
- Doctor Monkey – performs health checks on each instance and monitors other external signs of process health such as CPU and memory usage.
- Janitor Monkey – searches for unused resources and discards them.
Why use SimianArmy? SimianArmy is designed to make cloud services less fragile and more capable of supporting continuous service, when parts of cloud services come across a problem. By doing this, potential problems can be detected and addressed.
What is it? AWS is a secure cloud services platform, which offers compute, database storage, content delivery and other functionality to help businesses scale and grow.
Why use AWS? EC2 is the most popular AWS service, and provides a very easy way for DevOps teams to run tests. Whenever you need them, you can set up an EC2 server with a machine image up and running in seconds.
EC2 is also great for scaling out systems. You can set up bundles of servers for different services, and when there is additional load on servers, scripts can be configured to spin up additional servers. You can also handle this automatically through Amazon auto-scaling.
What are the downsides of AWS? The main downside of AWS is that all of your servers are virtual. There are options available on AWS for single tenant access, and different instance types exist, but performance will vary and never be as stable as physical infrastructure.
If you don’t need elasticity, EC2 can also be expensive at on-demand rates.
What is it? CoreOS is a Linux distribution that is designed specifically to solve the problem of making large, scalable deployments on varied infrastructure easy to manage. It maintains a lightweight host system, and uses containers to provide isolation.
Why use CoreOS? CoreOS is a barebones Linux distro. It’s known for having a very small footprint, built for “automated updates” and geared specifically for clustering.
If you’ve installed CoreOS on disk, it will update by having two system partitions – one “known good” because you’ve used it to boot to, and another that is used to download updates to. It will then automatically reboot and switch to update.
CoreOS gives you a stack of systemd, etcd, Fleet, Docker and rkt with very little else. It’s useful for spinning up a large cluster where everything is going to run in Docker containers.
What are the alternatives? Snappy Ubuntu and Project Atomic offer similar solutions.
What is Grafana? Grafana is a neat open source dashboard tool. Grafana is useful for because it displays various metrics from Graphite through a web browser.
What are the advantages of Grafana? Grafana is very simple to setup and maintain, and displays metrics in a simple, Kibana-like display style. In 2015, Grafana also released a SaaS component, Grafana.net.
You might wonder how Grafana differs from the ELK stack. While ELK is about log analytics, Grafana is more about time-series monitoring.
Grafana helps you maximise the power and ease of use of your existing time-series store, so you can focus on building nice looking and informative dashboards. It also lets you define generic dashboards through variables that can be used in metrics queries. This allows you to reuse the same dashboards for different servers, apps and experiments.
What is Chocolatey? Chocolatey is apt-get for Windows. Once installed, you can install Windows applications quickly and easily using the command line. You could install Git, 72Zip, Ruby, or even Microsoft Office! The catalogue is now incredibly complete – you really can install a wide array of apps using Chocolatey.
Why should I use Chocolatey? Because manual installs are slow and inefficient. Chocolatey promises that you can install a program (including dependencies, such as the .NET framework) without user intervention.
You could use Chocolatey on a new PC to write a simple command, and download and install a fully functioning dev environment in a few hours. It’s really cool.
What is it? Zookeeper is a centralised service for maintaining configuration information, naming, providing distributed synchronisation, and providing group services. All of these services are used in one form or another by distributed applications.
Why use Zookeeper? Zookeeper is a co-ordination system for maintaining distributed services. It’s best to see Zookeeper as a giant properties file for different processes, telling them which services are available and where they are located. This post from the Engineering team at Pinterest outlines some possible use cases for Zookeeper.
Where can I read more? Aside from Zookeeper’s documentation, which is pretty good, chapter 14 of “Hadoop: The Definitive Guide” has around 35 pages, describing in some level of detail what Zookeeper does.
What is GitHub? GitHub is a web based repository service. It provides distributed revision control and source control management functionality.
At the heart of GitHub is Git, the version control system designed and developed by Linus Torvalds. Git, like any other version control system, is designed to system, manage and store revisions of products.
GitHub is a centralised repository system for Git, which adds a Web-based graphical user interface and several collaboration features, such as wiki and basic task management tools.
One of GitHub’s coolest features is “forking” – copying a repo from one user’s account to another. This allows you to take a project that you don’t have write access to, and modify it under your own account. If you make changes, you can send a notification called a “pull request” to the original owner. The user can then merge your changes with the original repo.
What is it? Drone is a continuous integration platform, based on Docker and built in Go. Drone works with Docker to run tests, and also works with Github, Gitlab and Bitbucket.
Why use Drone? The use case for Drone is much the same as any other continuous integration solution. CI is the practice of making regular commits to your code base. Since with CI you will end up building and testing your code more frequently, the development process will be sped up. Drone does this – speeding up the process of building and testing.
How does it work? Drone pulls code from a Git repository, and then runs scripts that you define. Drone allows you to run any test suite, and will report back to you via email or indicate the status with a badge on your profile. Because Drone is integrated with Docker, it can support a huge number of languages including PHP, Go, Ruby and Python, to name just a few.
What is it? Pagerduty is an alarm aggregation and monitoring system that is used predominantly by support and sysadmin teams.
How does it work? PagerDuty allows support teams to pull all of their incident reporting tools into a single place, and receive an alert when an incident occurs. Before PagerDuty came along, companies used to cobble together their own incident management solutions. PagerDuty is designed to plug in whatever monitoring systems they are using, and manage the incident reporting from one place.
Anything else? PagerDuty provides detailed metrics on response and resolution times too.
What is it? Dokku is a mini-Heroku, running on Docker.
Why should I use it? If you’re already deploying apps the Heroku way, but don’t like the way that Heroku is getting more expensive for hobbyists, running Dokku from a tool such as DigitalOcean could be a great solution.
Having the ability to deploy a site to a remote and have it immediately using Github is a huge boon. Here’s a tutorial for getting it up and running.
What is it? OpenStack is free and open source software for cloud computing, which is mostly deployed as Infrastructure as a Service.
What are the aims of OpenStack? OpenStack is designed to help businesses build Amazon-like cloud services in their own data centres.
OpenStack is a Cloud OS designed to control large pools of compute, storage and networking resources through a datacentre, managed through a dashboard giving administrators control while also empowering users to provision resources.
What is it? Sublime-Text is a cross-platform source code editor with a Python API. It supports many different programming languages and markup languages, and has extensive code highlighting functionality.
What’s good about it? Sublime-Text is feature-ful, it’s stable, and it’s being continuously developed. It is also built from the ground up to be extremely customisable (with a great plugin architecture, too).
What is it? Nagios is an open source tool for monitoring systems, networks and infrastructure. Nagios provides alerting and monitoring services for servers, switches, applications and services.
Why use Nagios? Nagios main strengths are that it is open source, relatively robust and reliable, and is highly configurable. It has an active development community, and runs on many different kind of operating systems. You can use Nagios to monitor services such as DHCP, DNS, FTP, SSH, Telnet, HTTP, NTP, POP3, IMAP, SMTP and more. It can also be used to monitor database servers such as MySQL, Postgres, Oracle and SQL Server.
Has it had any criticism? Nagios has been criticised as lacking scalability and usability. However, Nagios is stable and its limitations and problems are well-known and understood. And certainly some, including Etsy, are happy to see Nagios live on a little longer.
What is it? Spinnaker is an open-source, multi-cloud CD platform for releasing software changes with high velocity and confidence.
What’s it designed to do? Spinnaker was designed by Netflix as the successor to its “Asgard” project. Spinnaker is designed to allow companies to hook into and deploy assets across two cloud providers at the same time.
What’s good about it? It’s battle-tested on Netflix’s infrastructure, and allows the creation of pipelines that begin with the creation of some deployable asset (say a Docker image or a jar file), and end with a deployment. Spinnaker offers an out of the box setup, and engineers can make and re-use pipelines on different workflows.
What is it? Flynn is one of the most popular open source Docker PaaS solutions. Flynn aims to provide a single platform that Ops can provide to developers to power production, testing and development, freeing developers to focus.
Why should you use Flynn? Flynn is an open source PaaS built from pluggable components that you can mix and match however you want. Out of the box, it works in a very similar way to Heroku, but you are able to replace pieces and put whatever you need into Flynn.
Is Flynn production-ready? The Flynn team correctly point out that “production ready” means different things to different people. As with many of the tools in this list, the best way to find out if it’s a fit for you is to try them!
If you’re interested in learning more about DevOps or specific DevOps tools, why not take a look at our Training pages.
We offer regular Introduction to DevOps courses, and have a number of upcoming Jenkins training courses.