The world of Information Technology and software development often treats DevOps and SRE as one and the same thing. However, there are significant differences between the two. While Site Reliability Engineering (SRE) has gained traction in recent years, the practices behind DevOps have been around much longer (even before the term DevOps existed).
To put it simply, DevOps and SRE are both practices put in place to deliver software faster. The main difference between the two is in their approaches: DevOps is focused on shortening the software development lifecycle, while SRE concentrates on eliminating system weaknesses to achieve the same purpose.
In this article, we will look at the fundamental ways in which DevOps and SRE differ from each other. Before we do that, let’s start with understanding what DevOps and SRE are.
In the words of Gene Kim, author of The DevOps Handbook and The Phoenix Project,
“DevOps is [the] set of cultural norms and technology practices that [enables] the fast flow of planned work from, among others, development, through tests into operations while preserving world-class reliability, operation and security. DevOps isn’t about what you do, but what your outcomes are.”
So, DevOps is mainly focused on transforming the cultural practices inside an organisation to speed up the software development lifecycle (SDLC). It is not targeted at a person, group, or position. DevOps aims to strengthen the collaboration between both the Information Technology operations and software development teams.
How a team gets there matters less than the outcome it achieves.
DevOps uses a set of principles to increase a software engineering team’s exposure to production systems and to enable the IT operations team to escalate issues to the development team more efficiently. In fact, SRE plays a crucial role in a DevOps organization by facilitating proactive testing, enabling observability, and improving delivery speed and service reliability. DevOps-centric organizations are encouraged to operate according to the cultural principles of the CALMS model (Culture, Automation, Lean, Measurement, and Sharing).
SRE, short for Site Reliability Engineering, is a term coined by Ben Treynor, the Google Senior VP in charge of overseeing technical operations.
Drew Farnsworth (from Green Lane Design) explains, “I generally like to think of SRE as a system wherein development controls operations. This is a system where the environment is broken down to the most basic components of the IT stack and rolled out with the best practices baked into the hardware.”
Essentially, SRE teams with expertise in software development are tasked with resolving problems in production systems while maintaining a balance between delivery speed and the system’s reliability. In this manner, the SRE approach places software development personnel in operations roles, where they apply structured engineering practices to uphold an organisation’s policies.
They ensure that systems are available and running efficiently at all times, and they develop technical services that boost the reliability of those systems. It is an SRE’s responsibility to identify any potential weakness before it balloons into a major problem.
DevOps vs SRE: Top Differences Between DevOps and SRE
In practice, DevOps and SRE should be viewed as complementary disciplines, where SREs, as part of a DevOps-centric structure, focus on bettering the reliability of their technical services. So, strictly speaking, there is no such thing as DevOps versus SRE.
What this section does instead is assess the fundamental ways in which the two approaches differ.
Both DevOps and SRE aim to move quickly so that updates happen frequently and users have access to newer and more relevant technology. However, DevOps moves ahead gradually, with caution, while SRE weighs the cost of failure in order to move faster.
Both implement automation and use tools to achieve this purpose.
Regarding Failures as Normal
DevOps is big on accepting failures and regarding them as learning opportunities. For that reason, it encourages a blameless culture, accepts that failures are a part of the process, and doesn’t focus on making systems 100% fault-tolerant. An example of this is Netflix with its Simian Army, a set of tools that deliberately inject failures into production systems.
SRE, on the other hand, supports blameless postmortems. Their purpose is to identify the cause of a failure, agree on who owns the follow-up actions, and work to avoid similar failures in future. How much failure a system can tolerate is captured in its error budget. The SLI, SLO, and SLA metrics determine this budget, which helps cut down the cost of production. Basically, SRE adopts proactive monitoring and alerting practices to avert a potential failure.
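To make the idea of an error budget concrete, here is a minimal sketch in Python. It assumes a time-based availability target; the `ErrorBudget` class, its field names, and the 99.9% target over 30 days are illustrative choices, not something prescribed by any particular SRE tool.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Hypothetical helper: turns an availability SLO into allowed downtime."""

    slo_target: float      # e.g. 0.999 means the service should be up 99.9% of the window
    window_minutes: float  # length of the measurement window, in minutes

    @property
    def allowed_downtime_minutes(self) -> float:
        # The budget is whatever the SLO leaves over: (1 - target) of the window.
        return (1.0 - self.slo_target) * self.window_minutes

    def remaining_minutes(self, downtime_so_far: float) -> float:
        # A negative result means the budget has been overspent.
        return self.allowed_downtime_minutes - downtime_so_far


# Assumed example: a 99.9% target measured over a 30-day window.
budget = ErrorBudget(slo_target=0.999, window_minutes=30 * 24 * 60)
print(round(budget.allowed_downtime_minutes, 1))  # 43.2
print(round(budget.remaining_minutes(30.0), 1))   # 13.2
```

Under these assumptions, a 99.9% target leaves roughly 43 minutes of downtime per 30 days. Once that is spent, a team would typically slow down risky changes until the window rolls over.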
Automation vs Innovation
DevOps places the utmost importance on automation. In a DevOps-focused environment, this translates into systems being automated as much as possible, resulting in boring releases. After a developer has committed the code, most of the following activities, if not all, must be automated.
Therefore, DevOps’ reason to pursue CI/CD is to develop high-quality systems at a higher velocity.
SRE’s reasons to pursue CI/CD are different: they are aimed at reducing the cost of failure. Any common, generic or repetitive operational tasks, such as deployments and backups, are regarded as less worthy of attention. Therefore, SREs cap the amount of time they spend on this operational toil. This is done so they can pursue more appealing work such as introducing new technologies or carrying out architecture-related activities.
Breaking Down Organisational Silos
Developers and operators come into conflict when it comes to the deployment process. While developers would benefit from features being deployed as soon as they are coded, operations folks are focussed on keeping systems available, which slows the deployment process down.
DevOps and SRE differ in how they remove the silos in an organisation.
DevOps, as explained in The DevOps Handbook, approaches this problem by including practices such as operating in smaller batches, and managing configurations better.
SREs do not just aim to optimise the flow between teams but also support the systems in production. They do so by integrating into the teams as consultants and supporting the developers by sharing the responsibility of running systems. This is how SREs break down silos in an organisation.
Measuring a Successful Implementation
DevOps metrics are all about the speed of operations; this includes how often deployments take place, the time taken to do so, and how frequently they run into problems.
As per the 2017 report from Puppet and DORA, measuring a successful implementation in DevOps depends on the following:
- the frequency at which deployments happen
- the time duration between a code commit and its deployment
- the frequency at which deployments fail
- the time taken to recover from a deployment failure
These feedback loops are put in place to help DevOps teams improve system quality while making room for change and experimentation.
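The four measures listed above can be computed from a simple log of deployments. The following Python sketch assumes a hypothetical `Deployment` record and a made-up two-day sample; the field names and the use of a median for lead time are illustrative choices, not taken from the report.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record of one deployment."""

    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None  # set only when a failed deployment was recovered


def delivery_metrics(deployments: list, period_days: float) -> dict:
    if not deployments:
        raise ValueError("at least one deployment is required")
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deploys_per_day": len(deployments) / period_days,
        "median_lead_time": median(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        "mean_time_to_recover": (
            sum(recoveries, timedelta()) / len(recoveries) if recoveries else None
        ),
    }


# Made-up sample: four deployments over two days, two of which failed.
t0 = datetime(2024, 1, 1, 9, 0)
h = lambda n: t0 + timedelta(hours=n)
sample = [
    Deployment(h(0), h(2)),
    Deployment(h(5), h(9), failed=True, restored_at=h(10)),
    Deployment(h(24), h(30)),
    Deployment(h(30), h(38), failed=True, restored_at=h(41)),
]
metrics = delivery_metrics(sample, period_days=2)
print(metrics["deploys_per_day"])       # 2.0
print(metrics["median_lead_time"])      # 5:00:00
print(metrics["change_failure_rate"])   # 0.5
print(metrics["mean_time_to_recover"])  # 2:00:00
```

In a real pipeline these records would come from the CI/CD system and the incident tracker; the calculation itself stays this simple.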
SRE, on the other hand, works on improving systems while keeping their reliability in mind. It considers the following key metrics to determine a successful implementation:
- service-level objective (SLO)
- service-level indicator (SLI)
- service-level agreement (SLA)
The above-mentioned metrics are indicators of the reliability of a system. They determine beforehand whether or not a change will be released to production.
In SRE, these metrics come in handy when building an error budget and when deciding to prioritise improving the reliability of systems over working on new features.
Google published an eBook on how it implements Site Reliability Engineering in its production systems, wherein Treynor explained SRE as,
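As a rough illustration of how these metrics can gate a release, the sketch below measures a request-based SLI, compares it against an SLO, and blocks releases once the error budget is used up. The function names, the request counts, and the policy of freezing at 100% budget consumption are assumptions made for the example.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """SLI: the fraction of requests that were served successfully."""
    if total_requests == 0:
        return 1.0  # no traffic means nothing has failed
    return good_requests / total_requests


def budget_consumed(sli: float, slo: float) -> float:
    """Share of the error budget already spent (1.0 means fully spent)."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    allowed_failure = 1.0 - slo  # the error budget implied by the SLO
    actual_failure = 1.0 - sli
    return actual_failure / allowed_failure


def can_release(sli: float, slo: float, freeze_threshold: float = 1.0) -> bool:
    # Assumed policy: new releases pause once the budget is fully consumed.
    return budget_consumed(sli, slo) < freeze_threshold


SLO = 0.999  # assumed target: 99.9% of requests succeed

healthy = availability_sli(good_requests=999_400, total_requests=1_000_000)
degraded = availability_sli(good_requests=998_500, total_requests=1_000_000)

print(can_release(healthy, SLO))   # True: about 60% of the budget is spent
print(can_release(degraded, SLO))  # False: about 150% of the budget is spent
```

An SLA would sit on top of this as the externally promised level, usually set looser than the internal SLO so that the team has room to react before a contractual breach.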
“SRE is what happens when you ask a software engineer to design an operations team.”
When it comes to how DevOps and SRE differ, all you need to remember is that SRE is driven by developers rather than an operations team. Both maintenance and monitoring are largely under the control of developers. That is what primarily sets the two disciplines apart.