Accelerate
Nicole Forsgren, Jez Humble, Gene Kim
How to measure the performance of an engineering team, and what capabilities to invest in to drive higher performance. A great entry point to audit an existing team.
This book was recommended to me when I was focusing on improving the quality of software shipped in an already swift.
4 metrics for software delivery performance
While there are many topics covered by the book, I keep coming back regularly to a single chapter: the one about measuring software delivery performance.
Based on the results of their research, the authors present four metrics to gauge performance.
- Lead Time: time it takes to go from code committed to code successfully running in production
- Deployment Frequency: how often new code is deployed to production
- Mean Time to Restore (MTTR): how long it takes to restore a service when an incident occurs (unplanned outage, service impairment)
- Change failure rate: what percentage of changes to production result in degraded service or require remediation (service impairment, hotfix, or rollback)
Those metrics cover both velocity (lead time & deployment frequency) and reliability (mean time to restore and change failure rate) matters.
I really like that, with only four easily measurable metrics, one can get a good overview of how an engineering team is performing.
They are also quite high-level. Behind each metric lie many engineering topics (code review, QA, CI/CD, tests, …). I found them to be a great entry point for identifying areas of improvement in a team.
Lead Time
👉 Time it takes to go from code committed to code successfully running in production
| | High Performers | Medium Performers | Low Performers |
|---|---|---|---|
| Lead Time | Less than one hour | Between one week and one month | Between one month and six months |
While this definition covers only part of the overall lead time (from idea to production), it has the virtue of focusing on a development stage that the engineering team is fully responsible for and can change autonomously.
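As a rough illustration (not taken from the book), lead time as defined above could be computed from pairs of commit and deploy timestamps. The record shape, the sample values, and the choice of the median as the summary statistic are all assumptions made for this sketch.

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical records: (time the code was committed, time it ran in production).
changes = [
    (datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 9, 40)),     # 40 minutes
    (datetime(2024, 1, 8, 14, 0), datetime(2024, 1, 9, 10, 0)),    # 20 hours
    (datetime(2024, 1, 10, 11, 0), datetime(2024, 1, 10, 11, 30)), # 30 minutes
]


def lead_times(records):
    """Return one timedelta per change: deploy time minus commit time."""
    return [deployed - committed for committed, deployed in records]


def median_lead_time(records):
    """Summarize with the median, which one very slow change skews less than the mean."""
    seconds = median(lt.total_seconds() for lt in lead_times(records))
    return timedelta(seconds=seconds)


print(median_lead_time(changes))  # prints 0:40:00
```

In practice the commit timestamps would come from version control and the deploy timestamps from the deployment tooling; how to join the two depends on the team's setup.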
Some subjects to tackle to reduce lead time:
- Pull Request & Code Review best practices
- CI/CD pipelines
- QA processes
Deployment frequency
👉 How often new code is deployed to production
| | High Performers | Medium Performers | Low Performers |
|---|---|---|---|
| Deployment frequency | On demand (multiple deploys per day) | Between once per week and once per month | Between once per month and once every six months |
Instead of deployment frequency, an ideal measure here would be batch size.
Small batch size reduces cycle times and variability in flow, accelerates feedback, reduces risk and overhead, improves efficiency, increases motivation and urgency, and reduces costs and schedule growth.
However, batch size is hard to measure, so deployment frequency serves as a proxy for it. The two are usually strongly correlated: frequent deployments imply small batches, and vice versa.
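A minimal sketch of how deployment frequency might be tracked, assuming a plain list of production deploy dates is available (the sample dates and the per-ISO-week bucketing are illustrative choices, not something the book prescribes):

```python
from collections import Counter
from datetime import date

# Hypothetical production deploy dates.
deploys = [
    date(2024, 1, 8),
    date(2024, 1, 8),
    date(2024, 1, 9),
    date(2024, 1, 17),
]


def deploys_per_week(deploy_dates):
    """Count deploys per (ISO year, ISO week). Weeks without deploys are absent."""
    counts = Counter()
    for d in deploy_dates:
        iso = d.isocalendar()
        counts[(iso[0], iso[1])] += 1
    return dict(counts)


print(deploys_per_week(deploys))  # prints {(2024, 2): 3, (2024, 3): 1}
```

Counting per week keeps the number easy to compare against the table above; a team deploying on demand would more likely bucket per day.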
Some subjects to tackle to increase deployment frequency:
- CI/CD pipelines
- Feature flags
- Software architecture
Mean Time to Restore (MTTR)
👉 How long does it take to restore a service when an incident occurs (unplanned outage, service impairment)
| | High Performers | Medium Performers | Low Performers |
|---|---|---|---|
| MTTR | Less than one hour | Less than one day | Between one day and one week |
Failure is inevitable, so the key question is: how quickly can service be restored?
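To make the metric concrete, here is a small sketch that averages restore times over a list of incidents. The incident records are hypothetical, and where an incident "starts" (first alert, first user report) is a convention each team has to pick for itself.

```python
from datetime import datetime, timedelta

# Hypothetical incidents: (time the incident started, time service was restored).
incidents = [
    (datetime(2024, 2, 1, 10, 0), datetime(2024, 2, 1, 10, 30)),  # 30 minutes
    (datetime(2024, 2, 5, 22, 0), datetime(2024, 2, 6, 0, 30)),   # 2 hours 30 minutes
]


def mean_time_to_restore(records):
    """Average duration between incident start and service restoration."""
    if not records:
        raise ValueError("at least one incident is required")
    total = sum((restored - started for started, restored in records), timedelta())
    return total / len(records)


print(mean_time_to_restore(incidents))  # prints 1:30:00
```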
Some subjects to tackle to reduce MTTR:
- Monitoring (how incidents are detected)
- Alerting
- Tech bandwidth for support
- Prioritization
Change failure rate
👉 What percentage of changes to production result in degraded service or require remediation (service impairment, hotfix, or rollback)
| | High Performers | Medium Performers | Low Performers |
|---|---|---|---|
| Change failure rate | 0-15% | 16-30% | >30% |
This is a key quality metric highlighting how good a delivery process is.
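As an illustration only, change failure rate reduces to a simple ratio once each production change is labeled as failed or not. The change identifiers and the labels below are made up, and deciding what counts as a failure (hotfix, rollback, degraded service) is the hard part that this sketch assumes is already done.

```python
# Hypothetical production changes: (change id, whether it needed remediation).
changes = [
    ("c1", False),
    ("c2", True),   # rolled back
    ("c3", False),
    ("c4", False),
    ("c5", True),   # hotfixed
    ("c6", False),
    ("c7", False),
    ("c8", False),
    ("c9", False),
    ("c10", False),
]


def change_failure_rate(records):
    """Percentage of changes that degraded service or required remediation."""
    if not records:
        raise ValueError("at least one change is required")
    failed = sum(1 for _, needed_fix in records if needed_fix)
    return 100 * failed / len(records)


print(change_failure_rate(changes))  # prints 20.0
```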
Some subjects to tackle to reduce change failure rate:
- Code Review process
- Test automation
- QA processes
The book then gives pointers on measuring and changing culture, depending on the kind of organizational culture you are currently in: pathological (power-oriented), bureaucratic (rule-oriented) or generative (performance-oriented).
It also addresses topics such as Continuous Integration, loosely coupled architecture, InfoSec, lean management, lean product development, …
Lastly, a whole section is dedicated to the research process (based on surveys) and the statistical analysis behind the insights presented in the book. It goes into great detail and lends strong credibility to the findings, which is not something we often see in opinionated software engineering books.
Long story short: Accelerate is a great resource for anyone interested in measuring and improving the performance of an engineering team. A must-read!