The DORA metrics are four key metrics used to indicate the velocity and stability of software development.
They are:
- Deployment frequency - how often you successfully release to production.
- Lead time for changes - the amount of time it takes for a commit to get into production.
- Change failure rate - the percentage of deployments causing a failure in production.
- Time to restore service - the amount of time it takes to recover from a failure in production.
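As a rough illustration, the four metrics can be derived from a list of deployment records. This is a minimal sketch, not a standard implementation: the `Deployment` record, its field names and the `dora_metrics` function are hypothetical, and it assumes a failure starts at the moment of the deployment that caused it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    commit_time: datetime                      # when the change was committed
    deploy_time: datetime                      # when it reached production
    failed: bool = False                       # did it cause a failure in production?
    restored_time: Optional[datetime] = None   # when service was restored, if it failed


def dora_metrics(deployments, period_days):
    """Summarise the four DORA metrics for deployments made over period_days."""
    lead_times = [d.deploy_time - d.commit_time for d in deployments]
    failures = [d for d in deployments if d.failed]
    # Assumption: the failure begins when the faulty deployment goes live.
    restore_times = [
        d.restored_time - d.deploy_time
        for d in failures
        if d.restored_time is not None
    ]
    return {
        "deployment_frequency_per_day": len(deployments) / period_days,
        "median_lead_time": median(lead_times) if lead_times else None,
        "change_failure_rate": len(failures) / len(deployments) if deployments else None,
        "median_time_to_restore": median(restore_times) if restore_times else None,
    }
```

The median is used here rather than the mean so that one unusually slow change or recovery does not dominate the result; that is a design choice for this sketch, not a requirement of the metrics.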
If you had an issue after a release to production, how long would it take you to recover?
If the number of changes is small and it hasn't been long since the last release, it could be easy to revert the code change and re-deploy.
If it's been a while since the last release or the release contains large changes, this will be harder.
If you use feature flags, can you disable a flag and stop the code that's causing the issue?
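To make the feature flag idea concrete, here is a minimal sketch of a flag used as a kill switch. The `FeatureFlags` class, the `new_discount_engine` flag name and the checkout functions are all hypothetical; a real flag store would usually be backed by a flag service or configuration rather than an in-memory dictionary.

```python
class FeatureFlags:
    """Hypothetical in-memory flag store, for illustration only."""

    def __init__(self, flags=None):
        self._flags = dict(flags or {})

    def is_enabled(self, name):
        # Unknown flags default to off, so new code stays dark until enabled.
        return self._flags.get(name, False)

    def disable(self, name):
        # Turning the flag off takes effect on the next check, with no re-deploy.
        self._flags[name] = False


def new_checkout_total(prices):
    # Newly released code path; imagine this is where the issue was introduced.
    return round(sum(prices) * 0.9, 2)


def checkout_total(prices, flags):
    if flags.is_enabled("new_discount_engine"):
        return new_checkout_total(prices)
    # Existing, known-good behaviour.
    return sum(prices)
```

If the new path misbehaves in production, calling `flags.disable("new_discount_engine")` sends every later call back through the old path without a revert or a new release.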
What if you need to recreate the whole environment?
How old is your most recent backup? Have you verified the backup works and can be used to restore a database, the user-uploaded files or the whole environment?
A backup is only good if it is recent and can be restored. Otherwise, it's useless.
However, restoring from backups takes time and can lose any data written since the backup was taken, so it should be the last resort.
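The two conditions for a useful backup, that it is recent and that a restore has been verified, can be checked automatically. The sketch below is only an illustration: `backup_is_usable`, its parameters and the 24-hour default are assumptions, and how you record a successful test restore will depend on your own tooling.

```python
from datetime import datetime, timedelta


def backup_is_usable(backup_time, restore_verified, now, max_age=timedelta(hours=24)):
    """Return True only if the backup is recent AND a test restore has succeeded.

    backup_time      -- when the most recent backup was taken
    restore_verified -- True if a test restore of this backup succeeded
    now              -- current time, passed in so the check is easy to test
    max_age          -- oldest acceptable backup (assumed default: 24 hours)
    """
    is_recent = now - backup_time <= max_age
    return is_recent and restore_verified
```

Running a check like this on a schedule, and alerting when it returns `False`, means you find out about a stale or unrestorable backup before you need it.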
Releasing small changes often and using tools like feature flags will help minimise the downtime from an issue and allow service to be restored as quickly as possible.