Keep running smoothly with Site Reliability Engineering
As your product grows, it's crucial to balance site reliability with new feature development. We can help you adopt Site Reliability Engineering (SRE) tenets and equip your team and processes to manage service level objectives (SLOs) and error budgets effectively.
Let's make your product resilient

“An SRE team is responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of their service(s). There are codified rules of engagement and principles for how SRE teams interact with their environment, not only the production environment, but also the product development teams, the testing teams, the users, and so on.”
What we do
We'll help you establish good SRE practices, then support you when needed
We bring the tenets of SRE to your product team, sharing ways of working and building product resilience. Once the team can manage SLOs and error budgets on its own, thoughtbot moves into the background, providing on-call and long-term support.
Services
Full-time Site Reliability Engineering
For projects with significant reliability and operations needs, we can assign a full-time SRE or DevOps Engineer to your team.
- Pitch SRE tenets and help product teams and stakeholders adopt the SRE mindset
- Establish SLOs and error budgets
- Implement monitoring and alerting to track error budgets and ensure SLOs are met
- Improve application performance and scalability to meet SLOs
- Improve CI/CD pipelines to allow continuous, fearless deployment to production environments
- Deploy new infrastructure to meet scaling, security, and compliance needs
- Implement infrastructure as code to ensure long-term maintainability
- Clients in the UK public sector can access our services as part of the G-Cloud-13 purchasing framework.
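
To make the SLO and error budget items above more concrete, here is a minimal Python sketch of the arithmetic behind an error budget. It assumes a request-based availability SLO; the function names and the 25% paging threshold are hypothetical and chosen for illustration only, not taken from any particular monitoring tool.

```python
def error_budget_remaining(slo_target: float, total_requests: int,
                           failed_requests: int) -> float:
    """Return the fraction of the error budget still unspent.

    slo_target is the fraction of requests that must succeed (0.999 = 99.9%).
    A negative result means the budget has been overspent.
    """
    # The budget is the number of failures the SLO tolerates in this window.
    allowed_failures = (1.0 - slo_target) * total_requests
    if allowed_failures <= 0:
        # A 100% target (or an empty window) leaves no budget to spend.
        return 0.0
    return 1.0 - failed_requests / allowed_failures


def should_page(remaining: float, threshold: float = 0.25) -> bool:
    """Hypothetical alerting rule: page when less than `threshold` of the budget is left."""
    return remaining < threshold


# Example: a 99.9% SLO over 1,000,000 requests tolerates about 1,000 failures.
remaining = error_budget_remaining(0.999, 1_000_000, 250)
print(f"budget remaining: {remaining:.0%}, page on-call: {should_page(remaining)}")
```

In a real setup the request counts would come from your monitoring system, and the threshold would be agreed between product and engineering stakeholders as part of the error budget policy.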

Let's Talk
What does site reliability look like for your app?
