Monitor and measure proactively

Last updated: 28 July 2023
Relates to (tags): Ways of working, Build, release and deploy, Observability, SRE

Proactive and effective monitoring is the key to observable systems. Ensure consistent deployment, user experience and delivery across our systems by defining metrics, monitoring them according to observability principles, and acting on the resulting insights to drive improvements.


Rationale

Observability is the ability to measure the internal states of a system by examining its outputs… (it) is important in software development because it gives you greater control over complex systems

Splunk - What is observability?

At the Home Office we follow the GDS Service Manual guidance on how to measure and monitor the status of all aspects of the services we provide, including user experience, security, accounting, reliability and performance. We do this by building observability into our estate, which is highly complex and distributed. Using telemetry to provide deep visibility into systems enables our teams to cut through complexity to identify and resolve issues, and improve performance. In addition to this, high observability has other benefits:


Applications and Implications

  • Implement infrastructure monitoring to determine the health and performance of the containers, environments and managed services your applications run on
  • Investigate the behaviour of your application at the service level with Application Performance Monitoring (APM)
  • When you can, use Real User Monitoring (RUM) to understand the real experience of users by collecting data from browsers about how your site performs and looks. This helps to isolate issues to either the frontend or the backend
  • Use synthetic monitoring to test and measure the experience of your web application by simulating traffic with set test variables
  • Use log capture, aggregation and viewing tools to monitor and, importantly, analyse the logs generated by your applications and infrastructure. This helps with troubleshooting and remediation
  • Use DORA metrics to understand software delivery performance
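
As an illustration of the last point, the four DORA metrics (deployment frequency, lead time for changes, change failure rate and time to restore service) can be derived from a simple record of deployments. The sketch below is a minimal example, not a prescribed implementation: the Deployment record, its field names and the dora_metrics function are hypothetical, and it assumes your pipeline can export commit, deployment and restoration timestamps for each change.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import List, Optional


@dataclass
class Deployment:
    # Hypothetical record shape; adapt it to what your pipeline actually exports.
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when the change reached production
    failed: bool = False                    # whether it caused a production failure
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deployments: List[Deployment], period_days: int) -> dict:
    """Summarise the four DORA metrics for the deployments made in one period."""
    if not deployments:
        raise ValueError("need at least one deployment")
    if period_days <= 0:
        raise ValueError("period_days must be positive")

    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    # Only failures that have already been resolved contribute a restore time.
    restore_times = [
        d.restored_at - d.deployed_at for d in failures if d.restored_at is not None
    ]

    return {
        "deployment_frequency_per_day": len(deployments) / period_days,
        "median_lead_time": median(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        "median_time_to_restore": median(restore_times) if restore_times else None,
    }


# Example data: four deployments over two days, one of which failed.
example = [
    Deployment(datetime(2023, 7, 1, 9), datetime(2023, 7, 1, 11)),
    Deployment(
        datetime(2023, 7, 1, 10),
        datetime(2023, 7, 1, 14),
        failed=True,
        restored_at=datetime(2023, 7, 1, 15),
    ),
    Deployment(datetime(2023, 7, 2, 9), datetime(2023, 7, 2, 15)),
    Deployment(datetime(2023, 7, 2, 10), datetime(2023, 7, 2, 18)),
]
summary = dora_metrics(example, period_days=2)
```

Medians are used here rather than means so that a single unusually slow change does not dominate the figure; that is a design choice for this sketch, and teams may prefer percentiles or the banding used in DORA reports.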