SEGAS-00011 Infrastructure utilisation monitoring

Last updated: 20 September 2023
Relates to (tags): Observability, Monitoring, Infrastructure, SRE

Monitoring infrastructure utilisation increases the reliability and performance of Home Office services by supporting:

  • Automation of infrastructure scaling
  • More predictable workloads
  • Trend analysis and capacity planning
  • Cost optimisation
  • Detection, identification and remediation of issues
  • Assurance of service reliability

Monitoring infrastructure utilisation alone, without other signals of service performance, is not enough to ensure a high-quality service. Teams should use our monitoring patterns (for example Monitoring-as-code) to meet this standard and to complement their other service monitoring.


Requirements

Infrastructure MUST be observable relative to defined service level expectations

Infrastructure utilisation should be baselined so that Service Level Objectives (SLOs) can be defined for infrastructure measures. Defined SLOs make it possible to trigger automated, proactive measures when utilisation departs from expected levels.
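
As an illustration only, baselining and an SLO check might be sketched as follows. The function names, the 80% threshold and the 90% objective are assumptions made for this example and are not part of the standard; teams would choose values appropriate to their service.

```python
from statistics import mean, quantiles


def baseline(samples):
    """Summarise historical utilisation samples (percentages, 0-100)."""
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    p95 = quantiles(samples, n=100, method="inclusive")[94]
    return {"mean": mean(samples), "p95": p95}


def slo_compliance(samples, threshold_pct):
    """Fraction of samples at or below the utilisation threshold."""
    within = sum(1 for s in samples if s <= threshold_pct)
    return within / len(samples)


def slo_breached(samples, threshold_pct, objective):
    """True when compliance falls below the objective (for example 0.9)."""
    return slo_compliance(samples, threshold_pct) < objective


# Hypothetical CPU utilisation samples, in percent.
cpu_samples = [40, 42, 45, 50, 55, 60, 65, 70, 85, 95]
summary = baseline(cpu_samples)
breached = slo_breached(cpu_samples, threshold_pct=80, objective=0.9)
```

Here the baseline summarises normal behaviour, while the breach flag is the kind of signal that could feed an automated measure such as scaling.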

CPU utilisation MUST be observable

CPU utilisation by applications, services, systems or pods is to be monitored so that effective measures, such as scaling out, can be triggered in periods of saturation.
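
One possible way to recognise a period of saturation, rather than a momentary spike, is to require utilisation to stay above a threshold for a full window of samples. The sketch below assumes a hypothetical `SaturationDetector` class, an 80% threshold and a three-sample window; none of these are prescribed by this standard.

```python
from collections import deque


class SaturationDetector:
    """Flags saturation when utilisation stays above a threshold for a full window."""

    def __init__(self, threshold_pct, window):
        self.threshold_pct = threshold_pct
        # A bounded deque discards the oldest sample automatically.
        self.samples = deque(maxlen=window)

    def observe(self, utilisation_pct):
        """Record one sample; return True only when the window is full
        and every sample in it exceeds the threshold."""
        self.samples.append(utilisation_pct)
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and all(s > self.threshold_pct for s in self.samples)


detector = SaturationDetector(threshold_pct=80, window=3)
results = [detector.observe(x) for x in [85, 90, 70, 88, 92, 95]]
```

Only the final sample completes three consecutive readings above the threshold, so a scale-out action would be triggered at that point and not by the earlier, shorter spike.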

Memory utilisation MUST be observable

Memory utilisation by applications, services, systems or pods is to be monitored so that effective measures, such as scaling out, can be triggered in periods of saturation.
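
As a minimal sketch, memory utilisation on a Linux host could be derived from the `MemTotal` and `MemAvailable` fields of `/proc/meminfo`. This assumes a Linux system; the function name is hypothetical, and in practice most teams would rely on their monitoring agent rather than hand-written parsing.

```python
def memory_utilisation_pct(meminfo_text):
    """Return used memory as a percentage, from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])  # sizes are reported in kB
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    return 100.0 * (total - available) / total


# Illustrative input; on a real host, read the text from /proc/meminfo.
sample = (
    "MemTotal:       16000000 kB\n"
    "MemFree:         2000000 kB\n"
    "MemAvailable:    4000000 kB\n"
)
used_pct = memory_utilisation_pct(sample)
```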

Disk utilisation MUST be observable

Disk utilisation by applications, services, systems or pods is to be monitored so that effective measures, such as scaling out, can be triggered in periods of saturation.
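
For illustration, disk space utilisation for a given path can be read with Python's standard `shutil.disk_usage`. The function name and default path below are assumptions for this sketch, and the result may differ slightly from tools such as `df`, which can account for reserved blocks differently.

```python
import shutil


def disk_utilisation_pct(path="/"):
    """Return the percentage of disk space used on the filesystem containing path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return 100.0 * usage.used / usage.total


current_pct = disk_utilisation_pct(".")
```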

Network utilisation MUST be observable

Network utilisation by applications, services, systems or pods is to be monitored so that effective measures, such as scaling out, can be triggered in periods of saturation.
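
Network utilisation is commonly derived from interface byte counters sampled at two points in time. The sketch below assumes a hypothetical function name and a known link capacity in bits per second; how the counters are collected is left to the team's monitoring tooling.

```python
def network_utilisation_pct(bytes_before, bytes_after, interval_s, link_capacity_bps):
    """Return link utilisation as a percentage over the sampling interval."""
    if interval_s <= 0 or link_capacity_bps <= 0:
        raise ValueError("interval and link capacity must be positive")
    delta_bytes = bytes_after - bytes_before
    if delta_bytes < 0:
        # A smaller later reading usually means the counter reset or wrapped.
        raise ValueError("counter went backwards; discard this sample")
    bits_per_second = (delta_bytes * 8) / interval_s
    return 100.0 * bits_per_second / link_capacity_bps


# 125,000,000 bytes in 10 seconds on a 1 Gbit/s link.
pct = network_utilisation_pct(0, 125_000_000, 10, 1_000_000_000)
```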

Historical infrastructure monitoring metrics MUST be retained for analysis

To allow trend analysis and capacity planning, infrastructure monitoring metrics must be retained for a period appropriate to the usage profile of the service.
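
One common approach to long-term retention, shown here only as a sketch, is to downsample older metrics into coarser time buckets and to discard samples beyond the retention period. The function names, bucket width and retention period are assumptions for this example; the appropriate values depend on the service.

```python
from collections import defaultdict
from statistics import mean


def downsample(samples, bucket_s):
    """Average (timestamp, value) samples into fixed-width time buckets."""
    buckets = defaultdict(list)
    for ts, value in samples:
        # Align each timestamp to the start of its bucket.
        buckets[ts - ts % bucket_s].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


def apply_retention(samples, now, retention_s):
    """Keep only samples no older than the retention period."""
    return [(ts, v) for ts, v in samples if now - ts <= retention_s]


# Hypothetical samples: (seconds since start, utilisation percent).
raw = [(0, 10), (30, 20), (60, 30), (90, 50)]
per_minute = downsample(raw, bucket_s=60)
recent = apply_retention(raw, now=100, retention_s=50)
```

Downsampling keeps the shape of long-term trends at a fraction of the storage cost, which suits capacity planning, while full-resolution data is kept only for the recent period where it aids diagnosis.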

