Case study
Kernel update triggered high load from Go-based monitoring
After a kernel update, a customer saw the dockerd service drive high load and suspected their Go-based application containers. They disabled those containers, but the load continued. The real cause was Netdata monitoring, a Go-based service they had not considered.
Context
A customer experienced a sharp increase in load attributed to the dockerd service shortly after applying a kernel update. Because the platform included Go-based application containers, those containers became the first suspected cause.
The customer disabled the Go-based app containers, expecting the dockerd load to drop. It did not. That made the incident harder to reason about because the obvious suspect had already been removed from the equation.
The problem
- High load appeared against the dockerd service after a kernel update, creating a strong but unproven link between Docker, the update and the application stack.
- Go-based application containers were suspected, but disabling them did not resolve the dockerd load.
- The remaining load source was not obvious because Netdata, another Go-based component, was still running and collecting Docker metrics independently of the disabled app containers.
- The customer needed a structured process-level review rather than further trial-and-error changes such as disabling services one at a time.
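A structured process-level review starts by measuring which processes actually consume CPU over a fixed interval, instead of relying on which service the load is attributed to. The Go sketch below shows one minimal way to do that on Linux. It reads `utime` and `stime` from `/proc/[pid]/stat` twice, two seconds apart, and ranks processes by the CPU ticks they used in between. The helper names (`parseStat`, `snapshot`, `topByDelta`) are hypothetical. Standard tools such as `top` or `pidstat` report the same data, and this sketch does not describe how this particular investigation was run.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
	"strconv"
	"strings"
	"time"
)

// procSample holds a process name and its cumulative CPU time (utime+stime, in clock ticks).
type procSample struct {
	name  string
	ticks uint64
}

// usage is the CPU consumed by one process during the sampling interval.
type usage struct {
	pid   int
	name  string
	delta uint64
}

// parseStat extracts the command name and utime+stime from a /proc/[pid]/stat line.
// The name is wrapped in parentheses and may contain spaces or ')', so we split on the last ')'.
func parseStat(line string) (procSample, error) {
	open := strings.IndexByte(line, '(')
	end := strings.LastIndexByte(line, ')')
	if open < 0 || end < open {
		return procSample{}, fmt.Errorf("malformed stat line")
	}
	// rest[0] is field 3 (state), so utime (field 14) is rest[11] and stime (field 15) is rest[12].
	rest := strings.Fields(line[end+1:])
	if len(rest) < 13 {
		return procSample{}, fmt.Errorf("too few fields")
	}
	utime, err := strconv.ParseUint(rest[11], 10, 64)
	if err != nil {
		return procSample{}, err
	}
	stime, err := strconv.ParseUint(rest[12], 10, 64)
	if err != nil {
		return procSample{}, err
	}
	return procSample{name: line[open+1 : end], ticks: utime + stime}, nil
}

// snapshot reads CPU counters for every process currently listed in /proc.
func snapshot() map[int]procSample {
	out := make(map[int]procSample)
	entries, err := os.ReadDir("/proc")
	if err != nil {
		return out
	}
	for _, e := range entries {
		pid, err := strconv.Atoi(e.Name())
		if err != nil {
			continue // not a process directory
		}
		data, err := os.ReadFile(filepath.Join("/proc", e.Name(), "stat"))
		if err != nil {
			continue // process exited between ReadDir and ReadFile
		}
		if s, err := parseStat(string(data)); err == nil {
			out[pid] = s
		}
	}
	return out
}

// topByDelta ranks processes by CPU ticks used between two snapshots.
func topByDelta(before, after map[int]procSample, n int) []usage {
	var list []usage
	for pid, a := range after {
		b, ok := before[pid]
		// Skip new processes and PIDs reused by a different command.
		if !ok || b.name != a.name || a.ticks < b.ticks {
			continue
		}
		list = append(list, usage{pid: pid, name: a.name, delta: a.ticks - b.ticks})
	}
	sort.Slice(list, func(i, j int) bool {
		if list[i].delta != list[j].delta {
			return list[i].delta > list[j].delta
		}
		return list[i].pid < list[j].pid
	})
	if len(list) > n {
		list = list[:n]
	}
	return list
}

func main() {
	before := snapshot()
	time.Sleep(2 * time.Second)
	after := snapshot()
	for _, u := range topByDelta(before, after, 10) {
		fmt.Printf("%7d  %-16s %6d ticks\n", u.pid, u.name, u.delta)
	}
}
```

Ticks are in USER_HZ units, which is 100 per second on most Linux systems, and all threads are counted under their owning process. This attributes CPU to the process that burns it. If one service causes work inside another, for example a client repeatedly querying dockerd, that cost appears under the daemon. So the ranking narrows the search rather than closing it.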
Our approach
- Reviewed host load, dockerd activity, per-process CPU usage, container activity and other running services instead of assuming the application containers were still responsible.
- Separated actual container workload from Docker daemon activity and host-level services so the investigation did not stop at Docker alone.
- Identified that the apparent dockerd load was being driven by Netdata, a Go-based monitoring service still running on the host.
- Explained why the issue had been easy to miss: the customer had focused on the Go application containers, but Netdata also runs Go-based components, so disabling the app containers did not remove every Go workload or Docker-related monitor from the server.
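One way to separate container workload from dockerd and from other host services is to check which cgroup each process belongs to. The Go sketch below makes several assumptions. It expects a systemd host where dockerd runs as `docker.service`. It also expects containers to appear either as `/docker/<id>` (cgroupfs driver) or as `docker-<id>.scope` (systemd driver). `classifyCgroup` is a hypothetical helper, and other layouts such as rootless Docker, Kubernetes or Podman will simply be labelled "other".

```go
package main

import (
	"fmt"
	"os"
	"path"
	"strconv"
	"strings"
)

// classifyCgroup labels a process from the contents of its /proc/[pid]/cgroup file.
// Assumes Docker's cgroupfs or systemd cgroup driver and a systemd-managed docker.service.
func classifyCgroup(contents string) string {
	service := ""
	for _, line := range strings.Split(strings.TrimSpace(contents), "\n") {
		// Each line is "hierarchy-ID:controller-list:cgroup-path" (cgroup v2 has a single "0::" line).
		parts := strings.SplitN(line, ":", 3)
		if len(parts) != 3 {
			continue
		}
		last := path.Base(parts[2])
		switch {
		case strings.Contains(parts[2], "/docker/"),
			strings.HasPrefix(last, "docker-") && strings.HasSuffix(last, ".scope"):
			return "container"
		case last == "docker.service":
			service = "docker daemon"
		case strings.HasSuffix(last, ".service") && service == "":
			service = "host service (" + last + ")"
		}
	}
	if service == "" {
		return "other"
	}
	return service
}

func main() {
	entries, err := os.ReadDir("/proc")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, e := range entries {
		if _, err := strconv.Atoi(e.Name()); err != nil {
			continue // not a process directory
		}
		cg, err := os.ReadFile("/proc/" + e.Name() + "/cgroup")
		if err != nil {
			continue // process exited or is not readable
		}
		comm, _ := os.ReadFile("/proc/" + e.Name() + "/comm")
		fmt.Printf("%7s  %-16s %s\n", e.Name(), strings.TrimSpace(string(comm)), classifyCgroup(string(cg)))
	}
}
```

If a monitoring agent is installed as its own systemd unit, it is labelled as a host service rather than a container. That illustrates why stopping application containers alone would leave it running. Pairing this label with per-process CPU figures separates the three groups the investigation had to tell apart.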
Practical outcomes
Want help with a similar issue?
Send the symptoms, affected service, recent changes and business impact. We will suggest the most appropriate route: emergency support, a fixed-scope technical fix, an infrastructure review or a wider project.