Your Kubernetes node suddenly became NotReady?
You run: kubectl get nodes
and see: worker-node-1 NotReady
At this point:
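- new pods are no longer scheduled onto the node
- existing pods are evicted if the node stays NotReady (after about five minutes by default)
- cluster capacity and health start degrading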
In most cases, the issue is related to:
- kubelet failure
- resource exhaustion
- networking issues
- container runtime problems
This guide walks through the fastest way to diagnose and fix it.
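On a cluster with many nodes, the kubectl get nodes check can be narrowed to just the unhealthy ones. The following is a minimal sketch, not a kubectl feature: not_ready_nodes is a hypothetical helper name, and it assumes the default column layout where NAME is first and STATUS is second.

```shell
# Sketch: list nodes whose STATUS column does not start with "Ready".
# not_ready_nodes is a hypothetical helper, not part of kubectl.
not_ready_nodes() {
  # Input: `kubectl get nodes --no-headers` output on stdin.
  # Column 1 is NAME, column 2 is STATUS (for example Ready, NotReady,
  # or Ready,SchedulingDisabled for a cordoned but healthy node).
  awk '$2 !~ /^Ready/ { print $1 }'
}

# Usage (requires cluster access):
#   kubectl get nodes --no-headers | not_ready_nodes
```

Cordoned nodes that are otherwise healthy are deliberately left out, since their status still begins with Ready.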
First: Check Why the Node Became NotReady
Run: kubectl describe node <node-name>
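Now look under: Conditions
Each condition (typically Ready, MemoryPressure, DiskPressure, PIDPressure, and NetworkUnavailable) carries a status, a reason, and a message. If Ready shows Unknown with the message “Kubelet stopped posting node status”, the control plane has lost contact with the kubelet. A True status on any pressure condition points to resource exhaustion. The Events list at the bottom of the output often adds detail.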
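The Conditions section can also be read in a compact, scriptable form. The sketch below assumes kubectl's jsonpath output; the function names node_conditions and problem_conditions are made up for illustration.

```shell
# node_conditions prints one "TYPE<TAB>STATUS<TAB>REASON" line per condition.
# Hypothetical helper; $1 is the node name.
node_conditions() {
  kubectl get node "$1" -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.reason}{"\n"}{end}'
}

# problem_conditions keeps only the lines that signal trouble:
# Ready should be True, and every other condition should be False.
problem_conditions() {
  awk -F '\t' '($1 == "Ready" && $2 != "True") || ($1 != "Ready" && $2 != "False")'
}

# Usage (requires cluster access):
#   node_conditions <node-name> | problem_conditions
```

An empty result suggests the node's own conditions look healthy, in which case the Events list is the next place to look.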
Scenario 1: Kubelet Stopped Running
This is one of the most common causes.
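Check kubelet: systemctl status kubelet
If it is inactive or failed, restart it: sudo systemctl restart kubelet
Wait 30 to 60 seconds and check: kubectl get nodes
If the node returns to Ready, the kubelet was the problem. To find out why it stopped, read its logs: journalctl -u kubelet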
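A kubelet restart followed by the readiness check can be scripted rather than timed by hand. This is a sketch under assumptions: restart_and_wait is a hypothetical helper, it is run on the affected node by a user with sudo rights, and that user has a working kubeconfig (often not the case on worker nodes, in which case run the kubectl part from your workstation).

```shell
# restart_and_wait: restart the kubelet, then wait for the node to be Ready.
# Hypothetical helper; $1 is the node name as shown by `kubectl get nodes`.
restart_and_wait() {
  node="$1"
  sudo systemctl restart kubelet || return 1
  # Block until the Ready condition is True, or give up after 90 seconds.
  if kubectl wait --for=condition=Ready "node/$node" --timeout=90s; then
    echo "$node is Ready"
  else
    echo "$node is still NotReady; recent kubelet log lines:"
    sudo journalctl -u kubelet --no-pager -n 50
    return 1
  fi
}

# Usage: restart_and_wait <node-name>
```

Printing the last log lines on failure keeps the next diagnostic step in the same terminal.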
Scenario 2: Disk Space Is Full
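A node that runs low on disk reports DiskPressure, starts evicting pods, and can go NotReady once the kubelet or the container runtime can no longer write to disk. With default kubelet settings, DiskPressure is raised when free space drops below 10% on the node filesystem or below 15% on the image filesystem.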
Check storage: df -h
Look for:
- /var
- /
- container storage paths
If usage is near 100%:
- remove old logs
- clean unused images
- prune containers
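Example for Docker: docker system prune -a
Note that this removes all stopped containers and every image not used by a running container, so expect images to be pulled again afterwards.
For containerd environments: crictl rmi --prune
To trim systemd journal logs (the size is an example value): journalctl --vacuum-size=500M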
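Scanning df output by eye is easy to get wrong under pressure. The sketch below filters it to the filesystems that are close to full. The helper name full_filesystems and the default threshold of 85 are assumptions chosen for illustration; it relies on the portable df -P layout and does not handle mount points that contain spaces.

```shell
# full_filesystems reads `df -P` output on stdin and prints every mount
# point whose usage is at or above the given percentage (default 85).
# Hypothetical helper, not a standard tool.
full_filesystems() {
  awk -v limit="${1:-85}" 'NR > 1 {
    use = $5; sub(/%/, "", use)          # "97%" -> "97"
    if (use + 0 >= limit + 0) print $6, $5
  }'
}

# Usage on the node:
#   df -P | full_filesystems 90
```

Running the same filter over df -P -i output checks inode usage, which can be exhausted while free space still looks fine.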
Scenario 3: Container Runtime Failure
If Docker or containerd crashes:
- Kubernetes cannot manage pods
- node health fails
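Check runtime: systemctl status containerd (or systemctl status docker on Docker-based nodes)
If the service is inactive or failed, restart it: sudo systemctl restart containerd
Recent errors are in its logs: journalctl -u containerd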
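When it is not obvious which runtime a node uses, both services can be checked in one pass. A minimal sketch, assuming systemd-managed units named containerd and docker; runtime_status is a hypothetical helper name.

```shell
# runtime_status prints the systemd state of each candidate runtime unit.
# A unit that is not installed typically shows up as "inactive".
runtime_status() {
  for unit in containerd docker; do
    # is-active exits non-zero for anything but "active", so ignore the
    # exit code and keep only the printed state.
    state=$(systemctl is-active "$unit" 2>/dev/null) || true
    echo "$unit: ${state:-unknown}"
  done
}

# Usage on the node: runtime_status
```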
Scenario 4: Kubernetes Networking Broke
A broken CNI plugin can isolate the node.
Check kube-system pods: kubectl get pods -n kube-system
Look for failures in:
- Calico
- Flannel
- Cilium
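Pay particular attention to pods in CrashLoopBackOff or Error on the affected node. A failing CNI pod means pod networking is not set up there, and the kubelet typically reports the node as NotReady with a “cni plugin not initialized” message.
Read the pod's logs for the cause: kubectl logs -n kube-system <pod-name>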
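The kube-system listing is long on most clusters, so it helps to restrict it to the affected node and to pods that are not healthy. The sketch below assumes the default kubectl get pods columns (NAME, READY, STATUS); unhealthy_pods is a hypothetical helper name.

```shell
# unhealthy_pods reads `kubectl get pods --no-headers` output on stdin and
# prints NAME and STATUS for every pod that is neither Running nor Completed.
# It does not catch pods that are Running but not ready (READY 0/1).
unhealthy_pods() {
  awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Usage (requires cluster access); the field selector limits the listing
# to pods scheduled on one node:
#   kubectl get pods -n kube-system --no-headers \
#     --field-selector spec.nodeName=<node-name> | unhealthy_pods
```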
Scenario 5: Node Ran Out of Memory
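High memory pressure can push the node into an unhealthy state. The kubelet sets MemoryPressure and evicts pods, and in the worst case the kernel's OOM killer terminates the kubelet or the container runtime.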
Check: free -m and top
If memory usage is extremely high:
- identify memory-heavy pods
- increase node size
- optimize workloads
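The free and top output can be reduced to a single number that is easy to compare or alert on. A minimal sketch, assuming a Linux node whose /proc/meminfo exposes MemTotal and MemAvailable; mem_available_pct is a hypothetical helper name.

```shell
# mem_available_pct reads /proc/meminfo-style text on stdin and prints the
# share of memory still available as a whole-number percentage (truncated).
mem_available_pct() {
  awk '/^MemTotal:/     { total = $2 }
       /^MemAvailable:/ { avail = $2 }
       END { if (total > 0) printf "%d\n", avail * 100 / total }'
}

# Usage on the node:
#   mem_available_pct < /proc/meminfo
# To find memory-heavy pods from the cluster side (needs metrics-server):
#   kubectl top pods -A --sort-by=memory
```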
Fastest Recovery Path (Most Engineers Do This)
If production is badly affected, work through these steps in order:
- Restart kubelet
- Restart container runtime
- Check disk space
- Reboot node if required
In many real-world cases, this restores the node quickly.
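The recovery steps above can be kept as a small dry-run script so the order is not improvised during an incident. This is a sketch under assumptions: step and recover_node are hypothetical names, RUN is a made-up switch, and containerd is assumed as the runtime (swap in docker where that applies). The reboot is left out on purpose as a manual decision.

```shell
# step prints a command and only executes it when RUN=1 is set.
step() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# recover_node walks the steps in order and stops at the first failure.
# $1 is the node name as shown by `kubectl get nodes`.
recover_node() {
  node="$1"
  step sudo systemctl restart kubelet &&
  step sudo systemctl restart containerd &&
  step df -h / /var &&
  step kubectl wait --for=condition=Ready "node/$node" --timeout=120s
}

# Dry run (prints the plan only):  recover_node <node-name>
# Execute for real:                RUN=1 recover_node <node-name>
```

Defaulting to a dry run keeps the script safe to paste into a production shell before deciding to act.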
The Mistake Most Teams Make
Most teams only restart the node.
But if the real issue is:
- disk pressure
- memory exhaustion
- broken networking
…the problem comes back after the restart.
You need to identify the actual root cause.
How Teams Prevent “Node Not Ready” Incidents
Modern SRE teams use:
- node health monitoring
- automated alerts
- resource pressure detection
- AI-based root cause analysis
This helps detect failures before workloads go down.
Tools That Help Detect Node Failures Early
Nudgebee
Useful for:
- Kubernetes health monitoring
- automated diagnostics
- MTTR reduction
- infrastructure incident workflows
Prometheus + Grafana
Good for:
- node metrics
- resource monitoring
Datadog
Useful for:
- Kubernetes infrastructure visibility
Quick Command Cheat Sheet
kubectl get nodes
kubectl describe node <node-name>
systemctl status kubelet
systemctl status containerd
df -h
kubectl get pods -n kube-system
FAQs
Why does a Kubernetes node become NotReady?
A node usually becomes NotReady because of kubelet failures, disk pressure, memory exhaustion, networking issues, or container runtime problems.
How do I check node health in Kubernetes?
Run:
kubectl describe node <node-name>
Check the Conditions section for:
- Ready
- MemoryPressure
- DiskPressure
- PIDPressure
- NetworkUnavailable
How do I fix a Kubernetes Node Not Ready error?
Common fixes include:
- restarting kubelet
- checking disk space
- restarting container runtime
- verifying network plugins
- rebooting the node if needed
Can disk pressure cause Kubernetes Node Not Ready?
Yes. If the node runs out of storage, kubelet and containers may stop functioning properly, causing the node to enter the NotReady state.
Can networking issues cause Node Not Ready?
Yes. Problems with CNI plugins like:
- Calico
- Flannel
- Cilium
can prevent the node from communicating with the cluster.
What happens when a node becomes NotReady?
When a node becomes NotReady:
- new pods are not scheduled onto it
- existing pods are evicted if it stays NotReady (after about five minutes by default)
- applications can become unavailable