Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2088726

Summary: oc is not reporting pods with status NodeLost for Daemonset pods when a node is marked NotReady
Product: OpenShift Container Platform
Reporter: Christian Passarelli <cpassare>
Component: kube-controller-manager
Assignee: Filip Krepinsky <fkrepins>
Status: CLOSED WONTFIX
QA Contact: zhou ying <yinzhou>
Severity: medium
Priority: medium
Docs Contact:
Version: 4.10
CC: fkrepins, maszulik, mfojtik
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-09-22 18:42:39 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Christian Passarelli 2022-05-20 10:33:04 UTC
Description of problem:
oc reports the status of DaemonSet pods as "Running" even when the related node is "NotReady".

Version-Release number of selected component (if applicable):
4.10

How reproducible:
100% 

Steps to Reproduce:
1. Manually shut down a cluster node
2. Wait until the node becomes NotReady
3. Run "oc get pod" in any namespace that has DaemonSet pods and observe that the status of the pods on the shut-down node remains "Running". For example, run "oc get pod -n openshift-dns -o wide"
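The steps above can be sketched as a shell session; the node name is a placeholder, and the actual shutdown happens out of band (cloud console, or "shutdown -h now" on the host):

```shell
# Placeholder node name; substitute a real worker node from "oc get nodes".
NODE=worker-0.example.com

# Watch the node until its STATUS column flips to NotReady.
oc get node "$NODE" -w

# DaemonSet pods scheduled on that node still report STATUS "Running".
oc get pod -n openshift-dns -o wide | grep "$NODE"
```

These commands require a live cluster and cluster-admin access, so they are illustrative rather than verified output.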

Actual results:
oc still reports the pod as "Running".

Expected results:
The pod shouldn't be marked Running if the node is NotReady


Additional info:
In OpenShift 3.11, oc reports DaemonSet pods as "NodeLost" when the node is "NotReady".

Comment 1 Maciej Szulik 2022-05-25 10:43:01 UTC
How long does it take for those pods to report something other than Running? The problem is that, by default, pods won't be evicted
from an unreachable node for at least 5 minutes (see the description below the table in https://kubernetes.io/docs/concepts/architecture/nodes/#condition).
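As a hedged sketch of how to observe this timeline, the node's Ready condition and the taints the node controller applies can be read with jsonpath queries (node name is a placeholder):

```shell
# Ready condition status: "True" normally, "Unknown" once the kubelet
# stops reporting to the control plane.
oc get node worker-0 -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'

# The node controller taints an unreachable node (node.kubernetes.io/unreachable,
# effect NoExecute); pods without a matching toleration are evicted after the
# eviction timeout (5m by default).
oc get node worker-0 -o jsonpath='{.spec.taints}'
```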

Comment 2 Maciej Szulik 2022-05-25 10:43:19 UTC
*** Bug 2088727 has been marked as a duplicate of this bug. ***

Comment 3 Christian Passarelli 2022-05-25 11:33:01 UTC
In both my tests and the customer's, the pods never change status from Running. This is expected, because the pod's phase remains Running. But I noticed that in 3.11 oc reports a NodeLost status.

Comment 4 Maciej Szulik 2022-05-25 13:02:49 UTC
Filip, can you check what we report and how accurate it is?

Comment 5 Filip Krepinsky 2022-06-15 15:45:05 UTC
This works fine for normal pods: their Ready condition is updated after the node becomes unreachable, and they are evicted after pod-eviction-timeout (as mentioned in the documentation linked above). This results in oc/kubectl changing their STATUS to Terminating.

DaemonSet pods are special: they are not evicted and can keep "running" on the node even if it becomes unreachable. This is achieved by the following tolerations and the eviction logic:

    tolerations:
    - effect: NoExecute
      key: node.kubernetes.io/not-ready
      operator: Exists
    - effect: NoExecute
      key: node.kubernetes.io/unreachable
      operator: Exists
      
      
As also documented in the code:

// DaemonSet pods shouldn't be deleted by NodeController in case of node problems.
// Add infinite toleration for taint unreachable:NoExecute here
// to survive taint-based eviction enforced by NodeController
// when node turns unreachable.

Workaround as suggested by k8s docs:

In cases where Kubernetes cannot deduce from the underlying infrastructure if a node has permanently left a cluster, the cluster administrator may need to delete the node object by hand. Deleting the node object from Kubernetes causes all the Pod objects running on the node to be deleted from the API server and frees up their names.
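The workaround quoted above amounts to deleting the Node object; a minimal sketch, with a placeholder node name:

```shell
# WARNING: only do this when the node has permanently left the cluster.
# Deleting the Node object causes all Pod objects bound to it to be
# removed from the API server, freeing their names so replacements
# (e.g. new DaemonSet pods) can be created elsewhere.
oc delete node worker-0.example.com
```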
 
 
Kubectl also reports a Running status for these DaemonSet pods, which is just the pod's status.phase, so I am inclined to keep the same behaviour in oc as well.
  
I am not sure if this (kubectl reporting NodeLost) is worth pursuing upstream; especially with server-side printing it might be difficult to get in, and the discussion might get pushed toward making API changes instead.
 
This new feature would only be a secondary indicator, as a node going down (NotReady) should be caught by alerting in the first place. Another option is to observe the node and/or the pod's Ready condition by other means.
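As an illustration of the distinction drawn here, the STATUS column shows status.phase (which stays "Running"), while the pod's Ready condition is the more informative signal; the pod name below is a hypothetical DaemonSet pod:

```shell
# status.phase: stays "Running" even when the node is unreachable.
oc get pod dns-default-abcde -n openshift-dns -o jsonpath='{.status.phase}'

# Ready condition: a better signal once the node goes NotReady.
oc get pod dns-default-abcde -n openshift-dns \
  -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
```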
 
IMO it is not a good idea to start customizing the pod columns in kubectl.
 
@maszulik  thoughts on this? ^

Comment 6 Maciej Szulik 2022-06-21 08:13:07 UTC
I agree with Filip's statement above with respect to keeping kubectl and oc consistent in this matter. To better
understand the problem at hand, it would help to describe what kind of DaemonSet this is and what the customer's expectations are.

Comment 9 Filip Krepinsky 2022-09-21 21:53:20 UTC
> I would suggest to have a status like "disconnected" or "unknown" as on a node with malfunction or bad firewall rule the pod still can work ...

nitpick: with a faulty network we cannot be sure the pod is working correctly even if it is running

I am still inclined not to pursue this and to close it as WONTFIX, since it has a minor impact compared to the node being down.

Comment 10 Maciej Szulik 2022-09-22 12:08:31 UTC
(In reply to Filip Krepinsky from comment #9)
> > I would suggest to have a status like "disconnected" or "unknown" as on a node with malfunction or bad firewall rule the pod still can work ...
> 
> nitpick: with a faulty network we cannot be sure the pod is working
> correctly even if it is running
> 
> I am still inclined not to pursue this and to close it as WONTFIX, since
> it has a minor impact compared to the node being down.

agree, explain the workarounds and the impact and feel free to close.

Comment 11 Filip Krepinsky 2022-09-22 18:42:39 UTC
The main problem, a node being down/unreachable, should be caught by alerting and resolved manually as suggested in https://bugzilla.redhat.com/show_bug.cgi?id=2088726#c5. The wrong/unknown status of DaemonSet pods is only secondary to this and has a minor impact.