Bug 1779916
| Summary: | dns operator exits with code 255 on sigterm, should be 0 on safe exit | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Clayton Coleman <ccoleman> |
| Component: | Node | Assignee: | Mrunal Patel <mpatel> |
| Status: | CLOSED ERRATA | QA Contact: | Sunil Choudhary <schoudha> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.3.0 | CC: | aos-bugs, dcbw, dmace, jokerman, mpatel, pehunt, weinliu, wking |
| Target Milestone: | --- | | |
| Target Release: | 4.3.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-01-23 11:17:47 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Clayton Coleman
2019-12-05 01:22:18 UTC
Still researching, but here's what I've found so far.
Using an example[1], the problems start during a node reboot. According to the node journal, the reboot begins:
Dec 05 00:02:51 ip-10-0-148-144 systemd-logind[1877]: System is rebooting.
Dec 05 00:02:51 ip-10-0-148-144 systemd[1]: Stopping Kubernetes Kubelet...
Dec 05 00:04:22 ip-10-0-148-144 systemd[1]: Starting Reboot...
Dec 05 00:04:22 ip-10-0-148-144 systemd[1]: Shutting down.
Dec 05 00:04:22 ip-10-0-148-144 systemd-shutdown[1]: Syncing filesystems and block devices.
Dec 05 00:04:22 ip-10-0-148-144 systemd-shutdown[1]: Sending SIGTERM to remaining processes...
Dec 05 00:04:22 ip-10-0-148-144 systemd-journald[1188]: Journal stopped
Dec 05 00:05:04 localhost NetworkManager[1848]: <info> [1575504304.6470] policy: set-hostname: set hostname to 'ip-10-0-148-144' (from DHCPv4)
Dec 05 00:05:07 ip-10-0-148-144 systemd[1]: Starting Kubernetes Kubelet...
Dec 05 00:05:09 ip-10-0-148-144 hyperkube[2353]: I1205 00:05:09.313116 2353 kubelet_pods.go:1328] Generating status for "dns-default-4wmcw_openshift-dns(fa587656-1002-4986-bf72-62b90f9aa5a4)"
During shutdown, crio reports the 'dns' container stopping:
Dec 05 00:02:52 ip-10-0-148-144 systemd[1]: Stopped libcontainer container 7acdba1ed8da67ca6e63337edd566ee4667a08295cc96b49a92390babed7ae5c.
Dec 05 00:02:52 ip-10-0-148-144 systemd[1]: crio-7acdba1ed8da67ca6e63337edd566ee4667a08295cc96b49a92390babed7ae5c.scope: Consumed 642ms CPU time
After the reboot, PLEG (via Kubelet) syncs the last known status for that container, and reports that the 'dns' container exited 255 (which is suspicious, because the coredns process logs graceful shutdown in response to a TERM):
Dec 05 00:05:09 ip-10-0-148-144 hyperkube[2353]: I1205 00:05:09.352840 2353 kubelet.go:1953] SyncLoop (PLEG): "dns-default-4wmcw_openshift-dns(fa587656-1002-4986-bf72-62b90f9aa5a4)", event: &pleg.PodLifecycleEvent{ID:"fa587656-1002-4986-bf72-62b90f9aa5a4", Type:"ContainerDied", Data:"7acdba1ed8da67ca6e63337edd566ee4667a08295cc96b49a92390babed7ae5c"}
Dec 05 00:05:09 ip-10-0-148-144 hyperkube[2353]: I1205 00:05:09.517234 2353 status_manager.go:561] Patch status for pod "dns-default-4wmcw_openshift-dns(fa587656-1002-4986-bf72-62b90f9aa5a4)" with "{\"status\":{\"$setElementOrder/conditions\":[{\"type\":\"Initialized\"},{\"type\":\"Ready\"},{\"type\":\"ContainersReady\"},{\"type\":\"PodScheduled\"}],\"conditions\":[{\"message\":\"containers with unready status: [dns dns-node-resolver]\",\"reason\":\"ContainersNotReady\",\"type\":\"Ready\"},{\"lastTransitionTime\":\"2019-12-05T00:05:09Z\",\"message\":\"containers with unready status: [dns dns-node-resolver]\",\"reason\":\"ContainersNotReady\",\"status\":\"False\",\"type\":\"ContainersReady\"}],\"containerStatuses\":[{\"containerID\":\"cri-o://7acdba1ed8da67ca6e63337edd566ee4667a08295cc96b49a92390babed7ae5c\",\"image\":\"registry.svc.ci.openshift.org/ocp/4.3-2019-12-04-224546@sha256:6ae4883e50c3b307c7b6dc301057e5edd94e18318f608da3a8090985f5e53f9c\",\"imageID\":\"registry.svc.ci.openshift.org/ocp/4.3-2019-12-04-224546@sha256:6ae4883e50c3b307c7b6dc301057e5edd94e18318f608da3a8090985f5e53f9c\",\"lastState\":{},\"name\":\"dns\",\"ready\":false,\"restartCount\":0,\"started\":false,\"state\":{\"terminated\":{\"containerID\":\"cri-o://7acdba1ed8da67ca6e63337edd566ee4667a08295cc96b49a92390babed7ae5c\",\"exitCode\":255,\"finishedAt\":\"2019-12-05T00:05:06Z\",\"message\":\".:5353\\n2019-12-04T23:59:26.072Z [INFO] plugin/reload: Running configuration MD5 = acefebac40c697acb35b9e96ca3c7ec9\\n2019-12-04T23:59:26.072Z [INFO] CoreDNS-1.5.2\\n2019-12-04T23:59:26.072Z [INFO] linux/amd64, go1.12.9, \\nCoreDNS-1.5.2\\nlinux/amd64, go1.12.9, \\n[INFO] SIGTERM: Shutting down servers then terminating\\n\",\"reason\":\"Error\",\"startedAt\":\"2019-12-04T23:59:25Z\"}}},{\"containerID\":\"cri-o://0bf66b6910fc3082754ec9e0924bf4abb351fadcb7137dcd3d3e691010542444\",\"image\":\"registry.svc.ci.openshift.org/ocp/4.3-2019-12-04-224546@sha256:3d491d61aaeb329980c631ada69aacad350bfa2999cd31da92cb373e877ffcfb\",\"imageID\":\"registry.svc.ci.openshift.org/ocp/4.3-2019-12-04-224546@sha256:3d491d61aaeb329980c631ada69aacad350bfa2999cd31da92cb373e877ffcfb\",\"lastState\":{},\"name\":\"dns-node-resolver\",\"ready\":false,\"restartCount\":0,\"started\":false,\"state\":{\"terminated\":{\"containerID\":\"cri-o://0bf66b6910fc3082754ec9e0924bf4abb351fadcb7137dcd3d3e691010542444\",\"exitCode\":255,\"finishedAt\":\"2019-12-05T00:05:07Z\",\"message\":\"/bin/bash: line 1: kill: (40) - No such process\\n\",\"reason\":\"Error\",\"startedAt\":\"2019-12-04T23:59:30Z\"}}}],\"podIP\":null,\"podIPs\":null}}"
Dec 05 00:05:09 ip-10-0-148-144 hyperkube[2353]: I1205 00:05:09.517523 2353 status_manager.go:568] Status for pod "dns-default-4wmcw_openshift-dns(fa587656-1002-4986-bf72-62b90f9aa5a4)" updated successfully: (1, {Phase:Running Conditions:[{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2019-12-04 23:59:13 +0000 UTC Reason: Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2019-12-05 00:04:27 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [dns dns-node-resolver]} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2019-12-05 00:05:09 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [dns dns-node-resolver]} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2019-12-04 23:59:13 +0000 UTC Reason: Message:}] Message: Reason: NominatedNodeName: HostIP:10.0.148.144 PodIP: PodIPs:[] StartTime:2019-12-04 23:59:13 +0000 UTC InitContainerStatuses:[] ContainerStatuses:[{Name:dns State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:255,Signal:0,Reason:Error,Message:.:5353
Dec 05 00:05:09 ip-10-0-148-144 hyperkube[2353]: 2019-12-04T23:59:26.072Z [INFO] plugin/reload: Running configuration MD5 = acefebac40c697acb35b9e96ca3c7ec9
Dec 05 00:05:09 ip-10-0-148-144 hyperkube[2353]: 2019-12-04T23:59:26.072Z [INFO] CoreDNS-1.5.2
Dec 05 00:05:09 ip-10-0-148-144 hyperkube[2353]: 2019-12-04T23:59:26.072Z [INFO] linux/amd64, go1.12.9,
Dec 05 00:05:09 ip-10-0-148-144 hyperkube[2353]: CoreDNS-1.5.2
Dec 05 00:05:09 ip-10-0-148-144 hyperkube[2353]: linux/amd64, go1.12.9,
Dec 05 00:05:09 ip-10-0-148-144 hyperkube[2353]: [INFO] SIGTERM: Shutting down servers then terminating
Dec 05 00:05:09 ip-10-0-148-144 hyperkube[2353]: ,StartedAt:2019-12-04 23:59:25 +0000 UTC,FinishedAt:2019-12-05 00:05:06 +0000 UTC,ContainerID:cri-o://7acdba1ed8da67ca6e63337edd566ee4667a08295cc96b49a92390babed7ae5c,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:false RestartCount:0 Image:registry.svc.ci.openshift.org/ocp/4.3-2019-12-04-224546@sha256:6ae4883e50c3b307c7b6dc301057e5edd94e18318f608da3a8090985f5e53f9c ImageID:registry.svc.ci.openshift.org/ocp/4.3-2019-12-04-224546@sha256:6ae4883e50c3b307c7b6dc301057e5edd94e18318f608da3a8090985f5e53f9c ContainerID:cri-o://7acdba1ed8da67ca6e63337edd566ee4667a08295cc96b49a92390babed7ae5c Started:0xc00100fec0} {Name:dns-node-resolver State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:255,Signal:0,Reason:Error,Message:/bin/bash: line 1: kill: (40) - No such process
Unfortunately, because of the log level of crio, I don't have crio's log of the container status to check against PLEG's logs.
If I issue a `crictl stop` to the same kind of dns container on a new cluster with crio logging set to 'debug', I can see the last container status (in the CONTAINER_EXITED state) shows exit code 0.
Still working on a reproducer.
[1] https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/12064
> If I issue a `crictl stop` to the same kind of dns container on a new cluster with crio logging set to 'debug', I can see the last container status (in the CONTAINER_EXITED state) shows exit code 0.
Also in this case, I can see the Kubelet records ContainerDied with an exit code of 0.
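For the record, the manual check on the node looks roughly like this (a sketch; the label filter and container name are illustrative):

# find the running coredns container in the openshift-dns namespace
cid=$(crictl ps -q --name dns --label io.kubernetes.pod.namespace=openshift-dns | head -1)
# stop it with the normal grace period, then read back the recorded state
crictl stop "$cid"
crictl inspect "$cid" | grep -E 'CONTAINER_EXITED|exitCode'
# expected: state CONTAINER_EXITED with exitCode 0 when the TERM is handled gracefully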
I can reproduce this by issuing a `reboot` on the node. Containers are torn down, e.g.:
Dec 06 00:55:03 ip-10-0-130-123 systemd[1]: Stopped libcontainer container d5368d03283b3a7fb2acc7a1b7ef01ca75da4c0beaeaadeb91a6145942ec37ef.
Dec 06 00:55:03 ip-10-0-130-123 systemd[1]: crio-d5368d03283b3a7fb2acc7a1b7ef01ca75da4c0beaeaadeb91a6145942ec37ef.scope: Consumed 21.833s CPU time
Dec 06 00:55:04 ip-10-0-130-123 systemd[1]: crio-conmon-d5368d03283b3a7fb2acc7a1b7ef01ca75da4c0beaeaadeb91a6145942ec37ef.scope: Consumed 49ms CPU time
crio observes the containers exiting:
Dec 06 00:55:04 ip-10-0-130-123 crio[2414276]: time="2019-12-06 00:55:04.563771362Z" level=debug msg="event: \"/var/run/crio/exits/d5368d03283b3a7fb2acc7a1b7ef01ca75da4c0beaeaadeb91a6145942ec37ef.PEE2B0\": RENAME" file="server/server.go:687"
Dec 06 00:55:04 ip-10-0-130-123 crio[2414276]: time="2019-12-06 00:55:04.563871375Z" level=debug msg="event: \"/var/run/crio/exits/d5368d03283b3a7fb2acc7a1b7ef01ca75da4c0beaeaadeb91a6145942ec37ef\": CREATE" file="server/server.go:687"
Dec 06 00:55:04 ip-10-0-130-123 crio[2414276]: time="2019-12-06 00:55:04.563905509Z" level=debug msg="container or sandbox exited: d5368d03283b3a7fb2acc7a1b7ef01ca75da4c0beaeaadeb91a6145942ec37ef" file="server/server.go:690"
Dec 06 00:55:04 ip-10-0-130-123 crio[2414276]: time="2019-12-06 00:55:04.563935623Z" level=debug msg="container exited and found: d5368d03283b3a7fb2acc7a1b7ef01ca75da4c0beaeaadeb91a6145942ec37ef" file="server/server.go:693"
I didn't catch the exit files, which I suspect will have the state I want. Interesting that there's no logging about crio itself stopping the container. My understanding is that systemd is signaling the container PIDs directly.
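If anyone wants to catch those exit files, they should be visible under /var/run/crio/exits while the containers are going down; each file is named after the container ID and, as I understand it, contains the numeric exit status. A rough sketch (the timing during a reboot is the hard part):

# watch the exits directory while the containers are being torn down
ls -l /var/run/crio/exits/
cat /var/run/crio/exits/d5368d03283b3a7fb2acc7a1b7ef01ca75da4c0beaeaadeb91a6145942ec37ef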
Upon reboot, crio reports a 255 status for the container's last state (but no message...):
Dec 06 00:55:55 ip-10-0-130-123 crio[1527]: time="2019-12-06 00:55:55.394577936Z" level=debug msg="request: &ContainerStatusRequest{ContainerId:d5368d03283b3a7fb2acc7a1b7ef01ca75da4c0beaeaadeb91a6145942ec37ef,Verbose:false,}" file="v1alpha2/api.pb.go:7798" id=fa272cc6-2dca-4c9c-9283-88a669d9c817
Dec 06 00:55:55 ip-10-0-130-123 crio[1527]: time="2019-12-06 00:55:55.394710442Z" level=debug msg="response: &ContainerStatusResponse{Status:&ContainerStatus{Id:d5368d03283b3a7fb2acc7a1b7ef01ca75da4c0beaeaadeb91a6145942ec37ef,Metadata:&ContainerMetadata{Name:dns,Attempt:1,},State:CONTAINER_EXITED,CreatedAt:1575585108224674534,StartedAt:1575585108278984847,FinishedAt:1575593752845524914,ExitCode:255,Image:&ImageSpec{Image:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:234cb579f88f53781e547f202152eb273cf569073818fdeb4ba7cd7086071065,},ImageRef:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:234cb579f88f53781e547f202152eb273cf569073818fdeb4ba7cd7086071065,Reason:Error,Message:,Labels:map[string]string{io.kubernetes.container.name: dns,io.kubernetes.pod.name: dns-default-d8dnh,io.kubernetes.pod.namespace: openshift-dns,io.kubernetes.pod.uid: 9ea08485-9593-40a6-90a7-c5d37ccdda0d,},Annotations:map[string]string{io.kubernetes.container.hash: 4d61f4f6,io.kubernetes.container.ports: [{\"name\":\"dns\",\"containerPort\":5353,\"protocol\":\"UDP\"},{\"name\":\"dns-tcp\",\"containerPort\":5353,\"protocol\":\"TCP\"},{\"name\":\"metrics\",\"containerPort\":9153,\"protocol\":\"TCP\"}],io.kubernetes.container.restartCount: 1,io.kubernetes.container.terminationMessagePath: /dev/termination-log,io.kubernetes.container.terminationMessagePolicy: FallbackToLogsOnError,io.kubernetes.pod.terminationGracePeriod: 30,},Mounts:[]*Mount{},LogPath:/var/log/pods/openshift-dns_dns-default-d8dnh_9ea08485-9593-40a6-90a7-c5d37ccdda0d/dns/1.log,},Info:map[string]string{},}" file="v1alpha2/api.pb.go:7798" id=fa272cc6-2dca-4c9c-9283-88a669d9c817
The kubelet posts a status update for the last container state, propagating the 255 (and now the message absent from the response log is back):
Dec 06 00:55:56 ip-10-0-130-123 hyperkube[1824]: I1206 00:55:56.175479 1824 status_manager.go:561] Patch status for pod "dns-default-d8dnh_openshift-dns(9ea08485-9593-40a6-90a7-c5d37ccdda0d)" with "{\"status\":{\"$setElementOrder/conditions\":[{\"type\":\"Initialized\"},{\"type\":\"Ready\"},{\"type\":\"ContainersReady\"},{\"type\":\"PodScheduled\"}],\"conditions\":[{\"message\":\"containers with unready status: [dns dns-node-resolver]\",\"reason\":\"ContainersNotReady\",\"type\":\"Ready\"},{\"lastTransitionTime\":\"2019-12-06T00:55:55Z\",\"message\":\"containers with unready status: [dns dns-node-resolver]\",\"reason\":\"ContainersNotReady\",\"status\":\"False\",\"type\":\"ContainersReady\"}],\"containerStatuses\":[{\"containerID\":\"cri-o://d5368d03283b3a7fb2acc7a1b7ef01ca75da4c0beaeaadeb91a6145942ec37ef\",\"image\":\"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:234cb579f88f53781e547f202152eb273cf569073818fdeb4ba7cd7086071065\",\"imageID\":\"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:234cb579f88f53781e547f202152eb273cf569073818fdeb4ba7cd7086071065\",\"lastState\":{},\"name\":\"dns\",\"ready\":false,\"restartCount\":1,\"started\":false,\"state\":{\"terminated\":{\"containerID\":\"cri-o://d5368d03283b3a7fb2acc7a1b7ef01ca75da4c0beaeaadeb91a6145942ec37ef\",\"exitCode\":255,\"finishedAt\":\"2019-12-06T00:55:52Z\",\"message\":\".:5353\\n2019-12-05T22:31:48.396Z [INFO] plugin/reload: Running configuration MD5 = acefebac40c697acb35b9e96ca3c7ec9\\n2019-12-05T22:31:48.396Z [INFO] CoreDNS-1.5.2\\n2019-12-05T22:31:48.396Z [INFO] linux/amd64, go1.12.12, \\nCoreDNS-1.5.2\\nlinux/amd64, go1.12.12, \\n[INFO] SIGTERM: Shutting down servers then terminating\\n\",\"reason\":\"Error\",\"startedAt\":\"2019-12-05T22:31:48Z\"}}},{\"containerID\":\"cri-o://6c8bd0b7b2a37c4dc2aea2d1dbfa5ce6164920b6fd729d5a0dceadb90c1d0a3f\",\"image\":\"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1c51252d457f4975303dee5edcb55564cd5d9304b98b965288cb0ed94eb0ee8a\",\"imageID\":\"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1c51252d457f4975303dee5edcb55564cd5d9304b98b965288cb0ed94eb0ee8a\",\"lastState\":{},\"name\":\"dns-node-resolver\",\"ready\":false,\"restartCount\":0,\"started\":false,\"state\":{\"terminated\":{\"containerID\":\"cri-o://6c8bd0b7b2a37c4dc2aea2d1dbfa5ce6164920b6fd729d5a0dceadb90c1d0a3f\",\"exitCode\":255,\"finishedAt\":\"2019-12-06T00:55:52Z\",\"message\":\"kill: sending signal to 2454 failed: No such process\\n\",\"reason\":\"Error\",\"startedAt\":\"2019-12-05T19:49:54Z\"}}}],\"podIP\":null,\"podIPs\":null}}"
If I TERM the container pid directly, I get an exit code 0 from crictl inspect immediately after, and kubelet also receives the 0 when updating status upon noticing container death. If I KILL the pid, I don't get the coredns graceful shutdown log messages and the return code is 137 (so it's not clear whether Systemd is issuing a KILL on shutdown to these processes).
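For completeness, the signal comparison above looks roughly like this (a sketch; the pid comes from crictl's verbose info section, and jq is assumed to be available on the node):

cid=$(crictl ps -q --name dns --label io.kubernetes.pod.namespace=openshift-dns | head -1)
pid=$(crictl inspect "$cid" | jq -r .info.pid)
kill -TERM "$pid"    # coredns logs a graceful shutdown; recorded exit code is 0
# versus, against a fresh container:
kill -KILL "$pid"    # no graceful shutdown logs; recorded exit code is 137 (128+9)
crictl inspect "$cid" | grep exitCode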
> I didn't catch the exit files, which I suspect will have the state I want. Interesting that there's no logging about crio itself stopping the container. My understanding is that systemd is signaling the container PIDs directly.
I bet this is:
* systemd TERMs everyone.
* conmon and the container processes race each other to the grave.
* When conmon dies first, or maybe in all cases, since the TERM is likely to land before conmon gets much done, conmon doesn't fully reap and report the container process's exit code.
We should have conmon catch TERM and wait an appropriate amount of time to see if it can gracefully reap the container process. Or maybe, because everyone's about to die anyway, we don't care? If we do care, we should tell the kubelet / CRI-O to gracefully terminate containers before we tell systemd to stop everything (which was, I thought, what the machine-config daemon did when rebooting?).
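A sketch of that "let CRI-O stop things first" ordering, run before the reboot (illustrative only, not a claim about what the machine-config daemon actually does):

# ask CRI-O to stop every running container with a grace period, so conmon
# can reap them and write exit files before systemd starts killing processes
crictl stop --timeout 30 $(crictl ps -q)
# then stop the pod sandboxes and reboot
crictl stopp $(crictl pods -q)
systemctl reboot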
conmon sets on_sig_exit for TERM [1]. I'm not familiar enough with C/conmon to understand what on_sig_exit is doing [2] ;).

[1]: https://github.com/containers/conmon/blob/df8c6aa516b22ca8fd28903024a90d9093d383e5/src/conmon.c#L1572
[2]: https://github.com/containers/conmon/blob/df8c6aa516b22ca8fd28903024a90d9093d383e5/src/conmon.c#L388-L404

Maybe it's just forwarding the signal to possible children. Then the raise(SIGUSR1) pops us over to on_sigusr1_cb [1], which runs check_child_processes [2]. Which... checks some stuff [3]? Probably loops as long as non-blocking waitpid calls reap stuff. Then on_sigusr1_cb returns G_SOURCE_CONTINUE. Not sure what that's about yet. But still, I don't see anything under on_sig_exit that looks like "and now conmon gracefully shuts down". So now I'm less convinced about my theory than previously ;).

[1]: https://github.com/containers/conmon/blob/df8c6aa516b22ca8fd28903024a90d9093d383e5/src/conmon.c#L1588
[2]: https://github.com/containers/conmon/blob/df8c6aa516b22ca8fd28903024a90d9093d383e5/src/conmon.c#L446
[3]: https://github.com/containers/conmon/blob/df8c6aa516b22ca8fd28903024a90d9093d383e5/src/conmon.c#L408-L441

With Mrunal and Trevor's help, I've confirmed the issue affects DaemonSets generally. Some example DaemonSets that are scheduled across nodes by default and are easy to test:

* openshift/dns
* openshift-monitoring/node-exporter
* openshift-image-registry/node-ca

Find the container ID for any of them, and issue a `reboot` on the node. Looking at the tail of the previous boot's journal, systemd reports signaling these containers for shutdown, but nothing in the journal indicates that an exit file is created before the shutdown finishes. Looking at the head of the current boot's journal (following the reboot), crio reports the status code for those containers as 255 (which Mrunal has suggested could indicate the absence of an exit file with specific exit details). Oddly, the stdout from the container shutdowns is somehow preserved. The final outcome is that crio tells the Kubelet the last container state was exit=255 and message=<stdout indicating graceful shutdown in response to TERM>.

So far, the absence of exit state files for the DaemonSet containers is the significant mystery. DaemonSet container handling generally seems to differ from that of other containers, so it's not yet clear to me what the lifecycle expectations are, or how the shutdown sequence is deferred pending conmon writing exit files so the Kubelet can pick up status, etc.

Spoke with Mrunal and we agreed this should move over to the Node component. One impact to note: owners of these DaemonSets are implicated as having broken signal handling, when it seems that their signal handling is correct and that the reporting or lifecycling of the containers is wrong. If there are any other bug reports along these lines, we can mark them as dupes of this bug.

Here's a self-contained reproducer.
1. Create a DaemonSet that gets scheduled everywhere:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: reboot-test
  namespace: default
spec:
  selector:
    matchLabels:
      app: reboot-test
  template:
    metadata:
      labels:
        app: reboot-test
    spec:
      containers:
      - name: test
        image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1c51252d457f4975303dee5edcb55564cd5d9304b98b965288cb0ed94eb0ee8a
        imagePullPolicy: IfNotPresent
        command:
        - /bin/bash
        - -c
        - |
          #!/bin/bash
          set -uo pipefail
          trap 'echo gracefully shutting down; exit 0' TERM
          while true; do
            sleep 1
          done
      nodeSelector:
        kubernetes.io/os: linux
      restartPolicy: Always
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
2. On a node, `kill -TERM` the container pid or `crictl stop` the container to convince yourself that the process exits 0 on TERM and that the exit code propagates correctly through the Kubelet (commands sketched after this list).
3. Issue `reboot`.
4. See the same problematic behavior (systemd kills the container on shutdown, no exit file produced, crio reports exit=255 as the container status after reboot).
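A rough command sequence for steps 2-4 (assuming the reboot-test pod landed on the node being debugged; the filters are illustrative):

# step 2: confirm clean TERM handling while the node is healthy
cid=$(crictl ps -q --name test --label io.kubernetes.pod.namespace=default | head -1)
crictl stop "$cid"
crictl inspect "$cid" | grep exitCode   # 0, and the container log shows "gracefully shutting down"
# step 3: reboot the node
systemctl reboot
# step 4: after the node comes back, the kubelet's status patch for the previous
# container reports exit=255 even though the workload shut down cleanly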
Opened PR with fixes https://github.com/cri-o/cri-o/pull/3041
Verified the fix on:
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.3.0-0.nightly-2019-12-30-021042 True False 90m Cluster version is 4.3.0-0.nightly-2019-12-30-021042
1. Create the same reboot-test DaemonSet as in the reproducer above.
2. On a node (via `oc debug node/<node name>`), `crictl stop` the container.
3. Issue `reboot` on the node.
4. Check the journal after the reboot; the container status no longer reports exit code 255 (see the sketch below).
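For the journal check in step 4, something along these lines works (a sketch; grepping the kubelet unit is just one way to spot the propagated exit codes):

# on the rebooted node, look at the kubelet status patches for the test pod
journalctl -u kubelet --no-pager | grep reboot-test | grep exitCode
# with the fix, the terminated states carry exit code 0 rather than 255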
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0062