The quorum guard pod doesn't respond to TERM: sleep runs as PID 1 and doesn't register a signal handler for TERM, so the signal is never delivered to it. As a result the pod takes 30s to shut down. The fix will need to be backported to 4.1.z.
Does https://github.com/openshift/machine-config-operator/pull/789 address this?
erich -- yes, the pull request referenced does address this. How should I handle this bug (close it, POST, whatnot)?
After deleting the etcd quorum guard pod, it restarts within a few seconds. Sending a TERM signal to the PID of the etcd quorum guard container from the node also kills the pod, and it restarts in around 3-5 seconds.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-0.nightly-2019-06-27-030910   True        False         21h     Cluster version is 4.1.0-0.nightly-2019-06-27-030910

NAME                                 READY   STATUS    RESTARTS   AGE
etcd-quorum-guard-7f577fc654-dc8gk   1/1     Running   0          16s
etcd-quorum-guard-7f577fc654-p58p4   1/1     Running   0          22h
etcd-quorum-guard-7f577fc654-tgg2j   1/1     Running   1          22h

$ oc describe pod etcd-quorum-guard-7f577fc654-g8dw5
...
Containers:
  guard:
    Container ID:  cri-o://259a10908a30b07400098b21b29382727a3f33750de8f00536918272cbc17fb2
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9a3e0f24b20754f73c9f2a939ff16aebff879d4c74e82faccb56230a1274cac9
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9a3e0f24b20754f73c9f2a939ff16aebff879d4c74e82faccb56230a1274cac9
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/bash
    Args:
      -c
      # properly handle TERM and exit as soon as it is signaled
      set -euo pipefail
      trap 'jobs -p | xargs -r kill; exit 0' TERM
      sleep infinity & wait
    State:          Running
      Started:      Fri, 28 Jun 2019 16:36:02 +0530
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 27 Jun 2019 19:47:03 +0530
      Finished:     Fri, 28 Jun 2019 16:36:01 +0530
    Ready:          True
    Restart Count:  1
    Requests:
      cpu:        10m
      memory:     5Mi
...
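For anyone who wants to check the trap pattern from the Args outside a cluster, here is a minimal local sketch. It is not taken from the pull request: guard_demo is a hypothetical helper name, and a finite `sleep 600` stands in for `sleep infinity` so the example does not depend on GNU sleep. It starts the script in a child bash, sends TERM, and prints the child's exit status. It does not reproduce the PID 1 behaviour itself, only the handler.

```shell
#!/usr/bin/env bash
# Sketch only: guard_demo is a hypothetical helper, not part of the operator.
guard_demo() {
  # Same structure as the guard container script: install a TERM trap,
  # put sleep in the background, and block in `wait` so that the trap can
  # run as soon as the signal arrives.
  bash -c '
    set -euo pipefail
    trap "jobs -p | xargs -r kill; exit 0" TERM
    sleep 600 & wait   # stand-in for "sleep infinity" (assumption)
  ' &
  local pid=$!

  sleep 1              # give the child time to install its trap
  kill -TERM "$pid"    # what the kubelet sends first on pod shutdown
  wait "$pid"          # collect the exit status of the child bash
  echo "exit status: $?"
}
```

Calling `guard_demo` should return about a second after it starts rather than waiting for the sleep to finish, and the reported status corresponds to the `exit 0` in the trap, which matches the `Exit Code: 0` under Last State above.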
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:1635