Bug 1712507
| Summary: | etcdquorumguard should handle TERM correctly and shut down gracefully | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Clayton Coleman <ccoleman> |
| Component: | Etcd | Assignee: | Robert Krawitz <rkrawitz> |
| Status: | CLOSED ERRATA | QA Contact: | Sunil Choudhary <schoudha> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.1.0 | CC: | erich, gblomqui, xtian |
| Target Milestone: | --- | Flags: | erich: needinfo- |
| Target Release: | 4.1.z | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | 4.1.4 | ||
| Fixed In Version: | Doc Type: | No Doc Update | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2019-07-04 09:01:22 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Clayton Coleman
2019-05-21 16:12:32 UTC
Does https://github.com/openshift/machine-config-operator/pull/789 address this?
erich -- yes, the pull request referenced does address this. How should I handle this bug (close it, POST, whatnot)?
After deleting the etcd quorum guard pod, it restarts within a few seconds.
Sending a TERM signal to the PID of the etcd quorum guard container from the node also kills the pod, and it restarts in around 3-5 seconds.
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-0.nightly-2019-06-27-030910   True        False         21h     Cluster version is 4.1.0-0.nightly-2019-06-27-030910
NAME                                 READY   STATUS    RESTARTS   AGE
etcd-quorum-guard-7f577fc654-dc8gk   1/1     Running   0          16s
etcd-quorum-guard-7f577fc654-p58p4   1/1     Running   0          22h
etcd-quorum-guard-7f577fc654-tgg2j   1/1     Running   1          22h
$ oc describe pod etcd-quorum-guard-7f577fc654-g8dw5
...
Containers:
  guard:
    Container ID:  cri-o://259a10908a30b07400098b21b29382727a3f33750de8f00536918272cbc17fb2
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9a3e0f24b20754f73c9f2a939ff16aebff879d4c74e82faccb56230a1274cac9
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9a3e0f24b20754f73c9f2a939ff16aebff879d4c74e82faccb56230a1274cac9
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/bash
    Args:
      -c
      # properly handle TERM and exit as soon as it is signaled
      set -euo pipefail
      trap 'jobs -p | xargs -r kill; exit 0' TERM
      sleep infinity & wait
    State:          Running
      Started:      Fri, 28 Jun 2019 16:36:02 +0530
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 27 Jun 2019 19:47:03 +0530
      Finished:     Fri, 28 Jun 2019 16:36:01 +0530
    Ready:          True
    Restart Count:  1
    Requests:
      cpu:     10m
      memory:  5Mi
...
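The TERM handling shown in the container args above can be tried outside a cluster. The following is a minimal sketch, not part of the fix itself: it assumes GNU `sleep` and `xargs`, runs the same three-line script in a child `bash`, sends it TERM, and records the exit status. The names `guard_script`, `guard_pid`, and `status` are illustrative only.

```shell
#!/bin/bash
# Sketch: reproduce the guard container's trap logic locally.
# guard_script, guard_pid and status are hypothetical names for this example.

# Same script body as in the pod's Args (trap body quoted with double quotes
# so it can sit inside a single-quoted string).
guard_script='
set -euo pipefail
trap "jobs -p | xargs -r kill; exit 0" TERM
sleep infinity & wait
'

# Start the guard in the background and remember its PID.
bash -c "$guard_script" &
guard_pid=$!

# Give the child a moment to install its trap before signalling it.
sleep 1
kill -TERM "$guard_pid"

# Collect the child's exit status without aborting this script on non-zero.
status=0
wait "$guard_pid" || status=$?
echo "guard exited with status $status"
```

If the trap works as intended, the child should return promptly with status 0, which matches the `Reason: Completed` and `Exit Code: 0` recorded under `Last State` in the output above. Without the trap, the shell would be terminated by TERM directly and report a non-zero status instead.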
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2019:1635