
Bug 1712507

Summary:	etcdquorumguard should handle TERM correctly and shut down gracefully
Product:	OpenShift Container Platform
Component:	Etcd
Version:	4.1.0
Status:	CLOSED ERRATA
Severity:	high
Priority:	unspecified
Reporter:	Clayton Coleman <ccoleman>
Assignee:	Robert Krawitz <rkrawitz>
QA Contact:	Sunil Choudhary <schoudha>
Docs Contact:
CC:	erich, gblomqui, xtian
Flags:	erich: needinfo-
Target Milestone:	---
Target Release:	4.1.z
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:	4.1.4
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:
Story Points:	---
Clone Of:
Environment:
Last Closed:	2019-07-04 09:01:22 UTC
Type:	Bug
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Category:	---
oVirt Team:	---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---
Target Upstream Version:
Embargoed:

Description Clayton Coleman 2019-05-21 16:12:32 UTC

The quorum guard pod doesn't respond to TERM: sleep runs as PID 1 and doesn't register a signal handler for TERM, so the signal is never delivered to it. As a result the pod takes 30s to shut down.

This will need to be backported to 4.1.z.
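
The behavior described above is hard to reproduce without a container, because it depends on sleep being PID 1 in its own PID namespace. As a rough approximation only, the sketch below ignores TERM before exec'ing sleep, which makes an ordinary process equally deaf to the signal. The variable names and the 30-second sleep are illustrative and are not taken from the quorum guard itself.

```shell
#!/usr/bin/env bash
# Illustrative only: approximate "PID 1 with no TERM handler" outside a
# container by ignoring TERM before exec'ing sleep. An ignored signal
# disposition is inherited across exec, so sleep never reacts to TERM.

bash -c 'trap "" TERM; exec sleep 30' &
sleeper_pid=$!

sleep 1                       # let the child reach exec
kill -TERM "$sleeper_pid"
sleep 1                       # a normal sleep would have exited by now

if kill -0 "$sleeper_pid" 2>/dev/null; then
  survived_term=yes           # TERM had no effect, as in the report
else
  survived_term=no
fi
echo "process survived TERM: $survived_term"

# Clean up with KILL, which is what ends a container that is still
# running when its termination grace period expires.
kill -KILL "$sleeper_pid"
wait "$sleeper_pid" 2>/dev/null || true
```

In a real pod the same effect comes from the kernel rather than from an explicit `trap "" TERM`: a namespace's PID 1 only receives signals it has installed a handler for, apart from KILL and STOP sent from outside the namespace.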

Comment 1 Eric Rich 2019-05-28 13:12:37 UTC

Does https://github.com/openshift/machine-config-operator/pull/789 address this?

Comment 2 Robert Krawitz 2019-06-18 15:02:56 UTC

erich -- yes, the pull request referenced does address this.  How should I handle this bug (close it, POST, whatnot)?

Comment 4 Sunil Choudhary 2019-06-28 12:35:15 UTC

After deleting an etcd quorum guard pod, it restarts within a few seconds.
Sending a TERM signal from the node to the PID of the etcd quorum guard container also kills the pod, and it restarts in around 3-5 seconds.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-0.nightly-2019-06-27-030910   True        False         21h     Cluster version is 4.1.0-0.nightly-2019-06-27-030910

NAME                                         READY   STATUS    RESTARTS   AGE
etcd-quorum-guard-7f577fc654-dc8gk           1/1     Running   0          16s
etcd-quorum-guard-7f577fc654-p58p4           1/1     Running   0          22h
etcd-quorum-guard-7f577fc654-tgg2j           1/1     Running   1          22h


$ oc describe pod etcd-quorum-guard-7f577fc654-g8dw5
...
Containers:
  guard:
    Container ID:  cri-o://259a10908a30b07400098b21b29382727a3f33750de8f00536918272cbc17fb2
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9a3e0f24b20754f73c9f2a939ff16aebff879d4c74e82faccb56230a1274cac9
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9a3e0f24b20754f73c9f2a939ff16aebff879d4c74e82faccb56230a1274cac9
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/bash
    Args:
      -c
      # properly handle TERM and exit as soon as it is signaled
      set -euo pipefail
      trap 'jobs -p | xargs -r kill; exit 0' TERM
      sleep infinity & wait
      
    State:          Running
      Started:      Fri, 28 Jun 2019 16:36:02 +0530
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 27 Jun 2019 19:47:03 +0530
      Finished:     Fri, 28 Jun 2019 16:36:01 +0530
    Ready:          True
    Restart Count:  1
    Requests:
      cpu:      10m
      memory:   5Mi
...
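
The Args shown above appear to rely on two bash behaviors: a trap runs only after the current foreground command finishes, so a foreground sleep would delay it, whereas the `wait` builtin returns as soon as a trapped signal arrives. The following is a minimal sketch for trying the same pattern outside a cluster, assuming bash plus GNU coreutils and findutils (`sleep infinity`, `xargs -r`). The `guard_script` variable and the one-second delay are illustrative and not part of the operator.

```shell
#!/usr/bin/env bash
# Illustrative only: run the guard's command locally, send TERM, and
# record the exit status of the guard process.

# Same script body as the container Args above.
guard_script='
set -euo pipefail
trap "jobs -p | xargs -r kill; exit 0" TERM
sleep infinity & wait
'

bash -c "$guard_script" &
guard_pid=$!

sleep 1                      # give the guard time to install its trap
kill -TERM "$guard_pid"

guard_status=0
wait "$guard_pid" || guard_status=$?
echo "guard exited with status $guard_status"
```

If the trap fires as intended, the guard should exit almost immediately with status 0, matching the `Exit Code: 0` recorded under Last State in the pod description.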

Comment 6 errata-xmlrpc 2019-07-04 09:01:22 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1635