+++ This bug was initially created as a clone of Bug #1710502 +++

When a container is terminated by the kubelet (graceful, with TERM), all containers that are "RestartAlways" must behave a certain way:

1. Exit promptly (gracefully shutting down)
2. Return exit code 0 if no error was encountered during graceful shutdown

From an e2e run this was reported:

May 14 21:19:13.355 E ns/openshift-image-registry pod/image-registry-8fd5866b-nn6gr node/ip-10-0-135-167.ec2.internal container=registry container exited with code 137 (Error):
May 14 21:19:13.396 E ns/openshift-image-registry pod/node-ca-sw986 node/ip-10-0-135-167.ec2.internal container=node-ca container exited with code 137 (Error):
May 14 21:58:52.847 E ns/openshift-image-registry pod/node-ca-42h5w node/ip-10-0-135-167.ec2.internal container=node-ca container exited with code 137 (Error):

when the pods were evicted off the node.

Two separate problems:

1. node-ca needs to follow the "handle TERM gracefully" pattern for bash in a container:

```
trap 'jobs -p | xargs -r kill; exit 0' TERM
```

at the top of the job, with `sleep 60 & wait` being used (which allows bash to interrupt the sleep when the pod is terminated). This is sufficient for node-ca to satisfy the requirements.

2. image-registry must return exit code 0, and SHOULD perform some level of graceful shutdown (this is more of a card, I can accept a card being spawned and prioritized separately for graceful, but the exit code must be fixed).

This can be fixed in the 4.1.z release, not GA blocking.
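For reference, a minimal sketch of how the trap and `sleep 60 & wait` fit together in a loop. This is not the actual node-ca script: `sync_certs` is a hypothetical placeholder for the real work, and the harness at the bottom (start the loop, send TERM, read the exit status) is only an assumption about how one might check the behavior locally. Exit code 137 is 128 + 9, which means the container ignored TERM and was killed with SIGKILL once the grace period ran out; the trap is what lets the loop exit with 0 instead.

```shell
#!/bin/bash
# Hypothetical placeholder for the periodic work done by the container.
sync_certs() { :; }

main_loop() {
    # On TERM: kill background jobs (the pending sleep), then exit cleanly.
    trap 'jobs -p | xargs -r kill; exit 0' TERM
    while true; do
        sync_certs
        # Run sleep in the background and block in `wait`, so that bash can
        # run the TERM trap immediately instead of after sleep finishes.
        sleep 60 & wait
    done
}

# Local check (an assumption, not part of the bug report): run the loop in
# the background, deliver TERM as the kubelet would, and collect the status.
main_loop &
loop_pid=$!
sleep 1                      # give the loop time to install its trap
kill -TERM "$loop_pid"
wait "$loop_pid"
status=$?
echo "loop exited with status $status"
```

A plain foreground `sleep 60` would delay the trap until the sleep returned, which is why the background-plus-`wait` form matters here.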
Verified in 4.1.17:

$ oc delete pods/image-registry-7bd9f684b9-chg56
time="2019-09-23T03:25:40.657038016Z" level=info msg="shutting down image registry server" go.version=go1.10.8
time="2019-09-23T03:25:40.65735496Z" level=info msg="server shutdown, bye." go.version=go1.10.8
rpc error: code = Unknown desc = container with ID starting with c5079dfe8bf8480eb2f9430619c8bb088bb765bb0a0a43e11c4d8d1dee487e98 not found: ID does not exist
$ echo $?
0

$ oc delete pods/node-ca-97ltl
image-registry.openshift-image-registry.svc:5000
rpc error: code = Unknown desc = container with ID starting with 3d1225d8c692e7e19b64c741d3b42020a5c5dd1aa9556f820f93543d1f4e1770 not found: ID does not exist
$ echo $?
0
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2856