Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1633651

Summary:	master-restart returns unexplained exit code
Product:	OpenShift Container Platform
Component:	kube-apiserver
Version:	3.10.0
Target Release:	3.10.z
Status:	CLOSED DEFERRED
Severity:	medium
Priority:	medium
Reporter:	jolee
Assignee:	Stefan Schimanski <sttts>
QA Contact:	Xingxing Xia <xxia>
CC:	aos-bugs, arghosh, dmoessne, fabio.martinelli, jokerman, mfojtik, mmccomas
Hardware:	Unspecified
OS:	Unspecified
Type:	Bug
Last Closed:	2019-11-20 18:58:13 UTC

Comment 2 jolee 2018-09-27 15:11:41 UTC

Note: observed on a local 3.10 installation.

https://github.com/openshift/openshift-ansible/blob/release-3.10/roles/openshift_control_plane/files/scripts/docker/master-restart#L10

I do not see line 10, `types=( "atomic-openshift" "origin" )`, in my local 3.10 installation.

I am unsure whether I am referencing the wrong tag or whether there is an issue with the 3.10 install/update.

Comment 3 jolee 2018-09-28 14:43:58 UTC

I'm raising the priority and severity because, as I understand it, fixes and diagnostic changes may end up being made and vetted against services that were not actually restarted. This could cause a lot of wasted effort when troubleshooting.

Is there context (for example, existing documentation) that I'm missing about cases where master-restart is not applicable?

Comment 5 jolee 2018-09-28 16:25:13 UTC

My initial efforts to capture and file this BZ were misleading (wrong). I have renamed the bug and hope this comment explains the concern.

`master-restart api` returns a "2" after a clean install [1].

In this scenario I would expect the "2" to be the result of the `docker wait` command [2], which I *believe* is failing because the parent container has already been killed [3].

Is the logic at https://github.com/openshift/openshift-ansible/blob/release-3.10/roles/openshift_control_plane/files/scripts/docker/master-restart#L33-L37 behaving as intended, and can/should the script be adjusted to echo recommendations for results other than "0" or "1"?
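
To illustrate the kind of adjustment being asked about, here is a minimal sketch of a helper that turns the number printed by `docker wait` into a short message. The function name `explain_wait_result` and the wording of every message are assumptions made for illustration; nothing like this exists in the shipped master-restart script.

```shell
# Hypothetical helper; not part of the shipped master-restart script.
# Argument: the exit code that `docker wait` printed for the container.
explain_wait_result() {
  local code="${1:-}"
  case "$code" in
    0|1)
      echo "container exited with code $code"
      ;;
    '')
      echo "no exit code was printed; the wait may have failed or timed out"
      ;;
    *[!0-9]*)
      echo "unexpected output instead of an exit code: $code"
      ;;
    *)
      # Hint text is an assumption, chosen only for illustration.
      echo "container exited with code $code; check the container logs and the node journal for the cause"
      ;;
  esac
}
```

The trace in [1] shows the script finishing with `exec timeout 60 docker -l debug wait ...`. Because `exec` replaces the shell, nothing can run after that line, so a helper like this would only be reachable if the output were captured instead, for example `explain_wait_result "$(timeout 60 docker wait "$child_container")"`.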


[1]
[root@master-0 bin]# bash -x master-restart api
+ set -euo pipefail
+ [[ -z api ]]
+ types=("atomic-openshift" "origin")
+ for type in '"${types[@]}"'
+ systemctl cat atomic-openshift-master-api.service
+ for type in '"${types[@]}"'
+ systemctl cat origin-master-api.service
++ docker ps -l -q --filter label=io.kubernetes.container.name=api
+ child_container=2e9d869f7dfc
++ docker ps -l -q --filter label=openshift.io/component=api --filter label=io.kubernetes.container.name=POD
+ container=5f13cd7bc77d
+ [[ -z 5f13cd7bc77d ]]
+ docker stop 5f13cd7bc77d --time 30
+ [[ -z 2e9d869f7dfc ]]
+ exec timeout 60 docker -l debug wait 2e9d869f7dfc
2

[2]
NAME
       docker-wait - Block until one or more containers stop, then print their exit codes

SYNOPSIS
       docker wait [--help] CONTAINER [CONTAINER...]

DESCRIPTION
       Block until one or more containers stop, then print their exit codes.
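
One consequence of the description above is that the number printed on stdout is the container's exit code, which is separate from the exit status of the `docker wait` command itself. It is also separate from the status of `timeout`, which GNU coreutils documents as exiting with 124 when the time limit is reached. The sketch below keeps the three apart. `report_wait` is a hypothetical name, and the command to run is passed in as arguments so the logic can be tried without Docker.

```shell
# Sketch only. Usage: report_wait SECONDS COMMAND [ARGS...]
# Runs COMMAND under `timeout` and reports whether the command itself
# succeeded, separately from whatever exit code it printed.
report_wait() {
  local limit="$1"
  shift
  local printed status
  # The `if` form keeps a non-zero status from aborting under `set -e`.
  if printed="$(timeout "$limit" "$@")"; then
    status=0
  else
    status=$?
  fi
  if [ "$status" -eq 124 ]; then
    echo "wait command timed out after ${limit}s"
  elif [ "$status" -ne 0 ]; then
    echo "wait command itself failed with status $status"
  else
    echo "wait command succeeded; container exit code: $printed"
  fi
}
```

Against a real container this would be called as `report_wait 60 docker wait "$child_container"`, where `child_container` is the variable seen in the trace in [1].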



[3]
[root@master-0 bin]# journalctl --no-pager --unit atomic-openshift-node --since "1 minutes ago"
Sep 28 12:04:50 master-0.threeten.lab.rdu2.cee.redhat.com atomic-openshift-node[16211]: I0928 12:04:50.042144   16211 kubelet.go:1914] SyncLoop (PLEG): "master-api-master-0.threeten.lab.rdu2.cee.redhat.com_kube-system(49971b2bc841377f9081ba392d37185b)", event: &pleg.PodLifecycleEvent{ID:"49971b2bc841377f9081ba392d37185b", Type:"ContainerDied", Data:"5f13cd7bc77db90aebbd2d04d7cb4b74808e10f861261d1333412eec441d0f06"}
Sep 28 12:04:50 master-0.threeten.lab.rdu2.cee.redhat.com atomic-openshift-node[16211]: W0928 12:04:50.044537   16211 pod_container_deletor.go:77] Container "5f13cd7bc77db90aebbd2d04d7cb4b74808e10f861261d1333412eec441d0f06" not found in pod's containers
Sep 28 12:04:50 master-0.threeten.lab.rdu2.cee.redhat.com atomic-openshift-node[16211]: I0928 12:04:50.345478   16211 kuberuntime_manager.go:403] No ready sandbox for pod "master-api-master-0.threeten.lab.rdu2.cee.redhat.com_kube-system(49971b2bc841377f9081ba392d37185b)" can be found. Need to start a new one
Sep 28 12:04:50 master-0.threeten.lab.rdu2.cee.redhat.com atomic-openshift-node[16211]: I0928 12:04:50.347910   16211 kuberuntime_container.go:547] Killing container "docker://2e9d869f7dfc330c7e152b1e48d87865240b65ad7381d117e0b88e4699e7ffdc" with 30 second grace period
Sep 28 12:04:50 master-0.threeten.lab.rdu2.cee.redhat.com atomic-openshift-node[16211]: I0928 12:04:50.418610   16211 streamwatcher.go:103] Unexpected EOF during watch stream event decoding: unexpected EOF
Sep 28 12:04:50 master-0.threeten.lab.rdu2.cee.redhat.com atomic-openshift-node[16211]: I0928 12:04:50.419565   16211 streamwatcher.go:103] Unexpected EOF during watch stream event decoding: unexpected EOF
Sep 28 12:04:50 master-0.threeten.lab.rdu2.cee.redhat.com atomic-openshift-node[16211]: I0928 12:04:50.419766   16211 streamwatcher.go:103] Unexpected EOF during watch stream event decoding: unexpected EOF
Sep 28 12:04:51 master-0.threeten.lab.rdu2.cee.redhat.com atomic-openshift-node[16211]: W0928 12:04:51.105822   16211 prober.go:103] No ref for container "docker://2e9d869f7dfc330c7e152b1e48d87865240b65ad7381d117e0b88e4699e7ffdc" (master-api-master-0.threeten.lab.rdu2.cee.redhat.com_kube-system(49971b2bc841377f9081ba392d37185b):api)
Sep 28 12:04:51 master-0.threeten.lab.rdu2.cee.redhat.com atomic-openshift-node[16211]: I0928 12:04:51.105860   16211 prober.go:111] Liveness probe for "master-api-master-0.threeten.lab.rdu2.cee.redhat.com_kube-system(49971b2bc841377f9081ba392d37185b):api" failed (failure): Get https://10.10.93.197:443/healthz: dial tcp 10.10.93.197:443: getsockopt: connection refused
Sep 28 12:04:51 master-0.threeten.lab.rdu2.cee.redhat.com atomic-openshift-node[16211]: I0928 12:04:51.173510   16211 kubelet.go:1914] SyncLoop (PLEG): "master-api-master-0.threeten.lab.rdu2.cee.redhat.com_kube-system(49971b2bc841377f9081ba392d37185b)", event: &pleg.PodLifecycleEvent{ID:"49971b2bc841377f9081ba392d37185b", Type:"ContainerDied", Data:"2e9d869f7dfc330c7e152b1e48d87865240b65ad7381d117e0b88e4699e7ffdc"}
Sep 28 12:04:52 master-0.threeten.lab.rdu2.cee.redhat.com atomic-openshift-node[16211]: I0928 12:04:52.870598   16211 kuberuntime_manager.go:757] checking backoff for container "api" in pod "master-api-master-0.threeten.lab.rdu2.cee.redhat.com_kube-system(49971b2bc841377f9081ba392d37185b)"
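
To compare the order of container deaths in the excerpt above with the IDs that master-restart acted on, the `ContainerDied` events can be pulled out of the journal output. This is a minimal sketch that assumes the PLEG log line format shown above; `died_containers` is a hypothetical helper name.

```shell
# Sketch only. Reads kubelet journal lines on stdin and prints the
# container ID from each ContainerDied PLEG event, in log order.
died_containers() {
  grep -o 'Type:"ContainerDied", Data:"[0-9a-f]*"' \
    | sed 's/.*Data:"\([0-9a-f]*\)"/\1/'
}
```

For example: `journalctl --no-pager --unit atomic-openshift-node --since "1 minutes ago" | died_containers`. For the excerpt above this lists the POD container (5f13cd7bc77d...) before the api container (2e9d869f7dfc...).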

Comment 8 Stephen Cuppett 2019-11-20 18:58:13 UTC

OCP 3.6-3.10 is no longer on full support [1]. Marking CLOSED DEFERRED. If you have a customer case with a support exception or have reproduced on 3.11+, please reopen and include those details. When reopening, please set the Target Release to the appropriate version where needed.

[1]: https://access.redhat.com/support/policy/updates/openshift

Comment 11 Red Hat Bugzilla 2023-09-15 00:12:40 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days