Red Hat Bugzilla – Bug 1633651
master-restart returns unexplained exit code
Last modified: 2018-10-01 07:27:47 EDT
Note: comparing my local 3.10 installation against https://github.com/openshift/openshift-ansible/blob/release-3.10/roles/openshift_control_plane/files/scripts/docker/master-restart#L10, I do not see line 10 (`types=( "atomic-openshift" "origin" )`) locally. I'm unsure whether I'm referencing the wrong tag or whether there is an issue with the 3.10 install/update.
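To confirm which copy of the script is actually installed, a quick check like the following prints the `types=( ... )` line of a given master-restart file with its line number, or says that it is missing. This is a minimal sketch, not part of openshift-ansible: `check_types_line` is a hypothetical helper, and the default path `/usr/local/bin/master-restart` is an assumption (trace [1] only shows that the script was run from a `bin` directory).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of openshift-ansible): print the
# `types=( ... )` line of a master-restart copy, with its line number.
set -euo pipefail

check_types_line() {
  # ASSUMPTION: default install path; pass the real path as $1 if it differs.
  local script="${1:-/usr/local/bin/master-restart}"
  if [[ ! -r "$script" ]]; then
    echo "cannot read ${script}" >&2
    return 2
  fi
  # grep -n prefixes each match with its line number, e.g. "10:types=( ... )"
  if grep -n '^[[:space:]]*types=(' "$script"; then
    return 0
  fi
  echo "no types=( ... ) line in ${script}"
  return 1
}
```

Usage: `check_types_line /path/to/master-restart`; a line number other than 10, or the "no types" message, would point at a copy that differs from the release-3.10 tag.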
I'm raising the priority and severity here because, as I understand it, fixes and diagnostic changes may be made and verified against services that were not actually restarted. This could create a lot of needless spin when troubleshooting. Is there context (existing documentation) I'm missing that says master-restart is not applicable here?
My initial attempts to capture and file this BZ were misleading (wrong). I have renamed it and hope this comment explains the concern.

`master-restart api` returns a "2" after a clean install [1]. In this scenario the "2" would be expected to come from the `docker wait` command [2], which I *believe* is failing because the parent container has already been killed [3].

Is https://github.com/openshift/openshift-ansible/blob/release-3.10/roles/openshift_control_plane/files/scripts/docker/master-restart#L33-L37 behaving as intended, and can/should the script be adjusted to echo recommendations for results other than "0" or "1"?

[1]
[root@master-0 bin]# bash -x master-restart api
+ set -euo pipefail
+ [[ -z api ]]
+ types=("atomic-openshift" "origin")
+ for type in '"${types[@]}"'
+ systemctl cat atomic-openshift-master-api.service
+ for type in '"${types[@]}"'
+ systemctl cat origin-master-api.service
++ docker ps -l -q --filter label=io.kubernetes.container.name=api
+ child_container=2e9d869f7dfc
++ docker ps -l -q --filter label=openshift.io/component=api --filter label=io.kubernetes.container.name=POD
+ container=5f13cd7bc77d
+ [[ -z 5f13cd7bc77d ]]
+ docker stop 5f13cd7bc77d --time 30
+ [[ -z 2e9d869f7dfc ]]
+ exec timeout 60 docker -l debug wait 2e9d869f7dfc
2

[2]
NAME
    docker-wait - Block until one or more containers stop, then print their exit codes

SYNOPSIS
    docker wait [--help] CONTAINER [CONTAINER...]

DESCRIPTION
    Block until one or more containers stop, then print their exit codes.
[3]
[root@master-0 bin]# journalctl --no-pager --unit atomic-openshift-node --since "1 minutes ago"
Sep 28 12:04:50 master-0.threeten.lab.rdu2.cee.redhat.com atomic-openshift-node[16211]: I0928 12:04:50.042144 16211 kubelet.go:1914] SyncLoop (PLEG): "master-api-master-0.threeten.lab.rdu2.cee.redhat.com_kube-system(49971b2bc841377f9081ba392d37185b)", event: &pleg.PodLifecycleEvent{ID:"49971b2bc841377f9081ba392d37185b", Type:"ContainerDied", Data:"5f13cd7bc77db90aebbd2d04d7cb4b74808e10f861261d1333412eec441d0f06"}
Sep 28 12:04:50 master-0.threeten.lab.rdu2.cee.redhat.com atomic-openshift-node[16211]: W0928 12:04:50.044537 16211 pod_container_deletor.go:77] Container "5f13cd7bc77db90aebbd2d04d7cb4b74808e10f861261d1333412eec441d0f06" not found in pod's containers
Sep 28 12:04:50 master-0.threeten.lab.rdu2.cee.redhat.com atomic-openshift-node[16211]: I0928 12:04:50.345478 16211 kuberuntime_manager.go:403] No ready sandbox for pod "master-api-master-0.threeten.lab.rdu2.cee.redhat.com_kube-system(49971b2bc841377f9081ba392d37185b)" can be found. Need to start a new one
Sep 28 12:04:50 master-0.threeten.lab.rdu2.cee.redhat.com atomic-openshift-node[16211]: I0928 12:04:50.347910 16211 kuberuntime_container.go:547] Killing container "docker://2e9d869f7dfc330c7e152b1e48d87865240b65ad7381d117e0b88e4699e7ffdc" with 30 second grace period
Sep 28 12:04:50 master-0.threeten.lab.rdu2.cee.redhat.com atomic-openshift-node[16211]: I0928 12:04:50.418610 16211 streamwatcher.go:103] Unexpected EOF during watch stream event decoding: unexpected EOF
Sep 28 12:04:50 master-0.threeten.lab.rdu2.cee.redhat.com atomic-openshift-node[16211]: I0928 12:04:50.419565 16211 streamwatcher.go:103] Unexpected EOF during watch stream event decoding: unexpected EOF
Sep 28 12:04:50 master-0.threeten.lab.rdu2.cee.redhat.com atomic-openshift-node[16211]: I0928 12:04:50.419766 16211 streamwatcher.go:103] Unexpected EOF during watch stream event decoding: unexpected EOF
Sep 28 12:04:51 master-0.threeten.lab.rdu2.cee.redhat.com atomic-openshift-node[16211]: W0928 12:04:51.105822 16211 prober.go:103] No ref for container "docker://2e9d869f7dfc330c7e152b1e48d87865240b65ad7381d117e0b88e4699e7ffdc" (master-api-master-0.threeten.lab.rdu2.cee.redhat.com_kube-system(49971b2bc841377f9081ba392d37185b):api)
Sep 28 12:04:51 master-0.threeten.lab.rdu2.cee.redhat.com atomic-openshift-node[16211]: I0928 12:04:51.105860 16211 prober.go:111] Liveness probe for "master-api-master-0.threeten.lab.rdu2.cee.redhat.com_kube-system(49971b2bc841377f9081ba392d37185b):api" failed (failure): Get https://10.10.93.197:443/healthz: dial tcp 10.10.93.197:443: getsockopt: connection refused
Sep 28 12:04:51 master-0.threeten.lab.rdu2.cee.redhat.com atomic-openshift-node[16211]: I0928 12:04:51.173510 16211 kubelet.go:1914] SyncLoop (PLEG): "master-api-master-0.threeten.lab.rdu2.cee.redhat.com_kube-system(49971b2bc841377f9081ba392d37185b)", event: &pleg.PodLifecycleEvent{ID:"49971b2bc841377f9081ba392d37185b", Type:"ContainerDied", Data:"2e9d869f7dfc330c7e152b1e48d87865240b65ad7381d117e0b88e4699e7ffdc"}
Sep 28 12:04:52 master-0.threeten.lab.rdu2.cee.redhat.com atomic-openshift-node[16211]: I0928 12:04:52.870598 16211 kuberuntime_manager.go:757] checking backoff for container "api" in pod "master-api-master-0.threeten.lab.rdu2.cee.redhat.com_kube-system(49971b2bc841377f9081ba392d37185b)"
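On the question of echoing recommendations for results other than "0" or "1": per [2], `docker wait` prints the stopped container's exit code on stdout, while the exit status of `timeout`/`docker wait` itself reports whether the wait succeeded (GNU `timeout` exits with 124 when its time limit is reached). A sketch of how the final step could report each case separately follows. It is hypothetical, not the openshift-ansible implementation: `explain_wait_result` and `wait_and_explain` are illustrative names, and the messages are suggestions only.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the openshift-ansible implementation: separate the
# two results of the final `timeout 60 docker wait <id>` step.
#   wait_status: exit status of timeout/docker wait (did the wait itself work?)
#   printed:     what docker wait wrote to stdout (the container's exit code)
set -euo pipefail

explain_wait_result() {
  local wait_status="$1" printed="${2:-}"
  if (( wait_status == 124 )); then
    # GNU timeout exits with 124 when the time limit is reached
    echo "timed out waiting for the container to stop; check 'docker ps' and the node journal"
  elif (( wait_status != 0 )); then
    echo "docker wait failed (status ${wait_status}); the container may already be gone"
  elif [[ "$printed" == "0" ]]; then
    echo "container exited cleanly (exit code 0)"
  else
    echo "container exited with code ${printed}; check 'docker logs' before assuming a clean restart"
  fi
}

# Possible replacement for `exec timeout 60 docker wait "${child_container}"`.
wait_and_explain() {
  local child_container="$1" printed="" wait_status=0
  # `|| wait_status=$?` keeps set -e from aborting before the result is explained
  printed="$(timeout 60 docker wait "$child_container")" || wait_status=$?
  explain_wait_result "$wait_status" "$printed"
  return "$wait_status"
}
```

The sketch drops `exec` on purpose: `exec` replaces the script's shell with `timeout`, so no line after it in master-restart can run, including any message about the result.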