Bug 1633651 - master-restart returns unexplained exit code
Status: NEW
Product: OpenShift Container Platform
Classification: Red Hat
Component: Master
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.10.z
Assigned To: Michal Fojtik
QA Contact: Xingxing Xia
Depends On:
Blocks:
Reported: 2018-09-27 09:09 EDT by jolee
Modified: 2018-10-01 07:27 EDT (History)

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---




External Trackers
Tracker                            ID       Priority  Status  Summary  Last Updated
Red Hat Knowledge Base (Solution)  3626591  None      None    None     2018-09-27 09:10 EDT

Comment 2 jolee 2018-09-27 11:11:41 EDT
Note: observed on a local 3.10 installation.

https://github.com/openshift/openshift-ansible/blob/release-3.10/roles/openshift_control_plane/files/scripts/docker/master-restart#L10

I do not see line 10 of that file, `types=( "atomic-openshift" "origin" )`, in my local 3.10 installation.

I am unsure whether I am referencing the wrong tag or whether there is an issue with the 3.10 install/update.
Comment 3 jolee 2018-09-28 10:43:58 EDT
I'm raising the priority and severity because, as I understand it, solution and diagnostic changes may be made and vetted against services that were never actually restarted. This could cause a lot of needless churn when troubleshooting.

Is there context (existing documentation) that I'm missing that describes cases where master-restart is not applicable?
Comment 5 jolee 2018-09-28 12:25:13 EDT
My initial efforts to capture and file this BZ were misleading (wrong). I have renamed the bug and hope this comment explains the concern.

`master-restart api` returns a "2" after a clean install [1].

In this scenario, I would expect the "2" to be the result of the `docker wait` command [2], which I *believe* is failing because the parent container has already been killed [3].

Is https://github.com/openshift/openshift-ansible/blob/release-3.10/roles/openshift_control_plane/files/scripts/docker/master-restart#L33-L37 behaving as intended, and can or should the script be adjusted to echo recommendations for results other than "0" or "1"?
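
To make the request concrete, here is a minimal sketch of what "echo recommendations" could look like. The function name `explain_wait_result` and the message wording are hypothetical and are not part of the shipped master-restart script; the sketch only assumes that the value printed by `docker wait` is passed in as the first argument.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT taken from master-restart: turn the value
# printed by `docker wait` into a short hint for the operator.
set -euo pipefail

explain_wait_result() {
  local code="$1"
  case "$code" in
    0|1)
      # The two results the report treats as understood.
      echo "container exited with status ${code}"
      ;;
    *)
      # Anything else gets a pointer to where to look next.
      echo "container exited with status ${code}; check 'docker logs' and the node journal before assuming the restart succeeded"
      ;;
  esac
}
```

Assuming the script captured the result instead of calling `exec`, it could be used as `explain_wait_result "$(timeout 60 docker wait "$child_container")"`. Whether that change fits the script's intended contract is exactly the open question here.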


[1]
[root@master-0 bin]# bash -x master-restart api
+ set -euo pipefail
+ [[ -z api ]]
+ types=("atomic-openshift" "origin")
+ for type in '"${types[@]}"'
+ systemctl cat atomic-openshift-master-api.service
+ for type in '"${types[@]}"'
+ systemctl cat origin-master-api.service
++ docker ps -l -q --filter label=io.kubernetes.container.name=api
+ child_container=2e9d869f7dfc
++ docker ps -l -q --filter label=openshift.io/component=api --filter label=io.kubernetes.container.name=POD
+ container=5f13cd7bc77d
+ [[ -z 5f13cd7bc77d ]]
+ docker stop 5f13cd7bc77d --time 30
+ [[ -z 2e9d869f7dfc ]]
+ exec timeout 60 docker -l debug wait 2e9d869f7dfc
2

[2]
NAME
       docker-wait - Block until one or more containers stop, then print their exit codes

SYNOPSIS
       docker wait [--help] CONTAINER [CONTAINER...]

DESCRIPTION
       Block until one or more containers stop, then print their exit codes.



[3]
[root@master-0 bin]# journalctl --no-pager --unit atomic-openshift-node --since "1 minutes ago"
Sep 28 12:04:50 master-0.threeten.lab.rdu2.cee.redhat.com atomic-openshift-node[16211]: I0928 12:04:50.042144   16211 kubelet.go:1914] SyncLoop (PLEG): "master-api-master-0.threeten.lab.rdu2.cee.redhat.com_kube-system(49971b2bc841377f9081ba392d37185b)", event: &pleg.PodLifecycleEvent{ID:"49971b2bc841377f9081ba392d37185b", Type:"ContainerDied", Data:"5f13cd7bc77db90aebbd2d04d7cb4b74808e10f861261d1333412eec441d0f06"}
Sep 28 12:04:50 master-0.threeten.lab.rdu2.cee.redhat.com atomic-openshift-node[16211]: W0928 12:04:50.044537   16211 pod_container_deletor.go:77] Container "5f13cd7bc77db90aebbd2d04d7cb4b74808e10f861261d1333412eec441d0f06" not found in pod's containers
Sep 28 12:04:50 master-0.threeten.lab.rdu2.cee.redhat.com atomic-openshift-node[16211]: I0928 12:04:50.345478   16211 kuberuntime_manager.go:403] No ready sandbox for pod "master-api-master-0.threeten.lab.rdu2.cee.redhat.com_kube-system(49971b2bc841377f9081ba392d37185b)" can be found. Need to start a new one
Sep 28 12:04:50 master-0.threeten.lab.rdu2.cee.redhat.com atomic-openshift-node[16211]: I0928 12:04:50.347910   16211 kuberuntime_container.go:547] Killing container "docker://2e9d869f7dfc330c7e152b1e48d87865240b65ad7381d117e0b88e4699e7ffdc" with 30 second grace period
Sep 28 12:04:50 master-0.threeten.lab.rdu2.cee.redhat.com atomic-openshift-node[16211]: I0928 12:04:50.418610   16211 streamwatcher.go:103] Unexpected EOF during watch stream event decoding: unexpected EOF
Sep 28 12:04:50 master-0.threeten.lab.rdu2.cee.redhat.com atomic-openshift-node[16211]: I0928 12:04:50.419565   16211 streamwatcher.go:103] Unexpected EOF during watch stream event decoding: unexpected EOF
Sep 28 12:04:50 master-0.threeten.lab.rdu2.cee.redhat.com atomic-openshift-node[16211]: I0928 12:04:50.419766   16211 streamwatcher.go:103] Unexpected EOF during watch stream event decoding: unexpected EOF
Sep 28 12:04:51 master-0.threeten.lab.rdu2.cee.redhat.com atomic-openshift-node[16211]: W0928 12:04:51.105822   16211 prober.go:103] No ref for container "docker://2e9d869f7dfc330c7e152b1e48d87865240b65ad7381d117e0b88e4699e7ffdc" (master-api-master-0.threeten.lab.rdu2.cee.redhat.com_kube-system(49971b2bc841377f9081ba392d37185b):api)
Sep 28 12:04:51 master-0.threeten.lab.rdu2.cee.redhat.com atomic-openshift-node[16211]: I0928 12:04:51.105860   16211 prober.go:111] Liveness probe for "master-api-master-0.threeten.lab.rdu2.cee.redhat.com_kube-system(49971b2bc841377f9081ba392d37185b):api" failed (failure): Get https://10.10.93.197:443/healthz: dial tcp 10.10.93.197:443: getsockopt: connection refused
Sep 28 12:04:51 master-0.threeten.lab.rdu2.cee.redhat.com atomic-openshift-node[16211]: I0928 12:04:51.173510   16211 kubelet.go:1914] SyncLoop (PLEG): "master-api-master-0.threeten.lab.rdu2.cee.redhat.com_kube-system(49971b2bc841377f9081ba392d37185b)", event: &pleg.PodLifecycleEvent{ID:"49971b2bc841377f9081ba392d37185b", Type:"ContainerDied", Data:"2e9d869f7dfc330c7e152b1e48d87865240b65ad7381d117e0b88e4699e7ffdc"}
Sep 28 12:04:52 master-0.threeten.lab.rdu2.cee.redhat.com atomic-openshift-node[16211]: I0928 12:04:52.870598   16211 kuberuntime_manager.go:757] checking backoff for container "api" in pod "master-api-master-0.threeten.lab.rdu2.cee.redhat.com_kube-system(49971b2bc841377f9081ba392d37185b)"
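
For anyone wanting to check the ordering in [3] without reading the raw journal, the following is a rough sketch that pulls the container IDs out of the kubelet `ContainerDied` events in the order they were logged. The function name `died_containers` is hypothetical, and the pattern assumes the log line format shown above (a `Type:"ContainerDied"` field followed by a hex ID in `Data:"..."`).

```shell
# Hypothetical helper: read journal lines on stdin and print the short
# (12-character) container ID of each ContainerDied event, in log order.
died_containers() {
  grep 'Type:"ContainerDied"' \
    | sed -E 's/.*Data:"([0-9a-f]+)".*/\1/' \
    | cut -c1-12
}
```

Usage: `journalctl --no-pager --unit atomic-openshift-node --since "1 minutes ago" | died_containers`. Against the lines in [3] this would list 5f13cd7bc77d (the POD container) before 2e9d869f7dfc (the api container), which matches the IDs in the trace in [1].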
