Bug 1956372
Summary: | openshift-gcp-routes causes disruption during upgrade by stopping before all pods terminate | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Clayton Coleman <ccoleman> |
Component: | Machine Config Operator | Assignee: | Antonio Ojea <aojeagar> |
Status: | CLOSED ERRATA | QA Contact: | Michael Nguyen <mnguyen> |
Severity: | urgent | Docs Contact: | |
Priority: | unspecified | ||
Version: | 4.8 | CC: | rioliu |
Target Milestone: | --- | ||
Target Release: | 4.8.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Cause: The OpenShift script that handles the Google Cloud load balancer logic exited before the network was down.
Consequence: OpenShift components that depend on the load balancers were disrupted and could not exit gracefully.
Fix: The script now waits until the network is down before exiting.
Result: Graceful shutdown works correctly.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2021-07-27 23:05:53 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1966595 |
Description
Clayton Coleman
2021-05-03 14:41:27 UTC
Verified on 4.8.0-0.nightly-2021-05-07-075528. openshift-gcp-routes.service is stopped after network online target.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-05-07-075528   True        False         32m     Cluster version is 4.8.0-0.nightly-2021-05-07-075528

$ oc get nodes
NAME                                       STATUS   ROLES    AGE   VERSION
ci-ln-9wr6012-f76d1-z7bjv-master-0         Ready    master   53m   v1.21.0-rc.0+291e731
ci-ln-9wr6012-f76d1-z7bjv-master-1         Ready    master   53m   v1.21.0-rc.0+291e731
ci-ln-9wr6012-f76d1-z7bjv-master-2         Ready    master   53m   v1.21.0-rc.0+291e731
ci-ln-9wr6012-f76d1-z7bjv-worker-b-gwn8c   Ready    worker   44m   v1.21.0-rc.0+291e731
ci-ln-9wr6012-f76d1-z7bjv-worker-c-c2ndb   Ready    worker   44m   v1.21.0-rc.0+291e731
ci-ln-9wr6012-f76d1-z7bjv-worker-d-2sc2x   Ready    worker   44m   v1.21.0-rc.0+291e731

$ oc debug node/ci-ln-9wr6012-f76d1-z7bjv-master-0
Starting pod/ci-ln-9wr6012-f76d1-z7bjv-master-0-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# systemctl cat openshift-gcp-routes
# /etc/systemd/system/openshift-gcp-routes.service
[Unit]
Description=Update GCP routes for forwarded IPs.
ConditionKernelCommandLine=|ignition.platform.id=gce
ConditionKernelCommandLine=|ignition.platform.id=gcp
Before=network-online.target

[Service]
Type=simple
ExecStart=/bin/bash /opt/libexec/openshift-gcp-routes.sh start
ExecStopPost=/bin/bash /opt/libexec/openshift-gcp-routes.sh cleanup
User=root
RestartSec=30
Restart=always

[Install]
WantedBy=multi-user.target
# Ensure that network-online.target will not complete until the node has working external LBs.
RequiredBy=network-online.target

sh-4.4# journalctl
...snip...
May 07 17:15:11 ci-ln-9wr6012-f76d1-z7bjv-master-0.c.openshift-gce-devel-ci.inte systemd[1]: Stopped target Network is Online.
May 07 17:15:11 ci-ln-9wr6012-f76d1-z7bjv-master-0.c.openshift-gce-devel-ci.inte systemd[1]: node-valid-hostname.service: Succeeded.
May 07 17:15:11 ci-ln-9wr6012-f76d1-z7bjv-master-0.c.openshift-gce-devel-ci.inte systemd[1]: Stopped Ensure the node hostname is valid for the cluster.
May 07 17:15:11 ci-ln-9wr6012-f76d1-z7bjv-master-0.c.openshift-gce-devel-ci.inte systemd[1]: node-valid-hostname.service: Consumed 0 CPU time
May 07 17:15:11 ci-ln-9wr6012-f76d1-z7bjv-master-0.c.openshift-gce-devel-ci.inte systemd[1]: Stopping Update GCP routes for forwarded IPs....
May 07 17:15:11 ci-ln-9wr6012-f76d1-z7bjv-master-0.c.openshift-gce-devel-ci.inte systemd[1]: NetworkManager-wait-online.service: Succeeded.
May 07 17:15:11 ci-ln-9wr6012-f76d1-z7bjv-master-0.c.openshift-gce-devel-ci.inte systemd[1]: Stopped Network Manager Wait Online.
May 07 17:15:11 ci-ln-9wr6012-f76d1-z7bjv-master-0.c.openshift-gce-devel-ci.inte systemd[1]: NetworkManager-wait-online.service: Consumed 0 CPU time
May 07 17:15:11 ci-ln-9wr6012-f76d1-z7bjv-master-0.c.openshift-gce-devel-ci.inte systemd[1]: Stopped target sshd-keygen.target.
May 07 17:15:11 ci-ln-9wr6012-f76d1-z7bjv-master-0.c.openshift-gce-devel-ci.inte systemd[1]: systemd-user-sessions.service: Succeeded.
May 07 17:15:11 ci-ln-9wr6012-f76d1-z7bjv-master-0.c.openshift-gce-devel-ci.inte systemd[1]: Stopped Permit User Sessions.
May 07 17:15:11 ci-ln-9wr6012-f76d1-z7bjv-master-0.c.openshift-gce-devel-ci.inte systemd[1]: systemd-user-sessions.service: Consumed 13ms CPU time
May 07 17:15:11 ci-ln-9wr6012-f76d1-z7bjv-master-0.c.openshift-gce-devel-ci.inte systemd[1]: Stopped target Remote File Systems.
May 07 17:15:11 ci-ln-9wr6012-f76d1-z7bjv-master-0.c.openshift-gce-devel-ci.inte systemd[1]: Stopped target Network.
May 07 17:15:11 ci-ln-9wr6012-f76d1-z7bjv-master-0.c.openshift-gce-devel-ci.inte systemd[1]: Stopping Network Manager...

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438
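
The verification in this report relies on reading the shutdown journal by eye: because the unit declares Before=network-online.target, systemd stops it in reverse order, so "Stopped target Network is Online." should be logged before "Stopping Update GCP routes for forwarded IPs....". A minimal sketch of that check as a shell helper follows. The function name check_stop_order is hypothetical and not part of the fix; it only assumes the two message strings shown in the journal excerpt in this report.

```shell
#!/bin/sh
# Hypothetical helper (not part of the bug fix): reads journal text on stdin
# and succeeds only if network-online.target was reported stopped before
# openshift-gcp-routes.service began stopping.
check_stop_order() {
    awk '
        # Remember the line number of the first occurrence of each message.
        /Stopped target Network is Online\./ && !target { target = NR }
        /Stopping Update GCP routes for forwarded IPs/ && !service { service = NR }
        # Exit 0 only when both messages were seen, in the expected order.
        END { exit !(target && service && target < service) }
    '
}
```

On a node, one possible usage is `journalctl -b -1 | check_stop_order && echo "ordering OK"`, assuming the journal from the previous boot is persisted. The check only compares line order for these two messages and does not prove that all pods had terminated.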