Bug 1942207
Summary: | [vsphere] hostnames are changed when upgrading from 4.6 to 4.7.x, causing upgrades to fail | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Joseph Callen <jcallen> |
Component: | Machine Config Operator | Assignee: | rvanderp |
Machine Config Operator sub component: | platform-vsphere | QA Contact: | Michael Nguyen <mnguyen> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | urgent | ||
Priority: | urgent | CC: | alexisph, aos-bugs, bjarolim, fiezzi, jhou, jima, mkrejci, openshift-bugs-escalate, rvanderp, wking |
Version: | 4.7 | Keywords: | UpgradeBlocker, Upgrades |
Target Milestone: | --- | ||
Target Release: | 4.8.0 | ||
Hardware: | x86_64 | ||
OS: | Unspecified | ||
Whiteboard: | UpdateRecommendationsBlocked | ||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Cause: The vsphere-hostname service applies the hostname only when the node is installed.
Consequence: If the hostname is not statically set before upgrading, the hostname may be lost.
Fix: Remove the condition that restricted the vsphere-hostname service to running only when a node is installed.
Result:
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2021-07-27 22:55:09 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1943143 |
Description
Joseph Callen
2021-03-23 20:53:14 UTC
This was discovered while trying to work on another BZ:
https://bugzilla.redhat.com/show_bug.cgi?id=1935539

# jcallen @ magnesium in ~/go/src/github.com/openshift/machine-config-operator on git:vsphere_offload_47_test x [16:56:50]
$ git --no-pager diff release-4.7
diff --git a/templates/common/vsphere/files/vsphere-disable-vmxnet3v4-features.yaml b/templates/common/vsphere/files/vsphere-disable-vmxnet3v4-features.yaml
new file mode 100644
index 00000000..1b5daae2
--- /dev/null
+++ b/templates/common/vsphere/files/vsphere-disable-vmxnet3v4-features.yaml
@@ -0,0 +1,14 @@
+filesystem: "root"
+mode: 0744
+path: "/etc/NetworkManager/dispatcher.d/99-vsphere-disable-tx-udp-tnl"
+contents:
+  inline: |
+    #!/bin/bash
+    # Workaround:
+    # https://bugzilla.redhat.com/show_bug.cgi?id=1941714
+    # https://bugzilla.redhat.com/show_bug.cgi?id=1935539
+    if [ "$2" == "up" ]; then
+      logger -s "99-vsphere-disable-tx-udp-tnl triggered by ${2}."
+      ethtool -K ${DEVICE_IFACE} tx-udp_tnl-segmentation off
+      ethtool -K ${DEVICE_IFACE} tx-udp_tnl-csum-segmentation off
+    fi
diff --git a/templates/common/vsphere/files/vsphere-hostname.yaml b/templates/common/vsphere/files/vsphere-hostname.yaml
index d9096235..5b79101a 100644
--- a/templates/common/vsphere/files/vsphere-hostname.yaml
+++ b/templates/common/vsphere/files/vsphere-hostname.yaml
@@ -5,9 +5,6 @@ contents:
     #!/usr/bin/env bash
     set -e
 
-    # only run if the hostname is not set
-    test -f /etc/hostname && exit 0 || :
-
     if vm_name=$(/bin/vmtoolsd --cmd 'info-get guestinfo.hostname'); then
         /usr/bin/hostnamectl set-hostname --static ${vm_name}
     fi

The following release image has the above changes:
quay.io/jcallen/origin-release@sha256:81067c5c77dec5d950abf6bcb93edb6e7aea534f45e0a4a144a9f3e39c4acbbe

➜ ~ oc adm upgrade --allow-explicit-upgrade --force --allow-upgrade-with-warnings --to-image quay.io/jcallen/origin-release@sha256:81067c5c77dec5d950abf6bcb93edb6e7aea534f45e0a4a144a9f3e39c4acbbe
warning: The requested upgrade image is not one of the available updates. You have used --allow-explicit-upgrade to the update to proceed anyway
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
Updating to release image quay.io/jcallen/origin-release@sha256:81067c5c77dec5d950abf6bcb93edb6e7aea534f45e0a4a144a9f3e39c4acbbe

➜ ~ oc get node
NAME                          STATUS                        ROLES    AGE    VERSION
jcallen2-vkhbn-master-0       Ready                         master   135m   v1.19.0+2f3101c
jcallen2-vkhbn-master-1       NotReady,SchedulingDisabled   master   135m   v1.19.0+2f3101c
jcallen2-vkhbn-master-2       Ready                         master   134m   v1.19.0+2f3101c
jcallen2-vkhbn-worker-5pcrg   NotReady,SchedulingDisabled   worker   125m   v1.19.0+2f3101c
jcallen2-vkhbn-worker-bcqgn   Ready                         worker   124m   v1.19.0+2f3101c
jcallen2-vkhbn-worker-krqqv   Ready                         worker   124m   v1.19.0+2f3101c

Who is impacted? If we have to block upgrade edges based on this issue, which edges would need blocking?
All vSphere customers leveraging the vSphere cloud provider upgrading from 4.6.z and 4.7.3.

What is the impact? Is it serious enough to warrant blocking edges?
Nodes may lose their node names, which can have serious impacts on the stability of the control plane and workloads.

How involved is remediation (even moderately serious impacts might be acceptable if they are easy to mitigate)?
Each node must be accessed over SSH and have its node name set manually.

Is this a regression (if all previous versions were also vulnerable, updating to the new, vulnerable version does not increase exposure)?
Yes, this is a regression introduced in 4.7.4.

Based on comment 3, I've filed [1] to block *->4.7.4 edges.

[1]: https://github.com/openshift/cincinnati-graph-data/pull/731

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.
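
The manual remediation mentioned in the impact statement (SSH to each affected node and set the node name) could be scripted roughly as follows. This is only a dry-run sketch, not a procedure from this bug: the helper names `notready_nodes` and `print_remediation` are hypothetical, and it assumes that the affected nodes are the ones reported as NotReady, that each node name is reachable over SSH under that name, and that the login user is `core`. It prints the commands instead of running them so they can be reviewed first.

```shell
# Hypothetical helpers; not part of oc or the Machine Config Operator.

# Read `oc get node` output on stdin and print the names of nodes whose
# STATUS column starts with NotReady (the header row is skipped).
notready_nodes() {
    awk 'NR > 1 && $2 ~ /^NotReady/ { print $1 }'
}

# Read node names on stdin and print (not execute) one SSH command per node
# that would set the static hostname back to the node name.
# Assumption: the `core` user and name resolution work for your cluster.
print_remediation() {
    while read -r node; do
        printf 'ssh core@%s sudo hostnamectl set-hostname --static %s\n' "$node" "$node"
    done
}
```

A possible invocation is `oc get node | notready_nodes | print_remediation`. Check each printed node name against the expected name of the VM before running anything, since a NotReady status can have causes other than a lost hostname.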