Bug 1920027
Summary: | machine-config-operator consistently failing during 4.6 to 4.7 upgrades and clusters do not install successfully with proxy configuration |
---|---|
Product: | OpenShift Container Platform |
Component: | Machine Config Operator |
Status: | CLOSED ERRATA |
Severity: | urgent |
Priority: | urgent |
Version: | 4.7 |
Target Release: | 4.7.0 |
Hardware: | Unspecified |
OS: | Unspecified |
Whiteboard: | TechnicalReleaseBlocker |
Keywords: | Regression, TestBlocker |
Type: | Bug |
Reporter: | Fabian von Feilitzsch <fabian> |
Assignee: | Ben Howard <behoward> |
QA Contact: | Michael Nguyen <mnguyen> |
CC: | ccoleman, esimard, fdeutsch, gpei, jerzhang, jhou, jima, knarra, lmohanty, mgugino, mkrejci, pmuller, tsze, weinliu, wking, wsun, yanyang, yunjiang |
Last Closed: | 2021-02-24 15:55:53 UTC |
Bug Blocks: | 1915235, 1933075, 1978041 |
Description
Fabian von Feilitzsch
2021-01-25 15:15:48 UTC
At a glance, I think this is most likely a regression from https://github.com/openshift/machine-config-operator/commit/fbf712a5fdd577d07d65bacdfe3c1bb2c46a6df7#diff-9c6641c1f9cfb0c678ea58b1a913a5d0d528e98047e04827dcf28e9d6e51e8ee, based on these machine-config-daemon logs:

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/1353402225170321408/artifacts/e2e-aws-upgrade/pods/openshift-machine-config-operator_machine-config-daemon-549x6_machine-config-daemon.log
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/1353402225170321408/artifacts/e2e-aws-upgrade/pods/openshift-machine-config-operator_machine-config-daemon-cf5k6_machine-config-daemon.log

Assigning to Ben.

Interesting. The contents of [1] `pivot.service.yaml` are:

    name: pivot.service
    dropins:
      - name: 10-mco-default-env.conf
        contents: |
          {{if .Proxy -}}
          [Service]
          EnvironmentFile=/etc/mco/proxy.env
          {{end -}}

However, the diff yields `[]uint8{...}`, and when converted to text you get "[Unit]".
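To make the observation above concrete, here is a minimal Go sketch, not taken from the operator's code. It renders the quoted drop-in template with and without a proxy, and decodes a byte slice of the kind a `[]uint8{...}` diff would show. The `renderConfig` and `proxyStatus` types, the proxy URL, and the literal byte values are assumptions made for illustration; only the template text comes from the comment above.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// proxyStatus and renderConfig are hypothetical stand-ins for the data the
// operator hands to its templates; only the Proxy field matters here.
type proxyStatus struct{ HTTPProxy string }

type renderConfig struct{ Proxy *proxyStatus }

// dropinTemplate mirrors the "contents" block quoted from pivot.service.yaml.
const dropinTemplate = `{{if .Proxy -}}
[Service]
EnvironmentFile=/etc/mco/proxy.env
{{end -}}
`

// renderDropin executes the drop-in template against cfg.
func renderDropin(cfg renderConfig) (string, error) {
	tmpl, err := template.New("dropin").Parse(dropinTemplate)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, cfg); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// Without a proxy the template renders to an empty string.
	noProxy, err := renderDropin(renderConfig{})
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	// With a proxy (hypothetical URL) the [Service] section is emitted.
	withProxy, err := renderDropin(renderConfig{Proxy: &proxyStatus{HTTPProxy: "http://proxy.example:3128"}})
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	fmt.Printf("rendered, no proxy:   %q\n", noProxy)
	fmt.Printf("rendered, with proxy: %q\n", withProxy)

	// Illustrative bytes for stale on-disk contents as a []uint8 diff might
	// print them; converting the slice to a string recovers the text.
	onDisk := []uint8{0x5b, 0x55, 0x6e, 0x69, 0x74, 0x5d}
	fmt.Printf("on disk:              %q\n", string(onDisk))
}
```

The point of the sketch is the mismatch: on a cluster without a proxy the expected contents are empty, while the bytes left on disk still spell "[Unit]".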
[1] https://github.com/openshift/machine-config-operator/blob/130722159901d909a64fe9781a2ae78d96fd47e3/templates/common/_base/units/pivot.service.yaml#L1-L8
[2] https://play.golang.org/p/3pUfdk-0N8R

I think I know what's going on. The diff you see above is due to the [Unit] being removed in https://github.com/openshift/machine-config-operator/commit/fbf712a5fdd577d07d65bacdfe3c1bb2c46a6df7#diff-c70a0bfc46a1f4b7c0898eef8f9f84ae68875eeecd34b7e6ed2c9ef2bfdef802L5; see how the drop-in used to still contain [Unit] when it was otherwise empty. Compound that with https://github.com/openshift/machine-config-operator/commit/0ad77557399cabc276750d35262632e04eae5da9, which skipped the write BUT did not update https://github.com/openshift/machine-config-operator/blob/master/pkg/daemon/update.go#L1369. That means the write was skipped entirely (so the file stayed the same as it was pre-update) and the delete was skipped as well (since the drop-in is technically still there), so the daemon thinks the file should be empty but it still has [Unit] in it. Either an update to the write (don't skip if empty, just write an empty file) or an update to the delete (if empty, delete) should be fine.

*** Bug 1920483 has been marked as a duplicate of this bug. ***

Hit a similar issue on profile 14_Disconnected IPI on Azure & Private Cluster. Below is the link to the must-gather:
http://virt-openshift-05.lab.eng.nay.redhat.com/knarra/1920027/must-gather.local.4604239068622020283.tar.gz

*** Bug 1922238 has been marked as a duplicate of this bug. ***

*** Bug 1922127 has been marked as a duplicate of this bug. ***

We may have forgotten to do the same drop-in file separation for the cri-o service as in https://github.com/openshift/machine-config-operator/pull/2365. In a proxy-enabled cluster, the cri-o service didn't get the expected "EnvironmentFile=/etc/mco/proxy.env" configured in the 10-mco-default-env.conf dropin file.
    [root@control-plane-0 crio.service.d]# cat 10-mco-default-env.conf
    [Service]
    Environment="GODEBUG=x509ignoreCN=0,madvdontneed=1"

Verified on upi-on-vsphere behind proxy with 4.7.0-0.nightly-2021-02-02-223803. Fresh installation is successful.

*** Bug 1922187 has been marked as a duplicate of this bug. ***

4.6.13-x86_64 -> 4.7.0-0.nightly-2021-02-03-165316
4.6.16-x86_64 -> 4.7.0-0.nightly-2021-02-03-165316
Verified to be fixed for both of the paths.

Upgrade of a 4.6 cluster with proxy enabled from 4.6.16 to 4.7.0-0.nightly-2021-02-04-012305 finished successfully. No machine-config operator degraded issue.

*** Bug 1926474 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2020:5633

Removing UpgradeBlocker from this older bug, to remove it from the suspect queue described in [1]. If you feel like this bug still needs to be a suspect, please add the keyword again.

[1]: https://github.com/openshift/enhancements/pull/475
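For anyone revisiting the root-cause analysis earlier in this bug, the second suggested remedy ("if empty, delete") can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the change that actually shipped: `syncDropin` and `staleDropinSurvives` are hypothetical names, and the file path is built in a temporary directory rather than under the real systemd unit directory.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// syncDropin is a hypothetical helper, not the operator's actual code.
// It makes the file at path match the rendered contents and treats an
// empty render as "this drop-in should not exist on disk".
func syncDropin(path, contents string) error {
	if contents == "" {
		// Remove any stale copy instead of silently skipping the write.
		err := os.Remove(path)
		if err != nil && !errors.Is(err, fs.ErrNotExist) {
			return err
		}
		return nil
	}
	if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
		return err
	}
	return os.WriteFile(path, []byte(contents), 0o644)
}

// staleDropinSurvives simulates the upgrade in a temp directory: the old
// config wrote "[Unit]", the new config renders to nothing. It reports
// whether the stale file is still present afterwards.
func staleDropinSurvives() (bool, error) {
	dir, err := os.MkdirTemp("", "dropin-demo")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "pivot.service.d", "10-mco-default-env.conf")
	if err := syncDropin(path, "[Unit]\n"); err != nil { // pre-upgrade state
		return false, err
	}
	if err := syncDropin(path, ""); err != nil { // post-upgrade render is empty
		return false, err
	}
	_, err = os.Stat(path)
	if errors.Is(err, fs.ErrNotExist) {
		return false, nil
	}
	return err == nil, err
}

func main() {
	survives, err := staleDropinSurvives()
	if err != nil {
		fmt.Println("demo failed:", err)
		return
	}
	fmt.Println("stale drop-in still on disk:", survives)
}
```

With the empty case handled explicitly, the on-disk state and the expected state agree after the update, which is the property the validation step in the failing logs was checking.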