Description:
When installing a cluster behind a proxy, the proxy setting is missing in MCO, which causes the bootstrap process to fail.

On a master machine:

[core@ip-10-0-76-55 ~]$ cat /etc/systemd/system/kubelet.service.d/10-mco-default-env.conf
[Service]
Environment="GODEBUG=x509ignoreCN=0,madvdontneed=1"

This change may have been introduced by https://github.com/openshift/machine-config-operator/pull/2632

[core@ip-10-0-76-55 ~]$ systemctl status machine-config-daemon-firstboot.service
● machine-config-daemon-firstboot.service - Machine Config Daemon Firstboot
   Loaded: loaded (/etc/systemd/system/machine-config-daemon-firstboot.service; enabled; vendor preset: enabled)
   Active: inactive (dead)
Condition: start condition failed at Wed 2021-06-30 02:27:59 UTC; 1h 14min ago
           └─ ConditionPathExists=/etc/ignition-machine-config-encapsulated.json was not met

[core@ip-10-0-76-55 ~]$ journalctl -f -u machine-config-daemon-firstboot.service
-- Logs begin at Wed 2021-06-30 02:17:16 UTC. --
Jun 30 02:27:24 ip-10-0-76-55 machine-config-daemon[2012]: I0630 02:27:24.042073 2012 rpm-ostree.go:261] Running captured: rpm-ostree status --json
Jun 30 02:27:24 ip-10-0-76-55 machine-config-daemon[2012]: I0630 02:27:24.099933 2012 rpm-ostree.go:184] Current origin is not custom
Jun 30 02:27:24 ip-10-0-76-55 machine-config-daemon[2012]: I0630 02:27:24.827752 2012 rpm-ostree.go:211] Pivoting to: 46.82.202106211840-0 (e0c0c734343efcd6b24cc771bcbad4beb8fbd556bd3b34df266f7b046fff956f)
Jun 30 02:27:24 ip-10-0-76-55 machine-config-daemon[2012]: I0630 02:27:24.827771 2012 rpm-ostree.go:243] Executing rebase from repo path /run/mco-machine-os-content/os-content-888034288/srv/repo with customImageURL pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b141fabe2c3589194d625ccfd7ce503c55c1833cbab238174c26e1148225bcba and checksum e0c0c734343efcd6b24cc771bcbad4beb8fbd556bd3b34df266f7b046fff956f
Jun 30 02:27:24 ip-10-0-76-55 machine-config-daemon[2012]: I0630 02:27:24.827779 2012 rpm-ostree.go:261] Running captured: rpm-ostree rebase --experimental /run/mco-machine-os-content/os-content-888034288/srv/repo:e0c0c734343efcd6b24cc771bcbad4beb8fbd556bd3b34df266f7b046fff956f --custom-origin-url pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b141fabe2c3589194d625ccfd7ce503c55c1833cbab238174c26e1148225bcba --custom-origin-description Managed by machine-config-operator
Jun 30 02:27:37 ip-10-0-76-55 machine-config-daemon[2012]: I0630 02:27:37.165464 2012 update.go:1678] initiating reboot: Completing firstboot provisioning to rendered-master-39d7b607b214247392cd5bb37907d15f
Jun 30 02:27:37 ip-10-0-76-55 systemd[1]: machine-config-daemon-firstboot.service: Main process exited, code=killed, status=15/TERM
Jun 30 02:27:37 ip-10-0-76-55 systemd[1]: machine-config-daemon-firstboot.service: Failed with result 'signal'.
Jun 30 02:27:37 ip-10-0-76-55 systemd[1]: Stopped Machine Config Daemon Firstboot.
Jun 30 02:27:37 ip-10-0-76-55 systemd[1]: machine-config-daemon-firstboot.service: Consumed 16.882s CPU time

Version:
4.6.0-0.nightly-2021-06-25-031210

Platform:
all

What happened?
Installation failed at the bootstrap phase.

What did you expect to happen?
The cluster installs successfully.

How to reproduce it (as minimally and precisely as possible)?
Install a cluster behind a proxy.
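For anyone triaging a similar node, the check done by eye on the drop-in above can be scripted. The following is a minimal sketch, not MCO code: it assumes the proxy settings would normally show up as HTTP_PROXY, HTTPS_PROXY and NO_PROXY in Environment= lines, and it uses shlex as an approximation of systemd's quoting rules. The function names are made up for illustration.

```python
import shlex

# Assumption: these are the variables a proxied cluster is expected to set.
PROXY_VARS = ("HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY")


def environment_names(dropin_text):
    """Return the variable names set by Environment= lines in a drop-in."""
    names = set()
    for raw_line in dropin_text.splitlines():
        line = raw_line.strip()
        if not line.startswith("Environment="):
            continue
        value = line[len("Environment="):]
        # One Environment= line may carry several quoted assignments.
        for assignment in shlex.split(value):
            name, sep, _ = assignment.partition("=")
            if sep:
                names.add(name)
    return names


def missing_proxy_vars(dropin_text):
    """List the expected proxy variables that the drop-in does not set."""
    present = environment_names(dropin_text)
    return [name for name in PROXY_VARS if name not in present]
```

Fed the contents of /etc/systemd/system/kubelet.service.d/10-mco-default-env.conf from the report, missing_proxy_vars would list all three variables, since the file only sets GODEBUG.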
> Version:
> 4.6.0-0.nightly-2021-06-25-031210

To drop in a more recognizable string, that's the one we used for 4.6.37 [1].

[1]: https://amd64.ocp.releases.ci.openshift.org/releasestream/4-stable/release/4.6.37
We're asking the following questions to evaluate whether or not this bug warrants blocking an upgrade edge from either the previous X.Y or X.Y.Z. The ultimate goal is to avoid delivering an update which introduces new risk or reduces cluster functionality in any way. Sample answers are provided to give more context and the ImpactStatementRequested label has been added to this bug. When responding, please remove ImpactStatementRequested and set the ImpactStatementProposed label. The expectation is that the assignee answers these questions.

Who is impacted? If we have to block upgrade edges based on this issue, which edges would need blocking?
* example: Customers upgrading from 4.y.Z to 4.y+1.z running on GCP with thousands of namespaces, approximately 5% of the subscribed fleet
* example: All customers upgrading from 4.y.z to 4.y+1.z fail approximately 10% of the time

What is the impact? Is it serious enough to warrant blocking edges?
* example: Up to 2 minute disruption in edge routing
* example: Up to 90 seconds of API downtime
* example: etcd loses quorum and you have to restore from backup

How involved is remediation (even moderately serious impacts might be acceptable if they are easy to mitigate)?
* example: Issue resolves itself after five minutes
* example: Admin uses oc to fix things
* example: Admin must SSH to hosts, restore from backups, or other non standard admin activities

Is this a regression (if all previous versions were also vulnerable, updating to the new, vulnerable version does not increase exposure)?
* example: No, it has always been like this we just never noticed
* example: Yes, from 4.y.z to 4.y+1.z Or 4.y.z to 4.y.z+1
Setting depends on 4.7.0's bug 1920027. That bug had three PRs:

* [2363] handling zero-length dropins/units. This made it back to 4.6.21 [1].
* [2365] separating dropins for the kubelet. Tried to take this back to 4.6.z in February, but it didn't apply cleanly [2].
* [2378] separating dropins for CRI-O. Tried to take this back to 4.6.z in February, but it didn't apply cleanly [3].

But with bug 1926944 bringing [2632] into 4.6.37, we now have multiple dropins in the same file in 4.6, and need manual backports of 2365 and 2378, which can happen in a single PR or in two PRs linked from this bug.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1933075#c8
[2]: https://github.com/openshift/machine-config-operator/pull/2365#issuecomment-785991276
[3]: https://github.com/openshift/machine-config-operator/pull/2378#issuecomment-785991657
[2363]: https://github.com/openshift/machine-config-operator/pull/2363
[2365]: https://github.com/openshift/machine-config-operator/pull/2365
[2378]: https://github.com/openshift/machine-config-operator/pull/2378
[2632]: https://github.com/openshift/machine-config-operator/pull/2632
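To illustrate why multiple dropins landing in the same file is a problem, here is a rough sketch of the suspected failure mode. It is an assumption, not MCO code: it supposes that drop-ins are written to disk keyed by file name, so a later drop-in with the same name replaces an earlier one. The function names and the proxy value are hypothetical.

```python
def render_dropins(dropins):
    """Simulate writing (name, contents) drop-ins into one <unit>.d directory.

    Assumption: a later drop-in with the same file name replaces the earlier one.
    """
    files = {}
    for name, contents in dropins:
        files[name] = contents
    return files


def find_name_collisions(dropins):
    """Return the sorted file names used by more than one drop-in."""
    counts = {}
    for name, _ in dropins:
        counts[name] = counts.get(name, 0) + 1
    return sorted(name for name, count in counts.items() if count > 1)


# Hypothetical inputs: a proxy drop-in and the GODEBUG drop-in sharing one name.
proxy = '[Service]\nEnvironment="HTTP_PROXY=http://proxy.example:3128"\n'
godebug = '[Service]\nEnvironment="GODEBUG=x509ignoreCN=0,madvdontneed=1"\n'
dropins = [
    ("10-mco-default-env.conf", proxy),
    ("10-mco-default-env.conf", godebug),
]

rendered = render_dropins(dropins)
collisions = find_name_collisions(dropins)
```

Under that assumption, rendered holds only the GODEBUG contents, which matches the file seen on the master in the description. Giving each drop-in its own file name, which is what the description of 2365 and 2378 suggests, would avoid the collision.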
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.38 bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2641
I dunno if we got comment 5's requested impact statement here, but we'd tombstoned 4.6.37 on this bug back in July [1], so no need for an impact statement now.

[1]: https://github.com/openshift/cincinnati-graph-data/pull/902#event-4965638902