Description of problem:
nodeip-configuration.service contains invalid bash which borked OKD UPI cluster install.

Version-Release number of selected component (if applicable):
4.6.0-0.okd-2020-11-03-123207

How reproducible:
always

Steps to Reproduce:
1. install OKD with UPI
2.
3.

Actual results:
[root@master-0 ~]# systemctl status nodeip-configuration
● nodeip-configuration.service - Writes IP address configuration so that kubelet and crio services select a valid node IP
     Loaded: loaded (/etc/systemd/system/nodeip-configuration.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Wed 2020-11-04 08:22:50 UTC; 22min ago
    Process: 980 ExecStart=/bin/bash -c until /usr/bin/podman run --rm --authfile /var/lib/kubelet/config.json --net=host --volume /etc/systemd/system:/etc/systemd/system:z registry.svc.ci.openshift.org/origin/4.6-2020>
   Main PID: 980 (code=exited, status=1/FAILURE)
        CPU: 4ms

lis 04 08:22:50 master-0 systemd[1]: Starting Writes IP address configuration so that kubelet and crio services select a valid node IP...
lis 04 08:22:50 master-0 bash[980]: /bin/bash: -c: line 0: syntax error near unexpected token `done'
lis 04 08:22:50 master-0 bash[980]: /bin/bash: -c: line 0: ` until /usr/bin/podman run --rm --authfile /var/lib/kubelet/config.json --net=host --volume /etc/systemd/system:/etc/systemd/system:z registry.svc.ci.openshift>
lis 04 08:22:50 master-0 systemd[1]: nodeip-configuration.service: Main process exited, code=exited, status=1/FAILURE
lis 04 08:22:50 master-0 systemd[1]: nodeip-configuration.service: Failed with result 'exit-code'.
lis 04 08:22:50 master-0 systemd[1]: Failed to start Writes IP address configuration so that kubelet and crio services select a valid node IP.

Expected results:
success

Additional info:
Verified the fix is in 4.7.0-0.nightly-2020-11-18-125028.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2020-11-18-125028   True        False         38m     Cluster version is 4.7.0-0.nightly-2020-11-18-125028

$ oc debug node/ip-10-0-147-250.us-west-2.compute.internal
Starting pod/ip-10-0-147-250us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# cat /etc/systemd/system/nodeip-configuration.service
[Unit]
Description=Writes IP address configuration so that kubelet and crio services select a valid node IP
Wants=network-online.target
After=network-online.target ignition-firstboot-complete.service
Before=kubelet.service crio.service

[Service]
# Need oneshot to delay kubelet
Type=oneshot
# Would prefer to do Restart=on-failure instead of this bash retry loop, but
# the version of systemd we have right now doesn't support it. It should be
# available in systemd v244 and higher.
ExecStart=/bin/bash -c " \
  until \
  /usr/bin/podman run --rm \
  --authfile /var/lib/kubelet/config.json \
  --net=host \
  --volume /etc/systemd/system:/etc/systemd/system:z \
  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:de61a4e12d0893b90aaf170b28eca51504d94be9724b91ac27deed88001d54ee \
  node-ip \
  set --retry-on-failure; \
  do \
  sleep 5; \
  done"

[Install]
RequiredBy=kubelet.service

sh-4.4# systemctl status nodeip-configuration.service
● nodeip-configuration.service - Writes IP address configuration so that kubelet and crio services sele>
   Loaded: loaded (/etc/systemd/system/nodeip-configuration.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
sh-4.4# systemctl start nodeip-configuration.service
sh-4.4#
sh-4.4#
sh-4.4# systemctl status nodeip-configuration.service
● nodeip-configuration.service - Writes IP address configuration so that kubelet and crio services sele>
   Loaded: loaded (/etc/systemd/system/nodeip-configuration.service; disabled; vendor preset: disabled)
   Active: inactive (dead)

Nov 18 21:08:05 ip-10-0-147-250 bash[84158]: Writing manifest to image destination
Nov 18 21:08:05 ip-10-0-147-250 bash[84158]: Storing signatures
Nov 18 21:08:07 ip-10-0-147-250 bash[84158]: time="2020-11-18T21:08:07Z" level=info msg="Address 10.0.1>
Nov 18 21:08:07 ip-10-0-147-250 bash[84158]: time="2020-11-18T21:08:07Z" level=info msg="Chosen Node IP>
Nov 18 21:08:07 ip-10-0-147-250 bash[84158]: time="2020-11-18T21:08:07Z" level=info msg="Opening Kubele>
Nov 18 21:08:07 ip-10-0-147-250 bash[84158]: time="2020-11-18T21:08:07Z" level=info msg="Writing Kubele>
Nov 18 21:08:07 ip-10-0-147-250 bash[84158]: time="2020-11-18T21:08:07Z" level=info msg="Opening CRI-O >
Nov 18 21:08:07 ip-10-0-147-250 bash[84158]: time="2020-11-18T21:08:07Z" level=info msg="Writing CRI-O >
Nov 18 21:08:08 ip-10-0-147-250 systemd[1]: Started Writes IP address configuration so that kubelet and>
Nov 18 21:08:08 ip-10-0-147-250 systemd[1]: nodeip-configuration.service: Consumed 7.990s CPU time
sh-4.4# exit
exit
sh-4.2# exit
exit

Removing debug pod ...
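For anyone hitting the same "syntax error near unexpected token `done'" message, here is a minimal sketch of the retry pattern the fixed unit uses, plus a parse-only check with `bash -n`. The `flaky_command` function is a hypothetical stand-in for the podman invocation, and the malformed loop at the end (missing `do`) is only one illustrative way to trigger this class of error; the report truncates the broken script, so it is not necessarily what the 4.6 unit contained.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the podman call: fails twice, then succeeds.
attempts=0
flaky_command() {
    attempts=$((attempts + 1))
    [ "$attempts" -ge 3 ]
}

# Same shape as the ExecStart loop in the fixed unit: retry until success.
until flaky_command; do
    sleep 1   # the unit sleeps 5 seconds between attempts
done
echo "succeeded after $attempts attempts"

# bash -n parses the script without executing it, so a malformed loop
# can be caught before it ends up in a unit file.
if bash -n -c 'until true; do sleep 1; done'; then
    echo "well-formed loop parses"
fi
if ! bash -n -c 'until true; done' 2>/dev/null; then
    echo "malformed loop (missing do) is rejected"
fi
```

Running the same `bash -n -c '...'` check against the string passed to ExecStart should catch this kind of breakage before the unit is shipped.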
There are no accepted builds for 4.7 nightly right now. Looking at the release payload, the MCO is at a commit that contains the fix.

$ mco-commit registry.svc.ci.openshift.org/origin/release:4.7.0-0.okd-2020-11-18-131704
Name:        registry.svc.ci.openshift.org/origin/4.7-2020-11-18-131704@sha256:c2e191d9e3cfd7338809dbcee745016dbc85748efb567d5eaf33051f5503bba3
Media Type:  application/vnd.docker.distribution.manifest.v2+json
Created:     9h ago
Image Size:  143.1MB in 6 layers
Layers:      73.86MB sha256:ec1681b6a383e4ecedbeddd5abc596f3de835aed6db39a735f62395c8edbff30
             1.789kB sha256:c4d668e229cd131e0a8e4f8218dca628d9cf9697572875e355fe4b247b6aa9f0
             4.468MB sha256:93495cb22058662a790cab6ce01d962573f9c7ba350ac36304cc22380294ea38
             447kB   sha256:3e2f47dba162e3f140c4104888167c7b39a56ace6c866c39a757e8020d7fb984
             12.38MB sha256:506373d67132f73bea88e6e527ee035b0140345f7d785e5e150ac9651a593467
             51.91MB sha256:64af0388a8325b520a869989e6bf677e25880c6897b7e508a1688e3c23b2a0aa
OS:          linux
Arch:        amd64
Entrypoint:  /usr/bin/machine-config-operator
User:        0
Environment: OPENSHIFT_CI=true
             PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
             container=oci
             GODEBUG=x509ignoreCN=0
             foo=bar
             OPENSHIFT_BUILD_NAME=machine-config-operator
             OPENSHIFT_BUILD_NAMESPACE=ci-op-ptw2jkjj
Labels:      architecture=x86_64
             build-date=2020-10-17T00:38:27.192916
             com.redhat.build-host=cpt-1002.osbs.prod.upshift.rdu2.redhat.com
             com.redhat.component=openshift-enterprise-base-container
             com.redhat.license_terms=https://www.redhat.com/agreements
             description=The Universal Base Image is designed and engineered to be the base layer for all of your containerized applications, middleware and utilities. This base image is freely redistributable, but Red Hat only supports Red Hat technologies through subscriptions for Red Hat products. This image is maintained by Red Hat and updated regularly.
             distribution-scope=public
             io.buildah.version=1.16.4
             io.k8s.description=This is the base image from which all OpenShift images inherit.
             io.k8s.display-name=OpenShift Base
             io.openshift.build.commit.author=
             io.openshift.build.commit.date=
             io.openshift.build.commit.id=e02ed51beb57f64bd8a17e2bddf5f5a37af2669f
             io.openshift.build.commit.message=
             io.openshift.build.commit.ref=master
             io.openshift.build.name=
             io.openshift.build.namespace=
             io.openshift.build.source-context-dir=
             io.openshift.build.source-location=https://github.com/openshift/machine-config-operator
             io.openshift.expose-services=
             io.openshift.release.operator=true
             io.openshift.tags=base rhel8
             maintainer=Red Hat, Inc.
             name=openshift/ose-base
             release=202010170037.13635
             summary=Provides the latest release of Red Hat Universal Base Image 8.
             url=https://access.redhat.com/containers/#/registry.access.redhat.com/openshift/ose-base/images/v4.0-202010170037.13635
             vcs-ref=e02ed51beb57f64bd8a17e2bddf5f5a37af2669f
             vcs-type=git
             vcs-url=https://github.com/openshift/machine-config-operator
             vendor=Red Hat, Inc.
             version=v4.0
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633