Bug 1688321
| Summary: | Unhealthy status of machineconfiguration.openshift.io/state=Degraded | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Oleg Nesterov <olnester> |
| Component: | RHCOS | Assignee: | Steve Milner <smilner> |
| Status: | CLOSED WONTFIX | QA Contact: | Micah Abbott <miabbott> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 4.1.0 | CC: | bbreard, dustymabe, imcleod, jialiu, jligon, nstielau, sponnaga, wking |
| Target Milestone: | --- | | |
| Target Release: | 4.1.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-03-14 18:47:01 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description (Oleg Nesterov, 2019-03-13 14:35:08 UTC)
I hit a similar issue in my cluster (47.330 + 4.0.0-0.nightly-2019-03-04-234414):

```
# oc logs machine-config-daemon-kl5f5 -n openshift-machine-config-operator
I0313 09:16:14.520551 21098 start.go:52] Version: 4.0.15-1-dirty
I0313 09:16:14.520825 21098 start.go:88] starting node writer
I0313 09:16:14.526389 21098 run.go:22] Running captured: chroot /rootfs rpm-ostree status --json
I0313 09:16:14.596006 21098 daemon.go:175] Booted osImageURL: registry.svc.ci.openshift.org/rhcos/maipo@sha256:1262533e31a427917f94babeef2774c98373409897863ae742ff04120f32f79b (47.330)
I0313 09:16:14.596812 21098 daemon.go:247] Managing node: ip-10-0-151-157.us-east-2.compute.internal
I0313 09:16:14.621930 21098 node.go:39] Setting initial node config: master-1fd62473a6bce230df3d90a1f6109081
I0313 09:16:14.644060 21098 start.go:146] Calling chroot("/rootfs")
I0313 09:16:14.644095 21098 run.go:22] Running captured: rpm-ostree status
I0313 09:16:14.671146 21098 daemon.go:577] State: idle
AutomaticUpdates: disabled
Deployments:
* pivot://registry.svc.ci.openshift.org/rhcos/maipo@sha256:1262533e31a427917f94babeef2774c98373409897863ae742ff04120f32f79b
              CustomOrigin: Provisioned from oscontainer
                   Version: 47.330 (2019-02-23T04:17:13Z)
I0313 09:16:14.671197 21098 daemon.go:477] In bootstrap mode
I0313 09:16:14.683197 21098 daemon.go:505] Current+desired config: master-1fd62473a6bce230df3d90a1f6109081
I0313 09:16:14.686404 21098 daemon.go:609] Bootstrap pivot required to: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:399582f711226ab1a0e76d8928ec55436dea9f8dc60976c10790d308b9d92181
I0313 09:16:14.686534 21098 update.go:674] Updating OS to quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:399582f711226ab1a0e76d8928ec55436dea9f8dc60976c10790d308b9d92181
pivot.service: I0313 09:16:20.041972 21151 run.go:16] Running: skopeo inspect docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:399582f711226ab1a0e76d8928ec55436dea9f8dc60976c10790d308b9d92181
pivot.service: time="2019-03-13T09:16:20Z" level=fatal msg="Error reading manifest sha256:399582f711226ab1a0e76d8928ec55436dea9f8dc60976c10790d308b9d92181 in quay.io/openshift-release-dev/ocp-v4.0-art-dev: unauthorized: access to the requested resource is not authorized"
pivot.service: W0313 09:16:20.390391 21151 run.go:45] skopeo failed: exit status 1; retrying...
pivot.service: I0313 09:16:30.390660 21151 run.go:16] Running: skopeo inspect docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:399582f711226ab1a0e76d8928ec55436dea9f8dc60976c10790d308b9d92181
pivot.service: time="2019-03-13T09:16:30Z" level=fatal msg="Error reading manifest sha256:399582f711226ab1a0e76d8928ec55436dea9f8dc60976c10790d308b9d92181 in quay.io/openshift-release-dev/ocp-v4.0-art-dev: unauthorized: access to the requested resource is not authorized"
pivot.service: W0313 09:16:30.693978 21151 run.go:45] skopeo failed: exit status 1; retrying...
pivot.service: I0313 09:16:50.694228 21151 run.go:16] Running: skopeo inspect docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:399582f711226ab1a0e76d8928ec55436dea9f8dc60976c10790d308b9d92181
pivot.service: time="2019-03-13T09:16:50Z" level=fatal msg="Error reading manifest sha256:399582f711226ab1a0e76d8928ec55436dea9f8dc60976c10790d308b9d92181 in quay.io/openshift-release-dev/ocp-v4.0-art-dev: unauthorized: access to the requested resource is not authorized"
pivot.service: W0313 09:16:50.995356 21151 run.go:45] skopeo failed: exit status 1; retrying...
pivot.service: I0313 09:17:30.995530 21151 run.go:16] Running: skopeo inspect docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:399582f711226ab1a0e76d8928ec55436dea9f8dc60976c10790d308b9d92181
pivot.service: time="2019-03-13T09:17:31Z" level=fatal msg="Error reading manifest sha256:399582f711226ab1a0e76d8928ec55436dea9f8dc60976c10790d308b9d92181 in quay.io/openshift-release-dev/ocp-v4.0-art-dev: unauthorized: access to the requested resource is not authorized"
pivot.service: W0313 09:17:31.315975 21151 run.go:45] skopeo failed: exit status 1; retrying...
pivot.service: I0313 09:18:51.316209 21151 run.go:16] Running: skopeo inspect docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:399582f711226ab1a0e76d8928ec55436dea9f8dc60976c10790d308b9d92181
pivot.service: time="2019-03-13T09:18:51Z" level=fatal msg="Error reading manifest sha256:399582f711226ab1a0e76d8928ec55436dea9f8dc60976c10790d308b9d92181 in quay.io/openshift-release-dev/ocp-v4.0-art-dev: unauthorized: access to the requested resource is not authorized"
pivot.service: W0313 09:18:51.664982 21151 run.go:45] skopeo failed: exit status 1; retrying...
pivot.service: F0313 09:18:51.665013 21151 run.go:53] skopeo: timed out waiting for the condition
E0313 09:18:51.674651 21098 daemon.go:446] Fatal error checking initial state of node: Checking initial state: Failed to run pivot: error queuing start job; got failed
E0313 09:18:51.674908 21098 writer.go:91] Marking degraded due to: Checking initial state: Failed to run pivot: error queuing start job; got failed
I0313 09:18:51.717659 21098 daemon.go:448] Entering degraded state; going to sleep
```

This puts every machine into the `Degraded` state and leaves the cluster unhealthy.
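The retry cadence in the pivot log above (attempts roughly 10 s, 20 s, 40 s, and 80 s apart, then a fatal "timed out waiting for the condition") is a doubling backoff. A minimal sketch of that pattern in Python; the function name, the parameter defaults, and the injectable `sleep` hook are illustrative assumptions, not pivot's actual implementation:

```python
import time


def retry_with_backoff(op, initial=10.0, factor=2.0, max_attempts=5, sleep=time.sleep):
    """Retry `op` until it succeeds, doubling the wait between attempts.

    Mirrors the doubling cadence visible in the pivot log above; after
    `max_attempts` failures it raises TimeoutError, analogous to pivot's
    fatal "timed out waiting for the condition".
    """
    delay = initial
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except Exception as exc:
            if attempt == max_attempts:
                raise TimeoutError("timed out waiting for the condition") from exc
            print(f"attempt {attempt} failed: {exc}; retrying in {delay:.0f}s...")
            sleep(delay)
            delay *= factor
```

Passing a recording function as `sleep` (instead of the real `time.sleep`) makes the cadence visible without waiting: an operation that always raises produces waits of 10, 20, 40, and 80 seconds before the TimeoutError, matching the timestamps in the log.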
```
# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-04-234414   True        False         7h9m    Error while reconciling 4.0.0-0.nightly-2019-03-04-234414: the cluster operator machine-config is failing
```

W. Trevor King (comment #3):

> pivot.service: I0313 09:17:30.995530 21151 run.go:16] Running: skopeo inspect docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:399582f711226ab1a0e76d8928ec55436dea9f8dc60976c10790d308b9d92181
> pivot.service: time="2019-03-13T09:17:31Z" level=fatal msg="Error reading manifest sha256:399582f711226ab1a0e76d8928ec55436dea9f8dc60976c10790d308b9d92181 in quay.io/openshift-release-dev/ocp-v4.0-art-dev: unauthorized: access to the requested resource is not authorized"

Sounds like a pull-secret issue. Maybe like bug 1686556, where non-kubelet tooling isn't looking in the right place for the pull secret.

> I0313 09:16:14.596006 21098 daemon.go:175] Booted osImageURL: registry.svc.ci.openshift.org/rhcos/maipo@sha256:1262533e31a427917f94babeef2774c98373409897863ae742ff04120f32f79b (47.330)

RHCOS 47.330 is too old. Try running with a newer RHCOS (I haven't looked up exactly which version pivot#41 landed in, though).
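The "right place for the pull secret" question concerns Docker-style credential files: the pull secret is a `config.json`-style document whose `auths` map keys registry hosts to a base64-encoded `user:password` string, and tooling has to read the same file the kubelet does. A sketch of the lookup such tooling performs; the sample credentials below are invented for illustration:

```python
import base64
import json


def creds_for_registry(pull_secret: str, registry: str):
    """Return (user, password) for `registry` from a Docker-style
    pull-secret document, or None if no entry matches the host."""
    auths = json.loads(pull_secret).get("auths", {})
    entry = auths.get(registry)
    if entry is None:
        return None
    # The "auth" field is base64("user:password").
    user, _, password = base64.b64decode(entry["auth"]).decode().partition(":")
    return user, password


# Hypothetical pull secret containing only a quay.io entry:
secret = json.dumps({
    "auths": {
        "quay.io": {"auth": base64.b64encode(b"bot:s3cret").decode()},
    }
})

print(creds_for_registry(secret, "quay.io"))             # ('bot', 's3cret')
print(creds_for_registry(secret, "registry.redhat.io"))  # None
```

A tool that reads a different path, or a file missing the entry for `quay.io/openshift-release-dev`, gets `None` here, which is exactly the "unauthorized: access to the requested resource is not authorized" failure mode in the log.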
(In reply to W. Trevor King from comment #3)

> > I0313 09:16:14.596006 21098 daemon.go:175] Booted osImageURL: registry.svc.ci.openshift.org/rhcos/maipo@sha256:1262533e31a427917f94babeef2774c98373409897863ae742ff04120f32f79b (47.330)
>
> RHCOS 47.330 is too old. Try running with a newer RHCOS (I haven't looked up exactly which version pivot#41 landed in though).

Just like the initial report, RHCOS 400.7.20190312.0 does not hit this error. This issue probably would not occur with a newer RHCOS, but after talking with Oleg, we can still reproduce this bug with the released beta2. How can it be fixed for the released beta2? As mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1677198#c8, QE is running beta2 released version -> beta3 pre-release candidate upgrade testing, and this issue is a serious problem for that testing.
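The state in the bug summary is surfaced as a per-node annotation, `machineconfiguration.openshift.io/state`, which the machine-config daemon sets to `Degraded` when it gives up (as in the "Entering degraded state" log line above). A sketch of filtering a node list, shaped like `oc get nodes -o json` output, for degraded nodes; the node names and the `Done` value in the sample are invented for illustration:

```python
STATE_ANNOTATION = "machineconfiguration.openshift.io/state"


def degraded_nodes(node_list: dict):
    """Return names of nodes whose MCD state annotation is Degraded."""
    return [
        item["metadata"]["name"]
        for item in node_list.get("items", [])
        if item["metadata"].get("annotations", {}).get(STATE_ANNOTATION) == "Degraded"
    ]


# Invented example shaped like `oc get nodes -o json` output:
nodes = {
    "items": [
        {"metadata": {"name": "ip-10-0-151-157.us-east-2.compute.internal",
                      "annotations": {STATE_ANNOTATION: "Degraded"}}},
        {"metadata": {"name": "ip-10-0-140-10.us-east-2.compute.internal",
                      "annotations": {STATE_ANNOTATION: "Done"}}},
    ]
}

print(degraded_nodes(nodes))  # ['ip-10-0-151-157.us-east-2.compute.internal']
```

In the failure described in this bug, every node would appear in that list, which is why the machine-config cluster operator reports the whole cluster as failing.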