Description of problem:

Cannot install new clusters due to the following error:

DEBUG Still waiting for the cluster to initialize: Cluster operator machine-config is reporting a failure: Failed to resync 4.1.0-rc.0 because: error pool master is not ready, retrying. Status: (total: 3, updated: 0, unavailable: 0)

Version-Release number of the following components:

./openshift-install v4.1.0-201904211700-dirty
built from commit f3b726cc151f5a3d66bc7e23e81b3013f1347a7e
release image quay.io/openshift-release-dev/ocp-release@sha256:345ec9351ecc1d78c16cf0853fe0ef2d9f48dd493da5fdffc18fa18f45707867

How reproducible:

Attempt to perform an install. I'm using us-east-1.

Steps to Reproduce:
1. Attempt to deploy.
2. Wait to see if machine-config is successful.

Actual results:

DEBUG Still waiting for the cluster to initialize: Working towards 4.1.0-rc.0: 78% complete
DEBUG Still waiting for the cluster to initialize: Working towards 4.1.0-rc.0: 89% complete
DEBUG Still waiting for the cluster to initialize: Working towards 4.1.0-rc.0: 91% complete
DEBUG Still waiting for the cluster to initialize: Working towards 4.1.0-rc.0: 95% complete
DEBUG Still waiting for the cluster to initialize: Multiple errors are preventing progress:
* Cluster operator authentication is still updating: missing version information for integrated-oauth-server
* Cluster operator machine-config is reporting a failure: Failed to resync 4.1.0-rc.0 because: error pool master is not ready, retrying. Status: (total: 3, updated: 0, unavailable: 0)
* Cluster operator monitoring is still updating
* Cluster operator openshift-samples is still updating
DEBUG Still waiting for the cluster to initialize: Working towards 4.1.0-rc.0: 98% complete
DEBUG Still waiting for the cluster to initialize: Working towards 4.1.0-rc.0: 99% complete
DEBUG Still waiting for the cluster to initialize: Cluster operator machine-config is reporting a failure: Failed to resync 4.1.0-rc.0 because: error pool master is not ready, retrying. Status: (total: 3, updated: 0, unavailable: 0)
DEBUG Still waiting for the cluster to initialize: Cluster operator machine-config is reporting a failure: Failed to resync 4.1.0-rc.0 because: error pool master is not ready, retrying. Status: (total: 3, updated: 0, unavailable: 0)

Expected results:

A clean install.

Additional info:

This also happened with the 0.16.1 installer and the 4.0.0-0.9 release.
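For live-cluster triage of a "pool master is not ready" stall like this, a minimal sketch of the checks that usually narrow it down (assumes admin access with oc, and that the machine-config daemonset carries its usual k8s-app=machine-config-daemon label; nothing here is specific to this report):

  # Which pool is stuck, and why? The blocking condition is in .status.conditions.
  $ oc get machineconfigpool master -o yaml
  $ oc get clusteroperator machine-config -o yaml

  # Per-node progress as recorded by the machine-config daemon (Working/Done/Degraded).
  $ oc get nodes -o yaml | grep 'machineconfiguration.openshift.io/state'

  # Operator and per-node daemon logs (add -c machine-config-daemon if prompted for a container).
  $ oc -n openshift-machine-config-operator logs deployment/machine-config-operator
  $ oc -n openshift-machine-config-operator logs -l k8s-app=machine-config-daemon --tail=50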
Created attachment 1557952 [details] must gather results
*** Bug 1702518 has been marked as a duplicate of this bug. ***
Linking possibly-related discussion in bug 1671816, bug 1677198, bug 1695721, and bug 1701409 (although they have total: 3, updated: 0, unavailable: 1).
Poking around in comment 1's must-gather, other operators look happy:

$ yaml2json <cluster-scoped-resources/config.openshift.io/clusteroperators.yaml | jq -r '.items[] | (.status.conditions[] | select(.type == "Available").status) + "\t" + (.status.versions[] | select(.name == "operator").version) + "\t" + .metadata.name'
True   4.1.0-rc.0  authentication
True   4.1.0-rc.0  cloud-credential
True   4.1.0-rc.0  cluster-autoscaler
True   4.1.0-rc.0  console
True   4.1.0-rc.0  dns
True   4.1.0-rc.0  image-registry
True   4.1.0-rc.0  ingress
True   4.1.0-rc.0  kube-apiserver
True   4.1.0-rc.0  kube-controller-manager
True   4.1.0-rc.0  kube-scheduler
True   4.1.0-rc.0  machine-api
False  4.1.0-rc.0  machine-config
True   4.1.0-rc.0  marketplace
True   4.1.0-rc.0  monitoring
True   4.1.0-rc.0  network
True   4.1.0-rc.0  node-tuning
True   4.1.0-rc.0  openshift-apiserver
True   4.1.0-rc.0  openshift-controller-manager
True   4.1.0-rc.0  openshift-samples
True   4.1.0-rc.0  operator-lifecycle-manager
True   4.1.0-rc.0  operator-lifecycle-manager-catalog
True   4.1.0-rc.0  service-ca
True   4.1.0-rc.0  service-catalog-apiserver
True   4.1.0-rc.0  service-catalog-controller-manager
True   4.1.0-rc.0  storage

Time when we started trying to remove the bootstrap crutch:

$ yaml2json <namespaces/kube-system/core/configmaps.yaml | jq -r '.items[] | select(.metadata.name == "bootstrap").metadata.creationTimestamp'
2019-04-24T03:07:12Z

Times for those logs:

$ cat namespaces/openshift-machine-config-operator/pods/machine-config-operator-f868ddcc8-pr64p/machine-config-operator/machine-config-operator/logs/current.log
2019-04-24T03:04:19.520498032Z I0424 03:04:19.520248 1 start.go:42] Version: 4.1.0-201904211700-dirty
...
2019-04-24T03:04:55.445250004Z I0424 03:04:55.445183 1 sync.go:56] Initialization complete
2019-04-24T03:07:24.909684422Z W0424 03:07:24.909172 1 reflector.go:270] github.com/openshift/machine-config-operator/pkg/generated/informers/externalversions/factory.go:101: watch of *v1.MachineConfig ended with: too old resource version: 3112 (4473)
2019-04-24T03:07:24.909762056Z W0424 03:07:24.909629 1 reflector.go:270] k8s.io/apiextensions-apiserver/pkg/client/informers/externalversions/factory.go:117: watch of *v1beta1.CustomResourceDefinition ended with: too old resource version: 2283 (4016)
2019-04-24T03:07:24.909803993Z W0424 03:07:24.909709 1 reflector.go:270] k8s.io/client-go/informers/factory.go:132: watch of *v1.ServiceAccount ended with: too old resource version: 3178 (4020)
2019-04-24T03:07:24.909922814Z W0424 03:07:24.909887 1 reflector.go:270] k8s.io/client-go/informers/factory.go:132: watch of *v1.DaemonSet ended with: too old resource version: 3365 (4031)
2019-04-24T03:07:24.910079245Z W0424 03:07:24.910051 1 reflector.go:270] k8s.io/client-go/informers/factory.go:132: watch of *v1.Deployment ended with: too old resource version: 2931 (4031)
2019-04-24T03:07:24.941899618Z W0424 03:07:24.941869 1 reflector.go:270] github.com/openshift/machine-config-operator/pkg/generated/informers/externalversions/factory.go:101: watch of *v1.MachineConfigPool ended with: too old resource version: 3416 (4473)
2019-04-24T03:07:24.996911626Z W0424 03:07:24.996874 1 reflector.go:270] github.com/openshift/client-go/config/informers/externalversions/factory.go:101: watch of *v1.Infrastructure ended with: too old resource version: 1700 (5257)
2019-04-24T03:07:25.017919495Z W0424 03:07:25.015938 1 reflector.go:270] github.com/openshift/client-go/config/informers/externalversions/factory.go:101: watch of *v1.Network ended with: too old resource version: 1573 (5257)
2019-04-24T03:07:25.154733485Z W0424 03:07:25.154690 1 reflector.go:270] github.com/openshift/machine-config-operator/pkg/generated/informers/externalversions/factory.go:101: watch of *v1.ControllerConfig ended with: too old resource version: 2936 (5258)
2019-04-24T03:08:59.200733656Z E0424 03:08:59.200068 1 operator.go:279] error pool master is not ready, retrying. Status: (total: 3, updated: 0, unavailable: 0)
...
2019-04-24T03:22:56.025285931Z E0424 03:22:56.025237 1 operator.go:279] error pool master is not ready, retrying. Status: (total: 3, updated: 0, unavailable: 0)

So things start to go bad a few seconds after we get into bootstrap teardown. This suggests the production control plane was not as ready as cluster-bootstrap thought. Poking at the Kubernetes API-server operator:

$ tail -n5 namespaces/openshift-kube-apiserver-operator/pods/kube-apiserver-operator-7fccf67d7b-55skm/operator/operator/logs/previous.log
2019-04-24T03:05:38.198377996Z W0424 03:05:38.198365 1 builder.go:108] Restart triggered because of file /var/run/secrets/serving-cert/tls.crt was created
2019-04-24T03:05:38.198450313Z I0424 03:05:38.198426 1 observer_polling.go:78] Observed change: file:/var/run/secrets/serving-cert/tls.key (current: "5bd1e7ba20a29623886f148284bcc9447886755b3f277c331cf88f1ba6021f64", lastKnown: "")
2019-04-24T03:05:38.198494543Z F0424 03:05:38.198474 1 leaderelection.go:65] leaderelection lost
2019-04-24T03:05:38.215799103Z I0424 03:05:38.198582 1 node_controller.go:134] Shutting down NodeController
2019-04-24T03:05:38.215799103Z F0424 03:05:38.207396 1 builder.go:217] server exited

$ cat namespaces/openshift-kube-apiserver-operator/pods/kube-apiserver-operator-7fccf67d7b-55skm/operator/operator/logs/current.log
2019-04-24T03:05:42.773262543Z I0424 03:05:42.773141 1 cmd.go:138] Using service-serving-cert provided certificates
...
2019-04-24T03:05:43.262565928Z I0424 03:05:43.262521 1 leaderelection.go:205] attempting to acquire leader lease openshift-kube-apiserver-operator/kube-apiserver-operator-lock...
2019-04-24T03:07:50.430738178Z I0424 03:07:50.430189 1 leaderelection.go:214] successfully acquired lease openshift-kube-apiserver-operator/kube-apiserver-operator-lock
2019-04-24T03:07:50.430922075Z I0424 03:07:50.430884 1 event.go:221] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"openshift-kube-apiserver-operator", Name:"kube-apiserver-operator-lock", UID:"a9c8df6d-663d-11e9-a0d9-0e347a4e8faa", APIVersion:"v1", ResourceVersion:"5839", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' d9159df0-663d-11e9-bc7f-0a580a820007 became leader
2019-04-24T03:07:50.449530023Z I0424 03:07:50.449500 1 certrotationcontroller.go:452] Starting CertRotation
2019-04-24T03:07:50.44970853Z I0424 03:07:50.449676 1 targetconfigcontroller.go:315] Starting TargetConfigController

Ok, 7:50 is a bit after 7:12, but this still feels like "picking up after the bootstrap operator went away".

...
2019-04-24T03:07:50.552327567Z I0424 03:07:50.552306 1 prune_controller.go:335] No excluded revisions to prune, skipping
2019-04-24T03:07:50.552649463Z I0424 03:07:50.552599 1 event.go:221] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-apiserver-operator", Name:"kube-apiserver-operator", UID:"901b0c8f-663d-11e9-a0d9-0e347a4e8faa", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'LoglevelChange' Changed loglevel level to "2"
2019-04-24T03:07:52.264619167Z E0424 03:07:52.264568 1 monitoring_resource_controller.go:182] key failed with : the server could not find the requested resource
2019-04-24T03:07:52.264797632Z I0424 03:07:52.264752 1 event.go:221] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-apiserver-operator", Name:"kube-apiserver-operator", UID:"901b0c8f-663d-11e9-a0d9-0e347a4e8faa", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'ServiceMonitorCreateFailed' Failed to create ServiceMonitor.monitoring.coreos.com/v1: the server could not find the requested resource

No monitor types this early. Probably ok. Summarizing revision transitions:

$ grep -A6 NodeStatus namespaces/openshift-kube-apiserver-operator/pods/kube-apiserver-operator-7fccf67d7b-55skm/operator/operator/logs/current.log | sed 's/^/ /'
  2019-04-24T03:07:54.247361356Z I0424 03:07:54.247313 1 installer_controller.go:305] "ip-10-1-146-153.ec2.internal" moving to (v1.NodeStatus) {
  2019-04-24T03:07:54.247361356Z  NodeName: (string) (len=28) "ip-10-1-146-153.ec2.internal",
  2019-04-24T03:07:54.247361356Z  CurrentRevision: (int32) 1,
  2019-04-24T03:07:54.247361356Z  TargetRevision: (int32) 0,
  2019-04-24T03:07:54.247361356Z  LastFailedRevision: (int32) 0,
  2019-04-24T03:07:54.247361356Z  LastFailedRevisionErrors: ([]string) <nil>
  2019-04-24T03:07:54.247361356Z }
  --
  2019-04-24T03:07:56.447392056Z I0424 03:07:56.447344 1 installer_controller.go:305] "ip-10-1-146-153.ec2.internal" moving to (v1.NodeStatus) {
  2019-04-24T03:07:56.447392056Z  NodeName: (string) (len=28) "ip-10-1-146-153.ec2.internal",
  2019-04-24T03:07:56.447392056Z  CurrentRevision: (int32) 1,
  2019-04-24T03:07:56.447392056Z  TargetRevision: (int32) 0,
  2019-04-24T03:07:56.447392056Z  LastFailedRevision: (int32) 0,
  2019-04-24T03:07:56.447392056Z  LastFailedRevisionErrors: ([]string) <nil>
  2019-04-24T03:07:56.447392056Z }
  --
  2019-04-24T03:07:58.448236954Z I0424 03:07:58.448188 1 installer_controller.go:346] "ip-10-1-138-229.ec2.internal" moving to (v1.NodeStatus) {
  2019-04-24T03:07:58.448236954Z  NodeName: (string) (len=28) "ip-10-1-138-229.ec2.internal",
  2019-04-24T03:07:58.448236954Z  CurrentRevision: (int32) 0,
  2019-04-24T03:07:58.448236954Z  TargetRevision: (int32) 1,
  2019-04-24T03:07:58.448236954Z  LastFailedRevision: (int32) 0,
  2019-04-24T03:07:58.448236954Z  LastFailedRevisionErrors: ([]string) <nil>
  2019-04-24T03:07:58.448236954Z }
  --
  2019-04-24T03:08:31.204416455Z I0424 03:08:31.204361 1 installer_controller.go:305] "ip-10-1-138-229.ec2.internal" moving to (v1.NodeStatus) {
  2019-04-24T03:08:31.204416455Z  NodeName: (string) (len=28) "ip-10-1-138-229.ec2.internal",
  2019-04-24T03:08:31.204416455Z  CurrentRevision: (int32) 1,
  2019-04-24T03:08:31.204416455Z  TargetRevision: (int32) 0,
  2019-04-24T03:08:31.204416455Z  LastFailedRevision: (int32) 0,
  2019-04-24T03:08:31.204416455Z  LastFailedRevisionErrors: ([]string) <nil>
  2019-04-24T03:08:31.204416455Z }
  --
  2019-04-24T03:08:42.006662213Z I0424 03:08:42.006613 1 installer_controller.go:346] "ip-10-1-171-77.ec2.internal" moving to (v1.NodeStatus) {
  2019-04-24T03:08:42.006662213Z  NodeName: (string) (len=27) "ip-10-1-171-77.ec2.internal",
  2019-04-24T03:08:42.006662213Z  CurrentRevision: (int32) 1,
  2019-04-24T03:08:42.006662213Z  TargetRevision: (int32) 2,
  2019-04-24T03:08:42.006662213Z  LastFailedRevision: (int32) 0,
  2019-04-24T03:08:42.006662213Z  LastFailedRevisionErrors: ([]string) <nil>
  2019-04-24T03:08:42.006662213Z }
  --
  2019-04-24T03:10:07.689588613Z I0424 03:10:07.689549 1 installer_controller.go:305] "ip-10-1-171-77.ec2.internal" moving to (v1.NodeStatus) {
  ...

So that's ip-10-1-146-153 already in revision 1, ip-10-1-138-229 transitioning to revision 1 in ~30s during this bootstrap-teardown window. And then ip-10-1-171-77 showing up still in the window and talking about moving from revision 1 to 2? Dunno what's going on there. But all of these revision rotations mean we have no Kubernetes API-server logs going back to the sensitive time:

$ head -n1 namespaces/openshift-kube-apiserver/pods/kube-apiserver-ip-*/kube-apiserver-*/kube-apiserver-*/logs/*.log | sed 's/^/ /' | sed 's/ *$//'
  ==> namespaces/openshift-kube-apiserver/pods/kube-apiserver-ip-10-1-138-229.ec2.internal/kube-apiserver-5/kube-apiserver-5/logs/current.log <==
  2019-04-24T03:15:20.047764603Z I0424 03:15:20.047619 1 plugins.go:84] Registered admission plugin "NamespaceLifecycle"

  ==> namespaces/openshift-kube-apiserver/pods/kube-apiserver-ip-10-1-138-229.ec2.internal/kube-apiserver-5/kube-apiserver-5/logs/previous.log <==

  ==> namespaces/openshift-kube-apiserver/pods/kube-apiserver-ip-10-1-138-229.ec2.internal/kube-apiserver-cert-syncer-5/kube-apiserver-cert-syncer-5/logs/current.log <==
  2019-04-24T03:15:20.207484722Z I0424 03:15:20.207272 1 observer_polling.go:106] Starting file observer

  ==> namespaces/openshift-kube-apiserver/pods/kube-apiserver-ip-10-1-138-229.ec2.internal/kube-apiserver-cert-syncer-5/kube-apiserver-cert-syncer-5/logs/previous.log <==

  ==> namespaces/openshift-kube-apiserver/pods/kube-apiserver-ip-10-1-146-153.ec2.internal/kube-apiserver-5/kube-apiserver-5/logs/current.log <==
  2019-04-24T03:13:31.952974714Z I0424 03:13:31.952830 1 plugins.go:84] Registered admission plugin "NamespaceLifecycle"

  ==> namespaces/openshift-kube-apiserver/pods/kube-apiserver-ip-10-1-146-153.ec2.internal/kube-apiserver-5/kube-apiserver-5/logs/previous.log <==

  ==> namespaces/openshift-kube-apiserver/pods/kube-apiserver-ip-10-1-146-153.ec2.internal/kube-apiserver-cert-syncer-5/kube-apiserver-cert-syncer-5/logs/current.log <==
  2019-04-24T03:13:32.137985264Z I0424 03:13:32.137796 1 observer_polling.go:106] Starting file observer

  ==> namespaces/openshift-kube-apiserver/pods/kube-apiserver-ip-10-1-146-153.ec2.internal/kube-apiserver-cert-syncer-5/kube-apiserver-cert-syncer-5/logs/previous.log <==

  ==> namespaces/openshift-kube-apiserver/pods/kube-apiserver-ip-10-1-171-77.ec2.internal/kube-apiserver-5/kube-apiserver-5/logs/current.log <==
  2019-04-24T03:11:38.148597366Z I0424 03:11:38.148457 1 plugins.go:84] Registered admission plugin "NamespaceLifecycle"

  ==> namespaces/openshift-kube-apiserver/pods/kube-apiserver-ip-10-1-171-77.ec2.internal/kube-apiserver-5/kube-apiserver-5/logs/previous.log <==

  ==> namespaces/openshift-kube-apiserver/pods/kube-apiserver-ip-10-1-171-77.ec2.internal/kube-apiserver-cert-syncer-5/kube-apiserver-cert-syncer-5/logs/current.log <==
  2019-04-24T03:11:38.978312453Z I0424 03:11:38.978119 1 observer_polling.go:106] Starting file observer

  ==> namespaces/openshift-kube-apiserver/pods/kube-apiserver-ip-10-1-171-77.ec2.internal/kube-apiserver-cert-syncer-5/kube-apiserver-cert-syncer-5/logs/previous.log <==

Nodes are all marked Degraded:

$ for NODE in cluster-scoped-resources/core/nodes/*; do yaml2json <"${NODE}" | jq -r '.metadata | .creationTimestamp + " " + .annotations["machine.openshift.io/machine"] + " " + .annotations["machineconfiguration.openshift.io/state"]'; done | sort
2019-04-24T03:02:56Z openshift-machine-api/east-1-brld6-master-0 Degraded
2019-04-24T03:03:18Z openshift-machine-api/east-1-brld6-master-1 Degraded
2019-04-24T03:03:18Z openshift-machine-api/east-1-brld6-master-2 Degraded
2019-04-24T03:09:22Z openshift-machine-api/east-1-brld6-worker-us-east-1a-77kwr Degraded

and we only have one worker.
$ grep -B1 'Too Many Requests\|Degraded' namespaces/openshift-machine-config-operator/pods/machine-config-daemon-9r8k2/machine-config-daemon/machine-config-daemon/logs/current.log
2019-04-24T03:04:47.841890544Z I0424 03:04:47.431975 4713 run.go:16] Running: skopeo inspect --authfile /var/lib/kubelet/config.json docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b83c73b6ff400b6afe9ff5e079e1195379872a0571c002ac0a01f1c37f47aa8b
2019-04-24T03:04:51.591795606Z time="2019-04-24T03:04:51Z" level=fatal msg="Error determining repository tags: Invalid status code returned when fetching tags list 429 (Too Many Requests)"
--
2019-04-24T03:04:56.591969019Z I0424 03:04:56.241619 4713 run.go:16] Running: skopeo inspect --authfile /var/lib/kubelet/config.json docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b83c73b6ff400b6afe9ff5e079e1195379872a0571c002ac0a01f1c37f47aa8b
2019-04-24T03:04:58.841790389Z time="2019-04-24T03:04:58Z" level=fatal msg="Error determining repository tags: Invalid status code returned when fetching tags list 429 (Too Many Requests)"
--
2019-04-24T03:05:08.841855213Z I0424 03:05:08.531882 4713 run.go:16] Running: skopeo inspect --authfile /var/lib/kubelet/config.json docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b83c73b6ff400b6afe9ff5e079e1195379872a0571c002ac0a01f1c37f47aa8b
2019-04-24T03:05:10.841823636Z time="2019-04-24T03:05:10Z" level=fatal msg="Error determining repository tags: Invalid status code returned when fetching tags list 429 (Too Many Requests)"
--
2019-04-24T03:05:30.841783442Z I0424 03:05:30.515240 4713 run.go:16] Running: skopeo inspect --authfile /var/lib/kubelet/config.json docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b83c73b6ff400b6afe9ff5e079e1195379872a0571c002ac0a01f1c37f47aa8b
2019-04-24T03:05:33.091810207Z time="2019-04-24T03:05:32Z" level=fatal msg="Error determining repository tags: Invalid status code returned when fetching tags list 429 (Too Many Requests)"
--
2019-04-24T03:06:13.091793091Z I0424 03:06:12.791645 4713 run.go:16] Running: skopeo inspect --authfile /var/lib/kubelet/config.json docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b83c73b6ff400b6afe9ff5e079e1195379872a0571c002ac0a01f1c37f47aa8b
2019-04-24T03:06:17.341824502Z time="2019-04-24T03:06:16Z" level=fatal msg="Error determining repository tags: Invalid status code returned when fetching tags list 429 (Too Many Requests)"
--
2019-04-24T03:07:37.34180683Z I0424 03:07:36.934406 4713 run.go:16] Running: skopeo inspect --authfile /var/lib/kubelet/config.json docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b83c73b6ff400b6afe9ff5e079e1195379872a0571c002ac0a01f1c37f47aa8b
2019-04-24T03:07:43.404162266Z E0424 03:07:43.404073 4660 writer.go:119] Marking Degraded due to: failed to run pivot: failed to start pivot.service: exit status 1

Ah, there we go. So this is the recent Quay rate-limit adjustment vs. a maybe overly careful pivot. Assigning to the RHCOS folks about making pivot more robust in the face of registry 429s. Possibly by... I dunno, it doesn't look like it's hot-looping or anything. But hopefully something ;).
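(For future must-gathers it's also worth pulling the pool's own status, since "pool master is not ready" is what the operator is actually reporting. A sketch in the same yaml2json/jq style as above, assuming the pools were collected as a single machineconfigpools.yaml list like the clusteroperators were; adjust the path if they're split into per-pool files:)

$ yaml2json <cluster-scoped-resources/machineconfiguration.openshift.io/machineconfigpools.yaml | jq -r '.items[] | .metadata.name as $pool | .status.conditions[] | $pool + "  " + .type + "=" + .status + "  " + (.message // "")'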
> Ah, there we go. So this is the recent Quay rate-limit adjustment vs. a maybe overly careful pivot. Assigning to the RHCOS folks about making pivot more robust in the face of registry 429s. Possibly by... I dunno, it doesn't look like it's hot-looping or anything. But hopefully something ;).

Hm, we could indeed change MCD+pivot to back off more gracefully in the face of a 429 and, in general, treat it as a non-fatal error. Though we need to keep in mind the early-pivot case where there's no MCD. Also related: https://github.com/openshift/machine-config-operator/issues/585, which I imagine would be a big win.
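Conceptually that's just wrapping the registry fetch in a bounded retry loop instead of letting the first 429 fail pivot.service and mark the node Degraded. A rough sketch in shell terms (the real change would live in the MCD/pivot Go code; OS_CONTENT_IMAGE, the attempt count, and the delays below are purely illustrative):

  # Retry the registry call with exponential backoff rather than exiting
  # non-zero on the first failure, so a transient 429 never degrades the node.
  # OS_CONTENT_IMAGE is a stand-in for the machine-os-content pullspec.
  attempt=0
  delay=10
  until skopeo inspect --authfile /var/lib/kubelet/config.json "docker://${OS_CONTENT_IMAGE}" >/dev/null; do
    attempt=$((attempt + 1))
    if [ "${attempt}" -ge 6 ]; then
      echo "giving up after ${attempt} attempts" >&2
      exit 1
    fi
    echo "registry fetch failed (possibly 429); retrying in ${delay}s" >&2
    sleep "${delay}"
    delay=$((delay * 2))
  done

Ideally it would only back off on throttling/5xx-style responses and still fail fast on auth or missing-image errors, which is where better error classification in the image-pulling library would help.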
Is there any difference in the endpoints or the rate limiting in us-east-1? Last night I had about four failures, and this morning a coworker in another time zone had close to five. I ask because after his last failure in us-east-1 I deployed in ca-central without incident. I also had success in us-east-2 and us-west-2 without the error.
I filed an issue with `containers/image` to handle HTTP 429 a bit more gracefully: https://github.com/containers/image/issues/618

We'll still need to adjust `pivot` to react appropriately, too.
pivot#51 landed.
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=21300941
https://github.com/openshift/installer/pull/1674
installer#1674 landed, and is live in CI:

$ oc adm release info --commits --changes-from registry.svc.ci.openshift.org/ocp/release:4.1.0-0.ci-2019-04-25-183706 registry.svc.ci.openshift.org/ocp/release:4.1.0-0.ci-2019-04-25-184318
              FROM                          TO
Name:         4.1.0-0.ci-2019-04-25-183706  4.1.0-0.ci-2019-04-25-184318
Created:      1h                            1h
Version:      4.1.0-0.ci-2019-04-25-183706  4.1.0-0.ci-2019-04-25-184318
Upgrade From: No

Images Changed:
  installer            https://github.com/openshift/installer/compare/5f91f75072cdafa4806b38d9c6f65113a4b1b5c0...4097497cd768e2535b211e68bbe2a831f58513f0
  installer-artifacts  https://github.com/openshift/installer/compare/5f91f75072cdafa4806b38d9c6f65113a4b1b5c0...4097497cd768e2535b211e68bbe2a831f58513f0

Images Rebuilt:

That release image still has an old machine-os-content though, so future pivots (e.g. on upgrades) may still hit this if/when Quay brings the limit back down:

$ oc image info $(oc adm release info --image-for=machine-os-content registry.svc.ci.openshift.org/ocp/release:4.1.0-0.ci-2019-04-25-184318)
Name:        registry.svc.ci.openshift.org/ocp/4.1-2019-04-25-184318@sha256:a0cd0eb5ec7f676ea7b64d4d2e48abcad4a7649861d54532fce3fdca9ab19a44
Media Type:  application/vnd.docker.distribution.manifest.v2+json
Created:     3d ago
Image Size:  623.1MB
OS:          linux
Arch:        amd64
Entrypoint:  /noentry
Labels:      com.coreos.ostree-commit=125584ea400554bf0a9e743964c3a4b87b0ae8100ba2f4d7c20a1cbadbbb8df5
             version=410.8.20190422.0

I'll see if I can figure out how the machine-os-content bump is getting along.
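(A quick way to keep checking that as newer CI releases appear — just a sketch that greps the version label out of the oc image info output above; <newer-release-tag> is a placeholder for whatever release you want to compare against:)

$ for release in 4.1.0-0.ci-2019-04-25-184318 <newer-release-tag>; do
    echo -n "${release}: "
    oc image info "$(oc adm release info --image-for=machine-os-content "registry.svc.ci.openshift.org/ocp/release:${release}")" | grep 'version='
  done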
$ oc image info registry.svc.ci.openshift.org/rhcos/machine-os-content:latest
Name:        registry.svc.ci.openshift.org/rhcos/machine-os-content:latest
Digest:      sha256:0a799822de435fcff54ea64868607eb8b61861a57f16079f8d95f9bb87068c2c
Media Type:  application/vnd.docker.distribution.manifest.v2+json
Created:     8h ago
Image Size:  623.2MB
OS:          linux
Arch:        amd64
Entrypoint:  /noentry
Labels:      com.coreos.ostree-commit=0f7d2687f9d9dfcbcd64bc030ced149dd196b09b5e3633a9186915b8c0764cd7
             version=410.8.20190425.1

Green promotion from [1,2], but that was before the above image. Not sure why we haven't had another promotion run since. Checking a random recent release-promotion job:

$ oc image info $(curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-4.1/221/artifacts/release-images-latest/release-images-latest | jq -r '.spec.tags[] | select(.name == "machine-os-content").from.name')
Name:        registry.svc.ci.openshift.org/ocp/4.1-2019-04-25-194305@sha256:a0cd0eb5ec7f676ea7b64d4d2e48abcad4a7649861d54532fce3fdca9ab19a44
Media Type:  application/vnd.docker.distribution.manifest.v2+json
Created:     3d ago
Image Size:  623.1MB
OS:          linux
Arch:        amd64
Entrypoint:  /noentry
Labels:      com.coreos.ostree-commit=125584ea400554bf0a9e743964c3a4b87b0ae8100ba2f4d7c20a1cbadbbb8df5
             version=410.8.20190422.0

So we need a new, green promotion.

[1]: https://prow.svc.ci.openshift.org/?job=release-promote-openshift-machine-os-content-e2e-aws-4.1
[2]: https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-promote-openshift-machine-os-content-e2e-aws-4.1/46
Green promotion [1], and recent CI runs have the new machine-os-content:

$ oc image info $(curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-4.1/267/artifacts/release-images-latest/release-images-latest | jq -r '.spec.tags[] | select(.name == "machine-os-content").from.name')
Name:        registry.svc.ci.openshift.org/ocp/4.1-2019-04-26-142222@sha256:0a799822de435fcff54ea64868607eb8b61861a57f16079f8d95f9bb87068c2c
Media Type:  application/vnd.docker.distribution.manifest.v2+json
Created:     1d ago
Image Size:  623.2MB
OS:          linux
Arch:        amd64
Entrypoint:  /noentry
Labels:      com.coreos.ostree-commit=0f7d2687f9d9dfcbcd64bc030ced149dd196b09b5e3633a9186915b8c0764cd7
             version=410.8.20190425.1

[1]: https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-promote-openshift-machine-os-content-e2e-aws-4.1/49
FWIW, we saw one of those failures again recently: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.1/472
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1446