
Bug 1781708

Summary: Pull CI: Cluster operator machine-config Degraded is True with RequiredPoolsFailed: machineconfig.machineconfiguration.openshift.io rendered-master-* not found
Product: OpenShift Container Platform Reporter: Jan Chaloupka <jchaloup>
Component: RHCOS Assignee: Colin Walters <walters>
Status: CLOSED ERRATA QA Contact: Michael Nguyen <mnguyen>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 4.3.0 CC: bbreard, dustymabe, gmontero, imcleod, jligon, jnovy, mfojtik, nstielau, sbatsche, sdodson, spadgett, vrutkovs, walters, wking
Target Milestone: ---   
Target Release: 4.4.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1782153 Environment:
Last Closed: 2020-05-04 11:19:30 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1782149, 1782152, 1782153    

Description Jan Chaloupka 2019-12-10 12:54:57 UTC
Description of problem:
A number of pull-ci-* jobs are failing with:

level=error msg="Cluster operator machine-config Degraded is True with RequiredPoolsFailed: Failed to resync 0.0.1-2019-12-10-092928 because: timed out waiting for the condition during syncRequiredMachineConfigPools: pool master has not progressed to latest configuration: configuration status for pool master is empty: pool is degraded because nodes fail with \"3 nodes are reporting degraded status on sync\": \"Node ip-10-0-129-171.ec2.internal is reporting: \\\"machineconfig.machineconfiguration.openshift.io \\\\\\\"rendered-master-9ffdae4ce3763dbc967f7e9e041d4de1\\\\\\\" not found\\\", Node ip-10-0-145-21.ec2.internal is reporting: \\\"machineconfig.machineconfiguration.openshift.io \\\\\\\"rendered-master-9ffdae4ce3763dbc967f7e9e041d4de1\\\\\\\" not found\\\", Node ip-10-0-140-8.ec2.internal is reporting: \\\"machineconfig.machineconfiguration.openshift.io \\\\\\\"rendered-master-9ffdae4ce3763dbc967f7e9e041d4de1\\\\\\\" not found\\\"\", retrying"
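
The nested escaping in the message above makes it hard to see which node is missing which rendered config. A minimal sketch for pulling the pairs out, assuming the message keeps the "Node ... is reporting: ... not found" wording shown here; `NODE_RE` and `missing_configs` are hypothetical names used only for illustration:

```python
import re
from typing import Dict

# Matches 'Node <name> is reporting: ... "rendered-<pool>-<hash>" not found',
# tolerating any run of backslashes and quotes left over from nested escaping.
NODE_RE = re.compile(
    r'Node (?P<node>\S+) is reporting: [\\"]*'
    r'machineconfig\.machineconfiguration\.openshift\.io [\\"]*'
    r'(?P<config>rendered-[a-z0-9-]+)[\\"]* not found'
)


def missing_configs(message: str) -> Dict[str, str]:
    """Map each degraded node to the rendered MachineConfig it could not find."""
    return {m.group("node"): m.group("config") for m in NODE_RE.finditer(message)}
```

Applied to the error above, this should yield one entry per master node, all pointing at the same rendered-master-9ffdae4ce3763dbc967f7e9e041d4de1 config.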

Additionally, the machine-config daemon (logs at https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/2765/pull-ci-openshift-installer-master-e2e-aws/8952/artifacts/e2e-aws/pods/openshift-machine-config-operator_machine-config-daemon-6lqbw_machine-config-daemon.log) reports the following error (the storage.conf file is too long to include in full):
```
E1210 09:19:15.517411   13932 daemon.go:1350] content mismatch for file /etc/containers/storage.conf: # 

A: This file is is the configuration file for all tools
# that use the containers/storage library.
# See man 5 containers-storage.conf for more information
# The "container storage" table contains all of the server options.
[storage]
...
```

The error message repeats every minute until 09:50:16.231021.
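
To confirm how often and for which files the mismatch is reported, the daemon log can be scanned for the glog-style header shown above. This is only a sketch, assuming every report starts with a header line like the one in the excerpt; `GLOG_RE`, `MISMATCH_RE`, and `mismatch_times` are hypothetical names, not part of the machine-config daemon:

```python
import re
from typing import Dict, Iterable, List

# glog header: severity letter, MMDD, time, thread id, source file:line, "] message".
GLOG_RE = re.compile(
    r'^(?P<sev>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ '
    r'(?P<src>[^\s:]+:\d+)\] (?P<msg>.*)$'
)
MISMATCH_RE = re.compile(r'^content mismatch for file (?P<path>[^:]+):')


def mismatch_times(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Collect the timestamps at which each file was reported as mismatched."""
    seen: Dict[str, List[str]] = {}
    for line in lines:
        header = GLOG_RE.match(line)
        if not header:
            continue  # continuation lines of a multi-line message carry no header
        hit = MISMATCH_RE.match(header.group("msg"))
        if hit:
            seen.setdefault(hit.group("path"), []).append(header.group("time"))
    return seen
```

Passing the lines of a downloaded daemon log to `mismatch_times` gives, per file path, the list of times the mismatch was logged, which makes the one-minute retry cadence easy to check.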

Known jobs:
- https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-api-provider-aws/280/pull-ci-openshift-cluster-api-provider-aws-master-e2e-aws-operator/775
- https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_installer/2765/pull-ci-openshift-installer-master-e2e-aws/8952
- https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_console/3559/pull-ci-openshift-console-master-e2e-gcp-console/5884
- https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_telemeter/273/pull-ci-openshift-telemeter-master-e2e-aws/517

You can find the same error message in the remaining five daemon logs.

Version-Release number of selected component (if applicable):
Master branch of installer: registry.svc.ci.openshift.org/ci-op-0l9nfi34/release@sha256:1dc8db6e093e8484d0a75ad69be3d617b0d7502cdebdd892d0119cac43150cc9

How reproducible:
Always

Steps to Reproduce:
- https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_installer/2765/pull-ci-openshift-installer-master-e2e-aws/8952

Actual results:
- MCO is Degraded

Expected results:
- MCO is not Degraded and the cluster is installed successfully 
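
The expected state can be checked mechanically from the operator's status conditions. A minimal sketch, assuming the input is the JSON printed by `oc get clusteroperator machine-config -o json`; `degraded_reason` is a hypothetical helper name:

```python
import json
from typing import Optional


def degraded_reason(cluster_operator_json: str) -> Optional[str]:
    """Return the Degraded reason if the operator is degraded, else None."""
    obj = json.loads(cluster_operator_json)
    for cond in obj.get("status", {}).get("conditions", []):
        if cond.get("type") == "Degraded" and cond.get("status") == "True":
            return cond.get("reason", "")
    return None
```

For the failing jobs in this report the helper would be expected to return RequiredPoolsFailed; on a healthy cluster it returns None.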

Additional info:

Comment 1 Vadim Rutkovsky 2019-12-10 12:59:00 UTC
An older podman was pulled in due to incorrect tagging of slirp4netns.

Comment 2 Michal Fojtik 2019-12-10 14:45:03 UTC
Moving to urgent as this is affecting a lot of CI jobs.

Comment 3 Jindrich Novy 2019-12-10 15:03:57 UTC
slirp4netns is now whitelisted by RCM and built to rhaos-4.3-rhel-8-candidate:

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=25264758

Comment 4 Colin Walters 2019-12-10 20:49:53 UTC
https://github.com/openshift/machine-config-operator/pull/1320 is a probable fix.

Comment 5 Colin Walters 2019-12-12 15:08:06 UTC
CI is all good now!

Comment 7 errata-xmlrpc 2020-05-04 11:19:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581