Bug 1809007 - Adding blockedRegistries to image.config.openshift.io leads to an endless reboot loop in workers and masters
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.5.0
Assignee: Urvashi Mohnani
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Duplicates: 1811144 1838206
Depends On:
Blocks: 1822748
Reported: 2020-03-02 09:27 UTC by Borja Aranda
Modified: 2020-11-12 09:16 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1822748 1822750
Environment:
Last Closed: 2020-08-04 18:02:55 UTC
Target Upstream Version:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift machine-config-operator pull 1637 | 0 | None | closed | Bug 1809007: [ctrcfg controller] Use a struct array instead of map when creating new ignitions | 2020-11-30 11:06:51 UTC
Red Hat Product Errata | RHBA-2020:2409 | 0 | None | None | None | 2020-08-04 18:03:00 UTC
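
The linked pull request title says the fix builds the new Ignition file list from a struct array instead of a map. This report does not explain the mechanism, so the Go sketch below only illustrates the general language property that may be relevant: ranging over a slice of structs yields a stable order, while the iteration order of a Go map is unspecified. The names `fileEntry` and `buildFiles` and the two file paths are hypothetical and are not taken from the machine-config-operator source.

```go
package main

import "fmt"

// fileEntry is a hypothetical stand-in for one file in a generated config.
type fileEntry struct {
	Path     string
	Contents string
}

// buildFiles returns the paths in exactly the order the entries were given.
// Ranging over a slice is deterministic; ranging over a map is not, so a
// map-based version could emit the same files in a different order each time.
func buildFiles(entries []fileEntry) []string {
	paths := make([]string, 0, len(entries))
	for _, e := range entries {
		paths = append(paths, e.Path)
	}
	return paths
}

func main() {
	// Assumed locations for the two files named in this report.
	entries := []fileEntry{
		{Path: "/etc/containers/registries.conf", Contents: "placeholder"},
		{Path: "/etc/containers/policy.json", Contents: "placeholder"},
	}
	fmt.Println(buildFiles(entries))
}
```

If generated content is later hashed or compared, a stable order keeps repeated generations identical. Whether that is what triggered the reboots here is an assumption.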

Description Borja Aranda 2020-03-02 09:27:48 UTC
Description of problem:
Following the steps to block registries according to documentation [1] leads to an endless loop of restarts in worker and master nodes.

- Edit image.config.openshift.io/cluster

- Add the blockedRegistries in the spec according to docs:

```
spec:
  registrySources:
    blockedRegistries:
    - chaosmonkey.io
```

- Save changes.

- Two new machine configs are created reflecting those changes and including registries.conf and policy.json:

```
99-master-a8ec5eb8-96ef-4be5-94f9-fa6bd03a886f-registries   25bb6aeb58135c38a667e849edf5244871be4992   2.2.0             3d23h
99-worker-2488a31e-36f9-4355-a9b9-e616868a0e6a-registries   25bb6aeb58135c38a667e849edf5244871be4992   2.2.0             3d23h
```

- This makes the MCO create two new rendered-master and two new rendered-worker machineconfigs with duplicated entries for the files registries.conf and policy.json.

- Probably related to the fact that, in a fresh cluster, existing machineconfigs already configure these exact files:
01-worker-container-runtime
01-master-container-runtime

- Workers and masters start crash-looping while trying to apply the latest machineconfig. In my environment, workers and masters have been restarting randomly for five days now (55-70 restarts on every worker/master):

```
I0302 06:29:01.692915    3492 daemon.go:785] journalctl --list-boots:
-57 8b38edd622ea4c2c8694369f4895ed99 Thu 2020-02-27 09:13:11 UTC—Thu 2020-02-27 09:17:29 UTC
-56 d36095415f78493e9243d51d00d54770 Thu 2020-02-27 09:17:58 UTC—Thu 2020-02-27 10:38:47 UTC
  [...]
 -2 a89a72c268aa498c95dbd2835f35812f Mon 2020-03-02 03:09:48 UTC—Mon 2020-03-02 06:24:15 UTC
 -1 de7ce9b440304ce3801f81ac78c21bf6 Mon 2020-03-02 06:24:43 UTC—Mon 2020-03-02 06:28:00 UTC
  0 7984504cdf71487aac7ed7c79c09d487 Mon 2020-03-02 06:28:30 UTC—Mon 2020-03-02 06:29:01 UTC
I0302 06:29:01.692954    3492 daemon.go:528] Starting MachineConfigDaemon
I0302 06:29:01.693080    3492 daemon.go:535] Enabling Kubelet Healthz Monitor
I0302 06:29:58.052864    3492 daemon.go:731] Current config: rendered-master-4a6777d52de7856260377bfb2b41d684
I0302 06:29:58.052891    3492 daemon.go:732] Desired config: rendered-master-06fca62648469333e5b3406f01899301
I0302 06:29:58.062938    3492 update.go:1050] Disk currentConfig rendered-master-06fca62648469333e5b3406f01899301 overrides node annotation rendered-master-4a6777d52de7856260377bfb2b41d684
I0302 06:29:58.065942    3492 daemon.go:955] Validating against pending config rendered-master-06fca62648469333e5b3406f01899301
I0302 06:29:58.077711    3492 daemon.go:971] Validated on-disk state
```
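
To compare the current and desired rendered configs across many such log excerpts, something like the following Go sketch could help. It relies only on the `Current config:` and `Desired config:` markers visible in the log above. The names `configState`, `valueAfter`, and `parseConfigState` are hypothetical helpers, not part of the machine-config-daemon.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// configState holds the rendered config names reported in the daemon log.
type configState struct {
	Current string
	Desired string
}

// valueAfter returns the trimmed text following marker, if marker is present.
func valueAfter(line, marker string) (string, bool) {
	i := strings.Index(line, marker)
	if i < 0 {
		return "", false
	}
	return strings.TrimSpace(line[i+len(marker):]), true
}

// parseConfigState scans the log line by line and keeps the last value seen
// for each marker. Matching is case-sensitive, so the "Disk currentConfig"
// line is ignored.
func parseConfigState(log string) configState {
	var st configState
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		if v, ok := valueAfter(sc.Text(), "Current config: "); ok {
			st.Current = v
		}
		if v, ok := valueAfter(sc.Text(), "Desired config: "); ok {
			st.Desired = v
		}
	}
	return st
}

func main() {
	log := "I0302 06:29:58.052864    3492 daemon.go:731] Current config: rendered-master-4a6777d52de7856260377bfb2b41d684\n" +
		"I0302 06:29:58.052891    3492 daemon.go:732] Desired config: rendered-master-06fca62648469333e5b3406f01899301\n"
	st := parseConfigState(log)
	fmt.Println("current:", st.Current)
	fmt.Println("desired:", st.Desired)
	fmt.Println("mismatch:", st.Current != st.Desired)
}
```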

[1] https://docs.openshift.com/container-platform/4.3/openshift_images/image-configuration.html#images-configuration-file_image-configuration

Version-Release number of selected component (if applicable):
4.3

How reproducible:
Always

Steps to Reproduce:
1. Create a cluster
2. Configure a blocked registry according to our current docs [1]
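
After reproducing, one way to confirm the duplicated registries.conf and policy.json entries is to list the file paths of a rendered machineconfig and look for repeats. The Go sketch below assumes the paths have already been extracted into a slice (for example from the `spec.config.storage.files` list of the rendered machineconfig). The helper name `duplicatePaths` and the sample paths are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// duplicatePaths returns, in sorted order, every path that appears more
// than once in the input.
func duplicatePaths(paths []string) []string {
	counts := make(map[string]int)
	for _, p := range paths {
		counts[p]++
	}
	var dups []string
	for p, n := range counts {
		if n > 1 {
			dups = append(dups, p)
		}
	}
	// Sort so the result does not depend on map iteration order.
	sort.Strings(dups)
	return dups
}

func main() {
	// Assumed sample data: two files listed twice, one listed once.
	paths := []string{
		"/etc/containers/registries.conf",
		"/etc/containers/policy.json",
		"/etc/crio/crio.conf",
		"/etc/containers/registries.conf",
		"/etc/containers/policy.json",
	}
	for _, p := range duplicatePaths(paths) {
		fmt.Println("duplicate entry:", p)
	}
}
```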

Comment 21 Urvashi Mohnani 2020-05-20 18:15:18 UTC
*** Bug 1811144 has been marked as a duplicate of this bug. ***

Comment 22 Urvashi Mohnani 2020-05-22 12:40:02 UTC
*** Bug 1838206 has been marked as a duplicate of this bug. ***

Comment 24 errata-xmlrpc 2020-08-04 18:02:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5 image release advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

