Details:
---
OCP Version at Install Time: 4.8.0-fc.2 (before upgrade)
OCP Version after Upgrade (if applicable): 4.8.0-fc.3
Platform: GCP
Architecture: x86_64

What are you trying to do? What is your use case?
- Upgrade OSD cluster from 4.8.0-fc.2 to 4.8.0-fc.3

What happened? What went wrong or what did you expect?
- Two nodes are stuck in the upgrade and are in a degraded state, with the following error in the MCD logs:

```
2021-05-11T07:47:55.367778447Z E0511 07:47:55.367722 3511319 writer.go:135] Marking Degraded due to: failed to update OS to quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0d5a292e937fd8d16321b0ba43e252629a0914b1b38b8a7dd13ceade55bc7e52 : error running rpm-ostree rebase --experimental /run/mco-machine-os-content/os-content-806218783/srv/repo:7ea0c819502408ba4f24bba476cd32e7c73f2e5bedd250c72942e53147f54ca8 --custom-origin-url pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0d5a292e937fd8d16321b0ba43e252629a0914b1b38b8a7dd13ceade55bc7e52 --custom-origin-description Managed by machine-config-operator: 0 metadata, 0 content objects imported; 0 bytes content written
2021-05-11T07:47:55.367778447Z Staging deployment...done
2021-05-11T07:47:55.367778447Z error: Cleaning bootversions: Removing boot/loader.1: unlinkat(ostree-2-rhcos.conf): Read-only file system
2021-05-11T07:47:55.367778447Z : exit status 1
```

I checked whether the issue reproduces, but have not come across it in OSDE2E or other upgrades yet. Will share more findings soon.
After looking into the logs, I am not positive that this is actually the same root issue as https://github.com/coreos/fedora-coreos-tracker/issues/819 (unless /boot is 100% full here too).

From the journal of `osd-v4-fmvr6-5-a-88xhw.us-west1-a.c.o-bcd57f4f.internal`, something has gone crazy on the node, which results in a storm of systemd daemon-reloads:

```
grep -F 'systemd[1]: Reloading' osd-v4-fmvr6-5-a-88xhw.us-west1-a.c.o-bcd57f4f.internal.log | wc -l
1460
```

They started to happen just before the first "Read-only file system" failure, and repeat pretty much every second. I'm not sure which component is doing that, or whether it is related to the upgrade issue.
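To see when the reload storm begins relative to the first failure, the total above can be broken down per minute. This is only a minimal sketch: it assumes the journal was saved in the default short `journalctl` format (month, day, time, host, then the message), which the thread does not confirm, and `reloads_per_minute` and `first_readonly_error` are hypothetical helper names, not part of any tool.

```shell
#!/bin/sh
# Assumption: journalctl's default short format,
# e.g. "May 11 07:47:55 host unit[pid]: message".

# Print "<count> <month> <day> <HH:MM>" for each minute that contains
# daemon-reloads, in order of first appearance. Reads journal text on stdin.
reloads_per_minute() {
  awk '
    index($0, "systemd[1]: Reloading") {
      split($3, t, ":")
      key = $1 " " $2 " " t[1] ":" t[2]
      if (!(key in n)) order[++k] = key
      n[key]++
    }
    END { for (i = 1; i <= k; i++) print n[order[i]], order[i] }
  '
}

# Print the first journal line that mentions the read-only failure, if any.
first_readonly_error() {
  grep -m 1 -F 'Read-only file system'
}

# Usage (the file name is a placeholder):
#   reloads_per_minute < node-journal.log
#   first_readonly_error < node-journal.log
```

Comparing the first minute reported by `reloads_per_minute` with the timestamp from `first_readonly_error` shows whether the reloads really precede the failure.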
The problem can be mitigated by running the following command against a node that is stuck:

```
$ oc debug -n default node/osd-v4-fmvr6-w-b-fkg8q.us-west1-b.c.o-bcd57f4f.internal -- nsenter -a -t 1 /bin/mount -o remount,rw /boot
```
We have been passively trying to observe/reproduce this "Read-only file system" upgrade issue outside of that specific cursed GCP cluster, but without any luck so far.

For the moment this would affect only 4.8 -> 4.8 upgrades, due to the recent read-only /boot change in RHCOS 4.8. However, other clusters have gone through such upgrades without issues, so we suspect something specific to this cluster (possibly an addon component or some custom workload).

Speaking with Rick, we agreed to close this as a one-off flake, and to have the process/procedures in place to get direct debugging access for developers if this shows up again in the cluster fleet. In that case, some deeper strace-ing of rpm-ostreed.service and some active poking at the system will be needed.

For casual readers ending up here: please ping this ticket if you observe the exact same symptoms. If it can be reproduced in a non-prod environment, I'd be glad to have a direct look at the cluster. In case of emergency, the "oc debug" one-liner above can be used to gracefully unstick any blocked upgrades.
Another data point: all three clusters on which this bug has been seen were originally installed with OpenShift 4.3 (one on 4.3.0, the other two on 4.3.18).
> From the journal of `osd-v4-fmvr6-5-a-88xhw.us-west1-a.c.o-bcd57f4f.internal`, there is something gone crazy on the node which results in a storm of systemd daemon-reloads:
> ```
> grep -F 'systemd[1]: Reloading' osd-v4-fmvr6-5-a-88xhw.us-west1-a.c.o-bcd57f4f.internal.log | wc -l
> 1460
> ```

Hmm right, potentially there's a race here where, due to the systemd reload, `/boot` isn't mounted at all; we then check whether it's read-only but just see an empty writable directory, so the logic proceeds to assume it's writable. Whereas if we instead *always* create a mount namespace, then we'll have our own snapshot that is not affected by pid 1 mountns changes.

That said, I tried reproducing this by doing:

```
$ while sleep 0.1; do systemctl daemon-reload; done &
$ while systemctl restart rpm-ostreed && rpm-ostree kargs --append=foo-bar && (systemctl start ostree-finalize-staged && systemctl stop ostree-finalize-staged || true) && rpm-ostree cleanup -p; do :; done
```

So far I haven't hit the issue. I would also be surprised if reloading actually unmounted anything, though, and I'm not seeing that.
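The distinction at the heart of that hypothesis (a read-only mount versus no mount at all, which looks like a plain directory) can be illustrated with a small check against `/proc/self/mountinfo`. This is a sketch of the idea only, not how ostree or rpm-ostree performs its check; `mount_state` is a hypothetical helper, and the optional second argument exists only so it can be run against a saved copy of the file.

```shell
#!/bin/sh
# Report the state of a mount point as "unmounted", "ro", or "rw".
# In mountinfo, field 5 is the mount point and field 6 holds the per-mount
# options. If the path appears more than once, the last entry wins.
mount_state() {
  awk -v mp="$1" '
    $5 == mp { state = ($6 ~ /(^|,)ro(,|$)/) ? "ro" : "rw" }
    END { print (state == "" ? "unmounted" : state) }
  ' "${2:-/proc/self/mountinfo}"
}

# Usage:
#   mount_state /boot
```

A check that only tests whether the directory is writable cannot tell "unmounted" apart from "rw", which is the gap the race would exploit if `/boot` were ever briefly absent.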
Also, I think we need to get to the bottom of what's causing systemd to reload so frequently. That's going to cause other issues too. In my testing above, for example, `systemctl start` sometimes errors out, I think because the unit state changed transiently as part of the reload.
Anyone affected by this, you can work around it with:

$ oc debug node/$nodename

Then:

nsenter -m -t 1 mount -o remount,rw /boot
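When several nodes are stuck, the remount can be looped over them using the non-interactive `oc debug ... -- nsenter` form posted earlier in this bug. A minimal sketch: `remount_boot_rw` and the `OC` override are hypothetical, introduced here only so the loop can be dry-run with `OC=echo` before touching a cluster.

```shell
#!/bin/sh
# Remount /boot read-write on every node named on the command line.
# OC is a hypothetical override for the client binary; it defaults to oc.
remount_boot_rw() {
  for node in "$@"; do
    "${OC:-oc}" debug -n default "node/$node" -- \
      nsenter -a -t 1 /bin/mount -o remount,rw /boot || return 1
  done
}

# Usage (node names are placeholders):
#   OC=echo remount_boot_rw worker-a worker-b   # dry run, prints the commands
#   remount_boot_rw worker-a worker-b
```

The function stops at the first node where the command fails, so a partial run is easy to spot and resume.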
Sanity verification on 4.8.0-fc.9, based on https://bugzilla.redhat.com/show_bug.cgi?id=1959327#c20.

The fix requires ostree-2020.7-5.el8_4. Upgrading to the fixed version will not fix the issue. The issue should be fixed when updating from a version that has the fixed ostree version.

$ oc get nodes
NAME                                       STATUS   ROLES    AGE   VERSION
ci-ln-gp7hf32-f76d1-59wcr-master-0         Ready    master   20m   v1.21.0-rc.0+a5ec692
ci-ln-gp7hf32-f76d1-59wcr-master-1         Ready    master   20m   v1.21.0-rc.0+a5ec692
ci-ln-gp7hf32-f76d1-59wcr-master-2         Ready    master   20m   v1.21.0-rc.0+a5ec692
ci-ln-gp7hf32-f76d1-59wcr-worker-a-phgg2   Ready    worker   13m   v1.21.0-rc.0+a5ec692
ci-ln-gp7hf32-f76d1-59wcr-worker-b-xp4qn   Ready    worker   13m   v1.21.0-rc.0+a5ec692
ci-ln-gp7hf32-f76d1-59wcr-worker-c-pt2bz   Ready    worker   13m   v1.21.0-rc.0+a5ec692

$ oc get clusterversion
NAME      VERSION      AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-fc.9   True        False         4m20s   Cluster version is 4.8.0-fc.9

$ oc debug node/ci-ln-gp7hf32-f76d1-59wcr-worker-a-phgg2
Starting pod/ci-ln-gp7hf32-f76d1-59wcr-worker-a-phgg2-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# ls
bin  boot  dev  etc  home  lib  lib64  media  mnt  opt  ostree  proc  root  run  sbin  srv  sys  sysroot  tmp  usr  var
sh-4.4# rpm -q ostree/
package ostree/ is not installed
sh-4.4# rpm -q ostree
ostree-2020.7-5.el8_4.x86_64
sh-4.4# exit
exit
sh-4.2# exit
exit

Removing debug pod ...
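Because the fix only helps when the node is already running the fixed ostree before an update starts, it can be useful to compare the `rpm -q ostree` output against the minimum build named above. A rough sketch, assuming GNU `sort -V` is available; version sort approximates but is not identical to RPM's own version comparison, and `ostree_is_fixed` is a hypothetical helper name.

```shell
#!/bin/sh
# Minimum ostree build carrying the fix, as stated in this comment.
MIN_OSTREE="2020.7-5.el8_4"

# Succeed if the given "rpm -q ostree" output is at least MIN_OSTREE.
# Expects a string such as "ostree-2020.7-5.el8_4.x86_64".
ostree_is_fixed() {
  ver=${1#ostree-}   # drop the package name prefix
  ver=${ver%.*}      # drop the trailing architecture, e.g. ".x86_64"
  lowest=$(printf '%s\n%s\n' "$MIN_OSTREE" "$ver" | sort -V | head -n 1)
  [ "$lowest" = "$MIN_OSTREE" ]
}

# Usage, from a shell on the node after "chroot /host":
#   ostree_is_fixed "$(rpm -q ostree)" && echo "has the fix" || echo "predates the fix"
```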
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438