Bug 1959327 - Degraded nodes on upgrade - Cleaning bootversions: Read-only file system
Summary: Degraded nodes on upgrade - Cleaning bootversions: Read-only file system
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Luca BRUNO
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On: 1969452
Blocks:
 
Reported: 2021-05-11 09:47 UTC by Ravi Trivedi
Modified: 2021-07-27 23:08 UTC
CC List: 17 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
This fixes an upgrade issue where some machines would fail to finalize an update with the error "Cleaning bootversions: Read-only file system".
Clone Of:
Environment:
Last Closed: 2021-07-27 23:07:53 UTC
Target Upstream Version:
Embargoed:


Links:
Red Hat Product Errata RHSA-2021:2438 (last updated 2021-07-27 23:08:11 UTC)

Internal Links: 1967711

Description Ravi Trivedi 2021-05-11 09:47:36 UTC
Details:
---

OCP Version at Install Time: 4.8.0-fc.2 (before upgrade)
OCP Version after Upgrade (if applicable): 4.8.0-fc.3 
Platform: GCP
Architecture: x86_64


What are you trying to do? What is your use case?
- Upgrade OSD cluster from 4.8.0-fc.2 to 4.8.0-fc.3 

What happened? What went wrong or what did you expect?
- Two nodes are stuck in the upgrade and are in a degraded state, with the following error in the MCD logs:
```
2021-05-11T07:47:55.367778447Z E0511 07:47:55.367722 3511319 writer.go:135] Marking Degraded due to: failed to update OS to quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0d5a292e937fd8d16321b0ba43e252629a0914b1b38b8a7dd13ceade55bc7e52 : error running rpm-ostree rebase --experimental /run/mco-machine-os-content/os-content-806218783/srv/repo:7ea0c819502408ba4f24bba476cd32e7c73f2e5bedd250c72942e53147f54ca8 --custom-origin-url pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0d5a292e937fd8d16321b0ba43e252629a0914b1b38b8a7dd13ceade55bc7e52 --custom-origin-description Managed by machine-config-operator: 0 metadata, 0 content objects imported; 0 bytes content written
2021-05-11T07:47:55.367778447Z Staging deployment...done
2021-05-11T07:47:55.367778447Z error: Cleaning bootversions: Removing boot/loader.1: unlinkat(ostree-2-rhcos.conf): Read-only file system
2021-05-11T07:47:55.367778447Z : exit status 1
```
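
For anyone triaging a node in this state, a quick generic check (not part of the original report) of how /boot is currently mounted, run from a debug shell on the affected node:

```
# From `oc debug node/<node>` followed by `chroot /host`:
findmnt /boot              # the OPTIONS column shows "ro" or "rw"
mount | grep ' /boot '     # alternative view of the current mount flags
```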

I checked whether the issue reproduces elsewhere, but haven't come across it in OSDE2E or other upgrades yet. Will share more findings soon.

Comment 5 Luca BRUNO 2021-05-11 14:26:24 UTC
After looking into the logs, I am not positive that this is actually the same root issue as https://github.com/coreos/fedora-coreos-tracker/issues/819 (unless /boot is 100% full here too).

From the journal of `osd-v4-fmvr6-5-a-88xhw.us-west1-a.c.o-bcd57f4f.internal`, something has gone haywire on the node, resulting in a storm of systemd daemon-reloads:

```
grep -F 'systemd[1]: Reloading' osd-v4-fmvr6-5-a-88xhw.us-west1-a.c.o-bcd57f4f.internal.log | wc -l
1460
```

They started happening just before the first "Read-only file system" failure and repeat roughly every second.
I'm not sure which component is doing that, or whether it is related to the upgrade issue.
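
One generic way to narrow down which component is issuing the reloads (a standard systemd debugging step, not something done in this report) is to look at the journal lines immediately preceding each reload message:

```
# Show a little context before each "Reloading." line; the requesting
# unit or process usually logs just before it.
journalctl -b -o short-precise | grep -B2 'systemd\[1\]: Reloading'
```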

Comment 6 Rick Rackow 2021-05-12 11:26:15 UTC
The problem can be mitigated by running the following command against a node that is stuck.

```
$ oc debug -n default node/osd-v4-fmvr6-w-b-fkg8q.us-west1-b.c.o-bcd57f4f.internal -- nsenter -a -t 1 /bin/mount -o remount,rw /boot
```
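
A hedged sketch (assuming standard Machine Config Operator behaviour; not from the original comment) of how to locate the stuck nodes before applying the remount:

```
# The affected MachineConfigPool reports DEGRADED=True:
oc get mcp
# The MCO state annotation on the node itself shows Degraded:
oc describe node <node-name> | grep machineconfiguration.openshift.io/state
```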

Comment 8 Luca BRUNO 2021-05-17 07:46:59 UTC
We have been passively trying to observe/reproduce this "Read-only file system" upgrade issue outside of that specific cursed GCP cluster, but without any luck so far.

This is something that would affect 4.8 -> 4.8 upgrades only for the moment, due to the recent read-only /boot change in RHCOS 4.8.

However, other clusters have gone through such upgrades without issues, so we do suspect something specific to this cluster (possibly an addon component or some custom workload).

Speaking with Rick, we agreed to close this as a one-off flake and to put processes/procedures in place to get developers direct debugging access if this shows up again in the cluster fleet.
In that case, some deeper strace-ing of rpm-ostreed.service and some active poking at the system will be needed.
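
A minimal sketch of what such strace-ing could look like (assuming strace is available on the node; an illustration, not a prescribed procedure):

```
# Attach to the running rpm-ostreed daemon and capture file-related
# syscalls, to see which path triggers the EROFS error.
PID=$(systemctl show -p MainPID --value rpm-ostreed.service)
strace -f -p "$PID" -e trace=%file -o /tmp/rpm-ostreed.strace
```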

For casual readers ending up here: please ping this ticket if you observe the exact same symptoms. If it can be reproduced in a non-prod environment, I'd be glad to have a direct look at the cluster. In case of emergency, the "oc debug" one-liner above can be used to gracefully unstick any blocked upgrades.

Comment 11 Christoph Blecker 2021-06-01 18:57:13 UTC
Another data point:
All three clusters that this bug has been seen on were originally installed under OpenShift 4.3 (one on 4.3.0, the other two on 4.3.18).

Comment 17 Colin Walters 2021-06-03 17:03:39 UTC
> From the journal of `osd-v4-fmvr6-5-a-88xhw.us-west1-a.c.o-bcd57f4f.internal`, something has gone haywire on the node, resulting in a storm of systemd daemon-reloads:

```
grep -F 'systemd[1]: Reloading' osd-v4-fmvr6-5-a-88xhw.us-west1-a.c.o-bcd57f4f.internal.log | wc -l
1460
```

Hmm right, potentially there's a race here: during a systemd reload, `/boot` isn't mounted at all, so when we check whether it's read-only we just see an empty writable directory, and the logic proceeds on the assumption that it's writable.
If instead we *always* create a mount namespace, we'll have our own snapshot that isn't affected by pid 1 mountns changes.
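
A rough illustration of that idea (a hedged sketch, not the actual rpm-ostree code): a private mount namespace is a snapshot of the mount table at creation time, so concurrent mount changes by pid 1 cannot race with the check.

```
# Hypothetical sketch (run as root): perform the /boot check from inside a
# private mount namespace, isolated from pid 1's mount table changes.
unshare --mount --propagation private -- findmnt -no OPTIONS /boot
```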

That said,

I tried reproducing this by doing:

```
$ while sleep 0.1; do systemctl daemon-reload; done&
$ while systemctl restart rpm-ostreed && rpm-ostree kargs --append=foo-bar && (systemctl start ostree-finalize-staged && systemctl stop ostree-finalize-staged || true) && rpm-ostree cleanup -p; do :; done
```

So far I haven't hit the issue.  

I would also be surprised if reloading actually unmounted anything, though, and I'm not seeing that happen.

Comment 18 Colin Walters 2021-06-03 17:05:04 UTC
Also, I think we need to get to the bottom of what's causing systemd to reload so frequently.  That's going to cause other issues too.

In my testing above, for example, `systemctl start` sometimes errors out, I think because the unit state changed transiently as part of the reload.

Comment 31 Colin Walters 2021-06-21 16:20:22 UTC
Anyone affected by this can work around it with:

```
$ oc debug node/$nodename
```

Then:

```
nsenter -m -t 1
mount -o remount,rw /boot
```

Comment 37 Michael Nguyen 2021-06-24 19:16:50 UTC
Sanity verification on 4.8.0-fc.9 based on https://bugzilla.redhat.com/show_bug.cgi?id=1959327#c20. The fix requires ostree-2020.7-5.el8_4. Upgrading to the fixed version will not fix the issue; the issue should be fixed when updating from a version that already contains the fixed ostree.

$ oc get nodes
NAME                                       STATUS   ROLES    AGE   VERSION
ci-ln-gp7hf32-f76d1-59wcr-master-0         Ready    master   20m   v1.21.0-rc.0+a5ec692
ci-ln-gp7hf32-f76d1-59wcr-master-1         Ready    master   20m   v1.21.0-rc.0+a5ec692
ci-ln-gp7hf32-f76d1-59wcr-master-2         Ready    master   20m   v1.21.0-rc.0+a5ec692
ci-ln-gp7hf32-f76d1-59wcr-worker-a-phgg2   Ready    worker   13m   v1.21.0-rc.0+a5ec692
ci-ln-gp7hf32-f76d1-59wcr-worker-b-xp4qn   Ready    worker   13m   v1.21.0-rc.0+a5ec692
ci-ln-gp7hf32-f76d1-59wcr-worker-c-pt2bz   Ready    worker   13m   v1.21.0-rc.0+a5ec692
$ oc get clusterversion
NAME      VERSION      AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-fc.9   True        False         4m20s   Cluster version is 4.8.0-fc.9
$ oc debug node/ci-ln-gp7hf32-f76d1-59wcr-worker-a-phgg2
Starting pod/ci-ln-gp7hf32-f76d1-59wcr-worker-a-phgg2-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# ls
bin  boot  dev	etc  home  lib	lib64  media  mnt  opt	ostree	proc  root  run  sbin  srv  sys  sysroot  tmp  usr  var
sh-4.4# rpm -q ostree/
package ostree/ is not installed
sh-4.4# rpm -q ostree 
ostree-2020.7-5.el8_4.x86_64
sh-4.4# exit
exit
sh-4.2# exit
exit

Removing debug pod ...

Comment 39 errata-xmlrpc 2021-07-27 23:07:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

