Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2072072

Summary: Error: setxattr /etc/systemd/system/basic.target.wants/coreos-update-ca-trust.service: read-only file system
Product: OpenShift Container Platform
Reporter: Martin André <m.andre>
Component: RHCOS
Assignee: RHCOS Bug Triage <rhcos-triage>
Status: CLOSED DUPLICATE
QA Contact: Michael Nguyen <mnguyen>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.11
CC: dornelas, jligon, jpolo, jpoulin, jschinta, lucab, lwan, miabbott, mrussell, nstielau, smilner
Last Closed: 2022-05-02 16:06:37 UTC
Type: Bug
Bug Depends On: 2074090    
Bug Blocks:    
Attachments:
Description Flags
must-gather none

Description Martin André 2022-04-05 14:24:16 UTC
Created attachment 1870877: must-gather

OCP Version at Install Time: 4.11.0-0.nightly-2022-04-05-054839
RHCOS Version at Install Time: 411.85.202203181601-0
OCP Version after Upgrade (if applicable):
RHCOS Version after Upgrade (if applicable): 411.86.202204031335-0
Platform: OpenStack
Architecture: x86_64


What are you trying to do? What is your use case?

Deploy OCP. Nothing fancy.

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-shiftstack-shiftstack-ci-main-periodic-4.11-e2e-openstack-parallel/1511030372081602560

What happened? What went wrong or what did you expect?

After the RHCOS upgrade, neither kubelet nor crio starts correctly. The journal (journalctl) shows read-only file system errors such as:

Apr 05 14:21:39 mandre-9mgl2-master-2 systemd[1]: var-lib-containers-storage-overlay-1da4fa1d5716ba68633041af0c61e05324a7a37defb76c5309fac17392cec785-merged.mount: Succeeded.
Apr 05 14:21:39 mandre-9mgl2-master-2 bash[21327]: Error: setxattr /etc/systemd/system/basic.target.wants/coreos-ignition-firstboot-complete.service: read-only file system

What are the steps to reproduce your issue? Please try to reduce these steps to something that can be reproduced with a single RHCOS node.

Deploy OCP using release 4.11.0-0.nightly-2022-04-05-054839.

Comment 3 Martin André 2022-04-05 14:55:19 UTC
I failed to mention that sssd is also crashing in a loop, which could be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=2072050.

Comment 4 Luca BRUNO 2022-04-05 15:40:04 UTC
The error that gets logged seems correct: /etc/systemd/system/basic.target.wants/coreos-ignition-firstboot-complete.service is a symlink to /usr/lib/systemd/system/coreos-ignition-firstboot-complete.service, which lives on /usr, which is a read-only filesystem (part of the OS deployment/commit).

The logs do contain other, similar setxattr errors (e.g. on /etc/systemd/system/ctrl-alt-del.target, which links to /usr/lib/systemd/system/reboot.target), so I don't think this is a generic RHCOS issue: I don't expect services to be modifying these units.
Whichever rogue component is trying to add xattrs to files in /etc should reconsider doing that (or at least be prepared to cope with symlinks and read-only files).

From the logs I couldn't easily tell which service is involved; possibly some bash-in-podman script?
This may need some further evidence gathering on a running node to track down the specific service, and then checking the logic and the observed behavior with the relevant team.
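
As an aid to that evidence gathering, here is a minimal sketch that lists the symlinks under a directory whose resolved target lives under a given prefix. On an affected node these would be the entries where an xattr write that follows the link lands on the read-only /usr. The function name and the choice of "/usr" as the prefix are my own for illustration; it only inspects paths and does not attempt any setxattr call.

```python
import os


def symlinks_resolving_under(root, prefix):
    """Return sorted (link, resolved_target) pairs for symlinks below
    `root` whose fully resolved target is `prefix` or lies beneath it."""
    prefix = os.path.realpath(prefix)
    hits = []
    # os.walk does not descend into symlinked directories by default,
    # but it still lists them in dirnames, so check both lists.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            target = os.path.realpath(path)
            if target == prefix or target.startswith(prefix + os.sep):
                hits.append((path, target))
    return sorted(hits)


if __name__ == "__main__":
    # Assumed locations, taken from the paths quoted in this bug.
    for link, target in symlinks_resolving_under("/etc/systemd/system", "/usr"):
        print(f"{link} -> {target}")
```

If the output includes the unit named in the error message, that supports the reading that the failure comes from following the symlink rather than from /etc itself being read-only.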

Comment 5 Martin André 2022-04-05 15:56:20 UTC
Decreasing the severity, as the blocking issue seems to be https://bugzilla.redhat.com/show_bug.cgi?id=2072050, where an SSSD issue causes the boot to hang.

Comment 6 Javi Polo 2022-04-08 19:25:15 UTC
This issue is blocking OpenShift installation, so I'd say the severity should be high.

I tried a newer RHCOS release that fixes the SSSD problem, and I am still unable to start a new OCP cluster.

I narrowed the issue down to a script that runs a container with podman in a loop until it succeeds. Because of the setxattr error, it never succeeds:

root        2158  0.0  0.0  23056  3028 ?        Ss   18:51   0:00 /bin/bash -c     until    /usr/bin/podman run --rm    --authfile /var/lib/kubelet/config.json    --net=host    --volume /etc/systemd/system:/etc/systemd/system:z    quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f3090aca5bc36bb0baeb6ae84bffbd07a9b7549ef5a70cd466bd4d06bd72b2b3    node-ip    set --retry-on-failure    192.168.122.119;    do    sleep 5;    done

The problem seems to arise whenever the /etc/systemd/system volume is mounted with :z, regardless of the container image:

[root@master0 ~]# podman run --volume /etc/systemd/system:/etc/systemd/system:z ubi8/ubi-minimal
Error: setxattr /etc/systemd/system/basic.target.wants/coreos-ignition-firstboot-complete.service: read-only file system

Digging further, I arrived at this podman issue, which seems a perfect match:
https://github.com/containers/podman/issues/13727
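
To illustrate why the retry loop shown in the ps output above never exits, here is a rough sketch of the same "run until success" pattern with an optional attempt limit. The function name and the max_attempts parameter are hypothetical and are not part of the original script, which retries forever with a 5-second sleep. When the command fails deterministically, as with the setxattr error, an unbounded loop simply spins.

```python
import subprocess
import time


def run_until_success(cmd, delay=5.0, max_attempts=None):
    """Run `cmd` repeatedly until it exits with status 0.

    Returns the number of attempts it took, or None if `max_attempts`
    was reached without success. With max_attempts=None this mirrors
    the unbounded `until ...; do sleep 5; done` loop.
    """
    attempt = 0
    while max_attempts is None or attempt < max_attempts:
        attempt += 1
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return attempt
        # Surface the last stderr line so a repeated, identical error
        # (such as "read-only file system") is easy to spot.
        lines = result.stderr.strip().splitlines()
        print(f"attempt {attempt} failed: {lines[-1] if lines else 'no stderr'}")
        if max_attempts is None or attempt < max_attempts:
            time.sleep(delay)
    return None
```

Passing the podman command line from the ps output as the cmd list with a small max_attempts would make the loop give up and report the repeated error instead of hanging the service.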

Comment 7 Steve Milner 2022-04-11 13:05:55 UTC
Should there be a bug opened with node/container runtimes or does one exist already?

Comment 8 Javi Polo 2022-04-11 14:10:20 UTC
Just opened a bug against RHEL 8.6/podman:
https://bugzilla.redhat.com/show_bug.cgi?id=2074090

Comment 9 Micah Abbott 2022-05-02 13:54:12 UTC
Martin/Javi - is this BZ still an issue?  

I think we root-caused it to a bad template in the MCO, fixed here: https://github.com/openshift/machine-config-operator/pull/3079

Comment 10 Martin André 2022-05-02 15:31:10 UTC
We'll know if https://github.com/openshift/machine-config-operator/pull/3079 fixed the issue once the RHEL 8.6 rebase of RHCOS (and podman 4) lands in OCP.

Comment 11 Javi Polo 2022-05-02 15:31:59 UTC
Not an issue AFAIK

I tried the latest RHEL 8.6-based CoreOS and it works perfectly. The podman version is also bumped to one with the setxattr fix, so it will work even if the MCO still has the bad template.

Comment 12 Martin André 2022-05-02 16:06:37 UTC
Let's close as a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=2074613 then.

*** This bug has been marked as a duplicate of bug 2074613 ***