Bug 2072072 - Error: setxattr /etc/systemd/system/basic.target.wants/coreos-update-ca-trust.service: read-only file system
Keywords:
Status: CLOSED DUPLICATE of bug 2074613
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: RHCOS Bug Triage
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On: 2074090
Blocks:
Reported: 2022-04-05 14:24 UTC by Martin André
Modified: 2022-05-02 16:06 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-05-02 16:06:37 UTC
Target Upstream Version:
Embargoed:


Attachments
must-gather (1.86 MB, application/gzip)
2022-04-05 14:24 UTC, Martin André

Description Martin André 2022-04-05 14:24:16 UTC
Created attachment 1870877
must-gather

OCP Version at Install Time: 4.11.0-0.nightly-2022-04-05-054839
RHCOS Version at Install Time: 411.85.202203181601-0
OCP Version after Upgrade (if applicable):
RHCOS Version after Upgrade (if applicable): 411.86.202204031335-0
Platform: OpenStack
Architecture: x86_64


What are you trying to do? What is your use case?

Deploy OCP. Nothing fancy.

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-shiftstack-shiftstack-ci-main-periodic-4.11-e2e-openstack-parallel/1511030372081602560

What happened? What went wrong or what did you expect?

After the RHCOS upgrade, neither kubelet nor crio starts correctly. journalctl shows read-only file system errors such as:

Apr 05 14:21:39 mandre-9mgl2-master-2 systemd[1]: var-lib-containers-storage-overlay-1da4fa1d5716ba68633041af0c61e05324a7a37defb76c5309fac17392cec785-merged.mount: Succeeded.
Apr 05 14:21:39 mandre-9mgl2-master-2 bash[21327]: Error: setxattr /etc/systemd/system/basic.target.wants/coreos-ignition-firstboot-complete.service: read-only file system

What are the steps to reproduce your issue? Please try to reduce these steps to something that can be reproduced with a single RHCOS node.

Deploy OCP using release 4.11.0-0.nightly-2022-04-05-054839.

Comment 3 Martin André 2022-04-05 14:55:19 UTC
I failed to mention that sssd is crashing in a loop, which could be a dup of https://bugzilla.redhat.com/show_bug.cgi?id=2072050.

Comment 4 Luca BRUNO 2022-04-05 15:40:04 UTC
The error that gets logged seems correct: /etc/systemd/system/basic.target.wants/coreos-ignition-firstboot-complete.service is a symlink to /usr/lib/systemd/system/coreos-ignition-firstboot-complete.service, which lives on /usr, which is a read-only filesystem (part of the OS deployment/commit).

In the logs there are in fact other similar setxattr errors (e.g. on /etc/systemd/system/ctrl-alt-del.target, which links to /usr/lib/systemd/system/reboot.target), so I don't think this is a generic RHCOS issue, as I don't expect services to be tweaking that service unit.
Whichever rogue component is trying to add xattrs to files in /etc should reconsider doing that (or at least be ready to cope with symlinks and read-only files).
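
To check which entries under /etc/systemd/system point into the read-only /usr tree, something along these lines could be run on a node. This is only a minimal sketch: the function name list_links_into is hypothetical, and it assumes a readlink that supports -f (as GNU coreutils does).

```shell
#!/bin/sh
# Sketch: list symlinks under a directory whose resolved target lies under
# a given prefix (for example the read-only /usr tree).
# list_links_into is a hypothetical helper name, not an existing tool.
list_links_into() {
    dir=$1
    prefix=$2
    find "$dir" -type l | while IFS= read -r link; do
        # Resolve the link to its final target; skip links that cannot be resolved.
        target=$(readlink -f -- "$link") || continue
        case "$target" in
            "$prefix"/*) printf '%s -> %s\n' "$link" "$target" ;;
        esac
    done
}

# Example invocation, using the paths mentioned in this comment:
# list_links_into /etc/systemd/system /usr/lib/systemd/system
```

Every line it prints is a path where a write that follows the symlink would land on the read-only filesystem.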

From the logs I couldn't easily distinguish which service is involved in this, possibly some bash-in-podman script?
This may need some further evidence gathering on a running node to track down the specific service, and then checking the logic and the observed behavior with the relevant team.
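
As a starting point for that evidence gathering, the journal lines themselves carry the emitting process name and PID. A rough sketch, assuming the default short journal format shown in the description (timestamp, host, then identifier[pid]:); the function name setxattr_sources is hypothetical:

```shell
#!/bin/sh
# Sketch: read journal lines on stdin and print the unique "identifier[pid]"
# of every process that logged a setxattr read-only error.
# Assumes the short journal format: "Mon DD HH:MM:SS host ident[pid]: message".
setxattr_sources() {
    awk '/setxattr .*read-only file system/ {
        ident = $5            # fifth field is "ident[pid]:"
        sub(/:$/, "", ident)  # drop the trailing colon
        print ident
    }' | sort -u
}

# Example invocation on a node:
# journalctl -b --no-pager | setxattr_sources
```

While the reported PID is still alive, ps -o args= -p <pid> shows the full command line, which is enough to tell which unit or script launched it.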

Comment 5 Martin André 2022-04-05 15:56:20 UTC
Decreasing the severity, as the blocking issue seems to be https://bugzilla.redhat.com/show_bug.cgi?id=2072050, where an SSSD issue causes the boot to hang.

Comment 6 Javi Polo 2022-04-08 19:25:15 UTC
This issue is blocking OpenShift installation, so I'd say the severity should be high.

I tried a new release of RHCOS that does fix the SSSD problem, and even so I'm still unable to start a new OCP cluster.

I narrowed down the issue to a script that runs a container with podman in a loop until it succeeds, but due to the setxattr error it will never succeed:

root        2158  0.0  0.0  23056  3028 ?        Ss   18:51   0:00 /bin/bash -c     until    /usr/bin/podman run --rm    --authfile /var/lib/kubelet/config.json    --net=host    --volume /etc/systemd/system:/etc/systemd/system:z    quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f3090aca5bc36bb0baeb6ae84bffbd07a9b7549ef5a70cd466bd4d06bd72b2b3    node-ip    set --retry-on-failure    192.168.122.119;    do    sleep 5;    done

It seems the problem arises whenever /etc/systemd/system is mounted as a volume with :z, regardless of the container image:

[root@master0 ~]# podman run --volume /etc/systemd/system:/etc/systemd/system:z ubi8/ubi-minimal
Error: setxattr /etc/systemd/system/basic.target.wants/coreos-ignition-firstboot-complete.service: read-only file system
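
Because the command fails the same way on every attempt, the until ... sleep 5 loop above spins forever and hides the error. As an illustration only (this is not the MCO template or its fix), a bounded variant of the same loop would surface the failure after a fixed number of tries. The helper name retry_bounded and its arguments are hypothetical:

```shell
#!/bin/sh
# Sketch: run a command until it succeeds, but give up after a fixed number
# of attempts instead of looping forever.
# Usage: retry_bounded MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_bounded() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Example invocation with the reproducer from this comment:
# retry_bounded 5 5 podman run --volume /etc/systemd/system:/etc/systemd/system:z ubi8/ubi-minimal
```

With a non-zero exit after the last attempt, systemd would mark the unit as failed rather than leaving it running indefinitely.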

Digging further, I arrived at this podman issue, which seems a perfect match:
https://github.com/containers/podman/issues/13727

Comment 7 Steve Milner 2022-04-11 13:05:55 UTC
Should there be a bug opened with node/container runtimes or does one exist already?

Comment 8 Javi Polo 2022-04-11 14:10:20 UTC
Just opened a bug on RHEL 8.6/podman:
https://bugzilla.redhat.com/show_bug.cgi?id=2074090

Comment 9 Micah Abbott 2022-05-02 13:54:12 UTC
Martin/Javi - is this BZ still an issue?  

I think we root caused it to a bad template in the MCO, fixed here - https://github.com/openshift/machine-config-operator/pull/3079

Comment 10 Martin André 2022-05-02 15:31:10 UTC
We'll know if https://github.com/openshift/machine-config-operator/pull/3079 fixed the issue once the RHEL 8.6 rebase of RHCOS (and podman 4) lands in OCP.

Comment 11 Javi Polo 2022-05-02 15:31:59 UTC
Not an issue AFAIK.

I tried the latest RHEL 8.6-based RHCOS and it works perfectly. The podman version is also bumped to one that includes the setxattr fix, so it will work even if the MCO still has the bad template.

Comment 12 Martin André 2022-05-02 16:06:37 UTC
Let's close as a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=2074613 then.

*** This bug has been marked as a duplicate of bug 2074613 ***

