Bug 2072072
| Summary: | Error: setxattr /etc/systemd/system/basic.target.wants/coreos-update-ca-trust.service: read-only file system | ||||||
|---|---|---|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Martin André <m.andre> | ||||
| Component: | RHCOS | Assignee: | RHCOS Bug Triage <rhcos-triage> | ||||
| Status: | CLOSED DUPLICATE | QA Contact: | Michael Nguyen <mnguyen> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 4.11 | CC: | dornelas, jligon, jpolo, jpoulin, jschinta, lucab, lwan, miabbott, mrussell, nstielau, smilner | ||||
| Target Milestone: | --- | ||||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2022-05-02 16:06:37 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | 2074090 | ||||||
| Bug Blocks: | |||||||
| Attachments: |
Description
Martin André
2022-04-05 14:24:16 UTC
I failed to mention that sssd is crashing in a loop, which could be a dup of https://bugzilla.redhat.com/show_bug.cgi?id=2072050.

The error that gets logged seems correct: /etc/systemd/system/basic.target.wants/coreos-ignition-firstboot-complete.service is a symlink to /usr/lib/systemd/system/coreos-ignition-firstboot-complete.service, which lives on /usr, which is a read-only filesystem (part of the OS deployment/commit). In the logs there are in fact other similar setxattr errors (e.g. on /etc/systemd/system/ctrl-alt-del.target, which links to /usr/lib/systemd/system/reboot.target), so I don't think this is a generic RHCOS issue, as I don't expect services to be tweaking that service unit. Whatever rogue component is trying to add xattrs to files in /etc should reconsider doing that (or at least be ready to cope with symlinks and read-only files). From the logs I couldn't easily tell which service is involved; possibly some bash-in-podman script? This may need further evidence gathering on a running node to track down the specific service, and then checking the logic and the observed behavior with the relevant team.

Decreasing the severity, as the blocking issue seems to be https://bugzilla.redhat.com/show_bug.cgi?id=2072050, where the SSSD issue causes the boot to hang.

This issue is blocking OpenShift installation, so I'd say the severity should be high enough.

I tried a new release of RHCOS that does fix the SSSD problem, and even so I'm still unable to start a new OCP cluster. I narrowed the issue down to a script that runs a container with podman in a loop until it succeeds, but due to the setxattr error it will never succeed:

root 2158 0.0 0.0 23056 3028 ? Ss 18:51 0:00 /bin/bash -c until /usr/bin/podman run --rm --authfile /var/lib/kubelet/config.json --net=host --volume /etc/systemd/system:/etc/systemd/system:z quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f3090aca5bc36bb0baeb6ae84bffbd07a9b7549ef5a70cd466bd4d06bd72b2b3 node-ip set --retry-on-failure 192.168.122.119; do sleep 5; done

It seems that the problem arises whenever we add the /etc/systemd/system volume with :z, with any container:

[root@master0 ~]# podman run --volume /etc/systemd/system:/etc/systemd/system:z ubi8/ubi-minimal
Error: setxattr /etc/systemd/system/basic.target.wants/coreos-ignition-firstboot-complete.service: read-only file system

Digging further, I arrived at this podman issue, which seems a perfect match: https://github.com/containers/podman/issues/13727

Should there be a bug opened with node/container runtimes, or does one exist already?

Just opened a bug on RHEL 8.6/podman: https://bugzilla.redhat.com/show_bug.cgi?id=2074090

Martin/Javi - is this BZ still an issue? I think we root-caused it to a bad template in the MCO, fixed here: https://github.com/openshift/machine-config-operator/pull/3079

We'll know if https://github.com/openshift/machine-config-operator/pull/3079 fixed the issue once the RHEL 8.6 rebase of RHCOS (and podman 4) lands in OCP.

Not an issue AFAIK.

I tried on the latest RHEL 8.6 based CoreOS and it works perfectly. The podman version is also bumped to the one with the setxattr fix, so even if the MCO had the bad template, it will work.

Let's close as a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=2074613 then.

*** This bug has been marked as a duplicate of bug 2074613 ***
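
For anyone gathering evidence on a running node: the analysis in this thread hinges on entries under /etc/systemd/system being symlinks into the read-only /usr. The following is only a minimal sketch, assuming Python 3 on a Unix host, of how such entries could be listed. The function names `symlinks_leaving_tree` and `on_readonly_fs` are hypothetical helpers written for this illustration; they are not part of podman, RHCOS, or the MCO, and they do not reproduce what podman does when relabeling.

```python
import os


def symlinks_leaving_tree(root):
    """Return sorted (link, resolved_target) pairs for every symlink under
    root whose resolved target lies outside root."""
    root_real = os.path.realpath(root)
    found = []
    # os.walk does not descend into symlinked directories by default,
    # but it still lists them in dirnames, so check both name lists.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            link = os.path.join(dirpath, name)
            if not os.path.islink(link):
                continue
            target = os.path.realpath(link)
            # The target is "inside" only if root is its path prefix.
            if os.path.commonpath([root_real, target]) != root_real:
                found.append((link, target))
    return sorted(found)


def on_readonly_fs(path):
    """True if the filesystem holding path is mounted read-only (Unix only)."""
    return bool(os.statvfs(path).f_flag & os.ST_RDONLY)
```

Calling `symlinks_leaving_tree("/etc/systemd/system")` and then `on_readonly_fs(target)` for each existing target would be one way to see which entries point at a read-only mount, such as the basic.target.wants link quoted in the error above.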