Bug 2193168 - useradd/groupmod errors do not terminate %pre script with errors
Summary: useradd/groupmod errors do not terminate %pre script with errors
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: openvswitch3.1
Version: FDP 22.L
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Timothy Redaelli
QA Contact: Rick Alongi
URL:
Whiteboard:
Duplicates: 2196275
Depends On:
Blocks:
Reported: 2023-05-04 14:29 UTC by Dan Williams
Modified: 2023-07-10 15:29 UTC (History)
CC List: 8 users

Fixed In Version: openvswitch3.1-3.1.0-27.el9fdp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-07-06 19:17:49 UTC
Target Upstream Version:
Embargoed:


Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker   FD-2837         0        None      None    None     2023-05-04 14:30:41 UTC
Red Hat Product Errata  RHBA-2023:3989  0        None      None    None     2023-07-06 19:17:57 UTC

Description Dan Williams 2023-05-04 14:29:25 UTC
We have observed that ppc64le RHCOS builds fail to modify the hugetlbfs group to add the openvswitch user, which causes OVS startup to fail because it cannot chown directories to the right group/user.

This appears to be because RPM's /bin/sh does not "set -e" and therefore errors returned by useradd/groupadd/usermod get ignored and do not terminate the %pre script.

%pre
getent group openvswitch >/dev/null || groupadd -r openvswitch
getent passwd openvswitch >/dev/null || \
    useradd -r -g openvswitch -d / -s /sbin/nologin \
    -c "Open vSwitch Daemons" openvswitch

%ifarch %{dpdkarches}
    getent group hugetlbfs >/dev/null || groupadd hugetlbfs
    usermod -a -G hugetlbfs openvswitch
%endif
exit 0

This causes issues like:

[2023-05-02T18:30:05.713Z] openvswitch3.1.prein: usermod.rpmostreesave: /etc/passwd.6: lock file already used
[2023-05-02T18:30:05.713Z] openvswitch3.1.prein: usermod.rpmostreesave: cannot lock /etc/passwd; try again later.

Appending "|| exit 1" to the useradd/groupadd/usermod commands should help surface these errors instead of letting the script continue to "exit 0".
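
A minimal sketch of the difference, runnable without root. The function failing_usermod is a hypothetical stand-in for a usermod call that cannot take the /etc/passwd lock; pre_masked and pre_strict are illustrative names and not part of the spec file.

```shell
#!/bin/sh
# Hypothetical stand-in for a usermod call that fails (e.g. lock file busy).
failing_usermod() {
    echo "usermod: cannot lock /etc/passwd; try again later." >&2
    return 1
}

# Mirrors the current scriptlet: the failure is ignored and "exit 0" wins.
pre_masked() (
    failing_usermod
    exit 0
)

# Variant with "|| exit 1": the scriptlet stops at the failing command.
pre_strict() (
    failing_usermod || exit 1
    exit 0
)

rc=0; pre_masked 2>/dev/null || rc=$?; echo "masked scriptlet exit status: $rc"
rc=0; pre_strict 2>/dev/null || rc=$?; echo "strict scriptlet exit status: $rc"
```

Each function body runs in a subshell, so "exit" ends only that simulated scriptlet and not the calling shell.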

Comment 1 Mark Hamzy 2023-05-04 14:45:06 UTC
Is it safer to add set -euo pipefail at the top of the pre script?

Comment 2 Dan Williams 2023-05-04 14:53:56 UTC
(In reply to Mark Hamzy from comment #1)
> Is it safer to add set -euo pipefail at the top of the pre script?

I'm not a bash expert so it may well look/work nicer to do that instead. I'll leave it to OVS team.
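
For illustration, a small sketch of how "set -e" interacts with the "getent ... || groupadd ..." pattern used in the scriptlet. The helper names (strict_pre, lookup_missing, create_ok, create_fail) are hypothetical stand-ins so that nothing touches the real user database. Only "set -eu" is used here, on the assumption that "pipefail" is not available in every /bin/sh implementation.

```shell
#!/bin/sh
# Run a simulated %pre body under "set -eu" in a fresh shell process.
# $1 names the stand-in used for the final step (create_ok or create_fail).
strict_pre() {
    sh -c '
        set -eu
        lookup_missing() { return 2; }   # like getent when the entry is absent
        create_ok()      { return 0; }   # like a groupadd that succeeds
        create_fail()    { return 1; }   # like a usermod that cannot lock

        # A failing command on the left of "||" does not trigger "set -e".
        lookup_missing || create_ok
        # A failing simple command does: the script aborts here.
        "$1"
        exit 0
    ' sh "$1"
}

rc=0; strict_pre create_ok   || rc=$?; echo "all steps succeed: exit status $rc"
rc=0; strict_pre create_fail || rc=$?; echo "last step fails:   exit status $rc"
```

The body runs in a separate "sh -c" process on purpose: "set -e" is ignored inside a function or subshell that is itself called on the left of "||" or in an "if" condition, which would hide the effect being shown.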

Comment 3 Flavio Leitner 2023-05-04 17:43:29 UTC
I wonder if there is anything related to that in the packaging guidelines.

Comment 4 Dan Williams 2023-05-08 15:05:13 UTC
(In reply to Flavio Leitner from comment #3)
> I wonder if there is anything related to that in the packaging guidelines.

There isn't; OVS appears to use it correctly.

Unfortunately it looks like a long-running shadow-utils issue that we may only see in RHCOS/FCOS: https://github.com/coreos/fedora-coreos-tracker/issues/1250

Comment 5 Dan Williams 2023-05-08 15:06:29 UTC
Applicable Fedora packaging guidelines are https://docs.fedoraproject.org/en-US/packaging-guidelines/UsersAndGroups/#_rationale_for_some_of_the_implementation_choices which says:

---
The exit 0 at the end will result in the %pre scriptlet passing through even if the user/group creation fails for some reason. This is suboptimal but has less potential for system wide breakage than allowing it to fail. If the user/group aren't available at the time the package's payload is unpacked, rpm will fall back to setting those files owned by root.
---

so there is a tradeoff that may/may not be appropriate for OVS packages in a *non*-RHCOS/FCOS context.

Comment 6 Timothy Redaelli 2023-05-10 14:42:32 UTC
On Fedora, the openvswitch package uses a sysusers file to create the group and the user "dynamically".
I guess I can update the RHEL spec file to use that too, which should work in your scenario.
What do you think?
Can you try building the Fedora openvswitch spec file and see whether you still have the problem?

Comment 7 Timothy Redaelli 2023-05-25 17:11:06 UTC
*** Bug 2196275 has been marked as a duplicate of this bug. ***

Comment 8 Timothy Redaelli 2023-06-12 09:36:06 UTC
Added openvswitch.sysusers and openvswitch-hugetlbfs.sysusers to follow the new Fedora guidelines (https://docs.fedoraproject.org/en-US/packaging-guidelines/UsersAndGroups/#_dynamic_allocation)
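
As a rough sketch of what such sysusers fragments can look like, the script below writes two files in sysusers.d format to a scratch directory. The file contents are an assumption modeled on the old %pre script (same user, GECOS, home and shell), not a copy of the files shipped in the fixed package.

```shell
#!/bin/sh
# Write illustrative sysusers.d fragments to a scratch directory.
# Contents are assumptions derived from the old %pre script, not the
# actual files from the openvswitch3.1 package.
outdir=$(mktemp -d)

# "u" creates a system user and a group of the same name; "-" lets the ID be
# allocated automatically.
cat > "$outdir/openvswitch.sysusers" <<'EOF'
#Type Name        ID GECOS                  Home Shell
u     openvswitch -  "Open vSwitch Daemons" /    /sbin/nologin
EOF

# "g" creates a group; "m" adds an existing user to a group.
cat > "$outdir/openvswitch-hugetlbfs.sysusers" <<'EOF'
#Type Name        ID
g     hugetlbfs   -
m     openvswitch hugetlbfs
EOF

echo "wrote fragments to $outdir"
```

The second file corresponds to the %ifarch %{dpdkarches} branch of the old scriptlet, which is why it is kept separate.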

Comment 11 Rick Alongi 2023-06-22 18:49:43 UTC
Hi,

I have a couple of questions:

- Is this issue specific to ppc64le?
- Is this something that can be reproduced manually and, if so, what are the steps?

Thanks,
Rick

Comment 12 Timothy Redaelli 2023-06-26 16:16:50 UTC
Moving needinfo to the reporter.

Comment 13 Dan Williams 2023-06-29 13:24:38 UTC
The fix was confirmed to work fine in RHEL, but does not work in RHCOS due to some missing systemd-sysusers macros bits described in https://github.com/openshift/os/issues/1274#issuecomment-1597742858

https://github.com/openshift/os/pull/1318 was the openshift workaround until the systemd RPM macros can be fixed.

We can call this bug VERIFIED as the problem is not with OVS on RHEL, but with systemd macros on RHCOS.

Comment 14 Rick Alongi 2023-06-29 13:49:41 UTC
Marking BZ Verified per comment 13.

Comment 16 errata-xmlrpc 2023-07-06 19:17:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (openvswitch3.1 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:3989

