Bug 1901449
Summary: | Disabling service via Ansible `service` module does not work, systemctl show aborts with "Failed to parse bus message: Invalid argument" | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | Miroslav Vadkerti <mvadkert> |
Component: | systemd | Assignee: | David Tardon <dtardon> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Frantisek Sumsal <fsumsal> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 8.4 | CC: | abokovoy, ajb, arachman, bperkins, bstinson, bugzilla-redhat, carl, dkarpele, dtardon, farrotin, fsumsal, ftrivino, jpena, lmiksik, msekleta, ngompa13, nhosoi, pcahyna, rbeyel, rpm, sgoveas, skozina, stefw, systemd-maint-list, todoleza, toracat, twoerner, ykaliuta |
Target Milestone: | rc | Keywords: | Triaged |
Target Release: | 8.0 | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | systemd-239-43.el8 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2021-04-07 10:16:41 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1921845 |
Description
Miroslav Vadkerti
2020-11-25 09:34:47 UTC
Hmm downgrading `firewalld` does not help, moving to `systemd`

Probably because systemctl show aborts.

8.4:

    # systemctl show firewalld
    (...)
    TTYVHangup=no
    TTYVTDisallocate=no
    SyslogPriority=30
    SyslogLevelPrefix=yes
    SyslogLevel=6
    SyslogFacility=3
    LogLevelMax=-1
    LogRateLimitIntervalUSec=0
    LogRateLimitBurst=0
    SecureBits=0
    Failed to parse bus message: Invalid argument
    # echo $?
    1

8.3:

    # systemctl show firewalld
    AssertTimestampMonotonic=3105543614
    Transient=no
    Perpetual=no
    StartLimitIntervalUSec=10s
    StartLimitBurst=5
    StartLimitAction=none
    FailureAction=none
    SuccessAction=none
    InvocationID=88d31d5c400d4a219ca6387cc0474332
    CollectMode=inactive
    # echo $?
    0

8.4

    ./systemctl show firewalld | grep LoadState
    Failed to parse bus message: Invalid argument

8.3

    systemctl show firewalld | grep LoadState
    LoadState=loaded

The Ansible systemd module expects a LoadState key in systemctl output.

Minimal Ansible reproducer:

    ansible -m systemd -a 'name=firewalld state=started' localhost

In particular, the above means that ignoring the return code will not work, as the module will hit a problem in a different place and fail with "Could not find the requested service firewalld: host".

I don't see the problem on Rawhide - some backport gone wrong?

Not just firewalld:

    # systemctl show sshd | grep LoadState
    Failed to parse bus message: Invalid argument

*** Bug 1898591 has been marked as a duplicate of this bug. ***

I forgot to mention that this issue is currently blocking RHEL CI.

Please let us know how serious this issue is (in terms of breaking a lot of tests), so we can plan accordingly. The current nightly will expire in 2 days.

This is caused by a kernel change. I get the failure with 4.18.0-252.el8 but not with 4.18.0-240.1.1.el8_3.

Interestingly, strace does not show any syscall returning EINVAL (except the very first one in the process: arch_prctl, but this is the same for working kernels.)
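For anyone wondering why ignoring the exit code is not enough on the Ansible side: below is a minimal sketch of a consumer that parses `systemctl show` output and requires a LoadState key, as described above. This is not the actual Ansible module code; `parse_show_output` and `unit_is_known` are hypothetical helper names, and the sample strings are fragments of the 8.4 and 8.3 output quoted in this report.

```python
def parse_show_output(text):
    """Parse `systemctl show`-style KEY=VALUE lines into a dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that carry no '=' at all
            props[key] = value
    return props


def unit_is_known(props):
    """Hypothetical check: the unit counts as found only if LoadState is present."""
    return "LoadState" in props


# Fragment of the 8.4 output: systemctl aborted before LoadState was printed.
OUTPUT_8_4 = "LogRateLimitBurst=0\nSecureBits=0\n"
# Fragment of the 8.3 output, where the full property list is printed.
OUTPUT_8_3 = "LoadState=loaded\nTransient=no\nPerpetual=no\n"

if __name__ == "__main__":
    for label, out in (("8.4", OUTPUT_8_4), ("8.3", OUTPUT_8_3)):
        print(label, "unit found:", unit_is_known(parse_show_output(out)))
```

With the aborted 8.4 output the key is simply absent, so a consumer like this reports the service as missing regardless of how it treats the exit status.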
This blocks any ability to run gating tests for RHEL IdM while attempting to integrate to RHEL 8.4.

(In reply to Pavel Cahyna from comment #9)
> Interestingly, strace does not show any syscall returning EINVAL (except the
> very first one in the process: arch_prctl, but this is the same for working
> kernels.)

I see the following error after comparing strace output from both rhel8.4 (4.18.0-252.el8.x86_64) and rhel8.3 (4.18.0-240.1.1.el8_3.x86_64)

    recvmsg(3, {msg_namelen=0}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
    ppoll([{fd=3, events=POLLIN}], 1, {tv_sec=24, tv_nsec=999880000}, NULL, 8) = 1 ([{fd=3, revents=POLLIN}], left {tv_sec=24, tv_nsec=992329163})

I'm not sure which sysfs attribute file is the one that is busy...

I played around a bit and I think I know where the problem is. The issue happens when systemctl is trying to format the CapabilityBoundingSet= property:

    # systemctl show tuned --no-pager -p CapabilityBoundingSet
    Failed to parse bus message: Invalid argument

The ltrace shows that it dies somewhere around checking the last valid capability number from /proc/sys/kernel/cap_last_cap:

    bus_message_print_all_properties(0x563e0dbe6820, 0x563e0c1330d0, 0x563e0dbe2c60, 0 <unfinished ...>
    gettid@SYS(2, 0xffffffff, 0, 0x7ffd2adc5e30) = 5975
    SYS_318@SYS(0x7f0156fb68b0, 16, 1, 0) = 16
    SYS_318@SYS(0x7f0156fb6890, 16, 1, 0x563e0dbe8260) = 16
    SYS_318@SYS(0x7f0156fb6890, 16, 1, 0x563e0dbe8260) = 16
    SYS_318@SYS(0x7f0156fb6890, 16, 1, 0x563e0dbe8260) = 16
    SYS_318@SYS(0x7f0156fb6890, 16, 1, 0x563e0dbe8490) = 16
    SYS_318@SYS(0x7f0156fb6890, 16, 1, 0x563e0dbe8490) = 16
    sd_bus_message_peek_type(0x563e0dbe6820, 0x7ffd2adc5d37, 0x7ffd2adc5d38, 0) = 1
    openat@SYS(AT_FDCWD, "/proc/sys/kernel/cap_last_cap", 0x80000, 00) = 4
    fstat@SYS(4, 0x7ffd2adc5a70) = 0
    read@SYS(4, "39\n", 1024) = 3
    close@SYS(4) = 0
    <... bus_message_print_all_properties resumed> ) = 0xffffffea
    bus_log_parse_error(0xffffffea, 2, 0xffffffea, 0 <unfinished ...>

Comparing the capability number with 8.3, there are clearly two new capabilities present in the new kernel:

    ## 8.3
    # cat /proc/sys/kernel/cap_last_cap
    37
    ## 8.4
    # cat /proc/sys/kernel/cap_last_cap
    39

After a brief poking around it's clear that the systemd internal capability structure is missing the newly introduced capabilities, since the list is built during compilation and the compilation was done with an older kernel:

    Breakpoint 1, capability_to_name (id=id@entry=37) at ../src/basic/cap-list.c:20
    20 const char *capability_to_name(int id) {
    (gdb)
    Run till exit from #0 capability_to_name (id=id@entry=37) at ../src/basic/cap-list.c:20
    0x00007ffff79f2d5b in capability_set_to_string_alloc (set=set@entry=18446744073709551615, s=s@entry=0x7fffffffd6d8) at ../src/basic/cap-list.c:70
    70 p = capability_to_name(i);
    Value returned is $38 = 0x7ffff7adcf0d "cap_audit_read"
    (gdb) p capability_names
    $39 = {0x7ffff7adccfe "cap_chown", 0x7ffff7adcd08 "cap_dac_override", 0x7ffff7adcd19 "cap_dac_read_search", 0x7ffff7adcd2d "cap_fowner", 0x7ffff7adcd38 "cap_fsetid", 0x7ffff7adcd43 "cap_kill", 0x7ffff7adcd4c "cap_setgid", 0x7ffff7adcd57 "cap_setuid", 0x7ffff7adcd62 "cap_setpcap", 0x7ffff7adcd6e "cap_linux_immutable", 0x7ffff7adcd82 "cap_net_bind_service", 0x7ffff7adcd97 "cap_net_broadcast", 0x7ffff7adcda9 "cap_net_admin", 0x7ffff7adcdb7 "cap_net_raw", 0x7ffff7adcdc3 "cap_ipc_lock", 0x7ffff7adcdd0 "cap_ipc_owner", 0x7ffff7adcdde "cap_sys_module", 0x7ffff7adcded "cap_sys_rawio", 0x7ffff7adcdfb "cap_sys_chroot", 0x7ffff7adce0a "cap_sys_ptrace", 0x7ffff7adce19 "cap_sys_pacct", 0x7ffff7adce27 "cap_sys_admin", 0x7ffff7adce35 "cap_sys_boot", 0x7ffff7adce42 "cap_sys_nice", 0x7ffff7adce4f "cap_sys_resource", 0x7ffff7adce60 "cap_sys_time", 0x7ffff7adce6d "cap_sys_tty_config", 0x7ffff7adce80 "cap_mknod", 0x7ffff7adce8a "cap_lease", 0x7ffff7adce94 "cap_audit_write", 0x7ffff7adcea4 "cap_audit_control", 0x7ffff7adceb6 "cap_setfcap", 0x7ffff7adcec2 "cap_mac_override", 0x7ffff7adced3 "cap_mac_admin", 0x7ffff7adcee1 "cap_syslog", 0x7ffff7adceec "cap_wake_alarm", 0x7ffff7adcefb "cap_block_suspend", 0x7ffff7adcf0d "cap_audit_read"}
    (gdb) p sizeof(capability_names)/sizeof(*capability_names)
    $41 = 38
    (gdb) p capability_names[37]
    $43 = 0x7ffff7adcf0d "cap_audit_read"
    (gdb) p capability_names[38]
    $44 = 0x0

So, the loop in capability_set_to_string_alloc() [src/basic/cap-list.c] is trying to loop over two capabilities, which are not present in the lookup table, causing capability_to_name() [src/basic/cap-list.c] to return NULL and subsequently capability_set_to_string_alloc() to return -EINVAL, leading up to the systemctl error:

    (gdb) frame
    #0 capability_to_name (id=id@entry=38) at ../src/basic/cap-list.c:20
    20 const char *capability_to_name(int id) {
    (gdb) s
    25 if (id >= (int) ELEMENTSOF(capability_names))
    (gdb) s
    capability_set_to_string_alloc (set=set@entry=18446744073709551615, s=s@entry=0x7fffffffd6d8) at ../src/basic/cap-list.c:71
    71 if (!p)
    (gdb) p p
    $46 = 0x0
    (gdb) s
    Failed to parse bus message: Invalid argument

I suspect the issue should be easily resolvable by just rebuilding the systemd RPM with the new kernel.

(In reply to Frantisek Sumsal from comment #14)
> I suspect the issue should be easily resolvable by just rebuilding the
> systemd RPM with the new kernel.

Right, it is.

Yes, kernel-4.18.0-251.el8 adds CAP_BPF (at least). Thanks Frantisek!

(In reply to David Tardon from comment #15)
> (In reply to Frantisek Sumsal from comment #14)
> > I suspect the issue should be easily resolvable by just rebuilding the
> > systemd RPM with the new kernel.
>
> Right, it is.

Well, as a workaround. This compile-time dependency of build system kernel headers must be removed. New capabilities can come, AFAIK one more is coming soon.

After change of the component keywords were removed. The bug still prevents proper gating testing.
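To make the failure mode from the gdb session easier to follow, here is a small Python model of the loop, offered as a sketch rather than a translation of src/basic/cap-list.c. The name table is copied from the `p capability_names` output above (38 entries, ids 0..37); the function bodies and the tuple return convention are assumptions made for illustration.

```python
import errno

# Name table as printed by gdb in this report: 38 entries, ids 0..37.
CAPABILITY_NAMES = [
    "cap_chown", "cap_dac_override", "cap_dac_read_search", "cap_fowner",
    "cap_fsetid", "cap_kill", "cap_setgid", "cap_setuid", "cap_setpcap",
    "cap_linux_immutable", "cap_net_bind_service", "cap_net_broadcast",
    "cap_net_admin", "cap_net_raw", "cap_ipc_lock", "cap_ipc_owner",
    "cap_sys_module", "cap_sys_rawio", "cap_sys_chroot", "cap_sys_ptrace",
    "cap_sys_pacct", "cap_sys_admin", "cap_sys_boot", "cap_sys_nice",
    "cap_sys_resource", "cap_sys_time", "cap_sys_tty_config", "cap_mknod",
    "cap_lease", "cap_audit_write", "cap_audit_control", "cap_setfcap",
    "cap_mac_override", "cap_mac_admin", "cap_syslog", "cap_wake_alarm",
    "cap_block_suspend", "cap_audit_read",
]


def capability_to_name(cap_id):
    """Return the name for cap_id, or None if the build-time table lacks it."""
    if cap_id < 0 or cap_id >= len(CAPABILITY_NAMES):
        return None
    return CAPABILITY_NAMES[cap_id]


def capability_set_to_string(cap_set, cap_last_cap):
    """Model of the failing loop: returns (0, string) or (-EINVAL, None)."""
    names = []
    # cap_last_cap comes from the running kernel, the table from build time.
    for i in range(cap_last_cap + 1):
        if cap_set & (1 << i):
            name = capability_to_name(i)
            if name is None:
                return -errno.EINVAL, None  # the whole property fails
            names.append(name)
    return 0, " ".join(names)


if __name__ == "__main__":
    full_set = 18446744073709551615  # the `set` value seen in gdb (all bits set)
    print(capability_set_to_string(full_set, 37)[0])  # 8.3 kernel: cap_last_cap=37
    print(capability_set_to_string(full_set, 39)[0])  # 8.4 kernel: cap_last_cap=39
```

In this model the first id without a name (38) aborts the formatting, which corresponds to the 0xffffffea (that is, -22, -EINVAL) return value seen in the ltrace output.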
(In reply to Alexander Bokovoy from comment #18)
> After change of the component keywords were removed. The bug still prevents
> proper gating testing.

systemd was rebuilt yesterday against kernel-headers-4.18.0-252.el8. That should mitigate the problem for now. Can you please retest and report back?

(In reply to Michal Sekletar from comment #19)
> (In reply to Alexander Bokovoy from comment #18)
> > After change of the component keywords were removed. The bug still prevents
> > proper gating testing.
>
> systemd was rebuilt yesterday against kernel-headers-4.18.0-252.el8. That
> should mitigate the problem for now. Can you please retest and report back?

I asked RHEL IdM QE to verify it today.

> Well, as a workaround. This compile-time dependency of build system kernel
> headers must be removed.
> New capabilities can come, AFAIK one more is coming soon.
Indeed -- I suppose it would also be required for eg. UBI container running on a non-RHEL kernel?
That's not a supported scenario, but we say it should work.
Michal, Frantisek, do we plan proper fix for this?
(In reply to Stanislav Kozina from comment #21)
> > Well, as a workaround. This compile-time dependency of build system kernel
> > headers must be removed.
> > New capabilities can come, AFAIK one more is coming soon.
>
> Indeed -- I suppose it would also be required for eg. UBI container running
> on a non-RHEL kernel?
> That's not a supported scenario, but we say it should work.

Why only non-RHEL? Shouldn't running container images of older RHEL releases (userland) be working and even supported on newer RHEL kernels? Wouldn't the issue affect this scenario too?

(In reply to Pavel Cahyna from comment #22)
> Why only non-RHEL? Shouldn't running container images of older RHEL releases
> (userland) be working and even supported on newer RHEL kernels? Wouldn't the
> issue affect this scenario too?

Yes, I think you should be able to run RHEL-7 UBI on RHEL-8 host / kernel, and vice versa, and in such case I think it's even fully supported installation. Sorry, the non-RHEL example was a misleading simplification, it really is any other kernel.

This was seen a few months ago in Fedora [1]. There is a systemd fix [2], and also a workaround from the Ansible point of view [3]. We are also hitting the same issue in CentOS 8 Stream, with systemd 239-43 and kernel 4.18.0-257.

[1] - https://bugzilla.redhat.com/show_bug.cgi?id=1853736
[2] - https://github.com/systemd/systemd/pull/16420/files
[3] - https://github.com/ansible/ansible/issues/71528

CentOS Stream bug here: https://bugzilla.redhat.com/show_bug.cgi?id=1908275

*** Bug 1910430 has been marked as a duplicate of this bug. ***

*** Bug 1908275 has been marked as a duplicate of this bug. ***

I've encountered the same issue with systemd-239-43.el8.x86_64 on CentOS Stream 8 but using systemd-239-44.el8.x86_64 fixed that.
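On the question of removing the compile-time dependency on kernel headers: one conceivable approach is to print capabilities the table does not know about in numeric form instead of failing the whole property. The sketch below only illustrates that idea under stated assumptions and does not describe the change in [2]; `KNOWN_NAMES` is a deliberately truncated, hypothetical table and `capability_set_to_string_tolerant` is a made-up name.

```python
# Hypothetical, deliberately truncated build-time table (ids 0..2 only).
KNOWN_NAMES = ["cap_chown", "cap_dac_override", "cap_dac_read_search"]


def capability_set_to_string_tolerant(cap_set, cap_last_cap):
    """Format a capability bitmask, falling back to hex ids for unknown bits."""
    parts = []
    for i in range(cap_last_cap + 1):
        if not cap_set & (1 << i):
            continue
        if i < len(KNOWN_NAMES):
            parts.append(KNOWN_NAMES[i])
        else:
            # Assumption: an unknown capability is shown by its number
            # rather than turning the whole call into an error.
            parts.append(hex(i))
    return " ".join(parts)


if __name__ == "__main__":
    # The kernel reports two capabilities more than the table knows about.
    print(capability_set_to_string_tolerant(0b11111, 4))
```

With a scheme like this, a userland built against older headers would still produce output on a newer kernel (including the older-RHEL-container-on-newer-kernel case raised above), at the cost of showing numbers for the capabilities it cannot name.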