Bug 2111742
| Summary: | pcp-selinux 5.3.5-8.el8 breaks selinux | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | John <john.sincock> | ||||
| Component: | pcp | Assignee: | Nathan Scott <nathans> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Jan Kurik <jkurik> | ||||
| Severity: | urgent | Docs Contact: | Jacob Taylor Valdez <jvaldez> | ||||
| Priority: | unspecified | ||||||
| Version: | 8.6 | CC: | agerstmayr, jkurik, nathans | ||||
| Target Milestone: | rc | Keywords: | Bugfix, Triaged | ||||
| Target Release: | 8.7 | Flags: | pm-rhel: mirror+ | ||||
| Hardware: | All | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | pcp-5.3.7-15.el8 | Doc Type: | No Doc Update | ||||
| Doc Text: | | Story Points: | --- | ||||
| Clone Of: | | Environment: | |||||
| Last Closed: | 2023-05-16 08:13:26 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | | Category: | --- | ||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: | |||||||
Description
John
2022-07-28 04:58:52 UTC
Also leads to hopeless failures like this:

    [root@audccfots809 07-28 14:16:21 ~]# semanage fcontext -a -t insights_client_etc_t -s system_u /etc/insights-client/machine-id
    libsepol.context_from_record: type insights_client_etc_t is not defined (No such file or directory).
    libsepol.context_from_record: could not create context structure (Invalid argument).
    libsemanage.validate_handler: invalid context system_u:object_r:insights_client_etc_t:s0 specified for /etc/insights-client/machine-id [all files] (Invalid argument).
    libsemanage.dbase_llist_iterate: could not iterate over records (Invalid argument).
    OSError: [Errno 22] Invalid argument

Absolutely disgraceful for this garbage to be released into an "Enterprise" operating system. I have better things to do than come in here logging bugs that should never have been released into the wild, or, if released, should have been noticed and FIXED before my time is wasted. I have no doubt the next comment in this bug will be to reprimand me for being, so let me get in first and say I don't care. I'm a Red Hat customer. It's not my job to debug your operating system; my hands are full trying to USE it.

Hi John,

Really sorry to hear about this problem you've encountered. This is the first I'm hearing about this particular issue, and I appreciate you taking the time to report it.

I can assure you that extensive QE is performed prior to each release, and this includes manual verification by a separate department (from mine) of each issue that has been reported as "fixed" in a release. The earlier BZs you found that are similar all have subtle differences, unfortunately, and I suspect the case you're encountering is a new wrinkle on an old problem (note that the line numbers in the pcpupstream/cil:XXX reports are different each time; this indicates different selinux policy interaction problems).

Could you help me understand the problem you are seeing further?
That "cil" file where the error message is generated is a temporary file, generated specially for your system during rpm installation and then discarded. This makes it difficult for me to see the root cause straight away. Could you run:

    # /usr/libexec/selinux/hll/pp /var/lib/pcp/selinux/pcpupstream.pp /tmp/pcpupstream.cil

and then attach the output /tmp/pcpupstream.cil file to this issue? That would help me immensely with getting to the bottom of this issue.

I've started discussions with other developers here and in the upstream PCP community to look into further defensive measures we can take to prevent this class of problems (selinux policy mismatches between packages) from having the kind of impact you're seeing.

Again, apologies that this has adversely impacted your systems and wasted your time.

I already removed pcp and pcp-selinux from most affected systems, and when I did so, I found these problems were immediately fixed. But now when I reinstall them on a test system, the issue does not come back...

I did not remove pcp-selinux from a colleague's test VM though, so it still has the same issue, e.g. if I run:

    [root@audctstmr003 07-28 17:42:30 ~]# semanage fcontext -a -t insights_client_etc_t -s system_u /etc/insights-client/machine-id
    libsepol.context_from_record: type insights_client_etc_t is not defined (No such file or directory).
    libsepol.context_from_record: could not create context structure (Invalid argument).
    libsemanage.validate_handler: invalid context system_u:object_r:insights_client_etc_t:s0 specified for /etc/insights-client/machine-id [all files] (Invalid argument).
    libsemanage.dbase_llist_iterate: could not iterate over records (Invalid argument).
    OSError: [Errno 22] Invalid argument
    [root@audctstmr003 07-28 17:44:59 ~]#

I have used /usr/libexec/selinux/hll/pp to dump to a cil file on this VM (audctstmr003), and attached it as requested.
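A minimal shell sketch of the lookup being discussed: given one semodule error line that ends in a `<path>:<line>` location (the `pcpupstream/cil:XXX` reports mentioned above), print the CIL statement at that line. The function name `show_failing_cil_line` is hypothetical, and it assumes the location is the last space-separated field of the message. Because the referenced file is described as temporary, an optional second argument lets you read the line from a dump made with `/usr/libexec/selinux/hll/pp` instead, assuming (as this thread does) that the dump's line numbers match the ones semodule reports.

```shell
# Hypothetical helper (not part of pcp or the selinux tooling).
# $1: one semodule error line ending in "<path>:<line>"
# $2: optional CIL file to read instead of <path>, e.g. a dump made with
#     /usr/libexec/selinux/hll/pp (assumed to share the same line numbers)
show_failing_cil_line() {
    local message="$1" cil_file="${2:-}"
    local location path line
    # Assumption: the location is the last space-separated field.
    location="${message##* }"
    path="${location%:*}"
    line="${location##*:}"
    # Refuse anything that does not end in a purely numeric line number.
    case "$line" in
        ''|*[!0-9]*)
            echo "no <path>:<line> location in: $message" >&2
            return 1
            ;;
    esac
    cil_file="${cil_file:-$path}"
    # Print only line N (equivalent to sed 'N,N! d').
    sed -n "${line}p" "$cil_file"
}
```

For example, `show_failing_cil_line "Failed to resolve typeattributeset statement at /var/lib/selinux/targeted/tmp/modules/400/pcpupstream/cil:42" /tmp/pcpupstream.cil` prints line 42 of the dump. Parsing from the end of the message keeps the sketch independent of the wording before the location, which may differ between semodule versions.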
Unfortunately this VM has selinux disabled, so I cannot generate a policy to load with semodule and see if it errors with a cil line number, like my other VMs were doing (at least not until I check with my colleague that it's OK to enable selinux and reboot his VM tomorrow). I'm not sure the attached cil will be any use to you without knowing whether semodule hits an error on this VM, and the particular line number for this VM, but I may be able to get that info tomorrow.

I also have a different copy of the actual /var/lib/selinux/targeted/tmp/modules/400/pcpupstream/cil file from one of the other VMs that had the issue before I uninstalled. Unfortunately I don't know which line numbers in the cil semodule was complaining about on this specific VM. It may well have been line 63 (as it was on the other VM, audctstmr002), but I can't be sure. If it was line 63 of this file on this VM, then that would be:

    (typeattributeset cil_gen_require glusterd_log_t)

Anyway, you say this file is a temporary file "generated specially for your system during rpm installation, and then discarded", but if that is the case, why did it still exist on my VMs, and why was it being referred to in error messages? Also, since this problem goes away when pcp-selinux is reinstalled, it will probably be a pig to track down how it arose. It sounds like the sort of bug that should be avoided in the first place, by making your rpms more consistent and static. Relying on some supposedly temporary file generated by some unknown past version sounds like a recipe for disaster, and I'd say bugs like this prove it.

Created attachment 1899877
cil file from colleague's VM audctstmr003
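Every CIL line quoted in this bug has the form `(typeattributeset cil_gen_require <type>)`, which appears to be how the `pp` converter renders the types a module requires from the rest of the policy. A minimal sketch, assuming that single-type form, that lists every such type in an attached or dumped CIL file so the list can be compared against the types the installed policy actually defines (that comparison is not shown). The function name `list_required_types` is hypothetical.

```shell
# Hypothetical helper (not part of pcp or the selinux tooling).
# $1: CIL file, e.g. /tmp/pcpupstream.cil produced by /usr/libexec/selinux/hll/pp
# Prints each type named in a single-type
#   (typeattributeset cil_gen_require <type>)
# statement, sorted and de-duplicated. Other forms (such as a
# parenthesised list of types) are deliberately ignored by this sketch.
list_required_types() {
    sed -n 's/^[[:space:]]*(typeattributeset cil_gen_require \([^()[:space:]][^()[:space:]]*\))[[:space:]]*$/\1/p' "$1" |
        sort -u
}
```

Usage: `list_required_types /tmp/pcpupstream.cil`. Restricting the match to whole-line, single-name statements keeps the output to plain type names, at the cost of skipping any multi-type statements the converter might emit.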
(In reply to John from comment #3)
> ...
> I have used /usr/libexec/selinux/hll/pp to dump to a cil file on this VM
> (audctstmr003), and attached as requested.

Thanks!

> Unfortunately this VM has selinux disabled, so i cannot generate a policy to
> load with semodule and see if it errors with a cil line number, like my
> other VMs were doing (at least not until I check with colleague that its ok
> to enable selinux and reboot his VM tomorrow). Not sure the attached cil
> will be any use to you without knowing whether semodule hits an error on
> this vm, and the particular line number for this VM, but i may be able to
> get that info tomorrow.

OK, appreciate it.

> I also have a different copy of the actual
> /var/lib/selinux/targeted/tmp/modules/400/pcpupstream/cil file from one of
> the other VMs that had the issue before I uninstalled.
> Unfortunately I don't know what line numbers in the cil semodule was
> comlaining about on this specific VM. It may well have been line 63 (as it
> was on other VM audctstmr002), but i can't be sure.
> If it was line 63 of this file on this VM, then that would be:
> (typeattributeset cil_gen_require glusterd_log_t)
>

FWIW, here's a sed one-liner to extract a specific line N:

    $ sed 'N,N! d' ~/pcpupstream.cil

If it turns out line 63 is the problem line on VM 003 too, the attached file points us toward:

    $ sed '63,63! d' ~/pcpupstream.cil
    (typeattributeset cil_gen_require sbd_exec_t)

There's also mention of line 42 in your report (#c1, line 5 of this BZ). If that happens to be the same location in the file from this VM, that's:

    $ sed '42,42! d' ~/pcpupstream.cil
    (typeattributeset cil_gen_require numad_t)

... but let's wait and see if we can find a problematic line number that is specific to this VM.

Here we go, on my colleague's VM audctstmr003. When reinstalling selinux-policy, for example, we see this:

    ...
    Running scriptlet: selinux-policy-3.14.3-95.el8.noarch 1/2
    Failed to resolve typeattributeset statement at /var/lib/selinux/targeted/tmp/modules/400/pcpupstream/cil:42
    semodule: Failed!

And we also have this:

    [root@audctstmr003 07-29 13:26:30 z_cil]# ausearch -x /usr/sbin/ifconfig --raw | audit2allow -D -M my-ifconfig
    ******************** IMPORTANT ***********************
    To make this policy package active, execute:
    semodule -i my-ifconfig.pp
    [root@audctstmr003 07-29 13:26:42 z_cil]# semodule -X 300 -i my-ifconfig.pp
    Failed to resolve typeattributeset statement at /var/lib/selinux/targeted/tmp/modules/400/pcpupstream/cil:42
    semodule: Failed!
    [root@audctstmr003 07-29 13:27:02 z_cil]#

And we have:

    [root@audctstmr003 07-29 13:20:55 ~]# /usr/libexec/selinux/hll/pp /var/lib/pcp/selinux/pcpupstream.pp /tmp/pcpupstream.audctstmr003.cil
    [root@audctstmr003 07-29 13:23:24 z_cil]# sed '42,42! d' ./pcpupstream.audctstmr003.cil
    (typeattributeset cil_gen_require numad_t)

So on this VM, it seems to be numad_t causing grief.

Thanks John, that's an interesting data point. One other question: can you describe the RHEL installation method that you're using there? I'm wondering if a different approach is being used that makes it more likely for some folks to encounter this problem than others. In particular, I'm wondering if it involves a "simultaneous" rpm install of both selinux-policy and pcp-selinux. (I think it's more common for tests to run on a scenario where there's an initial RHEL install, and then PCP is installed "on top" subsequently; perhaps there's a race in the first approach that is not present in the second.) Do you use kickstart? virt-manager with ISOs? Something else? Thanks.

My VMs are all created from a VMware template built from an early el8 release, maybe el8.0, but more likely 8.1 I think.
Over time, as releases have come out, the template has been upgraded through all releases to 8.6. Basically the template has followed the same process that any production VM (deployed early in the EL8 lifecycle) would undergo: deployed, then updated periodically as patches and releases come out.

I'm not sure if PCP was installed as part of the base package selection during the original installation, or if it was installed manually after installation from the ISO. It was installed quite early, as I have already seen PCP cause a number of problems, which I have tolerated for too long already:

1) filling filesystems due to an excessive rate of data collection, which I've had to cut back by manually updating the configuration.
2) excessive and extended CPU usage at midnight to zip logs.
3) the final straw recently: the excessive and extended CPU consumption which occurs when VMs are rebooted. When you reboot a production VM after patching, you have dracut and other business going on, and you may have services also attempting to start up. The last thing you need at this point is some useless piece of garbage like PCP insisting upon indexing and compressing logs. It should defer this nonsense until after midnight, but no, if you reboot a VM, PCP thinks it's important enough to slow your VM to a crawl, and will chew CPU for 10 minutes or more.

I have had enough of this bad behaviour from PCP. I won't have it on my VMs anymore; it will be removed from every one at the earliest opportunity, and it won't ever be going back on.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (pcp bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2745