Bug 1977740

Summary: [pmdakvm] cannot unload kvm kernel module
Product: Red Hat Enterprise Linux 8
Reporter: smitterl
Component: pcp
Assignee: Nathan Scott <nathans>
Status: CLOSED ERRATA
QA Contact: Jan Kurik <jkurik>
Severity: low
Docs Contact: Apurva Bhide <abhide>
Priority: unspecified
Version: 8.5
CC: agerstmayr, cohuck, coli, jinzhao, jkurik, nathans, thuth, virt-qe-z
Target Milestone: beta
Keywords: Documentation, Triaged
Target Release: 8.6
Flags: pm-rhel: mirror+
Hardware: All
OS: Unspecified
Whiteboard:
Fixed In Version: pcp-5.3.3-1.el8
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-05-10 13:30:36 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description smitterl 2021-06-30 11:56:06 UTC
Description of problem:
With pcp packages installed, the kvm kernel module cannot be unloaded. pcp has to be uninstalled for this to happen. This blocks kvm configuration update, e.g. enabling certain features like hugepages, nested virtualization etc.

Version-Release number of selected component (if applicable):
pcp-5.3.1-2.el8.x86_64
pcp-5.3.1-2.el8.s390x
++pcp-* and dependencies

How reproducible:
100%

Steps to Reproduce:
1. Confirm that the kvm kernel module is loaded and that it can be unloaded. On some platforms additional modules have to be unloaded first (see below).
2. Install the pcp packages: 'yum install pcp-*'
3. modprobe -r kvm_intel (on x86_64)
4. modprobe -r kvm


Actual results:
modprobe: FATAL: Module kvm is in use.

lsof|grep /dev/kvm is empty

lsof|grep kvm
lists entries from pmdakvm, but no reference to either /dev/kvm or sysfs
e.g.
"pmdakvm   1310                   root  txt       REG              253,0     26696     847776 /usr/libexec/pcp/pmdas/kvm/pmdakvm"

'yum remove -y pcp' solves the issue

Expected results:
The user can identify which process uses KVM resources and take action to unload the kvm module successfully.

Additional info:
Installation of only pcp-pmda* and/or pcp-export* (and dependencies) doesn't show the same behavior. pcp-* looks like the necessary set. I couldn't identify which exact package causes this behavior.
I don't know how the full pcp-* set originally got pulled in on my system.

Comment 1 smitterl 2021-06-30 11:57:25 UTC
I could also reproduce with RHEL 8.4 and RHEL 9 Beta.

Comment 2 Jan Kurik 2021-06-30 14:27:20 UTC
I can reproduce this as well. However, I am not sure if this is a bug.

A kvm PMDA is installed as one of the default PMDAs (part of the pcp rpm). The kvm PMDA needs the kvm kernel module to be able to provide kvm metrics.

If this blocks your kvm update, then you do not need to uninstall pcp. Stopping pmcd is sufficient.

I.e. this works for me on my test bed: 

# systemctl stop pmcd
# modprobe -r kvm
# systemctl start pmcd
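
The three commands above can be wrapped so that pmcd is started again even when unloading fails. This is only a sketch: the function names (run, unload_kvm_with_pmcd_stopped) and the DRY_RUN switch are made up for illustration, and the default module list (kvm_intel, then kvm) assumes an Intel x86_64 host. On other platforms pass the modules to unload explicitly, dependents first.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: stop pmcd, unload the given modules in order,
# then start pmcd again regardless of whether unloading succeeded.
unload_kvm_with_pmcd_stopped() {
    # Assumption: Intel x86_64 defaults; e.g. pass just "kvm" where
    # no vendor-specific module is loaded.
    [ "$#" -gt 0 ] || set -- kvm_intel kvm

    run systemctl stop pmcd || return 1

    status=0
    for module in "$@"; do
        run modprobe -r "$module" || { status=$?; break; }
    done

    # Always bring the metrics service back, even after a failure.
    run systemctl start pmcd || status=$?
    return "$status"
}

# Example (as root): unload_kvm_with_pmcd_stopped kvm_intel kvm
# Preview only:      DRY_RUN=1 unload_kvm_with_pmcd_stopped
```

Running it once with DRY_RUN=1 shows the exact command sequence before anything is stopped or unloaded.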


If you do not need the kvm metrics, there is also a possibility to uninstall the kvm pmda:

# systemctl start pmcd ### Make sure PMCD is running
# cd /var/lib/pcp/pmdas/kvm
# ./Remove

This uninstalls the kvm PMDA, after which the kvm kernel module can be removed even while pcp (pmcd) is running.

Comment 3 Andreas Gerstmayr 2021-07-02 14:42:11 UTC
(In reply to smitterl from comment #0)
> Expected results:
> The user can identify which process uses KVM resources and take action to
> unload the kvm module successfully.

I'm not sure what we should do about that. We don't have control over the 'lsof' tool.
Afaics the kvm PMDA works by using the perf_event subsystem, which explains why it's not using /dev/kvm.

As Jan explained, the kvm PMDA is part of the default PMDAs installed by PCP (the 'pcp' rpm package). If this PMDA is not wanted on the system, it can be uninstalled.

Comment 4 smitterl 2021-07-02 15:58:45 UTC
(In reply to Andreas Gerstmayr from comment #3)
> (In reply to smitterl from comment #0)
> > Expected results:
> > The user can identify which process uses KVM resources and take action to
> > unload the kvm module successfully.
> 
> I'm not sure what we should do about that. We don't have control over the
> 'lsof' tool.
> Afaics the kvm PMDA works by using the perf_event subsystem, which explains
> why it's not using /dev/kvm.
> 
> As Jan explained, the kvm PMDA is part of the default PMDAs installed by PCP
> (the 'pcp' rpm package). If this PMDA is not wanted on the system, it can be
> uninstalled.

I understand. I'm wondering how a customer could find out they can stop pmcd temporarily to unblock resources. It would never have occurred to me, but then I didn't install pcp on purpose either.

I think it's not unlikely the customer would go through the following steps:

1. discovers can't unload kvm
2. runs 'lsof|grep kvm'
3. sees pmda and pmdakvm in the output
4. runs 'man pmda'
==> No manual entry for pmda

The pmda manpage is in the pcp-doc package which is not pulled in when installing pcp.

# rpm -q --whatrequires pcp-doc
no package requires pcp-doc

I'm wondering if making sure pcp-doc is pulled in when installing pcp could help the customer. At least they would be able to read in the pmda manpage:

"       A  PMDA  is responsible for a set of performance metrics, in the sense that it must respond to requests from pmcd(1) for informa‐
       tion about performance metrics, instance domains, and instantiated values.
"

So, there's some reference to pmcd.


Finally, it could be helpful if either the pmda or the pmcd manpage additionally mentioned that a PMDA might block resources and that they can be released temporarily by stopping the daemon - but it could also be too much. I don't know. Maybe getting a result for 'man pmda' by default would be enough.

Comment 5 smitterl 2021-07-02 16:01:33 UTC
(Setting lower severity because the workaround is documented on this BZ. Please shout if you disagree. Thanks.)

Comment 6 Nathan Scott 2021-07-05 08:21:09 UTC
The only thing we can do here is add more information to the pmdakvm(1) man page about this situation - as Andreas mentioned, the PMDA has file descriptors registered with perf_event that increase the device driver reference count.
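
One way to observe this is sketched below, under assumptions. On Linux, descriptors returned by perf_event_open(2) appear in /proc/<pid>/fd as links to 'anon_inode:[perf_event]', and a loaded module's reference count can be read from /sys/module/<name>/refcnt. The function names (count_perf_event_fds, module_refcnt) are hypothetical and not part of PCP. The optional root-directory arguments exist only so the functions can be tried against a fake directory tree.

```shell
#!/bin/sh
# Hypothetical helper: count the perf_event descriptors held by a process.
# $1 = pid, $2 = procfs root (defaults to /proc).
count_perf_event_fds() {
    pid="$1"
    proc_root="${2:-/proc}"
    count=0
    for fd in "$proc_root/$pid/fd"/*; do
        # Skip anything that is not a symlink (including an unmatched glob).
        [ -L "$fd" ] || continue
        target=$(readlink "$fd") || continue
        case "$target" in
            *'[perf_event]'*) count=$((count + 1)) ;;
        esac
    done
    echo "$count"
}

# Hypothetical helper: print a module's current reference count.
# $1 = module name, $2 = sysfs root (defaults to /sys).
module_refcnt() {
    module="$1"
    sysfs_root="${2:-/sys}"
    cat "$sysfs_root/module/$module/refcnt"
}

# Example (as root, since pmdakvm runs as root):
#   count_perf_event_fds "$(pgrep -x pmdakvm)"
#   module_refcnt kvm
```

A non-zero descriptor count for pmdakvm alongside a non-zero kvm refcnt would be consistent with the explanation above, although it does not by itself prove which descriptors pin the module.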

There is a slightly better way to tackle the problem than the earlier-mentioned pmcd restart.  It's best to use the ./Remove and ./Install scripts for this task, as that is less disruptive to all PCP client tools connected to pmcd: when we use those scripts, pmcd is not restarted; it is sent SIGHUP to remove/install the agent.  This way only clients actively using the kvm metrics are affected, and even those see localized errors relating to kvm, not complete loss of pmcd metrics service for all metrics.
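
A minimal sketch of that sequence follows. The function names (run, reload_kvm_via_pmda_scripts), the DRY_RUN switch and the PMDA_DIR override are made up for illustration. The PMDA directory is the one quoted in comment 2, and the default module list (kvm_intel, then kvm) assumes an Intel x86_64 host. Reloading only the first listed module relies on modprobe pulling in its dependencies.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: remove the kvm PMDA, reload the kvm modules,
# then install the PMDA again. pmcd itself keeps running throughout.
reload_kvm_via_pmda_scripts() {
    pmda_dir="${PMDA_DIR:-/var/lib/pcp/pmdas/kvm}"
    # Modules to unload, dependents first (assumption: Intel x86_64).
    [ "$#" -gt 0 ] || set -- kvm_intel kvm
    top_module="$1"

    (   # Subshell, so the directory change does not leak to the caller.
        run cd "$pmda_dir" || exit 1
        run ./Remove || exit 1

        status=0
        for module in "$@"; do
            run modprobe -r "$module" || { status=$?; break; }
        done
        if [ "$status" -eq 0 ]; then
            # Loading the top-most module also loads what it depends on.
            run modprobe "$top_module" || status=$?
        fi

        # Re-install the PMDA even if the module reload failed.
        run ./Install || status=$?
        exit "$status"
    )
}

# Example (as root): reload_kvm_via_pmda_scripts kvm_intel kvm
# Preview only:      DRY_RUN=1 reload_kvm_via_pmda_scripts
```

Any change to module options (for example under /etc/modprobe.d) would be made before calling the function, so that the reload picks it up.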

I'll improve the docs as a low priority task (most likely for 8.6 since it's getting late in the 8.5 release cycle for us now).

Comment 7 Nathan Scott 2021-07-05 08:28:46 UTC
Oh, regarding:

| I think it's not unlikely the customer would go through the following steps:
|
| 1. discovers can't unload kvm
| 2. runs 'lsof|grep kvm'
| 3. sees pmda and pmdakvm in the output
| 4. runs 'man pmda'
| ==> No manual entry for pmda


I think we went sideways a bit from point 3 - looking at the lsof output, it's all references to 'pmdakvm' - I'd expect (hope!) customers / support would look specifically for that man page - pmdakvm(1).  This already has a section describing the Install and Remove process I mentioned - but it would help if it specifically mentioned that this is a good approach to allow kernel module reloading.  This is what I will add more details about.

Comment 8 Nathan Scott 2021-07-06 03:26:30 UTC
Fixed upstream via ...

commit f97db1304440eb7d5b580f355808698fa716bcc0
Author: Nathan Scott <nathans>
Date:   Tue Jul 6 13:24:05 2021 +1000

    docs: add a CAVEATS section to pmdakvm(1) about module reloading
    
    Resolves Red Hat BZ #1977740

Comment 11 Jan Kurik 2021-09-16 10:23:39 UTC
The CAVEATS section is present in the man page of the pmdakvm and describes the situation as well as a solution.

Comment 14 errata-xmlrpc 2022-05-10 13:30:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (pcp bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:1765

Comment 15 smitterl 2022-07-04 11:02:54 UTC
Thank you!