Description of problem:
libvirt does not pass the -b option to modprobe, so it does not honor the system blacklist and will load modules regardless of whether they've been blacklisted.

Version-Release number of selected component (if applicable):
libvirt-1.1.1-14.el7.x86_64

How reproducible:
always

Steps to Reproduce:
1. add a blacklist entry for vfio-pci
2. attempt to assign a device (with vfio loaded)

Actual results:
vfio-pci is loaded and device assignment works

Expected results:
blacklist honored and unable to assign device

Additional info:
If we enable vfio module auto-loading then the admin needs some way to prevent vfio from being used. Module blacklists seem like the most obvious way to do that, but they must be honored.
I have a good idea of where to make the change within libvirt, but the instructions for this bz don't help me figure out how to "A" reproduce and "B" ensure the fix works. I'm assuming that by simply adding "-b" to our MODPROBE command found in virPCIProbeStubDriver() we'll be all set, but I would like to verify that!

Suffice to say I have no exposure to vfio-pci and I've never added a blacklist entry. Based on what I've read, only a subset of hardware can be used to support the feature, and there's a specific set of instructions needed to configure it. Finding an available system in beaker is a challenge.

So Alex - if there are any more details that will help, could you please add them? Are there specific devices I can look for in the (say) boston beaker lab? Or is there some other system I could use to verify? Thanks.
I can test it if you'd like to send me a patch or a pointer to a brew build with the change.
Started with libvirt 1.1.1-19.el7 installed, edited /etc/modprobe.d/local.conf and added:

blacklist vfio-pci

Reboot, attempt to start VM which makes use of assigned devices. Result: VM started successfully, vfio-pci module loaded.

Built 1.1.1-19.el7.1 from src.rpm, installed and rebooted. Attempt to start same VM results in:

Error starting domain: internal error: Failed to load PCI stub module vfio-pci

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 91, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 127, in tmpcb
    callback(*args, **kwargs)
  File "/usr/share/virt-manager/virtManager/domain.py", line 1260, in startup
    self._backend.create()
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 696, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirtError: internal error: Failed to load PCI stub module vfio-pci

So the blacklist is now honored, the VM fails to start, and no vfio modules are loaded. I suppose the only question remains whether this is the desired error for such a condition. Thanks
IOW: A better error message indicating that the reason for the failure was that the module was blacklisted?

When the failure occurs, there is a VIR_WARN() done which should log somewhere - I think /var/log/messages or perhaps /var/log/libvirt/qemu/$domname.log

Search for "failed to load driver"; the driver name and the errno error message will be included.

The question becomes: is telling the user the module is blacklisted the "right" thing to do, or is logging it elsewhere sufficient with the message as shown? I'm not sure what the best answer is.
(In reply to John Ferlan from comment #7)
> IOW: A better error message indicating the reason for failure was because
> the module was blacklisted?
>
> When the failure occurs, there is a VIR_WARN() done which should log
> somewhere - I think /var/log/messages or perhaps the
> /var/log/libvirt/qemu/$domname.log
>
> Search on "failed to load driver" there will be the driver name and the
> errno error message included.

I don't see any such message in the host log files.

> The question becomes is telling the user the module is blacklisted the
> "right" thing to do or is logging it elsewhere sufficient with the message
> as shown?
> I'm not sure what the best answer is.

modprobe will return success if asked to load a blacklisted module with the -b option; it just doesn't load it. If the module fails to load or is not found, modprobe returns an error. So it seems like we have:

if (success)
    if (/sys/bus/pci/drivers/$DRIVER)
        success
    else
        fail, administratively prohibited
else
    fail, broken

Ideally the error message could give some indication whether we suspect the "administratively prohibited" case.
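For illustration only, the three-way outcome described in this comment could be sketched in Python as below. This is not libvirt code: the helper name `classify_probe`, its result strings, and the `sysfs_root` parameter (added so the logic can be tried against a scratch directory) are all assumptions made for the sketch.

```python
import os

def classify_probe(modprobe_ok, driver, sysfs_root="/sys/bus/pci/drivers"):
    """Classify the result of `modprobe -b <driver>` (hypothetical helper)."""
    if not modprobe_ok:
        # modprobe itself reported an error: module missing or failed to load.
        return "broken"
    if os.path.isdir(os.path.join(sysfs_root, driver)):
        # The driver directory exists, so the module is loaded and registered.
        return "loaded"
    # modprobe reported success but no driver appeared; with -b this is
    # what a blacklisted module looks like, so it is only a suspicion.
    return "administratively prohibited (suspected)"
```

The last branch is a guess by elimination, which is why the result string is marked as suspected rather than confirmed.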
Well since modprobe returns success, sure I understand why you don't see the message - it'll only be spit out when modprobe fails.

Since we get no indication that the failure to actually load is because of the blacklist, we'd perhaps have to assume what the reason may be...

So the code currently does this:

probed = false
recheck:
    if "/sys/bus/pci/drivers/$DRIVER" exists
        return 0

    if not probed
        probed = true
        modprobe -b $DRIVER
        if fail
            VIR_WARN("failed to load driver"...)
            return -1
        goto recheck

    return -1

If I place something above that last return -1 that indicates to check if administratively prohibited, that solves the message issue. However, can modprobe "succeed", but yet not load the driver for any other reason? On non RHEL releases?

I suppose if I did a subsequent modprobe without the -b, found that it succeeds, then used "-r" to remove it, then at least I'd know whether the "-b" was the reason for not loading. Seems a bit excessive though. Would there be a downside to having to ensure "-r" would work?
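As a rough, runnable rendering of the recheck flow described in this comment, here is a Python sketch. It is an illustration under assumptions, not the libvirt implementation: `probe_stub_driver`, the `load_module` callable (standing in for running `modprobe -b`), and the `sysfs_root` parameter are hypothetical names introduced here.

```python
import os

def probe_stub_driver(driver, load_module, sysfs_root="/sys/bus/pci/drivers"):
    """Return 0 once the driver directory exists, -1 otherwise.

    load_module(driver) stands in for `modprobe -b <driver>` and must
    return True when the command exits successfully.
    """
    probed = False
    while True:
        # "recheck": the driver is usable once its sysfs directory exists.
        if os.path.isdir(os.path.join(sysfs_root, driver)):
            return 0
        if probed:
            # The load command succeeded but the driver never appeared;
            # this is the final "return -1" in the flow above.
            return -1
        probed = True
        if not load_module(driver):
            # Corresponds to the VIR_WARN("failed to load driver") branch.
            return -1
```

The load attempt happens at most once, so a loader that exits successfully without loading anything (the blacklisted case) ends at the second `return -1` rather than looping.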
(In reply to John Ferlan from comment #9)
> Well since modprobe returns success, sure I understand why you don't see the
> message - it'll only be spit out when modprobe fails.
>
> Not getting any indication that the failure to actually load is because of
> the blacklist means perhaps assuming what the reason may be...
>
> So the code currently does this:
>
> probed = false
> recheck:
> if "/sys/bus/pci/drivers/$DRIVER" exists
> return 0
>
> if not probed
> probed = true
> modprobe -b $DRIVER
> if fail
> VIR_WARN("failed to load driver"...)
> return -1
> goto recheck
>
> return -1
>
>
> If I place something above that last return -1 that indicates to check if
> administratively prohibited, that solves the message issue. However, can
> modprobe "succeed", but yet not load the driver for any other reason? On
> non RHEL releases?
>
> I suppose if I did a subsequent modprobe without the -b, found that it
> succeeds, then use "-r" to remove it, then at least I'd know if the "-b" was
> the reason or not loading. Seems a bit excessive though. Would there be a
> downside to having to ensure "-r" would work?

That seems dangerous, what happens if another VM is started while we're testing whether the module can be loaded without -b? What happens if the module causes problems on the system?

I notice that there's also a '-c' option to modprobe which will dump the configuration, including blacklist. 'modprobe -c | grep "blacklist vfio_pci"' would tell us that the module is blacklisted. Note that - and _ are interchangeable for these utilities and _ seems to get used here.
Dang - wasn't paying attention I guess and pressed the wrong bz button... Sigh. It's Friday.

Anyway - a 'modprobe -c | grep -E "blacklist vfio_pci|blacklist vfio-pci"' would do the trick...

In any case, we know we can avoid loading now - it's just how/what to message to the user that needs to be answered. Let me bounce some questions off others in the group and I'll hopefully have a mechanism early next week.
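The grep-based check discussed above could be sketched in Python roughly as follows. This is an illustrative sketch, not what libvirt does: the helper name `is_blacklisted` is hypothetical, and it assumes the input is text in the form printed by `modprobe -c`, with `-` and `_` treated as interchangeable as noted earlier in the thread.

```python
def is_blacklisted(modprobe_config, module):
    """Return True if the config text has a `blacklist <module>` entry.

    Hyphens and underscores in module names are treated as equivalent.
    """
    wanted = module.replace("-", "_")
    for line in modprobe_config.splitlines():
        fields = line.split()
        # Only exact two-field "blacklist NAME" lines count, so that
        # "blacklist vfio" does not match "vfio_pci".
        if len(fields) == 2 and fields[0] == "blacklist":
            if fields[1].replace("-", "_") == wanted:
                return True
    return False
```

On a real host the text could be obtained with something like `subprocess.run(["modprobe", "-c"], capture_output=True, text=True).stdout`. Matching whole fields avoids the substring false positives a plain grep can produce.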
virt-manager still shows me the same error as in comment 6 when vfio-pci is blacklisted. I don't know where I should be looking to see either of the errors from virPCIProbeStubDriver.
<sigh> error message madness... I guess I just assumed both would be printed.

Looks like I need to add "virErrorPtr err = virGetLastError();", then in virPCIDeviceDetach() change:

    virReportError(VIR_ERR_INTERNAL_ERROR,
                   _("Failed to load PCI stub module %s"),
                   dev->stubDriver);

to be

    virErrorPtr err = virGetLastError();
    virReportError(VIR_ERR_INTERNAL_ERROR,
                   _("Failed to load PCI stub module %s"),
                   err ? err->message : dev->stubDriver);

and change my new one to be just "%s - administratively prohibited"... That way the failure is either:

    Failed to load PCI stub module %s

or

    Failed to load PCI stub module %s - administratively prohibited

(or something similar)
Blacklisted:

Error starting domain: internal error: Failed to load PCI stub module vfio-pci: administratively prohibited

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 91, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 127, in tmpcb
    callback(*args, **kwargs)
  File "/usr/share/virt-manager/virtManager/domain.py", line 1260, in startup
    self._backend.create()
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 696, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirtError: internal error: Failed to load PCI stub module vfio-pci: administratively prohibited

(Confirmed either vfio_pci or vfio-pci blacklisting works the same)

Rename vfio-pci.ko to vfio-pci2.ko (force modprobe error):

Error starting domain: internal error: Child process (/sbin/modprobe -b vfio-pci) unexpected exit status 1: modprobe: ERROR: could not insert 'vfio_pci': Unknown symbol in module, or unknown parameter (see dmesg)

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 91, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 127, in tmpcb
    callback(*args, **kwargs)
  File "/usr/share/virt-manager/virtManager/domain.py", line 1260, in startup
    self._backend.create()
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 696, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirtError: internal error: Child process (/sbin/modprobe -b vfio-pci) unexpected exit status 1: modprobe: ERROR: could not insert 'vfio_pci': Unknown symbol in module, or unknown parameter (see dmesg)

Not blacklisted, correct name: Works

Now the bad news, ln -s /bin/true /sbin/modprobe (return true, but doesn't do anything, -c doesn't include vfio-pci blacklist): same administratively prohibited error as first test.
Tested with build libvirt-1.1.1-23.el7.x86_64; this bug is fixed.

Scenario 1: start one guest with a pci device.

1. Add the vfio module to the blacklist.
# cat /etc/modprobe.d/local.conf
options vfio_iommu_type1 allow_unsafe_interrupts=1
blacklist vfio_pci

2. Make sure no vfio module is loaded on the host.
# lsmod|grep vfio

3. Prepare one shut-off guest with a pci device.
# virsh dumpxml a
......
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
  </source>
</hostdev>
......

4. Start the guest. It fails as expected.
# virsh start a
error: Failed to start domain a
error: internal error: Failed to load PCI stub module vfio-pci: administratively

5. Make sure the vfio module was not loaded automatically.
# lsmod|grep vfio

Scenario 2: hot-plug one pci device to the running guest.

1. Add the vfio module to the blacklist.
# cat /etc/modprobe.d/local.conf
options vfio_iommu_type1 allow_unsafe_interrupts=1
blacklist vfio_pci

2. Make sure no vfio module is loaded on the host.
# lsmod|grep vfio

3. Prepare one pci device xml for hot-plug.
# cat hostdev.xml
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
  </source>
</hostdev>

4. Start one guest.
# virsh start a
Domain a started

5. Hot-plug the pci device to the guest. It fails as expected.
# virsh attach-device a hostdev.xml
error: Failed to attach device from hostdev.xml
error: internal error: Failed to load PCI stub module vfio-pci: administratively

6. Make sure the vfio module was not loaded automatically.
# lsmod|grep vfio
This request was resolved in Red Hat Enterprise Linux 7.0. Contact your manager or support representative in case you have further questions about the request.