Bug 675698
Summary: | libvirt dies a quickly - udevGetSCSIType:753 : Failed to find SCSI device type 12 | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Mark Wagner <mwagner> | ||||||
Component: | libvirt | Assignee: | Daniel Veillard <veillard> | ||||||
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> | ||||||
Severity: | urgent | Docs Contact: | |||||||
Priority: | unspecified | ||||||||
Version: | 6.1 | CC: | berrange, ccui, dallan, dyuan, eblake, jdenemar, jyang, mjenner, mzhan, xen-maint, yoyzhang | ||||||
Target Milestone: | rc | Keywords: | TestBlocker | ||||||
Target Release: | --- | ||||||||
Hardware: | Unspecified | ||||||||
OS: | Linux | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | libvirt-0.8.7-6.el6 | Doc Type: | Bug Fix | ||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | Environment: | ||||||||
Last Closed: | 2011-05-19 13:26:45 UTC | Type: | --- | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Bug Depends On: | 675756 | ||||||||
Bug Blocks: | 684655, 698208 | ||||||||
Attachments: |
|
Description
Mark Wagner
2011-02-07 12:32:49 UTC
Created attachment 477401 [details]
Dump from debug output
Here is a dump of the output when running in debug mode. Note this case passes and libvirtd is running.
There's something very nasty going on here, it is very non-deterministic, but one thing I am seeing is quite a few BADF errors. In one trace I saw that something closed our syslog() socket behind our back ! sendto(15, "<11>Feb 7 09:13:26 libvirtd: 09:13:26.000: 6056: error : virStateInitialize:1022 : Initialization of QEMU state driver failed\n", 127, MSG_NOSIGNAL, NULL, 0) = -1 EBADF (Bad file descriptor) close(15) = -1 EBADF (Bad file descriptor) socket(PF_FILE, SOCK_DGRAM|SOCK_CLOEXEC, 0) = 15 connect(15, {sa_family=AF_FILE, path="/dev/log"}, 110) = 0 sendto(15, "<11>Feb 7 09:13:26 libvirtd: 09:13:26.000: 6056: error : virStateInitialize:1022 : Initialization of QEMU state driver failed\n", 127, MSG_NOSIGNAL, NULL, 0) = 127 So there must be some code doing a close() on a file descriptor it doesn't own. This seems to occurr in middle of dealing with udev driver startup. The problem here is libpciaccess. libvirt calls the following 3 functions in a sequence: pci_system_init() pci_get_strings() pci_system_cleanup() The 'pci_get_strings' method iterates over PCI devices opening the 'config' file in sysfs. For some reason it caches the 'fd' for the config file in a global variable. 'pci_system_cleanup' closes that cached fd, but doesn't reset it to -1, so when libvirt repeats that sequence of 3 libpciaccess calls, the FD is closed for a second time. This can be trivially demonstrated pci_system_init() pci_get_strings() pci_system_cleanup() nullfd = open('/dev/null', O_RDONLY) pci_system_init() pci_get_strings() pci_system_cleanup() close(nullfd); The final close() call will get an error, because the FD associated with nullfd was closed by the second pci_get_strings() call. Most of the time in libvirt we are lucky, but if something run from the method udevProcessPCI(), issues a libvirt logging API call, and this is the first logging API call in the libvirtd daemon, then the libvirt logging FD is the one that ends up getting closed. We need to get libpciaccess fixed. We should also be more efficient and not call pci_system_init/cleanup multiple times. We can just do it once at startup and shutdown of the udev driver, not once per PCI device. Workaround for libpciaccess brokenness in libvirt: http://www.redhat.com/archives/libvir-list/2011-February/msg00159.html Created attachment 477465 [details]
Only initialize libpciaccess once
Hi Mark, According to comments, this bug is fixed in libvirt-0.8.7-6.el6. But It is difficult to reproduce it with our existing test environment in libvirt-0.8.7-4.el6. Can this bug be found with your test environment in libvirt-0.8.7-6.el6? Do you have any idea to how to reproduce it with root cause? Thanks Chun Cui Mark, Any update for this bug? I don't have time to go back and check that specific version. Running libvirtd 0.8.7-10.el6 and I do not see the problem. Thanks Mark's information. Change it to verified according to comment 11. looks like this patch caused a regression in the ability to connect to qemu:///session as non-root, per bug 684655 Hi, Mark As now we are doing rhel6.1 snapshot4 full regression testing and we haven't got this bug reproduce environment and exact steps, could you please do a favor to verify this bug with libvirt-0.8.7-18.el6.x86_64 to see whether it is still working properly? Because per to comment 13, we are worrying the patch fix for bug 684655 may cause some regression to this bug. Thanks a lot in advance. Regards, Yoyo This patch also caused the regression in bug 698208 An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2011-0596.html |