RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 675698 - libvirt dies a quickly - udevGetSCSIType:753 : Failed to find SCSI device type 12
Summary: libvirt dies a quickly - udevGetSCSIType:753 : Failed to find SCSI device typ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
Version: 6.1
Hardware: Unspecified
OS: Linux
unspecified
urgent
Target Milestone: rc
: ---
Assignee: Daniel Veillard
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On: 675756
Blocks: 684655 698208
TreeView+ depends on / blocked
 
Reported: 2011-02-07 12:32 UTC by Mark Wagner
Modified: 2011-05-19 13:26 UTC (History)
11 users (show)

Fixed In Version: libvirt-0.8.7-6.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-05-19 13:26:45 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Dump from debug output (69.51 KB, text/plain)
2011-02-07 12:34 UTC, Mark Wagner
no flags Details
Only initialize libpciaccess once (2.11 KB, patch)
2011-02-07 17:40 UTC, Daniel Berrangé
no flags Details | Diff


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2011:0596 0 normal SHIPPED_LIVE libvirt bug fix and enhancement update 2011-05-18 17:56:36 UTC

Description Mark Wagner 2011-02-07 12:32:49 UTC
Description of problem:
When trying to run libvirtd, it appears to start but dies quickly. /var/log/messages 

Version-Release number of selected component (if applicable):
Happens with libvirt based on 0.8.7-4.el6 and 0.8.7-5.el6 

How reproducible:
Works when logging in enabled but fails otherwise

Steps to Reproduce:
1. service libvirtd restart 
2. service libvirtd status
3.
  
Actual results:
[root@perf22 ~]# service libvirtd start
Starting libvirtd daemon:                                  [  OK  ]
[root@perf22 ~]# 
[root@perf22 ~]# service libvirtd status
libvirtd dead but subsys locked


Expected results:
libvirtd should show up as OK


Additional info:

Comment 2 Mark Wagner 2011-02-07 12:34:51 UTC
Created attachment 477401 [details]
Dump from debug output

Here is a dump of the output when running in debug mode.  Note this case passes and libvirtd is running.

Comment 3 Daniel Berrangé 2011-02-07 14:28:37 UTC
There's something very nasty going on here, it is very non-deterministic, but one thing I am seeing is quite a few BADF errors. In one trace I saw that something closed our syslog() socket behind our back !

sendto(15, "<11>Feb  7 09:13:26 libvirtd: 09:13:26.000: 6056: error : virStateInitialize:1022 : Initialization of QEMU state driver failed\n", 127, MSG_NOSIGNAL, NULL, 0) = -1 EBADF (Bad file descriptor)
close(15)                               = -1 EBADF (Bad file descriptor)
socket(PF_FILE, SOCK_DGRAM|SOCK_CLOEXEC, 0) = 15
connect(15, {sa_family=AF_FILE, path="/dev/log"}, 110) = 0
sendto(15, "<11>Feb  7 09:13:26 libvirtd: 09:13:26.000: 6056: error : virStateInitialize:1022 : Initialization of QEMU state driver failed\n", 127, MSG_NOSIGNAL, NULL, 0) = 127


So there must be some code doing a close() on a file descriptor it doesn't own. This seems to occurr in middle of dealing with udev driver startup.

Comment 4 Daniel Berrangé 2011-02-07 16:15:22 UTC
The problem here is libpciaccess.

libvirt calls the following 3 functions in a sequence:

 pci_system_init()
 pci_get_strings()
 pci_system_cleanup()


The 'pci_get_strings' method iterates over PCI devices opening the 'config' file in sysfs. For some reason it caches the 'fd' for the config file in a global variable. 'pci_system_cleanup' closes that cached fd, but doesn't reset it to -1, so when libvirt repeats that sequence of 3 libpciaccess calls, the FD is closed for a second time.

This can be trivially demonstrated

 pci_system_init()
 pci_get_strings()
 pci_system_cleanup()
 nullfd = open('/dev/null', O_RDONLY)
 pci_system_init()
 pci_get_strings()
 pci_system_cleanup()
 close(nullfd);

The final close() call will get an error, because the FD associated with nullfd was closed by the second pci_get_strings() call.

Most of the time in libvirt we are lucky, but if something run from the method udevProcessPCI(), issues a libvirt logging API call, and this is the first logging API call in the libvirtd daemon, then the libvirt logging FD is the one that ends up getting closed.

We need to get libpciaccess fixed. We should also be more efficient and not call pci_system_init/cleanup multiple times. We can just do it once at startup and shutdown of the udev driver, not once per PCI device.

Comment 5 Daniel Berrangé 2011-02-07 17:06:41 UTC
Workaround for libpciaccess brokenness in libvirt:

http://www.redhat.com/archives/libvir-list/2011-February/msg00159.html

Comment 6 Daniel Berrangé 2011-02-07 17:40:34 UTC
Created attachment 477465 [details]
Only initialize libpciaccess once

Comment 9 Cui Chun 2011-02-15 02:33:46 UTC
Hi Mark,

According to comments, this bug is fixed in libvirt-0.8.7-6.el6. 
But It is difficult to reproduce it with our existing test environment in libvirt-0.8.7-4.el6.
Can this bug be found with your test environment in libvirt-0.8.7-6.el6? 
Do you have any idea to how to reproduce it with root cause?

Thanks
Chun Cui

Comment 10 Cui Chun 2011-03-01 02:30:36 UTC
Mark,

Any update for this bug?

Comment 11 Mark Wagner 2011-03-15 14:27:39 UTC
I don't have time to go back and check that specific version. Running libvirtd  0.8.7-10.el6 and I do not see the problem.

Comment 12 Cui Chun 2011-03-16 02:03:50 UTC
Thanks Mark's information. 

Change it to verified according to comment 11.

Comment 13 Eric Blake 2011-03-16 19:43:52 UTC
looks like this patch caused a regression in the ability to connect to qemu:///session as non-root, per bug 684655

Comment 14 zhanghaiyan 2011-04-20 08:44:13 UTC
Hi, Mark

As now we are doing rhel6.1 snapshot4 full regression testing and we haven't got this bug reproduce environment and exact steps, could you please do a favor to verify this bug with libvirt-0.8.7-18.el6.x86_64 to see whether it is still working properly?
Because per to comment 13, we are worrying the patch fix for bug 684655 may cause some regression to this bug.
Thanks a lot in advance.

Regards,
Yoyo

Comment 15 Eric Blake 2011-05-03 13:55:40 UTC
This patch also caused the regression in bug 698208

Comment 18 errata-xmlrpc 2011-05-19 13:26:45 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0596.html


Note You need to log in before you can comment on or make changes to this bug.