Bug 796451

Summary: Virsh hangs when connecting to local qemu-kvm (FC16 running as VMware guest)
Product: Fedora
Component: libvirt
Version: 16
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Reporter: Matt <matt>
Assignee: Osier Yang <jyang>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: ajia, berrange, clalancette, crobinso, dallan, dougsland, itamar, jforbes, laine, libvirt-maint, matt, orthostatic, veillard, virt-maint
Fixed In Version: libvirt-0.9.6-5.fc16
Doc Type: Bug Fix
Last Closed: 2012-03-17 23:45:09 UTC

Attachments:
* gdb backtrace of virsh
* libvirtd backtrace (all threads)

Description Matt 2012-02-22 21:55:47 UTC
Created attachment 565119 [details]
gdb backtrace of virsh

Description of problem:
Using FC16 as a VMware Workstation 8 guest with Intel VT-x virtualisation so that I can test KVM.  After installing libvirt & qemu-kvm I am unable to connect to the local hypervisor with virsh (or virt-manager for that matter).

Running the fallback GNOME desktop environment with the latest updates.

Have tried disabling auth (set to none) in libvirtd.conf and disabling SELinux (setenforce 0).  Also tried as both a standard user and root.

Version-Release number of selected component (if applicable):
* FC16 stock with all updates (also tested with testing updates)
* Kernel 3.2.6-3.fc16.x86_64
* libvirt 0.9.6-4.fc16

How reproducible:
Have reproduced on another system, using fresh FC16 install as VMware Workstation 8 guest.  Same results.

Steps to Reproduce:
1. Install FC16 as VMware guest with Intel VT-x virtualisation
2. Install qemu-kvm & libvirt
3. Run virsh --connect qemu:///system

Actual results:
Process hangs until ^C

Expected results:
Virsh prompt connected to local hypervisor


Additional info:
In the hope that it is useful, I have attached a gdb backtrace while it is hanging.  I ran debuginfo-install libvirt then:

virsh --connect qemu:///system &
gdb
attach [processid]
backtrace
See attachment for backtrace

Comment 1 Dave Allan 2012-02-22 22:03:43 UTC
Can you provide a backtrace of all the libvirtd threads with bt -a when this problem is occurring?

Comment 2 Cole Robinson 2012-02-22 22:05:07 UTC
And when you reproduce the hang, is dmidecode running?

ps axwww | grep dmide

Comment 3 Matt 2012-02-22 22:18:23 UTC
Hi,

1. Attachment created: backtrace of libvirtd attached
I did not fully understand your instructions; I hope this is the information that you require. Let me know if there's anything more that you want - the gdb commands that I used are in the attachment.

2. Results of ps axwww | grep dmide:

1484 ?        S      0:00 /usr/sbin/dmidecode -q -t 0,1,4,17

Matt

Comment 4 Matt 2012-02-22 22:19:31 UTC
Created attachment 565125 [details]
libvirtd backtrace (all threads)

Comment 5 Cole Robinson 2012-02-22 22:25:47 UTC
Yeah, I've heard of this issue before: the dmidecode hang in VMware guests. I think there's a patch upstream for it.

Eric, do you know more about this?

Comment 6 Dave Allan 2012-02-22 22:30:44 UTC
Matt, that's what I was looking for.  I have the same thought Cole did, which is that this is dmidecode-related.

Comment 7 Dave Allan 2012-02-22 22:33:45 UTC
Are you willing to try building upstream libvirt to see if it makes the problem go away?  I'm not convinced it's fixed upstream yet, but if you can repro this at will and test builds I'm sure we can figure it out.

Comment 9 Matt 2012-02-22 22:37:07 UTC
Sure Dave.  Can you provide me some high-level instructions, or point me to a site that might have something similar?

Thanks,

Matt

Comment 10 Eric Blake 2012-02-22 22:50:36 UTC
bug 783453 is another example of a dmidecode hang; F16 does not (yet) have the two patches mentioned in that bug:

commit 06b9c5b9231ef4dbd4b5ff69564305cd4f814879
Author: Michal Privoznik <mprivozn>
Date:   Tue Jan 3 18:40:55 2012 +0100

    virCommand: Properly handle POLLHUP
    
    It is a good practise to set revents to zero before doing any poll().
    Moreover, we should check if event we waited for really occurred or
    if any of fds we were polling on didn't encountered hangup.

commit d19149dda888d36cea58b6cdf7446f98bd1bf734
Author: Laszlo Ersek <lersek>
Date:   Tue Jan 24 15:55:19 2012 +0100

    virCommandProcessIO(): make poll() usage more robust
    
    POLLIN and POLLHUP are not mutually exclusive. Currently the following
    seems possible: the child writes 3K to its stdout or stderr pipe, and
    immediately closes it. We get POLLIN|POLLHUP (I'm not sure that's possible
    on Linux, but SUSv4 seems to allow it). We read 1K and throw away the
    rest.

But it is not certain whether those two patches are all that's needed, or whether we need yet a third patch backported to the F16 build.

Comment 11 Matt 2012-02-22 23:24:16 UTC
After a bit of investigation - I am currently building the fc17 version of libvirt from src RPM.

Comment 12 Matt 2012-02-22 23:47:58 UTC
During the build of libvirt-0.9.10-1 from the fc17 source repo, the virsh-all test hung.  It seems that dmidecode was the issue again - the build continued once I had terminated the dmidecode process.

Once the new RPM was installed - and once I had disabled TLS auth :) - the problem was solved.  Both virsh and virt-manager connect without issue.

P.S.
There was a sanlock >= 0.8 dependency that I ignored for now, as I don't have shared storage.

Comment 13 Dave Allan 2012-02-23 00:39:14 UTC
So now the question is, are the two patches Eric mentioned sufficient, or is there some other required commit?  Osier, I'm about to go offline for the day, would you mind spinning an F16 test build with just the two patches and see if it still fixes the problem?

Comment 14 Osier Yang 2012-02-23 01:48:22 UTC
(In reply to comment #13)
> So now the question is, are the two patches Eric mentioned sufficient, or is
> there some other required commit?  Osier, I'm about to go offline for the day,
> would you mind spinning an F16 test build with just the two patches and see if
> it still fixes the problem?

Let me do it.

Comment 15 Osier Yang 2012-02-23 09:52:47 UTC
(In reply to comment #14)
> (In reply to comment #13)
> > So now the question is, are the two patches Eric mentioned sufficient, or is
> > there some other required commit?  Osier, I'm about to go offline for the day,
> > would you mind spinning an F16 test build with just the two patches and see if
> > it still fixes the problem?
> 
> Let me do it.

Tested by installing VMware Workstation 8 with an fc16 guest; the problem was resolved with exactly those two patches applied in the testing build.

Comment 18 Fedora Update System 2012-03-04 16:21:28 UTC
libvirt-0.9.6-5.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/libvirt-0.9.6-5.fc16

Comment 19 Fedora Update System 2012-03-06 19:33:55 UTC
Package libvirt-0.9.6-5.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing libvirt-0.9.6-5.fc16'
as soon as you are able to.
Please go to the following URL:
https://admin.fedoraproject.org/updates/FEDORA-2012-3067/libvirt-0.9.6-5.fc16
then log in and leave karma (feedback).

Comment 20 Fedora Update System 2012-03-17 23:45:09 UTC
libvirt-0.9.6-5.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 21 Garrett Ellis 2012-04-26 04:03:15 UTC
I apologize for the noise, devs. I'm posting this to benefit those searching for RHEL solutions to this very problem. :)

This problem with libvirt exists in RHEL 6.2, and I stumbled upon it while preparing for RHCSA/RHCE recertification. My study environment consists of VMware Workstation 8.0.2-591240 and RHEL 6.2.

This is fixed in RHEL 6.3 beta as of 2012/04/25.