Bug 528754

Summary: qemu-kvm segfault caused by -soundhw es1370
Product: [Fedora] Fedora
Reporter: Gene Czarcinski <gczarcinski>
Component: qemu
Assignee: Justin M. Forbes <jforbes>
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: high
Version: 12
CC: awilliam, berrange, don, dwmw2, gcosta, itamar, jaswinder, jforbes, markmc, quintela, virt-maint
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2010-12-04 07:29:20 UTC
Attachments:
  /var/log/libvirt/qemu/f11test1.log (flags: none)
  guest log file for most recent (20091019) problem (flags: none)
  backtrace .. qemu-kvm-0.11.0-6.fc12.x86_64 (flags: none)

Description Gene Czarcinski 2009-10-13 15:35:57 UTC
Description of problem:

Every "so often", a qemu-kvm guest will suddenly crash (it acts as if I had just done a hard system shutdown). This is happening with F12 and F11 guests. This last time, I found a message in /var/log/messages:

Oct 13 11:22:45 localhost kernel: qemu-kvm[5517]: segfault at 35824d2630 ip 0000003c0e683a80 sp 00007fff01cc4a78 error 4 in libc-2.10.90.so[3c0e600000+176000]

Version-Release number of selected component (if applicable):
F12 rawhide, qemu-kvm-0.11.0-6.fc12.x86_64

How reproducible:
It happens intermittently; I do not know the cause.

Comment 1 Mark McLoughlin 2009-10-16 13:10:10 UTC
Gene: thanks for the report

First, could you attach /var/log/libvirt/qemu/$guest.log so we know how the guest is configured?

Next, we really need to get a stack trace of the crashes. I've just experimented a little with abrt, and that's probably the easiest way to get the stack trace.

Try this:

  $> yum install --enablerepo=rawhide-debuginfo qemu-debuginfo
  $> yum install -y abrt abrt-cli abrt-addon-ccpp abrt-plugin-logger
  $> service abrtd restart

then when qemu-kvm crashes, you should be able to do e.g.:

  $> abrt-cli --get-list
  $> echo n | abrt-cli --report $uuid > t.log

and attach the stack trace here

If that works well, we should add those steps to the wiki:

  https://fedoraproject.org/wiki/Reporting_virtualization_bugs

Comment 2 Gene Czarcinski 2009-10-16 16:17:46 UTC
OK, it is not going to be easy to identify the actual guest (I currently have 13 guests defined, and there have been more that I have since deleted). I have been "playing" with NIC definitions while troubleshooting a problem with NetworkManager. I believe that the guest that failed is f11test1, but I am not sure; I am attaching the log for that guest. At the time of the crash, I believe that one of the NICs was a bridge interface, but it may have been a NAT interface.

I grep'ed /var/log/messages* and, over the period from 6 Oct to 13 Oct, got five hits of "qemu-kvm" ... "segfault".

Since I have no idea what exactly caused the problem, I will do your installs and then wait to see if it occurs again.

Comment 3 Gene Czarcinski 2009-10-16 16:18:39 UTC
Created attachment 365062 [details]
/var/log/libvirt/qemu/f11test1.log

Comment 4 Mark McLoughlin 2009-10-16 16:44:44 UTC
Cool, that's why I suggested configuring abrt - you can just set it up and it'll catch any segfault if/when it happens

Comment 5 Gene Czarcinski 2009-10-19 18:18:54 UTC
OK, it happened again.  The relevant section of /var/log/messages is:
--------------------------------
Oct 19 13:06:32 localhost kernel: __ratelimit: 3542 callbacks suppressed
Oct 19 13:06:32 localhost kernel: qemu-kvm[2849]: segfault at 7f1dcc003770 ip 00007f25fcd10f70 sp 00007fff1b68e368 error 4 in libc-2.10.90.so.#prelink#.RZ9OQz (deleted)[7f25fcc8d000+177000]
Oct 19 13:06:32 localhost abrtd: Directory 'ccpp-1255971992-2849' creation detected
Oct 19 13:06:32 localhost abrtd: Lock file '/var/cache/abrt/ccpp-1255971992-2849.lock' is locked by process 26913
Oct 19 13:06:33 localhost abrtd: Lock file '/var/cache/abrt/ccpp-1255971992-2849.lock' is locked by process 26913
Oct 19 13:06:33 localhost abrtd: Lock file '/var/cache/abrt/ccpp-1255971992-2849.lock' is locked by process 26913
Oct 19 13:06:34 localhost abrtd: Lock file '/var/cache/abrt/ccpp-1255971992-2849.lock' is locked by process 26913
Oct 19 13:06:34 localhost abrtd: Lock file '/var/cache/abrt/ccpp-1255971992-2849.lock' is locked by process 26913
Oct 19 13:06:35 localhost abrtd: Lock file '/var/cache/abrt/ccpp-1255971992-2849.lock' is locked by process 26913
Oct 19 13:06:35 localhost abrtd: Lock file '/var/cache/abrt/ccpp-1255971992-2849.lock' is locked by process 26913
Oct 19 13:06:36 localhost abrtd: Lock file '/var/cache/abrt/ccpp-1255971992-2849.lock' is locked by process 26913
Oct 19 13:06:36 localhost abrtd: Lock file '/var/cache/abrt/ccpp-1255971992-2849.lock' is locked by process 26913
Oct 19 13:06:37 localhost abrtd: Lock file '/var/cache/abrt/ccpp-1255971992-2849.lock' is locked by process 26913
Oct 19 13:06:37 localhost abrtd: Lock file '/var/cache/abrt/ccpp-1255971992-2849.lock' is locked by process 26913
Oct 19 13:06:38 localhost abrtd: Lock file '/var/cache/abrt/ccpp-1255971992-2849.lock' is locked by process 26913
Oct 19 13:06:38 localhost abrtd: Lock file '/var/cache/abrt/ccpp-1255971992-2849.lock' is locked by process 26913
Oct 19 13:06:38 localhost abrt: saved core dump of pid 2849 to /var/cache/abrt/ccpp-1255971992-2849/coredump
Oct 19 13:06:39 localhost abrtd: Hmm, stray warn_client: 'CPluginManager::GetDatabase():Database plugin: 'SQLite3' is not registered.'
Oct 19 13:06:39 localhost kernel: virbr2: port 2(vnet2) entering disabled state
Oct 19 13:06:39 localhost kernel: device vnet2 left promiscuous mode
Oct 19 13:06:39 localhost kernel: virbr2: port 2(vnet2) entering disabled state
Oct 19 13:06:39 localhost kernel: virbr0: port 2(vnet3) entering disabled state
Oct 19 13:06:39 localhost libvirtd: 13:06:39.265: error : qemudDomainGetMemoryBalloon:3518 : operation failed: could not query memory balloon allocation
Oct 19 13:06:39 localhost kernel: device vnet3 left promiscuous mode
Oct 19 13:06:39 localhost kernel: virbr0: port 2(vnet3) entering disabled state
Oct 19 13:06:39 localhost avahi-daemon[1535]: Withdrawing address record for fe80::a467:b0ff:fe4a:5812 on vnet2.
Oct 19 13:06:39 localhost avahi-daemon[1535]: Withdrawing address record for fe80::780d:a1ff:fe82:7721 on vnet3.
Oct 19 13:09:56 localhost abrtd: Getting crash infos...
Oct 19 13:10:36 localhost abrtd: Getting crash infos...
Oct 19 13:15:46 localhost abrtd: Getting crash infos...
Oct 19 13:15:58 localhost abrtd: Getting crash infos...
Oct 19 13:17:51 localhost abrtd: Error: CPluginManager::GetDatabase():Database plugin: 'SQLite3' is not registered.
Oct 19 13:17:51 localhost abrtd: UnRegistered plugin Bugzilla(Reporter)
Oct 19 13:17:51 localhost abrtd: Plugin Bugzilla successfully unloaded
Oct 19 13:17:51 localhost abrtd: UnRegistered plugin CCpp(Analyzer)
Oct 19 13:17:51 localhost abrtd: Plugin CCpp successfully unloaded
Oct 19 13:17:51 localhost abrtd: UnRegistered plugin Kerneloops(Analyzer)
Oct 19 13:17:51 localhost abrtd: Plugin Kerneloops successfully unloaded
Oct 19 13:17:51 localhost abrtd: UnRegistered plugin KerneloopsReporter(Reporter)
Oct 19 13:17:51 localhost abrtd: Plugin KerneloopsReporter successfully unloaded
Oct 19 13:17:51 localhost abrtd: UnRegistered plugin KerneloopsScanner(Action)
Oct 19 13:17:51 localhost abrtd: Plugin KerneloopsScanner successfully unloaded
Oct 19 13:17:51 localhost abrtd: UnRegistered plugin Logger(Reporter)
Oct 19 13:17:51 localhost abrtd: Plugin Logger successfully unloaded
Oct 19 13:17:51 localhost abrtd: Plugin TicketUploader successfully unloaded
Oct 19 13:17:51 localhost abrtd: Exiting
-------------------------

Doing ls -l /var/cache/abrt/ I get:
drwx------. 2 qemu qemu 4096 2009-10-19 13:06 ccpp-1255971992-2849

Running abrt-cli --get-list results in:

array expected in dbus message, but not found ('')
dbus call GetCrashInfos: return type mismatch

The most recent guest (f12x5) log file is attached.

Running echo n | abrt-cli --report 1255971992-2849 > t.log gives:
Error sending DBus message

BUT t.log is an empty file!!

Thus, no traceback!

There is an 888+ megabyte coredump, but I assume you DO NOT want me to upload that file (even if I were willing, which I am not).

Comment 6 Gene Czarcinski 2009-10-19 18:20:05 UTC
Created attachment 365263 [details]
guest log file for most recent (20091019) problem

Comment 7 Mark McLoughlin 2009-10-21 14:56:57 UTC
(In reply to comment #5)

> There is a 888+ megabyte coredump but I assume you DO NOT want me to upload
> that file (if I was willing which I am not).  

Gah, that's no fun that abrt didn't work.

Well, at least it kept the core file for you

To get a stack trace you can do:

$> yum install --enablerepo=rawhide-debuginfo qemu-debuginfo
$> echo "thread apply all bt full" > gdb.cmd
$> gdb -batch -x gdb.cmd /usr/bin/qemu-kvm <path-to-core-file>

Thanks

Comment 8 Gene Czarcinski 2009-10-21 15:58:07 UTC
Oops ... almost a gotcha ... I had updated to "qemu-kvm-...-7" since this report was last updated, but I keep a local mirror of updates, so I downgraded back to "-6", which was the version when the problem occurred.

The backtrace is rather long so I am attaching it rather than putting it inline.

Comment 9 Gene Czarcinski 2009-10-21 15:59:33 UTC
Created attachment 365541 [details]
backtrace .. qemu-kvm-0.11.0-6.fc12.x86_64

Comment 10 Mark McLoughlin 2009-10-21 16:36:39 UTC
Great, thanks!

Okay, here's the interesting bit:

Thread 1 (Thread 2849):
#0  0x00007f25fcd10f70 in memset () from /lib64/libc.so.6
No symbol table info available.
#1  0x00000000004babc6 in audio_capture_mix_and_clear (samples=-1099358712, 
    rpos=<value optimized out>, hw=<value optimized out>) at audio/audio.c:1290
        n = -1099358712
#2  audio_run_out (samples=-1099358712, rpos=<value optimized out>, 

As a workaround, you can remove <sound> from the guest's configuration

There's no obvious Fedora developer to deal with audio stuff - Justin, maybe you could take a poke?

Comment 11 Justin M. Forbes 2009-10-21 16:59:20 UTC
Happy to.

Comment 12 Gene Czarcinski 2009-10-21 19:37:36 UTC
Since sound does not currently work for guests, I have no problem removing the virtual hardware if it will stop the crashes.

I do hope that sound does work in F13 since this would make guests more complete.  I understand the difficulty of implementing sound for guests correctly so, even if I would like to see it sooner rather than later, I too want it done correctly.

Comment 13 Justin M. Forbes 2009-10-22 20:24:58 UTC
It appears that this is with F12 guests, is there anything specific you are doing in the guest to make things crash?  Does it only happen with more than one guest active at a time? I am not seeing it by simply playing an internet radio station from rhythmbox on a single guest.

Comment 14 Gene Czarcinski 2009-10-22 21:49:54 UTC
"just running the guest" ... nothing special!  In fact, when I "install" the guest, I almost always "unclick" the sound applications since it is not currently supported.

Mostly, I have been "playing" with the network definitions (multiple NICs) and debugging NetworkManager ... or rather getting it to work the way I want it to work.  qemu-kvm guests are real nice for playing with network configurations ... I currently have the default NAT network, two private/non-routed networks and a bridge interface to my local network.  Sound ... I have been ignoring ... I do not turn it on or off but there may be some stuff turned on by default.

Oops ... I do have between two and four guests running: one F11 and one to three F12-Beta.

BTW, I do have a few (four?) more crashes with data collected.  They may be (are likely to be) with different guests.  I believe that they were all running qemu-kvm-0.11.0-6.fc12.x86_64 installed on the host.  Since I am running a fresh install of the F12-Beta on that system, the "old" F12-Alpha with lots and lots of updates (F12-Beta level) is only a re-boot away.  If you need additional backtraces, I can get them.

Comment 15 Adam Williamson 2009-10-23 15:32:16 UTC
This issue was discussed at this morning's blocker bug review meeting. Justin Forbes summarized 'adamw: though with this bug, it is qemu-kvm crashing, so all running guests crash.  Like I said, it is a high priority bug, but could just as easily be fixed with a zero day since it is the host that needs the fix'. Adam Williamson felt it should be addressed in some way before final. Conclusion: ideally the bug should be fixed before final, if that isn't managed, sound devices in virtual guests should be disabled by default (as apparently this feature doesn't currently work anyway). If that can't be done for some reason, the issue should be documented in common bugs and users recommended to disable sound in virtual guests themselves.

-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 16 Gene Czarcinski 2009-10-23 15:52:02 UTC
Oops ... sorry if I gave the impression that all guests crash. IIRC, each time this occurred, two guests were running and only ONE guest crashed ... virt-manager and the other guests were alive and well!

I sure would like to know if anyone else has had this happen to them and, if not, what makes my setup unique.

As far as I know, sound was effectively unused: a sound device was defined, but I did nothing with sound.

Comment 17 Mark McLoughlin 2009-10-23 16:18:48 UTC
Gene, if you can figure out how to reproduce this reliably, that would help loads

And if you got that, then you could edit the guest XML to remove the <sound> configuration and see if that makes the bug go away (check that -soundhw is removed from the qemu command line in /var/log/libvirt/qemu)

Comment 18 Justin M. Forbes 2009-10-23 16:27:40 UTC
If this isn't happening so far in the latest F-12 beta, would you mind stressing it a bit more and updating us with results?  I am going to remove this as a blocker unless we hear of a crash with current packages.

Comment 19 Gene Czarcinski 2009-10-23 16:41:35 UTC
Like I said, I sure would like to know if anyone else has seen this problem!

If removing the sound definition stops the crashes, and/or it does not happen again, then I cannot see this as an F12 blocker.

You might want to consider NOT automatically defining sound hardware when creating new guests (such as with virt-manager), at least for now. Until sound support is implemented (hopefully in F13), this might be prudent.

Unfortunately, my "old" F12 system and my "new" F12-beta system are on the same hardware so I can only run one at a time.

I have started the F12-Beta system and started up F11 and F12 qemu-kvm guests, plus a clone of the F11 guest running under plain qemu (qemu-system-x86_64) ... boy, slow is not the word for this guest. Without qemu-kvm, qemu by itself would be no competition for VMware.

I will try playing with the network definitions (which is what I was doing before) and see what happens.

Comment 20 Gene Czarcinski 2009-11-12 20:32:22 UTC
OK, this problem is still occurring.  I have updated to F12-RC4 + updates as the host.  I have also created an F12-rc4 qemu-kvm guest which is the guest which crashed.

There was only the one guest running and I have no idea what I did which caused the crash ... I was typing into a terminal window on the guest.

abrt is still not cooperating with reporting these bugs since I cannot report them as root. However, the backtraces look pretty much the same. If you want one, I can attach it.

Comment 21 Bug Zapper 2009-11-16 13:36:50 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 12 development cycle.
Changing version to '12'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 22 Fedora Admin XMLRPC Client 2010-03-09 17:18:54 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 23 Bug Zapper 2010-11-04 09:29:05 UTC
This message is a reminder that Fedora 12 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 12.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '12'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 12's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 12 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 24 Bug Zapper 2010-12-04 07:29:20 UTC
Fedora 12 changed to end-of-life (EOL) status on 2010-12-02. Fedora 12 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.