Bug 626544

Summary: cannot perform a "virsh dump" of a crashed KVM guest
Product: Red Hat Enterprise Linux 6
Reporter: Dave Anderson <anderson>
Component: qemu-kvm
Assignee: Virtualization Maintenance <virt-maint>
Status: CLOSED DUPLICATE
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Priority: low
Version: 6.0
CC: berrange, caiqian, eblake, lihuang, lwang, mkenneth, phan, virt-maint, xen-maint
Target Milestone: rc
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2010-08-25 13:25:15 UTC

Description Dave Anderson 2010-08-23 19:44:22 UTC
Description of problem:

I am trying to perform a "virsh dump" of a KVM guest that has crashed,
but the command times out and then fails with the error message:

  error: Failed to core dump domain rhel6-smp-1g to /var/crash/vmcore
  error: Timed out during operation: cannot acquire state change lock

Version-Release number of selected component (if applicable):

 KVM host:

  kernel-2.6.32-68.el6.x86_64
  libvirt-0.8.1-27.el6.x86_64
  qemu-kvm-0.12.1.2-2.112.el6.x86_64

 KVM guest:

  kernel-2.6.32-68.el6.x86_64

How reproducible:

Always.

Steps to Reproduce:
1. start guest session
2. log into KVM guest, and force a panic:

   # echo 1 > /proc/sys/kernel/sysrq
   # echo c > /proc/sysrq-trigger

3. from KVM host:

   # virsh dump <KVM-guest-name> /var/crash/vmcore

Actual results:

  # virsh dump rhel6-smp-1g /var/crash/vmcore
  error: Failed to core dump domain rhel6-smp-1g to /var/crash/vmcore
  error: Timed out during operation: cannot acquire state change lock
  #

And FWIW, nothing seems to be able to communicate with the guest:
  
  # virsh shutdown rhel6-smp-1g
  error: Failed to shutdown domain rhel6-smp-1g
  error: Timed out during operation: cannot acquire state change lock
  # virsh suspend rhel6-smp-1g
  error: Failed to suspend domain rhel6-smp-1g
  error: Timed out during operation: cannot acquire state change lock
  #

Expected results:

Create vmcore.

Additional info:

  The same error occurs if I try to "Pause", "Shutdown", or "Force Off"
  the guest from the virt-manager GUI.  It just continues to show the
  guest as "Running", but it's unresponsive to any commands.
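
Every failing command above ends with the same "cannot acquire state change lock" line, so a host-side script could tell this hang apart from other virsh failures by matching that string. The following is a minimal sketch, not part of the original report: the helper names is_lock_timeout and probe_virsh and the VIRSH override variable are hypothetical, introduced here only for illustration.

```shell
#!/bin/sh
# Sketch only: is_lock_timeout and probe_virsh are hypothetical helper names,
# and VIRSH is an assumed override so the wrapper can be tried without libvirt.

# Succeeds if the given error text contains the timeout reported in this bug.
is_lock_timeout() {
    printf '%s\n' "$1" | grep -q 'cannot acquire state change lock'
}

# Runs "virsh <args...>" and prints one of: ok, lock-timeout, other-error.
probe_virsh() {
    if out=$("${VIRSH:-virsh}" "$@" 2>&1); then
        echo "ok"
    elif is_lock_timeout "$out"; then
        echo "lock-timeout"
    else
        echo "other-error"
    fi
}

# Example (on a host with libvirt installed):
#   probe_virsh suspend rhel6-smp-1g
#   probe_virsh dump rhel6-smp-1g /var/crash/vmcore
```

The classification relies only on the error text quoted in this report, so it would need adjusting if the message wording differs in other libvirt versions.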

Comment 2 Dave Anderson 2010-08-23 21:00:11 UTC
This looks to be a duplicate of BZ #623903:

  https://bugzilla.redhat.com/show_bug.cgi?id=623903
  623903 - query-balloon commmand didn't return on pasued guest cause virt-manger hang

But that one is currently only flagged as: rhel-6.1.0?

I would think that an issue of this magnitude would have to be addressed
in RHEL 6.0?

Adding lwang to the cc: list for her input.

Comment 4 Daniel Berrange 2010-08-24 09:08:26 UTC
> This looks to be a duplicate of BZ #623903:
>
>   https://bugzilla.redhat.com/show_bug.cgi?id=623903
>   623903 - query-balloon commmand didn't return on pasued guest cause
> virt-manger hang

It isn't technically a duplicate, but it is a pretty closely related problem. That BZ is referring to the case where QEMU has explicitly paused guest execution, so it *knows* that the guest won't respond. In this case QEMU still has the guest running, but the guest OS has crashed. QEMU doesn't know this, so it'll never get a response. We can likely generalize the solution for 623903 to cope with this 'guest crashed' case too, though.
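
To illustrate the "never get a response" situation in general terms: a caller that cannot tell whether the other side is alive can only protect itself by bounding how long it waits. The sketch below shows that pattern with timeout(1) from GNU coreutils. It is not the fix proposed for 623903, whose details are not given here; bounded_query is a hypothetical name, and a plain sleep stands in for a guest that takes some number of seconds to answer.

```shell
#!/bin/sh
# Sketch only: bounded_query is a hypothetical name; "sleep" stands in for a
# guest that takes DELAY seconds to answer. timeout(1) is from GNU coreutils.

# bounded_query LIMIT DELAY
# Waits at most LIMIT seconds for a reply that arrives after DELAY seconds.
# Prints "reply" on success; prints "no-response" if the wait expired
# (timeout exits with status 124) or the command failed for another reason.
bounded_query() {
    if timeout "$1" sh -c 'sleep "$1"; echo reply' _ "$2"; then
        return 0
    fi
    echo "no-response"
    return 1
}

# Example:
#   bounded_query 5 1    # responsive peer
#   bounded_query 1 60   # peer that never answers in time
```

A bounded wait only turns an indefinite hang into a reported failure; it does not by itself tell the caller whether the guest has crashed or is merely slow.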