Bug 1099514 - watchdog dump action hangs
Summary: watchdog dump action hangs
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Virtualization Tools
Classification: Community
Component: libvirt
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Libvirt Maintainers
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 812809
Reported: 2014-05-20 13:48 UTC by Richard W.M. Jones
Modified: 2016-04-14 17:29 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-04-14 17:29:48 UTC
Embargoed:


Attachments
Complete guest XML. (1.48 KB, text/plain)
2014-05-20 13:51 UTC, Richard W.M. Jones

Description Richard W.M. Jones 2014-05-20 13:48:37 UTC
Description of problem:

As far as I can tell, the watchdog dump action just hangs.  The
guest is paused, the dump directory and a zero-sized dump file are
created, and that's it.

Version-Release number of selected component (if applicable):

libvirt-daemon-1.2.3-2.fc21.x86_64
qemu-2.0.0-3.fc21.x86_64

How reproducible:

100%

Steps to Reproduce:

Create a Linux guest.

In the libvirt XML, set:

  <!-- only a small amount of memory because dump takes a long time -->
  <memory unit='MiB'>768</memory>
  <currentMemory unit='MiB'>768</currentMemory>
  ...
    <!-- Add an ib700 watchdog for testing -->
    <watchdog model='ib700' action='dump'/>

Inside the guest, clone the watchdog test framework (http://git.annexia.org/?p=watchdog-test-framework.git)
and compile it.  It only requires glibc-static and gcc.

Inside the guest: sudo modprobe ib700wdt

Inside the guest: sudo ./watchdog-test

After 90 seconds, the guest will pause.

Now observe the /var/lib/libvirt/qemu/dump/ directory on the host.

Actual results:

The guest is paused and never resumes:

# virsh list --all
 Id    Name                           State
----------------------------------------------------
 5     bz812809-ib700                 paused

There is a zero-sized dump file which never grows:

# ls -l /var/lib/libvirt/qemu/dump/
total 0
-rw-------. 1 root root 0 May 20 14:34 bz812809-ib700-1400592853
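
The "never grows" observation can be checked by polling the file size instead of re-running ls by hand. This is only a minimal sketch: the function name dump_is_growing, the polling interval and the sample count are illustrative assumptions and not part of libvirt.

```python
import os
import time


def dump_is_growing(path, interval=5.0, samples=3):
    """Return True if the file at `path` grew between any two polls.

    Hypothetical helper for this report; `interval` is in seconds.
    """
    last = os.stat(path).st_size
    for _ in range(samples):
        time.sleep(interval)
        size = os.stat(path).st_size
        if size > last:
            return True
        last = size
    return False
```

Run as root on the host against the dump file listed above. In the hung state described here it would be expected to return False, while a working dump should make it return True.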

Libvirtd has a libvirt_iohelper child process which does nothing:

 1135 ?        Ssl    0:00 /usr/sbin/libvirtd
 2202 ?        S      0:00  \_ /usr/libexec/libvirt_iohelper /var/lib/libvirt/qemu/dump/bz812809-ib700-1400592853 0 1

# ls -l /proc/2202/fd 
total 0
lr-x------. 1 root root 64 May 20 14:46 0 -> pipe:[28935]
l-wx------. 1 root root 64 May 20 14:46 1 -> /var/lib/libvirt/qemu/dump/bz812809-ib700-1400592853
l-wx------. 1 root root 64 May 20 14:46 2 -> pipe:[28936]

# strace -p 2202
Process 2202 attached
read(0, 
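
The strace output shows the helper blocked in read() on fd 0, which is a pipe. A read on a pipe blocks for as long as the write end is open and nothing has been written to it. The sketch below reproduces that state in isolation; read_would_block and demo are hypothetical names, and nothing here is taken from libvirt_iohelper itself.

```python
import os
import select


def read_would_block(fd, timeout=0.1):
    """Return True if neither data nor EOF is available on `fd` within `timeout`."""
    readable, _, _ = select.select([fd], [], [], timeout)
    return not readable


def demo():
    r, w = os.pipe()
    try:
        before = read_would_block(r)   # write end open, nothing written yet
        os.write(w, b"x")
        after = read_would_block(r)    # one byte is now waiting in the pipe
        return before, after
    finally:
        os.close(r)
        os.close(w)
```

demo() returns (True, False): the reader can only make progress once the writer sends data. This suggests that whichever process holds the other end of pipe:[28935] never writes to it.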

QEMU is running and consuming some CPU, but does not appear to be
doing very much.

Nothing is printed in the log file.

Expected results:

Guest dumps and resumes.

Additional info:

Bug 1004400 is slightly different from this.

The bug I'm actually trying to diagnose is bug 812809.

Comment 1 Richard W.M. Jones 2014-05-20 13:51:33 UTC
Created attachment 897577
Complete guest XML.

Comment 2 Cole Robinson 2016-04-13 15:33:47 UTC
Nice reproducer :)

action=dump is broken upstream, but for an entirely different reason. Once I fixed that, action=dump seems to work: it produces large files in /var/lib/libvirt/qemu/dump/$vmname-$timestamp, although I didn't verify that they are useful.

Here's the other dump fix:

http://www.redhat.com/archives/libvir-list/2016-April/msg00726.html

Comment 3 Cole Robinson 2016-04-14 17:29:48 UTC
commit a91177c8f7b432e67d2e232650d7debbbfc694da
Author: Cole Robinson <crobinso>
Date:   Wed Apr 13 11:20:19 2016 -0400

    qemu: command: don't overwrite watchdog dump action
    
    The watchdog cli refactoring in 4666b762 dropped the temporary variable
    we use to convert to action=dump to action=pause for the qemu cli, and
    stored the converted value in the domain structure. Our other watchdog
    handling code then treated it as though the user requested action=pause,
    which broke action=dump handling.
    
    Revive the temporary variable to fix things.
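
The bug pattern described in this commit message can be sketched as follows. This is an illustration in Python, not libvirt's C code: WatchdogDef, build_args_buggy and build_args_fixed are hypothetical names, and the returned argument list is only a stand-in for how the qemu command line is actually assembled.

```python
from dataclasses import dataclass


@dataclass
class WatchdogDef:
    """Hypothetical stand-in for the watchdog part of the domain structure."""
    model: str
    action: str


def build_args_buggy(wd):
    # Bug pattern: the converted value is stored back into the definition,
    # so later code sees action=pause instead of the requested action=dump.
    if wd.action == "dump":
        wd.action = "pause"
    return ["-watchdog-action", wd.action]


def build_args_fixed(wd):
    # Fix pattern: convert in a temporary and leave the definition untouched.
    action = wd.action
    if action == "dump":
        action = "pause"
    return ["-watchdog-action", action]
```

Both variants pass pause to qemu. The difference is that only the fixed one leaves wd.action as dump afterwards, so code that runs when the watchdog fires can still tell that a dump was requested.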

