Bug 1264781 - Failed to save guest and virsh hangs after save/restore guests several times.
Product: Fedora
Component: xen
Hardware/OS: x86_64 Linux
Severity: medium
Assigned To: Michael Young (Fedora Extras Quality Assurance)
Reported: 2015-09-21 04:22 EDT by Lin Liu
Modified: 2016-07-19 16:38 EDT

Doc Type: Bug Fix
Type: Bug
Last Closed: 2016-07-19 16:38:40 EDT
Attachments:
- libvirtd-log (9.31 KB, text/plain), attached 2015-09-23 02:38 EDT by Lin Liu
- libvirtd-debug.log (144.76 KB, text/plain), attached 2015-09-24 03:19 EDT by Lin Liu

Description Lin Liu 2015-09-21 04:22:53 EDT
Description of problem:
Saving a guest fails and virsh hangs after the guest has been saved/restored more than about 6 times.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create a RHEL7.2 guest on Fedora22 Xen (without PCI device assigned).

2. Save this guest to a checkpoint file
   # virsh save ${DomU_ID} ${checkpoint_filename}

3. Restore this guest from the checkpoint file
   # virsh restore ${checkpoint_filename}

4. Repeat steps 2-3 on the same guest 10 times.
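
The steps above can be sketched as a small shell loop. This is a sketch, not part of the original report; the domain name, checkpoint path, and cycle count in the example invocation are placeholders:

```shell
#!/bin/sh
# Repeatedly save a guest to a checkpoint file and restore it,
# stopping as soon as one of the virsh commands fails.
save_restore_cycle() {
    # $1 = domain name or ID, $2 = checkpoint file, $3 = number of cycles
    i=1
    while [ "$i" -le "$3" ]; do
        virsh save "$1" "$2"  || { echo "save failed on cycle $i" >&2; return 1; }
        virsh restore "$2"    || { echo "restore failed on cycle $i" >&2; return 1; }
        i=$((i + 1))
    done
}

# Example (requires a running Xen guest managed by libvirt):
# save_restore_cycle rhel72-guest /root/check 10
```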

Actual results:
After about 6 save/restore cycles, saving the guest fails with the error message below, and virsh hangs (e.g., "virsh list" gets no response).
[root@dhcp-66-73-92 ~]# virsh save 9 check
2015-09-17 08:17:10.003+0000: 2648: info : libvirt version:, package: 2.fc22 (Fedora Project, 2015-06-06-15:21:32, buildvm-13.phx2.fedoraproject.org)
2015-09-17 08:17:10.003+0000: 2648: warning : virKeepAliveTimerInternal:143 : No response from client 0x7f1b5c219320 after 6 keepalive messages in 36 seconds
error: Failed to save domain 9 to check
2015-09-17 08:17:10.004+0000: 2647: warning : virKeepAliveTimerInternal:143 : No response from client 0x7f1b5c219320 after 6 keepalive messages in 36 seconds
2015-09-17 08:17:10.004+0000: 2646: warning : virKeepAliveTimerInternal:143 : No response from client 0x7f1b5c219320 after 6 keepalive messages in 36 seconds
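
For reference, the "6 keepalive messages in 36 seconds" window in the warnings above comes from libvirt's RPC keepalive mechanism, which is tunable in /etc/libvirt/libvirtd.conf. The values below are illustrative only (defaults differ between libvirt releases), and raising them would merely delay the symptom, since the underlying libxl save operation is stuck:

```
# /etc/libvirt/libvirtd.conf -- RPC keepalive tuning (illustrative values)
keepalive_interval = 6   # seconds between keepalive probes
keepalive_count = 6      # unanswered probes before the peer is declared dead
```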

Expected results:
Save and restore succeed every time.

Additional info:
1. Using xl to save/restore guests as above 10 times succeeds every time.
2. I tested 10 save/restore cycles with xen-4.5.1-6.fc22.x86_64 and libvirt-; the bug could not be reproduced.
3. There is a similar bug (report the same error message) with RHEL7.2 libvirt during migration: https://bugzilla.redhat.com/show_bug.cgi?id=1256213
Comment 1 Cole Robinson 2015-09-21 17:47:01 EDT
Thanks for the report. Can you install libvirt-debuginfo, restart libvirtd, reproduce the hang, and attach the output of:    sudo pstack `pidof libvirtd`
Comment 2 Lin Liu 2015-09-23 02:38 EDT
Created attachment 1076097 [details]
Comment 3 Cole Robinson 2015-09-23 09:05:04 EDT
Relevant stack bit below appears normal but I don't know libxl much.

Lin, can you clear out /var/log/libvirt/libxl/libxl-driver.log and /var/log/libvirt/libxl/$vmname.log, restart libvirtd, reproduce the issue, and attach those files as well?
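
A minimal sketch of that collection sequence (the log root and guest name are parameters here purely for illustration; the real files live under /var/log/libvirt as named in the comment above):

```shell
#!/bin/sh
# Truncate the libxl log files so that only messages from the next
# reproduction attempt are captured.
reset_libxl_logs() {
    # $1 = libvirt log root (normally /var/log/libvirt), $2 = guest name
    : > "$1/libxl/libxl-driver.log"
    : > "$1/libxl/$2.log"
}

# Example:
# reset_libxl_logs /var/log/libvirt rhel72-guest
# systemctl restart libvirtd
# ...reproduce the failure, then attach both log files
```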

Seems to be doing the correct thing:

Thread 10 (Thread 0x7f0b1046c700 (LWP 3731)):
#0  0x00007f0b1e7b654d in read () from /lib64/libpthread.so.0
#1  0x00007f0b1f244700 in libxl_read_exactly () from /lib64/libxenlight.so.4.5
#2  0x00007f0b1f24a353 in helper_stdout_readable () from /lib64/libxenlight.so.4.5
#3  0x00007f0b1f24f211 in afterpoll_internal () from /lib64/libxenlight.so.4.5
#4  0x00007f0b1f24f4bc in eventloop_iteration () from /lib64/libxenlight.so.4.5
#5  0x00007f0b1f25099b in libxl.ao_inprogress () from /lib64/libxenlight.so.4.5
#6  0x00007f0b1f21e71a in libxl_domain_suspend () from /lib64/libxenlight.so.4.5
#7  0x00007f0b065fa715 in libxlDoDomainSave (driver=driver@entry=0x7f0b0010f170, vm=vm@entry=0x7f0ad8005f60, to=to@entry=0x7f0ae0008080 "/root/check") at libxl/libxl_driver.c:1345
#8  0x00007f0b065fad42 in libxlDomainSaveFlags (dom=0x7f0ae0002dd0, to=0x7f0ae0008080 "/root/check", dxml=<optimized out>, flags=<optimized out>) at libxl/libxl_driver.c:1413
#9  0x00007f0b221093e4 in virDomainSave (domain=domain@entry=0x7f0ae0002dd0, to=0x7f0ae00015d0 "/root/check") at libvirt-domain.c:841
#10 0x00007f0b22bf8d2a in remoteDispatchDomainSave (server=<optimized out>, msg=<optimized out>, args=0x7f0ae0006c40, rerr=0x7f0b1046bc30, client=<optimized out>) at remote_dispatch.h:7264
#11 remoteDispatchDomainSaveHelper (server=<optimized out>, client=<optimized out>, msg=<optimized out>, rerr=0x7f0b1046bc30, args=0x7f0ae0006c40, ret=<optimized out>) at remote_dispatch.h:7242
#12 0x00007f0b221788e9 in virNetServerProgramDispatchCall (msg=0x7f0b24b483e0, client=0x7f0b24b48940, server=0x7f0b24b1c1b0, prog=0x7f0b24b44f40) at rpc/virnetserverprogram.c:437
#13 virNetServerProgramDispatch (prog=0x7f0b24b44f40, server=server@entry=0x7f0b24b1c1b0, client=0x7f0b24b48940, msg=0x7f0b24b483e0) at rpc/virnetserverprogram.c:307
#14 0x00007f0b22c09ba8 in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x7f0b24b1c1b0) at rpc/virnetserver.c:172
#15 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x7f0b24b1c1b0) at rpc/virnetserver.c:193
#16 0x00007f0b220765b6 in virThreadPoolWorker (opaque=opaque@entry=0x7f0b24b28aa0) at util/virthreadpool.c:144
#17 0x00007f0b22075f5e in virThreadHelper (data=<optimized out>) at util/virthread.c:197
#18 0x00007f0b1e7ae555 in start_thread () from /lib64/libpthread.so.0
#19 0x00007f0b1e4e8f3d in clone () from /lib64/libc.so.6
Comment 4 Lin Liu 2015-09-24 02:52:13 EDT
Hi Cole,

I have tried what you suggested several times. There is no record in /var/log/libvirt/libxl/libxl-driver.log or /var/log/libvirt/libxl/$vmname.log when the bug occurs. I also enabled libvirtd debug logging, and there is no record there either. When the guest is saved successfully, records do appear in the log files.

Sorry for the lack of useful information. I will keep watching this issue; if I get any log files, I will attach them.

Lin Liu
Comment 5 Lin Liu 2015-09-24 03:19 EDT
Created attachment 1076370 [details]
Comment 6 Lin Liu 2015-09-24 03:22:07 EDT
Hi Cole,

Please ignore the comment 4, log from /var/log/libvirt/libxl/$vmname.log is as below:
libxl: debug: libxl.c:950:libxl_domain_suspend: ao 0x7f1784005ff0: create: how=(nil) callback=(nil) poller=0x7f17ac1c29e0
libxl: debug: libxl_dom.c:1570:libxl__toolstack_save: domain=26 toolstack data size=8
libxl: debug: libxl.c:972:libxl_domain_suspend: ao 0x7f1784005ff0: inprogress: poller=0x7f17ac1c29e0, flags=i
libxl-save-helper: debug: starting save: Success
xc: detail: xc_domain_save: starting save of domid 26
libxl: debug: libxl_dom.c:1275:domain_suspend_callback_common: issuing PVHVM suspend request via XenBus control node
libxl: debug: libxl_event.c:577:libxl__ev_xswatch_register: watch w=0x7f17840091a0 wpath=/local/domain/26/control/shutdown token=2/3: register slotnum=2

And in comment 5 I attached the libvirtd debug log.
No new record in /var/log/libvirt/libxl/libxl-driver.log when the error occurred.

Lin Liu
Comment 7 Cole Robinson 2015-09-24 08:52:38 EDT
Reassigning to xen since it doesn't seem like libvirt is doing anything wrong here
Comment 8 Fedora End Of Life 2016-07-19 16:38:40 EDT
Fedora 22 changed to end-of-life (EOL) status on 2016-07-19. Fedora 22 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this bug.

Thank you for reporting this bug and we are sorry it could not be fixed.
