1193826 – Dump progress only show up when memory-only dump finish

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1193826 - Dump progress only show up when memory-only dump finish

Summary: Dump progress only show up when memory-only dump finish

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	qemu-kvm-rhev
Sub Component:
Version:	7.2
Hardware:	Unspecified
OS:	Unspecified
Priority:	low
Severity:	low
Target Milestone:	rc
Target Release:	7.4
Assignee:	Peter Xu
QA Contact:	hachen
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	916061 1401400
TreeView+	depends on / blocked

Reported:	2015-02-18 10:43 UTC by Jaroslav Suchanek
Modified:	2017-08-02 03:22 UTC (History)
CC List:	20 users (show)
Fixed In Version:	qemu-2.8
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:	916061
Environment:
Last Closed:	2017-08-01 23:27:12 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2017:2392	0	normal	SHIPPED_LIVE	Important: qemu-kvm-rhev security, bug fix, and enhancement update	2017-08-01 20:04:36 UTC

Description Jaroslav Suchanek 2015-02-18 10:43:45 UTC

+++ This bug was initially created as a clone of Bug #916061 +++

Description of problem:
Virsh dump --verbose option can display the progress of dump, but if do memory-only dump, the progress of dump will only show up when dump finish(100%). 

Version-Release number of selected component (if applicable):
libvirt-0.10.2-18.el6.x86_64 

How reproducible:
100%

Steps to Reproduce:
1.
# virsh dump vm1 /tmp/vm1.core --verbose
Dump: [ 40 %]
Dump: [ 67 %]
Dump: [ 78 %]
Dump: [ 91 %]
Dump: [100 %]
Domain vm1 dumped to /tmp/vm1.core

#virsh dump vm1 /tmp/vm1.core --memory-only --verbose

Dump: [100 %]
Domain vm1 dumped to /tmp/vm1.core 
2.
3.
  
Actual results:
Dump progress only show up when memory-only dump finish. 

Expected results:
Dump progress will display in time during memory-only dump. 

Additional info:

Comment 1 Peter Xu 2015-10-20 10:10:40 UTC

Seems libvirt-client bug.

Comment 2 Peter Xu 2015-10-20 15:59:57 UTC

Sorry to have triaged this bug with incaution. Will confirm this bug and re-assignning back to me.

Comment 3 Peter Xu 2015-10-27 07:17:37 UTC

After diagnose this issue, here is the things I got. 

First, let me introduce the scenario here. When we trigger the command:

# virsh dump vm core1 --verbose

At QEMU side, one "migrate" command is received on qemu side. While for the 2nd command:

# virsh dump vm core1 --verbose --memory-only

one "dump-guest-memory" command is received on qemu side. Only the 2nd command will hang virsh command.

There might be two reasons that could hang the dump guest memory command:

1. QEMU side: qemu handles "dump-guest-memory" synchronously, which means that the command will not return before all VM memories dumped. Please refers to function qmp_dump_guest_memory() in QEMU codes.

2. libvirtd side: when libvirtd handles dump request in this case, one process is created (libvirt_iohelper program) to sync the core file (make sure data is written to disk), and the libvirtd thread would not return before child reaped (or say, sync() finish). Please refers to doCoreDump() and virFileWrapperFdClose() in libvirt codes. 

When virsh hangs (100% reproduce-able on my test machine), it is caused by (2) not (1). QEMU dumps the results very quickly (to memory actually, which is XFS cache, and my machine has lots of memory), but libvirtd is always waiting for child, guessing that's why virsh hanged:

#0  0x00007efc81970ca9 in waitpid () from /lib64/libpthread.so.0
#1  0x00007efc842edd1a in virProcessWait (pid=pid@entry=7762, exitstatus=exitstatus@entry=0x7efc73ef3884, raw=raw@entry=true) at util/virprocess.c:247
#2  0x00007efc842aeb0d in virCommandWait (cmd=0x7efc6800f890, exitstatus=exitstatus@entry=0x0) at util/vircommand.c:2523
#3  0x00007efc842be35a in virFileWrapperFdClose (wfd=wfd@entry=0x7efc68003730) at util/virfile.c:327
#4  0x00007efc47730f64 in doCoreDump (driver=driver@entry=0x7efc2411e5b0, vm=0x7efc24207950, path=path@entry=0x7efc68003bb0 "/root/peter/bug/core1", compress=<optimized out>, dump_flags=dump_flags@entry=16, dumpformat=dumpformat@entry=0) at qemu/qemu_driver.c:3649
#5  0x00007efc477313e2 in qemuDomainCoreDumpWithFormat (dom=0x7efc6800c8d0, path=0x7efc68003bb0 "/root/peter/bug/core1", dumpformat=0, flags=16) at qemu/qemu_driver.c:3749
#6  0x00007efc84390b9d in virDomainCoreDump (domain=domain@entry=0x7efc6800c8d0, to=0x7efc680037a0 "/root/peter/bug/core1", flags=16) at libvirt-domain.c:1264
#7  0x00007efc84fecf92 in remoteDispatchDomainCoreDump (server=0x7efc85514ed0, msg=0x7efc8552da00, args=0x7efc68004190, rerr=0x7efc73ef3c30, client=<optimized out>) at remote_dispatch.h:3501
#8  remoteDispatchDomainCoreDumpHelper (server=0x7efc85514ed0, client=<optimized out>, msg=0x7efc8552da00, rerr=0x7efc73ef3c30, args=0x7efc68004190, ret=0x7efc68013fe0) at remote_dispatch.h:3477
#9  0x00007efc84409342 in virNetServerProgramDispatchCall (msg=0x7efc8552da00, client=0x7efc8552dc70, server=0x7efc85514ed0, prog=0x7efc85528310) at rpc/virnetserverprogram.c:437
#10 virNetServerProgramDispatch (prog=0x7efc85528310, server=server@entry=0x7efc85514ed0, client=0x7efc8552dc70, msg=0x7efc8552da00) at rpc/virnetserverprogram.c:307
#11 0x00007efc844045bd in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x7efc85514ed0) at rpc/virnetserver.c:135
#12 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x7efc85514ed0) at rpc/virnetserver.c:156
#13 0x00007efc842ff4c5 in virThreadPoolWorker (opaque=opaque@entry=0x7efc85514a10) at util/virthreadpool.c:145
#14 0x00007efc842fe9e8 in virThreadHelper (data=<optimized out>) at util/virthread.c:206
#15 0x00007efc81969dc5 in start_thread () from /lib64/libpthread.so.0
#16 0x00007efc816971cd in clone () from /lib64/libc.so.6

Yes the parent is waiting for its child to go back (libvirt_iohelper), and if you see the libvirt_iohelper program, it hangs in kernel like:

[<ffffffff812c9b08>] get_request+0x218/0x780
[<ffffffff812cbdc6>] blk_queue_bio+0xc6/0x3a0
[<ffffffff812c71a2>] generic_make_request+0xe2/0x130
[<ffffffff812c7261>] submit_bio+0x71/0x150
[<ffffffffa03a6323>] xfs_submit_ioend_bio.isra.12+0x33/0x40 [xfs]
[<ffffffffa03a641f>] xfs_submit_ioend+0xef/0x130 [xfs]
[<ffffffffa03a70f2>] xfs_vm_writepage+0x2a2/0x5d0 [xfs]
[<ffffffff81173a63>] __writepage+0x13/0x50
[<ffffffff81174581>] write_cache_pages+0x251/0x4d0
[<ffffffff8117484d>] generic_writepages+0x4d/0x80
[<ffffffffa03a6993>] xfs_vm_writepages+0x43/0x50 [xfs]
[<ffffffff811758fe>] do_writepages+0x1e/0x40
[<ffffffff8116a685>] __filemap_fdatawrite_range+0x65/0x80
[<ffffffff8116a79a>] filemap_write_and_wait_range+0x2a/0x70
[<ffffffffa03b1386>] xfs_file_fsync+0x66/0x200 [xfs]
[<ffffffff8120f8f5>] do_fsync+0x65/0xa0
[<ffffffff8120fbe3>] SyS_fdatasync+0x13/0x20
[<ffffffff816457c9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

So to finally fix this (tiny) issue, I think both issues (1) and (2) should be solved. For (1), I think that might not be hard to async the dump memory operation in QEMU. Any libvirt team guys could help confirm and provide comments for (2)?

Comment 4 Peter Krempa 2015-11-30 12:32:08 UTC

AFAIK if the iohelper hangs due to waiting for a sync it's correct, since we want to make sure that the data was properly written before returning to the user.

Comment 5 Peter Xu 2015-12-14 04:06:43 UTC

(In reply to Peter Krempa from comment #4)
> AFAIK if the iohelper hangs due to waiting for a sync it's correct, since we
> want to make sure that the data was properly written before returning to the
> user.

I have posted one patch from QEMU side to be able to detach dump on upstream, which is still during review:

https://lists.gnu.org/archive/html/qemu-devel/2015-12/msg01299.html

Regarding to libvirt part, if it's by design (and we decide to keep this), how should we proceed with this bug?

Comment 6 Peter Krempa 2015-12-14 09:43:10 UTC

Well, for libvirt the first step will be to wire up QMP introspection so that we'll be able to detect the new argument for the command. Once that is done the new capability bit will be used to supply the 'detach': 'true' argument to that command and the new event will be used to notify it's finish.

The first step in this case is the QMP introspection though since that's the new correct approach to detect support of the argument. (Previously we'd try the argument and detect errors but since we have the new interface it should be used)

Comment 7 Peter Xu 2016-03-03 04:43:27 UTC

(In reply to Peter Krempa from comment #6)
> Well, for libvirt the first step will be to wire up QMP introspection so
> that we'll be able to detect the new argument for the command. Once that is
> done the new capability bit will be used to supply the 'detach': 'true'
> argument to that command and the new event will be used to notify it's
> finish.
> 
> The first step in this case is the QMP introspection though since that's the
> new correct approach to detect support of the argument. (Previously we'd try
> the argument and detect errors but since we have the new interface it should
> be used)

Hi, Peter, 

Agree that we should do this after QMP introspection work.

Just an update to the bz that detached dump support on QEMU side has been merged upstream already (commit 1fbeff72c2ba17aedfbf0d57caf8945862725c54). Also: 

1. "query-dump" command is added to query latest dump status asynchronously (commit 39ba2ea61f510512764e6f9063a8396d05fe57fe)

2. QMP event DUMP_COMPLETED is added to notify the completion of the dump (commit d42a0d1484f03d05184ed9bb867fee42476a588f)

Would you mind if I move this bz component to libvirt for further triage?

--peterx

Comment 8 John Ferlan 2016-03-03 11:39:24 UTC

There already is a bz for libvirt on this (see the blocks bz 916061).  Once this is completely ready, then I get the pleasure of working on the libvirt side.

Comment 9 Peter Xu 2016-03-04 03:12:44 UTC

(In reply to John Ferlan from comment #8)
> There already is a bz for libvirt on this (see the blocks bz 916061).  Once
> this is completely ready, then I get the pleasure of working on the libvirt
> side.

Thanks for the info. I am still trying to figure out the workflow...

So I guess I will first need to wait for upstream dump detach fully tested (maybe several months?), and also wait for rhel-7.3 pending flag of current bz. 

Sorry for the troublesome. Removing needinfo.

Comment 10 Peter Xu 2017-01-12 06:49:01 UTC

Hi,

QEMU should be supporting detached dump now after the 7.4 rebase. Since it's been a long time since last post, I'll do a summary below.

A new parameter "detach" is added to "dump-guest-memory" QMP command:

# @detach: #optional if true, QMP will return immediately rather than
#          waiting for the dump to finish. The user can track progress
#          using "query-dump". (since 2.6).

Also, another new command "query-dump" is added to query detached dump status:

##
# @query-dump:
#
# Query latest dump status.
#
# Returns: A @DumpStatus object showing the dump status.
#
# Since: 2.6
##
{ 'command': 'query-dump', 'returns': 'DumpQueryResult' }

Now when dump completes, we'll emit a new event to notify that:

##
# @DUMP_COMPLETED:
#
# Emitted when background dump has completed
#
# @result: DumpQueryResult type described in qapi-schema.json.
#
# @error: #optional human-readable error string that provides
#         hint on why dump failed. Only presents on failure. The
#         user should not try to interpret the error string.
#
# Since: 2.6
##
{ 'event': 'DUMP_COMPLETED' ,
  'data': { 'result': 'DumpQueryResult', '*error': 'str' } }

That's all I know that can be done on QEMU side. Thanks.

Comment 11 Peter Xu 2017-01-12 06:52:10 UTC

Hmm, if I understand it correctly (and no one disagree), should I mark this one as MODIFIED?

Comment 17 hachen 2017-03-13 10:44:39 UTC

HOST:
kernel-3.10.0-588.el7.x86_64
kernel-debuginfo-common-x86_64-3.10.0-514.el7.x86_64
kernel-debuginfo-3.10.0-514.el7.x86_64

qemu-kvm-rhev-2.8.0-6.el7.x86_64

GUEST:
kernel-3.10.0-514.el7.x86_64


Boot up guest, test "dump-guest-memory" with param "detach" and new "query-dump".

Results are as expected.

 {"execute": "dump-guest-memory", "arguments": { "detach": true, "paging": false, "protocol": "file:/home/dump.normal"}}
{"timestamp": {"seconds": 1489391067, "microseconds": 147976}, "event": "STOP"}
{"return": {}}
{"timestamp": {"seconds": 1489391068, "microseconds": 219429}, "event": "DUMP_COMPLETED", "data": {"result": {"total": 2164457472, "status": "completed", "completed": 2164457472}}}
{"timestamp": {"seconds": 1489391068, "microseconds": 219876}, "event": "RESUME"}

 {"execute": "query-dump"}
{"return": {"total": 2164457472, "status": "completed", "completed": 2164457472}}

Comment 21 errata-xmlrpc 2017-08-01 23:27:12 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392

Comment 22 errata-xmlrpc 2017-08-02 01:04:50 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392

Comment 23 errata-xmlrpc 2017-08-02 01:56:50 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392

Comment 24 errata-xmlrpc 2017-08-02 02:37:35 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392

Comment 25 errata-xmlrpc 2017-08-02 03:02:18 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392

Comment 26 errata-xmlrpc 2017-08-02 03:22:27 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392

Note You need to log in before you can comment on or make changes to this bug.

chayang
coli
cwei
dyuan
hhuang
huding
jen
jferlan
jsuchane
juzhang
michen
mrezanin
mzhan
peterx
pkrempa
rbalakri
virt-bugs
virt-maint
yafu
ydu