Bug 1530130 - Target host in nova DB got updated to new compute while migration failed and qemu-kvm process was still running on source host. [rhel-7.4.z]
Summary: Target host in nova DB got updated to new compute while migration failed and qemu-kvm process was still running on source host. [rhel-7.4.z]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Jiri Denemark
QA Contact: zhe peng
URL:
Whiteboard:
Depends On: 1401173
Blocks:
 
Reported: 2018-01-02 06:58 UTC by Oneata Mircea Teodor
Modified: 2021-06-10 14:04 UTC
CC List: 26 users

Fixed In Version: libvirt-3.2.0-14.el7_4.9
Doc Type: Bug Fix
Doc Text:
Cause: Libvirt advertised the migration as completed in the migration statistics report immediately after QEMU finished sending data to the destination.
Consequence: Management software monitoring the migration could see it as finished even though the domain might still fail to start on the destination.
Fix: Libvirt was patched to report the migration as completed only after the domain is already running on the destination.
Result: Management software no longer reacts incorrectly to a failed migration. (See the monitoring sketch after the Links table below.)
Clone Of: 1401173
Environment:
Last Closed: 2018-03-06 21:41:17 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 409260 0 None ABANDONED libvirt: update the active migrations DS to support migration result 2020-07-31 01:06:19 UTC
Red Hat Knowledge Base (Solution) 2832101 0 None None None 2018-01-02 06:59:55 UTC
Red Hat Product Errata RHBA-2018:0403 0 normal SHIPPED_LIVE libvirt bug fix update 2018-03-07 02:53:58 UTC
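
As the Doc Text above notes, the underlying problem was timing: management software such as nova could see the completed-job statistics before the guest was actually running on the destination. A minimal monitoring sketch of the check that the fixed libvirt makes reliable, assuming libvirt-python, a local connection on the source host, and a domain named 'rhel' (all illustrative; this is not nova's actual code):

    # Hedged sketch: query the record of the most recently completed job on
    # the source host. With the fixed libvirt, this record reports the job as
    # completed only once the domain is really running on the destination.
    import libvirt

    conn = libvirt.open('qemu:///system')   # source host; URI is an assumption
    dom = conn.lookupByName('rhel')         # example domain name

    stats = dom.jobStats(libvirt.VIR_DOMAIN_JOB_STATS_COMPLETED)
    if stats.get('type') == libvirt.VIR_DOMAIN_JOB_COMPLETED:
        print('migration job completed; the guest should be running on the destination')
    else:
        print('no completed migration job recorded for this domain')

With the unfixed packages, a check like this could report success even when the guest later failed to start on the destination.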

Description Oneata Mircea Teodor 2018-01-02 06:58:57 UTC
This bug has been copied from bug #1401173 and has been proposed to be backported to 7.4 z-stream (EUS).

Comment 3 Jiri Denemark 2018-01-11 21:54:26 UTC
The patch mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1401173#c34 caused a regression in reporting statistics of a completed job. See bug 1523036 for more details and an additional patch which will need to be backported to avoid the regression in 7.4.z.

Comment 6 zhe peng 2018-01-17 09:27:47 UTC
I can reproduce this with build:
libvirt-3.2.0-14.el7.x86_64

and verified with build:
libvirt-3.2.0-14.el7_4.8.x86_64

Steps:
1. prepare the migration environment (two hosts)
2. on the destination host, attach gdb to libvirtd, set a breakpoint on
    qemuMigrationFinish, and let the daemon continue:
    # gdb -p $(pidof libvirtd)
    (gdb) br qemuMigrationFinish
    (gdb) c

3. migrate a domain to the destination host
# virsh migrate rhel --live qemu+ssh://$target_host/system --verbose

4. once gdb stops at the breakpoint, check 'virsh domjobinfo DOM' on the source
   host
on source host:
# virsh domjobinfo rhel
Job type:         Unbounded   
Operation:        Outgoing migration
Time elapsed:     5773         ms
Data processed:   169.265 MiB
Data remaining:   0.000 B
Data total:       1.102 GiB
Memory processed: 169.265 MiB
Memory remaining: 0.000 B
Memory total:     1.102 GiB
Memory bandwidth: 109.149 MiB/s
Dirty rate:       0            pages/s
Iteration:        3           
Constant pages:   742625      
Normal pages:     127638      
Normal data:      498.586 MiB
Expected downtime: 20           ms
Setup time:       9            ms


5. kill the qemu-kvm process on the destination host

6. let gdb continue executing libvirtd (this will likely need to be done
   twice since gdb may stop at SIGPIPE after the first one)
    (gdb) c

7. check that the migration failed and the domain is still running on the source host

Migration: [100 %]error: internal error: qemu unexpectedly closed the monitor: 2018-01-17T09:21:38.632270Z qemu-kvm: -chardev pty,id=charserial0: char device redirected to /dev/pts/1 (label charserial0)

8. check the guest on the source host (an API-level sketch of this check follows the listing below)
# virsh list --all
 Id    Name                           State
----------------------------------------------------
 1     rhel                           running
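
The same final check can also be done through the libvirt API; a minimal sketch, assuming libvirt-python and the domain name 'rhel' used in the transcript (illustrative only):

    # Hedged sketch: confirm on the source host that the guest survived the
    # failed migration and is still running.
    import libvirt

    conn = libvirt.open('qemu:///system')   # source host
    dom = conn.lookupByName('rhel')

    state, reason = dom.state()             # (state, reason) pair
    print('still running on source:', state == libvirt.VIR_DOMAIN_RUNNING)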

Comment 7 zhe peng 2018-01-18 03:27:54 UTC
Hi jirka,
  I found an issue while doing some free testing of this patch; please help check whether it is a regression.
  Below is the output of domjobinfo with --completed:
# virsh domjobinfo rhel --completed
Job type:         Completed   
Operation:        Outgoing migration
Time elapsed:     2053         ms
Time elapsed w/o network: 2041         ms
Total downtime:   80           ms
Downtime w/o network: 68           ms

but with libvirt-3.2.0-14.el7.x86_64, it is
# virsh domjobinfo rhel --completed
Job type:         Completed   
Operation:        Outgoing migration
Time elapsed:     5822         ms
Time elapsed w/o network: 5817         ms
Data processed:   595.598 MiB
Data remaining:   0.000 B
Data total:       1.102 GiB
Memory processed: 595.598 MiB
Memory remaining: 0.000 B
Memory total:     1.102 GiB
Memory bandwidth: 111.518 MiB/s
Dirty rate:       0            pages/s
Iteration:        16          
Constant pages:   193151      
Normal pages:     151752      
Normal data:      592.781 MiB
Total downtime:   383          ms
Downtime w/o network: 378          ms
Setup time:       12           ms

Some of the output fields didn't show up.
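
One quick way to compare the two builds is to list which fields the completed-job record actually contains; a minimal sketch, assuming libvirt-python and the same domain name 'rhel' (illustrative only):

    # Hedged sketch: list which statistics fields the completed-job record
    # contains, so the output of the two builds above can be diffed.
    import libvirt

    conn = libvirt.open('qemu:///system')
    dom = conn.lookupByName('rhel')

    stats = dom.jobStats(libvirt.VIR_DOMAIN_JOB_STATS_COMPLETED)
    print(sorted(stats.keys()))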

Comment 8 Jiri Denemark 2018-01-19 13:13:04 UTC
Yeah, it's a regression. When backporting the patches I intentionally skipped some refactoring patches and didn't properly adjust the rest.

Comment 9 Jiri Denemark 2018-01-19 13:15:55 UTC
The patch mentioned in comment 3, which was supposed to fix a regression, may crash libvirtd in some cases. See bug 1536351 for more details. In other words, one more patch is needed here.

Comment 11 yafu 2018-01-29 06:43:03 UTC
Verified the issue from comment 9 with libvirt-3.2.0-14.el7_4.9.

Test steps:
1. Do a migration with the '--persistent' and '--offline' options:
# virsh migrate rhel qemu+ssh://10.66.4.116/system --offline --verbose  --persistent
root@10.66.4.116's password: 
Migration: [100 %]
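
As a sanity check on the destination side (not part of the original verification, which targeted the libvirtd crash from comment 9), a minimal sketch assuming libvirt-python and the URI and domain name from the transcript: with --offline --persistent the definition should be copied but the guest should not be started.

    # Hedged sketch: after 'virsh migrate --offline --persistent', the domain
    # should be defined (persistent) but not started on the destination.
    import libvirt

    dest = libvirt.open('qemu+ssh://10.66.4.116/system')   # destination from the test
    dom = dest.lookupByName('rhel')

    print('persistent on destination:', bool(dom.isPersistent()))
    print('running on destination:', bool(dom.isActive()))   # expected: False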

Comment 12 zhe peng 2018-01-29 07:55:32 UTC
Verified the issue from comment 7 with build libvirt-3.2.0-14.el7_4.9.

# virsh migrate rhel --live qemu+ssh://$target_host/system --verbose 
Migration: [100 %]
# virsh domjobinfo rhel --completed
Job type:         Completed   
Operation:        Outgoing migration
Time elapsed:     1124         ms
Time elapsed w/o network: 1122         ms
Data processed:   3.305 MiB
Data remaining:   0.000 B
Data total:       1.102 GiB
Memory processed: 3.305 MiB
Memory remaining: 0.000 B
Memory total:     1.102 GiB
Memory bandwidth: 38.463 MiB/s
Dirty rate:       0            pages/s
Iteration:        2           
Constant pages:   288783      
Normal pages:     211         
Normal data:      844.000 KiB
Total downtime:   59           ms
Downtime w/o network: 57           ms
Setup time:       6            ms

Comment 13 zhe peng 2018-01-29 07:56:11 UTC
Per comment 11 and comment 12, moving to VERIFIED.

Comment 16 errata-xmlrpc 2018-03-06 21:41:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0403

