Bug 1441684 - Re-enable op blocker assertions
Summary: Re-enable op blocker assertions
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Target Milestone: rc
Assignee: Kevin Wolf
QA Contact: xianwang
URL:
Whiteboard:
Depends On: 1452148
Blocks:
 
Reported: 2017-04-12 13:13 UTC by Kevin Wolf
Modified: 2018-04-11 00:16 UTC (History)
12 users

Fixed In Version: qemu-kvm-rhev-2.10.0-1.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1452148 (view as bug list)
Environment:
Last Closed: 2018-04-11 00:16:25 UTC


Attachments


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2018:1104 normal SHIPPED_LIVE Important: qemu-kvm-rhev security, bug fix, and enhancement update 2018-04-10 22:54:38 UTC

Description Kevin Wolf 2017-04-12 13:13:49 UTC
In commit e3e0003a, upstream qemu disabled the op blocker assertions for the
2.9 release because some bugs could not be fixed in time. After rebasing to
2.9, we'll want to revert the commit and include proper fixes for the bugs.
Without the bugs fixed, op blockers can't keep the promises they are making.

Known problems with op blockers so far that need to be fixed before the commit
can be safely reverted:

* Old style block migration (migrate -b) triggers an assertion because it
  reuses the guest device's BlockBackend. During migration, this BlockBackend
  is not ready to be used yet (its real permissions are only enabled in
  blk_resume_after_migration() immediately before the guest starts to run).
  Block migration needs to use its own BlockBackend here.

* Postcopy migration. Commit d35ff5e6 added blk_resume_after_migration() in two
  places, but postcopy migration uses loadvm_postcopy_handle_run_bh(), which is
  the third one. In order to avoid assertion failures, the call needs to be
  added there as well. Without this fix, the guest device's op blockers are
  ineffective after postcopy migration.
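
For readers unfamiliar with the permission system these assertions protect, the check can be sketched as a simplified model in Python. This is illustrative only, not qemu's actual C code: the `BlockUser` class and `conflicts()` helper are invented for the sketch, and the bit names are loosely modeled on qemu's BLK_PERM_* flags.

```python
# Simplified model (not qemu code) of the block-layer permission
# system whose assertions commit e3e0003a disabled.
CONSISTENT_READ = 1 << 0
WRITE           = 1 << 1

class BlockUser:
    """One user of a block node, e.g. a guest device's BlockBackend."""
    def __init__(self, name, perm, shared_perm):
        self.name = name
        self.perm = perm                # permissions this user holds
        self.shared_perm = shared_perm  # permissions it tolerates in others

    def write(self):
        # The re-enabled assertion: every request must be covered by
        # the permissions actually granted to this user.
        assert self.perm & WRITE, f"{self.name} holds no WRITE permission"

def conflicts(a, b):
    """True if either user needs a permission the other does not share."""
    return bool((a.perm & ~b.shared_perm) | (b.perm & ~a.shared_perm))

# During incoming migration, the guest device's BlockBackend has not
# been granted its real permissions yet, so writing through it trips
# the assertion:
incoming = BlockUser("virtio-blk (migrating)", perm=0, shared_perm=WRITE)
try:
    incoming.write()
    tripped = False
except AssertionError:
    tripped = True

# After the real permissions are granted, the same request is fine:
incoming.perm = CONSISTENT_READ | WRITE
incoming.write()  # no assertion failure
```

In this model, blk_resume_after_migration() corresponds to the final step that grants the real permissions; using the guest device's BlockBackend before that point is exactly what trips the assertion in the block migration and postcopy cases above.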

Comment 1 juzhang 2017-06-08 11:05:38 UTC
Hi Cong,

Feel free to update the QE contact.

Comment 2 Kevin Wolf 2017-10-09 08:08:59 UTC
This is fixed in upstream qemu 2.10.

Postcopy migration was fixed with commit 0042fd36.
Old-style block migration was fixed with the series leading to commit 49695eeb.
Assertions were re-enabled in commit 362b3786.

Comment 4 xianwang 2017-12-13 07:34:05 UTC
Hi Kevin,
could you give the steps for how to verify this bug? Thanks

Comment 5 Kevin Wolf 2017-12-13 09:07:36 UTC
Please just verify that old style block migration (migrate -b) and postcopy migration are working and not causing any assertion failure.

Comment 6 Dr. David Alan Gilbert 2017-12-13 10:39:14 UTC
Kevin: Note that on RHEL 7 we disable outgoing old-style block migration anyway.

Comment 7 xianwang 2017-12-14 07:03:42 UTC
As Dave said, 'migrate -b' (old-style block migration) is not supported now, as shown below:
(qemu) migrate -b -d tcp:10.16.47.10:5801 
migrate: unsupported option -d

After confirming with Kevin, this bug only needs a general postcopy migration test. The bug is verified as passing on qemu-kvm-rhev-2.10.0-12.el7, as follows:
Host:
3.10.0-820.el7.ppc64le
qemu-kvm-rhev-2.10.0-12.el7.ppc64le
SLOF-20170724-5.git89f519f.el8.ppc64le

Guest:
3.10.0-800.el7.ppc64le

1. Boot a guest on the src host with the following qemu CLI:
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox off  \
    -nodefaults  \
    -machine pseries-rhel7.5.0 \
    -vga std \
    -uuid 8aeab7e2-f341-4f8c-80e8-59e2968d85c2 \
    -device virtio-serial-pci,id=virtio_serial_pci0,bus=pci.0,addr=03 \
    -device virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 \
    -device spapr-vscsi,id=scsi2 \
    -chardev socket,id=console0,path=/tmp/console0,server,nowait \
    -device spapr-vty,chardev=console0 \
    -device nec-usb-xhci,id=usb1,bus=pci.0,addr=05 \
    -drive file=/home/rhel75-ppc64le-virtio-scsi.qcow2,format=qcow2,if=none,cache=none,id=drive_blk1,werror=stop,rerror=stop \
    -device virtio-blk-pci,drive=drive_blk1,id=blk-disk1,bootindex=0,bus=pci.0,addr=06 \
    -drive file=/home/r1.qcow2,format=qcow2,if=none,cache=none,id=drive_data1,werror=stop,rerror=stop \
    -device virtio-blk-pci,drive=drive_data1,id=blk-data,bus=pci.0,addr=07 \
    -device virtio-net-pci,mac=9a:7b:7c:7d:7e:72,id=id9HRc5V,vectors=4,netdev=idjlQN53,bus=pci.0,addr=10 \
    -netdev tap,id=idjlQN53,vhost=off,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
    -m 4G \
    -smp 4 \
    -device usb-kbd \
    -device usb-mouse \
    -qmp tcp:0:8881,server,nowait \
    -vnc :1  \
    -msg timestamp=on \
    -rtc base=localtime,clock=vm,driftfix=slew  \
    -monitor stdio \
    -boot order=cdn,once=c,menu=on,strict=off \
    -enable-kvm

In the guest, format the data disk, mount it at /mnt, and run "dd":
#mkfs.ext4 /dev/vdb
#mount /dev/vdb /mnt
#while true;do dd if=/dev/zero of=/mnt/file2 bs=10M count=10;done

2. Launch qemu in listening mode on the dst host; the qemu CLI is the same as on the src host, with "-incoming tcp:0:5801" appended.

3. On the src host, do postcopy migration:
(qemu) migrate_set_capability postcopy-ram on
(qemu) migrate -d tcp:10.16.67.19:5801

4. Result:
Migration completes, the VM works well, and "dd" keeps running in the guest throughout postcopy migration. I can also stop "dd" and write to that disk again.

src->dst:postcopy migration succeed and vm works well including writing to disk;
dst->src:postcopy migration succeed and vm works well including writing to disk;

So I think this bug is fixed.
Kevin, do you think this verification is OK? Thanks

Comment 8 Kevin Wolf 2017-12-14 08:58:19 UTC
Don't you need a "migrate_start_postcopy" command on the source to actually
switch into postcopy mode?

Dave, the important thing is that loadvm_postcopy_handle_run_bh() runs, so that we actually test commit 0042fd36. Is the "migrate_start_postcopy" necessary for that? I assume so, but you can probably say something definite.

Comment 9 Dr. David Alan Gilbert 2017-12-14 09:43:36 UTC
Kevin is correct, that test hasn't actually done postcopy.
You need to:
  a) Start a heavy memory using job in the guest, e.g. the 'stress' command
  b) migrate_set_capability postcopy-ram on
  c) Start the migrate;   migrate -d tcp:host:port
  d) do an 'info migrate' you should see the status as 'active'
  e) migrate_start_postcopy
  f) another 'info migrate' you should see the status as 'postcopy-active'
  g) You should find the destination is responsive
  h) Wait until 'info migrate' returns complete

Include the output of (h) in the results, you should see a count of postcopy-requests.
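
For a scripted reproduction, the HMP steps above map onto QMP commands roughly as follows. This is a sketch only: the socket handling and status polling are omitted, and the host/port is a placeholder taken from this report.

```python
import json

def qmp(cmd, **args):
    """Build one QMP command in wire format (a JSON line)."""
    msg = {"execute": cmd}
    if args:
        msg["arguments"] = args
    return json.dumps(msg)

# QMP counterparts of steps (b), (c), (d), (e), and (f)/(h):
steps = [
    qmp("migrate-set-capabilities",
        capabilities=[{"capability": "postcopy-ram", "state": True}]),
    qmp("migrate", uri="tcp:10.16.67.19:5801"),
    qmp("query-migrate"),        # expect "status": "active"
    qmp("migrate-start-postcopy"),
    qmp("query-migrate"),        # expect "postcopy-active", later "completed"
]
for line in steps:
    print(line)
```

Once the final query-migrate reports "completed", its reply should also carry the postcopy-requests counter, which is the count Dave asks for in (h).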

Comment 10 xianwang 2017-12-14 11:02:46 UTC
(In reply to Dr. David Alan Gilbert from comment #9)
> Kevin is correct, that test hasn't actually done postcopy.
> You need to:
>   a) Start a heavy memory using job in the guest, e.g. the 'stress' command
>   b) migrate_set_capability postcopy-ram on
>   c) Start the migrate;   migrate -d tcp:host:port
>   d) do an 'info migrate' you should see the status as 'active'
>   e) migrate_start_postcopy
>   f) another 'info migrate' you should see the status as 'postcopy-active'
>   g) You should find the destination is responsive
>   h) Wait until 'info migrate' returns complete
> 
> Include the output of (h) in the results, you should see a count of
> postcopy-requests.

Hi, Kevin and Dave,
I am very sorry; I did that as Dave said but forgot to write the steps in the bug comment. I have done it again with "stress", and the result is also a pass, as follows:

stress in guest:
# stress --cpu 8 --io 4 --vm 2 --vm-bytes 128M
#while true;do dd if=/dev/zero of=/dev/vdb bs=10M count=10;done

(qemu) migrate_set_capability postcopy-ram on
(qemu) migrate -d tcp:10.16.67.19:5801
(qemu) info migrate
Migration status: active
(qemu) migrate_start_postcopy
(qemu) info migrate
Migration status: postcopy-active
(qemu) info migrate
Migration status: postcopy-active
....
(qemu) info migrate
Migration status: completed

migration succeeds and vm works well on destination.

Comment 11 xianwang 2017-12-14 11:06:32 UTC
Kevin and Dave,
if you two have no other concerns and agree with this verification, I will move this bug to "verified". Thanks

Comment 12 Dr. David Alan Gilbert 2017-12-14 11:53:51 UTC
As a postcopy test I think that's fine.

Comment 14 errata-xmlrpc 2018-04-11 00:16:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1104

