Bug 2107466

Summary: zerocopy capability can be enabled when migrate capabilities are set with multifd and compress/xbzrle together
Product: Red Hat Enterprise Linux 9 Reporter: Li Xiaohui <xiaohli>
Component: qemu-kvm Assignee: Leonardo Bras <leobras>
qemu-kvm sub component: Live Migration QA Contact: Li Xiaohui <xiaohli>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: medium CC: chayang, chdong, coli, dgilbert, fjin, jinzhao, juzhang, lcheng, leobras, lijin, mdean, peterx, quintela, virt-maint
Version: 9.1 Keywords: Triaged
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: qemu-kvm-7.0.0-11.el9 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2110203 (view as bug list) Environment:
Last Closed: 2022-11-15 09:54:42 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2110203    

Description Li Xiaohui 2022-07-15 06:59:27 UTC
Description of problem:
Zerocopy is not supported when compress is on.
But when the migrate capabilities multifd, zerocopy and compress are set together in one command, the command succeeds.
When these capabilities are set separately, zerocopy cannot be enabled while compress is enabled, which is the expected behavior.


Version-Release number of selected component (if applicable):
hosts info: kernel-5.14.0-121.el9.x86_64 & qemu-kvm-7.0.0-8.el9.x86_64
guest info: kernel-5.14.0-125.el9.x86_64


How reproducible:
100%


Steps to Reproduce:
1. Boot a guest
2. Set the multifd, zerocopy and compress capabilities on together:
{"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"xbzrle","state":false},{"capability":"auto-converge","state":false},{"capability":"rdma-pin-all","state":false},{"capability":"postcopy-ram","state":false},{"capability":"compress","state":true},{"capability":"pause-before-switchover","state":false},{"capability":"late-block-activate","state":false},{"capability":"multifd","state":true},{"capability":"dirty-bitmaps","state":false},{"capability":"return-path","state":false},{"capability":"zero-copy-send","state":true}]}}
{"return": {}}
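The single-command request in step 2 can also be generated rather than typed by hand. The following is a minimal sketch that only builds the QMP JSON text; the helper name `build_set_capabilities` is made up for illustration, and sending the text to a QMP socket is left out.

```python
import json


def build_set_capabilities(states):
    """Return the QMP text that sets several migration capabilities at once.

    `states` maps a capability name to the desired boolean state.
    """
    command = {
        "execute": "migrate-set-capabilities",
        "arguments": {
            "capabilities": [
                {"capability": name, "state": enabled}
                for name, enabled in states.items()
            ]
        },
    }
    return json.dumps(command)


# The three capabilities that matter for this bug, set in one command.
request = build_set_capabilities(
    {"compress": True, "multifd": True, "zero-copy-send": True}
)
print(request)
```

Capabilities that are not listed keep their current state, so the shorter request should be enough to reproduce the problem.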


Actual results:
Same as in the steps: enabling the compress, multifd and zerocopy capabilities together succeeds.


Expected results:
The zerocopy and compress capabilities cannot be enabled together.

If we enable compress, multifd and zerocopy separately, zerocopy fails to enable:
{"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"compress","state":true}]}}
{"return": {}}
{"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"multifd","state":true}]}}
{"return": {}}
{"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"zero-copy-send","state":true}]}}
{"error": {"class": "GenericError", "desc": "Zero copy only available for non-compressed non-TLS multifd migration"}}
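The difference between the combined and the separate requests would be consistent with a check that reads part of its input from the state before the request is applied. The report does not state the root cause, so the toy model below is only an assumption used for illustration; `stale_check`, `merged_check` and `set_capabilities` are hypothetical names and this is not QEMU code.

```python
ERR = "Zero copy only available for non-compressed non-TLS multifd migration"


def stale_check(current, merged):
    """Hypothetical buggy check: reads 'compress' from the pre-request state."""
    if merged.get("zero-copy-send") and (
        not merged.get("multifd") or current.get("compress")
    ):
        return ERR
    return None


def merged_check(current, merged):
    """Hypothetical fixed check: reads everything from the post-request state."""
    if merged.get("zero-copy-send") and (
        not merged.get("multifd") or merged.get("compress") or merged.get("xbzrle")
    ):
        return ERR
    return None


def set_capabilities(current, requested, check):
    """Apply a request only if the check accepts the resulting state."""
    merged = {**current, **requested}
    error = check(current, merged)
    return (current, error) if error else (merged, None)


def run(requests, check):
    """Replay a list of requests from an empty state and collect the errors."""
    state, errors = {}, []
    for requested in requests:
        state, error = set_capabilities(state, requested, check)
        errors.append(error)
    return errors


batch = [{"compress": True, "multifd": True, "zero-copy-send": True}]
separate = [{"compress": True}, {"multifd": True}, {"zero-copy-send": True}]
reverse = [{"multifd": True}, {"zero-copy-send": True}, {"compress": True}]

for name, requests in [("batch", batch), ("separate", separate), ("reverse", reverse)]:
    print(name, run(requests, stale_check), run(requests, merged_check))
```

In this model the stale check rejects only the separate sequence, while the merged check rejects the invalid combination regardless of how or in which order it is requested.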


Additional info:

Comment 1 Li Xiaohui 2022-07-15 07:04:30 UTC
As libvirt always sets all migrate capabilities together through a single QMP command, "migrate-set-capabilities", we should fix this bug.

Comment 2 Li Xiaohui 2022-07-15 07:23:44 UTC
BTW, shall we also prevent the compress capability from being enabled when zerocopy is already enabled? For example:
{"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"multifd","state":true}]}}
{"return": {}}
{"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"zero-copy-send","state":true}]}}
{"return": {}}
{"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"compress","state":true}]}}
{"return": {}}

Comment 3 Leonardo Bras 2022-07-15 20:41:12 UTC
Thanks for reporting Li Xiaohui!

I think I have an idea on why this happens, and just wrote a probable fix.

I tested on a scratch build. Could you please give it a try?
https://kojihub.stream.rdu2.redhat.com/koji/taskinfo?taskID=1295372

Comment 4 Li Xiaohui 2022-07-24 09:13:06 UTC
Thanks Leonardo for providing the scratch build in https://bugzilla.redhat.com/show_bug.cgi?id=1968509#c26.


I have tested the build (qemu-kvm-7.0.0-8.el9.leonardo202207190013.x86_64). The issues in the Description and Comment 2 have been fixed, and enabling xbzrle while zerocopy is enabled is now rejected as well.


Only the following question remains. Do you plan to fix it?
1. When trying to migrate with tls + multifd + zerocopy (note that TLS certs have been set up on the source and destination hosts):
a. If tls creds are set on the src and dst hosts first, and multifd and zerocopy are enabled afterwards, we get an error, which is expected:
{"execute": "migrate-set-parameters", "arguments": {"tls-creds": "tls0"}, "id": "hdTiagq5"}
...
{"execute": "migrate-set-capabilities", "arguments": {"capabilities": [{"capability": "zero-copy-send", "state": true}]}, "id": "wgr8gT3T"}
{"id": "wgr8gT3T", "error": {"class": "GenericError", "desc": "Zero copy only available for non-compressed non-TLS multifd migration"}}
b. But if multifd and zerocopy are enabled first and tls creds are set afterwards, every command succeeds, and the migration then fails once it is started:
{"execute": "query-migrate", "id": "qUqoTVuL"}
{"return": {"status": "failed", "error-desc": "Requested Zero Copy feature is not available: Invalid argument"}, "id": "qUqoTVuL"}

My question: for situation b, shall we prevent tls creds from being set successfully when zerocopy is enabled? Or give a more accurate error than the error-desc above, for example that zerocopy is enabled but TLS migration is not supported with zerocopy.
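The first option in the question, rejecting tls creds while zerocopy is enabled, amounts to running the same combined validation on both the capability path and the parameter path. The sketch below is a toy model under that assumption; `MigrationConfig` and `try_steps` are hypothetical names and do not correspond to QEMU internals.

```python
ERR = "Zero copy only available for non-compressed non-TLS multifd migration"


class MigrationConfig:
    """Toy model: one validator shared by capability and parameter setters."""

    def __init__(self):
        self.capabilities = {}
        self.parameters = {}

    def _validate(self, capabilities, parameters):
        zero_copy = capabilities.get("zero-copy-send", False)
        tls = bool(parameters.get("tls-creds"))
        if zero_copy and tls:
            raise ValueError(ERR)

    def set_capability(self, name, state):
        proposed = {**self.capabilities, name: state}
        self._validate(proposed, self.parameters)
        self.capabilities = proposed  # committed only after validation

    def set_parameter(self, name, value):
        proposed = {**self.parameters, name: value}
        self._validate(self.capabilities, proposed)
        self.parameters = proposed  # committed only after validation


def try_steps(steps):
    """Replay (kind, name, value) steps and record 'ok' or the error text."""
    config, outcomes = MigrationConfig(), []
    for kind, name, value in steps:
        try:
            if kind == "cap":
                config.set_capability(name, value)
            else:
                config.set_parameter(name, value)
            outcomes.append("ok")
        except ValueError as exc:
            outcomes.append(str(exc))
    return outcomes


situation_a = [("param", "tls-creds", "tls0"), ("cap", "multifd", True),
               ("cap", "zero-copy-send", True)]
situation_b = [("cap", "multifd", True), ("cap", "zero-copy-send", True),
               ("param", "tls-creds", "tls0")]
print(try_steps(situation_a))
print(try_steps(situation_b))
```

With a shared validator, both orderings fail at the last step with the same message instead of failing later during migration.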

Comment 5 Leonardo Bras 2022-07-25 18:13:56 UTC
(In reply to Li Xiaohui from comment #4)
> Thanks Leoardo to provide the scratch build in
> https://bugzilla.redhat.com/show_bug.cgi?id=1968509#c26.
> 
> 
> I have tested the build (qemu-kvm-7.0.0-8.el9.leonardo202207190013.x86_64),
> the issues in Description, Comment 2 have been fixed, and we also don't
> support enable xbzrle with zerocopy enabled.

That's great!

> 
> 
> Only following question still exist, do you plan to fix it?
> 1.When try to migrate with tls + multifd + zerocopy (note tls certs has been
> set up on source and destination host)
> a. if set tls creds on src and dst host firstly, then enable multifd,
> zerocopy, will get error prompt, it's the expectation:
> {"execute": "migrate-set-parameters", "arguments": {"tls-creds": "tls0"},
> "id": "hdTiagq5"}
> ...
> {"execute": "migrate-set-capabilities", "arguments": {"capabilities":
> [{"capability": "zero-copy-send", "state": true}]}, "id": "wgr8gT3T"}
> {"id": "wgr8gT3T", "error": {"class": "GenericError", "desc": "Zero copy
> only available for non-compressed non-TLS multifd migration"}}
> b. but if enable multifd, zerocopy firstly, then set tls creds, all will
> succeed, but when start migration, migration will fail like:
> {"execute": "query-migrate", "id": "qUqoTVuL"}
> {"return": {"status": "failed", "error-desc": "Requested Zero Copy feature
> is not available: Invalid argument"}, "id": "qUqoTVuL"}
> 
> My question: for situation b, shall we avoid set tls creds successfully when
> zerocopy is enabled? or give accurate error prompt than above error-desc
> like zerocopy is enabled, but don't support tls migration under zerocopy.

That's odd. 
I specifically added a test for checking zero-copy enabled & tls_creds when setting a parameter (migrate_params_check()), and it should output the same error message.
I will try to do some debugging on that, and see what could be going wrong.

Comment 6 Leonardo Bras 2022-07-26 01:33:29 UTC
(In reply to Leonardo Bras from comment #5)
> > My question: for situation b, shall we avoid set tls creds successfully when
> > zerocopy is enabled? or give accurate error prompt than above error-desc
> > like zerocopy is enabled, but don't support tls migration under zerocopy.
> 
> That's odd. 
> I specifically added a test for checking zero-copy enabled & tls_creds when
> setting a parameter (migrate_params_check()), and it should output the same
> error message.
> I will try to do some debugging on that, and see what could be going wrong.

I think I found the error, and that is something related to the parameter struct initialization.
It looks like it loads TLS data even though it reports TLS as not enabled, so the check for "enable zero-copy" -> "enable tls" does not fail, even though it should.

I sent a fix to the mailing list, and I will provide the brew / backport MR as soon as I get some feedback.
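The failure mode described in this comment, a value that is loaded while its presence indicator says it is absent, can be illustrated in isolation. The sketch below is a generic toy example and not the QEMU parameter struct; `Params`, `has_tls_creds`, `init_params`, `apply_update` and `check` are hypothetical names chosen for the illustration.

```python
from dataclasses import dataclass, replace

ERR = "Zero copy only available for non-compressed non-TLS multifd migration"


@dataclass
class Params:
    tls_creds: str = ""
    has_tls_creds: bool = False  # presence flag (hypothetical name)


def init_params(set_flag):
    """Initial parameter state; `set_flag` models whether init marks the field present."""
    return Params(tls_creds="", has_tls_creds=set_flag)


def apply_update(current, tls_creds):
    """Store the new value but leave the presence flag as it was initialised."""
    return replace(current, tls_creds=tls_creds)


def check(params, zero_copy):
    """Reject TLS with zero copy, but only if the flag says TLS data is present."""
    if zero_copy and params.has_tls_creds and params.tls_creds:
        return ERR
    return None


# Flag left unset at initialisation: the check is skipped although creds are loaded.
print(check(apply_update(init_params(False), "tls0"), zero_copy=True))
# Flag set at initialisation: the same update is rejected.
print(check(apply_update(init_params(True), "tls0"), zero_copy=True))
```

In this model the only difference between the accepted and the rejected case is how the presence flag was initialised, which is why such a problem would show up only in the "enable zero-copy, then set TLS" ordering.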

Comment 7 John Ferlan 2022-08-01 18:22:22 UTC
Considering the RHEL 8.7.0 cloned bug 2110203 has been posted downstream, I've added the ITR=9.1.0 here as we'll need to fix this in RHEL9 too.

Comment 9 Yanan Fu 2022-08-16 06:57:05 UTC
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Comment 12 Li Xiaohui 2022-08-17 09:48:26 UTC
Verified this bug according to Comment 4 on qemu-kvm-7.0.0-11.el9.x86_64; all issues have been fixed.

Marking this bug verified per the test results and removing 'SanityOnly' from 'Verified', since we have test steps to reproduce this bug.

Comment 13 Li Xiaohui 2022-08-29 11:29:00 UTC
*** Bug 2106265 has been marked as a duplicate of this bug. ***

Comment 15 errata-xmlrpc 2022-11-15 09:54:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: qemu-kvm security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7967