Description of problem:
Migration fails with a crash when multifd is enabled on a non-socket protocol.

How reproducible:
100%

Steps to Reproduce:
1. /usr/libexec/qemu-kvm -cpu host -m 2g -monitor stdio
2. migrate_set_capability multifd on
3. migrate "exec:gzip -c > STATEFILE.gz"

Actual results:
Segmentation fault (core dumped)

Expected results:
The command exits normally (success or a clean failure).

Additional info:
Upstream fix: https://lists.gnu.org/archive/html/qemu-devel/2021-07/msg04247.html
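For reference, the full monitor session looks roughly like this (banner abbreviated; assumes an affected qemu-kvm build):

$ /usr/libexec/qemu-kvm -cpu host -m 2g -monitor stdio
QEMU monitor - type 'help' for more information
(qemu) migrate_set_capability multifd on
(qemu) migrate "exec:gzip -c > STATEFILE.gz"
Segmentation fault (core dumped)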
I could also get a qemu core dump when doing offline migration with multifd enabled on the latest rhel-av-8.5.0 (qemu-kvm-6.0.0-24.module+el8.5.0+11844+1e3017bd.x86_64). But as far as I know, we shouldn't do offline migration with multifd enabled, because multifd only supports live migration. Juan, is that right?

I also tried libvirt to test multifd + offline migration, and found that libvirt doesn't support such a combination (libvirt-client-7.5.0-1.module+el8.5.0+11664+59f87560.x86_64). Fangge, could you help check whether the libvirt cmd below is right?

[root@ibm-x3250m6-07 home]# virsh save rhel850 rhel850.gzip --verbose --parallel
error: command 'save' doesn't support option --parallel
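For comparison, libvirt does accept --parallel for live migration via virsh migrate, so the gap is specific to the save/offline path. A sketch (the destination URI and connection count here are illustrative):

# virsh migrate rhel850 qemu+ssh://desthost/system --live --parallel --parallel-connections 4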
(In reply to Li Xiaohui from comment #1) > I also tried libvirt to test multifd + offline migration, get libvirt > doesn't support such > combination(libvirt-client-7.5.0-1.module+el8.5.0+11664+59f87560.x86_64). > Fangge, could you help check whether the below libvirt cmd is right? > [root@ibm-x3250m6-07 home]# virsh save rhel850 rhel850.gzip --verbose > --parallel > error: command 'save' doesn't support option --parallel In libvirt, save doesn't support option --parallel for now. Is there any reason to support it?
(In reply to Fangge Jin from comment #2)
> (In reply to Li Xiaohui from comment #1)
>
> > I also tried libvirt to test multifd + offline migration, get libvirt
> > doesn't support such
> > combination(libvirt-client-7.5.0-1.module+el8.5.0+11664+59f87560.x86_64).
> > Fangge, could you help check whether the below libvirt cmd is right?
> > [root@ibm-x3250m6-07 home]# virsh save rhel850 rhel850.gzip --verbose
> > --parallel
> > error: command 'save' doesn't support option --parallel
>
> In libvirt, save doesn't support option --parallel for now. Is there any
> reason to support it?

Hi Yiding,
Could you help answer the above question?
Hi, Xiaohui

(In reply to Li Xiaohui from comment #3)
> (In reply to Fangge Jin from comment #2)
> > (In reply to Li Xiaohui from comment #1)
> >
> > > I also tried libvirt to test multifd + offline migration, get libvirt
> > > doesn't support such
> > > combination(libvirt-client-7.5.0-1.module+el8.5.0+11664+59f87560.x86_64).
> > > Fangge, could you help check whether the below libvirt cmd is right?
> > > [root@ibm-x3250m6-07 home]# virsh save rhel850 rhel850.gzip --verbose
> > > --parallel
> > > error: command 'save' doesn't support option --parallel
> >
> > In libvirt, save doesn't support option --parallel for now. Is there any
> > reason to support it?

No, we don't need to support it in libvirt if it is not supported. QEMU should just output an error, like libvirt does, instead of core dumping; otherwise users will be confused by the core dump.

> Hi Yiding,
> Could you help answer above question?
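Concretely, the desired QMP behavior would be something like the exchange below (a sketch of the expected shape only; the exact error class and text are up to QEMU):

{"execute": "migrate-set-capabilities", "arguments": {"capabilities": [{"capability": "multifd", "state": true}]}}
{"return": {}}
{"execute": "migrate", "arguments": {"uri": "exec:gzip -c > STATEFILE.gz"}}
{"error": {"class": "GenericError", "desc": "multifd is not supported by current protocol"}}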
Assigned to Meirav for initial triage per the BZ process, given the age of the bug (created or assigned to virt-maint without triage).
There are some (unreviewed) patches on the upstream list that might fix the crash:

935   T 16/07 Li Zhijian        (130) [PATCH 1/2] migration: allow multifd for socket protocol only
936 O T 16/07 Li Zhijian        (  0) └─>[PATCH 2/2] migration: allow enabling mutilfd for specific protocol only
937 O T 19/07 lizhijian@fujitsu (  0)   └─>Re: [PATCH 2/2] migration: allow enabling mutilfd for specific protocol only
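Judging by the subjects, the patches would reject multifd on non-socket transports instead of crashing, presumably something like the following HMP session (a sketch, not verified against the patches; socket URIs such as tcp:, unix: and vsock: would still allow multifd):

(qemu) migrate_set_capability multifd on
(qemu) migrate "exec:gzip -c > STATEFILE.gz"
multifd is not supported by current protocol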
Hi, the PULL request has been sent upstream.
Since this is a Fujitsu-related bug, I did not create a RHEL9-specific clone or move the bug to RHEL9. If the resolution needs testing in RHEL9, then a clone should be created when the bug is resolved for RHEL8.
Updated Product to Advanced Virtualization. Normal RHEL doesn't have multifd at all.
This is the brew for the merge request:

https://bugzilla.redhat.com/show_bug.cgi?id=1982224

xiaohli: Could you test? And once it is there, can you give qa+ back? The problem was on AV, not on the normal one.
(In reply to Juan Quintela from comment #10)
> This is the brew for the merge request.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1982224

Juan, which brew do you mean?

> xiaohli: Could you test?
> And once there, can you give qa+ back?
> The problem was on AV, not on normal one.
Hi xiaohui

I posted the wrong link.

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=41116300

This one should work.
(In reply to Juan Quintela from comment #15)
> Hi xiaohui
>
> I posted the wrong link.
>
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=41116300
>
> THis one should work.

It did work. When testing offline migration with multifd enabled, the migration fails and HMP gives an error:

1) src qmp:
{"execute":"migrate", "arguments":{"uri": "exec:gzip -c > STATEFILE.gz"}}
{"return": {}}
{"execute":"query-migrate"}
{"return": {"blocked": false, "status": "failed"}}

2) src hmp:
(qemu) multifd is not supported by current protocol

Juan, can't we return a QMP error like HMP's when starting offline migration, rather than returning '{"return": {}}'?
Hi Li

Thanks for the testing.

I will change upstream to give you an error.

But should we leave that out of this bugzilla? Do you want another bugzilla for the error on qmp?

Thanks, Juan.
(In reply to Juan Quintela from comment #17)
> Hi Li
>
> Thanks for the testing.
>
> I will change upstream to give you an error.
>
> But should we left that out of this bugzilla?

Agreed (I mean fix it upstream), because layered products can't reproduce it, and qemu now reports that migration failed while the VM still works on the source host. To avoid missing this issue, could you tell me which qemu-kvm version fixes it, so that I can test after the patch is merged?

> Do you want another bugzilla for the error on qmp?
>
> Thanks, Juan.
Hi Xiaohui

OK, I will let you know after the patch is merged.

Later, Juan.
Now that the RHEL-AV z-stream bug has been created, moving this bug to RHEL 8.6.0 for resolution/testing.
Oh the irony of moving the bug from RHEL-AV to RHEL - all the flags are lost... Interestingly, the blocker+ flag stays, but we lose the release+ because qa_ack? and devel_ack? get reset. In any case, can we please just get the qa_ack+ again to make it "official". Lots of churn with this one just to create an AV z-stream.
Mass update of DTM/ITM to +3 values, since the rebase of qemu-6.2 into RHEL 8.6 has been delayed or slowed due to process roadblocks (authentication changes, gating issues). This avoids the DevMissed bot and, worse, the bot that could come along and strip release+. The +3 was chosen mainly to give a cushion. Also added the qemu-6.2 rebase bug 2027716 as a dependency.
QE bot (pre-verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 tests pass.
Tested again on qemu-kvm-6.2.0-1.module+el8.6.0+13725+61ae1949.x86_64; the test results match Comment 16. I will mark the bz verified per Comments 16~19. Juan, please remember to fix the QMP issue in a later qemu version, thank you.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: virt:rhel and virt-devel:rhel security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:1759