Bug 1964326
Summary: | Qemu core dump when do tls migration via tcp protocol | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux Advanced Virtualization | Reporter: | Li Xiaohui <xiaohli> | ||||
Component: | qemu-kvm | Assignee: | Leonardo Bras <leobras> | ||||
qemu-kvm sub component: | Live Migration | QA Contact: | Li Xiaohui <xiaohli> | ||||
Status: | CLOSED ERRATA | Docs Contact: | |||||
Severity: | high | ||||||
Priority: | high | CC: | 47b13, chayang, dgilbert, fjin, jinzhao, juzhang, lcheng, leobras, mdean, peterx, qzhang, virt-maint | ||||
Version: | 8.5 | Keywords: | Regression | ||||
Target Milestone: | rc | ||||||
Target Release: | 8.5 | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | qemu-kvm-6.0.0-20.module+el8.5.0+11499+199527ef | Doc Type: | If docs needed, set a value | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2021-11-16 07:53:34 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
Description
Li Xiaohui
2021-05-25 08:44:52 UTC
Leo: This looks like the one you tripped over a couple of days ago.

Created attachment 1786771
tls_cert.sh
When doing TLS-encrypted migration via exec, the case passed.

I had this issue a while ago, and yesterday I took some time to try to understand it. I first reverted to a pre-yank commit, and after some small fixes, it worked just fine. Then I read the cover letter for the last yank patchset and monitored the usage of the yank interface:

- No-TLS migration did:
  1 - register_instance()
  2 - register_function()
  3 - unregister_function()
  4 - unregister_instance()
- When I try TLS migration, (3) doesn't happen at all, and this causes (4) to abort, because there are still valid functions registered.

Looking closely, this happens because in migration_channel_connect(), if migration happens over TLS, to_dst_file is not assigned, so (3) does not happen in migrate_fd_cleanup() because qemu_fclose() is never run. Now I am trying to understand exactly where (3) is supposed to happen when TLS is used.

(In reply to Li Xiaohui from comment #3)
> When doing TLS-encrypted migration via exec, the case passed.

I could not get exec to work with TLS.
Could you please show the command lines used on the receiving and sending ends?

> > (In reply to Li Xiaohui from comment #3)
> > When doing TLS-encrypted migration via exec, the case passed.
>
> I could not get exec to work with TLS.
> Could you please show the command lines used on the receiving and sending ends?

Test steps for exec:
1. Steps 1-3 are the same as in Comment 0;
2. On the dst host:
   (qemu) migrate_set_parameter tls-creds tls0
   (qemu) migrate_incoming "exec:socat - TCP4-LISTEN:9002"
   On the src host:
   (qemu) migrate_set_parameter tls-creds tls0
   (qemu) migrate_set_parameter tls-hostname $dst_short_host_name
   (qemu) migrate "exec:socat - TCP4:$dst_short_host_name:9002"

(In reply to Li Xiaohui from comment #5)
> Test steps for exec:

Thanks Xi, I will try that too!

I have sent a v1 and v2 of a fix for this bug:
http://patchwork.ozlabs.org/project/qemu-devel/patch/20210526221615.1093506-1-leobras.c@gmail.com/

Peter Xu recommended refactoring yank in migration so that yank is placed in channel-{tls,socket}, which seems to make more sense here.

I posted a v3 earlier, which was reviewed by Peter Xu and Lukas Straub:
http://patchwork.ozlabs.org/project/qemu-devel/patch/20210601054030.1153249-1-leobras.c@gmail.com/
In all the tests I did, this fixes the issue.

The patch was accepted upstream (master):
https://gitlab.com/qemu-project/qemu/-/commit/7de2e8565335c13fb3516cddbe2e40e366cce273

Passed testing on RHEL-AV 8.5.0 (kernel-4.18.0-312.el8.x86_64 and qemu-kvm-6.0.0-20.module+el8.5.0+11499+199527ef.x86_64).

Ran the automation for the TLS migration test requirements with rhel8.5.0 and win2022 guests; all cases passed. The logs are at the following link:
http://fileshare.englab.nay.redhat.com/pub/logs/xiaohli/bz1964326/

QE bot (pre-verify): Set 'Verified:Tested,SanityOnly' as the gating/tier1 tests passed.

Marking this bz as verified per Comment 17, and removing SanityOnly because we have test scenarios that cover this bz.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4684