Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1584484

Summary: qemu crashes on migration with TLS
Product: Red Hat Enterprise Linux 7
Reporter: Fangge Jin <fjin>
Component: qemu-kvm-rhev
Assignee: Dr. David Alan Gilbert <dgilbert>
Status: CLOSED DUPLICATE
QA Contact: Yumei Huang <yuhuang>
Severity: high
Priority: medium
Version: 7.6
CC: chayang, dyuan, dzheng, juzhang, knoel, lmen, lvivier, michen, peterx, pkrempa, quintela, qzhang, virt-maint, xianwang, xuzhang, yanqzhan, yiwei, zhguo
Target Milestone: rc
Keywords: Automation, Regression
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: If docs needed, set a value
Story Points: ---
Last Closed: 2018-08-02 08:25:56 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Bug Depends On: 1584139, 1594384
Attachments:
  libvirtd and qemu log on both source and target hosts (flags: none)
  qemu backtrace of both source and target hosts (flags: none)
  qemu backtrace of both source and target hosts (flags: none)

Description Fangge Jin 2018-05-31 03:18:24 UTC
Created attachment 1446056 [details]
libvirtd and qemu log on both source and target hosts

Description of problem:
TLS migration fails

Version-Release number of selected component (if applicable):
libvirt-4.3.0-1.el7.x86_64
qemu-kvm-rhev-2.12.0-2.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Start a guest

2. Do TLS migration:
# virsh migrate 1 qemu+ssh://10.66.4.140/system --live --verbose --migrateuri tcp://10.66.4.140  --tls
root@10.66.4.140's password: 
error: Unable to read from monitor: Connection reset by peer

3. The guest crashes on both source and target hosts:
# virsh list
 Id    Name                           State
----------------------------------------------------

qemu log on source host:
2018-05-31 02:48:59.360+0000: initiating migration
2018-05-31 02:49:00.855+0000: shutting down, reason=crashed

qemu log on target host:
2018-05-31T02:48:59.412929Z qemu-kvm: Can't find block
2018-05-31T02:48:59.412970Z qemu-kvm: Illegal RAM offset 5145564d00000000
2018-05-31T02:48:59.413020Z qemu-kvm: error while loading state for instance 0x0 of device 'ram'
2018-05-31T02:48:59.413132Z qemu-kvm: load of migration failed: Invalid argument
2018-05-31 02:48:59.623+0000: shutting down, reason=crashed
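One hedged observation about the target-side log above: the "Illegal RAM offset 5145564d00000000" value looks like stream desynchronization rather than a real offset, because its top four bytes decode to "QEVM", QEMU's migration stream file magic (QEMU_VM_FILE_MAGIC = 0x5145564d). A quick check:

```python
# Decode the top four bytes of the "Illegal RAM offset" reported by the
# target qemu; they spell out the migration stream magic "QEVM",
# suggesting the loader misparsed part of the stream as a RAM offset.
offset = 0x5145564d00000000
top_bytes = offset.to_bytes(8, "big")[:4]
print(top_bytes)  # b'QEVM'
```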

Actual results:
TLS migration fails

Expected results:
TLS migration succeeds

Additional info:
Libvirtd log:
2018-05-31 02:48:59.351+0000: 17728: debug : qemuMonitorJSONCheckError:383 : unable to execute QEMU command {"execute":"object-del","arguments":{"id":"objlibvirt_migrate_tls0"},"id":"libvirt-24"}: {"id":"libvirt-24","error":{"class":"GenericError","desc":"object 'objlibvirt_migrate_tls0' not found"}}

Comment 3 Peter Krempa 2018-05-31 06:07:17 UTC
I've checked, and libvirt sets up the TLS environment correctly.

I remember qemu 2.12 crashing if the TLS environment is not set up correctly.

Please add the backtrace of qemu.

Comment 4 Peter Krempa 2018-05-31 06:09:04 UTC
I suspect it's what I've observed in upstream qemu. It was fixed by:

commit 8b7bf2badac25c0a52aff1b181ad75fdb304dd0c
Author: Dr. David Alan Gilbert <dgilbert>
Date:   Mon Apr 30 19:59:43 2018 +0100

    Migration+TLS: Fix crash due to double cleanup
    
    During a TLS connect we see:
      migration_channel_connect calls
      migration_tls_channel_connect
      (calls after TLS setup)
      migration_channel_connect
    
    My previous error handling fix made migration_channel_connect
    call migrate_fd_connect in all cases; unfortunately the above
    means it gets called twice and crashes doing double cleanup.
    
    Fixes: 688a3dcba98
    
    Reported-by: Peter Krempa <pkrempa>
    Signed-off-by: Dr. David Alan Gilbert <dgilbert>
    Reviewed-by: Daniel P. Berrangé <berrange>
    Message-Id: <20180430185943.35714-1-dgilbert>
    Signed-off-by: Juan Quintela <quintela>
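The double cleanup the commit message describes can be modeled in miniature. This is an illustrative sketch, not qemu code; all names here (`MigrationState`, `channel_connect`, `schedule_bh`) are made up stand-ins:

```python
# Toy model of the crash path: the connect routine ran once directly and
# once more after TLS setup, and the second pass tried to schedule a
# bottom half that the first pass had already consumed -- the analogue of
# qemu_bh_schedule(bh=0x0) in the backtraces.

class MigrationState:
    def __init__(self):
        self.cleanup_bh = object()  # stand-in for the QEMUBH pointer

def schedule_bh(bh):
    if bh is None:  # a NULL pointer here segfaults in real qemu
        raise RuntimeError("qemu_bh_schedule(bh=0x0)")

def channel_connect(state):
    # The earlier error-handling fix made this run unconditionally, so on
    # the TLS path it executed twice against the same state.
    schedule_bh(state.cleanup_bh)
    state.cleanup_bh = None  # first run consumes the bh

s = MigrationState()
channel_connect(s)      # plain connect: fine
try:
    channel_connect(s)  # called again after TLS setup: double cleanup
except RuntimeError as err:
    print("crash:", err)
```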

Comment 5 Fangge Jin 2018-05-31 06:52:01 UTC
Created attachment 1446117 [details]
qemu backtrace of both source and target hosts

Comment 6 Fangge Jin 2018-05-31 07:00:17 UTC
Created attachment 1446129 [details]
qemu backtrace of both source and target hosts

Comment 7 Peter Krempa 2018-05-31 07:05:07 UTC
From the source qemu's coredump (I've truncated the boring stuff):

Core was generated by `/usr/libexec/qemu-kvm -name guest=1,debug-threads=on -S -object secret,id=maste'.
Program terminated with signal 11, Segmentation fault.

#0  qemu_bh_schedule (bh=0x0) at util/async.c:159
159	    ctx = bh->ctx;

[...]

Thread 10 (Thread 0x7fcab188a040 (LWP 14331)):
#0  0x00007fcaab3f748d in __lll_lock_wait () at /lib64/libpthread.so.0
#1  0x00007fcaab3f2d7b in _L_lock_812 () at /lib64/libpthread.so.0
#2  0x00007fcaab3f2c48 in pthread_mutex_lock () at /lib64/libpthread.so.0
#3  0x00005580368ef0c9 in qemu_mutex_lock_impl (mutex=mutex@entry=0x558037135880 <qemu_global_mutex>, file=file@entry=0x558036983468 "/builddir/build/BUILD/qemu-2.12.0/cpus.c", line=line@entry=1765)
    at util/qemu-thread-posix.c:67
#4  0x00005580365dafb8 in qemu_mutex_lock_iothread () at /usr/src/debug/qemu-2.12.0/cpus.c:1765
#5  0x00005580368ec699 in main_loop_wait (timeout=<optimized out>) at util/main-loop.c:261
#6  0x00005580368ec699 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:522
#7  0x0000558036597717 in main () at vl.c:1963
#8  0x0000558036597717 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4768

[...]


Thread 1 (Thread 0x7fca485c3700 (LWP 14371)):
#0  0x00005580368ea5b0 in qemu_bh_schedule (bh=0x0) at util/async.c:159
#1  0x00005580367ba127 in migration_thread (s=0x5580397a0500) at migration/migration.c:2344
#2  0x00005580367ba127 in migration_thread (opaque=0x5580397a0500) at migration/migration.c:2426
#3  0x00007fcaab3f0dd5 in start_thread () at /lib64/libpthread.so.0
#4  0x00007fcaab11a9bd in clone () at /lib64/libc.so.6


Re-assigning to qemu.

Comment 8 Fangge Jin 2018-05-31 07:23:11 UTC
(In reply to Peter Krempa from comment #7)
> From the source qemu's coredump. I've truncated the boring stuff:
> 
> Core was generated by `/usr/libexec/qemu-kvm -name guest=1,debug-threads=on
> -S -object secret,id=maste'.
> Program terminated with signal 11, Segmentation fault.
> 
> #0  qemu_bh_schedule (bh=0x0) at util/async.c:159
...

I've re-attached the coredump; the source qemu's coredump is different, please check.

Comment 9 Dr. David Alan Gilbert 2018-05-31 11:52:44 UTC
Yeah, I think I know what this is; I recently put a fix in upstream:
         Migration+TLS: Fix crash due to double cleanup

Comment 10 Yiqian Wei 2018-06-01 01:02:26 UTC
Reproduced this bug with "qemu-kvm-rhev-2.12.0-2.el7.x86_64" and "kernel-3.10.0-896.el7.x86_64".

Steps:
1. Manually generate the CA, server certificate, and client certificate
2. Boot a guest on the dst host (TLS server end):
 /usr/libexec/qemu-kvm \
    -object tls-creds-x509,id=tls0,endpoint=server,dir=/root/CA \
    -name 'vm-1'  \
    -machine pc  \
    -vga qxl  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20180528-014913-MLEteNaF,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/serial-serial0-20180528-014913-MLEteNaF,server,nowait \
    -device isa-serial,chardev=serial_id_serial0  \
    -chardev socket,id=seabioslog_id_20180528-014913-MLEteNaF,path=/var/tmp/seabios-20180528-014913-MLEteNaF,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20180528-014913-MLEteNaF,iobase=0x402 \
    -device nec-usb-xhci,id=usb0  \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/mnt/win2016-64-virtio.qcow2  \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=0x4 \
    -device virtio-net-pci,mac=9a:5e:5f:60:61:62,id=idS5WU7v,vectors=4,netdev=idH57evf,bus=pci.0,addr=0x5  \
    -netdev tap,id=idH57evf,vhost=on   \
    -m 4G  \
    -smp 2   \
    -cpu Broadwell-IBRS,enforce  \
    -device usb-tablet,id=usb-tablet1,bus=usb0.0,port=1  \
    -rtc base=localtime,clock=host,driftfix=slew  \
    -enable-kvm \
    -spice port=5930,disable-ticketing \
    -qmp tcp:0:6666,server,nowait \
    -device virtio-balloon-pci,id=balloon0 \
    -monitor stdio \
    -incoming defer

3. Boot a guest on the src host (TLS client end):
 /usr/libexec/qemu-kvm \
    -object tls-creds-x509,id=tls0,endpoint=client,dir=/root/CA \
    -name 'vm-1'  \
    -machine pc  \
    -vga qxl  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20180528-014913-MLEteNaF,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/serial-serial0-20180528-014913-MLEteNaF,server,nowait \
    -device isa-serial,chardev=serial_id_serial0  \
    -chardev socket,id=seabioslog_id_20180528-014913-MLEteNaF,path=/var/tmp/seabios-20180528-014913-MLEteNaF,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20180528-014913-MLEteNaF,iobase=0x402 \
    -device nec-usb-xhci,id=usb0  \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/mnt/win2016-64-virtio.qcow2  \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=0x4 \
    -device virtio-net-pci,mac=9a:5e:5f:60:61:62,id=idS5WU7v,vectors=4,netdev=idH57evf,bus=pci.0,addr=0x5  \
    -netdev tap,id=idH57evf,vhost=on   \
    -m 4G  \
    -smp 2   \
    -cpu Broadwell-IBRS,enforce  \
    -device usb-tablet,id=usb-tablet1,bus=usb0.0,port=1  \
    -rtc base=localtime,clock=host,driftfix=slew  \
    -enable-kvm \
    -spice port=5930,disable-ticketing \
    -qmp tcp:0:6666,server,nowait \
    -device virtio-balloon-pci,id=balloon0 \
    -monitor stdio

4. In dst host:
(qemu) migrate_set_parameter tls-creds tls0
(qemu) migrate_incoming tcp:10.73.72.88:5801
In src host:
(qemu) migrate_set_parameter tls-creds tls0
(qemu) migrate -d tcp:10.73.72.88:5801
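The HMP steps in step 4 can also be driven programmatically over the QMP socket that both command lines expose (-qmp tcp:0:6666,server,nowait). A minimal sketch, assuming the host and ports used in this comment and a reachable source-side QMP endpoint; the helper names are mine, not qemu's:

```python
import json
import socket

def qmp_msg(cmd, **args):
    """Serialize one QMP command as a newline-terminated JSON line."""
    msg = {"execute": cmd}
    if args:
        msg["arguments"] = args
    return json.dumps(msg).encode() + b"\n"

def start_tls_migration(host="10.73.72.88", qmp_port=6666, mig_port=5801):
    # Talk to the source qemu's "-qmp tcp:0:6666,server,nowait" socket.
    with socket.create_connection((host, qmp_port)) as s:
        s.recv(4096)                                   # QMP greeting banner
        s.sendall(qmp_msg("qmp_capabilities"))         # leave capabilities mode
        s.recv(4096)
        s.sendall(qmp_msg("migrate-set-parameters", **{"tls-creds": "tls0"}))
        s.recv(4096)
        s.sendall(qmp_msg("migrate", uri=f"tcp:{host}:{mig_port}"))
        return s.recv(4096)                            # {"return": {}} on success
```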

Test results: 
In dst host: migration failed
(qemu) qemu-kvm: Can't find block 
qemu-kvm: Illegal RAM offset 5145564d00000000
qemu-kvm: error while loading state for instance 0x0 of device 'ram'
qemu-kvm: load of migration failed: Invalid argument
In src host: qemu core dumped

Comment 11 Dr. David Alan Gilbert 2018-06-22 19:22:47 UTC
Fixed by one of the patches in 1584139's set

Comment 12 Dr. David Alan Gilbert 2018-08-02 08:25:56 UTC

*** This bug has been marked as a duplicate of bug 1584139 ***