Bug 1584484
| Summary: | qemu crashes on migration with TLS | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Fangge Jin <fjin> |
| Component: | qemu-kvm-rhev | Assignee: | Dr. David Alan Gilbert <dgilbert> |
| Status: | CLOSED DUPLICATE | QA Contact: | Yumei Huang <yuhuang> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | 7.6 | CC: | chayang, dyuan, dzheng, juzhang, knoel, lmen, lvivier, michen, peterx, pkrempa, quintela, qzhang, virt-maint, xianwang, xuzhang, yanqzhan, yiwei, zhguo |
| Target Milestone: | rc | Keywords: | Automation, Regression |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-08-02 08:25:56 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1584139, 1594384 | | |
| Bug Blocks: | | | |
| Attachments: | | | |
I've checked and libvirt sets up TLS environment correctly. I remember qemu 2.12 crashing if the TLS environment is not set up correctly. Please add the backtrace of qemu. I suspect it's what I've observed in upstream qemu. It was fixed by:
commit 8b7bf2badac25c0a52aff1b181ad75fdb304dd0c
Author: Dr. David Alan Gilbert <dgilbert>
Date: Mon Apr 30 19:59:43 2018 +0100
Migration+TLS: Fix crash due to double cleanup
During a TLS connect we see:
migration_channel_connect calls
migration_tls_channel_connect
(calls after TLS setup)
migration_channel_connect
My previous error handling fix made migration_channel_connect
call migrate_fd_connect in all cases; unfortunately the above
means it gets called twice and crashes doing double cleanup.
Fixes: 688a3dcba98
Reported-by: Peter Krempa <pkrempa>
Signed-off-by: Dr. David Alan Gilbert <dgilbert>
Reviewed-by: Daniel P. Berrangé <berrange>
Message-Id: <20180430185943.35714-1-dgilbert>
Signed-off-by: Juan Quintela <quintela>
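The call sequence described in the commit message can be modelled in a few lines of C. This is a simplified sketch and not QEMU code: the `model_*` functions, the `ModelState` struct and the `guarded` flag are all hypothetical names introduced for illustration. The early return shown here is only one plausible way to avoid the second call; it is not claimed to be what the upstream commit actually does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the migration state; not QEMU's real struct. */
typedef struct {
    bool use_tls;       /* TLS was requested for this migration */
    bool tls_done;      /* set once the simulated TLS setup has finished */
    int  connect_calls; /* how many times the final connect step ran */
} ModelState;

static void model_channel_connect(ModelState *s, bool guarded);

/* Final step of the connect path; it is only safe to run once. */
static void model_fd_connect(ModelState *s)
{
    s->connect_calls++;
}

/* Simulated TLS setup: when it completes, it re-enters the connect path. */
static void model_tls_channel_connect(ModelState *s, bool guarded)
{
    s->tls_done = true;
    model_channel_connect(s, guarded);
}

static void model_channel_connect(ModelState *s, bool guarded)
{
    assert(s != NULL);
    if (s->use_tls && !s->tls_done) {
        model_tls_channel_connect(s, guarded);
        if (guarded) {
            /* The nested call already ran the final step; stop here. */
            return;
        }
    }
    /* Without the guard, the outer call reaches this point a second time. */
    model_fd_connect(s);
}

/* Runs one TLS connect and reports how often the final step executed. */
int model_run(bool guarded)
{
    ModelState s = { .use_tls = true };
    model_channel_connect(&s, guarded);
    return s.connect_calls;
}
```

In this model, `model_run(false)` reaches the final step twice, which corresponds to the double call described above, while `model_run(true)` reaches it once.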
Created attachment 1446117 [details]
qemu backtrace of both source and target hosts
Created attachment 1446129 [details]
qemu backtrace of both source and target hosts
From the source qemu's coredump. I've truncated the boring stuff:
Core was generated by `/usr/libexec/qemu-kvm -name guest=1,debug-threads=on -S -object secret,id=maste'.
Program terminated with signal 11, Segmentation fault.
#0 qemu_bh_schedule (bh=0x0) at util/async.c:159
159 ctx = bh->ctx;
[...]
Thread 10 (Thread 0x7fcab188a040 (LWP 14331)):
#0 0x00007fcaab3f748d in __lll_lock_wait () at /lib64/libpthread.so.0
#1 0x00007fcaab3f2d7b in _L_lock_812 () at /lib64/libpthread.so.0
#2 0x00007fcaab3f2c48 in pthread_mutex_lock () at /lib64/libpthread.so.0
#3 0x00005580368ef0c9 in qemu_mutex_lock_impl (mutex=mutex@entry=0x558037135880 <qemu_global_mutex>, file=file@entry=0x558036983468 "/builddir/build/BUILD/qemu-2.12.0/cpus.c", line=line@entry=1765)
at util/qemu-thread-posix.c:67
#4 0x00005580365dafb8 in qemu_mutex_lock_iothread () at /usr/src/debug/qemu-2.12.0/cpus.c:1765
#5 0x00005580368ec699 in main_loop_wait (timeout=<optimized out>) at util/main-loop.c:261
#6 0x00005580368ec699 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:522
#7 0x0000558036597717 in main () at vl.c:1963
#8 0x0000558036597717 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4768
[...]
Thread 1 (Thread 0x7fca485c3700 (LWP 14371)):
#0 0x00005580368ea5b0 in qemu_bh_schedule (bh=0x0) at util/async.c:159
#1 0x00005580367ba127 in migration_thread (s=0x5580397a0500) at migration/migration.c:2344
#2 0x00005580367ba127 in migration_thread (opaque=0x5580397a0500) at migration/migration.c:2426
#3 0x00007fcaab3f0dd5 in start_thread () at /lib64/libpthread.so.0
#4 0x00007fcaab11a9bd in clone () at /lib64/libc.so.6
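Frame #0 above shows `qemu_bh_schedule` being called with `bh=0x0` and faulting on `bh->ctx`. The sketch below reproduces that pattern in isolation, under the assumption (not confirmed by this backtrace alone) that the handle was NULL because a cleanup step had already run. All `Model*` types and `model_*` functions are hypothetical, and the NULL check is shown only to make the failure observable without crashing; it is not QEMU's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical miniature of a bottom half and its owning context. */
typedef struct ModelCtx { int scheduled; } ModelCtx;
typedef struct ModelBH  { ModelCtx *ctx; } ModelBH;

/* Hypothetical migration state holding the cleanup bottom half. */
typedef struct { ModelBH *cleanup_bh; } ModelMigration;

/*
 * Checked variant of a schedule call. An unchecked version would read
 * bh->ctx directly, which is the access that faults when bh is NULL.
 */
static bool model_bh_schedule_checked(ModelBH *bh)
{
    if (bh == NULL) {
        return false;
    }
    bh->ctx->scheduled++;
    return true;
}

/* Cleanup releases the bottom half and clears the pointer. */
static void model_cleanup(ModelMigration *m)
{
    free(m->cleanup_bh);
    m->cleanup_bh = NULL;
}

/* Returns how many schedule attempts succeeded, or -1 on allocation failure. */
int model_schedule_after_cleanup(void)
{
    ModelCtx ctx = { 0 };
    ModelBH *bh = malloc(sizeof *bh);
    if (bh == NULL) {
        return -1;
    }
    bh->ctx = &ctx;

    ModelMigration m = { .cleanup_bh = bh };
    int ok = 0;

    ok += model_bh_schedule_checked(m.cleanup_bh); /* handle is valid */
    model_cleanup(&m);                             /* first cleanup */
    ok += model_bh_schedule_checked(m.cleanup_bh); /* handle is NULL now */
    return ok;
}
```

Only the first schedule attempt succeeds in this model; the second one is the call that would dereference a NULL pointer if it were not checked.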
Re-assigning to qemu.
(In reply to Peter Krempa from comment #7)
> From the source qemu's coredump. I've truncated the boring stuff:
>
> Core was generated by `/usr/libexec/qemu-kvm -name guest=1,debug-threads=on
> -S -object secret,id=maste'.
> Program terminated with signal 11, Segmentation fault.
>
> #0 qemu_bh_schedule (bh=0x0) at util/async.c:159
...
I re-attached the coredump, the source qemu's coredump is different, please check
Yeh I think I know what this is; I recently put a fix in upstream:
Migration+TLS: Fix crash due to double cleanup
Reproduced this bug with "qemu-kvm-rhev-2.12.0-2.el7.x86_64" and "kernel-3.10.0-896.el7.x86_64".
Steps:
1. Manually generate the CA, server certificate, and client certificate.
2. Boot a guest on the dst host (TLS server end):
/usr/libexec/qemu-kvm \
-object tls-creds-x509,id=tls0,endpoint=server,dir=/root/CA \
-name 'vm-1' \
-machine pc \
-vga qxl \
-chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20180528-014913-MLEteNaF,server,nowait \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-chardev socket,id=serial_id_serial0,path=/var/tmp/serial-serial0-20180528-014913-MLEteNaF,server,nowait \
-device isa-serial,chardev=serial_id_serial0 \
-chardev socket,id=seabioslog_id_20180528-014913-MLEteNaF,path=/var/tmp/seabios-20180528-014913-MLEteNaF,server,nowait \
-device isa-debugcon,chardev=seabioslog_id_20180528-014913-MLEteNaF,iobase=0x402 \
-device nec-usb-xhci,id=usb0 \
-drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/mnt/win2016-64-virtio.qcow2 \
-device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=0x4 \
-device virtio-net-pci,mac=9a:5e:5f:60:61:62,id=idS5WU7v,vectors=4,netdev=idH57evf,bus=pci.0,addr=0x5 \
-netdev tap,id=idH57evf,vhost=on \
-m 4G \
-smp 2 \
-cpu Broadwell-IBRS,enforce \
-device usb-tablet,id=usb-tablet1,bus=usb0.0,port=1 \
-rtc base=localtime,clock=host,driftfix=slew \
-enable-kvm \
-spice port=5930,disable-ticketing \
-qmp tcp:0:6666,server,nowait \
-device virtio-balloon-pci,id=balloon0 \
-monitor stdio \
-incoming defer
3. Boot a guest on the src host (TLS client end):
/usr/libexec/qemu-kvm \
-object tls-creds-x509,id=tls0,endpoint=client,dir=/root/CA \
-name 'vm-1' \
-machine pc \
-vga qxl \
-chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20180528-014913-MLEteNaF,server,nowait \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-chardev socket,id=serial_id_serial0,path=/var/tmp/serial-serial0-20180528-014913-MLEteNaF,server,nowait \
-device isa-serial,chardev=serial_id_serial0 \
-chardev socket,id=seabioslog_id_20180528-014913-MLEteNaF,path=/var/tmp/seabios-20180528-014913-MLEteNaF,server,nowait \
-device isa-debugcon,chardev=seabioslog_id_20180528-014913-MLEteNaF,iobase=0x402 \
-device nec-usb-xhci,id=usb0 \
-drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/mnt/win2016-64-virtio.qcow2 \
-device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=0x4 \
-device virtio-net-pci,mac=9a:5e:5f:60:61:62,id=idS5WU7v,vectors=4,netdev=idH57evf,bus=pci.0,addr=0x5 \
-netdev tap,id=idH57evf,vhost=on \
-m 4G \
-smp 2 \
-cpu Broadwell-IBRS,enforce \
-device usb-tablet,id=usb-tablet1,bus=usb0.0,port=1 \
-rtc base=localtime,clock=host,driftfix=slew \
-enable-kvm \
-spice port=5930,disable-ticketing \
-qmp tcp:0:6666,server,nowait \
-device virtio-balloon-pci,id=balloon0 \
-monitor stdio
4. In dst host:
(qemu) migrate_set_parameter tls-creds tls0
(qemu) migrate_incoming tcp:10.73.72.88:5801
In src host:
(qemu) migrate_set_parameter tls-creds tls0
(qemu) migrate -d tcp:10.73.72.88:5801
Test results:
In dst host: migration failed
(qemu) qemu-kvm: Can't find block
qemu-kvm: Illegal RAM offset 5145564d00000000
qemu-kvm: error while loading state for instance 0x0 of device 'ram'
qemu-kvm: load of migration failed: Invalid argument
In src host: qemu core dumped
Fixed by one of the patches in 1584139's set
*** This bug has been marked as a duplicate of bug 1584139 ***
Created attachment 1446056 [details]
libvirtd and qemu log on both source and target hosts
Description of problem:
Tls migration fails
Version-Release number of selected component (if applicable):
libvirt-4.3.0-1.el7.x86_64
qemu-kvm-rhev-2.12.0-2.el7.x86_64
How reproducible:
100%
Steps to Reproduce:
1. Start a guest
2. Do tls migration:
# virsh migrate 1 qemu+ssh://10.66.4.140/system --live --verbose --migrateuri tcp://10.66.4.140 --tls
root.4.140's password:
error: Unable to read from monitor: Connection reset by peer
3. Guest is crashed on source and target host:
# virsh list
Id Name State
----------------------------------------------------
qemu log on source host:
2018-05-31 02:48:59.360+0000: initiating migration
2018-05-31 02:49:00.855+0000: shutting down, reason=crashed
qemu log on target host:
2018-05-31T02:48:59.412929Z qemu-kvm: Can't find block
2018-05-31T02:48:59.412970Z qemu-kvm: Illegal RAM offset 5145564d00000000
2018-05-31T02:48:59.413020Z qemu-kvm: error while loading state for instance 0x0 of device 'ram'
2018-05-31T02:48:59.413132Z qemu-kvm: load of migration failed: Invalid argument
2018-05-31 02:48:59.623+0000: shutting down, reason=crashed
Actual results:
Tls migration fails
Expected results:
Tls migration succeeds
Additional info:
Libvirtd log:
2018-05-31 02:48:59.351+0000: 17728: debug : qemuMonitorJSONCheckError:383 : unable to execute QEMU command {"execute":"object-del","arguments":{"id":"objlibvirt_migrate_tls0"},"id":"libvirt-24"}: {"id":"libvirt-24","error":{"class":"GenericError","desc":"object 'objlibvirt_migrate_tls0' not found"}}