Bug 1955666

Summary: qemu-kvm NULL pointer de-reference during migration at migrate_fd_connect ->...-> notifier_list_notify [rhel-8.4.0.z]
Product: Red Hat Enterprise Linux Advanced Virtualization Reporter: RHEL Program Management Team <pgm-rhel-tools>
Component: qemu-kvm Assignee: Laurent Vivier <lvivier>
qemu-kvm sub component: General QA Contact: Yanghang Liu <yanghliu>
Status: CLOSED ERRATA Docs Contact:
Severity: urgent    
Priority: unspecified CC: aadam, ailan, amusil, dgilbert, jasowang, jinzhao, juzhang, laine, lvivier, mburman, mperina, mtessun, virt-maint, yanghliu, ymankad
Version: 8.4 Keywords: Triaged, ZStream
Target Milestone: rc   
Target Release: 8.4   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: qemu-kvm-5.2.0-16.module+el8.4.0+11358+3b8f35f7.1 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1953045 Environment:
Last Closed: 2021-07-06 13:21:44 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1953045    
Bug Blocks: 1964261    

Comment 3 Yanghang Liu 2021-05-14 09:46:33 UTC
I can reproduce this bug with qemu-kvm-5.2.0-16.module+el8.4.0:

Test steps:

(1) start a VM with a failover virtio-net device:

/usr/libexec/qemu-kvm -enable-kvm -m 1g -M q35 \
-device pcie-root-port,slot=4,id=root1 -device pcie-root-port,slot=5,id=root2 \
-device virtio-net-pci,id=net1,mac=52:54:00:6f:55:cc,failover=on,bus=root1 \
-device e1000e,id=net2,mac=52:54:00:6f:55:cc,bus=root2,addr=0x0,failover_pair_id=net1 \
-monitor stdio \
-vnc :0 \
/home/images/RHEL84.qcow2

 
(2) hot-unplug the failover virtio nic

(qemu) device_del net1

(3) do the offline migration

(qemu) migrate "exec:gzip -c > STATEFILE.gz"

(4) check the test result

line 8: 12628 Segmentation fault      (core dumped) /usr/libexec/qemu-kvm -enable-kvm -m 1g -M q35 -device pcie-root-port,slot=4,id=root1 -device pcie-root-port,slot=5,id=root2 -device virtio-net-pci,id=net1,mac=52:54:00:6f:55:cc,failover=on,bus=root1 -device e1000e,id=net2,mac=52:54:00:6f:55:cc,bus=root2,addr=0x0,failover_pair_id=net1 -monitor stdio -vnc :0 /home/images/RHEL84.qcow2


# dmesg
[23911.747222] qemu-kvm[12628]: segfault at 0 ip 0000000000000000 sp 00007fff1762dad8 error 14 in qemu-kvm[5556aaa28000+b13000]
[23911.758442] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.


(gdb) bt
#0  0x0000000000000000 in  ()
#1  0x00005556ab0fbd34 in notifier_list_notify ()
#2  0x00005556aae45552 in migrate_fd_connect ()
#3  0x00005556aadfa6aa in migration_channel_connect ()
#4  0x00005556aae512f8 in exec_start_outgoing_migration ()
#5  0x00005556aae43c99 in qmp_migrate ()
#6  0x00005556aae361b0 in hmp_migrate ()
#7  0x00005556aae0994a in handle_hmp_command ()
#8  0x00005556aae09b70 in monitor_command_cb ()
#9  0x00005556ab108235 in readline_handle_byte ()
#10 0x00005556aae09bc3 in monitor_read ()
#11 0x00005556aafd6e0d in fd_chr_read ()
#12 0x00007f80c5eef8ad in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#13 0x00005556ab10cab0 in main_loop_wait ()
#14 0x00005556aaf6feb1 in qemu_main_loop ()
#15 0x00005556aad4da02 in main ()

Comment 6 Yanan Fu 2021-06-10 05:45:03 UTC
Set Verified:Tested,SanityOnly as the gating/tier1 tests pass.

Comment 9 Yanghang Liu 2021-06-15 04:06:47 UTC
(In reply to Yanghang Liu from comment #3)


> Test step:
> 
> (1) start a vm with a failover virtio net device:
> 
> /usr/libexec/qemu-kvm -enable-kvm -m 1g -M q35 \
> -device pcie-root-port,slot=4,id=root1 -device pcie-root-port,slot=5,id=root2 \
> -device virtio-net-pci,id=net1,mac=52:54:00:6f:55:cc,failover=on,bus=root1 \
> -device e1000e,id=net2,mac=52:54:00:6f:55:cc,bus=root2,addr=0x0,failover_pair_id=net1 \
> -monitor stdio \
> -vnc :0 \
> /home/images/RHEL85.qcow2 \
> 
>  
> (2) hot-unplug the failover virtio nic
> 
> (qemu) device_del net1
> 
> (3) do the offline migration
> 
> (qemu) migrate "exec:gzip -c > STATEFILE.gz"
> 
> (4) check the test result


Tested with qemu-kvm-5.2.0-16.module+el8.4.0+11358+3b8f35f7.1:

This problem has been fixed.

The VM *does not crash* after hot-unplugging the failover virtio-net device and then doing an offline migration.

Comment 10 Yanghang Liu 2021-06-15 04:08:32 UTC
According to comment 6 and comment 9, moving the bug status to VERIFIED.

Comment 12 errata-xmlrpc 2021-07-06 13:21:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (virt:av bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2656