Bug 1786413 - Assertion `dev->state == USB_STATE_DEFAULT' failed when setting spice usb redir during migration
Summary: Assertion `dev->state == USB_STATE_DEFAULT' failed when setting spice usb red...
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.1
Hardware: Unspecified
OS: Unspecified
low
unspecified
Target Milestone: rc
: 8.0
Assignee: Dr. David Alan Gilbert
QA Contact: yduan
URL:
Whiteboard:
Depends On:
Blocks: 1786414 1790844
TreeView+ depends on / blocked
 
Reported: 2019-12-25 03:58 UTC by Han Han
Modified: 2021-10-26 11:18 UTC (History)
14 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1786414 1790844 (view as bug list)
Environment:
Last Closed: 2021-06-25 07:30:42 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Scripts and logs (287.55 KB, application/gzip)
2019-12-25 03:58 UTC, Han Han
no flags Details
Update libvirtd log of host2 (577.34 KB, application/gzip)
2019-12-25 04:07 UTC, Han Han
no flags Details
Segment fault all backtrace (20.12 KB, text/plain)
2019-12-25 04:25 UTC, Han Han
no flags Details
Fix a typo in run.sh (578.75 KB, application/gzip)
2019-12-25 06:37 UTC, Han Han
no flags Details
possible fix (1.51 KB, patch)
2020-01-03 11:52 UTC, Gerd Hoffmann
no flags Details | Diff
Full threads backtrace for segment fault of qemu-kvm-4.2.0-4.el8.bz1786413a.x86_64 (20.47 KB, text/plain)
2020-01-08 02:48 UTC, Han Han
no flags Details
l1 VM xml and qemu cmdline (2.56 KB, application/gzip)
2020-01-08 03:00 UTC, Han Han
no flags Details

Description Han Han 2019-12-25 03:58:51 UTC
Created attachment 1647579 [details]
Scripts and logs

Description of problem:
As subject

Version-Release number of selected component (if applicable):
Desktop:virt-viewer-7.0-9.el8.x86_64 spice-glib-0.37-1.el8.x86_64
Hosts: qemu-kvm-4.1.0-20.module+el8.1.1+5309+6d656f05.x86_64
libvirt-5.6.0-10.module+el8.1.1+5309+6d656f05.x86_64

How reproducible:
50%

Steps to Reproduce:
1. Prepare a vm on an virt host. Prepare migration for two hosts.
Replace the SRC and DST host for migration in migrate.sh. Replace SPICE_SERVER with your spice console url of your vm. Prepare an USB device on your desktop.
Prepare to gather coredump and log:
# cat /etc/libvirt/qemu.conf
gluster_debug_level = 0
max_core = "unlimited"

# tail /etc/libvirt/libvirtd.conf
log_level = 3
log_filters = "1:qemu 1:libvirt"
log_outputs = "1:file:/var/log/libvirt/libvirtd.log"

Then restart libvirtd.


2. Run ./migrate.sh to start migration loop. Run run.sh to start remote-viewer connnects

3. When remote-viewer has connected the vm, click File --> USB device selection --> select your desktop usb device to enable usbredir. 
Continue above in step3 until the vm crashes.

Guest log of crash:
qemu-kvm: hw/usb/core.c:408: usb_handle_packet: Assertion `dev->state == USB_STATE_DEFAULT' failed.
2019-12-25 03:33:00.117+0000: shutting down, reason=crashed


Core backtrace:
#0  0x00007f57712ce8df in raise () from /lib64/libc.so.6
#1  0x00007f57712b8cf5 in abort () from /lib64/libc.so.6
#2  0x00007f57712b8bc9 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
#3  0x00007f57712c6e96 in __assert_fail () from /lib64/libc.so.6
#4  0x000055a9ba0faf5d in usb_handle_packet (dev=0x55a9bd298ac0, p=p@entry=0x7f5738018948) at hw/usb/core.c:408                                                            
#5  0x000055a9ba10939c in xhci_submit (epctx=<optimized out>, xfer=<optimized out>, xhci=<optimized out>) at hw/usb/hcd-xhci.c:1826                                        
#6  xhci_fire_transfer (epctx=<optimized out>, xfer=<optimized out>, xhci=<optimized out>) at hw/usb/hcd-xhci.c:1835                                                       
#7  xhci_kick_epctx (epctx=0x55a9bcb5ad40, streamid=0) at hw/usb/hcd-xhci.c:1986
#8  0x000055a9b9f4ab13 in memory_region_write_accessor (mr=<optimized out>, addr=<optimized out>, value=<optimized out>, size=<optimized out>, shift=<optimized out>,      
    mask=<optimized out>, attrs=...) at /usr/src/debug/qemu-kvm-4.1.0-20.module+el8.1.1+5309+6d656f05.x86_64/memory.c:507                                                  
#9  0x000055a9b9f48d26 in access_with_adjusted_size (addr=addr@entry=4, value=value@entry=0x7f574e12f558, size=size@entry=4, access_size_min=<optimized out>,              
    access_size_max=<optimized out>, access_fn=access_fn@entry=0x55a9b9f4aac0 <memory_region_write_accessor>, mr=0x7f574d338d70, attrs=...)                                
    at /usr/src/debug/qemu-kvm-4.1.0-20.module+el8.1.1+5309+6d656f05.x86_64/memory.c:573                                                                                   
#10 0x000055a9b9f4cd00 in memory_region_dispatch_write (mr=0x7f574d338d70, addr=4, data=<optimized out>, size=4, attrs=...)                                                
    at /usr/src/debug/qemu-kvm-4.1.0-20.module+el8.1.1+5309+6d656f05.x86_64/memory.c:1509                                                                                  
#11 0x000055a9b9ef9ca3 in flatview_write_continue (fv=0x55a9bd28f6e0, addr=4238352388, attrs=...,                                                                          
    buf=0x7f5776d09028 <error: Cannot access memory at address 0x7f5776d09028>, len=4, addr1=<optimized out>, l=<optimized out>, mr=0x7f574d338d70)                        
    at /usr/src/debug/qemu-kvm-4.1.0-20.module+el8.1.1+5309+6d656f05.x86_64/exec.c:3369                                                                                    
#12 0x000055a9b9ef9ec6 in flatview_write (fv=0x55a9bd28f6e0, addr=4238352388, attrs=..., buf=0x7f5776d09028 <error: Cannot access memory at address 0x7f5776d09028>, len=4)
    at /usr/src/debug/qemu-kvm-4.1.0-20.module+el8.1.1+5309+6d656f05.x86_64/exec.c:3408                                                                                    
#13 0x000055a9b9efe25a in address_space_write (as=<optimized out>, addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>)                              
    at /usr/src/debug/qemu-kvm-4.1.0-20.module+el8.1.1+5309+6d656f05.x86_64/exec.c:3498                                                                                    
#14 0x000055a9b9f5b9ba in kvm_cpu_exec (cpu=<optimized out>) at /usr/src/debug/qemu-kvm-4.1.0-20.module+el8.1.1+5309+6d656f05.x86_64/accel/kvm/kvm-all.c:2316              
#15 0x000055a9b9f409fe in qemu_kvm_cpu_thread_fn (arg=0x55a9bbeb55e0) at /usr/src/debug/qemu-kvm-4.1.0-20.module+el8.1.1+5309+6d656f05.x86_64/cpus.c:1285                  
#16 0x000055a9ba262954 in qemu_thread_start (args=0x55a9bbed9030) at util/qemu-thread-posix.c:502                                                                          
#17 0x00007f57716622de in start_thread () from /lib64/libpthread.so.0
#18 0x00007f5771393133 in clone () from /lib64/libc.so.6


Actual results:
As subject

Expected results:
NO crash

Additional info:
Logs in attachment:
host1-qemu.log: qemu guest log of host1 
host2-qemu.log: qemu guest log of host2
viewer.log: remote-viewer debug log
gdb.txt: full threads backtrace
host1-libvirtd.log: libvirtd log of host1
host2-libvirtd.log: libvirtd log of host2

Comment 1 Han Han 2019-12-25 04:07:21 UTC
Created attachment 1647580 [details]
Update libvirtd log of host2

Comment 2 Han Han 2019-12-25 04:25:44 UTC
Created attachment 1647581 [details]
Segment fault all backtrace

Another qemu crash, due to null pointer reference, with the same case:
(gdb) bt
#0  0x00007fd8b2dff3ed in usbredirparser_do_write (parser_pub=0x0) at usbredirparser.c:1060
#1  0x0000558954e2865b in xhci_ep_nuke_xfers (xhci=xhci@entry=0x7fd886138010, slotid=slotid@entry=1, epid=<optimized out>, report=report@entry=CC_STOPPED)
    at hw/usb/hcd-xhci.c:1256
#2  0x0000558954e2b6bc in xhci_stop_ep (epid=<optimized out>, slotid=1, xhci=0x7fd886138010) at hw/usb/hcd-xhci.c:1319
#3  xhci_process_commands (xhci=0x7fd886138010) at hw/usb/hcd-xhci.c:2514
#4  0x0000558954c6bb13 in memory_region_write_accessor (mr=<optimized out>, addr=<optimized out>, value=<optimized out>, size=<optimized out>, shift=<optimized out>, 
    mask=<optimized out>, attrs=...) at /usr/src/debug/qemu-kvm-4.1.0-20.module+el8.1.1+5309+6d656f05.x86_64/memory.c:507
#5  0x0000558954c69d26 in access_with_adjusted_size (addr=addr@entry=0, value=value@entry=0x7fd886ffc558, size=size@entry=4, access_size_min=<optimized out>, 
    access_size_max=<optimized out>, access_fn=access_fn@entry=0x558954c6bac0 <memory_region_write_accessor>, mr=0x7fd886138d70, attrs=...)
    at /usr/src/debug/qemu-kvm-4.1.0-20.module+el8.1.1+5309+6d656f05.x86_64/memory.c:573
#6  0x0000558954c6dd00 in memory_region_dispatch_write (mr=0x7fd886138d70, addr=0, data=<optimized out>, size=4, attrs=...)
    at /usr/src/debug/qemu-kvm-4.1.0-20.module+el8.1.1+5309+6d656f05.x86_64/memory.c:1509
#7  0x0000558954c1aca3 in flatview_write_continue (fv=0x558956fe3b20, addr=4238352384, attrs=..., 
    buf=0x7fd8b3c0e028 <error: Cannot access memory at address 0x7fd8b3c0e028>, len=4, addr1=<optimized out>, l=<optimized out>, mr=0x7fd886138d70)
    at /usr/src/debug/qemu-kvm-4.1.0-20.module+el8.1.1+5309+6d656f05.x86_64/exec.c:3369
#8  0x0000558954c1aec6 in flatview_write (fv=0x558956fe3b20, addr=4238352384, attrs=..., buf=0x7fd8b3c0e028 <error: Cannot access memory at address 0x7fd8b3c0e028>, len=4)
    at /usr/src/debug/qemu-kvm-4.1.0-20.module+el8.1.1+5309+6d656f05.x86_64/exec.c:3408
#9  0x0000558954c1f25a in address_space_write (as=<optimized out>, addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>)
    at /usr/src/debug/qemu-kvm-4.1.0-20.module+el8.1.1+5309+6d656f05.x86_64/exec.c:3498
#10 0x0000558954c7c9ba in kvm_cpu_exec (cpu=<optimized out>) at /usr/src/debug/qemu-kvm-4.1.0-20.module+el8.1.1+5309+6d656f05.x86_64/accel/kvm/kvm-all.c:2316
#11 0x0000558954c619fe in qemu_kvm_cpu_thread_fn (arg=0x5589560167a0) at /usr/src/debug/qemu-kvm-4.1.0-20.module+el8.1.1+5309+6d656f05.x86_64/cpus.c:1285
#12 0x0000558954f83954 in qemu_thread_start (args=0x55895603a470) at util/qemu-thread-posix.c:502
#13 0x00007fd8ae5672de in start_thread () from /lib64/libpthread.so.0
#14 0x00007fd8ae298133 in clone () from /lib64/libc.so.6

Comment 3 Han Han 2019-12-25 06:37:09 UTC
Created attachment 1647585 [details]
Fix a typo in run.sh

Comment 6 Gerd Hoffmann 2020-01-03 11:52:58 UTC
Created attachment 1649384 [details]
possible fix

Short into the dark.

xhci checks for surprise removal when entering xhci_kick_epctx(), but doesn't re-check later on, so when processing a ring with multiple transfers queued up this might happen while qemu loops over the transfers.

Second strack trace might be a use-after-free caused by the missing check.  Might also be some unrelated bug in usbredir.

In any case there seems to be another bug in virt-viewer which causes usb device handover fail.  That should not take down qemu though.

Comment 7 Dr. David Alan Gilbert 2020-01-03 18:31:20 UTC
Hi Han,
  I've just created a brew build with:
    a) Gerd's test fix from comment 6
    b) My fix for bz 1752320
    c) My fix for bz 1772774

http://brewweb.devel.redhat.com/brew/taskinfo?taskID=25597958

please let us know if it helps.

Comment 8 Han Han 2020-01-07 01:02:26 UTC
(In reply to Dr. David Alan Gilbert from comment #7)
> Hi Han,
>   I've just created a brew build with:
>     a) Gerd's test fix from comment 6
>     b) My fix for bz 1752320
>     c) My fix for bz 1772774
> 
> http://brewweb.devel.redhat.com/brew/taskinfo?taskID=25597958
> 
> please let us know if it helps.

I have run this scratch build for a half-day. BZ1752320 was not reproduced.
Unlikely it triggered BZ1787536 and exited the test.

I will remove the NIC bindwidth limit and run that again to see if any bug
will happen again.

Comment 9 Han Han 2020-01-08 02:48:54 UTC
Created attachment 1650568 [details]
Full threads backtrace for segment fault of qemu-kvm-4.2.0-4.el8.bz1786413a.x86_64

I followed the steps of bug report, then got a segment fault:
#0  0x00007fb6d9f703ed in usbredirparser_do_write (parser_pub=0x0) at usbredirparser.c:1060                                                                                
#1  0x000055956c4021eb in xhci_ep_nuke_xfers (xhci=xhci@entry=0x7fb6cc138010, slotid=slotid@entry=1, epid=<optimized out>, report=report@entry=CC_STOPPED)                 
    at hw/usb/hcd-xhci.c:1257
#2  0x000055956c40531c in xhci_stop_ep (epid=<optimized out>, slotid=1, xhci=0x7fb6cc138010) at hw/usb/hcd-xhci.c:1320                                                     
#3  xhci_process_commands (xhci=0x7fb6cc138010) at hw/usb/hcd-xhci.c:2526
#4  0x000055956c242087 in memory_region_write_accessor (mr=<optimized out>, addr=<optimized out>, value=<optimized out>, size=<optimized out>, shift=<optimized out>,      
    mask=<optimized out>, attrs=...) at /usr/src/debug/qemu-kvm-4.2.0-4.el8.bz1786413a.x86_64/memory.c:483                                                                 
#5  0x000055956c2402be in access_with_adjusted_size (addr=addr@entry=0, value=value@entry=0x7fb6c6ffc508, size=size@entry=4, access_size_min=<optimized out>,              
    access_size_max=<optimized out>, access_fn=0x55956c242010 <memory_region_write_accessor>, mr=0x7fb6cc138d70, attrs=...)                                                
    at /usr/src/debug/qemu-kvm-4.2.0-4.el8.bz1786413a.x86_64/memory.c:544
#6  0x000055956c24419c in memory_region_dispatch_write (mr=0x7fb6cc138d70, addr=0, data=<optimized out>, op=<optimized out>, attrs=...)                                    
    at /usr/src/debug/qemu-kvm-4.2.0-4.el8.bz1786413a.x86_64/memory.c:1475
#7  0x000055956c1f1227 in flatview_write_continue (fv=0x55956f68b1b0, addr=4238352384, attrs=...,                                                                          
    buf=0x7fb6dad7f028 <error: Cannot access memory at address 0x7fb6dad7f028>, len=4, addr1=<optimized out>, l=<optimized out>, mr=0x7fb6cc138d70)                        
    at /usr/src/debug/qemu-kvm-4.2.0-4.el8.bz1786413a.x86_64/include/qemu/host-utils.h:164                                                                                 
#8  0x000055956c1f1446 in flatview_write (fv=0x55956f68b1b0, addr=4238352384, attrs=..., buf=0x7fb6dad7f028 <error: Cannot access memory at address 0x7fb6dad7f028>, len=4)
    at /usr/src/debug/qemu-kvm-4.2.0-4.el8.bz1786413a.x86_64/exec.c:3169
#9  0x000055956c1f595f in address_space_write () at /usr/src/debug/qemu-kvm-4.2.0-4.el8.bz1786413a.x86_64/exec.c:3259                                                      
#10 0x000055956c25307a in kvm_cpu_exec (cpu=<optimized out>) at /usr/src/debug/qemu-kvm-4.2.0-4.el8.bz1786413a.x86_64/accel/kvm/kvm-all.c:2386                             
#11 0x000055956c237eee in qemu_kvm_cpu_thread_fn (arg=0x55956e5e2100) at /usr/src/debug/qemu-kvm-4.2.0-4.el8.bz1786413a.x86_64/cpus.c:1318                                 
#12 0x000055956c5612a4 in qemu_thread_start (args=0x55956e60a200) at util/qemu-thread-posix.c:519                                                                          
#13 0x00007fb6d5ccd2de in start_thread (arg=<optimized out>) at pthread_create.c:486
#14 0x00007fb6d59fee83 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Comment 10 Han Han 2020-01-08 02:52:55 UTC
All the tests above were done on layer 1 nested machines, since I don't have enough machine for migration.
I am not sure if that will affect test results.
So you can test them on physical machines for rechecking.

Comment 11 Han Han 2020-01-08 03:00:01 UTC
Created attachment 1650569 [details]
l1 VM xml and qemu cmdline

The ENVs of nested virt:
Physical machine:
  cpu:
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   39 bits physical, 48 bits virtual
CPU(s):                          8
On-line CPU(s) list:             0-7
Thread(s) per core:              2
Core(s) per socket:              4
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           94
Model name:                      Intel(R) Xeon(R) CPU E3-1260L v5 @ 2.90GHz
Stepping:                        3
CPU MHz:                         2161.029
CPU max MHz:                     3900.0000
CPU min MHz:                     800.0000
BogoMIPS:                        5799.77
Virtualization:                  VT-x
L1d cache:                       128 KiB
L1i cache:                       128 KiB
L2 cache:                        1 MiB
L3 cache:                        8 MiB
NUMA node0 CPU(s):               0-7
Vulnerability Itlb multihit:     KVM: Mitigation: Split huge pages
Vulnerability L1tf:              Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds:               Vulnerable: Clear CPU buffers attempted, no microcode; SMT vulnerable
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full generic retpoline, STIBP disabled, RSB filling
Vulnerability Tsx async abort:   Vulnerable: Clear CPU buffers attempted, no microcode; SMT vulnerable
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb 
                                 rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl v
                                 mx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpu
                                 id_fault epb invpcid_single pti tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm m
                                 px rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp

  software: kernel-5.3.16-300.fc31.x86_64 libvirt-5.6.0-5.fc31.x86_64 qemu-4.1.1-1.fc31.x86_64


For the vm XML or qemu cmdline of l1 machine. See the attachment

Comment 12 Dr. David Alan Gilbert 2020-01-08 17:30:14 UTC
comment 9 looks the same as comment 2 to me.
The :   usbredirparser_do_write (parser_pub=0x0)
looks bad to me;  I guess the parser went away.

I don't quite see the callpath - I'm expecting to see something else between xhci_ep_nuke_xfers and usbredirparser_do_write
given the line numbers perhaps usb_device_ep_stopped at the end of nuke_xfer's calling usbredir_ep_stopped.

Han: I don't think this is related to your use of nesting; this looks like a non-nest problem.

Gerd: How is this supposed to do it - none of the stuff in hw/usb/redirect checks dev->parser - so it seems to assume it's still alive;
who is supposed to check, and when?

Comment 13 Gerd Hoffmann 2020-01-14 07:33:32 UTC
> I don't quite see the callpath - I'm expecting to see something else between
> xhci_ep_nuke_xfers and usbredirparser_do_write
> given the line numbers perhaps usb_device_ep_stopped at the end of
> nuke_xfer's calling usbredir_ep_stopped.

Yep, ep_stopped most likely.  I'm wondering too why this isn't in the stack trace,
especially as usbredir_ep_stopped is called via function pointer so gcc can hardly
inline it ...

> Han: I don't think this is related to your use of nesting; this looks like a
> non-nest problem.
> 
> Gerd: How is this supposed to do it - none of the stuff in hw/usb/redirect
> checks dev->parser - so it seems to assume it's still alive;
> who is supposed to check, and when?

I think the flow is (1) virt-viewer is killed, (2) chardev is unconnected,
(3) usbredir_chardev_close_bh() will detach the usb device and free parser.

The expectation seems to be that there wouldn't be any callbacks after
calling usb_device_detach().  Which probably is actually the case for
most of them, xhci/ehci/uhci should not allow the guest to start new
usb transfers after detach().

So I guess we should add checks.  For ep_stopped "if (!parser) return"
should be fine.  Maybe for free_streams too, like ep_stopped this might
be called in after detach() in reponse to the guest driver cleaning up.

For the other callbacks an "assert(parser)" should do the trick.

Comment 14 Xueqiang Wei 2020-01-16 12:25:11 UTC
(In reply to Han Han from comment #10)
> All the tests above were done on layer 1 nested machines, since I don't have
> enough machine for migration.
> I am not sure if that will affect test results.
> So you can test them on physical machines for rechecking.


Try to reproduce this bug on two physical machines with the same version mentioned in bug 1786414 (rhel820-av).

I have run the steps for a half-day, but not hit it.

Just hit Bug 1752320 - vm gets stuck when migrate vm back and forth with remote-viewer trying to connect
(https://bugzilla.redhat.com/show_bug.cgi?id=1752320#c2.)


Details:
kernel-4.18.0-167.el8.x86_64
qemu-kvm-4.2.0-4.module+el8.2.0+5220+e82621dc
libvirt-5.10.0-1.module+el8.2.0+5135+ed3b2489.x86_64
virt-viewer-7.0-9.el8.x86_64
spice-glib-0.37-1.el8.x86_64
edk2-ovmf-20190829git37eef91017ad-4.el8.noarch

Guest:
kernel-4.18.0-160.el8.x86_64


1. Prepare a vm on src host. Guest image shared by nfs (/home/nfs/rhel820-64-virtio-scsi.qcow2.ovmf)
# virsh define rhel8.2-migration.xml 
# virsh start rhel820src

xml file: http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/bug1786413/rhel8.2-migration.xml


2.  Prepare migration for two hosts.
Replace the SRC and DST host for migration in migrate.sh. Replace SPICE_SERVER with your spice console url of your vm. Prepare an USB device on your desktop.
Prepare to gather coredump and log:

# cat /etc/libvirt/qemu.conf
gluster_debug_level = 0
max_core = "unlimited"

# tail /etc/libvirt/libvirtd.conf
log_level = 3
log_filters = "1:qemu 1:libvirt"
log_outputs = "1:file:/var/log/libvirt/libvirtd.log"

Then restart libvirtd.

migrate.sh: http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/bug1786413/migrate.sh
run.sh: http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/bug1786413/run.sh


3. Run ./migrate.sh to start migration loop. Run run.sh to start remote-viewer connnects

4. When remote-viewer has connected the vm, click File --> USB device selection --> select your desktop usb device to enable usbredir. 
   
   I select keyboard or usb disk randomly.


5. Continue above in step 4 until the vm crashes.


Actual results:
Hit Bug 1752320 - vm gets stuck when migrate vm back and forth with remote-viewer trying to connect

# sh migrate.sh 
Migration: [100 %]
Now vm is on hp-dl385pg8-04.lab.eng.pek2.redhat.com
Migration: [100 %]
ow vm is on hp-dl385pg8-03.lab.eng.pek2.redhat.com
Migration: [100 %]
Now vm is on hp-dl385pg8-04.lab.eng.pek2.redhat.com
Migration: [100 %]
Now vm is on hp-dl385pg8-03.lab.eng.pek2.redhat.com
Migration: [100 %]
Now vm is on hp-dl385pg8-04.lab.eng.pek2.redhat.com
Migration: [100 %]
Now vm is on hp-dl385pg8-03.lab.eng.pek2.redhat.com
Migration: [100 %]
Now vm is on hp-dl385pg8-04.lab.eng.pek2.redhat.com
Migration: [100 %]
Now vm is on hp-dl385pg8-03.lab.eng.pek2.redhat.com

vm is running, but it gets stuck after many times migration.
# virsh list 
 Id   Name         State
----------------------------
 88   rhel820src   running

logs:
2020-01-16T08:49:02.966414Z qemu-kvm: warning: Spice: Connection reset by peer
2020-01-16T08:49:02.967464Z qemu-kvm: warning: Spice: Connection reset by peer
2020-01-16T08:49:02.967609Z qemu-kvm: warning: Spice: Connection reset by peer
2020-01-16T08:49:07.225026Z qemu-kvm: warning: chardev_can_read called on non open chardev!

2020-01-16T08:49:07.333308Z qemu-kvm: warning: chardev_can_read called on non open chardev!

2020-01-16T08:49:11.275576Z qemu-kvm: warning: chardev_can_read called on non open chardev!

2020-01-16T08:49:11.334193Z qemu-kvm: warning: chardev_can_read called on non open chardev!

warning: chardev_can_read called on non open chardev!

warning: chardev_can_read called on non open chardev!

2020-01-16T08:49:15.685457Z qemu-kvm: warning: chardev_can_read called on non open chardev!


log files:
http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/bug1786413/host1_libvirtd.log
http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/bug1786413/host1_qemu_rhel820src.log
http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/bug1786413/host2_libvirtd.log
http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/bug1786413/host2_qemu_rhel820src.log
http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/bug1786413/viewer.log




Hi hhan and David,

I want to confirm with you if i missed some special steps or configurations?
I can not reproduce this bug and Bug 1790844 - USB related fixes.
Please help check them, many thanks.

Comment 15 Han Han 2020-01-16 12:55:15 UTC
The coredump stack shows SIGABRT happended on xhci_submit. So I think the reason why you didn't reproduce the bug is because no xhci usb controller in your guest xml.
I am sorry that I forgot to provide the guest xml in this bug. But as I remember I didn't update guest xml when digging these usb bugs. So you can refer to the guest xml in https://bugzilla.redhat.com/attachment.cgi?id=1615431 .
Surely, there are qemu-xhci in my guest xml:
    <controller type='usb' index='1' model='qemu-xhci'>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </controller>

Comment 16 Han Han 2020-01-16 13:02:34 UTC
And make sure spice vmc device is attached to qemu-xhci:
    <controller type='usb' index='1' model='qemu-xhci'>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </controller>
...
    <redirdev bus='usb' type='spicevmc'>
      <address type='usb' bus='1' port='2'/>
    </redirdev>

Since it is likely a usb controller related issue, you can also research if that problem could be happened on other usb controllers like nec-xhci, *-ehci, *-uhci...

Comment 17 Xueqiang Wei 2020-01-17 10:34:58 UTC
(In reply to Han Han from comment #15)
> The coredump stack shows SIGABRT happended on xhci_submit. So I think the
> reason why you didn't reproduce the bug is because no xhci usb controller in
> your guest xml.
> I am sorry that I forgot to provide the guest xml in this bug. But as I
> remember I didn't update guest xml when digging these usb bugs. So you can
> refer to the guest xml in
> https://bugzilla.redhat.com/attachment.cgi?id=1615431 .
> Surely, there are qemu-xhci in my guest xml:
>     <controller type='usb' index='1' model='qemu-xhci'>
>       <address type='pci' domain='0x0000' bus='0x02' slot='0x00'
> function='0x0'/>
>     </controller>


Han Han,

Thank you very much. I reproduced it with qemu-xhci controller.

updated file:
http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/bug1786413/gdb.txt
http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/bug1786413/host1_qemu_rhel820src.log
http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/bug1786413/rhel8.2-migration.xml


But it still hard to reproduce. I run it for two hours, just hit once.


Hit "Bug 1752320 - vm gets stuck when migrate vm back and forth with remote-viewer trying to connect" many times.

With nec-xhci controller, also hit "Hit Bug 1752320 - vm gets stuck when migrate vm back and forth with remote-viewer trying to connect
" many times.



Hi David,

I found the status of "Bug 1790844 - USB related fixes." is ON_QA, but this bug blocks it.
I want to confirm with you if I verify it until this bug is fixed. If not, I will verify it first, just focus on whether hit the new issue.

Many thanks.

Comment 18 Dr. David Alan Gilbert 2020-01-17 11:44:50 UTC
(In reply to Xueqiang Wei from comment #17)
> (In reply to Han Han from comment #15)
> > The coredump stack shows SIGABRT happended on xhci_submit. So I think the
> > reason why you didn't reproduce the bug is because no xhci usb controller in
> > your guest xml.
> > I am sorry that I forgot to provide the guest xml in this bug. But as I
> > remember I didn't update guest xml when digging these usb bugs. So you can
> > refer to the guest xml in
> > https://bugzilla.redhat.com/attachment.cgi?id=1615431 .
> > Surely, there are qemu-xhci in my guest xml:
> >     <controller type='usb' index='1' model='qemu-xhci'>
> >       <address type='pci' domain='0x0000' bus='0x02' slot='0x00'
> > function='0x0'/>
> >     </controller>
> 
> 
> Han Han,
> 
> Thank you very much. I reproduced it with qemu-xhci controller.
> 
> updated file:
> http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/bug1786413/
> gdb.txt
> http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/bug1786413/
> host1_qemu_rhel820src.log
> http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/bug1786413/
> rhel8.2-migration.xml
> 
> 
> But it still hard to reproduce. I run it for two hours, just hit once.
> 
> 
> Hit "Bug 1752320 - vm gets stuck when migrate vm back and forth with
> remote-viewer trying to connect" many times.
> 
> With nec-xhci controller, also hit "Hit Bug 1752320 - vm gets stuck when
> migrate vm back and forth with remote-viewer trying to connect
> " many times.
> 
> 
> 
> Hi David,
> 
> I found the status of "Bug 1790844 - USB related fixes." is ON_QA, but this
> bug blocks it.
> I want to confirm with you if I verify it until this bug is fixed. If not, I
> will verify it first, just focus on whether hit the new issue.
> 
> Many thanks.

Hi,
  The 'USB related fixes' bug has the fix for the problem reported in the first comment of this bug, and also the
bz 1752320 'vm gets stuck' bug.  It doesn't fix comment 2 and comment 9 in this bz;
that's why I want to keep this one open until we find that.

Comment 19 Xueqiang Wei 2020-01-19 07:50:36 UTC
(In reply to Dr. David Alan Gilbert from comment #18)
> (In reply to Xueqiang Wei from comment #17)
> > (In reply to Han Han from comment #15)
> > > The coredump stack shows SIGABRT happended on xhci_submit. So I think the
> > > reason why you didn't reproduce the bug is because no xhci usb controller in
> > > your guest xml.
> > > I am sorry that I forgot to provide the guest xml in this bug. But as I
> > > remember I didn't update guest xml when digging these usb bugs. So you can
> > > refer to the guest xml in
> > > https://bugzilla.redhat.com/attachment.cgi?id=1615431 .
> > > Surely, there are qemu-xhci in my guest xml:
> > >     <controller type='usb' index='1' model='qemu-xhci'>
> > >       <address type='pci' domain='0x0000' bus='0x02' slot='0x00'
> > > function='0x0'/>
> > >     </controller>
> > 
> > 
> > Han Han,
> > 
> > Thank you very much. I reproduced it with qemu-xhci controller.
> > 
> > updated file:
> > http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/bug1786413/
> > gdb.txt
> > http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/bug1786413/
> > host1_qemu_rhel820src.log
> > http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/bug1786413/
> > rhel8.2-migration.xml
> > 
> > 
> > But it still hard to reproduce. I run it for two hours, just hit once.
> > 
> > 
> > Hit "Bug 1752320 - vm gets stuck when migrate vm back and forth with
> > remote-viewer trying to connect" many times.
> > 
> > With nec-xhci controller, also hit "Hit Bug 1752320 - vm gets stuck when
> > migrate vm back and forth with remote-viewer trying to connect
> > " many times.
> > 
> > 
> > 
> > Hi David,
> > 
> > I found the status of "Bug 1790844 - USB related fixes." is ON_QA, but this
> > bug blocks it.
> > I want to confirm with you if I verify it until this bug is fixed. If not, I
> > will verify it first, just focus on whether hit the new issue.
> > 
> > Many thanks.
> 
> Hi,
>   The 'USB related fixes' bug has the fix for the problem reported in the
> first comment of this bug, and also the
> bz 1752320 'vm gets stuck' bug.  It doesn't fix comment 2 and comment 9 in
> this bz;
> that's why I want to keep this one open until we find that.


Thank you very much. I have verified bug1790844.

Tested it on qemu-kvm-4.2.0-6.module+el8.2.0+5451+991cea0d for 1 hour.
qemu crash due to qemu-kvm: hw/usb/redirect.c:1356: usbredir_chardev_read: Assertion `dev->read_buf == NULL' failed.
       
2020-01-19T06:23:49.342200Z qemu-kvm: usbredirparser: Received second hello message, ignoring
2020-01-19T06:23:49.342450Z qemu-kvm: usbredirparser: error invalid usb-redir packet type: 132
qemu-kvm: hw/usb/redirect.c:1356: usbredir_chardev_read: Assertion `dev->read_buf == NULL' failed.
2020-01-19 06:23:52.262+0000: shutting down, reason=crashed
     

logs:
http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/bug1786413/host1_rhel820src_qemu5451.log
http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/bug1786413/host2_rhel820src_qemu5451.log
http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/bug1786413/gdb_qemu-5451.txt

Comment 20 Dr. David Alan Gilbert 2020-01-20 10:49:45 UTC
Thanks, that looks like a new one; I think Han Han's test will keep finding more of these.


red_channel_client_event
red_channel_client_receive
red_channel_client_handle_incoming
spicemv_red_channel_client_handle_message
red_char_device_write_to_device
vmc_write
usbredir_chardev_read : dev->read_buf == NULL

lets track it here with the others.

Comment 21 Ademar Reis 2020-02-05 23:11:28 UTC
QEMU has been recently split into sub-components and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks

Comment 22 Xueqiang Wei 2020-03-18 03:00:55 UTC
set priority to low, if I was wrong, please correct me.

Comment 25 RHEL Program Management 2021-06-25 07:30:42 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 26 yduan 2021-07-27 09:33:02 UTC
This is a corner case and hard to reproduce according to comment 17, so agree to close it from my perspective.

What do you think? @dgilbert

Comment 27 Dr. David Alan Gilbert 2021-07-27 09:47:00 UTC
I think we should ask @kraxel since it's USB stuff.

Comment 28 yduan 2021-08-02 08:06:58 UTC
This is a corner case and hard to reproduce according to comment 17, so agree to close it.

Comment 29 Gerd Hoffmann 2021-10-26 11:18:04 UTC
agree on the above (and clear needinfo).


Note You need to log in before you can comment on or make changes to this bug.