Description of problem: [usb][pci-ohci][fast-train]Boot guest with 38 usb disks on the same pci-ohci usb controller,when ohci is stressed,the guest generates a calltrace Version-Release number of selected component (if applicable): How reproducible: 3/10 Steps to Reproduce: 1.Boot up guest with 38 usb disks on the same pci-ohci usb controller with following script: [root@ibm-p8-kvm-02-qe micai]# cat usb.sh #!/bin/bash CLI="/usr/libexec/qemu-kvm -name virtioblk-0828-le1 -machine pseries,accel=kvm,usb=off -m 2048M -realtime mlock=off -smp 4,sockets=1,cores=2,threads=2 -uuid 95346a10-1828-403a-a610-ac5a52a29415 -no-user-config -nodefaults -monitor stdio -rtc base=utc,clock=host -boot strict=on -device spapr-vscsi,id=scsi0,reg=0x1000 -drive id=drive_image1,if=none,cache=none,snapshot=off,aio=native,format=qcow2,file=/root/test_home/micai/rhel80-ppc64le-virtio.qcow2 -device virtio-blk-pci,id=image1,drive=drive_image1,bus=pci.0,addr=04,bootindex=1 -netdev tap,id=hostnet0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown -device spapr-vlan,netdev=hostnet0,id=net0,mac=52:54:00:c4:e7:15,reg=0x2000 -vnc 0:05 -msg timestamp=on -device VGA,id=video0,bus=pci.0,addr=0x8 -device nec-usb-xhci,id=xhci0 -device usb-kbd,id=input0,bus=xhci0.0 -device usb-mouse,id=input1,bus=xhci0.0 -device usb-tablet,id=input2,bus=xhci0.0 -chardev socket,path=/tmp/S,nowait,id=idQdLRHP,server -device virtio-serial-pci,id=virtio_serial_pci0,bus=pci.0,addr=09 -device virtserialport,id=idBu8FQH,name=vs,bus=virtio_serial_pci0.0,chardev=idQdLRHP -object rng-random,filename=/dev/random,id=passthrough-rOXjKxaC -device pci-ohci,id=ohci0 -device virtio-scsi-pci,id=scsi1,bus=pci.0 " while [ ${i:=0} -lt ${1:-1} ] do qemu-img create -f qcow2 /root/test_home/micai/disk$i.qcow2 10M CLI="$CLI -blockdev node-name=disk$i,file.driver=file,driver=qcow2,file.driver=file,file.filename=/root/test_home/micai/disk$i.qcow2" CLI="$CLI -device usb-storage,drive=disk$i,id=usb-0-$i,removable=on,bus=ohci0.0" i=$((i+1)) done $CLI 2. After the guest boots up, check the usb device inside the guest: # ll /dev/sd* | wc -l [root@dhcp19-129-179 ~]# lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT sr0 11:0 1 10M 0 rom sr1 11:1 1 10M 0 rom sr2 11:2 1 10M 0 rom sr3 11:3 1 10M 0 rom sr4 11:4 1 10M 0 rom sr5 11:5 1 10M 0 rom sr6 11:6 1 10M 0 rom sr7 11:7 1 10M 0 rom sr8 11:8 1 10M 0 rom sr9 11:9 1 10M 0 rom sr10 11:10 1 10M 0 rom sr11 11:11 1 10M 0 rom sr12 11:12 1 10M 0 rom sr13 11:13 1 10M 0 rom sr14 11:14 1 10M 0 rom sr15 11:15 1 10M 0 rom sr16 11:16 1 10M 0 rom sr17 11:17 1 10M 0 rom sr18 11:18 1 10M 0 rom sr19 11:19 1 10M 0 rom sr20 11:20 1 10M 0 rom sr21 11:21 1 10M 0 rom sr22 11:22 1 10M 0 rom sr23 11:23 1 10M 0 rom sr24 11:24 1 10M 0 rom sr25 11:25 1 10M 0 rom sr26 11:26 1 10M 0 rom sr27 11:27 1 10M 0 rom sr28 11:28 1 10M 0 rom sr29 11:29 1 10M 0 rom sr30 11:30 1 10M 0 rom sr31 11:31 1 10M 0 rom sr32 11:32 1 10M 0 rom sr33 11:33 1 10M 0 rom sr34 11:34 1 10M 0 rom sr35 11:35 1 10M 0 rom sr36 11:36 1 10M 0 rom sr37 11:37 1 10M 0 rom vda 252:0 0 20G 0 disk ├─vda1 252:1 0 4M 0 part ├─vda2 252:2 0 1G 0 part /boot └─vda3 252:3 0 19G 0 part ├─rhel_dhcp19--129--64-root 253:0 0 17G 0 lvm / └─rhel_dhcp19--129--64-swap 253:1 0 2G 0 lvm [SWAP] 3.run the scipts in the guest: [root@dhcp16-201-182 ~]# cat concurrent_dd.sh for dev in $(ls /dev/sd*) do dd if=/dev/zero of=$dev bs=1M count=10 & done Actual results: the guest generates a calltrace: [ 192.655763] ohci-pci 0000:00:01.0: OHCI Unrecoverable Error, disabled [ 192.655784] ohci-pci 0000:00:01.0: HC died; cleaning up [ 193.014595] irq 18: nobody cared (try booting with the "irqpoll" option) [ 193.014601] CPU: 2 PID: 0 Comm: swapper/2 Kdump: loaded Not tainted 4.18.0-76.el8.ppc64le #1 [ 193.014603] Call Trace: [ 193.014645] [c00000007ffcfd10] [c000000000ca87ec] dump_stack+0xb0/0xf4 (unreliable) [ 193.014659] [c00000007ffcfd50] [c0000000001d35a8] __report_bad_irq+0x70/0x138 [ 193.014662] [c00000007ffcfde0] [c0000000001d3440] note_interrupt+0x360/0x40c [ 193.014664] [c00000007ffcfe90] [c0000000001ceb7c] handle_irq_event+0xdc/0x130 [ 193.014667] [c00000007ffcfed0] [c0000000001d4584] handle_fasteoi_irq+0xf4/0x2c0 [ 193.014669] [c00000007ffcff00] [c0000000001ccb5c] generic_handle_irq+0x4c/0x70 [ 193.014672] [c00000007ffcff20] [c0000000000188dc] __do_irq+0xac/0x210 [ 193.014676] [c00000007ffcff90] [c00000000002d074] call_do_irq+0x14/0x24 [ 193.014678] [c00000007a7c35c0] [c000000000018adc] do_IRQ+0x9c/0x130 [ 193.014680] [c00000007a7c3610] [c00000000000bb88] fast_exception_return+0x138/0x198 [ 193.014685] --- interrupt: 501 at replay_interrupt_return+0x0/0x4 LR = arch_local_irq_restore+0x74/0x90 [ 193.014687] [c00000007a7c3900] [c00000007fc80500] 0xc00000007fc80500 (unreliable) [ 193.014691] [c00000007a7c3920] [c000000000cd1e20] __do_softirq+0xe0/0x400 [ 193.014695] [c00000007a7c3a10] [c00000000013eb14] irq_exit+0x184/0x1c0 [ 193.014698] [c00000007a7c3a40] [c000000000026958] timer_interrupt+0x128/0x2f0 [ 193.014700] [c00000007a7c3aa0] [c00000000000bbc4] fast_exception_return+0x174/0x198 [ 193.014714] --- interrupt: 901 at snooze_loop+0x98/0x1d0 LR = snooze_loop+0x70/0x1d0 [ 193.014734] [c00000007a7c3d90] [c000000001105588] cpu_core_map+0x0/0x100 (unreliable) [ 193.014738] [c00000007a7c3dc0] [c000000000a31944] cpuidle_enter_state+0xa4/0x420 [ 193.014740] [c00000007a7c3e20] [c000000000189670] do_idle+0x3f0/0x480 [ 193.014743] [c00000007a7c3ea0] [c00000000018993c] cpu_startup_entry+0x3c/0x40 [ 193.014746] [c00000007a7c3ed0] [c000000000055fec] start_secondary+0x78c/0x870 [ 193.014749] [c00000007a7c3f90] [c00000000000ac70] start_secondary_prolog+0x10/0x14 [ 193.014750] handlers: [ 193.014763] [<00000000e1feb2ae>] usb_hcd_irq [ 193.014765] Disabling IRQ #18 [ 193.014815] usb 1-1: USB disconnect, device number 2 [ 193.046886] print_req_error: I/O error, dev sda, sector 0 flags 4801 [ 193.046897] Buffer I/O error on dev sda, logical block 0, lost async page write [ 193.046938] print_req_error: I/O error, dev sda, sector 128 flags 4801 [ 193.046942] Buffer I/O error on dev sda, logical block 1, lost async page write [ 193.047074] print_req_error: I/O error, dev sda, sector 384 flags 4001 [ 193.047077] Buffer I/O error on dev sda, logical block 3, lost async page write [ 193.047090] print_req_error: I/O error, dev sda, sector 256 flags 801 [ 193.047092] Buffer I/O error on dev sda, logical block 2, lost async page write [ 193.047113] print_req_error: I/O error, dev sda, sector 512 flags 4001 [ 193.047115] Buffer I/O error on dev sda, logical block 4, lost async page write [ 193.047119] print_req_error: I/O error, dev sda, sector 640 flags 4001 [ 193.047121] Buffer I/O error on dev sda, logical block 5, lost async page write [ 193.047131] print_req_error: I/O error, dev sda, sector 768 flags 1 [ 193.047133] Buffer I/O error on dev sda, logical block 6, lost async page write [ 193.047151] print_req_error: I/O error, dev sda, sector 896 flags 4801 [ 193.047153] Buffer I/O error on dev sda, logical block 7, lost async page write [ 193.047157] print_req_error: I/O error, dev sda, sector 1024 flags 4801 [ 193.047159] Buffer I/O error on dev sda, logical block 8, lost async page write [ 193.047176] print_req_error: I/O error, dev sda, sector 1152 flags 4801 [ 193.047178] Buffer I/O error on dev sda, logical block 9, lost async page write [ 193.257198] usb 1-2: USB disconnect, device number 3 [ 193.257645] sd 3:0:0:0: [sdb] Synchronizing SCSI cache [ 193.276902] sd 3:0:0:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 193.497014] usb 1-3: USB disconnect, device number 4 [ 193.497017] usb 1-3.1: USB disconnect, device number 5 [ 193.687014] usb 1-3.2: USB disconnect, device number 6 [ 193.687419] sd 5:0:0:0: [sdd] Synchronizing SCSI cache [ 193.716878] sd 5:0:0:0: [sdd] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 193.897110] usb 1-3.3: USB disconnect, device number 7 [ 193.897773] sd 6:0:0:0: [sde] Synchronizing SCSI cache [ 193.917477] sd 6:0:0:0: [sde] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 194.096992] usb 1-3.4: USB disconnect, device number 8 [ 194.097380] sd 7:0:0:0: [sdf] Synchronizing SCSI cache [ 194.127294] sd 7:0:0:0: [sdf] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 194.276987] usb 1-3.5: USB disconnect, device number 9 [ 194.426976] usb 1-3.6: USB disconnect, device number 10 [ 194.427443] sd 9:0:0:0: [sdh] Synchronizing SCSI cache [ 194.456926] sd 9:0:0:0: [sdh] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 194.597029] usb 1-3.7: USB disconnect, device number 11 [ 194.597508] sd 10:0:0:0: [sdi] Synchronizing SCSI cache [ 194.616878] sd 10:0:0:0: [sdi] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 194.746981] usb 1-3.8: USB disconnect, device number 12 [ 194.746984] usb 1-3.8.1: USB disconnect, device number 13 [ 194.916971] usb 1-3.8.2: USB disconnect, device number 14 [ 195.087210] usb 1-3.8.3: USB disconnect, device number 15 [ 195.306998] usb 1-3.8.4: USB disconnect, device number 16 [ 195.496932] usb 1-3.8.5: USB disconnect, device number 17 [ 195.666965] usb 1-3.8.6: USB disconnect, device number 18 [ 195.667322] sd 16:0:0:0: [sdo] Synchronizing SCSI cache [ 195.696839] sd 16:0:0:0: [sdo] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 195.846952] usb 1-3.8.7: USB disconnect, device number 19 [ 196.026933] usb 1-3.8.8: USB disconnect, device number 20 [ 196.026936] usb 1-3.8.8.1: USB disconnect, device number 21 [ 196.216943] usb 1-3.8.8.2: USB disconnect, device number 22 [ 196.217339] sd 19:0:0:0: [sdr] Synchronizing SCSI cache [ 196.236900] sd 19:0:0:0: [sdr] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 196.386946] usb 1-3.8.8.3: USB disconnect, device number 23 [ 196.387314] sd 20:0:0:0: [sds] Synchronizing SCSI cache [ 196.411710] sd 20:0:0:0: [sds] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 196.527088] usb 1-3.8.8.4: USB disconnect, device number 24 [ 196.527569] sd 21:0:0:0: [sdt] Synchronizing SCSI cache [ 196.556845] sd 21:0:0:0: [sdt] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 196.716943] usb 1-3.8.8.5: USB disconnect, device number 25 [ 196.717349] sd 22:0:0:0: [sdu] Synchronizing SCSI cache [ 196.736803] sd 22:0:0:0: [sdu] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 196.876933] usb 1-3.8.8.6: USB disconnect, device number 26 [ 196.877336] sd 23:0:0:0: [sdv] Synchronizing SCSI cache [ 196.896837] sd 23:0:0:0: [sdv] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 197.036912] usb 1-3.8.8.7: USB disconnect, device number 27 [ 197.037322] sd 24:0:0:0: [sdw] Synchronizing SCSI cache [ 197.067360] sd 24:0:0:0: [sdw] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 197.216904] usb 1-3.8.8.8: USB disconnect, device number 28 [ 197.216907] usb 1-3.8.8.8.1: USB disconnect, device number 29 [ 197.217959] sd 25:0:0:0: [sdx] Synchronizing SCSI cache [ 197.246793] sd 25:0:0:0: [sdx] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 197.406914] usb 1-3.8.8.8.2: USB disconnect, device number 30 [ 197.407277] sd 26:0:0:0: [sdy] Synchronizing SCSI cache [ 197.426880] sd 26:0:0:0: [sdy] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 197.586894] usb 1-3.8.8.8.3: USB disconnect, device number 31 [ 197.587341] sd 27:0:0:0: [sdz] Synchronizing SCSI cache [ 197.606813] sd 27:0:0:0: [sdz] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 197.756957] usb 1-3.8.8.8.4: USB disconnect, device number 32 [ 197.757358] sd 28:0:0:0: [sdaa] Synchronizing SCSI cache [ 197.786796] sd 28:0:0:0: [sdaa] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 197.936904] usb 1-3.8.8.8.5: USB disconnect, device number 33 [ 197.937304] sd 29:0:0:0: [sdab] Synchronizing SCSI cache [ 197.966771] sd 29:0:0:0: [sdab] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 198.106895] usb 1-3.8.8.8.6: USB disconnect, device number 34 [ 198.107338] sd 30:0:0:0: [sdac] Synchronizing SCSI cache [ 198.136784] sd 30:0:0:0: [sdac] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 198.286884] usb 1-3.8.8.8.7: USB disconnect, device number 35 [ 198.287273] sd 31:0:0:0: [sdad] Synchronizing SCSI cache [ 198.316825] sd 31:0:0:0: [sdad] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 198.506880] usb 1-3.8.8.8.8: USB disconnect, device number 36 [ 198.506884] usb 1-3.8.8.8.8.1: USB disconnect, device number 37 [ 198.546715] print_req_error: 150 callbacks suppressed [ 198.546727] print_req_error: I/O error, dev sdae, sector 128 flags 4801 [ 198.546755] buffer_io_error: 150 callbacks suppressed [ 198.546757] Buffer I/O error on dev sdae, logical block 1, lost async page write [ 198.546804] print_req_error: I/O error, dev sdae, sector 256 flags 801 [ 198.546807] Buffer I/O error on dev sdae, logical block 2, lost async page write [ 198.546857] print_req_error: I/O error, dev sdae, sector 384 flags 4001 [ 198.546860] Buffer I/O error on dev sdae, logical block 3, lost async page write [ 198.546864] print_req_error: I/O error, dev sdae, sector 640 flags 4801 [ 198.546866] Buffer I/O error on dev sdae, logical block 5, lost async page write [ 198.546876] print_req_error: I/O error, dev sdae, sector 512 flags 1 [ 198.546878] Buffer I/O error on dev sdae, logical block 4, lost async page write [ 198.546894] print_req_error: I/O error, dev sdae, sector 768 flags 4801 [ 198.546896] Buffer I/O error on dev sdae, logical block 6, lost async page write [ 198.546899] print_req_error: I/O error, dev sdae, sector 896 flags 4801 [ 198.546902] Buffer I/O error on dev sdae, logical block 7, lost async page write [ 198.546915] print_req_error: I/O error, dev sdae, sector 1024 flags 4801 [ 198.546918] Buffer I/O error on dev sdae, logical block 8, lost async page write [ 198.546921] print_req_error: I/O error, dev sdae, sector 1152 flags 4801 [ 198.546923] Buffer I/O error on dev sdae, logical block 9, lost async page write [ 198.546937] print_req_error: I/O error, dev sdae, sector 1280 flags 4801 [ 198.546939] Buffer I/O error on dev sdae, logical block 10, lost async page write [ 198.687058] usb 1-3.8.8.8.8.2: USB disconnect, device number 38 [ 198.687555] sd 33:0:0:0: [sdaf] Synchronizing SCSI cache [ 198.709543] sd 33:0:0:0: [sdaf] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 198.968135] usb 1-3.8.8.8.8.3: USB disconnect, device number 39 [ 198.970880] sd 34:0:0:0: [sdag] Synchronizing SCSI cache [ 199.006761] sd 34:0:0:0: [sdag] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 199.190264] usb 1-3.8.8.8.8.4: USB disconnect, device number 40 [ 199.224407] sd 35:0:0:0: [sdah] Synchronizing SCSI cache [ 199.237318] sd 35:0:0:0: [sdah] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 199.237346] scsi 35:0:0:0: rejecting I/O to dead device [ 199.457222] usb 1-3.8.8.8.8.5: USB disconnect, device number 41 [ 199.457797] sd 36:0:0:0: [sdai] Synchronizing SCSI cache [ 199.487025] sd 36:0:0:0: [sdai] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 199.667090] usb 1-3.8.8.8.8.6: USB disconnect, device number 42 [ 199.668181] sd 37:0:0:0: [sdaj] Synchronizing SCSI cache [ 199.689088] sd 37:0:0:0: [sdaj] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 199.937126] usb 1-3.8.8.8.8.7: USB disconnect, device number 43 [ 199.984701] sd 38:0:0:0: [sdak] Synchronizing SCSI cache [ 200.033920] sd 38:0:0:0: [sdak] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 200.239701] usb 1-3.8.8.8.8.8: USB disconnect, device number 44 Expected results: the guest is normal Additional info: if I use xhci instead of ohci,The same operation is done in guest, which is normal. host: qemu-kvm-3.1.0-18.module+el8+2834+fa8bb6e2.ppc64le [root@ibm-p8-kvm-02-qe micai]# uname -r 4.18.0-74.el8.ppc64le guest: [root@dhcp16-201-182 ~]# uname -r 4.18.0-76.el8.ppc64le
I'll test it on rhel7.6 next week. Confirm for regression. thanks
According to comment0 above and rhel7.6(host and guest) did the same operation, the guest has also had the following Calltrace: [ 173.562420] blk_update_request: I/O error, dev sdz, sector 128 [ 173.562424] Buffer I/O error on dev sdz, logical block 1, lost async page write [ 173.668945] irq 18: nobody cared (try booting with the "irqpoll" option) [ 173.668951] CPU: 2 PID: 0 Comm: swapper/2 Kdump: loaded Not tainted 3.10.0-957.10.1.el7.ppc64 #1 [ 173.668952] Call Trace: [ 173.669139] [c00000007ffd3690] [c00000000001cca8] .show_stack+0x88/0x330 (unreliable) [ 173.669241] [c00000007ffd3750] [c000000000ac4d08] .dump_stack+0x28/0x3c [ 173.669276] [c00000007ffd37c0] [c00000000020393c] .note_interrupt+0x2ec/0x460 [ 173.669278] [c00000007ffd3870] [c0000000001ff860] .handle_irq_event+0xa0/0x100 [ 173.669280] [c00000007ffd3900] [c000000000205294] .handle_fasteoi_irq+0xd4/0x240 [ 173.669282] [c00000007ffd3980] [c0000000001fe684] .generic_handle_irq+0x54/0x80 [ 173.669283] [c00000007ffd3a00] [c000000000015e6c] .__do_irq+0x8c/0x1b0 [ 173.669285] [c00000007ffd3a80] [c00000000001608c] .do_IRQ+0xfc/0x110 [ 173.669287] [c00000007ffd3b30] [c00000000000ab00] restore_check_irq_replay+0x2c/0x70 [ 173.669289] --- Exception: 501 at .arch_local_irq_restore+0xf0/0x140 LR = .arch_local_irq_restore+0xf0/0x140 [ 173.669357] [c00000007ffd3e20] [c000000001702280] jiffies+0x0/0x80 (unreliable) [ 173.669359] [c00000007ffd3e90] [c0000000000feb84] .__do_softirq+0xf4/0x390 [ 173.669360] [c00000007ffd3f90] [c00000000002bf8c] .call_do_softirq+0x14/0x24 [ 173.669362] [c00000007eeb3740] [c0000000000162a0] .do_softirq+0x120/0x170 [ 173.669363] [c00000007eeb37d0] [c0000000000ff174] .irq_exit+0x1e4/0x1f0 [ 173.669365] [c00000007eeb3860] [c000000000025b2c] .timer_interrupt+0x9c/0xd0 [ 173.669367] [c00000007eeb38e0] [c00000000000ab28] restore_check_irq_replay+0x54/0x70 [ 173.669397] --- Exception: 901 at .plpar_hcall_norets+0x84/0xd4 LR = .shared_cede_loop+0x48/0x90 [ 173.669399] [c00000007eeb3bd0] [c000000000864484] .shared_cede_loop+0x34/0x90 (unreliable) [ 173.669402] [c00000007eeb3c50] [c000000000861a58] .cpuidle_idle_call+0x128/0x430 [ 173.669403] [c00000007eeb3d30] [c0000000000aa4b8] .pseries_lpar_idle+0x18/0x60 [ 173.669405] [c00000007eeb3da0] [c00000000001d68c] .arch_cpu_idle+0x6c/0x160 [ 173.669407] [c00000007eeb3e20] [c00000000018acf0] .cpu_startup_entry+0x170/0x1e0 [ 173.669409] [c00000007eeb3ed0] [c000000000056860] .start_secondary+0x310/0x340 [ 173.669410] [c00000007eeb3f90] [c000000000009b6c] start_secondary_prolog+0x10/0x14 [ 173.669411] handlers: [ 173.669413] [<c0000000016a98c8>] .usb_hcd_irq [ 173.669414] Disabling IRQ #18 So this is not a regression bug,thanks
Minjia, Thanks for the confirmation that this isn't a regression. This is probably a bug under load in the OHCI qemu device, which isn't very widely used or tested. Since we're now encouraging people to use XHCI most of the time, this will remain low priority and may eventually be closed WONTFIX.
I think the stress condition can cause an overflow on the link counter: qemu/hw/usb/hcd-ohci.c: ... 44 45 #define ED_LINK_LIMIT 32 46 ... 1182 /* Service an endpoint list. Returns nonzero if active TD were found. */ 1183 static int ohci_service_ed_list(OHCIState *ohci, uint32_t head, int completion) 1184 { ... 1204 if (++link_cnt > ED_LINK_LIMIT) { 1205 ohci_die(ohci); 1206 return 0; 1207 } ... This limit has been indroduced by: commit 95ed56939eb2eaa4e2f349fe6dcd13ca4edfd8fb Author: Li Qiang <liqiang6-s> Date: Tue Feb 7 02:23:33 2017 -0800 usb: ohci: limit the number of link eds The guest may builds an infinite loop with link eds. This patch limit the number of linked ed to avoid this. Signed-off-by: Li Qiang <liqiang6-s> Message-id: 5899a02e.45ca240a.6c373.93c1.com Signed-off-by: Gerd Hoffmann <kraxel> and increased from 4 to 32 by: commit ab6b1105a2259c7072905887f71caa850ce63190 Author: Gerd Hoffmann <kraxel> Date: Tue Mar 7 09:40:18 2017 +0100 ohci: relax link check The strict td link limit added by commit "95ed569 usb: ohci: limit the number of link eds" causes problems with macos guests. Lets raise the limit. Reported-by: Programmingkid <programmingkidx> Reported-by: Howard Spoelstra <hsp.cat7> Signed-off-by: Gerd Hoffmann <kraxel> Reviewed-by: BALATON Zoltan <balaton.hu> Reviewed-by: John Arbuckle <programmingkidx> Message-id: 1488876018-31576-1-git-send-email-kraxel
I have downloaded comment6 package, test for 10 times, inside the guest did not produce calltrace. But I have found in dmesg once this error(Then I did not see this error again): [ 18.263280] sd 39:0:0:0: Power-on or device reset occurred [ 18.274607] sd 39:0:0:0: [sdal] 20480 512-byte logical blocks: (10.5 MB/10.0 MiB) [ 18.286287] sd 39:0:0:0: [sdal] Write Protect is off [ 18.286297] sd 39:0:0:0: [sdal] Mode Sense: 63 00 00 08 [ 18.292713] sd 39:0:0:0: [sdal] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 18.369250] sd 39:0:0:0: [sdal] Attached SCSI removable disk [ 525.094612] usb 1-3.8.8.8.8.2: reset full-speed USB device number 38 using ohci-pci [ 526.342169] usb 1-3.8.8.8.8-port2: cannot reset (err = -110) [ 527.404688] usb 1-3.8.8.8.8-port2: cannot reset (err = -110) [ 528.415348] usb 1-3.8.8.8.8-port2: cannot reset (err = -110) [ 529.456521] usb 1-3.8.8.8.8-port2: cannot reset (err = -110) [ 534.064617] usb 1-3.8.8.8.3: reset full-speed USB device number 31 using ohci-pci [ 534.334606] usb 1-3.8.8.4: reset full-speed USB device number 24 using ohci-pci [ 534.604605] usb 1-3.8.8.6: reset full-speed USB device number 26 using ohci-pci [ 534.874601] usb 1-3.8.2: reset full-speed USB device number 14 using ohci-pci [ 534.981245] ohci-pci 0000:00:01.0: leak ed 0000000028eab42d (#00) state 2 [ 535.114603] usb 1-3.8.8.8.8.8: reset full-speed USB device number 44 using ohci-pci [ 535.238339] ohci-pci 0000:00:01.0: leak ed 0000000022370ecd (#00) state 2 host: [root@ibm-p8-garrison-03 ~]# rpm -qa|grep qemu qemu-kvm-block-curl-3.1.0-18.el8.BZ1686705.ppc64le qemu-img-3.1.0-18.el8.BZ1686705.ppc64le qemu-guest-agent-debuginfo-3.1.0-18.el8.BZ1686705.ppc64le qemu-kvm-block-rbd-debuginfo-3.1.0-18.el8.BZ1686705.ppc64le qemu-kvm-tests-debuginfo-3.1.0-18.el8.BZ1686705.ppc64le qemu-kvm-debugsource-3.1.0-18.el8.BZ1686705.ppc64le qemu-kvm-common-3.1.0-18.el8.BZ1686705.ppc64le qemu-kvm-block-iscsi-3.1.0-18.el8.BZ1686705.ppc64le qemu-kvm-block-ssh-3.1.0-18.el8.BZ1686705.ppc64le qemu-kvm-core-3.1.0-18.el8.BZ1686705.ppc64le qemu-kvm-tests-3.1.0-18.el8.BZ1686705.ppc64le qemu-img-debuginfo-3.1.0-18.el8.BZ1686705.ppc64le qemu-kvm-block-iscsi-debuginfo-3.1.0-18.el8.BZ1686705.ppc64le qemu-kvm-block-ssh-debuginfo-3.1.0-18.el8.BZ1686705.ppc64le qemu-kvm-core-debuginfo-3.1.0-18.el8.BZ1686705.ppc64le qemu-guest-agent-3.1.0-18.el8.BZ1686705.ppc64le ipxe-roms-qemu-20181214-1.git133f4c47.el8.noarch qemu-kvm-debuginfo-3.1.0-18.el8.BZ1686705.ppc64le qemu-kvm-block-rbd-3.1.0-18.el8.BZ1686705.ppc64le qemu-kvm-3.1.0-18.el8.BZ1686705.ppc64le qemu-kvm-block-curl-debuginfo-3.1.0-18.el8.BZ1686705.ppc64le qemu-kvm-common-debuginfo-3.1.0-18.el8.BZ1686705.ppc64le [root@ibm-p8-garrison-03 ~]# rpm -qa|grep SLOF SLOF-20180702-3.git9b7ab2f.module+el8+2717+98011079.noarch [root@ibm-p8-garrison-03 ~]# UNAME -R -bash: UNAME: command not found [root@ibm-p8-garrison-03 ~]# uname -r 4.18.0-80.el8.ppc64le
(In reply to Minjia Cai from comment #8) > I have downloaded comment6 package, test for 10 times, inside the guest did > not produce calltrace. b This seems not ppc64le specific. I don't think we can increase the ED_LINK_LIMIT indefinitely, but what could be the good value? Gerd, do you think we should increase ED_LINK_LIMIT? > But I have found in dmesg once this error(Then I did not see this error > again): > [ 18.263280] sd 39:0:0:0: Power-on or device reset occurred > [ 18.274607] sd 39:0:0:0: [sdal] 20480 512-byte logical blocks: (10.5 > MB/10.0 MiB) > [ 18.286287] sd 39:0:0:0: [sdal] Write Protect is off > [ 18.286297] sd 39:0:0:0: [sdal] Mode Sense: 63 00 00 08 > [ 18.292713] sd 39:0:0:0: [sdal] Write cache: enabled, read cache: > enabled, doesn't support DPO or FUA > [ 18.369250] sd 39:0:0:0: [sdal] Attached SCSI removable disk > [ 525.094612] usb 1-3.8.8.8.8.2: reset full-speed USB device number 38 > using ohci-pci > [ 526.342169] usb 1-3.8.8.8.8-port2: cannot reset (err = -110) > [ 527.404688] usb 1-3.8.8.8.8-port2: cannot reset (err = -110) > [ 528.415348] usb 1-3.8.8.8.8-port2: cannot reset (err = -110) > [ 529.456521] usb 1-3.8.8.8.8-port2: cannot reset (err = -110) This is a timeout. I think this is caused by the stress test. I don't think we can do anything for that.
Instead of raising the limit we could try to not ohci_dir(), but just stop processing the descriptor list. The next frame timer tick should resume things. Something like this (untested): diff --git a/hw/usb/hcd-ohci.c b/hw/usb/hcd-ohci.c index 196a9f72002d..81cf5ab7a5a7 100644 --- a/hw/usb/hcd-ohci.c +++ b/hw/usb/hcd-ohci.c @@ -1200,7 +1200,7 @@ static int ohci_service_ed_list(OHCIState *ohci, uint32_t head, int completion) if (head == 0) return 0; - for (cur = head; cur; cur = next_ed) { + for (cur = head; cur && link_cnt++ < ED_LINK_LIMIT; cur = next_ed) { if (ohci_read_ed(ohci, cur, &ed)) { trace_usb_ohci_ed_read_error(cur); ohci_die(ohci); @@ -1209,11 +1209,6 @@ static int ohci_service_ed_list(OHCIState *ohci, uint32_t head, int completion) next_ed = ed.next & OHCI_DPTR_MASK; - if (++link_cnt > ED_LINK_LIMIT) { - ohci_die(ohci); - return 0; - } - if ((ed.head & OHCI_ED_H) || (ed.flags & OHCI_ED_K)) { uint32_t addr; /* Cancel pending packets for ED that have been paused. */
Note on comment 9: > This seems not ppc64le specific. It's not ppc specific from a code or upstream point of view, but it is from a downstream testing point of view, since only POWER uses the ohci device.
Patch sent upstream: ohci: don't die on ED_LINK_LIMIT overflow https://patchwork.ozlabs.org/patch/1059900/
Patch is now merged: commit ab8789987f5e8d09b71e7425f3980b259967e17c Author: Laurent Vivier <lvivier> Date: Thu Mar 21 09:52:12 2019 +0100 ohci: don't die on ED_LINK_LIMIT overflow Stop processing the descriptor list instead. The next frame timer tick will resume the work Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1686705 Suggested-by: Gerd Hoffmann <kraxel> Signed-off-by: Laurent Vivier <lvivier> Message-id: 20190321085212.10796-1-lvivier Signed-off-by: Gerd Hoffmann <kraxel>
I've downloaded the latest qemu package and tested it several times using the comment0 test steps, Check guest dmesg, didn't find any error and calltrace. HOST: [root@ibm-p9b-04 kar]# rpm -qa|grep qemu qemu-kvm-common-4.0.0-3.module+el8.1.0+3265+26c4ed71.ppc64le qemu-kvm-block-rbd-4.0.0-3.module+el8.1.0+3265+26c4ed71.ppc64le ipxe-roms-qemu-20181214-1.git133f4c47.el8.noarch qemu-kvm-core-4.0.0-3.module+el8.1.0+3265+26c4ed71.ppc64le libvirt-daemon-driver-qemu-5.4.0-1.module+el8.1.0+3304+7eb41d5f.ppc64le qemu-img-4.0.0-3.module+el8.1.0+3265+26c4ed71.ppc64le qemu-kvm-block-curl-4.0.0-3.module+el8.1.0+3265+26c4ed71.ppc64le qemu-kvm-block-iscsi-4.0.0-3.module+el8.1.0+3265+26c4ed71.ppc64le qemu-kvm-block-ssh-4.0.0-3.module+el8.1.0+3265+26c4ed71.ppc64le qemu-kvm-4.0.0-3.module+el8.1.0+3265+26c4ed71.ppc64le (workspace) [root@ibm-p9b-04 kar]# uname -r 4.18.0-103.el8.ppc64le guest dmesg: [ 15.168178] usb 1-3.8.8.8.8.8: New USB device found, idVendor=46f4, idProduct=0001, bcdDevice= 0.00 [ 15.168181] usb 1-3.8.8.8.8.8: New USB device strings: Mfr=1, Product=2, SerialNumber=3 [ 15.168183] usb 1-3.8.8.8.8.8: Product: QEMU USB HARDDRIVE [ 15.168185] usb 1-3.8.8.8.8.8: Manufacturer: QEMU [ 15.168186] usb 1-3.8.8.8.8.8: SerialNumber: 1-pci@800000020000000:01.0-3.8.8.8.8.8 [ 15.172185] usb-storage 1-3.8.8.8.8.8:1.0: USB Mass Storage device detected [ 15.203352] scsi host39: usb-storage 1-3.8.8.8.8.8:1.0 [ 15.224119] sd 35:0:0:0: [sdah] Attached SCSI removable disk [ 15.370129] scsi 36:0:0:0: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5 [ 15.370392] sd 36:0:0:0: Attached scsi generic sg34 type 0 [ 15.384369] sd 36:0:0:0: Power-on or device reset occurred [ 15.393156] sd 36:0:0:0: [sdai] 20480 512-byte logical blocks: (10.5 MB/10.0 MiB) [ 15.400644] sd 36:0:0:0: [sdai] Write Protect is off [ 15.400647] sd 36:0:0:0: [sdai] Mode Sense: 63 00 00 08 [ 15.406540] sd 36:0:0:0: [sdai] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 15.469261] sd 36:0:0:0: [sdai] Attached SCSI removable disk [ 15.690631] scsi 37:0:0:0: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5 [ 15.693029] sd 37:0:0:0: Attached scsi generic sg35 type 0 [ 15.700360] sd 37:0:0:0: Power-on or device reset occurred [ 15.710477] sd 37:0:0:0: [sdaj] 20480 512-byte logical blocks: (10.5 MB/10.0 MiB) [ 15.716494] sd 37:0:0:0: [sdaj] Write Protect is off [ 15.716496] sd 37:0:0:0: [sdaj] Mode Sense: 63 00 00 08 [ 15.722338] sd 37:0:0:0: [sdaj] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 15.782166] sd 37:0:0:0: [sdaj] Attached SCSI removable disk [ 15.935216] scsi 38:0:0:0: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5 [ 15.935443] sd 38:0:0:0: Attached scsi generic sg36 type 0 [ 15.945187] sd 38:0:0:0: Power-on or device reset occurred [ 15.955426] sd 38:0:0:0: [sdak] 20480 512-byte logical blocks: (10.5 MB/10.0 MiB) [ 15.962447] sd 38:0:0:0: [sdak] Write Protect is off [ 15.962450] sd 38:0:0:0: [sdak] Mode Sense: 63 00 00 08 [ 15.967142] sd 38:0:0:0: [sdak] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 16.032559] sd 38:0:0:0: [sdak] Attached SCSI removable disk [ 16.248164] scsi 39:0:0:0: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5 [ 16.248390] sd 39:0:0:0: Attached scsi generic sg37 type 0 [ 16.258167] sd 39:0:0:0: Power-on or device reset occurred [ 16.268181] sd 39:0:0:0: [sdal] 20480 512-byte logical blocks: (10.5 MB/10.0 MiB) [ 16.274258] sd 39:0:0:0: [sdal] Write Protect is off [ 16.274260] sd 39:0:0:0: [sdal] Mode Sense: 63 00 00 08 [ 16.280202] sd 39:0:0:0: [sdal] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 16.341233] sd 39:0:0:0: [sdal] Attached SCSI removable disk [ 20.121915] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready [ 25.023824] fuse init (API version 7.27) I think this bug can be verified. Please feel free to contact me if you have any questions
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:3723