Bug 1805656
| Summary: | Guest hang after "echo 1 > /sys/bus/pci/devices/$vhost_user_nic_pcie/reset" | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Pei Zhang <pezhang> |
| Component: | qemu-kvm | Assignee: | Eugenio Pérez Martín <eperezma> |
| qemu-kvm sub component: | Networking | QA Contact: | Pei Zhang <pezhang> |
| Status: | CLOSED CURRENTRELEASE | Docs Contact: | |
| Severity: | high | | |
| Priority: | high | CC: | aadam, ailan, ameynarkhede03, chayang, eperezma, jinzhao, juzhang, maxime.coquelin, smitterl, virt-maint |
| Version: | unspecified | Keywords: | Triaged |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-12-01 07:27:14 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1948358 | | |
Description
Pei Zhang
2020-02-21 10:17:01 UTC
I managed to reproduce the issue with RHEL 8.3 in both the host and the guest. I get a soft lockup in the guest; it seems virtnet_send_command never completes:

[ 196.018102] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.13.0-1.module+el8.3.0+6423+e4cb6418 04/01/2014
[ 196.019266] RIP: 0010:virtnet_send_command+0x100/0x150 [virtio_net]
[ 196.020066] Code: 74 24 48 e8 e2 74 5a d5 48 8b 7b 08 e8 e9 57 5a d5 84 c0 75 11 eb 22 48 8b 7b 08 e8 7a 52 5a d5 84 c0 75 15 f3 90 48 8b 7b 08 <48> 8d 74 24 04 e8 16 61 5a d5 48 85 c0 74 de 48 8b 83 58 01 00 00
[ 196.022418] RSP: 0018:ffffaa0000b1fa68 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[ 196.023377] RAX: 0000000000000000 RBX: ffff88ea736b6ac0 RCX: 0000000000000001
[ 196.024286] RDX: 0000000000000000 RSI: ffffaa0000b1fa6c RDI: ffff88ea7269d180
[ 196.025192] RBP: 0000000000000002 R08: 0000771640000000 R09: ffff88ea736b6ac0
[ 196.026091] R10: 0000000171213000 R11: 0000000000000000 R12: ffffaa0000b1fa90
[ 196.026999] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff9679c3c0
[ 196.027904] FS: 00007fac4745a3c0(0000) GS:ffff88eabbb00000(0000) knlGS:0000000000000000
[ 196.028923] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 196.029658] CR2: 00007f3449ad3000 CR3: 00000001771b0001 CR4: 0000000000760ee0
[ 196.030559] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 196.031466] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 196.032364] PKRU: 55555554
[ 196.032718] Call Trace:
[ 196.033038]  virtnet_set_rx_mode+0xbc/0x330 [virtio_net]
[ 196.033721]  __dev_mc_del+0x64/0x70
[ 196.034171]  igmp6_group_dropped+0xee/0x200
[ 196.034715]  ? netlink_broadcast_filtered+0x145/0x400
[ 196.035356]  __ipv6_dev_mc_dec+0xbc/0x130
[ 196.035871]  addrconf_leave_solict.part.65+0x42/0x60
[ 196.036513]  __ipv6_ifa_notify+0x10a/0x320
[ 196.037036]  addrconf_ifdown+0x2b9/0x570
[ 196.037543]  addrconf_notify+0x24c/0xaf0
[ 196.038048]  ? copy_overflow+0x20/0x20
[ 196.038532]  ? copy_overflow+0x20/0x20
[ 196.039019]  ? __do_proc_dointvec+0x21d/0x410
[ 196.039578]  ? dev_disable_change+0x4c/0x80
[ 196.040115]  dev_disable_change+0x4c/0x80
[ 196.040635]  addrconf_sysctl_disable+0x11e/0x1a0
[ 196.041227]  ? dev_disable_change+0x80/0x80
[ 196.041772]  proc_sys_call_handler+0x1a5/0x1c0
[ 196.042342]  vfs_write+0xa5/0x1a0
[ 196.042776]  ksys_write+0x4f/0xb0
[ 196.043205]  do_syscall_64+0x5b/0x1a0
[ 196.043682]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[ 196.044324] RIP: 0033:0x7fac44c50847
[ 196.044788] Code: c3 66 90 41 54 49 89 d4 55 48 89 f5 53 89 fb 48 83 ec 10 e8 1b fd ff ff 4c 89 e2 48 89 ee 89 df 41 89 c0 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 48 89 44 24 08 e8 54 fd ff ff 48
[ 196.047135] RSP: 002b:00007fff067bcf50 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[ 196.048097] RAX: ffffffffffffffda RBX: 000000000000001c RCX: 00007fac44c50847
[ 196.049001] RDX: 0000000000000002 RSI: 00007fff067bcf80 RDI: 000000000000001c
[ 196.049903] RBP: 00007fff067bcf80 R08: 0000000000000000 R09: 00007fac449f9d40
[ 196.050809] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000002
[ 196.051709] R13: 000000000000001c R14: 0000000000000000 R15: 00007fff067bcf80
[ 202.014002] rcu: INFO: rcu_sched self-detected stall on CPU
[ 202.014720] rcu:     2-....: (59922 ticks this GP) idle=29a/1/0x4000000000000002 softirq=12857/12857 fqs=14951
[ 202.015958]  (t=60000 jiffies g=18969 q=540)
[ 202.016503] NMI backtrace for cpu 2
[ 202.016952] CPU: 2 PID: 752 Comm: NetworkManager Kdump: loaded Tainted: G L --------- - - 4.18.0-215.el8.x86_64 #1

This issue still exists with the latest rhel8.4-av. Versions:
4.18.0-276.el8.x86_64
qemu-kvm-5.2.0-3.module+el8.4.0+9499+42e58f08.x86_64
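For reference, a minimal reproduction sketch from inside the guest, assuming the vhost-user-backed NIC shows up as a virtio-net PCI device at the hypothetical address 0000:06:00.0 (substitute the address lspci reports on your system):

# lspci -D | grep -i 'virtio network'
0000:06:00.0 Ethernet controller: Red Hat, Inc. Virtio network device   (example output)
# echo 1 > /sys/bus/pci/devices/0000:06:00.0/reset

If the reset leaves the device's control virtqueue unresponsive, any later virtnet_send_command call (here apparently triggered by NetworkManager writing an IPv6 sysctl for the interface) spins forever and produces the soft lockup above. The backtrace of the spinning CPU can be re-captured at any time via SysRq, assuming SysRq is enabled in the guest:

# echo 1 > /proc/sys/kernel/sysrq
# echo l > /proc/sysrq-trigger
# dmesg | tail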
Currently there is a hard coded policy/ordering of reset methods in the kernel (Dev specific -> FLR -> AF_FLR -> Power Management -> slot -> bus). I proposed a patch that lets the user see all supported reset methods and call a specific one through a new reset_method sysfs attribute:
https://lore.kernel.org/linux-pci/20210409192324.30080-1-ameynarkhede03@gmail.com/

Can you test which reset method is being used by the vhost-user NIC using that patch?

(In reply to Amey Narkhede from comment #15)
> Currently there is hard coded policy/ordering of reset methods in kernel
> (Dev specific->FLR->AF_FLR->Power Management->slot->bus).
> I proposed a patch that would let user to see all supported reset
> methods and call the specific one through new reset_methods sysfs attribute.
> https://lore.kernel.org/linux-pci/20210409192324.30080-1-
> ameynarkhede03/
>
> Can you test which reset method is being used by vhost-user-nic using that
> patch?

Hi Amey. Thanks for the suggestion.

I'm not able to see any reset_method file under /sys/bus/pci/devices/ after applying your patch on top of v5.12-rc2. Am I missing something?

I'm not able to apply it over the latest master due to conflicts, in case you want to send an updated version.

Thanks!

Hi Amey. Not sure about what failed. The output of the sysfs file is flr,pm,bus.

Thanks!

Hi Eugenio,
Can you try writing pm and bus to the reset_method file and then performing the reset?
# echo pm > /sys/bus/..../reset_method
Then try performing the reset with:
# echo 1 > /sys/bus/..../reset
You can try the same steps for the bus reset.

Also, you can use the latest version of the patches from here if you get merge conflicts:
https://lore.kernel.org/linux-pci/20210529192527.2708-1-ameynarkhede03@gmail.com/T/#t
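Putting the suggestion above together, a minimal sketch of the sequence on a kernel carrying the reset_method patches, again using the hypothetical address 0000:06:00.0; the flr,pm,bus output matches what was reported earlier in this bug:

# cat /sys/bus/pci/devices/0000:06:00.0/reset_method
flr,pm,bus
# echo pm > /sys/bus/pci/devices/0000:06:00.0/reset_method
# echo 1 > /sys/bus/pci/devices/0000:06:00.0/reset

Reading reset_method lists the reset methods the device supports in the order the kernel would try them; writing pm restricts subsequent resets to the power-management (D3hot/D0) reset, bypassing FLR. Writing bus instead exercises the secondary bus reset, as suggested above.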
(In reply to Amey Narkhede from comment #19)
> Hi Eugenio,
> Can you try writing pm and bus to reset_method file and then perform the
> reset?
> # echo pm > /sys/bus/..../reset_method
> Then try performing reset by
> # echo 1 > /sys/bus/..../reset
> You can try same steps for the bus reset.
>
> Also you can use latest version of patches from here
> https://lore.kernel.org/linux-pci/20210529192527.2708-1-ameynarkhede03@gmail.
> com/T/#t
> if you get merge conflicts.

Hi Amey.

Thank you very much, the soft lockup is gone with pm.

Could you expand on the differences between these methods? Would it be right to switch to pm, or does it have undesired consequences?

Thanks!

(In reply to Eugenio Pérez Martín from comment #20)
> (In reply to Amey Narkhede from comment #19)
> > Hi Eugenio,
> > Can you try writing pm and bus to reset_method file and then perform the
> > reset?
> > # echo pm > /sys/bus/..../reset_method
> > Then try performing reset by
> > # echo 1 > /sys/bus/..../reset
> > You can try same steps for the bus reset.
> >
> > Also you can use latest version of patches from here
> > https://lore.kernel.org/linux-pci/20210529192527.2708-1-ameynarkhede03@gmail.
> > com/T/#t
> > if you get merge conflicts.
>
> Hi Armey.
>
> Thank you very much, the soft lockup is gone with pm.
>
> Could you expand on the differences of these methods? Would it be right to
> switch to pm or does it have undesired consequences?
>
> Thanks!

I think the difference is device specific. It looks like the problem is in the FLR implementation of the vhost-user NIC. Can you try asking on the qemu mailing list?

Thanks,
Amey

Bulk update: Move RHEL-AV bugs to RHEL 9. If it is necessary to resolve this in RHEL 8, clone the bug to the current RHEL 8 release.

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, the bug can be reopened.

Testing update: This issue cannot be reproduced with the latest rhel9.0. Versions:
5.14.0-39.el9.x86_64
qemu-kvm-6.2.0-1.el9.x86_64
openvswitch2.15-2.15.0-33.el9fdp.x86_64

Following the steps in the Description, the guest keeps working well, so this issue is gone. Moving status to CurrentRelease.
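For completeness, a hedged sketch of the verification flow on RHEL 9, under the same assumptions as the earlier examples (hypothetical device address 0000:06:00.0, placeholder guest NIC name eth0 and peer IP):

On the host, check the versions in use:
# uname -r
# rpm -q qemu-kvm openvswitch2.15

Inside the guest, repeat the reset and confirm the NIC stays usable:
# echo 1 > /sys/bus/pci/devices/0000:06:00.0/reset
# ip link show eth0
# ping -c 3 <gateway or peer IP>

With the versions listed above, the echo should return promptly and the guest should stay responsive, matching the "guest keeps working well" result reported in the testing update.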