Bug 717783

Summary: [Xen][5.4] PCIe-hotplug doesn't work on Dom0.
Product: Red Hat Enterprise Linux 5
Reporter: asilva <asilva>
Component: kernel-xen
Assignee: Xen Maintainance List <xen-maint>
Status: CLOSED CURRENTRELEASE
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Docs Contact:
Priority: high
Version: 5.4
CC: drjones, jmunilla, leiwang, moshiro, pcao, qwan, tmuneda, xen-maint
Target Milestone: rc
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-07-07 06:53:53 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description asilva 2011-06-29 20:29:13 UTC
> Description of problem:
PCIe hotplug was tested on 2.6.18-164.37.1.el5xen and does not work. The
test results are as follows.

* Information from lspci:
0c:00.0 Fibre Channel: Emulex Corporation Saturn: LightPulse Fibre Channel Host Adapter (rev 03)
0d:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
0d:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)

* Test 82571EB hotplug
[root@pq3-3 ~]# cat /sys/bus/pci/slots/0013_0022/address
0000:0d:00
[root@pq3-3 ~]# cat /sys/bus/pci/slots/0013_0022/power
1
[root@pq3-3 ~]# echo 0 >  /sys/bus/pci/slots/0013_0022/power
[root@pq3-3 ~]# cat /sys/bus/pci/slots/0013_0022/power
0
[root@pq3-3 ~]# echo 1 >  /sys/bus/pci/slots/0013_0022/power

This echo command never returns from the kernel. The following message
is printed on the console repeatedly:

BUG: soft lockup - CPU#0 stuck for 10s! [bash:7535]
CPU 0:
Modules linked in: pciehp netloop netbk blktap blkbk ipt_MASQUERADE iptable_nat ip_nat bridge autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack nfnetlink iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i cxgb3 libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi cpufreq_ondemand acpi_cpufreq freq_table dm_multipath scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi ac parport_pc lp parport sg pcspkr i2c_i801 i2c_core igb serial_core 8021q serio_raw e1000e dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod usb_storage lpfc scsi_transport_fc shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 7535, comm: bash Not tainted 2.6.18-164.37.1.el5xen #1
RIP: e030:[<ffffffff8020622a>]  [<ffffffff8020622a>] hypercall_page+0x22a/0x1000
RSP: e02b:ffff8803baabbce8  EFLAGS: 00000246
RAX: 0000000000030001 RBX: 0000000000000000 RCX: ffffffff8020622a
RDX: ffffffffff578000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000ffc R08: 0000000000000004 R09: ffff8803baabbd1c
R10: 0000000000000ffc R11: 0000000000000246 R12: ffff8803bf14de00
R13: 00000000ffffffea R14: 0000000000000000 R15: ffff8803baabbdc0
FS:  00002b9d3710ee10(0000) GS:ffffffff805cb000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000

Call Trace:
 [<ffffffff8040fdbe>] pci_conf1_read+0x0/0xc0
 [<ffffffff803aeafa>] force_evtchn_callback+0xa/0xb
 [<ffffffff80347e6b>] pci_bus_read_config_dword+0x70/0x82
 [<ffffffff887b243f>] :pciehp:program_fw_provided_values+0x137/0x3b0
 [<ffffffff887b2860>] :pciehp:pciehp_configure_device+0x1a8/0x200
 [<ffffffff887b1313>] :pciehp:pciehp_enable_slot+0x386/0x49d
 [<ffffffff80288c47>] default_wake_function+0x0/0xe
 [<ffffffff80350d4d>] power_write_file+0xa5/0x111
 [<ffffffff802fe0d9>] sysfs_write_file+0xb9/0xe8
 [<ffffffff80217361>] vfs_write+0xce/0x174
 [<ffffffff80217b99>] sys_write+0x45/0x6e
 [<ffffffff802602f9>] tracesys+0xab/0xb6

* Test lpfc hotplug
[root@pq3-3 ~]# cat /sys/bus/pci/slots/0012_0021/address
0000:0c:00
[root@pq3-3 ~]# cat /sys/bus/pci/slots/0012_0021/power
1
[root@pq3-3 ~]# echo 0 > /sys/bus/pci/slots/0012_0021/power
[root@pq3-3 ~]# echo 1 > /sys/bus/pci/slots/0012_0021/power

The result is the same as above: the soft lockup message is displayed repeatedly.
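
For reference, the manual steps used in both tests can be wrapped in a small script. This is only a sketch: the `power_cycle_slot` function name and its optional base-directory argument are illustrative and not part of any existing tool, and it assumes the `/sys/bus/pci/slots/<slot>/power` layout shown in the transcripts above. On an affected Xen kernel the final write would be expected to hang in the same way as the manual `echo 1`.

```shell
#!/bin/sh
# Sketch: power a PCIe hotplug slot off and back on through sysfs.
# power_cycle_slot SLOT [BASE]
#   SLOT  slot directory name, e.g. 0013_0022
#   BASE  slots directory (defaults to /sys/bus/pci/slots); overriding it
#         is only an illustrative convenience for dry runs.
power_cycle_slot() {
    slot=$1
    base=${2:-/sys/bus/pci/slots}
    power="$base/$slot/power"

    # Refuse to continue if the slot has no writable power file.
    if [ ! -w "$power" ]; then
        echo "no writable power file for slot $slot" >&2
        return 1
    fi

    echo "slot $slot address: $(cat "$base/$slot/address" 2>/dev/null)"

    echo 0 > "$power" || return 1
    echo "after power off: $(cat "$power")"

    # On the affected Dom0 kernel this write did not return.
    echo 1 > "$power" || return 1
    echo "after power on: $(cat "$power")"
}
```

With the function defined, `power_cycle_slot 0013_0022` repeats the 82571EB test and `power_cycle_slot 0012_0021` repeats the lpfc test.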

> Version-Release number of selected component (if applicable):
- kernel 2.6.18-164.37.1.el5xen
- Server: FUJITSU PRIMEQUEST 1800

> How reproducible:
Always

> Additional info:
- The problem does not occur in a non-Xen environment.

==Console log on native (non-xen) environment==
[Fri Jun 17 16:31:59.705 2011] sh-3.2# echo 1 > /sys/bus/pci/slots/0015_0008/power
[Fri Jun 17 16:32:21.428 2011] sh-3.2# 
[Fri Jun 17 16:32:25.500 2011] sh-3.2# cat /sys/bus/pci/slots/0015_0008/power
[Fri Jun 17 16:32:59.438 2011] 1
[Fri Jun 17 16:32:59.438 2011] sh-3.2# 

[Fri Jun 17 16:33:36.836 2011] sh-3.2# echo 1 > /sys/bus/pci/slots/0014_0007/power
[Fri Jun 17 16:33:54.254 2011] sh-3.2# 
[Fri Jun 17 16:33:57.951 2011] sh-3.2# cat /sys/bus/pci/slots/0014_0007/power
[Fri Jun 17 16:33:59.979 2011] 1
[Fri Jun 17 16:33:59.979 2011] sh-3.2# 

==grub.conf entry in use (the following text is from grub.conf in the sosreport)==
title Red Hat Enterprise Linux Server SingleUser (2.6.18-164.28.1.el5)
	root (hd0,0)
	kernel /vmlinuz-2.6.18-164.28.1.el5 ro root=LABEL=/ mce=0 rhgb quiet nmi_watchdog=0 console=tty0 crashkernel=512M@32M 1
	initrd /initrd-2.6.18-164.28.1.el5.img

- Both kernels (Xen and generic) have PCIe hotplug support compiled in.

CONFIG_HOTPLUG_PCI_PCIE=m
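
The setting above can be checked against a kernel's build configuration. A minimal sketch, assuming the config file is installed as `/boot/config-<release>`; the `pcie_hotplug_config` helper name is illustrative only.

```shell
#!/bin/sh
# Sketch: print the CONFIG_HOTPLUG_PCI_PCIE line from a kernel config file.
# pcie_hotplug_config [CONFIG_FILE]
#   CONFIG_FILE defaults to the running kernel's config under /boot
#   (an assumption about where the distribution installs it).
pcie_hotplug_config() {
    config=${1:-/boot/config-$(uname -r)}
    # grep exits non-zero if the option is absent or commented out.
    grep '^CONFIG_HOTPLUG_PCI_PCIE=' "$config"
}
```

Calling `pcie_hotplug_config` with no argument inspects the running kernel; passing a path such as `/boot/config-2.6.18-164.37.1.el5xen` inspects a specific installed kernel.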