Bug 1241886
| Summary: | hot plugged pci devices won't appear unless reboot | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Zhengtong <zhengtli> |
| Component: | qemu-kvm-rhev | Assignee: | Laurent Vivier <lvivier> |
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 7.2 | CC: | bugproxy, dgibson, dzheng, gsun, hannsj_uhl, knoel, lvivier, michen, mrezanin, qzhang, thuth, virt-maint, zhengtli, zhwang |
| Target Milestone: | rc | ||
| Target Release: | 7.2 | ||
| Hardware: | ppc64le | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | qemu-kvm-rhev-2.3.0-19.el7 | Doc Type: | Bug Fix |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2015-12-04 16:49:00 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1182027, 1182040 | ||
| Bug Blocks: | 1172478, 1201513 | ||
Description (Zhengtong, 2015-07-10 10:29:33 UTC)
Is rtas_errd running inside the guest? This is necessary for the guest to see hotplug actions.

(In reply to David Gibson from comment #2)
> Is rtas_errd running inside the guest? This is necessary for the guest to
> see hotplug actions.

Yes, running inside the guest:

[root@dhcp71-167 ~]# ps aux | grep rtas
root 3643 0.0 0.0 5376 4096 ? Ss 06:14 0:00 /usr/sbin/rtas_errd
root 5843 0.0 0.0 110784 2816 pts/0 S+ 06:16 0:00 grep --color=auto rtas

and in boot.log:

[root@dhcp71-167 ~]# cat /var/log/boot.log
...
[  OK  ] Started opal_errd (PowerNV platform error handling) Service.
[  OK  ] Started ppc64-diag rtas_errd (platform error handling) Service.
...

Ok. It looks like you are running a RHEL7.1 guest, which might not have a recent enough rtas_errd to work correctly with hotplug. Does the problem also occur with a RHEL7.2 snapshot? Also, do you see anything new in the output of "dmesg" or "journalctl" after you did the hotplug?

(In reply to David Gibson from comment #4)
> It looks like you are running a RHEL7.1 guest, which might not have a recent
> enough rtas_errd to work correctly with hotplug. Does the problem also
> occur with a RHEL7.2 snapshot?

Yes, after trying again with RHEL7.2, I found it also happens with a RHEL7.2 guest.

I just tried to hotplug a virtio-serial device to my guest with the following command:

virsh qemu-monitor-command --hmp thuth-virtio-le "device_add virtio-serial-pci,id=virtio-serial1"

and I got the following messages in the output of "dmesg":

[ 122.904310] virtio-pci 0000:00:02.0: enabling device (0000 -> 0003)
[ 122.904958] virtio-pci 0000:00:02.0: virtio_pci: leaving for legacy driver
[ 123.044829] virtio_console virtio3: Error -2 initializing vqs
[ 123.044907] virtio_console: probe of virtio3 failed with error -2

Maybe that's a hint? Zhengtong, do you get a similar message?

Zhengtong, could you please also check whether the problem also occurs if you try to hot-plug a second device in the same way? ... for me, it seems like it only happens for the first device that I try to hotplug.

(In reply to Thomas Huth from comment #8)
> Zhengtong, could you please also check whether the problem also occurs if
> you try to hot-plug a second device in the same way? ... for me, it seems
> like it only happens for the first device that I try to hotplug.

This is the dmesg info while I hot-plug 3 devices:

(qemu) device_add virtio-serial-pci,id=virtio-serial1
(qemu) device_add virtio-serial-pci,id=virtio-serial2
(qemu) device_add virtio-serial-pci,id=virtio-serial3

dmesg:

[ 122.793888] RTAS: event: 1, Type: Unknown, Severity: 1
[ 123.156661] pci 0000:00:00.0: [1af4:1003] type 00 class 0x078000
[ 123.157143] pci 0000:00:00.0: reg 0x10: [io 0x10000-0x1001f]
[ 123.157297] pci 0000:00:00.0: reg 0x14: [mem 0x00000000-0x00000fff]
[ 123.159349] pci 0000:00:00.0: BAR 1: assigned [mem 0x100a1000000-0x100a1000fff]
[ 123.159404] pci 0000:00:00.0: BAR 0: assigned [io 0x10000-0x1001f]
[ 123.159838] virtio-pci 0000:00:00.0: enabling device (0000 -> 0003)
[ 123.160623] virtio-pci 0000:00:00.0: virtio_pci: leaving for legacy driver
[ 123.174682] virtio_console virtio1: Error -2 initializing vqs
[ 123.174784] virtio_console: probe of virtio1 failed with error -2
[ 130.109147] RTAS: event: 2, Type: Unknown, Severity: 1
[ 130.446936] pci 0000:00:05.0: [1af4:1003] type 00 class 0x078000
[ 130.447267] pci 0000:00:05.0: reg 0x10: [io 0x10000-0x1001f]
[ 130.447414] pci 0000:00:05.0: reg 0x14: [mem 0x00000000-0x00000fff]
[ 130.449154] pci 0000:00:05.0: BAR 1: assigned [mem 0x100a1001000-0x100a1001fff]
[ 130.449221] pci 0000:00:05.0: BAR 0: assigned [io 0x10040-0x1005f]
[ 130.449726] virtio-pci 0000:00:05.0: enabling device (0000 -> 0003)
[ 130.451221] virtio-pci 0000:00:05.0: virtio_pci: leaving for legacy driver
[ 156.748449] RTAS: event: 3, Type: Unknown, Severity: 1
[ 157.135089] pci 0000:00:06.0: [1af4:1003] type 00 class 0x078000
[ 157.135424] pci 0000:00:06.0: reg 0x10: [io 0x10000-0x1001f]
[ 157.135573] pci 0000:00:06.0: reg 0x14: [mem 0x00000000-0x00000fff]
[ 157.137360] pci 0000:00:06.0: BAR 1: assigned [mem 0x100a1002000-0x100a1002fff]
[ 157.137425] pci 0000:00:06.0: BAR 0: assigned [io 0x10060-0x1007f]
[ 157.137976] virtio-pci 0000:00:06.0: enabling device (0000 -> 0003)
[ 157.139726] virtio-pci 0000:00:06.0: virtio_pci: leaving for legacy driver

It seems the error message only appears for the first hotplugged device.

(In reply to Thomas Huth from comment #8)
> Zhengtong, could you please also check whether the problem also occurs if
> you try to hot-plug a second device in the same way? ... for me, it seems
> like it only happens for the first device that I try to hotplug.

I tried hot-plugging a second device; it works well and shows up without a reboot. So is there anything different about the first hot-plugged device?

(In reply to Zhengtong from comment #10)
> So is there anything different about the first hot-plugged device?

Seems like the guest kernel fails to initialize the first hot-plugged device for some strange reason. But it's good to know that you see the same "Error -2 initializing vqs" in the dmesg output as I see ... that's already a good first hint for debugging this problem. Thanks!

I don't know if it is the same bug, but if I add a NIC, remove it and add it again, I get:

[ 24.770044] pci 0000:00:01.0: BAR 6: assigned [mem 0x100a0000000-0x100a003ffff pref]
[ 24.770100] pci 0000:00:01.0: BAR 1: assigned [mem 0x100a0040000-0x100a0040fff]
[ 24.770154] pci 0000:00:01.0: BAR 0: assigned [io 0x10000-0x1001f]
[ 24.770326] virtio-pci 0000:00:01.0: enabling device (0000 -> 0003)
[ 24.770709] virtio-pci 0000:00:01.0: virtio_pci: leaving for legacy driver
[ 24.771559] virtio_net: probe of virtio3 failed with error -2
[ 52.513670] list_add corruption. prev->next should be next (c000000001540278), but was (null). (prev=c000000037cc2fc8).
[ 52.513785] ------------[ cut here ]------------
[ 52.513812] WARNING: at lib/list_debug.c:33
[ 52.513831] Modules linked in: ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables virtio_balloon virtio_console xfs libcrc32c sd_mod crc_t10dif crct10dif_common virtio_net ibmvscsi scsi_transport_srp scsi_tgt virtio_pci virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod
[ 52.514445] CPU: 0 PID: 2364 Comm: drmgr Not tainted 3.10.0-294.el7.ppc64le #1
[ 52.514484] task: c0000000371e3ab0 ti: c000000005e64000 task.ti: c000000005e64000
[ 52.514522] NIP: c0000000004c3784 LR: c0000000004c3780 CTR: c0000000005a6540
[ 52.514560] REGS: c000000005e67890 TRAP: 0700 Not tainted (3.10.0-294.el7.ppc64le)
[ 52.514598] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 28000424 XER: 20000000
[ 52.514691] CFAR: c00000000094ccbc SOFTE: 1
GPR00: c0000000004c3780 c000000005e67b10 c000000001100bb0 0000000000000075
GPR04: c000000001608018 c000000001618c90 6338292e0d0a3030 3030333763633266
GPR08: c000000000ca0bb0 0000000000000000 0000000000000000 3030303030303063
GPR12: 0000000042000442 c00000000fb80000 000000000000000c 0000000000000000
GPR16: 00000100368e9930 0000000010015680 0000000010015458 00000000100153e8
GPR20: 00003ffffa47c5a8 00000100367da928 00003ffffa47c659 00000100368e9970
GPR24: c00000003fff5f40 c0000000373fc009 c000000001070880 c000000037cc8b80
GPR28: c000000001540000 c000000001540278 c000000037cc2fc8 c000000037cc9148
[ 52.515214] NIP [c0000000004c3784] __list_add+0xe4/0x110
[ 52.515241] LR [c0000000004c3780] __list_add+0xe0/0x110
[ 52.515266] Call Trace:
[ 52.515281] [c000000005e67b10] [c0000000004c3780] __list_add+0xe0/0x110 (unreliable)
[ 52.515328] [c000000005e67b90] [c00000000004d624] update_dn_pci_info+0x194/0x2a0
[ 52.515374] [c000000005e67bd0] [c0000000000951fc] pci_dn_reconfig_notifier+0x4c/0x80
[ 52.515426] [c000000005e67c10] [c000000000118da8] blocking_notifier_call_chain+0x98/0x100
[ 52.515482] [c000000005e67c60] [c0000000007771c4] of_attach_node+0x34/0x170
[ 52.515521] [c000000005e67cd0] [c0000000000949e4] ofdt_write+0x604/0x800
[ 52.515562] [c000000005e67d90] [c0000000003a04f4] proc_reg_write+0x84/0x120
[ 52.515602] [c000000005e67dd0] [c0000000002f7350] SyS_write+0x150/0x400
[ 52.515641] [c000000005e67e30] [c00000000000a17c] system_call+0x38/0xb4
[ 52.515679] Instruction dump:
[ 52.515699] e8010010 eba1ffe8 ebc1fff0 ebe1fff8 7c0803a6 4e800020 3c62ffa5 7fa4eb78
[ 52.515765] 38636420 7fc6f378 484894c5 60000000 <0fe00000> 4bffff5c 3c62ffa5 7fa6eb78
[ 52.515832] ---[ end trace ac972e2b37881070 ]---
[ 52.516829] pci 0000:00:06.0: BAR 6: assigned [mem 0x100a0000000-0x100a003ffff pref]
[ 52.516872] pci 0000:00:06.0: BAR 1: assigned [mem 0x100a0040000-0x100a0040fff]
[ 52.516927] pci 0000:00:06.0: BAR 0: assigned [io 0x10000-0x1001f]
[ 52.517098] virtio-pci 0000:00:06.0: enabling device (0000 -> 0003)
[ 52.517571] virtio-pci 0000:00:06.0: virtio_pci: leaving for legacy driver
[ 52.518463] virtio_net: probe of virtio3 failed with error -2

Laurent, that looks like a different bug to me - it's an oops rather than a simple error during initialization. Probably worth filing a new BZ for it.

Classifying this as a host-side bug in qemu-kvm-rhev for the time being. There's a good chance this could be a guest-side kernel bug, or even a guest-side rtas_errd bug. We can refile if that turns out to be the case.

With the qemu monitor command "info qtree" we can see the PCI configuration from the qemu side:
The existing, working network interface is:
dev: virtio-net-pci, id "net0"
addr = 05.0
class Ethernet controller, addr 00:05.0, pci id 1af4:1000 (sub 1af4:0001)
bar 0: i/o at 0x60 [0x7f]
bar 1: mem at 0xc0002000 [0xc0002fff]
bar 6: mem at 0xffffffffffffffff [0x3fffe]
mac = "52:54:00:22:ab:39"
netdev = "hostnet0"
The hotplugged, broken interface is:
dev: virtio-net-pci, id "net1"
addr = 01.0
class Ethernet controller, addr 00:01.0, pci id 1af4:1000 (sub 1af4:0001)
bar 0: i/o at 0xffffffffffffffff [0x1e]
bar 1: mem at 0x80040000 [0x80040fff]
bar 6: mem at 0xffffffffffffffff [0x3fffe]
mac = "52:54:00:68:50:53"
netdev = "hostnet1"
[ 53.483278] pci 0000:00:01.0: BAR 1: assigned [mem 0x100a0040000-0x100a0040fff]
[ 53.483332] pci 0000:00:01.0: BAR 0: assigned [io 0x10000-0x1001f]
[ 53.483506] virtio-pci 0000:00:01.0: enabling device (0000 -> 0003)
[ 53.483928] virtio-pci 0000:00:01.0: virtio_pci: leaving for legacy driver
[ 53.484823] virtio_net: probe of virtio3 failed with error -2
lspci: Region 0: I/O ports at 0000 [size=32]
We can see the BARs don't seem to be initialized correctly.
If we add one more, it is OK:
[ 201.463788] pci 0000:00:06.0: BAR 6: assigned [mem 0x100a0080000-0x100a00bffff pref]
[ 201.463847] pci 0000:00:06.0: BAR 1: assigned [mem 0x100a0041000-0x100a0041fff]
[ 201.463901] pci 0000:00:06.0: BAR 0: assigned [io 0x10080-0x1009f]
[ 201.464069] virtio-pci 0000:00:06.0: enabling device (0000 -> 0003)
[ 201.464710] virtio-pci 0000:00:06.0: virtio_pci: leaving for legacy driver
[ 201.563926] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
lspci: Region 0: I/O ports at 0080 [size=32]
dev: virtio-net-pci, id "net2"
addr = 06.0
class Ethernet controller, addr 00:06.0, pci id 1af4:1000 (sub 1af4:0001)
bar 0: i/o at 0x80 [0x9f]
bar 1: mem at 0x80041000 [0x80041fff]
bar 6: mem at 0xffffffffffffffff [0x3fffe]
mac = "52:54:00:21:a9:df"
netdev = "hostnet2"
After a reboot (with just "net0" and "net1"), both work:
dev: virtio-net-pci, id "net0"
addr = 05.0
class Ethernet controller, addr 00:05.0, pci id 1af4:1000 (sub 1af4:0001)
bar 0: i/o at 0x80 [0x9f]
bar 1: mem at 0xc0081000 [0xc0081fff]
bar 6: mem at 0xffffffffffffffff [0x3fffe]
mac = "52:54:00:22:ab:39"
netdev = "hostnet0"
dev: virtio-net-pci, id "net1"
addr = 01.0
class Ethernet controller, addr 00:01.0, pci id 1af4:1000 (sub 1af4:0001)
bar 0: i/o at 0x20 [0x3f]
bar 1: mem at 0xc0001000 [0xc0001fff]
bar 6: mem at 0xffffffffffffffff [0x3fffe]
bus: virtio-bus
mac = "52:54:00:68:50:53"
netdev = "hostnet1"
I've checked upstream QEMU, and there is the same problem:
be0df8c Merge remote-tracking branch 'remotes/ehabkost/tags/numa-pull-request' into staging
QEMU 2.3.90 monitor - type 'help' for more information
(qemu) device_add virtio-net-pci,id=virtio-net-pci0,mac=52:54:00:12:34:56
[ 98.753067] pci 0000:00:00.0: BAR 6: assigned [mem 0x100a0000000-0x100a003ffff pref]
[ 98.753135] pci 0000:00:00.0: BAR 1: assigned [mem 0x100a0040000-0x100a0040fff]
[ 98.753194] pci 0000:00:00.0: BAR 0: assigned [io 0x10000-0x1001f]
[ 98.753361] virtio-pci 0000:00:00.0: enabling device (0000 -> 0003)
[ 98.753620] virtio-pci 0000:00:00.0: virtio_pci: leaving for legacy driver
[ 98.754373] virtio_net: probe of virtio1 failed with error -2
(qemu) info qtree
...
bar 0: i/o at 0xffffffffffffffff [0x1e]
bar 1: mem at 0x80040000 [0x80040fff]
...
bar 0: i/o at 0x20 [0x3f]
bar 1: mem at 0xc0000000 [0xc0000fff]
So, I previously said that non-hotplugged devices had BARs allocated by SLOF, and hotplugged devices had them allocated by qemu. I've had a closer look, and now I'm a bit confused.
* I think I see the code in SLOF that will do BAR allocation (obviously only for devices present at boot when SLOF runs)
* I'm pretty sure that the guest kernel *won't* do any BAR allocation (for the "pseries" platform, some other platforms will)
* BUT, I haven't managed to find any qemu code that will do BAR allocation for hotplugged devices or otherwise - it will populate the device tree with information about the BARs if they're allocated ("assigned-addresses" property) but I can't spot code that actually sets the BAR registers.
Which would explain this bug, except that it doesn't explain how the BARs are assigned for the second hotplugged device, nor how BAR1 is being allocated for the first hotplugged device.
So clearly I'm missing something.
------- Comment From fnovak.com 2015-07-17 13:19 EDT -------

Do you have the latest updates for:
* librtas
* librtas-devel
* powerpc-utils
* ppc64-diag

(In reply to IBM Bug Proxy from comment #18)

To be able to compare the correct behavior with our faulty one, I'd like to know which version of upstream QEMU works. I have compared our RHEL7 behavior with an up-to-date Fedora and the result is the same:

QEMU 2.4.0-rc1
fedora22 guest (4.0.4-301.fc22):
librtas 3.13.1-fc22
ppc64-diag 2.6.7-2.fc22
powerpc-utils 1.2.24-1.fc22
SLOF git-7d766a3ac9b2474f -> SLOF-0.1.git20150313-1.fc22

(In reply to Laurent Vivier from comment #19)
> I have compared our RHEL7 behavior with an up-to-date fedora and the result
> is the same:
... using a RHEL7.2 current nightly build? Please advise ...
> QEMU 2.4.0-rc1
> fedora22 guest (4.0.4-301.fc22):
> librtas 3.13.1-fc22
ok ...
> ppc64-diag 2.6.7-2.fc22
... please update to the most current 2.6.9 level; see RHBZ 1182027 - [7.2 FEAT] ppc64-diag package update - ppc64/ppc64le ...
> powerpc-utils 1.2.24-1.fc22
... please update to the most current 1.2.26 level; see RHBZ 1182040 - [7.2 FEAT] powerpc-utils package update - ppc64/ppc64le ...
> SLOF git-7d766a3ac9b2474f
> -> SLOF-0.1.git20150313-1.fc22
... could you please update your system to the above package level and retest this bugzilla again? Please advise ...

Same result with RHEL-7.2-20150708:
[ 120.630890] pci 0000:00:04.0: BAR 6: assigned [mem 0x100a0000000-0x100a003ffff pref]
[ 120.630956] pci 0000:00:04.0: BAR 1: assigned [mem 0x100a0040000-0x100a0040fff]
[ 120.631012] pci 0000:00:04.0: BAR 0: assigned [io 0x10000-0x1001f]
[ 120.631188] virtio-pci 0000:00:04.0: enabling device (0000 -> 0003)
[ 120.631665] virtio-pci 0000:00:04.0: virtio_pci: leaving for legacy driver
[ 120.644250] virtio_net: probe of virtio2 failed with error -2
QEMU: qemu-kvm-rhev-2.3.0-10.el7 and upstream QEMU v2.4.0-rc1
GUEST:
kernel 3.10.0-290.el7.ppc64le
librtas 1.3.13-2.el7
ppc64-diag 2.6.9-1.el7
powerpc-utils 1.2.26-1.el7
SLOF git-7d766a3ac9b2474f
------- Comment From mdroth.com 2015-07-17 18:54 EDT -------

I believe you're hitting the issue addressed by this patch:
http://lists.nongnu.org/archive/html/qemu-devel/2014-12/msg03454.html

Some additional discussion on the patch is available here:
http://lists.gnu.org/archive/html/qemu-devel/2015-01/msg01171.html

The gist of it is that it's an acceptable fix for pseries, since pseries uses a dedicated I/O window that has no risk of overlapping a PCI address wherein offset 0 is reserved for some legacy function/port (we don't even have legacy ports on pseries). The patch hasn't been applied upstream, however, because the fix applies to all architectures, and there are concerns that in the case of, say, x86, where BARs can overlap legacy I/O space, guests may rely on 0 BARs being rejected by QEMU to function properly. Doing a full analysis of all the possibilities will require a good amount of time, so for now we've been carrying this patch for pkvm. There are a number of ways to limit the behavior to pseries, but I think from an upstream perspective we'll need to do the full analysis. Not sure how Red Hat would prefer to address this.

> BUT, I haven't managed to find any qemu code that will do BAR allocation for hotplugged devices or otherwise - it will populate the device tree with information about the BARs if they're allocated ("assigned-addresses" property) but I can't spot code that actually sets the BAR registers.

This is correct, we don't currently do BAR assignment in QEMU, but instead rely on the guest to assign them. It's actually the pseries kernel that's picking the 0 addr, and QEMU is telling it that's invalid. We do have the code to populate assigned-addresses and friends because the guest relies on their presence, even when the actual values are ignored. We do plan on switching to QEMU-based BAR assignment and using the rpaphp-based hotplug module in the guest, but there are a number of guest kernel fixes required to use it, so that will be enabled later.
------- Comment From mdroth.com 2015-07-17 19:06 EDT -------

(In reply to comment #9)
> We do have the code to populate assigned-addresses and friends because the
> guest relies on their presence, even when the actual values are ignored.

More accurately, the guest relies on the config space portion of the properties being populated, regardless of BAR assignments/handling.

I confirm this patch fixes the problem: http://patchwork.ozlabs.org/patch/423796/

Ugh. If I'd realised the hotplug code was relying on the guest for BAR assignment, I would have been a lot less keen to apply Nikunj's patches to use the same logic for cold-plugged devices. This is basically a horrid hack, relying on the behaviour of a particular guest, which you only get away with because no-one cares about any guests other than Linux. Under PAPR, BAR assignment is supposed to be handled by the "firmware", which in our case means either qemu or SLOF. But that's a problem that can't be fixed in time for RHEL 7.2, so we need to go with the BAR address 0 fix, but even that has complications upstream.

So, what I think we need to do is this:
1) Make a version of that patch that affects only Power - just using an ugly ifdef - and apply it as downstream only for RHEL 7.2.
2) The fix allowing zero BARs seems correct in general, so pursue the fix upstream, working out how to properly activate/deactivate it.
3) Implement proper BAR assignment in qemu to bring us back closer to PAPR.

Laurent, can you handle (1) please. Michael, with my upstream hat on, I await your patches for (2) and (3).

------- Comment From mdroth.com 2015-07-20 03:51 EDT -------

(In reply to comment #14)
> Ugh. If I'd realised the hotplug code was relying on the guest for BAR
> assignment, I would have been a lot less keen to apply Nikunj's patches to
> use the same logic for cold-plugged devices.

I need to double-check the code, but I think the topic was brought up and the plan was to still do the *actual* PCI device enumeration in SLOF and to still use SLOF's version of assigned-addresses. Nikunj did move some of the bridge enumeration bits over to QEMU, though. When we add the QEMU-based BAR assignment, SLOF can either check for a new device-tree flag like it does now or examine QEMU's version of assigned-addresses to determine if we've already done the assignment.

> This is basically a horrid hack, relying on the behaviour of a particular
> guest, which you only get away with because no-one cares about any guests
> other than Linux. Under PAPR, BAR assignment is supposed to be handled by
> the "firmware" which in our case means either qemu or SLOF.
> But, that's a problem that can't be fixed in time for RHEL 7.2. So we need
> to go with the BAR address 0 fix, but even that has complications upstream.

Agreed. We have patches to enable BAR assignment in QEMU, but it's actually guest kernel fixes for rpaphp that are necessitating the current/alternate approach. Once those issues are addressed, the plan is to enable a graceful switch-over to QEMU-based BAR assignment using the ibm,client-architecture-support flag set by the guest.

> So, what I think we need to do is this:
> 1) Make a version of that patch that affects only Power - just using an ugly
> ifdef - and apply it as downstream only for RHEL7.2
>
> 2) The fix allowing zero BARs seems correct in general, so pursue the fix
> upstream, working out how to properly activate/deactivate it
>
> 3) Implement proper BAR assignment in qemu to bring us back closer to PAPR.
>
> Laurent, can you handle (1) please.

Absolutely, thanks.
(In reply to David Gibson from comment #26)
> 1) Make a version of that patch that affects only Power - just using an
> ugly ifdef - and apply it as downstream only for RHEL7.2

It seems it is not possible to manage this with an ifdef: pci.c is in the common-obj part of qemu and is compiled once for all the targets, so we can't use "#ifdef CONFIG_PPC" inside. I'm trying to do this dynamically by adding a field "accept_addr_0" in PCIBus and checking this value in pci_bar_address() to know if we can accept address 0 for a BAR. "accept_addr_0" is set to false by default in pci_bus_init() and to true in spapr_phb_realize(). Do you think this is an acceptable approach?

Hrm. I suspect the final upstream fix will look something like that (maybe something in MachineClass rather than the PCI bus, though). However, as a downstream-only fix it has the drawback that it needs (small) changes in multiple parts of the code, increasing the chance for conflicts. The other approach would be to use #ifdef __powerpc__. That's a ghastly hack: it's actually testing the type of the host, rather than the guest, which is wrong, but would work in our circumstances since we only support KVM, not TCG. And as a downstream-only hack it's small and local. I'm not sure which is the best way to go here, so I'm thinking we should ask someone like Paolo or Michael Tsirkin to make a taste judgement.

Fix included in qemu-kvm-rhev-2.3.0-19.el7

Reproduced the bug on qemu-kvm-rhev-2.3.0-18.el7.ppc64le with the same steps as comment 0. Also tested virtio-net-pci: after hotplug, there is no eth* interface inside the guest in "ifconfig -a" output, and only after a reboot does the guest get a new IP for the hot-plugged network device.

======================

Verified the bug on qemu-kvm-rhev-2.3.0-19.el7.ppc64le.
CLI:

/usr/libexec/qemu-kvm -name test_qzhang -machine pseries,accel=kvm,usb=off -m 4G -smp 4,sockets=1,cores=4,threads=1 -uuid 8aeab7e2-f341-4f8c-80e8-59e2968d85c2 -realtime mlock=off -nodefaults -monitor stdio -rtc base=utc -device spapr-vscsi,id=scsi0,reg=0x1000 -drive file=RHEL-7.2-LE-0806.qcow2,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0-0,bootindex=1,id=scsi0-0-0-0 -drive file=RHEL-7.2-20150806.1-Server-ppc64le-dvd1.iso,if=none,id=drive-scsi0-0-1-0,readonly=on,format=raw -device scsi-cd,bus=scsi0.0,drive=drive-scsi0-0-1-0,id=scsi0-0-1-0 -vnc :10 -msg timestamp=on -usb -device usb-tablet,id=tablet1 -vga std -qmp tcp:0:4666,server,nowait -netdev tap,id=hostnet1,script=/etc/qemu-ifup,vhost=on -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:54:5a:5f:5b:5c,disable-legacy=off,disable-modern=on

(1) virtio-serial-pci
(qemu) device_add virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7
(qemu) chardev-add socket,path=/tmp/hello,id=socket1,server,nowait
(qemu) device_add virtserialport,id=port0,chardev=socket1,bus=virtio-serial0.0,name=com.redhat.vsdm1
After hotplugging the device, it works well with no need to reboot the guest. Transferring data between host and guest succeeds.

(2) virtio-net-pci
(qemu) netdev_add tap,id=hostnet2
(qemu) device_add virtio-net-pci,id=net2,mac=00:54:5a:5f:5b:11,netdev=hostnet2,bus=pci.0,addr=0x6
After hotplugging the device, the guest gets the new interface and an IP address. No need to reboot.

(3) virtio-blk-pci
(qemu) __com.redhat_drive_add file=disk.qcow2,format=qcow2,id=disk1
(qemu) device_add virtio-blk-pci,id=disk1,drive=disk1,bus=pci.0,addr=0x8
After hotplugging the device and making a filesystem on it, the block device can be used at once with no need to reboot. dd operations on the disk work well.

(4) virtio-scsi-pci
(qemu) __com.redhat_drive_add file=test.qcow2,format=qcow2,id=disk2
(qemu) device_add virtio-scsi-pci,id=scsi0,vectors=0,bus=pci.0,addr=0x7
(qemu) device_add scsi-hd,ver=sluo,drive=drive-data-disk,bus=scsi0.0,id=data-disk
After hotplugging the device and making a filesystem on it, the block device can be used at once with no need to reboot. dd operations on the disk work well.

(5) virtio-balloon-pci
(qemu) device_add virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4
The memory balloon device works well (it can enlarge and shrink guest memory) after hotplugging; no need to reboot.

(6) usb-ehci
(qemu) __com.redhat_drive_add file=usb.qcow2,format=qcow2,id=disk3
(qemu) device_add usb-ehci,id=ehci,bus=pci.0,addr=0x3
(qemu) device_add usb-storage,drive=disk3,id=usb3,bus=ehci.0,port=1
After hotplugging the device and making a filesystem on it, the block device can be used at once with no need to reboot. dd operations on the disk work well.

Based on the above, I will set the bug as VERIFIED. I did not test all of the supported PCI devices; if something else needs to be covered, please add a comment here. Thanks.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2546.html