Bug 1972699 - [OSP 17.0] Volumes and vNICs are being hot plugged into SEV based instances without iommu='on' causing failures to attach and later detach within the guest OS
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 17.0 (Wallaby)
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: beta
Target Release: 17.0
Assignee: Lee Yarwood
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Depends On:
Blocks: 1967293
Reported: 2021-06-16 12:38 UTC by Lee Yarwood
Modified: 2023-03-21 19:44 UTC (History)

Fixed In Version: openstack-nova-23.0.3-0.20210908140341.e39bbdc.el9ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-09-21 12:15:48 UTC
Target Upstream Version:
Embargoed:


Links
System ID                               Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker OSP-5235          0        None      None    None     2021-11-15 13:09:59 UTC
Red Hat Product Errata RHEA-2022:6543   0        None      None    None     2022-09-21 12:16:42 UTC

Description Lee Yarwood 2021-06-16 12:38:19 UTC
Description of problem:

This was found while testing RHEL 8.4 based SEV enabled instances for OSP 16.2 under the following RFE:

[RFE][Test Only] AMD SEV-encrypted instances
https://bugzilla.redhat.com/show_bug.cgi?id=1833442

After successfully attaching a disk to a RHEL 8.4 based SEV enabled instance, the request to detach the disk never completes and the guest OS eventually logs the following trace:

[    7.773877] pcieport 0000:00:02.5: Slot(0-5): Attention button pressed
[    7.774743] pcieport 0000:00:02.5: Slot(0-5) Powering on due to button press
[    7.775714] pcieport 0000:00:02.5: Slot(0-5): Card present
[    7.776403] pcieport 0000:00:02.5: Slot(0-5): Link Up
[    7.903183] pci 0000:06:00.0: [1af4:1042] type 00 class 0x010000
[    7.904095] pci 0000:06:00.0: reg 0x14: [mem 0x00000000-0x00000fff]
[    7.905024] pci 0000:06:00.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
[    7.906977] pcieport 0000:00:02.5: bridge window [io  0x1000-0x0fff] to [bus 06] add_size 1000
[    7.908069] pcieport 0000:00:02.5: BAR 13: no space for [io  size 0x1000]
[    7.908917] pcieport 0000:00:02.5: BAR 13: failed to assign [io  size 0x1000]
[    7.909832] pcieport 0000:00:02.5: BAR 13: no space for [io  size 0x1000]
[    7.910667] pcieport 0000:00:02.5: BAR 13: failed to assign [io  size 0x1000]
[    7.911586] pci 0000:06:00.0: BAR 4: assigned [mem 0x800600000-0x800603fff 64bit pref]
[    7.912616] pci 0000:06:00.0: BAR 1: assigned [mem 0x80400000-0x80400fff]
[    7.913472] pcieport 0000:00:02.5: PCI bridge to [bus 06]
[    7.915762] pcieport 0000:00:02.5:   bridge window [mem 0x80400000-0x805fffff]
[    7.917525] pcieport 0000:00:02.5:   bridge window [mem 0x800600000-0x8007fffff 64bit pref]
[    7.920252] virtio-pci 0000:06:00.0: enabling device (0000 -> 0002)
[    7.924487] virtio_blk virtio4: [vdb] 2097152 512-byte logical blocks (1.07 GB/1.00 GiB)
[    7.926616] vdb: detected capacity change from 0 to 1073741824
[ .. ]
[  246.751028] INFO: task irq/29-pciehp:173 blocked for more than 120 seconds.
[  246.752801]       Not tainted 4.18.0-305.el8.x86_64 #1
[  246.753902] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  246.755457] irq/29-pciehp   D    0   173      2 0x80004000
[  246.756616] Call Trace:
[  246.757328]  __schedule+0x2c4/0x700
[  246.758185]  schedule+0x38/0xa0
[  246.758966]  io_schedule+0x12/0x40
[  246.759801]  do_read_cache_page+0x513/0x770
[  246.760761]  ? blkdev_writepages+0x10/0x10
[  246.761692]  ? file_fdatawait_range+0x20/0x20
[  246.762659]  read_part_sector+0x38/0xda
[  246.763554]  read_lba+0x10f/0x220
[  246.764367]  efi_partition+0x1e4/0x6de
[  246.765245]  ? snprintf+0x49/0x60
[  246.766046]  ? is_gpt_valid.part.5+0x430/0x430
[  246.766991]  blk_add_partitions+0x164/0x3f0
[  246.767915]  ? blk_drop_partitions+0x91/0xc0
[  246.768863]  bdev_disk_changed+0x65/0xd0
[  246.769748]  __blkdev_get+0x3c4/0x510
[  246.770595]  blkdev_get+0xaf/0x180
[  246.771394]  __device_add_disk+0x3de/0x4b0
[  246.772302]  virtblk_probe+0x4ba/0x8a0 [virtio_blk]
[  246.773313]  virtio_dev_probe+0x158/0x1f0
[  246.774208]  really_probe+0x255/0x4a0
[  246.775046]  ? __driver_attach_async_helper+0x90/0x90
[  246.776091]  driver_probe_device+0x49/0xc0
[  246.776965]  bus_for_each_drv+0x79/0xc0
[  246.777813]  __device_attach+0xdc/0x160
[  246.778669]  bus_probe_device+0x9d/0xb0
[  246.779523]  device_add+0x418/0x780
[  246.780321]  register_virtio_device+0x9e/0xe0
[  246.781254]  virtio_pci_probe+0xb3/0x140
[  246.782124]  local_pci_probe+0x41/0x90
[  246.782937]  pci_device_probe+0x105/0x1c0
[  246.783807]  really_probe+0x255/0x4a0
[  246.784623]  ? __driver_attach_async_helper+0x90/0x90
[  246.785647]  driver_probe_device+0x49/0xc0
[  246.786526]  bus_for_each_drv+0x79/0xc0
[  246.787364]  __device_attach+0xdc/0x160
[  246.788205]  pci_bus_add_device+0x4a/0x90
[  246.789063]  pci_bus_add_devices+0x2c/0x70
[  246.789916]  pciehp_configure_device+0x91/0x130
[  246.790855]  pciehp_handle_presence_or_link_change+0x334/0x460
[  246.791985]  pciehp_ist+0x1a2/0x1b0
[  246.792768]  ? irq_finalize_oneshot.part.47+0xf0/0xf0
[  246.793768]  irq_thread_fn+0x1f/0x50
[  246.794550]  irq_thread+0xe7/0x170
[  246.795299]  ? irq_forced_thread_fn+0x70/0x70
[  246.796190]  ? irq_thread_check_affinity+0xe0/0xe0
[  246.797147]  kthread+0x116/0x130
[  246.797841]  ? kthread_flush_work_fn+0x10/0x10
[  246.798735]  ret_from_fork+0x22/0x40
[  246.799523] INFO: task sfdisk:1129 blocked for more than 120 seconds.
[  246.800717]       Not tainted 4.18.0-305.el8.x86_64 #1
[  246.801733] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  246.803155] sfdisk          D    0  1129   1107 0x00004080
[  246.804225] Call Trace:
[  246.804827]  __schedule+0x2c4/0x700
[  246.805590]  ? submit_bio+0x3c/0x160
[  246.806373]  schedule+0x38/0xa0
[  246.807089]  schedule_preempt_disabled+0xa/0x10
[  246.807990]  __mutex_lock.isra.6+0x2d0/0x4a0
[  246.808876]  ? wake_up_q+0x80/0x80
[  246.809636]  ? fdatawait_one_bdev+0x20/0x20
[  246.810508]  iterate_bdevs+0x98/0x142
[  246.811304]  ksys_sync+0x6e/0xb0
[  246.812041]  __ia32_sys_sync+0xa/0x10
[  246.812820]  do_syscall_64+0x5b/0x1a0
[  246.813613]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[  246.814652] RIP: 0033:0x7fa9c04924fb
[  246.815431] Code: Unable to access opcode bytes at RIP 0x7fa9c04924d1.
[  246.816655] RSP: 002b:00007fff47661478 EFLAGS: 00000246 ORIG_RAX: 00000000000000a2
[  246.818047] RAX: ffffffffffffffda RBX: 000055d79fc512f0 RCX: 00007fa9c04924fb
[  246.824526] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000055d79fc512f0
[  246.825714] RBP: 0000000000000000 R08: 000055d79fc51012 R09: 0000000000000006
[  246.826941] R10: 000000000000000a R11: 0000000000000246 R12: 00007fa9c075e6e0
[  246.828169] R13: 000055d79fc58c80 R14: 0000000000000001 R15: 00007fff47661590
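
When triaging a guest console log like the one above, it can help to pull out just the hung-task headers. The following is a minimal Python sketch, not part of the original report; the function name find_hung_tasks and the regular expression are illustrative assumptions based only on the "INFO: task ... blocked for more than N seconds" lines shown in the trace.

```python
import re

# Matches the kernel hung-task header as it appears in the trace, e.g.
# "[  246.751028] INFO: task irq/29-pciehp:173 blocked for more than 120 seconds."
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return (task name, pid, seconds) for each hung-task header in log_text."""
    return [
        (m.group("name"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_TASK_RE.finditer(log_text)
    ]


# Two header lines copied from the trace in this report.
sample = """\
[  246.751028] INFO: task irq/29-pciehp:173 blocked for more than 120 seconds.
[  246.799523] INFO: task sfdisk:1129 blocked for more than 120 seconds.
"""

for name, pid, secs in find_hung_tasks(sample):
    print(f"{name} (pid {pid}) blocked for more than {secs}s")
```

Applied to the trace in this report, the sketch would list the pciehp interrupt thread and the sfdisk process, the two tasks reported as blocked.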

This has also been reproduced with PCIe based NICs in the same environment. The full QEMU log including launch command line is provided below.

Version-Release number of selected component (if applicable):

qemu-kvm-block-rbd-5.2.0-16.module+el8.4.0+10806+b7d97207.x86_64
qemu-kvm-block-curl-5.2.0-16.module+el8.4.0+10806+b7d97207.x86_64
qemu-kvm-common-5.2.0-16.module+el8.4.0+10806+b7d97207.x86_64
qemu-kvm-block-ssh-5.2.0-16.module+el8.4.0+10806+b7d97207.x86_64
qemu-kvm-ui-opengl-5.2.0-16.module+el8.4.0+10806+b7d97207.x86_64
qemu-kvm-block-gluster-5.2.0-16.module+el8.4.0+10806+b7d97207.x86_64
qemu-kvm-5.2.0-16.module+el8.4.0+10806+b7d97207.x86_64
qemu-img-5.2.0-16.module+el8.4.0+10806+b7d97207.x86_64
ipxe-roms-qemu-20181214-8.git133f4c47.el8.noarch
qemu-kvm-core-5.2.0-16.module+el8.4.0+10806+b7d97207.x86_64
qemu-kvm-block-iscsi-5.2.0-16.module+el8.4.0+10806+b7d97207.x86_64
qemu-kvm-ui-spice-5.2.0-16.module+el8.4.0+10806+b7d97207.x86_64
libvirt-daemon-driver-qemu-7.0.0-14.module+el8.4.0+10886+79296686.x86_64

How reproducible:

Always.

Steps to Reproduce:
1. Hot plug a PCIe device into a RHEL 8.4 based SEV enabled instance.
2. Attempt to hot unplug said device.

Actual results:
The request to hot unplug fails and the guest OS eventually logs the above trace.

Expected results:
The request to hot unplug succeeds.
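
The bug title attributes the failure to devices being hot plugged without iommu='on'. As a rough illustration of what that check could look like, here is a minimal Python sketch that inspects a libvirt device XML fragment and reports whether its <driver> element sets iommu='on'. The helper name has_iommu_on and both XML fragments are hypothetical and were not taken from the affected environment or from the Nova fix.

```python
import xml.etree.ElementTree as ET


def has_iommu_on(device_xml):
    """Return True if the device's <driver> child element sets iommu='on'."""
    root = ET.fromstring(device_xml)
    driver = root.find("driver")
    return driver is not None and driver.get("iommu") == "on"


# Hypothetical disk definition as it might be hot plugged without the attribute.
disk_without = """<disk type='file' device='disk'>
  <driver name='qemu' type='raw'/>
  <source file='/path/to/volume.img'/>
  <target dev='vdb' bus='virtio'/>
</disk>"""

# The same hypothetical disk with iommu='on' set on the driver element.
disk_with = """<disk type='file' device='disk'>
  <driver name='qemu' type='raw' iommu='on'/>
  <source file='/path/to/volume.img'/>
  <target dev='vdb' bus='virtio'/>
</disk>"""

# Hypothetical virtio interface with no <driver> element at all.
nic_without = """<interface type='bridge'>
  <model type='virtio'/>
</interface>"""

for label, xml_text in [
    ("disk without iommu", disk_without),
    ("disk with iommu", disk_with),
    ("nic without driver element", nic_without),
]:
    print(label, "->", has_iommu_on(xml_text))
```

The sketch only looks at a direct <driver> child, which is where libvirt places the iommu attribute for virtio disks and interfaces; other device types would need their own handling.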

Additional info:

2021-06-02 18:58:48.515+0000: starting up libvirt version: 7.0.0, package: 14.module+el8.4.0+10886+79296686 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2021-05-06-06:29:31, ), qemu version: 5.2.0qemu-kvm-5.2.0-16.module+el8.4.0+10806+b7d97207, kernel: 4.18.0-305.el8.x86_64, hostname: computeamdsev-0.localdomain
LC_ALL=C \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
HOME=/var/lib/libvirt/qemu/domain-182-instance-0000010e \
XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-182-instance-0000010e/.local/share \
XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-182-instance-0000010e/.cache \
XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-182-instance-0000010e/.config \
QEMU_AUDIO_DRV=none \
/usr/libexec/qemu-kvm \
-name guest=instance-0000010e,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-182-instance-0000010e/master-key.aes \
-blockdev '{"driver":"file","filename":"/usr/share/OVMF/OVMF_CODE.secboot.fd","node-name":"libvirt-pflash0-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-pflash0-format","read-only":true,"driver":"raw","file":"libvirt-pflash0-storage"}' \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/qemu/nvram/instance-0000010e_VARS.fd","node-name":"libvirt-pflash1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-pflash1-format","read-only":false,"driver":"raw","file":"libvirt-pflash1-storage"}' \
-machine pc-q35-rhel8.4.0,accel=kvm,usb=off,dump-guest-core=off,memory-encryption=sev0,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format,memory-backend=pc.ram \
-cpu EPYC-Rome,x2apic=on,tsc-deadline=on,hypervisor=on,tsc-adjust=on,spec-ctrl=on,stibp=on,arch-capabilities=on,ssbd=on,xsaves=on,cmp-legacy=on,ibrs=on,amd-ssbd=on,virt-ssbd=on,rdctl-no=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,svm=off,npt=off,nrip-save=off \
-m 2048 \
-object memory-backend-ram,id=pc.ram,size=2147483648 \
-overcommit mem-lock=on \
-smp 2,sockets=2,dies=1,cores=1,threads=1 \
-uuid db27e653-2c69-453e-84f1-6d6189fc61ae \
-smbios 'type=1,manufacturer=Red Hat,product=OpenStack Compute,version=20.6.1-2.20210510134812.10df176.el8ost.2,serial=db27e653-2c69-453e-84f1-6d6189fc61ae,uuid=db27e653-2c69-453e-84f1-6d6189fc61ae,family=Virtual Machine' \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=34,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-no-shutdown \
-boot strict=on \
-device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 \
-device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 \
-device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 \
-device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 \
-device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 \
-device pcie-root-port,port=0x15,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 \
-device pcie-root-port,port=0x16,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 \
-device pcie-root-port,port=0x17,chassis=8,id=pci.8,bus=pcie.0,addr=0x2.0x7 \
-device pcie-root-port,port=0x18,chassis=9,id=pci.9,bus=pcie.0,multifunction=on,addr=0x3 \
-device pcie-root-port,port=0x19,chassis=10,id=pci.10,bus=pcie.0,addr=0x3.0x1 \
-device pcie-root-port,port=0x1a,chassis=11,id=pci.11,bus=pcie.0,addr=0x3.0x2 \
-device pcie-root-port,port=0x1b,chassis=12,id=pci.12,bus=pcie.0,addr=0x3.0x3 \
-device pcie-root-port,port=0x1c,chassis=13,id=pci.13,bus=pcie.0,addr=0x3.0x4 \
-device pcie-root-port,port=0x1d,chassis=14,id=pci.14,bus=pcie.0,addr=0x3.0x5 \
-device pcie-root-port,port=0x1e,chassis=15,id=pci.15,bus=pcie.0,addr=0x3.0x6 \
-device pcie-root-port,port=0x1f,chassis=16,id=pci.16,bus=pcie.0,addr=0x3.0x7 \
-device pcie-root-port,port=0x20,chassis=17,id=pci.17,bus=pcie.0,multifunction=on,addr=0x4 \
-device pcie-root-port,port=0x21,chassis=18,id=pci.18,bus=pcie.0,addr=0x4.0x1 \
-device pcie-root-port,port=0x22,chassis=19,id=pci.19,bus=pcie.0,addr=0x4.0x2 \
-device pcie-root-port,port=0x23,chassis=20,id=pci.20,bus=pcie.0,addr=0x4.0x3 \
-device pcie-root-port,port=0x24,chassis=21,id=pci.21,bus=pcie.0,addr=0x4.0x4 \
-device pcie-root-port,port=0x25,chassis=22,id=pci.22,bus=pcie.0,addr=0x4.0x5 \
-device pcie-root-port,port=0x26,chassis=23,id=pci.23,bus=pcie.0,addr=0x4.0x6 \
-device pcie-root-port,port=0x27,chassis=24,id=pci.24,bus=pcie.0,addr=0x4.0x7 \
-device qemu-xhci,id=usb,bus=pci.2,addr=0x0 \
-blockdev '{"driver":"file","filename":"/var/lib/nova/instances/_base/8b0d11633025a596af2b8dcaf94a639e4f71a0cf","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-2-format","read-only":true,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-2-storage"}' \
-blockdev '{"driver":"file","filename":"/var/lib/nova/instances/db27e653-2c69-453e-84f1-6d6189fc61ae/disk","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":"libvirt-2-format"}' \
-device virtio-blk-pci,iommu_platform=on,bus=pci.3,addr=0x0,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on \
-netdev tap,fd=197,id=hostnet0,vhost=on,vhostfd=198 \
-device virtio-net-pci,rx_queue_size=512,host_mtu=1450,netdev=hostnet0,id=net0,mac=fa:16:3e:e8:31:85,bus=pci.1,addr=0x0,iommu_platform=on \
-add-fd set=3,fd=200 \
-chardev pty,id=charserial0,logfile=/dev/fdset/3,logappend=on \
-device isa-serial,chardev=charserial0,id=serial0 \
-device usb-tablet,id=input0,bus=usb.0,port=1 \
-vnc 172.16.2.147:1 \
-device cirrus-vga,id=video0,bus=pcie.0,addr=0x1 \
-device virtio-balloon-pci,id=balloon0,bus=pci.4,addr=0x0,iommu_platform=on \
-object rng-random,id=objrng0,filename=/dev/urandom \
-device virtio-rng-pci,rng=objrng0,id=rng0,iommu_platform=on,bus=pci.5,addr=0x0 \
-object sev-guest,id=sev0,cbitpos=47,reduced-phys-bits=1,policy=0x33 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
char device redirected to /dev/pts/4 (label charserial0)
2021-06-02T18:58:49.178263Z qemu-kvm: -device cirrus-vga,id=video0,bus=pcie.0,addr=0x1: warning: 'cirrus-vga' is deprecated, please use a different VGA card instead
2021-06-02T18:59:02.271928Z qemu-kvm: Guest says index 352 is available
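
In the launch command line above, every virtio device defined at boot carries iommu_platform=on. A minimal Python sketch for scanning such a command line for virtio devices that lack the option follows. The function name virtio_devices_missing_iommu is an illustrative assumption, and the second -device entry in the sample is hypothetical. Hot-plugged devices do not appear on the launch command line, so this only covers devices present at boot.

```python
import shlex


def virtio_devices_missing_iommu(qemu_cmdline):
    """Return ids of virtio -device entries that lack iommu_platform=on."""
    # Join backslash-continued lines before tokenizing, since shlex would
    # otherwise keep the escaped newline as part of the next token.
    tokens = shlex.split(qemu_cmdline.replace("\\\n", " "))
    missing = []
    for flag, value in zip(tokens, tokens[1:]):
        if flag != "-device":
            continue
        model, *props = value.split(",")
        if not model.startswith("virtio-"):
            continue
        if "iommu_platform=on" not in props:
            # Fall back to the device model when no id= property is given.
            ident = next((p[3:] for p in props if p.startswith("id=")), model)
            missing.append(ident)
    return missing


# The first and last -device lines follow the log above (first one shortened);
# the virtio-disk1 entry is hypothetical and added for illustration only.
sample = r"""/usr/libexec/qemu-kvm \
-device virtio-blk-pci,iommu_platform=on,bus=pci.3,addr=0x0,drive=libvirt-1-format,id=virtio-disk0 \
-device virtio-blk-pci,bus=pci.6,addr=0x0,drive=libvirt-3-format,id=virtio-disk1 \
-device cirrus-vga,id=video0,bus=pcie.0,addr=0x1
"""

print(virtio_devices_missing_iommu(sample))
```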

Comment 7 errata-xmlrpc 2022-09-21 12:15:48 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:6543

