Bug 2156536 - [RHEL9.2] nvme_user_space_driver Bus error (core dumped) when install guest
Summary: [RHEL9.2] nvme_user_space_driver Bus error (core dumped) when install guest
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: qemu-kvm
Version: 9.2
Hardware: aarch64
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
: ---
Assignee: Eric Auger
QA Contact: Zhenyu Zhang
URL:
Whiteboard:
Depends On:
Blocks: 2158686
 
Reported: 2022-12-27 11:34 UTC by Zhenyu Zhang
Modified: 2023-03-14 18:07 UTC
CC List: 19 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2158686 (view as bug list)
Environment:
Last Closed: 2023-02-09 08:37:27 UTC
Type: ---
Target Upstream Version:
Embargoed:
pm-rhel: mirror+


Attachments


Links:
System                   ID               Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker    RHELPLAN-143241  0        None      None    None     2022-12-27 11:44:48 UTC

Description Zhenyu Zhang 2022-12-27 11:34:37 UTC
Description of problem:
Installing a guest on an NVMe device through the QEMU NVMe userspace driver fails:
the guest hangs and qemu-img hits "Bus error (core dumped)".

Version-Release number of selected component (if applicable):
Test Environment:
Host Distro: RHEL-9.2.0-20221226.0
Host Kernel: 5.14.0-226.el9.aarch64
Qemu-kvm: qemu-kvm-7.2.0-2.el9.aarch64
Guest Kernel: 5.14.0-226.el9.aarch64
edk2: edk2-aarch64-20221207gitfff6d81270b5-1.el9.noarch

How reproducible:
100%

Steps to Reproduce:
1. Set up the host NVMe device and bind it to vfio-pci (a consolidated sketch of these commands follows the reproduction steps)
[root@ampere-hr350a-07 ~]# modprobe vfio-pci
[root@ampere-hr350a-07 ~]# modinfo vfio-pci
filename:       /lib/modules/5.14.0-226.el9.aarch64/kernel/drivers/vfio/pci/vfio-pci.ko.xz
description:    VFIO PCI - User Level meta-driver
author:         Alex Williamson <alex.williamson>
license:        GPL v2
rhelversion:    9.2
srcversion:     A326814B1A0356BEFB22207
alias:          vfio_pci:v*d*sv*sd*bc*sc*i*
depends:        vfio-pci-core
intree:         Y
name:           vfio_pci
vermagic:       5.14.0-226.el9.aarch64 SMP preempt mod_unload modversions aarch64
sig_id:         PKCS#7
signer:         Red Hat Enterprise Linux kernel signing key
sig_key:        0D:4E:DB:4A:00:71:20:EC:E4:2A:9C:E4:91:88:40:D5:1A:DB:FC:A3
sig_hashalgo:   sha256
signature:      9B:4F:5A:25:FC:7F:37:2E:C0:20:19:1B:A5:E7:06:DF:D2:3C:89:B4:
		04:73:44:5C:F4:6B:0B:F4:A7:74:3B:E8:24:E3:AB:E2:0A:C1:34:B4:
		E1:25:F9:64:51:94:74:AA:EE:64:50:B2:3B:09:FF:4D:DA:1C:9D:AF:
		58:2B:84:96:03:05:8F:91:F8:94:9F:79:4B:09:A9:E5:79:5E:B2:08:
		B9:AF:BF:F9:61:C2:F9:EC:A3:38:F3:9E:30:BE:96:04:40:1B:08:8C:
		0C:AF:22:1A:EE:FF:9A:AC:5E:90:E4:AC:97:37:06:DB:E2:A2:2F:38:
		32:2E:09:0A:3A:DE:82:86:AE:10:2D:63:35:95:9B:E4:3C:FA:DB:C7:
		4E:E8:9D:96:D0:47:25:2F:8C:5A:78:05:6F:77:3D:84:7D:D0:02:7D:
		18:6F:84:1B:95:3B:2E:AE:FC:B7:FF:55:6E:12:11:AC:5C:3E:D3:B6:
		D9:B1:E2:CB:C2:02:37:AA:B8:F2:47:4F:83:11:13:94:5B:6D:5D:F5:
		88:7B:96:34:41:F1:07:B3:FC:F1:E7:FF:52:A0:6B:85:19:CA:DE:E5:
		0C:EE:77:F6:BB:B7:6E:70:11:A6:AF:A8:EE:64:6B:B6:A1:61:1A:BA:
		4D:D1:D4:DC:F8:98:B1:88:7B:B5:8B:5D:A2:17:99:B6:D1:CC:7A:49:
		19:71:F7:7F:2C:F7:1D:9C:6B:65:F7:37:87:D0:31:0A:2A:52:26:50:
		59:31:8C:13:9A:9F:E0:75:0F:BA:45:54:AD:AE:65:D8:04:55:88:FA:
		19:0D:94:E2:6F:94:65:8A:57:AB:97:D7:E9:5A:AE:B4:64:37:17:3D:
		C7:2F:FC:E4:B9:CE:FF:B4:E0:47:99:51:53:22:AC:82:5C:80:BB:6B:
		D3:D0:41:B5:37:F9:3D:D6:57:AC:95:A6:BD:34:71:FD:C1:8D:EB:07:
		87:8E:AC:67:9B:69:57:EF:94:4F:81:7C:93:F0:DC:82:31:D1:79:43:
		BD:FA:21:6B
parm:           ids:Initial PCI IDs to add to the vfio driver, format is "vendor:device[:subvendor[:subdevice[:class[:class_mask]]]]" and multiple comma separated entries can be specified (string)
parm:           nointxmask:Disable support for PCI 2.3 style INTx masking.  If this resolves problems for specific devices, report lspci -vvvxxx to linux-pci.org so the device can be fixed automatically via the broken_intx_masking flag. (bool)
parm:           disable_idle_d3:Disable using the PCI D3 low power state for idle, unused devices (bool)
parm:           enable_sriov:Enable support for SR-IOV configuration.  Enabling SR-IOV on a PF typically requires support of the userspace PF driver, enabling VFs without such support may result in non-functional VFs or PF. (bool)
parm:           disable_denylist:Disable use of device denylist. Disabling the denylist allows binding to devices with known errata that may lead to exploitable stability or security issues when accessed by untrusted users. (bool)

# lspci | grep -i 'Non-Volatile'
0004:01:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller]
0005:01:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller]

# lspci -n -s 0005:01:00.0
0005:01:00.0 0108: 8086:0a54

# echo 0005:01:00.0 > /sys/bus/pci/devices/0005:01:00.0/driver/unbind
# echo 8086 0a54 > /sys/bus/pci/drivers/vfio-pci/new_id

2. Create a qcow2 image on the NVMe device
[root@ampere-hr350a-07 ~]# /usr/bin/qemu-img create -f qcow2 nvme://0005:01:00.0/1 20G
Formatting 'nvme://0005:01:00.0/1', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=21474836480 lazy_refcounts=off refcount_bits=16
[root@ampere-hr350a-07 ~]# qemu-img info -U nvme://0005:01:00.0/1
image: nvme://0005:01:00.0/1
file format: qcow2
virtual size: 20 GiB (21474836480 bytes)
disk size: unavailable
cluster_size: 65536
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
    extended l2: false

3. Install a guest on the image
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1'  \
-sandbox on  \
-blockdev node-name=file_aavmf_code,driver=file,filename=/usr/share/edk2/aarch64/QEMU_EFI-silent-pflash.raw,auto-read-only=on,discard=unmap \
-blockdev node-name=drive_aavmf_code,driver=raw,read-only=on,file=file_aavmf_code \
-blockdev node-name=file_aavmf_vars,driver=file,filename=/root/avocado/data/avocado-vt/avocado-vt-vm1_rhel920-aarch64_nvme_direct_raw_VARS.fd,auto-read-only=on,discard=unmap \
-blockdev node-name=drive_aavmf_vars,driver=raw,read-only=off,file=file_aavmf_vars \
-machine virt,gic-version=host,memory-backend=mem-machine_mem,pflash0=drive_aavmf_code,pflash1=drive_aavmf_vars \
-device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
-device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0  \
-nodefaults \
-device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
-device virtio-gpu-pci,bus=pcie-root-port-1,addr=0x0 \
-m 253952 \
-object '{"qom-type": "memory-backend-ram", "size": 266287972352, "id": "mem-machine_mem"}'  \
-smp 16,maxcpus=16,cores=8,threads=1,clusters=1,sockets=2  \
-cpu 'host' \
-chardev socket,wait=off,server=on,path=/var/tmp/monitor-qmpmonitor1,id=qmp_id_qmpmonitor1  \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-chardev socket,wait=off,server=on,path=/var/tmp/monitor-catch_monitor,id=qmp_id_catch_monitor  \
-mon chardev=qmp_id_catch_monitor,mode=control  \
-serial unix:'/var/tmp/serial-serial0',server=on,wait=off \
-device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
-device qemu-xhci,id=usb1,bus=pcie-root-port-2,addr=0x0 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-blockdev '{"node-name": "nvme_image1", "driver": "nvme", "auto-read-only": true, "discard": "unmap", "device": "0005:01:00.0", "namespace": 1, "cache": {"direct": false, "no-flush": false}}' \
-blockdev '{"node-name": "drive_image1", "driver": "raw", "read-only": false, "cache": {"direct": false, "no-flush": false}, "file": "nvme_image1"}' \
-device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
-device '{"driver": "virtio-blk-pci", "id": "image1", "drive": "drive_image1", "bootindex": 1, "write-cache": "off", "bus": "pcie-root-port-3", "addr": "0x0"}' \
-device pcie-root-port,id=pcie-root-port-4,port=0x4,addr=0x1.0x4,bus=pcie.0,chassis=5 \
-device virtio-net-pci,mac=9a:91:18:a0:16:a9,rombar=0,id=idsTPBvn,netdev=idsuVgf9,bus=pcie-root-port-4,addr=0x0  \
-netdev tap,id=idsuVgf9,vhost=on \
-device pcie-root-port,id=pcie-root-port-5,port=0x5,addr=0x1.0x5,bus=pcie.0,chassis=6 \
-device '{"id": "virtio_scsi_pci0", "driver": "virtio-scsi-pci", "bus": "pcie-root-port-5", "addr": "0x0"}' \
-blockdev '{"node-name": "file_cd1", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "threads", "filename": "/home/kvm_autotest_root/iso/linux/RHEL-9.2.0-20221227.0-aarch64-dvd1.iso", "cache": {"direct": true, "no-flush": false}}' \
-blockdev '{"node-name": "drive_cd1", "driver": "raw", "read-only": true, "cache": {"direct": true, "no-flush": false}, "file": "file_cd1"}' \
-device '{"driver": "scsi-cd", "id": "cd1", "drive": "drive_cd1", "bootindex": 2, "write-cache": "on"}' \
-blockdev '{"node-name": "file_unattended", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "threads", "filename": "/home/kvm_autotest_root/images/rhel920-aarch64/ks.iso", "cache": {"direct": true, "no-flush": false}}' \
-blockdev '{"node-name": "drive_unattended", "driver": "raw", "read-only": true, "cache": {"direct": true, "no-flush": false}, "file": "file_unattended"}' \
-device '{"driver": "scsi-cd", "id": "unattended", "drive": "drive_unattended", "bootindex": 3, "write-cache": "on"}'  \
-kernel '/home/kvm_autotest_root/images/rhel920-aarch64/vmlinuz'  \
-append 'inst.sshd inst.repo=cdrom inst.ks=cdrom:/ks.cfg net.ifnames=0 console=ttyAMA0,38400'  \
-initrd '/home/kvm_autotest_root/images/rhel920-aarch64/initrd.img'  \
-vnc :20  \
-rtc base=utc,clock=host,driftfix=slew  \
-no-shutdown \
-enable-kvm \
-monitor stdio 
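
As referenced in step 1, the bind commands can be collected into a small helper. A minimal sketch, assuming the same BDF (0005:01:00.0) and vendor:device pair (8086 0a54) reported by lspci in step 1; the script name and layout are illustrative only:

#!/bin/bash
# Sketch: bind the NVMe controller to vfio-pci (consolidates the step 1 commands).
BDF=0005:01:00.0
VD="8086 0a54"

modprobe vfio-pci

# Release the device from whatever driver currently owns it (e.g. the in-kernel nvme driver).
if [ -e /sys/bus/pci/devices/$BDF/driver ]; then
    echo $BDF > /sys/bus/pci/devices/$BDF/driver/unbind
fi

# Let vfio-pci claim devices with this vendor:device ID
# (the write fails with EEXIST if the ID was already added, which is harmless here).
echo $VD > /sys/bus/pci/drivers/vfio-pci/new_id 2>/dev/null || true

# Verify the binding: this should print a path ending in drivers/vfio-pci.
readlink /sys/bus/pci/devices/$BDF/driver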

Actual results:
The guest hangs during installation; the guest kernel reports hung tasks:
2022-12-27 02:07:47: 192.168.122.209Dec 27 06:10:45 localhost.localdomain kernel: INFO: task dd:2693 blocked for more than 122 seconds.
2022-12-27 02:07:47: 192.168.122.209Dec 27 06:10:45 localhost.localdomain kernel:       Not tainted 5.14.0-226.el9.aarch64 #1
2022-12-27 02:07:47: 192.168.122.209Dec 27 06:10:45 localhost.localdomain kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2022-12-27 02:07:47: 192.168.122.209Dec 27 06:10:45 localhost.localdomain kernel: task:dd              state:D stack:    0 pid: 2693 ppid:  2686 flags:0x00000204
2022-12-27 02:07:47: 192.168.122.209Dec 27 06:10:45 localhost.localdomain kernel: Call trace:
2022-12-27 02:07:47: 192.168.122.209Dec 27 06:10:45 localhost.localdomain kernel:  __switch_to+0xc8/0x110
2022-12-27 02:07:47: 192.168.122.209Dec 27 06:10:45 localhost.localdomain kernel:  __schedule+0x220/0x520
2022-12-27 02:07:47: 192.168.122.209Dec 27 06:10:45 localhost.localdomain kernel:  schedule+0x60/0xe0
2022-12-27 02:07:47: 192.168.122.209Dec 27 06:10:45 localhost.localdomain kernel:  io_schedule+0x40/0x60
2022-12-27 02:07:47: 192.168.122.209Dec 27 06:10:45 localhost.localdomain kernel:  bit_wait_io+0x1c/0x6c
2022-12-27 02:07:47: 192.168.122.209Dec 27 06:10:45 localhost.localdomain kernel:  __wait_on_bit+0x9c/0x190
2022-12-27 02:07:47: 192.168.122.209Dec 27 06:10:45 localhost.localdomain kernel:  out_of_line_wait_on_bit+0x84/0xb0
2022-12-27 02:07:47: 192.168.122.209Dec 27 06:10:45 localhost.localdomain kernel:  __block_write_begin_int+0x218/0x4cc
2022-12-27 02:07:47: 192.168.122.209Dec 27 06:10:45 localhost.localdomain kernel:  block_write_begin+0x54/0xa0
2022-12-27 02:07:47: 192.168.122.209Dec 27 06:10:45 localhost.localdomain kernel:  blkdev_write_begin+0x30/0x40
2022-12-27 02:07:47: 192.168.122.209Dec 27 06:10:45 localhost.localdomain kernel:  generic_perform_write+0xac/0x1e0
2022-12-27 02:07:47: 192.168.122.209Dec 27 06:10:45 localhost.localdomain kernel:  __generic_file_write_iter+0xe4/0x1a0
2022-12-27 02:07:47: 192.168.122.209Dec 27 06:10:45 localhost.localdomain kernel:  blkdev_write_iter+0xfc/0x180
2022-12-27 02:07:47: 192.168.122.209Dec 27 06:10:45 localhost.localdomain kernel:  new_sync_write+0xd4/0x160
2022-12-27 02:07:47: 192.168.122.209Dec 27 06:10:45 localhost.localdomain kernel:  vfs_write+0x1a0/0x1ec
2022-12-27 02:07:47: 192.168.122.209Dec 27 06:10:45 localhost.localdomain kernel:  ksys_write+0x6c/0xf4
2022-12-27 02:07:47: 192.168.122.209Dec 27 06:10:45 localhost.localdomain kernel:  __arm64_sys_write+0x20/0x2c
2022-12-27 02:07:47: 192.168.122.209Dec 27 06:10:45 localhost.localdomain kernel:  invoke_syscall.constprop.0+0x7c/0xd0
2022-12-27 02:07:47: 192.168.122.209Dec 27 06:10:45 localhost.localdomain kernel:  el0_svc_common.constprop.0+0x154/0x160
2022-12-27 02:07:47: 192.168.122.209Dec 27 06:10:45 localhost.localdomain kernel:  do_el0_svc+0x2c/0xb0
2022-12-27 02:07:47: 192.168.122.209Dec 27 06:10:45 localhost.localdomain kernel:  el0_svc+0x38/0x190
2022-12-27 02:07:47: 192.168.122.209Dec 27 06:10:45 localhost.localdomain kernel:  el0t_64_sync_handler+0x9c/0x120
2022-12-27 02:07:47: 192.168.122.209Dec 27 06:10:45 localhost.localdomain kernel:  el0t_64_sync+0x174/0x178

On the host, qemu-img then hits "Bus error (core dumped)":
[root@ampere-hr350a-07 home]# echo "/home/core" > /proc/sys/kernel/core_pattern
[root@ampere-hr350a-07 home]# ulimit -c unlimited
[root@ampere-hr350a-07 home]# qemu-img info nvme://0005:01:00.0/1
Bus error (core dumped)
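
With the core_pattern and ulimit set as above, a backtrace can be pulled from the core with gdb. A minimal sketch, assuming the core file really landed at /home/core and that qemu-img debug symbols are available (for example via dnf debuginfo-install, if debuginfo repositories are enabled):

# Print the backtrace and register state of the crashed qemu-img process from the core file.
gdb -batch -ex 'bt' -ex 'info registers' /usr/bin/qemu-img /home/core

Per comment 18 below, the fault is on the first read of the mapped NVMe controller registers (regs->cap in QEMU's NVMe block driver).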

Comment 18 Stefan Hajnoczi 2023-01-10 18:49:33 UTC
(Just out of interest, this is not high priority.)

SIGBUS from this line:

  cap = le64_to_cpu(regs->cap);

This 64-bit field is 4KB aligned:

  regs = 0xffff87629000

  typedef struct QEMU_PACKED NvmeBar {
      uint64_t    cap;
      ...

The mapping function call looks okay to me as well:

    regs = qemu_vfio_pci_map_bar(s->vfio, 0, 0, sizeof(NvmeBar),
                                 PROT_READ | PROT_WRITE, errp);

I'm not sure why there is a SIGBUS.

Eric: Any ideas how the VFIO BAR mapping could raise a SIGBUS on a memory load?

Comment 20 Eric Auger 2023-01-16 07:46:57 UTC
(In reply to Stefan Hajnoczi from comment #18)
> (Just out of interest, this is not high priority.)
> 
> SIGBUS from this line:
> 
>   cap = le64_to_cpu(regs->cap);
> 
> This 64-bit field is 4KB aligned:
> 
>   regs = 0xffff87629000
> 
>   typedef struct QEMU_PACKED NvmeBar {
>       uint64_t    cap;
>       ...
> 
> The mapping function call looks okay to me as well:
> 
>     regs = qemu_vfio_pci_map_bar(s->vfio, 0, 0, sizeof(NvmeBar),
>                                  PROT_READ | PROT_WRITE, errp);
> 
> I'm not sure why there is a SIGBUS.
> 
> Eric: Any ideas how the VFIO BAR mapping could raise a SIGBUS on a memory
> load?

Stefan, one case that comes to my mind is if memory access is disabled on the PCI device. We hit a similar case in the past on Arm, but I fail to remember the exact circumstances.

https://lists.archive.carbon60.com/linux/kernel/3714001
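
One way to probe that hypothesis from the host is to check whether memory decoding (the Memory Space Enable bit in the PCI COMMAND register) is on for the device. A minimal sketch using setpci/lspci from pciutils, assuming the BDF from the reproducer; setting the bit by hand is only meant as a diagnostic aid:

# Read the 16-bit COMMAND register; bit 1 is Memory Space Enable.
setpci -s 0005:01:00.0 COMMAND

# lspci shows the decoded state as "Mem+" (enabled) or "Mem-" (disabled).
lspci -s 0005:01:00.0 -vv | grep 'Control:'

# Diagnostic only: set bit 1 without touching the other bits (value:mask form).
setpci -s 0005:01:00.0 COMMAND=0x02:0x02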

Comment 26 Eric Auger 2023-02-09 08:12:34 UTC
Hi Zhenyu,
Sure, please close it. It would be best to run this kind of test as part of our VFIO test plan.
Thanks

Eric

Comment 27 Zhenyu Zhang 2023-02-09 08:37:27 UTC
OK, we will keep this test scenario on the aarch64 platform.

Closing this bug as CURRENTRELEASE per comments 24 through 26.

