Bug 1975441
| Summary: | lpfc: NULL pointer dereference from lpfc_scsi_unprep_dma_buf | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Robert Peterson <rpeterso> |
| Component: | kernel | Assignee: | Dick Kennedy (Broadcom ECD) <dkennedy> |
| kernel sub component: | Storage Drivers | QA Contact: | Storage QE <storage-qe> |
| Status: | CLOSED CURRENTRELEASE | Docs Contact: | |
| Severity: | high | ||
| Priority: | unspecified | CC: | emilne, nstraz |
| Version: | 8.4 | Flags: | pm-rhel: mirror+ |
| Target Milestone: | beta | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-09-15 12:50:10 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Robert Peterson
2021-06-23 16:44:37 UTC
Note that the crashes are roughly one week apart:

[root@fs-i40c-15 /var/crash]# ls -l
total 0
drwxr-xr-x. 2 root root 67 May 27 12:37 127.0.0.1-2021-05-27-12:37:11
drwxr-xr-x. 2 root root 67 Jun 8 16:00 127.0.0.1-2021-06-08-16:00:15
drwxr-xr-x. 2 root root 67 Jun 15 10:49 127.0.0.1-2021-06-15-10:49:22
drwxr-xr-x. 2 root root 67 Jun 23 12:08 127.0.0.1-2021-06-23-12:07:47

[ 14.404460] mlx5_core 0000:3b:00.1: firmware version: 16.27.6120
[ 14.410509] mlx5_core 0000:3b:00.1: 126.016 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x16 link at 0000:3a:00.0 (capable of 252.048 Gb/s with 16.0 GT/s PCIe x16 link)
[ 14.463254] lpfc 0000:18:00.0: 0:3176 Port Name 0 Physical Link is functional
[ 14.662266] lpfc 0000:18:00.1: 1:2574 IO channels: hdwQ 40 IRQ 40 MRQ: 0
[ 14.682458] scsi host16: Emulex LPe32000 16Gb PCIe Fibre Channel Adapter on PCI bus 18 device 01 irq 775 PCI resettable
[ 14.701239] mlx5_core 0000:3b:00.1: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[ 14.710199] mlx5_core 0000:3b:00.1: E-Switch: Total vports 2, per vport: max uc(1024) max mc(16384)
[ 14.750578] mlx5_core 0000:3b:00.1: Port module event: module 1, Cable plugged
[ 14.758086] mlx5_core 0000:3b:00.1: mlx5_pcie_event:296:(pid 1268): PCIe slot advertised sufficient power (75W).
[ 14.775491] mlx5_core 0000:3b:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
[ 14.950356] lpfc 0000:18:00.1: 1:6448 Dual Dump is enabled
[ 14.978969] mlx5_core 0000:3b:00.0: Supported tc offload range - chains: 4294967294, prios: 4294967295
[ 14.999010] mlx5_core 0000:3b:00.1: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
[ 15.207090] mlx5_core 0000:3b:00.1: Supported tc offload range - chains: 4294967294, prios: 4294967295
[ 15.225678] mlx5_core 0000:3b:00.1 ens1f1: renamed from eth1
[ 15.237503] mlx5_core 0000:3b:00.0 ens1f0: renamed from eth0

Are the VMs doing PCI passthrough for the FC ports?

Is this kernel something that Red Hat ships to customers, or is it only in the lab?
4.18.0-311.el8.kpq1.x86_64

Did it dump? If it did, can you attach the vmcore-dmesg.txt to the BZ? If it did kdump, were you saving the console log output? If so, attach it to the BZ.

My KVM guests were simply using the host lpfc devices as SCSI devices.

I updated my host to a newer kernel and have not seen the problem since. My guests are now typically using the same devices, but as virtio, not SCSI. I also had eng-ops swap the fibre cables to the EMC storage array.

I have noticed that lpfc performance still seems to glitch and pause for long periods of time in a similar way, but it no longer times out and gives me lpfc errors from the kernel. Perhaps the lpfc timeouts were increased or the error paths improved?

I no longer have any vmcore dmesg files on that system. I'll keep using the same hardware and watch for the problem, but I'm not sure we can do anything more on this problem.
|