This bug has been migrated to another issue tracking site. It has been closed here and may no longer be monitored.

If you would like to get updates for this issue, or to participate in it, you may do so at the Red Hat Issue Tracker.
Bug 2210141 - [RHEL-9.3] BUG: kernel NULL pointer dereference, address: 0000000000000004
Summary: [RHEL-9.3] BUG: kernel NULL pointer dereference, address: 0000000000000004
Keywords:
Status: CLOSED MIGRATED
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: openmpi
Version: 9.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Kamal Heib
QA Contact: Brian Chae
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-05-25 20:57 UTC by Brian Chae
Modified: 2023-09-21 13:40 UTC (History)
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-09-21 13:40:04 UTC
Type: Bug
Target Upstream Version:
Embargoed:
pm-rhel: mirror+


Links
System                | ID              | Private | Priority | Status   | Summary | Last Updated
Red Hat Issue Tracker | RHEL-6076       | 0       | None     | Migrated | None    | 2023-09-21 13:40:00 UTC
Red Hat Issue Tracker | RHELPLAN-158233 | 0       | None     | None     | None    | 2023-05-25 20:58:41 UTC

Description Brian Chae 2023-05-25 20:57:29 UTC
Description of problem:

During RDMA OpenMPI testing, the RDMA server crashed and its console reported the following traceback.

[ 5619.703370] BUG: kernel NULL pointer dereference, address: 0000000000000004 
[ 5619.704044] #PF: supervisor read access in kernel mode 
[ 5619.704649] #PF: error_code(0x0000) - not-present page 
[ 5619.705273] PGD 0 P4D 0  
[ 5619.705758] Oops: 0000 [#1] PREEMPT SMP PTI 
[ 5619.706331] CPU: 4 PID: 0 Comm: swapper/4 Kdump: loaded Tainted: G          I       -------  ---  5.14.0-316.el9.x86_64 #1 
[ 5619.707327] Hardware name: HP ProLiant DL320e Gen8, BIOS J05 11/09/2013 
[ 5619.708129] RIP: 0010:fwevtq_handler+0x3d/0x140 [cxgb4] 
[ 5619.708889] Code: 3c a5 0f 85 9f 00 00 00 48 8b 8a e8 01 00 00 41 8b 00 0f c8 25 ff ff 01 00 2b 81 18 84 00 00 48 8b 89 28 84 00 00 48 8b 34 c1 <8b> 46 04 48 83 46 20 01 85 c0 74 1a 48 8d be 90 00 00 00 f0 48 0f 
[ 5619.711309] RSP: 0018:ffffb39f0038cd58 EFLAGS: 00010202 
[ 5619.712167] RAX: 00000000000003c2 RBX: ffff9ae1d6307ec0 RCX: ffff9ae1c1530000 
[ 5619.713548] RDX: ffff9ae1d6307ec0 RSI: 0000000000000000 RDI: ffff9ae1d6307ec0 
[ 5619.714939] RBP: 0000000000000040 R08: ffff9ae1d6a100c8 R09: a096aa9ce0080af0 
[ 5619.716375] R10: 0000000000000008 R11: ffffffff9aa060c0 R12: 0000000000000000 
[ 5619.717817] R13: ffff9ae1d6300000 R14: ffff9ae1d6a100f0 R15: ffff9ae1d6307ec0 
[ 5619.719298] FS:  0000000000000000(0000) GS:ffff9ae8aed00000(0000) knlGS:0000000000000000 
[ 5619.720422] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
[ 5619.721848] CR2: 0000000000000004 CR3: 00000001484d8006 CR4: 00000000001706e0 
[ 5619.723406] Call Trace: 
[ 5619.724285]  <IRQ> 
[ 5619.725506]  process_responses+0x3bd/0x4c0 [cxgb4] 
[ 5619.726933]  ? ip_list_rcv+0x135/0x160 
[ 5619.727921]  ? __netif_receive_skb_list_core+0x29f/0x2c0 
[ 5619.729009]  ? netif_receive_skb_list_internal+0x1e4/0x300 
[ 5619.730085]  napi_rx_handler+0x13/0x100 [cxgb4] 
[ 5619.731572]  __napi_poll+0x2a/0x170 
[ 5619.732932]  net_rx_action+0x233/0x2f0 
[ 5619.733986]  __do_softirq+0xca/0x2ac 
[ 5619.735036]  __irq_exit_rcu+0xb9/0xf0 
[ 5619.736078]  common_interrupt+0x80/0xa0 
[ 5619.737112]  </IRQ> 
[ 5619.738389]  <TASK> 
[ 5619.767379] rupt+0x22/0x40 
[ 5619.839853] RIP: 0010:cpuidle_enter_state+0xd2/0x400 
[ 5619.841030] Code: 49 89 c5 0f 1f 44 00 00 31 ff e8 69 56 8b ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 11 03 00 00 31 ff e8 22 90 91 ff fb 45 85 f6 <0f> 88 15 01 00 00 49 63 d6 4c 2b 2c 24 48 8d 04 52 48 8d 04 82 49 
[ 5619.844253] RSP: 0018:ffffb39f000dbe80 EFLAGS: 00000202 
[ 5619.845495] RAX: ffff9ae8aed30bc0 RBX: 0000000000000001 RCX: 000000000000001f 
[ 5619.847253] RDX: 0000000000000000 RSI: 0000000025bb8b00 RDI: 0000000000000000 
[ 5619.849049] RBP: ffff9ae1c1431800 R08: 0000051c706c31af R09: 0000000000000018 
[ 5619.850837] R10: 0000000000000082 R11: 00000000000000c6 R12: ffffffff9aec34c0 
[ 5619.852659] R13: 0000051c706c31af R14: 0000000000000001 R15: 0000000000000000 
[ 5619.854552]  cpuidle_enter+0x29/0x40 
[ 5619.855843]  cpuidle_idle_call+0xfa/0x160 
[ 5619.857154]  do_idle+0x78/0xe0 
[ 5619.858778]  cpu_startup_entry+0x19/0x20 
[ 5619.860106]  start_secondary+0x10d/0x130 
[ 5619.861421]  secondary_startup_64_no_verify+0xe5/0xeb 
[ 5619.862768]  </TASK> 
[ 5619.864229] Modules linked in: nfsd auth_rpcgss nfs_acl lockd grace rpcrdma rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi 8021q garp mrp stp llc rfkill sunrpc ext4 mbcache mi_msghandler gpio_ich lpc_ich acpi_power_meter ie31200_edac drm_shmem_helper drm_kms_helper rapl syscopyarea intel_cstate sysfillrect sysimgblt intel_uncore fb_sys_fops drm fuse xfs libcrc32c csiostor sd_mod t10_pi sg ahci libahci crct10dif_pclmul crc32_pclmul crc32c_intel libata cxgb4 ghash_clmulni_intel tg3 serio_raw hpwdt tls scsi_transport_fc 
[ 5620.375027] CR2: 0000000000000004 
[    0.000000] Linux version 5.14.0-316.el9.x86_64 (mockbuild.eng.bos.redhat.com) (gcc (GCC) 11.3.1 20221121 (Red Hat 11.3.1-4), GNU ld version 2.35.2-39.el9) #1 SMP PREEMPT_DYNAMIC Fri May 19 13:18:40 EDT 2023 
[    0.000000] The list of certified hardware and cloud instances for Red Hat Enterprise Linux 9 can be viewed at the Red Hat Ecosystem Catalog, https://catalog.redhat.com. 
[    0.000000] Command line: elfcorehdr=0xdd000000 BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.14.0-316.el9.x86_64 ro intel_idle.max_cstate=0 intremap=no_x2apic_optout processor.max_cstate=0 console=tty0 rd_NO_PLYMOUTH resume=UUID=69aba99a-5910-4b3d-9ed6-28c216879007 console=ttyS1,115200n81 irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 acpi_no_memhotplug transparent_hugepage=never nokaslr hest_disable novmcoredd cma=0 hugetlb_cma=0 disable_cpu_apicid=0 hpwdt.pretimeout=0 hpwdt.kdumptimeout=0 
[    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers' 
[    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers' 
[    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers' 
[    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256 
[    0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format. 
[    0.000000] signal: max sigframe size: 1776 
[    0.000000] BIOS-provided physical RAM map: 
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x0000000000000fff] reserved 
[    0.000000] BIOS-e820: [mem 0x0000000000001000-0x00000000000997ff] usable 
[    0.000000] BIOS-e820: [mem 0x0000000000099800-0x0000000000099bff] reserved 
[    0.000000] BIOS-e820: [mem 0x000000000009e000-0x000000000009ffff] reserved 
[    0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved 
[    0.000000] BIOS-e820: [mem 0x00000000dd001000-0x00000000ecffffff] usable 
[    0.000000] BIOS-e820: [mem 0x0000e4000-0x00000000eddedfff] ACPI data 
[    0.000000] BIOS-e820: [mem 0x00000000eddee00-0x00000000f3ffffff] reserved 
[    0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fee0ffff] reserved 
[    0.000000] BIOS-e820: [mem 0x00000000ff800000-0x00000000ffffffff] reserved 
[    0.000000] NX (Execute Disable) protection: active 
[    0.000000] SMBIOS 2.7 present. 
[    0.000000] DMI: HP ProLiant DL320e Gen8, BIOS J05 11/09/2013 
[    0.000000] tsc: Fast TSC calibration using PIT 
[    0.000000] tsc: Detected 3392.277 MHz processor 
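
A note on reading the oops above (not part of the original report): in a `Code:` line, the byte in angle brackets is the first byte of the faulting instruction. Here `8b 46 04` is a 32-bit load from `RSI + 0x4`, and `RSI` is 0, which matches the fault address in `CR2`. The Python sketch below checks that arithmetic. It is a minimal illustration that handles only this one addressing form (`MOV r32, [base + disp8]`, with no prefix or SIB byte); the helper names are hypothetical, and it says nothing about the root cause.

```python
import re

# Excerpts copied from the oops above (timestamps dropped, Code: line shortened).
CODE_EXCERPT = "48 8b 34 c1 <8b> 46 04 48 83 46 20 01"
REGS_TEXT = """
RAX: 00000000000003c2 RBX: ffff9ae1d6307ec0 RCX: ffff9ae1c1530000
RDX: ffff9ae1d6307ec0 RSI: 0000000000000000 RDI: ffff9ae1d6307ec0
CR2: 0000000000000004 CR3: 00000001484d8006 CR4: 00000000001706e0
"""

# ModRM r/m encodings without a REX prefix; index 4 means "SIB byte follows".
BASE_REGS = ["RAX", "RCX", "RDX", "RBX", None, "RBP", "RSI", "RDI"]


def parse_registers(text):
    """Return {register name: value} for every 'NAME: 16-hex-digits' pair."""
    pairs = re.findall(r"\b([A-Z0-9]+): ([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}


def decode_faulting_load(code):
    """Decode the <marked> instruction if it is MOV r32, [base + disp8]."""
    tokens = code.split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    opcode = int(tokens[start].strip("<>"), 16)
    modrm = int(tokens[start + 1], 16)
    mod, rm = modrm >> 6, modrm & 0b111
    if opcode != 0x8B or mod != 0b01 or BASE_REGS[rm] is None:
        raise ValueError("this sketch only handles MOV r32, [base + disp8]")
    disp = int(tokens[start + 2], 16)
    if disp >= 0x80:  # disp8 is a signed byte
        disp -= 0x100
    return BASE_REGS[rm], disp


base, disp = decode_faulting_load(CODE_EXCERPT)
regs = parse_registers(REGS_TEXT)
print(f"{base} + {disp:#x} = {regs[base] + disp:#x}, CR2 = {regs['CR2']:#x}")
```

The instruction just before it, `48 8b 34 c1`, loads `RSI` from `[RCX + RAX*8]`, so one plausible reading is that the handler fetched a NULL pointer from a table and then dereferenced it. That is an interpretation of the register dump, not a confirmed diagnosis.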

Please refer to the following console log:

http://lab-02.hosts.prod.upshift.rdu2.redhat.com/beaker/logs/recipes/13968+/13968366/console.log

RDMA OPENMPI test beaker job ID:
https://beaker.engineering.redhat.com/jobs/7890428 [ RS:11794138 / R:13968366 ]



Version-Release number of selected component (if applicable):



Clients: rdma-perf-06
Servers: rdma-dev-13

DISTRO=RHEL-9.3.0-20230521.45

+ [23-05-25 12:02:40] cat /etc/redhat-release
Red Hat Enterprise Linux release 9.3 Beta (Plow)

+ [23-05-25 12:02:40] uname -a
Linux rdma-dev-13.rdma.lab.eng.rdu2.redhat.com 5.14.0-316.el9.x86_64 #1 SMP PREEMPT_DYNAMIC Fri May 19 13:18:40 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux

+ [23-05-25 12:02:40] cat /proc/cmdline
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.14.0-316.el9.x86_64 root=UUID=7c453c45-2eaa-4f0f-afbb-79a6d9e70ca3 ro intel_idle.max_cstate=0 intremap=no_x2apic_optout processor.max_cstate=0 console=tty0 rd_NO_PLYMOUTH crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M resume=UUID=69aba99a-5910-4b3d-9ed6-28c216879007 console=ttyS1,115200n81

+ [23-05-25 12:02:40] rpm -q rdma-core linux-firmware
rdma-core-44.0-2.el9.x86_64
linux-firmware-20230404-134.el9.noarch

+ [23-05-25 12:02:40] tail /sys/class/infiniband/cxgb4_0/fw_ver
1.27.1.0

+ [23-05-25 12:02:40] lspci
+ [23-05-25 12:02:40] grep -i -e ethernet -e infiniband -e omni -e ConnectX
03:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5717 Gigabit Ethernet PCIe (rev 10)
03:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5717 Gigabit Ethernet PCIe (rev 10)
05:00.0 Ethernet controller: Chelsio Communications Inc T62100-LP-CR Unified Wire Ethernet Controller
05:00.1 Ethernet controller: Chelsio Communications Inc T62100-LP-CR Unified Wire Ethernet Controller
05:00.2 Ethernet controller: Chelsio Communications Inc T62100-LP-CR Unified Wire Ethernet Controller
05:00.3 Ethernet controller: Chelsio Communications Inc T62100-LP-CR Unified Wire Ethernet Controller
05:00.4 Ethernet controller: Chelsio Communications Inc T62100-LP-CR Unified Wire Ethernet Controller


How reproducible:
Seen only once so far.

Steps to Reproduce:
1. Run the OpenMPI tests from the Beaker job referenced above.

Actual results:


Expected results:


Additional info:

Comment 1 RHEL Program Management 2023-09-21 13:39:43 UTC
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

Comment 2 RHEL Program Management 2023-09-21 13:40:04 UTC
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated. Be sure to add yourself to the Jira issue's "Watchers" field to continue receiving updates, and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of two footprints next to it and begin with "RHEL-" followed by an integer. You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567
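
The search described above can also be opened directly as a URL. The sketch below is a hypothetical helper (the function name is not from Red Hat's documentation) that percent-encodes the JQL expression and appends it to the search address given in this comment. It assumes the "Bugzilla Bug" field accepts the bare bug number, as in the example.

```python
from urllib.parse import quote

# Search address quoted in the comment above.
JIRA_SEARCH_BASE = "https://issues.redhat.com/issues/?jql="


def jira_search_url(bz_number: int) -> str:
    """Build a Jira search URL for the issue migrated from a Bugzilla bug."""
    jql = f'"Bugzilla Bug" = {bz_number}'
    # safe="" also encodes the quotes, spaces, and "=" inside the JQL text.
    return JIRA_SEARCH_BASE + quote(jql, safe="")


# This bug's number, taken from the page title.
print(jira_search_url(2210141))
```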

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.

