Bug 2055457

Summary:	kernel NULL pointer dereference while calling dma_pool_alloc from the mlx5_core module [rhel-7.9.z]
Product:	Red Hat Enterprise Linux 7	Reporter:	suresh kumar <surkumar>
Component:	kernel	Assignee:	William Zhao <wizhao>
kernel sub component:	Networking	QA Contact:	Tianhao <tizhao>
Status:	CLOSED ERRATA	Docs Contact:
Severity:	medium
Priority:	unspecified	CC:	arawal, jiji, kpfleming, kzhang, mleitner, nmurray, sukulkar, tizhao
Version:	7.9	Keywords:	OtherQA, ZStream
Target Milestone:	rc	Flags:	pm-rhel: mirror+
Target Release:	7.9
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:	kernel-3.10.0-1160.66.1.el7	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2022-05-18 16:15:32 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	2069951

Description suresh kumar 2022-02-17 02:32:43 UTC

Description of problem:

The system panicked while rebooting:

   
[  755.537074] qla2xxx [0000:37:00.1]-fffa:2: Adapter shutdown
[  755.537140] qla2xxx [0000:37:00.1]-00af:2: Performing ISP error recovery - ha=ffff89243d855000.
[  755.537594] qla2xxx [0000:37:00.1]-fffe:2: Adapter shutdown successfully.
[  755.537596] qla2xxx [0000:37:00.0]-fffa:0: Adapter shutdown
[  755.537685] qla2xxx [0000:37:00.0]-00af:0: Performing ISP error recovery - ha=ffff8905b2f99000.
[  755.538510] qla2xxx [0000:37:00.0]-fffe:0: Adapter shutdown successfully.
[  755.549084] mlx5_core 0000:12:00.1: Shutdown was called            <---------------------------------
[  755.613877] bond0: link status definitely down for interface ens1f1, disabling it
[  756.404455] mlx5_core 0000:12:00.0: Shutdown was called           <---------------------------------
[  756.462899] bond0: link status definitely down for interface ens1f0, disabling it
[  756.462909] bond0: now running without any active interface!
[  757.164336] mlx5_core 0000:12:00.0: mlx5_cmd_check:745:(pid 12649): ACCESS_REG(0x805) op_mod(0x1) failed, status bad system state(0x4), syndrome (0x192deb)
[  757.303748] dlm: closing connection to node 6
[  757.303759] dlm: closing connection to node 5
[  757.303766] dlm: closing connection to node 4
[  757.303773] dlm: closing connection to node 3
[  757.303780] dlm: closing connection to node 2
[  757.303788] dlm: closing connection to node 1
[  757.305729] dlm: data: no userland control daemon, stopping lockspace
[  757.305743] dlm: data1: no userland control daemon, stopping lockspace
[  757.305757] dlm: clvmd: no userland control daemon, stopping lockspace
[  757.305769] dlm: dlm user daemon left 3 lockspaces
[  757.937260] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  758.031397] IP: [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280
[  758.102527] PGD 8000001f970b9067 PUD 1f9b841067 PMD 0 
[  758.164249] Oops: 0000 [#1] SMP 
[  758.202963] Modules linked in: gfs2 dlm bonding falcon_lsm_serviceable(PE) falcon_nf_netcontain(PE) falcon_kal(E) falcon_lsm_pinned_12904(E) sunrpc dm_service_time skx_edac nfit libnvdimm intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr ses enclosure dm_multipath ipmi_si mei_me ipmi_devintf sg lpc_ich hpilo joydev mei wmi hpwdt ipmi_msghandler acpi_power_meter ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 qla2xxx i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm mlx5_core drm crct10dif_pclmul crct10dif_common crc32c_intel serio_raw smartpqi nvme_fc nvme_fabrics nvme_core tg3 scsi_transport_sas scsi_transport_fc scsi_tgt mlxfw devlink
[  759.051524]  ptp drm_panel_orientation_quirks pps_core dm_mirror dm_region_hash dm_log dm_mod
[  759.138374] CPU: 1 PID: 12649 Comm: amsd Kdump: loaded Tainted: P            E  ------------   3.10.0-1160.53.1.el7.x86_64 #1
[  759.274327] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 01/23/2021
[  759.376813] task: ffff8924108f2100 ti: ffff89240e1a0000 task.ti: ffff89240e1a0000
[  759.466751] RIP: 0010:[<ffffffff8ee11acb>]  [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280
[  759.567160] RSP: 0018:ffff89240e1a3968  EFLAGS: 00010046
[  759.630955] RAX: 0000000000000246 RBX: ffff89243d874100 RCX: 0000000000001000
[  759.716709] RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff89243d874090
[  759.802465] RBP: ffff89240e1a39c0 R08: 000000000001f080 R09: ffff8905ffc03c00
[  759.888220] R10: ffffffffc04680d4 R11: ffffffff8edde9fd R12: 00000000000080d0
[  759.973976] R13: ffff89243d874090 R14: ffff89243d874080 R15: 0000000000000000
[  760.059732] FS:  00007fa2fbc6c8c0(0000) GS:ffff892440040000(0000) knlGS:0000000000000000
[  760.156991] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  760.226021] CR2: 0000000000000000 CR3: 0000001f8e2fc000 CR4: 00000000007607e0
[  760.311773] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  760.397529] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  760.483283] PKRU: 55555554
[  760.515707] Call Trace:
[  760.527371] sd 0:0:1:27: rejecting I/O to offline device
[  760.527421] sd 0:0:1:72: rejecting I/O to offline device
[  760.672598]  [<ffffffff8ee2920c>] ? kmem_cache_alloc_trace+0x3c/0x200
[  760.750018]  [<ffffffffc04680f3>] mlx5_alloc_cmd_msg+0xd3/0x2a0 [mlx5_core]
[  760.833716]  [<ffffffffc046ad62>] cmd_exec+0x112/0x860 [mlx5_core]
[  760.907999]  [<ffffffffc046b4fb>] mlx5_cmd_exec+0x2b/0x50 [mlx5_core]
[  760.985428]  [<ffffffffc0475434>] mlx5_core_access_reg+0xe4/0x130 [mlx5_core]
[  761.071222]  [<ffffffffc04a7348>] mlx5e_get_fec_caps+0xa8/0x100 [mlx5_core]
[  761.154933]  [<ffffffffc04992bf>] get_fec_supported_advertised+0x3f/0x150 [mlx5_core]
[  761.249101]  [<ffffffffc049ab36>] mlx5e_get_link_ksettings+0x3a6/0x530 [mlx5_core]
[  761.340112]  [<ffffffff8f25db46>] __ethtool_get_link_ksettings+0xa6/0x210
[  761.421701]  [<ffffffff8f277208>] speed_show+0x78/0xb0
[  761.483417]  [<ffffffff8f0b70e3>] dev_attr_show+0x23/0x60
[  761.548270]  [<ffffffff8f3875f2>] ? mutex_lock+0x12/0x2f
[  761.612076]  [<ffffffff8eedbedf>] sysfs_kf_seq_show+0xcf/0x1f0
[  761.682160]  [<ffffffff8eeda596>] kernfs_seq_show+0x26/0x30
[  761.749101]  [<ffffffff8ee76d10>] seq_read+0x130/0x450
[  761.810813]  [<ffffffff8eedaef5>] kernfs_fop_read+0x105/0x170
[  761.879845]  [<ffffffff8ee4e3ff>] vfs_read+0x9f/0x170
[  761.940510]  [<ffffffff8ee4f27f>] SyS_read+0x7f/0xf0
[  762.000131]  [<ffffffff8f395f92>] system_call_fastpath+0x25/0x2a
[  762.072293] Code: 4c 89 f6 48 89 df 48 89 45 b0 e8 d1 4b 19 00 8b 53 24 48 8b 45 b0 49 89 d7 4c 03 7b 10 83 43 20 01 48 03 53 18 48 89 c6 4c 89 ef <41> 8b 0f 89 4b 24 48 8b 4d b8 48 89 11 e8 e3 9c 57 00 41 81 e4 
[  762.299642] RIP  [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280
[  762.371817]  RSP <ffff89240e1a3968>
[  762.413652] CR2: 0000000000000000


Version-Release number of selected component (if applicable):

kernel 3.10.0-1160.53.1.el7.x86_64


How reproducible:

  The customer can reproduce it with:
     pcs stonith fence <host>

Actual results:
  The system panics while rebooting.

Expected results:
  No panic

Additional info:
  Provided an upstream patch that makes net-sysfs speed_show() also check that the net device is present.

Comment 3 suresh kumar 2022-02-17 02:40:54 UTC

[1]

The system was trying to read /sys/devices/pci0000:11/0000:11:00.0/0000:12:00.0/net/ens1f0/speed and crashed because the device had already been removed.
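Per the backtrace above, the crashing task (amsd, PID 12649) reached speed_show() through an ordinary read() of this attribute: SyS_read → vfs_read → kernfs_fop_read → seq_read → kernfs_seq_show → sysfs_kf_seq_show → dev_attr_show → speed_show. A minimal user-space sketch of such a read follows. The helper read_speed() is hypothetical, and any path passed to it (for example /sys/class/net/ens1f0/speed) is only an illustration. On a kernel with the fix, reading the attribute of a device that is no longer present is expected to fail instead of crashing, since upstream speed_show() returns -EINVAL when it skips the ethtool query.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: read a sysfs "speed" attribute such as
 * /sys/class/net/<ifname>/speed and parse it as a decimal number.
 * Returns 0 and stores the value in *speed on success, -1 on failure.
 * When open() or read() fails, errno is left as that call set it
 * (for example EINVAL if speed_show() declined to report a speed). */
static int read_speed(const char *path, long *speed)
{
    char buf[32];
    char *end;
    ssize_t n;
    long value;
    int fd, saved_errno;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* On sysfs, this read() is what ends up calling speed_show(). */
    n = read(fd, buf, sizeof(buf) - 1);
    saved_errno = errno;
    close(fd);
    if (n <= 0) {
        errno = saved_errno; /* meaningful only if read() itself failed */
        return -1;
    }
    buf[n] = '\0';

    errno = 0;
    value = strtol(buf, &end, 10);
    if (end == buf || errno != 0)
        return -1;

    *speed = value;
    return 0;
}
```

A monitoring agent polling this attribute while the NIC is being shut down matches the pattern seen in this crash; the fix moves the failure from a kernel oops to an error return in the reader.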

+++
crash> mount |grep ffff8924f3b81000
ffff8944bfb28300 ffff8924f3b81000 sysfs  sysfs     /sys      

crash> dentry.d_name.name,d_parent 0xffff892419b10f00
  d_name.name = 0xffff892419b10f38 "speed"
  d_parent = 0xffff89243cbc4cc0
crash> dentry.d_name.name,d_parent 0xffff89243cbc4cc0
  d_name.name = 0xffff89243cbc4cf8 "ens1f0"
  d_parent = 0xffff89443cecbd40
crash> dentry.d_name.name,d_parent 0xffff89443cecbd40
  d_name.name = 0xffff89443cecbd78 "net"
  d_parent = 0xffff89443cc1f740
crash> dentry.d_name.name,d_parent 0xffff89443cc1f740
  d_name.name = 0xffff89443cc1f778 "0000:12:00.0"
  d_parent = 0xffff8924f37eae40
+++
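The crash session above rebuilds the file path by hand: starting from the dentry of "speed", each step prints d_name.name and follows d_parent until it reaches the PCI device directory 0000:12:00.0. The same walk can be sketched in user-space C. struct fake_dentry and dentry_path() are hypothetical stand-ins that keep only the two fields inspected above; they are not the kernel's struct dentry or its path helpers.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for struct dentry: only the fields
 * examined in crash (d_name.name and d_parent). In the real kernel the
 * root dentry's d_parent points to itself; here NULL marks the top. */
struct fake_dentry {
    const char *name;
    const struct fake_dentry *parent;
};

#define MAX_DEPTH 64

/* Follow parent links from a leaf and write "top/.../leaf" into out.
 * Returns 0 on success, -1 if the chain is too deep or out is too small. */
static int dentry_path(const struct fake_dentry *leaf, char *out, size_t outlen)
{
    const char *parts[MAX_DEPTH];
    const struct fake_dentry *d = leaf;
    size_t depth = 0, pos = 0;

    /* Collect names from the leaf upward, like repeated d_parent lookups. */
    while (d != NULL && depth < MAX_DEPTH) {
        parts[depth++] = d->name;
        d = d->parent;
    }
    if (d != NULL || outlen == 0)
        return -1;

    /* Emit the names in reverse order: topmost directory first. */
    out[0] = '\0';
    for (size_t i = depth; i-- > 0;) {
        int n = snprintf(out + pos, outlen - pos, "%s%s", parts[i], i ? "/" : "");
        if (n < 0 || (size_t)n >= outlen - pos)
            return -1;
        pos += (size_t)n;
    }
    return 0;
}
```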


I have submitted the patch below upstream:


   net-sysfs: add check for netdevice being present to speed_show
    
    When bringing down the netdevice or during system shutdown, a panic can
    be triggered while accessing the sysfs path because the device has
    already been removed.
    
        [  755.549084] mlx5_core 0000:12:00.1: Shutdown was called
        [  756.404455] mlx5_core 0000:12:00.0: Shutdown was called
        ...
        [  757.937260] BUG: unable to handle kernel NULL pointer dereference at           (null)
        [  758.031397] IP: [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280
    
        crash> bt
        ...
        PID: 12649  TASK: ffff8924108f2100  CPU: 1   COMMAND: "amsd"
        ...
         #9 [ffff89240e1a38b0] page_fault at ffffffff8f38c778
            [exception RIP: dma_pool_alloc+0x1ab]
            RIP: ffffffff8ee11acb  RSP: ffff89240e1a3968  RFLAGS: 00010046
            RAX: 0000000000000246  RBX: ffff89243d874100  RCX: 0000000000001000
            RDX: 0000000000000000  RSI: 0000000000000246  RDI: ffff89243d874090
            RBP: ffff89240e1a39c0   R8: 000000000001f080   R9: ffff8905ffc03c00
            R10: ffffffffc04680d4  R11: ffffffff8edde9fd  R12: 00000000000080d0
            R13: ffff89243d874090  R14: ffff89243d874080  R15: 0000000000000000
            ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
        #10 [ffff89240e1a39c8] mlx5_alloc_cmd_msg at ffffffffc04680f3 [mlx5_core]
        #11 [ffff89240e1a3a18] cmd_exec at ffffffffc046ad62 [mlx5_core]
        #12 [ffff89240e1a3ab8] mlx5_cmd_exec at ffffffffc046b4fb [mlx5_core]
        #13 [ffff89240e1a3ae8] mlx5_core_access_reg at ffffffffc0475434 [mlx5_core]
        #14 [ffff89240e1a3b40] mlx5e_get_fec_caps at ffffffffc04a7348 [mlx5_core]
        #15 [ffff89240e1a3bb0] get_fec_supported_advertised at ffffffffc04992bf [mlx5_core]
        #16 [ffff89240e1a3c08] mlx5e_get_link_ksettings at ffffffffc049ab36 [mlx5_core]
        #17 [ffff89240e1a3ce8] __ethtool_get_link_ksettings at ffffffff8f25db46
        #18 [ffff89240e1a3d48] speed_show at ffffffff8f277208
        #19 [ffff89240e1a3dd8] dev_attr_show at ffffffff8f0b70e3
        #20 [ffff89240e1a3df8] sysfs_kf_seq_show at ffffffff8eedbedf
        #21 [ffff89240e1a3e18] kernfs_seq_show at ffffffff8eeda596
        #22 [ffff89240e1a3e28] seq_read at ffffffff8ee76d10
        #23 [ffff89240e1a3e98] kernfs_fop_read at ffffffff8eedaef5
        #24 [ffff89240e1a3ed8] vfs_read at ffffffff8ee4e3ff
        #25 [ffff89240e1a3f08] sys_read at ffffffff8ee4f27f
        #26 [ffff89240e1a3f50] system_call_fastpath at ffffffff8f395f92
    
        crash> net_device.state ffff89443b0c0000
          state = 0x5  (__LINK_STATE_START| __LINK_STATE_NOCARRIER)
    
    To prevent this scenario, we also make sure that the netdevice is present.
    
    Signed-off-by: suresh kumar <suresh2514>

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 53ea262ecafd..fbddf966206b 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -213,7 +213,7 @@ static ssize_t speed_show(struct device *dev,
        if (!rtnl_trylock())
                return restart_syscall();
 
-       if (netif_running(netdev)) {
+       if (netif_running(netdev) && netif_device_present(netdev)) {
                struct ethtool_link_ksettings cmd;
 
                if (!__ethtool_get_link_ksettings(netdev, &cmd))
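
The crash session shows net_device.state = 0x5 for the failing device. The small model below, assuming the bit numbers of enum netdev_state_t in include/linux/netdevice.h (__LINK_STATE_START = 0, __LINK_STATE_PRESENT = 1, __LINK_STATE_NOCARRIER = 2), shows why the old condition let speed_show() call into the driver while the patched one does not. In the kernel, netif_running() tests __LINK_STATE_START and netif_device_present() tests __LINK_STATE_PRESENT. With state 0x5, bit 0 is set and bit 1 is clear, so the device still looks "running" although it is no longer present. This is a user-space sketch; the function names are hypothetical.

```c
#include <stdbool.h>

/* Bit numbers mirrored from enum netdev_state_t (only the ones used here). */
enum {
    LINK_STATE_START     = 0, /* __LINK_STATE_START: tested by netif_running() */
    LINK_STATE_PRESENT   = 1, /* __LINK_STATE_PRESENT: tested by netif_device_present() */
    LINK_STATE_NOCARRIER = 2  /* __LINK_STATE_NOCARRIER: no carrier on the link */
};

/* User-space equivalent of the kernel's test_bit() on dev->state. */
static bool state_bit(unsigned long state, int bit)
{
    return (state >> bit) & 1UL;
}

/* Condition checked by speed_show() before the patch. */
static bool speed_show_old_guard(unsigned long state)
{
    return state_bit(state, LINK_STATE_START);
}

/* Condition checked by speed_show() after the patch. */
static bool speed_show_new_guard(unsigned long state)
{
    return state_bit(state, LINK_STATE_START) &&
           state_bit(state, LINK_STATE_PRESENT);
}
```

For state 0x5 the old guard is true (so __ethtool_get_link_ksettings() and mlx5e_get_link_ksettings() run against the shut-down device), while the new guard is false and speed_show() returns its error value without touching the driver.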



Provided a test kernel to the customer, who confirmed that the system no longer panics during fence testing.


[2]
A similar issue was reported earlier at https://bugzilla.redhat.com/show_bug.cgi?id=1845694#c9, but no further information was provided and that bug was closed.

Comment 4 Marcelo Ricardo Leitner 2022-02-17 14:25:48 UTC

(In reply to suresh kumar from comment #3)
> I have submitted below patch to upstream:
> 
> 
>    net-sysfs: add check for netdevice being present to speed_show

Thanks Suresh.

That's:
https://lore.kernel.org/netdev/20220217015518.62719-1-sureshks%40redhat.com/T/

Comment 8 Tianhao 2022-03-28 09:06:10 UTC

Hi suresh,

Could the customer help test the fix?

Regards,
Tianhao

Comment 9 suresh kumar 2022-03-29 01:12:11 UTC

Hi Tianhao,

Yes. They also helped test our earlier test kernel.

Comment 10 Tianhao 2022-03-29 02:08:16 UTC

Based on comment #9, set OtherQA and qa_ack+.

Comment 11 William Zhao 2022-04-11 18:49:07 UTC

Hi Tianhao,

This is currently set to OtherQA and I haven't seen much movement for a while. I was wondering if there is any action needed on my side.

Comment 20 Tianhao 2022-05-09 06:20:33 UTC

The tier2 NIC functional tests pass on kernel-3.10.0-1160.66.1.el7 with the mlx5_core driver.

Test includes:
scaling: pass
set up topology via NetworkManager and reboot: mostly pass; the VLAN-over-bridge topology failed, but it also fails on RHEL-7.9

related job:
https://beaker.engineering.redhat.com/jobs/6093585
https://beaker.engineering.redhat.com/jobs/6093586

No regressions were found in testing. Based on the tier1 test results on the dt kernel and the tier2 test results on the candidate kernel, set VERIFIED.

Comment 21 Tianhao 2022-05-09 06:21:46 UTC

(In reply to Tianhao from comment #20)
> https://beaker.engineering.redhat.com/jobs/6093585
> https://beaker.engineering.redhat.com/jobs/6093586
The job link should be:
https://beaker.engineering.redhat.com/jobs/6600152

Comment 25 errata-xmlrpc 2022-05-18 16:15:32 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: kernel security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:4642