Bug 1897576
Summary: | SAN Switch rebooted and caused (?) OpenStack compute node to reboot | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | ggrimaux | |
Component: | kernel | Assignee: | Dick Kennedy (Broadcom ECD) <dkennedy> | |
kernel sub component: | Storage Drivers | QA Contact: | Lin Li <lilin> | |
Status: | CLOSED ERRATA | Docs Contact: | ||
Severity: | high | |||
Priority: | high | CC: | acaringi, agk, bmarzins, bubrown, cwei, dkennedy, emilne, gconsalv, lilin, loberman, mircea.vutcovici, msnitzer, nmurray, nyewale, revers, toneata | |
Version: | 7.7 | Keywords: | Triaged, ZStream | |
Target Milestone: | rc | |||
Target Release: | 7.9 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | kernel-3.10.0-1160.39.1.el7 | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1984118 (view as bug list) | Environment: | ||
Last Closed: | 2021-08-31 09:09:32 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1982096, 1984118 |
Description
ggrimaux
2020-11-13 13:56:08 UTC
There's nothing in that stack trace that points to multipath. That code path is for the LPFC Fibre Channel driver. Reassigning to the storage drivers team.

Do you still have the logs from that machine or the vmcore-dmesg?

Hi Dick, Yes, I have that info. I am adding the last part of it, which I think is what you need:

[571553.704834] sd 15:0:9:0: alua: port group 01 state A preferred supports tolusnA [571553.705012] sd 15:0:9:0: alua: port group 01 state A preferred supports tolusnA [571553.711131] scsi 17:0:22:0: alua: Detached [571553.711255] device-mapper: multipath: Failing path 128:176. [571553.711265] device-mapper: multipath: Failing path 70:160. [571553.711274] device-mapper: multipath: Failing path 71:64. [571553.711283] device-mapper: multipath: Failing path 71:32. [571553.711292] device-mapper: multipath: Failing path 71:208. [571553.711300] device-mapper: multipath: Failing path 71:240. [571553.711309] device-mapper: multipath: Failing path 128:144. [571553.711317] device-mapper: multipath: Failing path 128:112. [571553.711331] device-mapper: multipath: Failing path 128:96. [571553.711342] device-mapper: multipath: Failing path 128:0.
[571553.719778] sd 15:0:9:0: alua: port group 01 state A preferred supports tolusnA [571553.719939] sd 15:0:9:0: alua: port group 01 state A preferred supports tolusnA [571553.721913] BUG: unable to handle kernel NULL pointer dereference at 0000000000000090 [571553.730230] IP: [<ffffffffc069ce13>] __lpfc_sli_release_iocbq_s4+0x63/0x260 [lpfc] [571553.738094] PGD 800000af8eb13067 PUD bdd0685067 PMD 0 [571553.743514] Oops: 0000 [#1] SMP [571553.747017] Modules linked in: veth macsec binfmt_misc vhost_net vhost macvtap macvlan tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag fuse ebtable_filter ebtables tun ip6table_security ip6table_raw ip6table_mangle iptable_raw iptable_mangle overlay(T) vrouter(OE) 8021q garp mrp sch_ingress bonding openvswitch nf_nat_ipv6 nls_utf8 isofs nf_log_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables nf_log_ipv4 nf_log_common xt_LOG xt_comment xt_multiport xt_conntrack iptable_filter ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_addrtype iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_umad rpcrdma rdma_ucm ib_uverbs ib_iser rdma_cm iw_cm ib_cm libiscsi iTCO_wdt iTCO_vendor_support [571553.820894] bnxt_re ib_core skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass dell_smbios dcdbas dell_wmi_descriptor pcspkr pcc_cpufreq i2c_i801 mei_me lpc_ich mei ipmi_si ipmi_devintf ipmi_msghandler tpm_crb acpi_power_meter acpi_pad ses enclosure scsi_transport_sas nf_conntrack sg br_netfilter bridge stp llc ip_tables xfs libcrc32c dm_service_time sd_mod lpfc mgag200 i2c_algo_bit drm_kms_helper crc32_pclmul crc32c_intel ghash_clmulni_intel syscopyarea aesni_intel sysfillrect lrw sysimgblt gf128mul fb_sys_fops glue_helper nvmet_fc ablk_helper ttm cryptd nvmet tg3 scsi_transport_iscsi crc_t10dif crct10dif_generic ahci crct10dif_pclmul nvme_fc drm nvme_fabrics libahci ptp bnxt_en megaraid_sas 
nvme_core pps_core scsi_transport_fc libata scsi_tgt crct10dif_common devlink [571553.893851] wmi drm_panel_orientation_quirks dm_multipath nfit libnvdimm sunrpc dm_mirror dm_region_hash dm_log dm_mod [571553.904439] CPU: 15 PID: 2674 Comm: lpfc_worker_2 Kdump: loaded Tainted: G OE ------------ T 3.10.0-1062.12.1.el7.x86_64 #1 [571553.917568] Hardware name: Dell Inc. PowerEdge R740/0JMK61, BIOS 2.6.4 04/09/2020 [571553.925642] task: ffff9fe7f067b150 ti: ffff9fe7e6fe4000 task.ti: ffff9fe7e6fe4000 [571553.933732] RIP: 0010:[<ffffffffc069ce13>] [<ffffffffc069ce13>] __lpfc_sli_release_iocbq_s4+0x63/0x260 [lpfc] [571553.944407] RSP: 0018:ffff9fe7e6fe7b20 EFLAGS: 00010046 [571553.950373] RAX: 0000000000100000 RBX: ffffa047eab88e00 RCX: 000000010040002e [571553.958166] RDX: 0000000000000000 RSI: ffffa047eab88e00 RDI: ffffa047f4c1c000 [571553.965958] RBP: ffff9fe7e6fe7b50 R08: ffffa0478ffc7c40 R09: 000000010040002e [571553.974153] R10: 000000008ffc7f01 R11: ffffa0478ffc7c40 R12: ffffa047f4c1c000 [571553.981969] R13: ffff9fe7f8874060 R14: ffffa047eab88e00 R15: ffffa047f4c1c000 [571553.989788] FS: 0000000000000000(0000) GS:ffffa0487d1c0000(0000) knlGS:0000000000000000 [571553.999033] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [571554.005500] CR2: 0000000000000090 CR3: 0000009a9fab4000 CR4: 00000000007627e0 [571554.013454] PKRU: 00000000 [571554.016910] Call Trace: [571554.020139] [<ffffffffc069f407>] lpfc_sli_release_iocbq+0x37/0x60 [lpfc] [571554.027696] [<ffffffffc06bd83e>] lpfc_els_free_iocb+0x14e/0x1d0 [lpfc] [571554.035084] [<ffffffffc06c1b03>] lpfc_cmpl_els_prli+0xe3/0x210 [lpfc] [571554.042392] [<ffffffffc06a60bd>] lpfc_sli_sp_handle_rspiocb+0x3fd/0x780 [lpfc] [571554.050491] [<ffffffffc06cea36>] ? lpfc_mbx_cmpl_reg_login+0xe6/0x160 [lpfc] [571554.058423] [<ffffffff9c2af1f5>] ? 
mod_timer+0x1b5/0x230 [571554.064628] [<ffffffffc06b0172>] lpfc_sli_handle_slow_ring_event_s4+0x192/0x260 [lpfc] [571554.073466] [<ffffffffc06a03e2>] lpfc_sli_handle_slow_ring_event+0x12/0x20 [lpfc] [571554.081862] [<ffffffffc06d3afc>] lpfc_work_done+0x94c/0x14a0 [lpfc] [571554.089046] [<ffffffff9c9805c2>] ? __schedule+0x402/0x840 [571554.095378] [<ffffffffc06d46c0>] lpfc_do_work+0x70/0x1e0 [lpfc] [571554.102236] [<ffffffff9c2c72e0>] ? wake_up_atomic_t+0x30/0x30 [571554.108927] [<ffffffffc06d4650>] ? lpfc_work_done+0x14a0/0x14a0 [lpfc] [571554.116398] [<ffffffff9c2c61f1>] kthread+0xd1/0xe0 [571554.122246] [<ffffffff9c2c6120>] ? insert_kthread_work+0x40/0x40 [571554.129181] [<ffffffff9c98dd1d>] ret_from_fork_nospec_begin+0x7/0x21 [571554.136453] [<ffffffff9c2c6120>] ? insert_kthread_work+0x40/0x40 [571554.143383] Code: 28 48 c7 00 00 00 00 00 4d 85 ed 0f 84 a6 00 00 00 8b 86 74 01 00 00 a9 00 00 80 00 0f 85 76 01 00 00 48 8b 97 98 02 00 00 a8 40 <4c> 8b b2 90 00 00 00 74 0b 41 83 7d 24 02 0f 85 19 01 00 00 4d [571554.165427] RIP [<ffffffffc069ce13>] __lpfc_sli_release_iocbq_s4+0x63/0x260 [lpfc] [571554.173936] RSP <ffff9fe7e6fe7b20> [571554.178233] CR2: 0000000000000090

Above that part it's only more multipath failures, which I don't think you want. Let me know if that's enough or not. Thanks a lot!

Hi Dick. Any update on this? Thanks.

Can you attach the whole vmcore-dmesg file?

Hi Dick, I added the two vmcore-dmesg files from two distinct crashes that seem to point to the same issue. If you need anything else, please let me know. Thank you.

Hi Dick, Could we have an update on this please? Thank you.

Hi Dick, Could we have an update on this please? Thank you.

I was hoping that you would provide the whole vmcore-dmesg file. The output you have put in the BZ is a start, but I wanted to see whether, when the switch is rebooted, it sends RSCNs first or the link goes down.
I have not seen this on RHEL 7 before, which is why I want to understand the actual steps that the switch takes. The NULL dereference in the release might be an overwrite, since all the functions in the call stack had to use that same pointer. I will look closer at the trace and update the BZ.

Hi Dick, The vmcore-dmesg is complete. The problem is that dmesg is stored in a circular buffer of 1MB, which matches the file size. The ring size is configured by the CONFIG_LOG_BUF_SHIFT kernel option, and it is 20 (2^20 = 1MB). You can see this config variable with:

grep CONFIG_LOG_BUF_SHIFT /boot/config-$(uname -r)

Thank you, Mircea

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=34523484
Can you try this brew build? I added a check for els_wq before the dereference.

Hi Dick, Sorry, I should have set the flag for backport to RHEL 7.7. We need to keep the kernel the same for the OpenStack version. This particular setup is using a RHEL 7.7 kernel. Also, is this safe to run in production? Thank you.

DO NOT RUN TEST KERNELS FROM ENGINEERING IN A PRODUCTION ENVIRONMENT. We need GSS involvement in case there are other support issues. cc: Laurence

Dick, please attach the patch you used.

Your kernel 3.10.0-1062.12.1.el7 is pretty old and you are missing a bunch of CVE fixes. Can you run a later version of 7.7.z?

@ggrimaux Support will usually hand out test kernels, but we always use the BZ #. We also always save the src.rpm. Please follow this as a best practice if Eng builds these. This is done so that we can track the kernel back to a BZ if a customer installs a test kernel and just leaves it there even though they are not supposed to. If we do not build it from a BZ, we use the sfdc case #. We also always add this:

NOTE: This RPM has been provided by Red Hat for testing purposes only and is NOT supported for any other use.
This RPM may contain changes that are necessary for debugging but that are not appropriate for other uses, or that are not compatible with third-party hardware or software. This RPM should NOT be deployed for purposes other than testing and debugging.

Thanks a lot, Laurence Oberman

Also note: a best practice is to add log_buf_len=64M to the kernel grub line to avoid the ring buffer wrapping around. I am hoping to get that to be the default, as we seldom have small-memory servers in support now. Regards, Laurence

From a8548c7fac49bf8bfe456215a55219983488ba06 Mon Sep 17 00:00:00 2001
From: Dick Kennedy <dkennedy>
Date: Tue, 26 Jan 2021 11:32:16 -0500
Subject: [rhel-7.7.z PATCH e-stor] Fix for crash in __lpfc_sli_release_iocbq_s4

Moved the pring assignment inside an if check of the els_wq.
---
 drivers/scsi/lpfc/lpfc_sli.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index e73b5866d062..b3c35fb36047 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -1254,8 +1254,6 @@ __lpfc_sli_release_iocbq_s4(struct lpfc_hba *phba, struct lpfc_iocbq *iocbq)
     lockdep_assert_held(&phba->hbalock);
-    lockdep_assert_held(&phba->hbalock);   << This needs to be removed in a different patch.
-
     if (iocbq->sli4_xritag == NO_XRI)
         sglq = NULL;
     else
@@ -1275,7 +1273,6 @@ __lpfc_sli_release_iocbq_s4(struct lpfc_hba *phba, struct lpfc_iocbq *iocbq)
         goto out;
     }
-    pring = phba->sli4_hba.els_wq->pring;
     if ((iocbq->iocb_flag & LPFC_EXCHANGE_BUSY) &&
         (sglq->state != SGL_XRI_ABORTED)) {
         spin_lock_irqsave(&phba->sli4_hba.sgl_list_lock,
@@ -1295,8 +1292,11 @@ __lpfc_sli_release_iocbq_s4(struct lpfc_hba *phba, struct lpfc_iocbq *iocbq)
                 &phba->sli4_hba.sgl_list_lock, iflag);
             /* Check if TXQ queue needs to be serviced */
-            if (!list_empty(&pring->txq))
-                lpfc_worker_wake_up(phba);
+            if (phba->sli4_hba.els_wq) {
+                pring = phba->sli4_hba.els_wq->pring;
+                if (!list_empty(&pring->txq))
+                    lpfc_worker_wake_up(phba);
+            }
         }
     }
--
2.18.1

That patch is not upstream yet; it is in our queue to go upstream.
https://marc.info/?l=linux-scsi&m=161308728219646&w=2
Posted but not accepted yet.

Dick Kennedy is a partner engineer from Broadcom and cannot see private comments. You need to make your comments and attachments public for him to read them. Or, we can add the Broadcom ECD group to this BZ so they can see more things (I think you can do this on a per-comment basis in BZ now). Having said that, the stack trace does look like the same problem.
[170618.556788] BUG: unable to handle kernel NULL pointer dereference at 0000000000000090 [170618.564978] IP: [<ffffffffc0647e13>] __lpfc_sli_release_iocbq_s4+0x63/0x260 [lpfc] [170618.572896] PGD 0 [170618.575206] Oops: 0000 [#1] SMP [170618.578754] Modules linked in: vhost_net vhost macvtap macvlan dm_service_time udp_diag unix_diag af_packet_diag netlink_diag tcp_diag inet_diag fuse ebtable_filter ebtables ip6table_security ip6table_raw ip6table_mangle iptable_raw iptable_mangle tun overlay(T) vrouter(OE) 8021q garp mrp sch_ingress bonding openvswitch nf_nat_ipv6 nls_utf8 isofs nf_log_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables nf_log_ipv4 nf_log_common xt_LOG xt_comment xt_multiport xt_conntrack iptable_filter ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_addrtype iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp rpcrdma sunrpc rdma_ucm ib_iser ib_umad rdma_cm ib_ipoib libiscsi scsi_transport_iscsi ib_cm iw_cm dm_multipath [170618.651968] skx_edac nfit libnvdimm intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel dm_mod lrw gf128mul glue_helper ablk_helper cryptd pcspkr ses enclosure mlx5_ib sg mei_me mei ib_uverbs hpwdt hpilo lpc_ich wmi ib_core tpm_crb ipmi_si pcc_cpufreq ipmi_devintf ipmi_msghandler acpi_power_meter nf_conntrack br_netfilter bridge stp llc ip_tables xfs libcrc32c sd_mod lpfc mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops mlx5_core ttm drm nvmet_fc nvmet crc32c_intel crc_t10dif crct10dif_generic crct10dif_pclmul nvme_fc smartpqi nvme_fabrics tg3 nvme_core mlxfw scsi_transport_fc devlink scsi_transport_sas scsi_tgt ptp crct10dif_common drm_panel_orientation_quirks pps_core uas usb_storage [170618.723887] CPU: 14 PID: 15800 Comm: lpfc_worker_1 Kdump: loaded Tainted: G OE ------------ T 3.10.0-1062.9.1.el7.x86_64 #1 [170618.737113]
Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 11/13/2019 [170618.746288] task: ffffa144ef06d230 ti: ffffa144e6620000 task.ti: ffffa144e6620000 [170618.754426] RIP: 0010:[<ffffffffc0647e13>] [<ffffffffc0647e13>] __lpfc_sli_release_iocbq_s4+0x63/0x260 [lpfc] [170618.765164] RSP: 0018:ffffa144e6623b20 EFLAGS: 00010046 [170618.771156] RAX: 0000000000100000 RBX: ffffa084f12d4800 RCX: 0000000100400035 [170618.778982] RDX: 0000000000000000 RSI: ffffa084f12d4800 RDI: ffffa084ccf58000 [170618.786803] RBP: ffffa144e6623b50 R08: ffffa144ddb48900 R09: 0000000100400035 [170618.794636] R10: 00000000ddb48e01 R11: ffffa144ddb48900 R12: ffffa084ccf58000 [170618.802472] R13: ffffa084ca85ede0 R14: ffffa084f12d4800 R15: ffffa084ccf58000 [170618.810320] FS: 0000000000000000(0000) GS:ffffa0857fb80000(0000) knlGS:0000000000000000 [170618.819158] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [170618.825632] CR2: 0000000000000090 CR3: 000000a5c1e56000 CR4: 00000000007627e0 [170618.833514] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [170618.841392] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [170618.849272] PKRU: 00000000 [170618.852703] Call Trace: [170618.855904] [<ffffffffc064a407>] lpfc_sli_release_iocbq+0x37/0x60 [lpfc] [170618.863466] [<ffffffffc066883e>] lpfc_els_free_iocb+0x14e/0x1d0 [lpfc] [170618.870858] [<ffffffffc066cb03>] lpfc_cmpl_els_prli+0xe3/0x210 [lpfc] [170618.878165] [<ffffffffc06510bd>] lpfc_sli_sp_handle_rspiocb+0x3fd/0x780 [lpfc] [170618.886272] [<ffffffffc0679a36>] ? lpfc_mbx_cmpl_reg_login+0xe6/0x160 [lpfc] [170618.894210] [<ffffffffb2eaf1f5>] ? mod_timer+0x1b5/0x230 [170618.900410] [<ffffffffc065b172>] lpfc_sli_handle_slow_ring_event_s4+0x192/0x260 [lpfc] [170618.909240] [<ffffffffc064b3e2>] lpfc_sli_handle_slow_ring_event+0x12/0x20 [lpfc] [170618.917641] [<ffffffffc067eafc>] lpfc_work_done+0x94c/0x14a0 [lpfc] [170618.924821] [<ffffffffb35805a2>] ? 
__schedule+0x402/0x840 [170618.931139] [<ffffffffc067f6c0>] lpfc_do_work+0x70/0x1e0 [lpfc] [170618.937983] [<ffffffffb2ec72e0>] ? wake_up_atomic_t+0x30/0x30 [170618.944637] [<ffffffffc067f650>] ? lpfc_work_done+0x14a0/0x14a0 [lpfc] [170618.952063] [<ffffffffb2ec61f1>] kthread+0xd1/0xe0 [170618.957736] [<ffffffffb2ec6120>] ? insert_kthread_work+0x40/0x40 [170618.964626] [<ffffffffb358dd1d>] ret_from_fork_nospec_begin+0x7/0x21 [170618.971871] [<ffffffffb2ec6120>] ? insert_kthread_work+0x40/0x40 [170618.978750] Code: 28 48 c7 00 00 00 00 00 4d 85 ed 0f 84 a6 00 00 00 8b 86 74 01 00 00 a9 00 00 80 00 0f 85 76 01 00 00 48 8b 97 98 02 00 00 a8 40 <4c> 8b b2 90 00 00 00 74 0b 41 83 7d 24 02 0f 85 19 01 00 00 4d [170618.999774] RIP [<ffffffffc0647e13>] __lpfc_sli_release_iocbq_s4+0x63/0x260 [lpfc] [170619.008228] RSP <ffffa144e6623b20> [170619.012467] CR2: 0000000000000090

The stack trace that Ewan is talking about is the same problem. It is the same kernel as before. The kernel that I built had a different rev. Anyway, I have not posted the patch; I am still trying to get my GitLab config up. Ewan, how do I proceed?

The BZ has to be cloned for RHEL8, and the problem fixed in RHEL8 before it is fixed in RHEL7, to avoid a regression if an upgrade occurs to RHEL8. When the patch is accepted upstream, you need to submit an MR to 8.5 and 7.9.z. If a fix is desired in an earlier zStream, then we need to request that it be fixed, e.g. in 7.7.z and 7.8.z, unless the system can upgrade to 7.9.z.

Gregoire, please advise if the customer is able to upgrade to 7.9.z. It could take a while for a zStream fix to be available once the patch is accepted upstream.

Hi Ewan, The client can't update to the 7.9 kernel. We will need this to be backported to the 7.7 branch. I added a hotfix flag on this BZ. Thank you.

Dick, what is the status of the fix for this? It will need an MR for 7.9.z after being fixed in RHEL8.

Hi, Could we have the status of this backport to RHEL 7.7?
Thanks, Gianluca

Dick, Is the patch appropriate for backport to rhel7.7.z, do you know?

Hi Dick/Rob/Ewan, I hit a firmware bug with kernel-3.10.0-1160.39.1.el7. Could you check if it is related to your patches? Thanks in advance!

beaker job: https://beaker.engineering.redhat.com/recipes/10442345#task130056041
console log: https://beaker-archive.host.prod.eng.bos.redhat.com/beaker-logs/2021/08/56724/5672435/10442345/console.log

[19074.943674] WARNING: CPU: 0 PID: 44837 at drivers/base/firmware_class.c:1035 _request_firmware.isra.9+0x686/0x6d0 [19074.997999] Modules linked in: nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel dm_service_time lrw gf128mul glue_helper ablk_helper cryptd ipmi_ssif sp5100_tco joydev pcspkr sg i2c_piix4 k10temp fam15h_power hpwdt hpilo ipmi_si ipmi_devintf ipmi_msghandler dm_multipath acpi_power_meter ip_tables xfs libcrc32c sd_mod radeon lpfc i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm nvmet_fc nvmet ahci crc_t10dif crct10dif_generic ata_generic nvme_fc pata_acpi nvme_fabrics drm nvme_core libahci pata_atiixp scsi_transport_fc be2net libata crct10dif_pclmul crc32c_intel hpsa scsi_tgt serio_raw crct10dif_common netxen_nic drm_panel_orientation_quirks scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod [19075.387473] CPU: 0 PID: 44837 Comm: kworker/0:1 Kdump: loaded Not tainted 3.10.0-1160.39.1.el7.x86_64 #1 [19075.432443] Hardware name: HP ProLiant DL585 G7, BIOS A16 06/04/2013 [19075.465445] Workqueue: events netxen_fwinit_work [netxen_nic] [19075.495491] Call Trace: [19075.508598] [<ffffffffb8783539>] dump_stack+0x19/0x1b [19075.536225] [<ffffffffb809b278>] __warn+0xd8/0x100 [19075.562766] [<ffffffffb809b3bd>] warn_slowpath_null+0x1d/0x20 [19075.593099] [<ffffffffb84cdb96>] _request_firmware.isra.9+0x686/0x6d0 [19075.624609] [<ffffffffb80e8ea3>] ?
load_balance+0x1a3/0xa10 [19075.651368] [<ffffffffb84cdc0e>] request_firmware+0x2e/0x40 [19075.680176] [<ffffffffc02b3d4c>] netxen_request_firmware+0xec/0x640 [netxen_nic] [19075.715214] [<ffffffffc02aab7a>] ? netxen_nic_hw_read_wx_2M+0x3a/0x100 [netxen_nic] [19075.753809] [<ffffffffc02aab7a>] ? netxen_nic_hw_read_wx_2M+0x3a/0x100 [netxen_nic] [19075.790748] [<ffffffffc02ae829>] netxen_start_firmware+0x1e9/0xbd0 [netxen_nic] [19075.826922] [<ffffffffb8035c19>] ? sched_clock+0x9/0x10 [19075.851777] [<ffffffffb80de305>] ? sched_clock_cpu+0x85/0xc0 [19075.880966] [<ffffffffc02aab7a>] ? netxen_nic_hw_read_wx_2M+0x3a/0x100 [netxen_nic] [19075.917192] [<ffffffffc02aab7a>] ? netxen_nic_hw_read_wx_2M+0x3a/0x100 [netxen_nic] [19075.955616] [<ffffffffc02af2f6>] netxen_fwinit_work+0xe6/0x230 [netxen_nic] [19075.989153] [<ffffffffb80bde8f>] process_one_work+0x17f/0x440 [19076.020666] [<ffffffffb80befa6>] worker_thread+0x126/0x3c0 [19076.050745] [<ffffffffb80bee80>] ? manage_workers.isra.26+0x2a0/0x2a0 [19076.084552] [<ffffffffb80c5e61>] kthread+0xd1/0xe0 [19076.109377] [<ffffffffb80c5d90>] ? insert_kthread_work+0x40/0x40 [19076.140165] [<ffffffffb8795de4>] ret_from_fork_nospec_begin+0xe/0x21 [19076.174197] [<ffffffffb80c5d90>] ? 
insert_kthread_work+0x40/0x40 [19076.203364] ---[ end trace a779021c7b160b5c ]--- [19076.226207] netxen_nic 0000:04:00.0: firmware: phanfw.bin will not be loaded [19076.261036] ------------[ cut here ]------------ [19076.282669] WARNING: CPU: 0 PID: 44837 at drivers/base/firmware_class.c:1035 _request_firmware.isra.9+0x686/0x6d0 [19076.333154] Modules linked in: nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel dm_service_time lrw gf128mul glue_helper ablk_helper cryptd ipmi_ssif sp5100_tco joydev pcspkr sg i2c_piix4 k10temp fam15h_power hpwdt hpilo ipmi_si ipmi_devintf ipmi_msghandler dm_multipath acpi_power_meter ip_tables xfs libcrc32c sd_mod radeon lpfc i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm nvmet_fc nvmet ahci crc_t10dif crct10dif_generic ata_generic nvme_fc pata_acpi nvme_fabrics drm nvme_core libahci pata_atiixp scsi_transport_fc be2net libata crct10dif_pclmul crc32c_intel hpsa scsi_tgt serio_raw crct10dif_common netxen_nic drm_panel_orientation_quirks scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod [19076.717631] CPU: 0 PID: 44837 Comm: kworker/0:1 Kdump: loaded Tainted: G W ------------ 3.10.0-1160.39.1.el7.x86_64 #1 [19076.774951] Hardware name: HP ProLiant DL585 G7, BIOS A16 06/04/2013 [19076.806867] Workqueue: events netxen_fwinit_work [netxen_nic] [19076.833392] Call Trace: [19076.845041] [<ffffffffb8783539>] dump_stack+0x19/0x1b [19076.871202] [<ffffffffb809b278>] __warn+0xd8/0x100 [19076.894064] [<ffffffffb809b3bd>] warn_slowpath_null+0x1d/0x20 [19076.922778] [<ffffffffb84cdb96>] _request_firmware.isra.9+0x686/0x6d0 [19076.954232] [<ffffffffc02aab7a>] ? 
netxen_nic_hw_read_wx_2M+0x3a/0x100 [netxen_nic] [19076.992218] [<ffffffffb84cdc0e>] request_firmware+0x2e/0x40 [19077.020032] [<ffffffffc02b3d4c>] netxen_request_firmware+0xec/0x640 [netxen_nic] [19077.058137] [<ffffffffc02aab7a>] ? netxen_nic_hw_read_wx_2M+0x3a/0x100 [netxen_nic] [19077.098123] [<ffffffffc02aab7a>] ? netxen_nic_hw_read_wx_2M+0x3a/0x100 [netxen_nic] [19077.140465] [<ffffffffc02ae829>] netxen_start_firmware+0x1e9/0xbd0 [netxen_nic] [19077.179219] [<ffffffffb8035c19>] ? sched_clock+0x9/0x10 [19077.207699] [<ffffffffb80de305>] ? sched_clock_cpu+0x85/0xc0 [19077.243718] [<ffffffffc02aab7a>] ? netxen_nic_hw_read_wx_2M+0x3a/0x100 [netxen_nic] [19077.282066] [<ffffffffc02aab7a>] ? netxen_nic_hw_read_wx_2M+0x3a/0x100 [netxen_nic] [19077.318868] [<ffffffffc02af2f6>] netxen_fwinit_work+0xe6/0x230 [netxen_nic] [19077.354382] [<ffffffffb80bde8f>] process_one_work+0x17f/0x440 [19077.383260] [<ffffffffb80befa6>] worker_thread+0x126/0x3c0 [19077.411161] [<ffffffffb80bee80>] ? manage_workers.isra.26+0x2a0/0x2a0 [19077.443724] [<ffffffffb80c5e61>] kthread+0xd1/0xe0 [19077.466607] [<ffffffffb80c5d90>] ? insert_kthread_work+0x40/0x40 [19077.497795] [<ffffffffb8795de4>] ret_from_fork_nospec_begin+0xe/0x21 [19077.527818] [<ffffffffb80c5d90>] ? insert_kthread_work+0x40/0x40 [19077.557837] ---[ end trace a779021c7b160b5d ]--- [19077.580682] netxen_nic 0000:04:00.0: firmware: nx3fwct.bin will not be loaded [19087.293997] netxen_nic: failed card response code:0x10 [19087.321820] netxen_nic 0000:04:00.0: Failed to setup minidump rcode = -5 [19087.738398] Restarting system.

That looks like a longstanding problem; see e.g. bug 1425130, which was CLOSED WONTFIX. Search BZ for RHEL7 kernel with "Comment contains the string" 'drivers/base/firmware_class.c'. In any case, the netxen_nic driver does not have anything to do with the lpfc driver, so it does not appear to be related.

Move to verified according to comment 48 and comment 49.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: kernel security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3327
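
Editor's illustration of the fix discussed in this bug: the patch moves the pring lookup in __lpfc_sli_release_iocbq_s4 under a check that els_wq is non-NULL, so a torn-down ELS work queue no longer leads to a NULL pointer dereference. The reduced C sketch below only mirrors that guard pattern. The struct and function names (sketch_ring, sketch_queue, sketch_hba, sketch_service_txq) are hypothetical stand-ins invented for this example and are not the real lpfc definitions; the txq list is simplified to a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the lpfc structures. */
struct sketch_ring  { int txq_len; };               /* stands in for pring->txq */
struct sketch_queue { struct sketch_ring *pring; }; /* stands in for els_wq */
struct sketch_hba   { struct sketch_queue *els_wq; int wakeups; };

/*
 * Guarded shape, mirroring the patched logic: the ring pointer is only
 * fetched after els_wq has been checked. In the unguarded shape the
 * lookup "hba->els_wq->pring" ran unconditionally, which faults when
 * els_wq is NULL.
 */
static void sketch_service_txq(struct sketch_hba *hba)
{
    if (hba->els_wq) {
        struct sketch_ring *pring = hba->els_wq->pring;

        /* Stand-in for: if (!list_empty(&pring->txq)) lpfc_worker_wake_up() */
        if (pring->txq_len > 0)
            hba->wakeups++;
    }
}
```

With els_wq set to NULL the function returns without touching the ring; with a valid queue and a non-empty txq it records one worker wake-up. Like the patch, the sketch does not check pring itself for NULL.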