Description of problem:
systool, or other diagnostic tools that use systool (such as emcgrab), cause a kernel panic.

Version-Release number of selected component (if applicable):
2.6.32-696.6.3.el6.x86_64 kernel

Additional info:
The behaviour the customer reported is the same as in https://access.redhat.com/solutions/2832161. The only difference is that their machines use Emulex FCoE adapters, which appear to be deprecated in RHEL 7.

Here are the details of our vmcore analysis, and also their lspci outputs showing the deprecated boards mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1312460

----------------------------------------------------------------
      KERNEL: /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/2.6.32-696.10.3.el6.x86_64/vmlinux
    DUMPFILE: /cores/retrace/tasks/235823940/crash/vmcore  [PARTIAL DUMP]
        CPUS: 48
        DATE: Thu Sep 28 03:45:40 2017
      UPTIME: 00:06:53
LOAD AVERAGE: 4.83, 3.85, 1.97
       TASKS: 2837
    NODENAME: ultc1r01.osasunet
     RELEASE: 2.6.32-696.10.3.el6.x86_64
     VERSION: #1 SMP Thu Sep 21 12:12:50 EDT 2017
     MACHINE: x86_64  (2600 Mhz)
      MEMORY: 256 GB
       PANIC: "general protection fault: 0000 [#1] SMP "
         PID: 42300
     COMMAND: "systool"
        TASK: ffff88232d0c8040  [THREAD_INFO: ffff88232d0b8000]
         CPU: 39
       STATE: TASK_RUNNING (PANIC)

bt
PID: 42300  TASK: ffff88232d0c8040  CPU: 39  COMMAND: "systool"
 #0 [ffff88232d0bba90] machine_kexec at ffffffff8103fdbb
 #1 [ffff88232d0bbaf0] crash_kexec at ffffffff810d1f02
 #2 [ffff88232d0bbbc0] oops_end at ffffffff8154f060
 #3 [ffff88232d0bbbf0] die at ffffffff8101101b
 #4 [ffff88232d0bbc20] do_general_protection at ffffffff8154eb42
 #5 [ffff88232d0bbc50] general_protection at ffffffff8154e2b5
    [exception RIP: strnlen+9]
    RIP: ffffffff812a3379  RSP: ffff88232d0bbd08  RFLAGS: 00010286
    RAX: ffffffff817c6126  RBX: ffff88333d812000  RCX: 0000000000000002
    RDX: 7665642073696854  RSI: ffffffffffffffff  RDI: 7665642073696854
    RBP: ffff88232d0bbd08   R8: 0000000000000073   R9: ffff8808332d43b0
    R10: 0000000000000000  R11: 0000000000000246  R12: ffff88333d811013
    R13: 7665642073696854  R14: 00000000ffffffff  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #6 [ffff88232d0bbd10] string at ffffffff812a47a0
 #7 [ffff88232d0bbd50] vsnprintf at ffffffff812a6258
 #8 [ffff88232d0bbdf0] snprintf at ffffffff812a66c4
 #9 [ffff88232d0bbe50] beiscsi_adap_family_disp at ffffffffa05b1a79 [be2iscsi]
#10 [ffff88232d0bbe60] dev_attr_show at ffffffff8137f907
#11 [ffff88232d0bbe90] sysfs_read_file at ffffffff81217551
#12 [ffff88232d0bbef0] vfs_read at ffffffff8119a4f5
#13 [ffff88232d0bbf30] sys_read at ffffffff8119a841
#14 [ffff88232d0bbf80] system_call_fastpath at ffffffff8100b0d2
    RIP: 000000321a4db770  RSP: 00007ffeac4b4940  RFLAGS: 00010216
    RAX: 0000000000000000  RBX: ffffffff8100b0d2  RCX: 0000000000020000
    RDX: 0000000000001000  RSI: 0000000002200330  RDI: 0000000000000004
    RBP: 0000000002200330   R8: 0000000000000008   R9: 0000000000260000
    R10: 0000000000000000  R11: 0000000000000246  R12: 0000000000000004
    R13: 0000000000001000  R14: 00000000022001d0  R15: 0000000000001000
    ORIG_RAX: 0000000000000000  CS: 0033  SS: 002b

Below is the HBA driver version
----
cat sos_commands/kernel/modinfo_tpm_tpm_tis_libata_efivars_tcp_cubic_kernel_printk_kgdb_spurious_pstore_dynamic_debug_pcie_aspm_pci_hotplug_pciehp_acpiphp_intel_idle_acpi_pci_slot_processor_thermal_acpi_memhotplug_battery_keyboard_vt_8250_kgdboc_kgdbts_scsi_mod_pcmcia_core_pcmci | grep -i lpfc
filename: /lib/modules/2.6.32-696.6.3.el6.x86_64/kernel/drivers/scsi/lpfc/lpfc.ko

I can also see the call trace below.

Modules linked in: oracleacfs(P)(U) oracleadvm(P)(U) oracleoks(P)(U) mptctl mptbase autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand powernow_k8 freq_table mperf 8021q garp stp llc bonding dm_service_time dm_round_robin dm_multipath microcode e1000e ptp pps_core hpilo hpwdt be2iscsi iscsi_boot_sysfs libiscsi scsi_transport_iscsi serio_raw fam15h_power k10temp amd64_edac_mod edac_core edac_mce_amd i2c_piix4 power_meter acpi_ipmi ipmi_si ipmi_msghandler sg be2net shpchp ext4 jbd2 mbcache sd_mod hpsa lpfc scsi_transport_fc scsi_tgt crc_t10dif ata_generic pata_acpi pata_atiixp ahci radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]

Pid: 42300, comm: systool Tainted: P -- ------------ 2.6.32-696.10.3.el6.x86_64 #1 HP ProLiant BL685c G7
RIP: 0010:[<ffffffff812a3379>] [<ffffffff812a3379>] strnlen+0x9/0x40
RSP: 0018:ffff88232d0bbd08 EFLAGS: 00010286
RAX: ffffffff817c6126 RBX: ffff88333d812000 RCX: 0000000000000002
RDX: 7665642073696854 RSI: ffffffffffffffff RDI: 7665642073696854
RBP: ffff88232d0bbd08 R08: 0000000000000073 R09: ffff8808332d43b0
R10: 0000000000000000 R11: 0000000000000246 R12: ffff88333d811013
R13: 7665642073696854 R14: 00000000ffffffff R15: 0000000000000000
FS: 00007f3123212700(0000) GS:ffff88305c6c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000002201188 CR3: 000000333bf69000 CR4: 00000000000407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process systool (pid: 42300, threadinfo ffff88232d0b8000, task ffff88232d0c8040)
Stack:
 ffff88232d0bbd48 ffffffff812a47a0 ffffffff81217040 ffff88333d811013
<d> ffffffffa05b8ab1 ffffffffa05b8aaf ffff88232d0bbdf8 ffff88333d812000
<d> ffff88232d0bbde8 ffffffff812a6258 0000000000000004 0000000affffffff
Call Trace:
 [<ffffffff812a47a0>] string+0x40/0x100
 [<ffffffff81217040>] ? sysfs_open_file+0x0/0x290
 [<ffffffff812a6258>] vsnprintf+0x218/0x5e0
 [<ffffffff812a66c4>] snprintf+0x34/0x40
 [<ffffffffa05b1a79>] beiscsi_adap_family_disp+0x49/0xc0 [be2iscsi]
 [<ffffffff8137f907>] dev_attr_show+0x27/0x50
 [<ffffffff8113bc2e>] ? __get_free_pages+0xe/0x50
 [<ffffffff81217551>] sysfs_read_file+0x111/0x200
 [<ffffffff8119a4f5>] vfs_read+0xb5/0x1a0
 [<ffffffff8119b2a6>] ? fget_light_pos+0x16/0x50
 [<ffffffff8119a841>] sys_read+0x51/0xb0
 [<ffffffff810ee47e>] ? __audit_syscall_exit+0x25e/0x290
 [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
Code: 66 90 48 83 c2 01 80 3a 00 75 f7 48 89 d0 48 29 f8 c9 c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 85 f6 48 89 e5 74 2e <80> 3f 00 74 29 48 83 ee 01 48 89 f8 eb 12 66 0f 1f 84 00 00 00
RIP [<ffffffff812a3379>] strnlen+0x9/0x40
 RSP <ffff88232d0bbd08>

There seems to be some issue with the 3 modules below.

NAME        TAINTS
oracleoks   P(U)
oracleadvm  P(U)
oracleacfs  P(U)

I found the details below.
https://access.redhat.com/solutions/2537611
https://bugzilla.redhat.com/show_bug.cgi?id=1312460
The crash comes from the be2iscsi driver using a bad extern declaration in be_mgmt.c for the beiscsi_obsolete_adapter_msg string. The string is defined in be_main.c as "char const beiscsi_obsolete_adapter_msg[]", but be_mgmt.c declares it as "extern char const *beiscsi_obsolete_adapter_msg".

While an array can be used as a pointer in expressions, an array and a pointer are different kinds of object, and declaring the variable as an array in one file and as a pointer in another breaks how gcc handles it. When the code in be_mgmt.c uses beiscsi_obsolete_adapter_msg as a pointer, it loads the first bytes of the string itself and passes them as the pointer value, instead of passing the address of the string. The kernel then crashes when strnlen tries to dereference those copied string bytes as a pointer.

The extern declaration in be_mgmt.c should match the way the variable is defined in be_main.c. A simple patch like this should fix the issue:

--- linux-2.6.32-696.15.1.el6.x86_64/drivers/scsi/be2iscsi/be_mgmt.c	2017-09-26 06:52:08.000000000 -0400
+++ linux-2.6.32-696.15.1.el6.x86_64/drivers/scsi/be2iscsi/be_mgmt.c	2017-09-29 11:15:43.256589278 -0400
@@ -24,7 +24,7 @@
 #include "be_iscsi.h"
 #include "be_main.h"
 
-extern char const *beiscsi_obsolete_adapter_msg;
+extern char const beiscsi_obsolete_adapter_msg[];
 
 /* UE Status Low CSR */
 static const char * const desc_ue_status_low[] = {
This bug was introduced in kernel-2.6.32-642.el6 (RHEL 6.8 GA). The exact commit was f628daf643edeadf2957921f77992df0f5aa61c3:

commit f628daf643edeadf2957921f77992df0f5aa61c3
Author: Maurizio Lombardi <mlombard>
Date:   Wed Mar 16 10:01:49 2016 -0400

    [scsi] be2iscsi: Add warning message for unsupported adapter

    Message-id: <56E92F0D.5060802>
    Patchwork-id: 138554
    O-Subject: Re: [RHEL6.8 e-stor PATCH V2 09/10] be2iscsi : Add warning message for unsupported adapter
    Bugzilla: 1253016
    RH-Acked-by: Rob Evers <revers>
    RH-Acked-by: Tomas Henzl <thenzl>

    RHEL-only

    Add a warning message to indicate obsolete BE2 Adapter Family devices

    Partially base on this patch:
    http://www.spinics.net/lists/linux-scsi/msg87182.html

    Signed-off-by: Maurizio Lombardi <mlombard>
    Signed-off-by: Aristeu Rozanski <aris>

diff --git a/drivers/scsi/be2iscsi/be_main.c b/drivers/scsi/be2iscsi/be_main.c
index 9045129..d40b553 100644
--- a/drivers/scsi/be2iscsi/be_main.c
+++ b/drivers/scsi/be2iscsi/be_main.c
@@ -208,6 +208,14 @@ static char const *cqe_desc[] = {
 	"CXN_KILLED_IMM_DATA_RCVD"
 };
 
+char const beiscsi_obsolete_adapter_msg[] = {
+	"This device is not recommended for new deployments. It continues to be\n"
+	"supported in this RHEL release, but it is likely to be removed in the\n"
+	"next major release. Driver updates and fixes for this device will be\n"
+	"limited to critical issues. Please contact your device's hardware\n"
+	"vendor for additional information.\n"
+};
+
 static int beiscsi_slave_configure(struct scsi_device *sdev)
 {
 	blk_queue_max_segment_size(sdev->request_queue, 65536);
@@ -5714,6 +5722,7 @@ static int __devinit beiscsi_dev_probe(struct pci_dev *pcidev,
 	case OC_DEVICE_ID2:
 		phba->generation = BE_GEN2;
 		phba->iotask_fn = beiscsi_iotask;
+		dev_warn(&pcidev->dev, beiscsi_obsolete_adapter_msg);
 		break;
 	case BE_DEVICE_ID2:
 	case OC_DEVICE_ID3:
diff --git a/drivers/scsi/be2iscsi/be_mgmt.c b/drivers/scsi/be2iscsi/be_mgmt.c
index 2d128f8..dbb679a 100644
--- a/drivers/scsi/be2iscsi/be_mgmt.c
+++ b/drivers/scsi/be2iscsi/be_mgmt.c
@@ -24,6 +24,8 @@
 #include "be_iscsi.h"
 #include "be_main.h"
 
+extern char const *beiscsi_obsolete_adapter_msg;   <--------
+
 /* UE Status Low CSR */
 static const char * const desc_ue_status_low[] = {
 	"CEV",
@@ -1536,7 +1538,8 @@ beiscsi_adap_family_disp(struct device *dev, struct device_attribute *attr,
 	case BE_DEVICE_ID1:
 	case OC_DEVICE_ID1:
 	case OC_DEVICE_ID2:
-		return snprintf(buf, PAGE_SIZE, "BE2 Adapter Family\n");
+		return snprintf(buf, PAGE_SIZE, "%s\n%s", "BE2 Adapter Family",
+				beiscsi_obsolete_adapter_msg);
 		break;
 	case BE_DEVICE_ID2:
 	case OC_DEVICE_ID3:

Can this be fixed in main, and also delivered as a hotfix for the customer (who is using kernel-2.6.32-696.10.3.el6)? We already provided a test kernel with the fix, and it works fine.
Patch(es) committed on kernel repository and kernel is undergoing testing
Patch(es) available on kernel-2.6.32-722.el6
Tested on system with OCe10102-IM adapter using kernel-2.6.32-749.el6.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:1854