Bug 1497152 - systool causes panic on 2.6.32-696.6.3.el6.x86_64 using be2iscsi
Summary: systool causes panic on 2.6.32-696.6.3.el6.x86_64 using be2iscsi
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.7
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Maurizio Lombardi
QA Contact: Martin Hoyer
URL:
Whiteboard:
Depends On:
Blocks: 1507512
Reported: 2017-09-29 10:29 UTC by nikhil kshirsagar
Modified: 2018-06-19 04:59 UTC (History)
CC List: 7 users

Fixed In Version: kernel-2.6.32-722.el6
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1507512
Environment:
Last Closed: 2018-06-19 04:59:19 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2018:1854 | 0 | normal | SHIPPED_LIVE | Important: kernel security and bug fix update | 2018-06-19 08:58:56 UTC

Description nikhil kshirsagar 2017-09-29 10:29:41 UTC
Description of problem:

systool, or other diagnostic tools that use systool (such as emcgrab), cause a kernel panic.

Version-Release number of selected component (if applicable):
2.6.32-696.6.3.el6.x86_64 kernel



Additional info:

See https://access.redhat.com/solutions/2832161; the customer's reported behaviour is the same.
The only difference is that their machines use Emulex FCoE adapters, which appear to be deprecated in RHEL 7. Here are the details of our vmcore analysis, along with their lspci output, which shows the deprecated boards mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1312460

----------------------------------------------------------------------------------------------------------------

   KERNEL: /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/2.6.32-696.10.3.el6.x86_64/vmlinux
    DUMPFILE: /cores/retrace/tasks/235823940/crash/vmcore  [PARTIAL DUMP]
        CPUS: 48
        DATE: Thu Sep 28 03:45:40 2017
      UPTIME: 00:06:53
LOAD AVERAGE: 4.83, 3.85, 1.97
       TASKS: 2837
    NODENAME: ultc1r01.osasunet
     RELEASE: 2.6.32-696.10.3.el6.x86_64
     VERSION: #1 SMP Thu Sep 21 12:12:50 EDT 2017
     MACHINE: x86_64  (2600 Mhz)
      MEMORY: 256 GB
       PANIC: "general protection fault: 0000 [#1] SMP "
         PID: 42300
     COMMAND: "systool"
        TASK: ffff88232d0c8040  [THREAD_INFO: ffff88232d0b8000]
         CPU: 39
       STATE: TASK_RUNNING (PANIC)

 bt
PID: 42300  TASK: ffff88232d0c8040  CPU: 39  COMMAND: "systool"
 #0 [ffff88232d0bba90] machine_kexec at ffffffff8103fdbb
 #1 [ffff88232d0bbaf0] crash_kexec at ffffffff810d1f02
 #2 [ffff88232d0bbbc0] oops_end at ffffffff8154f060
 #3 [ffff88232d0bbbf0] die at ffffffff8101101b
 #4 [ffff88232d0bbc20] do_general_protection at ffffffff8154eb42
 #5 [ffff88232d0bbc50] general_protection at ffffffff8154e2b5
    [exception RIP: strnlen+9]
    RIP: ffffffff812a3379  RSP: ffff88232d0bbd08  RFLAGS: 00010286
    RAX: ffffffff817c6126  RBX: ffff88333d812000  RCX: 0000000000000002
    RDX: 7665642073696854  RSI: ffffffffffffffff  RDI: 7665642073696854
    RBP: ffff88232d0bbd08   R8: 0000000000000073   R9: ffff8808332d43b0
    R10: 0000000000000000  R11: 0000000000000246  R12: ffff88333d811013
    R13: 7665642073696854  R14: 00000000ffffffff  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #6 [ffff88232d0bbd10] string at ffffffff812a47a0
 #7 [ffff88232d0bbd50] vsnprintf at ffffffff812a6258
 #8 [ffff88232d0bbdf0] snprintf at ffffffff812a66c4
 #9 [ffff88232d0bbe50] beiscsi_adap_family_disp at ffffffffa05b1a79 [be2iscsi]
#10 [ffff88232d0bbe60] dev_attr_show at ffffffff8137f907
#11 [ffff88232d0bbe90] sysfs_read_file at ffffffff81217551
#12 [ffff88232d0bbef0] vfs_read at ffffffff8119a4f5
#13 [ffff88232d0bbf30] sys_read at ffffffff8119a841
#14 [ffff88232d0bbf80] system_call_fastpath at ffffffff8100b0d2
    RIP: 000000321a4db770  RSP: 00007ffeac4b4940  RFLAGS: 00010216
    RAX: 0000000000000000  RBX: ffffffff8100b0d2  RCX: 0000000000020000
    RDX: 0000000000001000  RSI: 0000000002200330  RDI: 0000000000000004
    RBP: 0000000002200330   R8: 0000000000000008   R9: 0000000000260000
    R10: 0000000000000000  R11: 0000000000000246  R12: 0000000000000004
    R13: 0000000000001000  R14: 00000000022001d0  R15: 0000000000001000
    ORIG_RAX: 0000000000000000  CS: 0033  SS: 002b

Below is the HBA driver version ----

cat sos_commands/kernel/modinfo_tpm_tpm_tis_libata_efivars_tcp_cubic_kernel_printk_kgdb_spurious_pstore_dynamic_debug_pcie_aspm_pci_hotplug_pciehp_acpiphp_intel_idle_acpi_pci_slot_processor_thermal_acpi_memhotplug_battery_keyboard_vt_8250_kgdboc_kgdbts_scsi_mod_pcmcia_core_pcmci | grep -i lpfc
filename:       /lib/modules/2.6.32-696.6.3.el6.x86_64/kernel/drivers/scsi/lpfc/lpfc.ko

I can also see the call trace below.

Modules linked in: oracleacfs(P)(U) oracleadvm(P)(U) oracleoks(P)(U) mptctl mptbase autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand powernow_k8 freq_table mperf 8021q garp stp llc bonding dm_service_time dm_round_robin dm_multipath microcode e1000e ptp pps_core hpilo hpwdt be2iscsi iscsi_boot_sysfs libiscsi scsi_transport_iscsi serio_raw fam15h_power k10temp amd64_edac_mod edac_core edac_mce_amd i2c_piix4 power_meter acpi_ipmi ipmi_si ipmi_msghandler sg be2net shpchp ext4 jbd2 mbcache sd_mod hpsa lpfc scsi_transport_fc scsi_tgt crc_t10dif ata_generic pata_acpi pata_atiixp ahci radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]

Pid: 42300, comm: systool Tainted: P           -- ------------    2.6.32-696.10.3.el6.x86_64 #1 HP ProLiant BL685c G7
RIP: 0010:[<ffffffff812a3379>]  [<ffffffff812a3379>] strnlen+0x9/0x40
RSP: 0018:ffff88232d0bbd08  EFLAGS: 00010286
RAX: ffffffff817c6126 RBX: ffff88333d812000 RCX: 0000000000000002
RDX: 7665642073696854 RSI: ffffffffffffffff RDI: 7665642073696854
RBP: ffff88232d0bbd08 R08: 0000000000000073 R09: ffff8808332d43b0
R10: 0000000000000000 R11: 0000000000000246 R12: ffff88333d811013
R13: 7665642073696854 R14: 00000000ffffffff R15: 0000000000000000
FS:  00007f3123212700(0000) GS:ffff88305c6c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000002201188 CR3: 000000333bf69000 CR4: 00000000000407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process systool (pid: 42300, threadinfo ffff88232d0b8000, task ffff88232d0c8040)
Stack:
 ffff88232d0bbd48 ffffffff812a47a0 ffffffff81217040 ffff88333d811013
<d> ffffffffa05b8ab1 ffffffffa05b8aaf ffff88232d0bbdf8 ffff88333d812000
<d> ffff88232d0bbde8 ffffffff812a6258 0000000000000004 0000000affffffff
Call Trace:
 [<ffffffff812a47a0>] string+0x40/0x100
 [<ffffffff81217040>] ? sysfs_open_file+0x0/0x290
 [<ffffffff812a6258>] vsnprintf+0x218/0x5e0
 [<ffffffff812a66c4>] snprintf+0x34/0x40
 [<ffffffffa05b1a79>] beiscsi_adap_family_disp+0x49/0xc0 [be2iscsi]
 [<ffffffff8137f907>] dev_attr_show+0x27/0x50
 [<ffffffff8113bc2e>] ? __get_free_pages+0xe/0x50
 [<ffffffff81217551>] sysfs_read_file+0x111/0x200
 [<ffffffff8119a4f5>] vfs_read+0xb5/0x1a0
 [<ffffffff8119b2a6>] ? fget_light_pos+0x16/0x50
 [<ffffffff8119a841>] sys_read+0x51/0xb0
 [<ffffffff810ee47e>] ? __audit_syscall_exit+0x25e/0x290
 [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
Code: 66 90 48 83 c2 01 80 3a 00 75 f7 48 89 d0 48 29 f8 c9 c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 85 f6 48 89 e5 74 2e <80> 3f 00 74 29 48 83 ee 01 48 89 f8 eb 12 66 0f 1f 84 00 00 00 
RIP  [<ffffffff812a3379>] strnlen+0x9/0x40
 RSP <ffff88232d0bbd08>

It seems there is some issue with the 3 modules below.

NAME        TAINTS
oracleoks   P(U)
oracleadvm  P(U)
oracleacfs  P(U)

I found the details below.

https://access.redhat.com/solutions/2537611

https://bugzilla.redhat.com/show_bug.cgi?id=1312460

Comment 3 David Jeffery 2017-09-29 15:22:14 UTC
The crash is from the be2iscsi driver using a bad extern declaration in be_mgmt.c for the beiscsi_obsolete_adapter_msg string.  The string exists in be_main.c as "char const beiscsi_obsolete_adapter_msg[]" but be_mgmt.c redefined it as "extern char const *beiscsi_obsolete_adapter_msg".

While arrays can be used as pointers in expressions, an array and a pointer are different objects, so declaring the same variable as an array in one file and as a pointer in another breaks how gcc handles it.  When the code in be_mgmt.c uses beiscsi_obsolete_adapter_msg as a pointer, it reads the first bytes of the string itself and treats them as an address.  The kernel then crashes when strnlen dereferences that value: RDI in the backtrace is 0x7665642073696854, which is the ASCII text "This dev" read as a little-endian 64-bit value.
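
To make the register value concrete, here is a minimal user-space sketch, not driver code. It assumes a little-endian 64-bit machine such as x86_64, and the names example_msg, first_bytes_as_pointer_value and pointer_value_as_text are hypothetical helpers invented for illustration. It packs the first 8 bytes of the message the way a pointer-sized load would, and unpacks a register value back into text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the start of beiscsi_obsolete_adapter_msg. */
const char example_msg[] = "This device is not recommended for new deployments.";

/*
 * Build the 64-bit value a little-endian CPU would obtain if it loaded the
 * first 8 bytes of s as though they were a pointer.
 * s must point to at least 8 readable bytes.
 */
uint64_t first_bytes_as_pointer_value(const char *s)
{
    uint64_t value = 0;

    assert(s != NULL);
    for (size_t i = 0; i < sizeof(value); i++)
        value |= (uint64_t)(unsigned char)s[i] << (8 * i);
    return value;
}

/*
 * Reverse direction: split a register value into the 8 bytes it holds,
 * lowest byte first, and NUL-terminate. out must have room for 9 bytes.
 */
void pointer_value_as_text(uint64_t value, char out[9])
{
    for (size_t i = 0; i < 8; i++)
        out[i] = (char)((value >> (8 * i)) & 0xff);
    out[8] = '\0';
}
```

Calling first_bytes_as_pointer_value(example_msg) gives 0x7665642073696854, the value seen in RDI, RDX and R13 in the backtrace, and pointer_value_as_text() turns that value back into "This dev".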


The extern declaration in be_mgmt.c should be the same as how the variable is declared in be_main.c.  A simple patch like this should fix the issue:


--- linux-2.6.32-696.15.1.el6.x86_64/drivers/scsi/be2iscsi/be_mgmt.c 2017-09-26 06:52:08.000000000 -0400
+++ linux-2.6.32-696.15.1.el6.x86_64/drivers/scsi/be2iscsi/be_mgmt.c 2017-09-29 11:15:43.256589278 -0400
@@ -24,7 +24,7 @@
 #include "be_iscsi.h"
 #include "be_main.h"
 
-extern char const *beiscsi_obsolete_adapter_msg;
+extern char const beiscsi_obsolete_adapter_msg[];
 
 /* UE Status Low CSR */
 static const char * const desc_ue_status_low[] = {
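
One way to keep this kind of mismatch from recurring is to declare the string once, in a header included by both the defining file and the using file. This is a sketch under assumptions, not part of the patch above. The compiler then sees the declaration and the definition in the same translation unit, and rejects a pointer declaration that conflicts with an array definition. The single-file example below imitates that layout. The names example_obsolete_msg and example_adap_family_show are hypothetical, and the message text is shortened.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Shared declaration. In a real tree this line would live in a header that
 * both source files include (which header is an assumption here).
 */
extern char const example_obsolete_msg[];

/*
 * Definition, as it would appear in the defining source file. Because the
 * declaration above is visible, writing it as "char const *" instead would
 * be a compile-time error (conflicting types) rather than a runtime crash.
 */
char const example_obsolete_msg[] =
    "This device is not recommended for new deployments.\n";

/*
 * User of the string, shaped like a sysfs show routine: formats the family
 * name followed by the warning text into buf and returns the length written.
 */
int example_adap_family_show(char *buf, size_t size)
{
    assert(buf != NULL);
    return snprintf(buf, size, "%s\n%s", "BE2 Adapter Family",
                    example_obsolete_msg);
}
```

With a sufficiently large buffer, the call returns the combined length of the family name, the newline and the message, and the buffer starts with "BE2 Adapter Family" followed by the warning text.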

Comment 7 nikhil kshirsagar 2017-10-04 03:36:10 UTC
This bug was introduced in kernel-2.6.32-642.el6 (RHEL 6.8 GA)

The exact commit was f628daf643edeadf2957921f77992df0f5aa61c3


commit f628daf643edeadf2957921f77992df0f5aa61c3
Author: Maurizio Lombardi <mlombard>
Date:   Wed Mar 16 10:01:49 2016 -0400

    [scsi] be2iscsi: Add warning message for unsupported adapter
    
    Message-id: <56E92F0D.5060802>
    Patchwork-id: 138554
    O-Subject: Re: [RHEL6.8 e-stor PATCH V2 09/10] be2iscsi : Add warning message for unsupported adapter
    Bugzilla: 1253016
    RH-Acked-by: Rob Evers <revers>
    RH-Acked-by: Tomas Henzl <thenzl>
    
    RHEL-only
    
    Add a warning message to indicate obsolete
    BE2 Adapter Family devices
    
    Partially base on this patch:
    http://www.spinics.net/lists/linux-scsi/msg87182.html
    
    Signed-off-by: Maurizio Lombardi <mlombard>
    Signed-off-by: Aristeu Rozanski <aris>

diff --git a/drivers/scsi/be2iscsi/be_main.c b/drivers/scsi/be2iscsi/be_main.c
index 9045129..d40b553 100644
--- a/drivers/scsi/be2iscsi/be_main.c
+++ b/drivers/scsi/be2iscsi/be_main.c
@@ -208,6 +208,14 @@ static char const *cqe_desc[] = {
        "CXN_KILLED_IMM_DATA_RCVD"
 };
 
+char const beiscsi_obsolete_adapter_msg[] = {
+       "This device is not recommended for new deployments. It continues to be\n"
+       "supported in this RHEL release, but it is likely to be removed in the\n"
+       "next major release. Driver updates and fixes for this device will be\n"
+       "limited to critical issues.  Please contact your device's hardware\n"
+       "vendor for additional information.\n"
+};
+
 static int beiscsi_slave_configure(struct scsi_device *sdev)
 {
        blk_queue_max_segment_size(sdev->request_queue, 65536);
@@ -5714,6 +5722,7 @@ static int __devinit beiscsi_dev_probe(struct pci_dev *pcidev,
        case OC_DEVICE_ID2:
                phba->generation = BE_GEN2;
                phba->iotask_fn = beiscsi_iotask;
+               dev_warn(&pcidev->dev, beiscsi_obsolete_adapter_msg);
                break;
        case BE_DEVICE_ID2:
        case OC_DEVICE_ID3:
diff --git a/drivers/scsi/be2iscsi/be_mgmt.c b/drivers/scsi/be2iscsi/be_mgmt.c
index 2d128f8..dbb679a 100644
--- a/drivers/scsi/be2iscsi/be_mgmt.c
+++ b/drivers/scsi/be2iscsi/be_mgmt.c
@@ -24,6 +24,8 @@
 #include "be_iscsi.h"
 #include "be_main.h"
 
+extern char const *beiscsi_obsolete_adapter_msg; <--------
+
 /* UE Status Low CSR */
 static const char * const desc_ue_status_low[] = {
        "CEV",
@@ -1536,7 +1538,8 @@ beiscsi_adap_family_disp(struct device *dev, struct device_attribute *attr,
        case BE_DEVICE_ID1:
        case OC_DEVICE_ID1:
        case OC_DEVICE_ID2:
-               return snprintf(buf, PAGE_SIZE, "BE2 Adapter Family\n");
+               return snprintf(buf, PAGE_SIZE, "%s\n%s", "BE2 Adapter Family",
+                               beiscsi_obsolete_adapter_msg);
                break;
        case BE_DEVICE_ID2:
        case OC_DEVICE_ID3:

Can this be fixed in main, and also delivered as a hotfix for the customer (who is using kernel-2.6.32-696.10.3.el6)?

We already provided a test kernel with the fix, and it works fine.

Comment 13 Phillip Lougher 2017-10-13 20:30:09 UTC
Patch(es) committed on kernel repository and kernel is undergoing testing

Comment 16 Phillip Lougher 2017-10-18 19:38:50 UTC
Patch(es) available on kernel-2.6.32-722.el6

Comment 21 Martin Hoyer 2018-04-25 14:12:33 UTC
Tested on system with OCe10102-IM adapter using kernel-2.6.32-749.el6.x86_64

Comment 23 errata-xmlrpc 2018-06-19 04:59:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1854

