417661 – hpacucli cause kernel panic

Summary: hpacucli cause kernel panic

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	5.1
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Tony Camuso
QA Contact:	Red Hat Kernel QE team
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	KernelPrio5.3 483701 533192

Reported:	2007-12-10 04:34 UTC by Con Tassios
Modified:	2018-11-29 20:38 UTC
CC List:	13 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2012-10-01 19:44:14 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	426873	1	None	None	None	2021-01-20 06:05:38 UTC
Red Hat Bugzilla	429515	0	urgent	CLOSED	scsi: cciss - incompatability between hpacucli and RHEL 5.1 Kernel	2021-02-22 00:41:40 UTC

Internal Links: 429515

Description Con Tassios 2007-12-10 04:34:58 UTC

Description of problem:

The HP Command Line Array Configuration Utility (hpacucli) consistently causes a
kernel panic on RHEL v5.1 (2.6.18-53.1.4.el5).  This problem does not occur on
RHEL v5.0 2.6.18-8* kernels. 

Version-Release number of selected component (if applicable):

RHEL v5.1, 2.6.18-53.1.4.el5
Hardware: HP Proliant DL380 G5, 4GB RAM

How reproducible:

Consistently causes kernel panic after 2 or 3 command operations

Steps to Reproduce:

1. Install hpacucli-7.85-18.linux.rpm available from
http://h18023.www1.hp.com/support/files/server/us/download/27573.html

2. Run the command "hpacucli ctrl all show status".

3. The server panics after the command has been run 2 or 3 times.
  
Actual results:

kernel panic

crash> bt
PID: 4623   TASK: ffff81012fc08100  CPU: 0   COMMAND: ".hpacucli"
 #0 [ffff81011d4939d0] die at ffffffff80069681
 #1 [ffff81011d493a00] do_invalid_op at ffffffff80069c37
 #2 [ffff81011d493ac0] error_exit at ffffffff8005bde9
    [exception RIP: __list_add+72]
    RIP: ffffffff80143640  RSP: ffff81011d493b78  RFLAGS: 00013282
    RAX: 0000000000000058  RBX: ffffffff802fcf90  RCX: ffffffff802e5728
    RDX: ffffffff802e5728  RSI: 0000000000000000  RDI: ffffffff802e5720
    RBP: ffff81012ed26e90   R8: ffffffff802e5728   R9: 0000000000003046
    R10: 0000000000000000  R11: 0000000000000180  R12: ffff81012ed26e90
    R13: ffff81012ed26e70  R14: 00000000fffffffe  R15: ffffffff802fcfa8
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #3 [ffff81011d493b70] __list_add at ffffffff80143640
 #4 [ffff81011d493b90] kobject_add at ffffffff801410a6
 #5 [ffff81011d493bd0] register_disk at ffffffff800fcf4c
 #6 [ffff81011d493c00] add_disk at ffffffff80138d9a
 #7 [ffff81011d493c10] rebuild_lun_table at ffffffff880b49a7
 #8 [ffff81011d493c90] cciss_ioctl at ffffffff880b4e21
 #9 [ffff81011d493dc0] do_ioctl at ffffffff880b56b6
#10 [ffff81011d493de0] cciss_compat_ioctl at ffffffff880b5774
#11 [ffff81011d493ef0] compat_blkdev_ioctl at ffffffff80137ffc
#12 [ffff81011d493f20] compat_sys_ioctl at ffffffff800edf00
#13 [ffff81011d493f80] sysenter_do_call at ffffffff8005f49b
    RIP: 00000000ffffe410  RSP: 00000000ffe201d8  RFLAGS: 00003296
    RAX: ffffffffffffffda  RBX: ffffffff8005f49b  RCX: 000000000000420e
    RDX: 0000000000000000  RSI: 0000000000000015  RDI: 00000000ffe20220
    RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
    R13: 0000000000000000  R14: 0000000000000000  R15: 00000000ffe201d8
    ORIG_RAX: 0000000000000036  CS: 0023  SS: 002b


Expected results:

# hpacucli ctrl all show status

Smart Array P400 in Slot 1
   Controller Status: OK
   Cache Status: OK


Additional info:

Comment 1 Paul D. Mitcheson 2008-01-05 15:26:32 UTC

I am seeing this too.

I suggest that the priority be moved to high. As it stands, EL5 cannot be used
on HP hardware, which is a serious problem for me and, I'm sure, for many other
people.

Red Hat: do we expect a fix from you or from HP?

Here is something from /var/log/messages that may be useful:

Dec 12 13:22:54 capfs1 kernel:       blocks= 2293469488 block_size= 512
Dec 12 13:22:55 capfs1 snmpd[3393]: Connection from UDP: [127.0.0.1]:32779
Dec 12 13:22:56 capfs1 kernel:       blocks= 2293469488 block_size= 512
Dec 12 13:22:56 capfs1 kernel: kobject_add failed for cciss!c2d1 with -EEXIST, don't try to register things with the same name in the same directory.
Dec 12 13:22:56 capfs1 kernel:
Dec 12 13:22:56 capfs1 kernel: Call Trace:
Dec 12 13:22:56 capfs1 kernel:  [<ffffffff8014115f>] kobject_add+0x16e/0x199
Dec 12 13:22:56 capfs1 kernel:  [<ffffffff80058ee5>] exact_lock+0x0/0x14
Dec 12 13:22:56 capfs1 kernel:  [<ffffffff800fcf4c>] register_disk+0x43/0x199
Dec 12 13:22:56 capfs1 kernel:  [<ffffffff80138d9a>] add_disk+0x34/0x3d
Dec 12 13:22:56 capfs1 kernel:  [<ffffffff880b49a7>] :cciss:rebuild_lun_table+0x48f/0x50f
Dec 12 13:22:56 capfs1 kernel:  [<ffffffff800c35ff>] zone_statistics+0x3e/0x6d
Dec 12 13:22:56 capfs1 kernel:  [<ffffffff880b4e21>] :cciss:cciss_ioctl+0x3fa/0xc65
Dec 12 13:22:56 capfs1 kernel:  [<ffffffff80142a05>] snprintf+0x44/0x4c
Dec 12 13:22:56 capfs1 kernel:  [<ffffffff8002ae65>] flush_tlb_page+0xac/0xda
Dec 12 13:22:56 capfs1 kernel:  [<ffffffff80010b1d>] do_wp_page+0x246/0x67d
Dec 12 13:22:56 capfs1 kernel:  [<ffffffff8003a613>] d_lookup+0x1e/0x42
Dec 12 13:22:56 capfs1 kernel:  [<ffffffff880b56b6>] :cciss:do_ioctl+0x2a/0x39
Dec 12 13:22:56 capfs1 kernel:  [<ffffffff880b5774>] :cciss:cciss_compat_ioctl+0xaf/0x25f
Dec 12 13:22:56 capfs1 kernel:  [<ffffffff80022c4a>] flush_tlb_others+0x84/0xbc
Dec 12 13:22:56 capfs1 kernel:  [<ffffffff80022c5f>] flush_tlb_others+0x99/0xbc
Dec 12 13:22:56 capfs1 kernel:  [<ffffffff80021d28>] __up_read+0x19/0x7f
Dec 12 13:22:56 capfs1 kernel:  [<ffffffff80064a9d>] do_page_fault+0x4eb/0x81d
Dec 12 13:22:56 capfs1 kernel:  [<ffffffff8000df59>] free_pages_and_swap_cache+0x73/0x8f
Dec 12 13:22:56 capfs1 kernel:  [<ffffffff80137ffc>] compat_blkdev_ioctl+0x4c/0x5f
Dec 12 13:22:56 capfs1 kernel:  [<ffffffff800edf00>] compat_sys_ioctl+0xc5/0x2b1
Dec 12 13:22:56 capfs1 kernel:  [<ffffffff8005f49b>] sysenter_do_call+0x1b/0x67


Thanks,

Paul

Comment 2 David Elliott 2008-01-08 17:46:14 UTC

Hi,

We're seeing the same problem here on a DL580G4 server:

Jan  8 17:05:12 localhost kernel: kobject_add failed for cciss!c2d1 with -EEXIST, don't try to register things with the same name in the same directory.
Jan  8 17:05:12 localhost kernel:
Jan  8 17:05:12 localhost kernel: Call Trace:
Jan  8 17:05:12 localhost kernel:  [<ffffffff80141148>] kobject_add+0x16e/0x199
Jan  8 17:05:12 localhost kernel:  [<ffffffff80058eec>] exact_lock+0x0/0x14
Jan  8 17:05:12 localhost kernel:  [<ffffffff800fcf35>] register_disk+0x43/0x199
Jan  8 17:05:12 localhost kernel:  [<ffffffff80138d83>] add_disk+0x34/0x3d
Jan  8 17:05:12 localhost kernel:  [<ffffffff880b49a7>] :cciss:rebuild_lun_table+0x48f/0x50f
Jan  8 17:05:12 localhost kernel:  [<ffffffff880b4e21>] :cciss:cciss_ioctl+0x3fa/0xc65
Jan  8 17:05:12 localhost kernel:  [<ffffffff801429ee>] snprintf+0x44/0x4c
Jan  8 17:05:12 localhost kernel:  [<ffffffff8002ae6c>] flush_tlb_page+0xac/0xda
Jan  8 17:05:12 localhost kernel:  [<ffffffff80010b0c>] do_wp_page+0x246/0x67d
Jan  8 17:05:12 localhost kernel:  [<ffffffff880b56b6>] :cciss:do_ioctl+0x2a/0x39
Jan  8 17:05:12 localhost kernel:  [<ffffffff880b5774>] :cciss:cciss_compat_ioctl+0xaf/0x25f
Jan  8 17:05:12 localhost kernel:  [<ffffffff80021d2f>] __up_read+0x19/0x7f
Jan  8 17:05:12 localhost kernel:  [<ffffffff80064a9d>] do_page_fault+0x4eb/0x81d
Jan  8 17:05:12 localhost kernel:  [<ffffffff8000df48>] free_pages_and_swap_cache+0x73/0x8f
Jan  8 17:05:12 localhost kernel:  [<ffffffff80137fe5>] compat_blkdev_ioctl+0x4c/0x5f
Jan  8 17:05:12 localhost kernel:  [<ffffffff800edee9>] compat_sys_ioctl+0xc5/0x2b1
Jan  8 17:05:12 localhost kernel:  [<ffffffff8005f49b>] sysenter_do_call+0x1b/0x67

Red Hat 5.1, kernel 2.6.18-53.el5
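
Both logs above report the same kobject_add failure for cciss!c2d1. A minimal
Python sketch for pulling such failures out of a saved log follows. The function
name duplicate_kobject_names and the SAMPLE text are illustrative assumptions
(the sample lines are copied from the comments above); this is not an HP or Red
Hat tool, only a convenience for checking whether a system shows the same
symptom.

```python
import re
from collections import Counter

# Matches the kernel message seen in the logs above. "\s+" tolerates the
# line wrap between "with" and "-EEXIST" that appears in one of the pastes.
PATTERN = re.compile(r"kobject_add failed for (\S+) with\s+-EEXIST")


def duplicate_kobject_names(log_text):
    """Count -EEXIST kobject_add failures per kobject name in log_text."""
    return Counter(PATTERN.findall(log_text))


# Sample lines copied from the comments above (hypothetical input file).
SAMPLE = """\
Dec 12 13:22:56 capfs1 kernel: kobject_add failed for cciss!c2d1 with -EEXIST,
don't try to register things with the same name in the same directory.
Jan  8 17:05:12 localhost kernel: kobject_add failed for cciss!c2d1 with
-EEXIST, don't try to register things with the same name in the same directory.
"""

if __name__ == "__main__":
    for name, count in duplicate_kobject_names(SAMPLE).items():
        print(name, count)
```

To scan a real log, pass the contents of /var/log/messages to
duplicate_kobject_names() instead of SAMPLE.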

Comment 3 Tony Procaccini 2008-01-14 22:56:27 UTC

I am also seeing the same as above on a DL360G5 running Red Hat 5.1, kernel
2.6.18-53.el5PAE. This only seems to be an issue when using multiple logical
volumes on the same cciss controller; there are no issues when only one array
is configured. The HP cpq_cciss driver isn't supported yet on the 5.1 base
kernel. Is HP aware of this issue? We need this fixed soon!

Comment 4 VK 2008-03-22 21:10:40 UTC

Based on my interaction with HP support, this may be a dup of bug #429515, which
is a copy of bug #426873, which I do not have access to.

I've also seen this on an HP DL360 G5 with 6 drives and three logical volumes
on the sole SmartArray controller, running RHEL 5.1 x86_64. Serial console shows:

list_add corruption. prev->next should be ffffffff802fcf90, but was ffff81042f434490
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at lib/list_debug.c:31
invalid opcode: 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:02.0/0000:09:00.0/0000:0a:00.0/0000:0b:00.1/irq
CPU 0
Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler hidp rfcomm l2cap bluetooth sunrpc ipv6 cpufreq_ondemand acpi_cpufreq dm_mirror dm_multipath dm_mod video sbs backlight i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac parport_pc lp parport shpchp e1000 serio_raw bnx2 pcspkr ata_piix libata cciss sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 8874, comm: .hpacucli Not tainted 2.6.18-53.el5 #1
RIP: 0010:[<ffffffff80143629>]  [<ffffffff80143629>] __list_add+0x48/0x68
RSP: 0018:ffff810413cefb78  EFLAGS: 00013282
RAX: 0000000000000058 RBX: ffffffff802fcf90 RCX: ffffffff802e5728
RDX: ffffffff802e5728 RSI: 0000000000000000 RDI: ffffffff802e5720
RBP: ffff81042f434490 R08: ffffffff802e5728 R09: 0000000000003046
R10: 0000000000000000 R11: 0000000000000180 R12: ffff81042f434c90
R13: ffff81042f434c70 R14: 00000000fffffffe R15: ffffffff802fcfa8
FS:  0000000000000000(0000) GS:ffffffff80396000(0063) knlGS:00000000f7fab6c0
CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 00000000085f8004 CR3: 00000004190fd000 CR4: 00000000000006e0
Process .hpacucli (pid: 8874, threadinfo ffff810413cee000, task ffff81040b2dc7e0)
Stack:  ffff81042f434c70 ffff81042f434c00 ffff81042f9a0000 ffffffff8014108f
 ffffffff80058eec ffff81042f434c78 ffff81042f434c00 ffff81042f9a0000
 ffff81042f434c70 0000000000000001 ffff81042f9a0000 ffffffff800fcf35
Call Trace:
 [<ffffffff8014108f>] kobject_add+0xb5/0x199
 [<ffffffff80058eec>] exact_lock+0x0/0x14
 [<ffffffff800fcf35>] register_disk+0x43/0x199
 [<ffffffff80138d83>] add_disk+0x34/0x3d
 [<ffffffff880b49a7>] :cciss:rebuild_lun_table+0x48f/0x50f
 [<ffffffff800c35e8>] zone_statistics+0x3e/0x6d
 [<ffffffff880b4e21>] :cciss:cciss_ioctl+0x3fa/0xc65
 [<ffffffff801429ee>] snprintf+0x44/0x4c
 [<ffffffff8002ae6c>] flush_tlb_page+0xac/0xda
 [<ffffffff880b56b6>] :cciss:do_ioctl+0x2a/0x39
 [<ffffffff880b5774>] :cciss:cciss_compat_ioctl+0xaf/0x25f
 [<ffffffff8011d242>] inode_has_perm+0x56/0x63
 [<ffffffff80021d2f>] __up_read+0x19/0x7f
 [<ffffffff8011d2e3>] file_has_perm+0x94/0xa3
 [<ffffffff80137fe5>] compat_blkdev_ioctl+0x4c/0x5f
 [<ffffffff800edee9>] compat_sys_ioctl+0xc5/0x2b1
 [<ffffffff8005f49b>] sysenter_do_call+0x1b/0x67

Code: 0f 0b 68 87 5c 29 80 c2 1f 00 4c 89 63 08 49 89 1c 24 4c 89
RIP  [<ffffffff80143629>] __list_add+0x48/0x68
 RSP <ffff810413cefb78>
 <0>Kernel panic - not syncing: Fatal exception
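
The "list_add corruption" message at the top of this trace is a sanity check on
a doubly linked list: before inserting between prev and next, the neighbours
must still point at each other. The following is a minimal user-space sketch of
that kind of check in Python, modelled only on the message text above. It is not
the kernel source, and the names ListHead, list_add_checked, and the stale-entry
scenario in demo() are illustrative assumptions, not the diagnosed root cause of
this bug.

```python
class ListCorruption(Exception):
    """Raised where the kernel would hit BUG() in this sketch."""


class ListHead:
    """Minimal stand-in for a circular doubly linked list node."""

    def __init__(self, name):
        self.name = name
        self.next = self  # an empty list points at itself
        self.prev = self


def list_add_checked(new, prev, nxt):
    """Insert new between prev and nxt after checking the two links agree."""
    if nxt.prev is not prev:
        raise ListCorruption(
            f"list_add corruption. next->prev should be {prev.name}, "
            f"but was {nxt.prev.name}"
        )
    if prev.next is not nxt:
        raise ListCorruption(
            f"list_add corruption. prev->next should be {nxt.name}, "
            f"but was {prev.next.name}"
        )
    nxt.prev = new
    new.next = nxt
    new.prev = prev
    prev.next = new


def list_add_tail(new, head):
    """Append new just before head, i.e. at the tail of the list."""
    list_add_checked(new, head.prev, head)


def demo():
    """Hypothetical scenario: an entry is re-initialised while still linked."""
    head = ListHead("head")
    disk0 = ListHead("disk0")
    disk1 = ListHead("disk1")
    list_add_tail(disk0, head)
    # Assumption for illustration: disk0 is reset without being unlinked,
    # so head still points at it but it no longer points back at head.
    disk0.next = disk0
    disk0.prev = disk0
    try:
        list_add_tail(disk1, head)
    except ListCorruption as exc:
        return str(exc)
    return "no corruption detected"


if __name__ == "__main__":
    print(demo())
```

In this sketch the second insertion fails because head.prev still refers to
disk0 while disk0.next no longer refers to head, which is the same shape of
mismatch the kernel message reports (prev->next is not the expected next).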

Comment 5 VK 2008-03-22 21:14:20 UTC

I tested with kernel 2.6.18-53.1.14.el5 x86_64 and I cannot trigger the panic
any more.

However, something is still being tickled in the kernel because several blank
lines are printed to the serial console every time I run hpacucli.

Comment 6 RHEL Program Management 2009-02-16 15:23:30 UTC

Updating PM score.

Comment 9 Tony Camuso 2009-11-11 16:48:53 UTC

Mike, Scott, any comments?

Comment 11 Mike Miller (OS Dev) 2010-04-29 18:47:50 UTC

I suspect the blank lines are noise where the driver is printing the geometry of the logical volumes. This should have been addressed by commit 983333cb0c445c56808502461bbb34876c63eb2b.

Comment 12 Tony Camuso 2011-01-10 10:27:33 UTC

According to the git log, this commit was backported into RHEL5 last April and should be 

commit a1fcf3f8fa7ef40ba3a829f781d639632391bc21
Author: Tomas Henzl <thenzl>
Date:   Mon Apr 26 12:19:34 2010 -0400

    [cciss] remove extraneous printk
    
This patch did not make it into RHEL5.5, but is in the RHEL5.6 code. 

Has anybody seen this problem in RHEL5.6?

Comment 13 Barry Donahue 2011-06-08 16:48:02 UTC

Verified with hpacucli-8.70-8.0 on a ProLiant DL380 G6.

Comment 14 Tony Camuso 2012-10-01 19:44:14 UTC

This was verified and should have been closed.
