Bug 417661 - hpacucli cause kernel panic
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.1
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Tony Camuso
QA Contact: Red Hat Kernel QE team
Depends On:
Blocks: 533192 KernelPrio5.3 483701
Reported: 2007-12-09 23:34 EST by Con Tassios
Modified: 2012-10-01 15:44 EDT
CC List: 13 users

Doc Type: Bug Fix
Last Closed: 2012-10-01 15:44:14 EDT


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Bugzilla 426873 None None None Never
Red Hat Bugzilla 429515 None None None Never

Description Con Tassios 2007-12-09 23:34:58 EST
Description of problem:

The HP Command Line Array Configuration Utility (hpacucli) consistently causes a
kernel panic on RHEL v5.1 (2.6.18-53.1.4.el5).  This problem does not occur on
RHEL v5.0 2.6.18-8* kernels. 

Version-Release number of selected component (if applicable):

RHEL v5.1, 2.6.18-53.1.4.el5
Hardware: HP Proliant DL380 G5, 4GB RAM

How reproducible:

Always. The kernel panics after the command has been run 2 or 3 times.

Steps to Reproduce:

1. Install hpacucli-7.85-18.linux.rpm available from
http://h18023.www1.hp.com/support/files/server/us/download/27573.html

2. Run the command "hpacucli ctrl all show status".

3. The server panics after the command has been run 2 or 3 times.
  
Actual results:

kernel panic

crash> bt
PID: 4623   TASK: ffff81012fc08100  CPU: 0   COMMAND: ".hpacucli"
 #0 [ffff81011d4939d0] die at ffffffff80069681
 #1 [ffff81011d493a00] do_invalid_op at ffffffff80069c37
 #2 [ffff81011d493ac0] error_exit at ffffffff8005bde9
    [exception RIP: __list_add+72]
    RIP: ffffffff80143640  RSP: ffff81011d493b78  RFLAGS: 00013282
    RAX: 0000000000000058  RBX: ffffffff802fcf90  RCX: ffffffff802e5728
    RDX: ffffffff802e5728  RSI: 0000000000000000  RDI: ffffffff802e5720
    RBP: ffff81012ed26e90   R8: ffffffff802e5728   R9: 0000000000003046
    R10: 0000000000000000  R11: 0000000000000180  R12: ffff81012ed26e90
    R13: ffff81012ed26e70  R14: 00000000fffffffe  R15: ffffffff802fcfa8
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #3 [ffff81011d493b70] __list_add at ffffffff80143640
 #4 [ffff81011d493b90] kobject_add at ffffffff801410a6
 #5 [ffff81011d493bd0] register_disk at ffffffff800fcf4c
 #6 [ffff81011d493c00] add_disk at ffffffff80138d9a
 #7 [ffff81011d493c10] rebuild_lun_table at ffffffff880b49a7
 #8 [ffff81011d493c90] cciss_ioctl at ffffffff880b4e21
 #9 [ffff81011d493dc0] do_ioctl at ffffffff880b56b6
#10 [ffff81011d493de0] cciss_compat_ioctl at ffffffff880b5774
#11 [ffff81011d493ef0] compat_blkdev_ioctl at ffffffff80137ffc
#12 [ffff81011d493f20] compat_sys_ioctl at ffffffff800edf00
#13 [ffff81011d493f80] sysenter_do_call at ffffffff8005f49b
    RIP: 00000000ffffe410  RSP: 00000000ffe201d8  RFLAGS: 00003296
    RAX: ffffffffffffffda  RBX: ffffffff8005f49b  RCX: 000000000000420e
    RDX: 0000000000000000  RSI: 0000000000000015  RDI: 00000000ffe20220
    RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
    R13: 0000000000000000  R14: 0000000000000000  R15: 00000000ffe201d8
    ORIG_RAX: 0000000000000036  CS: 0023  SS: 002b


Expected results:

# hpacucli ctrl all show status

Smart Array P400 in Slot 1
   Controller Status: OK
   Cache Status: OK


Additional info:
Comment 1 Paul D. Mitcheson 2008-01-05 10:26:32 EST
I am seeing this too.

I suggest that the priority be moved to high. As it stands, EL5 cannot be
used on HP hardware, which is a serious problem for me and, I'm sure, for
many other people.

Red Hat, do we expect a fix from you or from HP?

Here may be something useful from /var/log/messages:

Dec 12 13:22:54 capfs1 kernel:       blocks= 2293469488 block_size= 512
Dec 12 13:22:55 capfs1 snmpd[3393]: Connection from UDP: [127.0.0.1]:32779
Dec 12 13:22:56 capfs1 kernel:       blocks= 2293469488 block_size= 512
Dec 12 13:22:56 capfs1 kernel: kobject_add failed for cciss!c2d1 with -EEXIST,
don't try to register things with the same name in the same directory.
Dec 12 13:22:56 capfs1 kernel:
Dec 12 13:22:56 capfs1 kernel: Call Trace:
Dec 12 13:22:56 capfs1 kernel:  [<ffffffff8014115f>] kobject_add+0x16e/0x199
Dec 12 13:22:56 capfs1 kernel:  [<ffffffff80058ee5>] exact_lock+0x0/0x14
Dec 12 13:22:56 capfs1 kernel:  [<ffffffff800fcf4c>] register_disk+0x43/0x199
Dec 12 13:22:56 capfs1 kernel:  [<ffffffff80138d9a>] add_disk+0x34/0x3d
Dec 12 13:22:56 capfs1 kernel:  [<ffffffff880b49a7>]
:cciss:rebuild_lun_table+0x48f/0x50f
Dec 12 13:22:56 capfs1 kernel:  [<ffffffff800c35ff>] zone_statistics+0x3e/0x6d
Dec 12 13:22:56 capfs1 kernel:  [<ffffffff880b4e21>] :cciss:cciss_ioctl+0x3fa/0xc65
Dec 12 13:22:56 capfs1 kernel:  [<ffffffff80142a05>] snprintf+0x44/0x4c
Dec 12 13:22:56 capfs1 kernel:  [<ffffffff8002ae65>] flush_tlb_page+0xac/0xda
Dec 12 13:22:56 capfs1 kernel:  [<ffffffff80010b1d>] do_wp_page+0x246/0x67d
Dec 12 13:22:56 capfs1 kernel:  [<ffffffff8003a613>] d_lookup+0x1e/0x42
Dec 12 13:22:56 capfs1 kernel:  [<ffffffff880b56b6>] :cciss:do_ioctl+0x2a/0x39
Dec 12 13:22:56 capfs1 kernel:  [<ffffffff880b5774>]
:cciss:cciss_compat_ioctl+0xaf/0x25f
Dec 12 13:22:56 capfs1 kernel:  [<ffffffff80022c4a>] flush_tlb_others+0x84/0xbc
Dec 12 13:22:56 capfs1 kernel:  [<ffffffff80022c5f>] flush_tlb_others+0x99/0xbc
Dec 12 13:22:56 capfs1 kernel:  [<ffffffff80021d28>] __up_read+0x19/0x7f
Dec 12 13:22:56 capfs1 kernel:  [<ffffffff80064a9d>] do_page_fault+0x4eb/0x81d
Dec 12 13:22:56 capfs1 kernel:  [<ffffffff8000df59>]
free_pages_and_swap_cache+0x73/0x8f
Dec 12 13:22:56 capfs1 kernel:  [<ffffffff80137ffc>] compat_blkdev_ioctl+0x4c/0x5f
Dec 12 13:22:56 capfs1 kernel:  [<ffffffff800edf00>] compat_sys_ioctl+0xc5/0x2b1
Dec 12 13:22:56 capfs1 kernel:  [<ffffffff8005f49b>] sysenter_do_call+0x1b/0x67


Thanks,

Paul
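
Note: for anyone triaging similar logs, the `kobject_add failed ... -EEXIST` message above can be pulled out of /var/log/messages mechanically. The following is a minimal Python sketch, not part of any HP or Red Hat tooling; the function names are hypothetical. It assumes the message format shown in this report and relies on the sysfs convention that a `/` in a block device name is written as `!`, so `cciss!c2d1` is taken to correspond to /dev/cciss/c2d1.

```python
import re

# Matches the message quoted in this bug. \s+ tolerates the line wrap that
# some of the pasted logs have between "with" and "-EEXIST".
_EEXIST = re.compile(r"kobject_add failed for (\S+)\s+with\s+-EEXIST")


def duplicate_kobjects(log_text):
    """Return kobject names that failed registration with -EEXIST, in order."""
    return _EEXIST.findall(log_text)


def sysfs_name_to_dev(name):
    """Map a sysfs block name such as 'cciss!c2d1' to its /dev path.

    Assumes the sysfs convention of replacing '/' with '!'.
    """
    return "/dev/" + name.replace("!", "/")


if __name__ == "__main__":
    sample = (
        "Jan  8 17:05:12 localhost kernel: kobject_add failed for cciss!c2d1 with\n"
        "-EEXIST, don't try to register things with the same name in the same directory.\n"
    )
    for name in duplicate_kobjects(sample):
        print(name, "->", sysfs_name_to_dev(name))
```

Running it over a full log shows which logical drives the driver tried to register a second time.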
Comment 2 David Elliott 2008-01-08 12:46:14 EST
Hi,

We're seeing the same problem here with a DL580 G4 server.

Jan  8 17:05:12 localhost kernel: kobject_add failed for cciss!c2d1 with
-EEXIST, don't try to register things with the same name in the same directory.
Jan  8 17:05:12 localhost kernel:
Jan  8 17:05:12 localhost kernel: Call Trace:
Jan  8 17:05:12 localhost kernel:  [<ffffffff80141148>] kobject_add+0x16e/0x199
Jan  8 17:05:12 localhost kernel:  [<ffffffff80058eec>] exact_lock+0x0/0x14
Jan  8 17:05:12 localhost kernel:  [<ffffffff800fcf35>] register_disk+0x43/0x199
Jan  8 17:05:12 localhost kernel:  [<ffffffff80138d83>] add_disk+0x34/0x3d
Jan  8 17:05:12 localhost kernel:  [<ffffffff880b49a7>]
:cciss:rebuild_lun_table+0x48f/0x50f
Jan  8 17:05:12 localhost kernel:  [<ffffffff880b4e21>]
:cciss:cciss_ioctl+0x3fa/0xc65
Jan  8 17:05:12 localhost kernel:  [<ffffffff801429ee>] snprintf+0x44/0x4c
Jan  8 17:05:12 localhost kernel:  [<ffffffff8002ae6c>] flush_tlb_page+0xac/0xda
Jan  8 17:05:12 localhost kernel:  [<ffffffff80010b0c>] do_wp_page+0x246/0x67d
Jan  8 17:05:12 localhost kernel:  [<ffffffff880b56b6>] :cciss:do_ioctl+0x2a/0x39
Jan  8 17:05:12 localhost kernel:  [<ffffffff880b5774>]
:cciss:cciss_compat_ioctl+0xaf/0x25f
Jan  8 17:05:12 localhost kernel:  [<ffffffff80021d2f>] __up_read+0x19/0x7f
Jan  8 17:05:12 localhost kernel:  [<ffffffff80064a9d>] do_page_fault+0x4eb/0x81d
Jan  8 17:05:12 localhost kernel:  [<ffffffff8000df48>]
free_pages_and_swap_cache+0x73/0x8f
Jan  8 17:05:12 localhost kernel:  [<ffffffff80137fe5>]
compat_blkdev_ioctl+0x4c/0x5f
Jan  8 17:05:12 localhost kernel:  [<ffffffff800edee9>] compat_sys_ioctl+0xc5/0x2b1
Jan  8 17:05:12 localhost kernel:  [<ffffffff8005f49b>] sysenter_do_call+0x1b/0x67

Red Hat 5.1, kernel 2.6.18-53.el5


Comment 3 Tony Procaccini 2008-01-14 17:56:27 EST
I am also seeing the same as above on a DL360 G5 running Red Hat 5.1, kernel
2.6.18-53.el5PAE. This only seems to be an issue when using multiple logical
volumes on the same cciss controller. There are no issues when only one array
is configured. The HP cpq_cciss driver isn't supported yet on the 5.1 base
kernel. Is HP aware of this issue? Need this fixed soon!
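
Note: to check whether a machine matches the configuration described in this comment (more than one logical volume on a single cciss controller), the device names can be grouped by controller. This is a small Python sketch under stated assumptions: it assumes the usual cciss naming of `cXdY` for logical drives and `cXdYpZ` for partitions, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# cXdY is a logical drive; partition nodes such as c0d0p1 are skipped on purpose.
_DRIVE = re.compile(r"^c(\d+)d(\d+)$")


def logical_drives_by_controller(names):
    """Group logical drive numbers by controller number."""
    drives = defaultdict(set)
    for name in names:
        match = _DRIVE.match(name)
        if match:
            drives[int(match.group(1))].add(int(match.group(2)))
    return {ctrl: sorted(ids) for ctrl, ids in drives.items()}


def controllers_with_multiple_drives(names):
    """Return controllers that expose more than one logical drive."""
    grouped = logical_drives_by_controller(names)
    return sorted(ctrl for ctrl, ids in grouped.items() if len(ids) > 1)
```

On a live system the input would typically come from `os.listdir("/dev/cciss")`. A non-empty result only means the system resembles the one described here, not that it will panic.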
Comment 4 Vinod Kutty 2008-03-22 17:10:40 EDT
Based on my interaction with HP support, this may be a dup of bug #429515, which
is a copy of bug #426873, which I do not have access to.

I've also seen this on an HP DL360 G5 with six drives and three logical volumes
on the sole SmartArray controller, running RHEL 5.1 x86_64. Serial console shows:

list_add corruption. prev->next should be ffffffff802fcf90, but was ffff81042f434490
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at lib/list_debug.c:31
invalid opcode: 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:02.0/0000:09:00.0/0000:0a:00.0/0000:0b:00.1/irq
CPU 0
Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler hidp rfcomm l2cap bluetooth sunrpc ipv6 cpufreq_ondemand acpi_cpufreq dm_mirror dm_multipath dm_mod video sbs backlight i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac parport_pc lp parport shpchp e1000 serio_raw bnx2 pcspkr ata_piix libata cciss sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 8874, comm: .hpacucli Not tainted 2.6.18-53.el5 #1
RIP: 0010:[<ffffffff80143629>]  [<ffffffff80143629>] __list_add+0x48/0x68
RSP: 0018:ffff810413cefb78  EFLAGS: 00013282
RAX: 0000000000000058 RBX: ffffffff802fcf90 RCX: ffffffff802e5728
RDX: ffffffff802e5728 RSI: 0000000000000000 RDI: ffffffff802e5720
RBP: ffff81042f434490 R08: ffffffff802e5728 R09: 0000000000003046
R10: 0000000000000000 R11: 0000000000000180 R12: ffff81042f434c90
R13: ffff81042f434c70 R14: 00000000fffffffe R15: ffffffff802fcfa8
FS:  0000000000000000(0000) GS:ffffffff80396000(0063) knlGS:00000000f7fab6c0
CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 00000000085f8004 CR3: 00000004190fd000 CR4: 00000000000006e0
Process .hpacucli (pid: 8874, threadinfo ffff810413cee000, task ffff81040b2dc7e0)
Stack:  ffff81042f434c70 ffff81042f434c00 ffff81042f9a0000 ffffffff8014108f
 ffffffff80058eec ffff81042f434c78 ffff81042f434c00 ffff81042f9a0000
 ffff81042f434c70 0000000000000001 ffff81042f9a0000 ffffffff800fcf35
Call Trace:
 [<ffffffff8014108f>] kobject_add+0xb5/0x199
 [<ffffffff80058eec>] exact_lock+0x0/0x14
 [<ffffffff800fcf35>] register_disk+0x43/0x199
 [<ffffffff80138d83>] add_disk+0x34/0x3d
 [<ffffffff880b49a7>] :cciss:rebuild_lun_table+0x48f/0x50f
 [<ffffffff800c35e8>] zone_statistics+0x3e/0x6d
 [<ffffffff880b4e21>] :cciss:cciss_ioctl+0x3fa/0xc65
 [<ffffffff801429ee>] snprintf+0x44/0x4c
 [<ffffffff8002ae6c>] flush_tlb_page+0xac/0xda
 [<ffffffff880b56b6>] :cciss:do_ioctl+0x2a/0x39
 [<ffffffff880b5774>] :cciss:cciss_compat_ioctl+0xaf/0x25f
 [<ffffffff8011d242>] inode_has_perm+0x56/0x63
 [<ffffffff80021d2f>] __up_read+0x19/0x7f
 [<ffffffff8011d2e3>] file_has_perm+0x94/0xa3
 [<ffffffff80137fe5>] compat_blkdev_ioctl+0x4c/0x5f
 [<ffffffff800edee9>] compat_sys_ioctl+0xc5/0x2b1
 [<ffffffff8005f49b>] sysenter_do_call+0x1b/0x67

Code: 0f 0b 68 87 5c 29 80 c2 1f 00 4c 89 63 08 49 89 1c 24 4c 89
RIP  [<ffffffff80143629>] __list_add+0x48/0x68
 RSP <ffff810413cefb78>
 <0>Kernel panic - not syncing: Fatal exception
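
Note: the "list_add corruption" line comes from the consistency check the kernel runs before linking a node into a doubly linked list (lib/list_debug.c in the trace). The sketch below is a simplified userspace model in Python, not the kernel code and not a claim about the exact path taken inside the cciss driver. It shows one way such a check can fire: a node that is already the tail of the list is added a second time, which links it to itself, and the next add then finds `prev->next` pointing at the wrong node. The node name `c2d2` is hypothetical; `c2d1` is borrowed from the logs in this report.

```python
class Node:
    """A list node that starts out linked to itself, like INIT_LIST_HEAD."""

    def __init__(self, name):
        self.name = name
        self.prev = self
        self.next = self


class ListCorruption(Exception):
    """Stands in for the BUG() the kernel raises on a failed check."""


def _list_add(new, prev, next_):
    # Sanity checks modelled on the debug variant of __list_add.
    if next_.prev is not prev:
        raise ListCorruption(
            f"list_add corruption. next->prev should be {prev.name}, "
            f"but was {next_.prev.name}"
        )
    if prev.next is not next_:
        raise ListCorruption(
            f"list_add corruption. prev->next should be {next_.name}, "
            f"but was {prev.next.name}"
        )
    next_.prev = new
    new.next = next_
    new.prev = prev
    prev.next = new


def list_add_tail(new, head):
    """Insert new just before head, i.e. at the tail of the list."""
    _list_add(new, head.prev, head)


def demo():
    head = Node("head")
    disk = Node("c2d1")
    list_add_tail(disk, head)   # first registration: fine
    list_add_tail(disk, head)   # second registration: passes the checks,
                                # but leaves disk.next pointing at disk itself
    try:
        list_add_tail(Node("c2d2"), head)  # next add trips the check
    except ListCorruption as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(demo())
```

In this model the second add goes through silently and the failure only surfaces on a later add, which is consistent with the panic being reported after the command has been run two or three times.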
Comment 5 Vinod Kutty 2008-03-22 17:14:20 EDT
I tested with kernel 2.6.18-53.1.14.el5 x86_64 and I cannot trigger the panic
any more.

However, something is still being tickled in the kernel because several blank
lines are printed to the serial console every time I run hpacucli.
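
Note: for admins who want to script a check against the kernel mentioned in this comment, here is a rough Python sketch that compares kernel release strings numerically. It is an assumption-laden helper, not real RPM version comparison: it only looks at the digits before `.el`, the function names are hypothetical, and 2.6.18-53.1.14.el5 is simply the kernel this comment reports as no longer panicking, not a confirmed fix version. It is only meaningful within the 5.1 (2.6.18-53) kernel series, since the original report says the 2.6.18-8 kernels from RHEL 5.0 were not affected.

```python
import re


def release_key(release):
    """Turn '2.6.18-53.1.14.el5' into (2, 6, 18, 53, 1, 14)."""
    base = release.split(".el", 1)[0]
    return tuple(int(n) for n in re.findall(r"\d+", base))


def at_least(running, minimum):
    """True if the running release sorts at or after the minimum release."""
    return release_key(running) >= release_key(minimum)


if __name__ == "__main__":
    import platform

    # Threshold taken from this comment; treat it as a data point, not a guarantee.
    print(at_least(platform.release(), "2.6.18-53.1.14.el5"))
```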
Comment 6 RHEL Product and Program Management 2009-02-16 10:23:30 EST
Updating PM score.
Comment 9 Tony Camuso 2009-11-11 11:48:53 EST
Mike, Scott, any comments?
Comment 11 Mike Miller (OS Dev) 2010-04-29 14:47:50 EDT
I suspect the blank lines are noise from the driver printing the geometry of the logical volumes. This should have been addressed by commit
983333cb0c445c56808502461bbb34876c63eb2b.
Comment 12 Tony Camuso 2011-01-10 05:27:33 EST
According to the git log, this commit was backported into RHEL5 last April and should be 

commit a1fcf3f8fa7ef40ba3a829f781d639632391bc21
Author: Tomas Henzl <thenzl@redhat.com>
Date:   Mon Apr 26 12:19:34 2010 -0400

    [cciss] remove extraneous printk
    
This patch did not make it into RHEL5.5, but is in the RHEL5.6 code. 

Has anybody seen this problem in RHEL5.6?
Comment 13 Barry Donahue 2011-06-08 12:48:02 EDT
Verified with hpacucli-8.70-8.0 on a ProLiant DL380 G6.
Comment 14 Tony Camuso 2012-10-01 15:44:14 EDT
This was verified and should have been closed.
