Description of problem:
The HP Command Line Array Configuration Utility (hpacucli) consistently causes a kernel panic on RHEL v5.1 (2.6.18-53.1.4.el5). This problem does not occur on RHEL v5.0 2.6.18-8* kernels.

Version-Release number of selected component (if applicable):
RHEL v5.1, 2.6.18-53.1.4.el5

Hardware:
HP ProLiant DL380 G5, 4GB RAM

How reproducible:
Consistently causes a kernel panic after 2 or 3 command operations.

Steps to Reproduce:
1. Install hpacucli-7.85-18.linux.rpm, available from http://h18023.www1.hp.com/support/files/server/us/download/27573.html
2. Run the command "hpacucli ctrl all show status".
3. The server panics after the command has been run 2 or 3 times.

Actual results:
kernel panic

crash> bt
PID: 4623  TASK: ffff81012fc08100  CPU: 0  COMMAND: ".hpacucli"
 #0 [ffff81011d4939d0] die at ffffffff80069681
 #1 [ffff81011d493a00] do_invalid_op at ffffffff80069c37
 #2 [ffff81011d493ac0] error_exit at ffffffff8005bde9
    [exception RIP: __list_add+72]
    RIP: ffffffff80143640  RSP: ffff81011d493b78  RFLAGS: 00013282
    RAX: 0000000000000058  RBX: ffffffff802fcf90  RCX: ffffffff802e5728
    RDX: ffffffff802e5728  RSI: 0000000000000000  RDI: ffffffff802e5720
    RBP: ffff81012ed26e90   R8: ffffffff802e5728   R9: 0000000000003046
    R10: 0000000000000000  R11: 0000000000000180  R12: ffff81012ed26e90
    R13: ffff81012ed26e70  R14: 00000000fffffffe  R15: ffffffff802fcfa8
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #3 [ffff81011d493b70] __list_add at ffffffff80143640
 #4 [ffff81011d493b90] kobject_add at ffffffff801410a6
 #5 [ffff81011d493bd0] register_disk at ffffffff800fcf4c
 #6 [ffff81011d493c00] add_disk at ffffffff80138d9a
 #7 [ffff81011d493c10] rebuild_lun_table at ffffffff880b49a7
 #8 [ffff81011d493c90] cciss_ioctl at ffffffff880b4e21
 #9 [ffff81011d493dc0] do_ioctl at ffffffff880b56b6
#10 [ffff81011d493de0] cciss_compat_ioctl at ffffffff880b5774
#11 [ffff81011d493ef0] compat_blkdev_ioctl at ffffffff80137ffc
#12 [ffff81011d493f20] compat_sys_ioctl at ffffffff800edf00
#13 [ffff81011d493f80] sysenter_do_call at ffffffff8005f49b
    RIP: 00000000ffffe410  RSP: 00000000ffe201d8  RFLAGS: 00003296
    RAX: ffffffffffffffda  RBX: ffffffff8005f49b  RCX: 000000000000420e
    RDX: 0000000000000000  RSI: 0000000000000015  RDI: 00000000ffe20220
    RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
    R13: 0000000000000000  R14: 0000000000000000  R15: 00000000ffe201d8
    ORIG_RAX: 0000000000000036  CS: 0023  SS: 002b

Expected results:
# hpacucli ctrl all show status
Smart Array P400 in Slot 1
   Controller Status: OK
   Cache Status: OK

Additional info:
I am seeing this too. I suggest that the priority be moved to high. As it stands, EL5 cannot be used on HP hardware, which is a serious problem for me and, I'm sure, for many other people. Red Hat, do we expect a fix from you or from HP?

Here is something from /var/log/messages that may be useful:

Dec 12 13:22:54 capfs1 kernel: blocks= 2293469488 block_size= 512
Dec 12 13:22:55 capfs1 snmpd[3393]: Connection from UDP: [127.0.0.1]:32779
Dec 12 13:22:56 capfs1 kernel: blocks= 2293469488 block_size= 512
Dec 12 13:22:56 capfs1 kernel: kobject_add failed for cciss!c2d1 with -EEXIST, don't try to register things with the same name in the same directory.
Dec 12 13:22:56 capfs1 kernel:
Dec 12 13:22:56 capfs1 kernel: Call Trace:
Dec 12 13:22:56 capfs1 kernel: [<ffffffff8014115f>] kobject_add+0x16e/0x199
Dec 12 13:22:56 capfs1 kernel: [<ffffffff80058ee5>] exact_lock+0x0/0x14
Dec 12 13:22:56 capfs1 kernel: [<ffffffff800fcf4c>] register_disk+0x43/0x199
Dec 12 13:22:56 capfs1 kernel: [<ffffffff80138d9a>] add_disk+0x34/0x3d
Dec 12 13:22:56 capfs1 kernel: [<ffffffff880b49a7>] :cciss:rebuild_lun_table+0x48f/0x50f
Dec 12 13:22:56 capfs1 kernel: [<ffffffff800c35ff>] zone_statistics+0x3e/0x6d
Dec 12 13:22:56 capfs1 kernel: [<ffffffff880b4e21>] :cciss:cciss_ioctl+0x3fa/0xc65
Dec 12 13:22:56 capfs1 kernel: [<ffffffff80142a05>] snprintf+0x44/0x4c
Dec 12 13:22:56 capfs1 kernel: [<ffffffff8002ae65>] flush_tlb_page+0xac/0xda
Dec 12 13:22:56 capfs1 kernel: [<ffffffff80010b1d>] do_wp_page+0x246/0x67d
Dec 12 13:22:56 capfs1 kernel: [<ffffffff8003a613>] d_lookup+0x1e/0x42
Dec 12 13:22:56 capfs1 kernel: [<ffffffff880b56b6>] :cciss:do_ioctl+0x2a/0x39
Dec 12 13:22:56 capfs1 kernel: [<ffffffff880b5774>] :cciss:cciss_compat_ioctl+0xaf/0x25f
Dec 12 13:22:56 capfs1 kernel: [<ffffffff80022c4a>] flush_tlb_others+0x84/0xbc
Dec 12 13:22:56 capfs1 kernel: [<ffffffff80022c5f>] flush_tlb_others+0x99/0xbc
Dec 12 13:22:56 capfs1 kernel: [<ffffffff80021d28>] __up_read+0x19/0x7f
Dec 12 13:22:56 capfs1 kernel: [<ffffffff80064a9d>] do_page_fault+0x4eb/0x81d
Dec 12 13:22:56 capfs1 kernel: [<ffffffff8000df59>] free_pages_and_swap_cache+0x73/0x8f
Dec 12 13:22:56 capfs1 kernel: [<ffffffff80137ffc>] compat_blkdev_ioctl+0x4c/0x5f
Dec 12 13:22:56 capfs1 kernel: [<ffffffff800edf00>] compat_sys_ioctl+0xc5/0x2b1
Dec 12 13:22:56 capfs1 kernel: [<ffffffff8005f49b>] sysenter_do_call+0x1b/0x67

Thanks,
Paul
Hi,

We're seeing the same problem here with a DL580 G4 server running Red Hat 5.1, kernel 2.6.18-53.el5:

Jan 8 17:05:12 localhost kernel: kobject_add failed for cciss!c2d1 with -EEXIST, don't try to register things with the same name in the same directory.
Jan 8 17:05:12 localhost kernel:
Jan 8 17:05:12 localhost kernel: Call Trace:
Jan 8 17:05:12 localhost kernel: [<ffffffff80141148>] kobject_add+0x16e/0x199
Jan 8 17:05:12 localhost kernel: [<ffffffff80058eec>] exact_lock+0x0/0x14
Jan 8 17:05:12 localhost kernel: [<ffffffff800fcf35>] register_disk+0x43/0x199
Jan 8 17:05:12 localhost kernel: [<ffffffff80138d83>] add_disk+0x34/0x3d
Jan 8 17:05:12 localhost kernel: [<ffffffff880b49a7>] :cciss:rebuild_lun_table+0x48f/0x50f
Jan 8 17:05:12 localhost kernel: [<ffffffff880b4e21>] :cciss:cciss_ioctl+0x3fa/0xc65
Jan 8 17:05:12 localhost kernel: [<ffffffff801429ee>] snprintf+0x44/0x4c
Jan 8 17:05:12 localhost kernel: [<ffffffff8002ae6c>] flush_tlb_page+0xac/0xda
Jan 8 17:05:12 localhost kernel: [<ffffffff80010b0c>] do_wp_page+0x246/0x67d
Jan 8 17:05:12 localhost kernel: [<ffffffff880b56b6>] :cciss:do_ioctl+0x2a/0x39
Jan 8 17:05:12 localhost kernel: [<ffffffff880b5774>] :cciss:cciss_compat_ioctl+0xaf/0x25f
Jan 8 17:05:12 localhost kernel: [<ffffffff80021d2f>] __up_read+0x19/0x7f
Jan 8 17:05:12 localhost kernel: [<ffffffff80064a9d>] do_page_fault+0x4eb/0x81d
Jan 8 17:05:12 localhost kernel: [<ffffffff8000df48>] free_pages_and_swap_cache+0x73/0x8f
Jan 8 17:05:12 localhost kernel: [<ffffffff80137fe5>] compat_blkdev_ioctl+0x4c/0x5f
Jan 8 17:05:12 localhost kernel: [<ffffffff800edee9>] compat_sys_ioctl+0xc5/0x2b1
Jan 8 17:05:12 localhost kernel: [<ffffffff8005f49b>] sysenter_do_call+0x1b/0x67
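Both syslog excerpts above share one signature line: "kobject_add failed for cciss!c2d1 with -EEXIST". For anyone checking whether other machines have hit the same condition, here is a minimal sketch that scans log lines for that message. The function name find_duplicate_registrations and the regular expression are assumptions made for this illustration, not part of any HP or Red Hat tool, and the sample lines are copied from the reports above.

```python
import re

# Pattern for the kernel message quoted in the reports above, e.g.
# "kobject_add failed for cciss!c2d1 with -EEXIST, ...".
# The regular expression is an assumption based only on those two excerpts.
KOBJECT_EEXIST = re.compile(r"kobject_add failed for (\S+) with -EEXIST")


def find_duplicate_registrations(lines):
    """Count, per kobject name, how many registrations failed with -EEXIST."""
    counts = {}
    for line in lines:
        match = KOBJECT_EEXIST.search(line)
        if match:
            name = match.group(1)
            counts[name] = counts.get(name, 0) + 1
    return counts


# Sample input taken from the two reports in this thread.
sample_log = [
    "Dec 12 13:22:56 capfs1 kernel: blocks= 2293469488 block_size= 512",
    "Dec 12 13:22:56 capfs1 kernel: kobject_add failed for cciss!c2d1 with "
    "-EEXIST, don't try to register things with the same name in the same "
    "directory.",
    "Jan 8 17:05:12 localhost kernel: kobject_add failed for cciss!c2d1 with "
    "-EEXIST, don't try to register things with the same name in the same "
    "directory.",
]

for name, count in sorted(find_duplicate_registrations(sample_log).items()):
    print(f"{name}: {count} failed registration(s)")
```

To check a real system, the same function can be given an open file object for /var/log/messages instead of the sample list, since it only iterates over lines.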
I am also seeing the same as above on a DL360 G5 running Red Hat 5.1, kernel 2.6.18-53.el5PAE. This only seems to be an issue when multiple logical volumes are configured on the same cciss controller; there are no issues when only one array is configured. The HP cpq_cciss driver isn't supported yet on the 5.1 base kernel. Is HP aware of this issue? We need this fixed soon!
Based on my interaction with HP support, this may be a dup of bug #429515, which is a copy of bug #426873, which I do not have access to.

I've also seen this on an HP DL360 G5 with six drives and three logical volumes on the sole SmartArray controller, running RHEL 5.1 x86_64. The serial console shows:

list_add corruption. prev->next should be ffffffff802fcf90, but was ffff81042f434490
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at lib/list_debug.c:31
invalid opcode: 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:02.0/0000:09:00.0/0000:0a:00.0/0000:0b:00.1/irq
CPU 0
Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler hidp rfcomm l2cap bluetooth sunrpc ipv6 cpufreq_ondemand acpi_cpufreq dm_mirror dm_multipath dm_mod video sbs backlight i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac parport_pc lp parport shpchp e1000 serio_raw bnx2 pcspkr ata_piix libata cciss sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 8874, comm: .hpacucli Not tainted 2.6.18-53.el5 #1
RIP: 0010:[<ffffffff80143629>] [<ffffffff80143629>] __list_add+0x48/0x68
RSP: 0018:ffff810413cefb78 EFLAGS: 00013282
RAX: 0000000000000058 RBX: ffffffff802fcf90 RCX: ffffffff802e5728
RDX: ffffffff802e5728 RSI: 0000000000000000 RDI: ffffffff802e5720
RBP: ffff81042f434490 R08: ffffffff802e5728 R09: 0000000000003046
R10: 0000000000000000 R11: 0000000000000180 R12: ffff81042f434c90
R13: ffff81042f434c70 R14: 00000000fffffffe R15: ffffffff802fcfa8
FS: 0000000000000000(0000) GS:ffffffff80396000(0063) knlGS:00000000f7fab6c0
CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 00000000085f8004 CR3: 00000004190fd000 CR4: 00000000000006e0
Process .hpacucli (pid: 8874, threadinfo ffff810413cee000, task ffff81040b2dc7e0)
Stack: ffff81042f434c70 ffff81042f434c00 ffff81042f9a0000 ffffffff8014108f
 ffffffff80058eec ffff81042f434c78 ffff81042f434c00 ffff81042f9a0000
 ffff81042f434c70 0000000000000001 ffff81042f9a0000 ffffffff800fcf35
Call Trace:
 [<ffffffff8014108f>] kobject_add+0xb5/0x199
 [<ffffffff80058eec>] exact_lock+0x0/0x14
 [<ffffffff800fcf35>] register_disk+0x43/0x199
 [<ffffffff80138d83>] add_disk+0x34/0x3d
 [<ffffffff880b49a7>] :cciss:rebuild_lun_table+0x48f/0x50f
 [<ffffffff800c35e8>] zone_statistics+0x3e/0x6d
 [<ffffffff880b4e21>] :cciss:cciss_ioctl+0x3fa/0xc65
 [<ffffffff801429ee>] snprintf+0x44/0x4c
 [<ffffffff8002ae6c>] flush_tlb_page+0xac/0xda
 [<ffffffff880b56b6>] :cciss:do_ioctl+0x2a/0x39
 [<ffffffff880b5774>] :cciss:cciss_compat_ioctl+0xaf/0x25f
 [<ffffffff8011d242>] inode_has_perm+0x56/0x63
 [<ffffffff80021d2f>] __up_read+0x19/0x7f
 [<ffffffff8011d2e3>] file_has_perm+0x94/0xa3
 [<ffffffff80137fe5>] compat_blkdev_ioctl+0x4c/0x5f
 [<ffffffff800edee9>] compat_sys_ioctl+0xc5/0x2b1
 [<ffffffff8005f49b>] sysenter_do_call+0x1b/0x67

Code: 0f 0b 68 87 5c 29 80 c2 1f 00 4c 89 63 08 49 89 1c 24 4c 89
RIP [<ffffffff80143629>] __list_add+0x48/0x68
RSP <ffff810413cefb78>
<0>Kernel panic - not syncing: Fatal exception
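For readers unfamiliar with the "list_add corruption. prev->next should be ..., but was ..." line in the console output above, the following toy model sketches the kind of sanity check that produces such a message: before linking a new node between prev and next, the two neighbours must still point at each other. It is a userspace illustration only, not the kernel's code. The class and function names are invented for this sketch, and the way the stale pointer is introduced is hypothetical; the thread does not establish how the real list was damaged.

```python
class ListHead:
    """Minimal stand-in for a node of a circular doubly linked list."""

    def __init__(self, label):
        self.label = label
        # A freshly initialised node points at itself in both directions.
        self.next = self
        self.prev = self


class ListCorruption(Exception):
    """Raised where the kernel's debug check would call BUG()."""


def checked_list_add(new, prev, next_):
    """Link new between prev and next_, refusing if the neighbours disagree."""
    if next_.prev is not prev:
        raise ListCorruption(
            f"list_add corruption. next->prev should be {prev.label}, "
            f"but was {next_.prev.label}"
        )
    if prev.next is not next_:
        raise ListCorruption(
            f"list_add corruption. prev->next should be {next_.label}, "
            f"but was {prev.next.label}"
        )
    next_.prev = new
    new.next = next_
    new.prev = prev
    prev.next = new


def list_add_tail(new, head):
    """Append new just before head, i.e. at the tail of the list."""
    checked_list_add(new, head.prev, head)


head = ListHead("head")
a = ListHead("a")
b = ListHead("b")

list_add_tail(a, head)  # healthy insert: head <-> a

# Hypothetical damage: node "a" is reset while it is still on the list.
a.next = a

try:
    list_add_tail(b, head)
except ListCorruption as exc:
    print(exc)
```

In this model the second insert fails because the tail node no longer points back at the list head, which has the same shape as the message in the console output: an expected address (the list head) and a different address actually found.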
I tested with kernel 2.6.18-53.1.14.el5 x86_64 and I cannot trigger the panic any more. However, something is still being tickled in the kernel because several blank lines are printed to the serial console every time I run hpacucli.
Updating PM score.
Mike, Scott, any comments?
I suspect the blank lines are noise from the driver printing the geometry of the logical volumes. This should have been addressed by commit 983333cb0c445c56808502461bbb34876c63eb2b.
According to the git log, this commit was backported into RHEL5 last April and should be:

commit a1fcf3f8fa7ef40ba3a829f781d639632391bc21
Author: Tomas Henzl <thenzl>
Date: Mon Apr 26 12:19:34 2010 -0400

    [cciss] remove extraneous printk

This patch did not make it into RHEL5.5, but is in the RHEL5.6 code. Has anybody seen this problem in RHEL5.6?
Verified with hpacucli-8.70-8.0 on a ProLiant DL380 G6.
This was verified and should have been closed.