Bug 440654
| Summary: | Installation or reboot failures of js20 blade | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 4 | Reporter: | Vivek Goyal <vgoyal> |
| Component: | kernel | Assignee: | Steve Best <sbest> |
| Status: | CLOSED NOTABUG | QA Contact: | Martin Jenner <mjenner> |
| Severity: | high | Docs Contact: | |
| Priority: | low | | |
| Version: | 4.6 | CC: | atodorov, bpeters, bugproxy, duck, epollard, jburke, mgahagan, mschick, pbunyan, qcai, sghosh, syeghiay |
| Target Milestone: | rc | Keywords: | TestBlocker |
| Target Release: | --- | | |
| Hardware: | powerpc | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2010-09-14 03:54:53 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 461304 | | |
| Attachments: | | | |
Description
Vivek Goyal
2008-04-04 12:40:39 UTC
Reassigning to Brad. I will be glad to help out, but Power is his hardware. Brad, let me know if you need something on this.

Looking through the watchdog logs, this appears to be caused by a simple HDD failure. Note the following:
(...)
Checking root filesystem
[/sbin/fsck.ext3 (1) -- /] fsck.ext3 -a /dev/VolGroup00/LogVol00
/dev/VolGroup00/LogVol00 contains a file system with errors, check forced.
/dev/VolGroup00/LogVol00: Inode 4866129 has a bad extended attribute block 9732616.
/dev/VolGroup00/LogVol00: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
(i.e., without -a or -p options)
[FAILED]
*** An error occurred during the file system check.
*** Dropping you to a shell; the system will reboot
*** when you leave the shell.
(...)
I think we need to replace the disk and see if this problem goes away. Second opinions would be welcome.
-Brad

Created attachment 301020 [details]
Watchdog Log file
Shows apparent HDD failure
Created attachment 301028 [details]
Installer log
Brad,
This has happened again, with the same signature.
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
AMD8111: IDE controller at PCI slot 0000:00:04.1
AMD8111: chipset revision 3
AMD8111: 0000:00:04.1 (rev 03) UDMA133 controller
AMD8111: 100% native mode on irq 32
ide0: BM-DMA at 0x7c00-0x7c07, BIOS settings: hda:pio, hdb:pio
ide1: BM-DMA at 0x7c08-0x7c0f, BIOS settings: hdc:pio, hdd:pio
hda: TOSHIBA MK6026GAXB, ATA DISK drive
Using cfq io scheduler
ide0 at 0x7400-0x7407,0x6c02 on irq 32
hda: max request size: 128KiB
hda: 117210240 sectors (60011 MB), CHS=65535/16/63, UDMA(33)
hda: unknown partition table <---------Note first sign of failure
ide-floppy driver 0.99.newide
usbcore: registered new driver hiddev
usbcore: registered new driver usbhid
drivers/usb/input/hid-core.c: v2.0:USB HID core driver
mice: PS/2 mouse device common for all mice
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
NET: Registered protocol family 2
IP route cache hash table entries: 65536 (order: 7, 524288 bytes)
TCP established hash table entries: 262144 (order: 10, 4194304 bytes)
TCP bind hash table entries: 262144 (order: 10, 4194304 bytes)
TCP: Hash tables configured (established 262144 bind 262144)
Initializing IPsec netlink socket
NET: Registered protocol family 1
NET: Registered protocol family 17
Freeing unused kernel memory: 216k freed
Red Hat nash version 4.2.1.13 starting
Mounted /proc filesystem
Mounting sysfs
Creating /dev
Starting udev
Loading scsi_mod.ko module
SCSI subsystem initialized
Loading sd_mod.ko module
Loading scsi_transport_fc.ko module
Loading qla2xxx.ko module
QLogic Fibre Channel HBA Driver
Loading qla2300.ko module
qla2300 0000:01:01.0: Found an ISP2312, irq 40, iobase 0xe000000080000000
qla2300 0000:01:01.0: Configuring PCI space...
qla2300 0000:01:01.0: Configure NVRAM parameters...
qla2300 0000:01:01.0: Verifying loaded RISC code...
qla2300 0000:01:01.0: Extended memory detected (512 KB)...
qla2300 0000:01:01.0: Resizing request queue depth (2048 -> 4096)...
qla2300 0000:01:01.0: Waiting for LIP to complete...
qla2300 0000:01:01.0: LOOP UP detected (2 Gbps).
qla2300 0000:01:01.0: Topology - (F_Port), Host Loop address 0xffff
scsi0 : qla2xxx
qla2300 0000:01:01.0:
QLogic Fibre Channel HBA Driver: 8.01.07-d4-rhel4.7-01
QLogic IBM FCEC -
ISP2312: PCI-X (133 MHz) @ 0000:01:01.0 hdma-, host#=0, fw=3.03.20 IPX
qla2300 0000:01:01.1: Found an ISP2312, irq 41, iobase 0xe000000080001000
qla2300 0000:01:01.1: Configuring PCI space...
qla2300 0000:01:01.1: Configure NVRAM parameters...
qla2300 0000:01:01.1: Verifying loaded RISC code...
qla2300 0000:01:01.1: Extended memory detected (512 KB)...
qla2300 0000:01:01.1: Resizing request queue depth (2048 -> 4096)...
qla2300 0000:01:01.1: Waiting for LIP to complete...
qla2300 0000:01:01.1: Cable is unplugged...
scsi1 : qla2xxx
qla2300 0000:01:01.1:
QLogic Fibre Channel HBA Driver: 8.01.07-d4-rhel4.7-01
QLogic IBM FCEC -
ISP2312: PCI-X (133 MHz) @ 0000:01:01.1 hdma-, host#=1, fw=3.03.20 IPX
Loading dm-mod.ko module
device-mapper: 4.5.5-ioctl (2006-12-01) initialised: dm-devel
Loading jbd.ko module
Loading ext3.ko module
Loading dm-mirror.ko module
Loading dm-zero.ko module
Loading dm-snapshot.ko module
Making device-mapper control node
Scanning logical volumes
Reading all physical volumes. This may take a while...
No volume groups found
Activating logical volumes
Volume group "VolGroup00" not found
ERROR: /bin/lvm exited abnormally! (pid 470)
Creating root device
Mounting root filesystem
mount: error 6 mounting ext3
mount: error 2 mounting none
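The numeric codes in the two `mount:` lines above can be decoded mechanically. The sketch below assumes that the number nash prints is a standard Linux `errno` value (on Linux, 6 is `ENXIO` and 2 is `ENOENT`); the helper name `decode_mount_error` is hypothetical and is not part of nash or of any tool mentioned in this report.

```python
import errno
import os
import re

# Matches nash-style messages such as "mount: error 6 mounting ext3".
# Assumption: the number is a plain errno value.
MOUNT_ERROR = re.compile(r"mount: error (\d+) mounting (\S+)")


def decode_mount_error(line):
    """Return (filesystem, errno name, description) for a mount error line, or None."""
    match = MOUNT_ERROR.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    # errno.errorcode maps numeric codes to symbolic names on this platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return match.group(2), name, os.strerror(code)


# The two lines quoted from the console log above.
for line in ("mount: error 6 mounting ext3", "mount: error 2 mounting none"):
    print(decode_mount_error(line))
```

Read this way, the ext3 mount fails because the root device does not exist, which is consistent with the earlier "Volume group "VolGroup00" not found" message rather than with a corrupt filesystem.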
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=2579448
If you believe that this is a hardware issue, can you please have the hardware
sent to the RDU lab?
We have hit the RHEL 4.7 kernel freeze and we are not sure whether this is hardware or
software.
------- Comment From bpeters.com 2008-05-21 17:03 EDT-------
Jeff, could you provide details as to the system you saw this on? Was it the same one on which Vivek saw this problem?

------- Comment From bpeters.com 2008-06-19 11:57 EDT-------
My best guess is that this is a simple HDD failure. Tracking down the failing disk is a hands-on job, but should be reasonably simple given light-path diagnostics. I recommend you contact your local RDU equivalent of Westford engineering. If they refuse to support this box, then send an email to me and Mark Wisner (who is onsite and may be able to assist).

Updating PM score.

Subhendu, Brad Peters is no longer here at Red Hat. He was the onsite partner for IBM but was replaced by Ameet Paranjape <aparanja>.

Jeff, did this turn out to be hardware? If so, can we go ahead and close this bug?

Created attachment 339558 [details]
Panic with 2.6.9-88.EL
Switching to new root
exec of init (/bin/sh) failed!!!: 5
umount /initrd/dev failed: 2
Kernel panic - not syncing: Attempted to kill init!
I've seen the lvm scan fail intermittently on this JS20:
<snip...>
Reading all physical volumes. This may take a while...
Activating logical volumes
Volume group "VolGroup00" not found
ERROR: /bin/lvm exited abnormally! (pid 471)
Creating root device
Mounting root filesystem
mount: error 6 mounting ext3
mount: error 2 mounting none
Switching to new root
switchroot: mount failed: 22
umount /initrd/dev failed: 2
Kernel panic - not syncing: Attempted to kill init!
A manual filesystem check passed and I don't see any hardware complaints in the kernel logs either, so I thought I would try updating the blade FW to the latest level (it was more than 2 years old):
http://www-947.ibm.com/systems/support/supportsite.wss/docdisplay?lndocid=MIGR-55553&brandind=5000020
I'm running a cron script to see if the boot failure can be recreated. Also, I am not sure what effect the old RAID configuration has on the disk layout, but I am continuing to investigate that.

Ameet,
It happened again on last night's kernel: 2.6.9-89.EL. This time it just hung.
http://rhts.redhat.com/testlogs/55163/185171/1548457/console.txt
Activating logical volumes
2 logical volume(s) in volume group "VolGroup00" now active
Creating root device
Mounting root filesystem
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
Switching to new root
INIT: version 2.85 booting
INIT: No inittab file found
All the other systems seem to be fine. Is there any way we can pull this thing from RHTS until we get some answers as to what is going wrong?
Thanks,
Jeff

Reassigning to Steve Best, the new IBM on-site partner.

------- Comment From mjr.ibm.com 2009-12-01 17:24 EDT-------
It's closed on this side; does it need to be re-opened?

Closing - NOTABUG - HDD failure
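When console logs are collected over repeated reboots, as with the cron-driven retest mentioned above, the failure signatures quoted in this report can be checked for automatically. The following is a minimal sketch, not the script that was actually used: the patterns are copied from the console output in this bug, while the labels and the function name `classify_console_log` are hypothetical.

```python
import re

# Signatures taken from the console output quoted in this report.
# The labels on the left are illustrative names, not official terms.
SIGNATURES = [
    ("no partition table", re.compile(r"unknown partition table")),
    ("volume group missing", re.compile(r'Volume group "\S+" not found')),
    ("lvm scan failed", re.compile(r"/bin/lvm exited abnormally")),
    ("root mount failed", re.compile(r"mount: error \d+ mounting")),
    ("kernel panic", re.compile(r"Kernel panic - not syncing")),
    ("no inittab", re.compile(r"INIT: No inittab file found")),
]


def classify_console_log(text):
    """Return the labels of all signatures found, in the order listed above."""
    return [label for label, pattern in SIGNATURES if pattern.search(text)]


sample = """\
Activating logical volumes
Volume group "VolGroup00" not found
ERROR: /bin/lvm exited abnormally! (pid 471)
mount: error 6 mounting ext3
Kernel panic - not syncing: Attempted to kill init!
"""
print(classify_console_log(sample))
```

An empty result only means that none of these known signatures appeared; a hang such as the one reported against 2.6.9-89.EL would still need a separate timeout check.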