Description of problem:
This JS20 blade is showing erratic behavior. Installation of both RHEL4U6 and RHEL4U7 sometimes fails. When installation succeeds, the machine fails on subsequent reboots. We see the issue only on this particular machine: ibm-js20-1.test.redhat.com
The problem occurs when mounting the root filesystem:
- Once it said the partition table was corrupted.
- Now it is complaining about filesystem inconsistency.
Some failure log links:
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=2524169
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=2506636
Version-Release number of selected component (if applicable):
RHEL4U6
How reproducible:
Pretty frequent
Steps to Reproduce:
1.
2.
3.
Actual results:
Reboot of the machine after installation fails.
Expected results:
Additional info:
This might very well be a bad hardware issue, but we don't know at this point.
Reassigning to Brad. I'm glad to help out, but Power is his hardware. Brad, let me know if you need anything on this.
From the watchdog logs, this appears to be caused by a simple HDD failure. Note the following:
(...)
Checking root filesystem
[/sbin/fsck.ext3 (1) -- /] fsck.ext3 -a /dev/VolGroup00/LogVol00
/dev/VolGroup00/LogVol00 contains a file system with errors, check forced.
/dev/VolGroup00/LogVol00: Inode 4866129 has a bad extended attribute block 9732616.
/dev/VolGroup00/LogVol00: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
(i.e., without -a or -p options)
[FAILED]
*** An error occurred during the file system check.
*** Dropping you to a shell; the system will reboot
*** when you leave the shell.
(...)
I think we need to replace the disk and see if the problem goes away. Second opinions would be welcome.
-Brad
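The diagnosis rests on the fsck.ext3 -a run failing. fsck(8) reports its result as an exit-status bitmask, and a preen-mode run that stops with UNEXPECTED INCONSISTENCY should correspond to bit 4 (errors left uncorrected), although the console log above does not print the status itself. A minimal Python 3 sketch that decodes the status; the function name and script name are hypothetical, and `fsck.ext3 -n` (read-only, answer "no" to every question) is only meaningful against an unmounted volume, for example from rescue mode.

```python
import subprocess
import sys

# Exit-status bits documented in fsck(8); the status is the bitwise OR of these.
FSCK_BITS = [
    (1, "filesystem errors corrected"),
    (2, "system should be rebooted"),
    (4, "filesystem errors left uncorrected"),
    (8, "operational error"),
    (16, "usage or syntax error"),
    (32, "checking canceled by user request"),
    (128, "shared-library error"),
]


def decode_fsck_status(status):
    """Translate an fsck exit status into the list of conditions it encodes."""
    if status == 0:
        return ["no errors"]
    return [meaning for bit, meaning in FSCK_BITS if status & bit]


if __name__ == "__main__":
    # Hypothetical usage: python3 fsck_status.py /dev/VolGroup00/LogVol00
    # -n opens the filesystem read-only, so nothing on disk is modified.
    for device in sys.argv[1:]:
        status = subprocess.call(["fsck.ext3", "-n", device])
        print("%s: exit %d -> %s" % (device, status, ", ".join(decode_fsck_status(status))))
```

A status with bit 4 set on repeated checks of a freshly installed volume is consistent with the disk theory, but it does not by itself distinguish a failing drive from a controller or cabling problem.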
Created attachment 301020: Watchdog log file (shows apparent HDD failure)
Created attachment 301028: Installer log
Brad, this has happened again, with the same signature:
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
AMD8111: IDE controller at PCI slot 0000:00:04.1
AMD8111: chipset revision 3
AMD8111: 0000:00:04.1 (rev 03) UDMA133 controller
AMD8111: 100% native mode on irq 32
ide0: BM-DMA at 0x7c00-0x7c07, BIOS settings: hda:pio, hdb:pio
ide1: BM-DMA at 0x7c08-0x7c0f, BIOS settings: hdc:pio, hdd:pio
hda: TOSHIBA MK6026GAXB, ATA DISK drive
Using cfq io scheduler
ide0 at 0x7400-0x7407,0x6c02 on irq 32
hda: max request size: 128KiB
hda: 117210240 sectors (60011 MB), CHS=65535/16/63, UDMA(33)
hda: unknown partition table <--------- Note first sign of failure
ide-floppy driver 0.99.newide
usbcore: registered new driver hiddev
usbcore: registered new driver usbhid
drivers/usb/input/hid-core.c: v2.0:USB HID core driver
mice: PS/2 mouse device common for all mice
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
NET: Registered protocol family 2
IP route cache hash table entries: 65536 (order: 7, 524288 bytes)
TCP established hash table entries: 262144 (order: 10, 4194304 bytes)
TCP bind hash table entries: 262144 (order: 10, 4194304 bytes)
TCP: Hash tables configured (established 262144 bind 262144)
Initializing IPsec netlink socket
NET: Registered protocol family 1
NET: Registered protocol family 17
Freeing unused kernel memory: 216k freed
Red Hat nash version 4.2.1.13 starting
Mounted /proc filesystem
Mounting sysfs
Creating /dev
Starting udev
Loading scsi_mod.ko module
SCSI subsystem initialized
Loading sd_mod.ko module
Loading scsi_transport_fc.ko module
Loading qla2xxx.ko module
QLogic Fibre Channel HBA Driver
Loading qla2300.ko module
qla2300 0000:01:01.0: Found an ISP2312, irq 40, iobase 0xe000000080000000
qla2300 0000:01:01.0: Configuring PCI space...
qla2300 0000:01:01.0: Configure NVRAM parameters...
qla2300 0000:01:01.0: Verifying loaded RISC code...
qla2300 0000:01:01.0: Extended memory detected (512 KB)...
qla2300 0000:01:01.0: Resizing request queue depth (2048 -> 4096)...
qla2300 0000:01:01.0: Waiting for LIP to complete...
qla2300 0000:01:01.0: LOOP UP detected (2 Gbps).
qla2300 0000:01:01.0: Topology - (F_Port), Host Loop address 0xffff
scsi0 : qla2xxx
qla2300 0000:01:01.0:
QLogic Fibre Channel HBA Driver: 8.01.07-d4-rhel4.7-01
QLogic IBM FCEC - ISP2312: PCI-X (133 MHz) @ 0000:01:01.0 hdma-, host#=0, fw=3.03.20 IPX
qla2300 0000:01:01.1: Found an ISP2312, irq 41, iobase 0xe000000080001000
qla2300 0000:01:01.1: Configuring PCI space...
qla2300 0000:01:01.1: Configure NVRAM parameters...
qla2300 0000:01:01.1: Verifying loaded RISC code...
qla2300 0000:01:01.1: Extended memory detected (512 KB)...
qla2300 0000:01:01.1: Resizing request queue depth (2048 -> 4096)...
qla2300 0000:01:01.1: Waiting for LIP to complete...
qla2300 0000:01:01.1: Cable is unplugged...
scsi1 : qla2xxx
qla2300 0000:01:01.1:
QLogic Fibre Channel HBA Driver: 8.01.07-d4-rhel4.7-01
QLogic IBM FCEC - ISP2312: PCI-X (133 MHz) @ 0000:01:01.1 hdma-, host#=1, fw=3.03.20 IPX
Loading dm-mod.ko module
device-mapper: 4.5.5-ioctl (2006-12-01) initialised: dm-devel
Loading jbd.ko module
Loading ext3.ko module
Loading dm-mirror.ko module
Loading dm-zero.ko module
Loading dm-snapshot.ko module
Making device-mapper control node
Scanning logical volumes
Reading all physical volumes. This may take a while...
No volume groups found
Activating logical volumes
Volume group "VolGroup00" not found
ERROR: /bin/lvm exited abnormally! (pid 470)
Creating root device
Mounting root filesystem
mount: error 6 mounting ext3
mount: error 2 mounting none
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=2579448
If you believe this is a hardware issue, can you please have the hardware sent to the RDU lab? We have hit the RHEL4.7 kernel freeze and are not sure whether this is a hardware or a software problem.
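The first sign of trouble in this log is "hda: unknown partition table", which means none of the kernel's partition parsers accepted the disk. Assuming the disk carries an msdos (MBR) label, which is the usual layout for an installer-partitioned disk here but has not been confirmed for this blade, one quick check is whether sector 0 still ends with the 0x55AA boot signature that the msdos parser requires. A minimal Python 3 sketch; the function names are hypothetical, and since the stock RHEL4 interpreter is much older, the sector can instead be copied off the box with `dd if=/dev/hda of=sector0.bin bs=512 count=1` and checked on another host.

```python
import sys

MBR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xaa"  # bytes 510-511 of a valid msdos (MBR) label


def has_mbr_signature(sector):
    """True if a 512-byte sector ends with the 0x55AA boot signature."""
    return len(sector) >= MBR_SIZE and sector[510:512] == BOOT_SIGNATURE


if __name__ == "__main__":
    # Each argument is a raw device (needs root) or a file holding a dumped sector 0.
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            ok = has_mbr_signature(f.read(MBR_SIZE))
        print("%s: %s" % (path, "0x55AA signature present" if ok else "no msdos signature"))
```

A missing signature would explain the "unknown partition table" message on its own; a present signature does not prove the partition entries are intact, so intermittent read errors remain possible either way.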
------- Comment From bpeters.com 2008-05-21 17:03 EDT-------
Jeff, could you provide details about the system you saw this on? Was it the same one on which Vivek saw this problem?
------- Comment From bpeters.com 2008-06-19 11:57 EDT-------
My best guess is that this is a simple HDD failure. Tracking down the failing disk is a hands-on job, but it should be reasonably simple with light-path diagnostics. I recommend you contact the RDU equivalent of Westford engineering. If they refuse to support this box, send an email to me and Mark Wisner (who is onsite and may be able to assist).
Updating PM score.
Subhendu, Brad Peters is no longer here at Red Hat. He was IBM's onsite partner and has been replaced by Ameet Paranjape <aparanja>.
Jeff
Did this turn out to be hardware? If so, can we go ahead and close this bug?
Created attachment 339558: Panic with 2.6.9-88.EL
Switching to new root
exec of init (/bin/sh) failed!!!: 5
umount /initrd/dev failed: 2
Kernel panic - not syncing: Attempted to kill init!
I've seen the LVM scan fail intermittently on this JS20:
<snip...>
Reading all physical volumes. This may take a while...
Activating logical volumes
Volume group "VolGroup00" not found
ERROR: /bin/lvm exited abnormally! (pid 471)
Creating root device
Mounting root filesystem
mount: error 6 mounting ext3
mount: error 2 mounting none
Switching to new root
switchroot: mount failed: 22
umount /initrd/dev failed: 2
Kernel panic - not syncing: Attempted to kill init!
A manual filesystem check passed, and I don't see any hardware complaints in the kernel logs either, so I decided to update the blade firmware to the latest level (it was more than two years old):
http://www-947.ibm.com/systems/support/supportsite.wss/docdisplay?lndocid=MIGR-55553&brandind=5000020
I'm running a cron script to see whether the boot failure reproduces. I'm also not sure what effect the old RAID configuration has on the disk layout, and I'm continuing to investigate that.
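The cron script itself is not posted here. As a hypothetical sketch of how such a reboot loop could work: a script run from a `@reboot` crontab entry logs each boot that gets far enough to start cron and then schedules the next reboot, so a boot that fails stops the loop and the log shows how many good boots preceded it. The script name, log path, and five-minute delay are assumptions, and this is written for Python 3, which the stock RHEL4 install does not provide, so a shell equivalent may be more practical on the blade itself.

```python
import subprocess
import sys
import time


def record_boot(log_path):
    """Append a timestamped line for this boot and return how many boots are logged."""
    with open(log_path, "a") as log:
        log.write(time.strftime("%Y-%m-%d %H:%M:%S") + " boot reached cron\n")
    with open(log_path) as log:
        return sum(1 for _ in log)


def main(args):
    if not args:
        print("usage: boot_loop.py LOGFILE [--reboot]")
        return
    count = record_boot(args[0])
    print("boots recorded so far: %d" % count)
    if "--reboot" in args[1:]:
        # Reboot in five minutes; the delay leaves a window to stop the loop
        # with "shutdown -c". A boot that fails before cron starts never
        # reaches this point, so the loop halts at the first bad boot.
        subprocess.call(["/sbin/shutdown", "-r", "+5"])


if __name__ == "__main__":
    main(sys.argv[1:])
```

A matching crontab line might look like `@reboot /usr/bin/python3 /root/boot_loop.py /var/log/boot-loop.log --reboot` (paths hypothetical).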
Ameet, it happened again on last night's kernel, 2.6.9-89.EL. This time it just hung:
http://rhts.redhat.com/testlogs/55163/185171/1548457/console.txt
Activating logical volumes
2 logical volume(s) in volume group "VolGroup00" now active
Creating root device
Mounting root filesystem
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
Switching to new root
INIT: version 2.85 booting
INIT: No inittab file found
All the other systems seem to be fine. Is there any way we can pull this machine from RHTS until we get some answers about what is going wrong?
Thanks,
Jeff
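Each new failure in this bug has been compared to earlier ones by eye. As a hedged sketch, a small script could tag each saved RHTS console log with the failure signatures seen so far; the match strings are copied from the logs in this bug, while the short labels and script name are hypothetical. It assumes Python 3 on whichever host holds the downloaded console text.

```python
import sys

# Failure signatures copied from console logs in this bug.
# The labels on the left are hypothetical names chosen for this sketch.
SIGNATURES = {
    "partition_table": "hda: unknown partition table",
    "fsck": "UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY",
    "lvm": 'Volume group "VolGroup00" not found',
    "init_panic": "Kernel panic - not syncing: Attempted to kill init!",
    "inittab": "INIT: No inittab file found",
}


def classify_console_log(text):
    """Return the labels of every known signature present in text, in table order."""
    return [label for label, needle in SIGNATURES.items() if needle in text]


if __name__ == "__main__":
    # Hypothetical usage: python3 classify_console.py console.txt [more.txt ...]
    for path in sys.argv[1:]:
        with open(path, errors="replace") as f:
            labels = classify_console_log(f.read())
        print(path + ": " + (", ".join(labels) if labels else "no known signature"))
```

Substring matching only groups runs by symptom; a log that hits `lvm` and `init_panic` together still needs reading to tell a disk that went missing from one whose contents were unreadable.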
Reassigning to Steve Best, the new IBM on-site partner.
------- Comment From mjr.ibm.com 2009-12-01 17:24 EDT-------
It's closed on this side. Does it need to be reopened?
Closing - NOTABUG - HDD failure