Bug 163437
Summary: | Using new 2.6.12-1.1372_FC3smp kernel causes kernel panic while booting | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Philip Pearson <prgp1976> |
Component: | mkinitrd | Assignee: | Peter Jones <pjones> |
Status: | CLOSED ERRATA | QA Contact: | David Lawrence <dkl> |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | 3 | CC: | bugzilla, cmarco, deron.meranda, dgunchev, d.lesca, fche, ihok, jansen, j, kerryn.wood, lbyrd, linux_forum, luke, mail, matt, mattdm, matthew, menscher, mikes, mrsam, nathan-redhatbugzilla, nnc, pcoene1, prigault, smalenfant, trevor, ttaylor, wtogami |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i386 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2005-07-31 12:30:08 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 161059 |
Description
Philip Pearson
2005-07-16 15:54:13 UTC
I see precisely the same behavior with the x86_64 version of kernel-smp-2.6.12-1.1372_FC3. Immediately after the output 'Red Hat nash version 4.1.18 starting' I see errors when various modules are inserted. The specific modules vary from boot to boot (so the modules must be loaded asynchronously), but I've seen dm-snapshot.ko, dm-mirror.ko, dm-zero.ko, ext3.ko and sata_nv.ko. As with the original reporter, the uniprocessor 2.6.12 kernel works fine, as does the previous 2.6.11-1.35_FC3 kernel (smp, x86_64). Looks like a bad build/packaging to me.

I agree that this looks like a bad build. The x86_64 non-SMP kernel boots fine; the SMP kernel panics:

    Bootdata ok (command line is ro root=/dev/md1 console=ttyS0,9600 console=tty0)
    Linux version 2.6.12-1.1372_FC3smp (bhcompile.redhat.com) (gcc version 3.4.3 20050227 (Red Hat 3.4.3-22)) #1 SMP Fri Jul 15 01:08:54 EDT 2005
    BIOS-provided physical RAM map:
    BIOS-e820: 0000000000000000 - 000000000009b400 (usable)
    BIOS-e820: 000000000009b400 - 00000000000a0000 (reserved)
    BIOS-e820: 00000000000d6000 - 0000000000100000 (reserved)
    BIOS-e820: 0000000000100000 - 000000007ff70000 (usable)
    BIOS-e820: 000000007ff70000 - 000000007ff76000 (ACPI data)
    BIOS-e820: 000000007ff76000 - 000000007ff80000 (ACPI NVS)
    BIOS-e820: 000000007ff80000 - 0000000080000000 (reserved)
    BIOS-e820: 00000000fec00000 - 00000000fec00400 (reserved)
    BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
    BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
    Scanning NUMA topology in Northbridge 24
    Number of nodes 2
    Node 0 using interleaving mode 1/0
    No NUMA configuration found
    Faking a node at 0000000000000000-000000007ff70000
    Bootmem setup node 0 0000000000000000-000000007ff70000
    ACPI: PM-Timer IO Port: 0x8008
    ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
    Processor #0 15:5 APIC version 16
    ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
    Processor #1 15:5 APIC version 16
    ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
    ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
    ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
    IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
    ACPI: IOAPIC (id[0x03] address[0xfe500000] gsi_base[24])
    IOAPIC[1]: apic_id 3, version 17, address 0xfe500000, GSI 24-27
    ACPI: IOAPIC (id[0x04] address[0xfe501000] gsi_base[28])
    IOAPIC[2]: apic_id 4, version 17, address 0xfe501000, GSI 28-31
    ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)
    Setting APIC routing to flat
    Using ACPI (MADT) for SMP configuration information
    Allocating PCI resources starting at 80000000 (gap: 80000000:7ec00000)
    Checking aperture...
    CPU 0: aperture @ 0 size 32 MB
    No AGP bridge found
    Built 1 zonelists
    Kernel command line: ro root=/dev/md1 console=ttyS0,9600 console=tty0
    Initializing CPU#0
    PID hash table entries: 4096 (order: 12, 131072 bytes)
    time.c: Using 3.579545 MHz PM timer.
    time.c: Detected 1403.221 MHz processor.
    Console: colour VGA+ 80x25
    Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
    Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
    Memory: 2054124k/2096576k available (2378k kernel code, 0k reserved, 1292k data, 228k init)
    Security Framework v1.0.0 initialized
    SELinux: Initializing.
    SELinux: Starting in permissive mode
    selinux_register_security: Registering secondary module capability
    Capability LSM initialized as secondary
    Mount-cache hash table entries: 256
    CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
    CPU: L2 Cache: 1024K (64 bytes/line)
    Using local APIC timer interrupts.
    Detected 12.528 MHz APIC timer.
    Booting processor 1/1 rip 6000 rsp ffff81007ff33f58
    Initializing CPU#1
    CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
    CPU: L2 Cache: 1024K (64 bytes/line)
    AMD Opteron(tm) Processor 240 stepping 01
    CPU 1: Syncing TSC to CPU 0.
    Brought up 2 CPUs
    Disabling vsyscall due to use of PM timer
    time.c: Using PM based timekeeping.
    testing NMI watchdog ... OK.
    checking if image is initramfs... it is
    CPU 1: synchronized TSC with CPU 0 (last diff -27 cycles, maxerr 712 cycles)
    NET: Registered protocol family 16
    PCI: Using configuration type 1
    mtrr: v2.0 (20020519)
    ACPI: Subsystem revision 20050309
    ACPI: Interpreter enabled
    ACPI: Using IOAPIC for interrupt routing
    ACPI: PCI Root Bridge [PCI0] (0000:00)
    PCI: Probing PCI hardware (bus 00)
    ACPI: PCI Interrupt Link [LNKA] (IRQs 3 5 10 *11)
    ACPI: PCI Interrupt Link [LNKB] (IRQs 3 *5 10 11)
    ACPI: PCI Interrupt Link [LNKC] (IRQs 3 5 *10 11)
    ACPI: PCI Interrupt Link [LNKD] (IRQs 3 5 10 *11)
    Linux Plug and Play Support v0.97 (c) Adam Belay
    pnp: PnP ACPI init
    pnp: PnP ACPI: found 12 devices
    usbcore: registered new driver usbfs
    usbcore: registered new driver hub
    PCI: Using ACPI for IRQ routing
    PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report
    PCI-DMA: Disabling IOMMU.
    pnp: 00:04: ioport range 0x4d0-0x4d1 has been reserved
    pnp: 00:04: ioport range 0x1100-0x117f has been reserved
    pnp: 00:04: ioport range 0x1180-0x11ff has been reserved
    IA32 emulation $Id: sys_ia32.c,v 1.32 2002/03/24 13:02:28 ak Exp $
    audit: initializing netlink socket (disabled)
    audit(1121529062.434:1): initialized
    Total HugeTLB memory allocated, 0
    VFS: Disk quotas dquot_6.5.1
    Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
    SELinux: Registering netfilter hooks
    Initializing Cryptographic API
    ksign: Installing public key data
    Loading keyring
    - Added public key 7C7615FA604FC717
    - User ID: Red Hat, Inc. (Kernel Module GPG key)
    PCI: MSI quirk detected. pci_msi_quirk set.
    PCI: MSI quirk detected. pci_msi_quirk set.
    pci_hotplug: PCI Hot Plug PCI Core version: 0.5
    Real Time Clock Driver v1.12
    Linux agpgart interface v0.101 (c) Dave Jones
    PNP: PS/2 Controller [PNP0303:KBC,PNP0f13:PS2M] at 0x60,0x64 irq 1,12
    serio: i8042 AUX port at 0x60,0x64 irq 12
    serio: i8042 KBD port at 0x60,0x64 irq 1
    Serial: 8250/16550 driver $Revision: 1.90 $ 76 ports, IRQ sharing enabled
    ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
    ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
    ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
    ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
    io scheduler noop registered
    io scheduler anticipatory registered
    io scheduler deadline registered
    io scheduler cfq registered
    RAMDISK driver initialized: 16 RAM disks of 16384K size 1024 blocksize
    Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
    ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
    AMD8111: IDE controller at PCI slot 0000:00:07.1
    AMD8111: chipset revision 3
    AMD8111: not 100% native mode: will probe irqs later
    AMD8111: 0000:00:07.1 (rev 03) UDMA133 controller
    ide0: BM-DMA at 0x1020-0x1027, BIOS settings: hda:DMA, hdb:pio
    ide1: BM-DMA at 0x1028-0x102f, BIOS settings: hdc:DMA, hdd:pio
    hda: SAMSUNG SP1213N, ATA DISK drive
    ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
    hdc: LITE-ON DVDRW LDW-811S, ATAPI CD/DVD-ROM drive
    ide1 at 0x170-0x177,0x376 on irq 15
    hda: max request size: 1024KiB
    hda: 234493056 sectors (120060 MB) w/8192KiB Cache, CHS=16383/255/63, UDMA(100)
    hda: cache flushes supported
    hda: hda1
    hdc: ATAPI 40X DVD-ROM CD-R/RW drive, 2048kB Cache, UDMA(33)
    Uniform CD-ROM driver Revision: 3.20
    ide-floppy driver 0.99.newide
    usbcore: registered new driver hiddev
    usbcore: registered new driver usbhid
    drivers/usb/input/hid-core.c: v2.01:USB HID core driver
    mice: PS/2 mouse device common for all mice
    md: md driver 0.90.1 MAX_MD_DEVS=256, MD_SB_DISKS=27
    NET: Registered protocol family 2
    IP: routing cache hash table of 8192 buckets, 128Kbytes
    TCP established hash table entries: 262144 (order: 10, 4194304 bytes)
    TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
    TCP: Hash tables configured (established 262144 bind 65536)
    Initializing IPsec netlink socket
    NET: Registered protocol family 1
    NET: Registered protocol family 17
    powernow-k8: Power state transitions not supported
    powernow-k8: Power state transitions not supported
    ACPI wakeup devices: TP2P USB0 USB1 G0PA <7>Losing some ticks... checking if CPU frequency changed. LAN0 LAN1 G0PB
    ACPI: (supports S0 S1 S4 S5)
    Freeing unused kernel memory: 228k freed
    input: AT Translated Set 2 keyboard on isa0060/serio0
    SCSI subsystem initialized
    sd_mod: Unknown symbol scsi_device_get
    sd_mod: Unknown symbol scsi_wait_req
    sd_mod: Unknown symbol scsi_get_sense_info_fld
    sd_mod: Unknown symbol scsicam_bios_param
    sd_mod: Unknown symbol scsi_command_normalize_sense
    sd_mod: Unknown symbol scsi_test_unit_ready
    sd_mod: Unknown symbol scsi_block_when_processing_errors
    sd_mod: Unknown symbol scsi_register_driver
    sd_mod: Unknown symbol scsi_ioctl
    sd_mod: Unknown symbol scsi_nonblockable_ioctl
    sd_mod: Unknown symbol scsi_device_put
    sd_mod: Unknown symbol scsi_request_normalize_sense
    sd_mod: Unknown symbol __scsi_mode_sense
    sd_mod: Unknown symbol scsi_logging_level
    sd_mod: Unknown symbol scsi_print_req_sense
    sd_mod: Unknown symbol scsi_release_request
    sd_mod: Unknown symbol scsi_print_sense
    sd_mod: Unknown symbol scsi_allocate_request
    sd_mod: Unknown symbol scsi_io_completion
    sd_mod: Unknown symbol scsi_set_medium_removal
    ACPI: PCI Interrupt 0000:03:01.0[A] -> GSI 29 (level, low) -> IRQ 169
    ACPI: PCI Interrupt 0000:03:01.1[B] -> GSI 30 (level, low) -> IRQ 177
    scsi0 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.11
    <Adaptec 29320 Ultra320 SCSI adapter>
    aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI-X 101-133Mhz, 512 SCBs
    md: raid1 personality registered as nr 3
    md: Autodetecting RAID arrays.
    md: autorun ...
    md: ... autorun DONE.
    md: Autodetecting RAID arrays.
    md: autorun ...
    md: ... autorun DONE.
    md: Autodetecting RAID arrays.
    md: autorun ...
    md: ... autorun DONE.
    Kernel panic - not syncing: Attempted to kill init!
    Call Trace:<ffffffff80138164>{panic+196} <ffffffff8034f811>{__down_read+49}
    <ffffffff80207ef1>{__up_read+33} <ffffffff8013ae53>{do_exit+99}
    <ffffffff80207db1>{__up_write+49} <ffffffff8013ba8f>{do_group_exit+239}
    <ffffffff8010eaa6>{system_call+126}

This is the same as bug 160652. It's a race condition in the mkinitrd package.

This bug just hit me on a 1372 kernel I had just rebuilt with rpmbuild, with a few extra patches. The machine is an old dual P2-400 SMP with an onboard (but unused) AIC7xxx and a normal IDE hard disk. Nothing fancy. I checked modprobe.conf and hwconf ahead of time, as per the mailing list notice, and they looked OK (and after an rm + kudzu they were identical to the previous versions). It sounds like the fix is to wait for an errata mkinitrd + kernel and then rebuild with rpmbuild myself?

Errata issued soon?

Same problem with a P4 HT 3GHz. :( Chipset: Intel 875

Same problem. Bad kernel. It should never have been released.

It would not have been released if somebody had reported problems with the 2.6.12-1.1371 kernel that was in the testing repository. I feel somewhat responsible, because I saw this problem with the first testing kernel in this series, released a couple of weeks ago. Unfortunately, at the time I thought the issue was due to a broken SELinux dependency and didn't make a full report. I was actually writing up a report on 1371 when I saw the 1372 announcement. But of course anyone else could have tested it. I know I'll have a machine set aside tracking the testing repository from now on.

Same here on my Dell box. Interestingly, I saw the exact same symptoms a few days earlier when trying to build a vanilla 2.6.12.2 kernel directly from kernel.org, and blamed my inability to get a vanilla kernel working in FC3 ...

I just want to note that the comments regarding the broken kernel RPM point out a possible link to NVIDIA drivers, but the two systems this bug hit for me were NOT running NVIDIA cards or drivers.
Nor are we using any non-FC yum repos (except crash-hat for ClamAV antivirus).

This seems to be a "generic" kernel bug introduced with 2.6.12-RC1 somewhere in the ACPI code (or the ACPI code interacting with something else) on SMP builds, and it does not seem related to bug 160652. Setting acpi=off works for me.

I am running 2.6.12 RC4 without any problem; however, I get the same kernel panic reported here with the FC3 2.6.12 SMP kernels (see https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=162859), so I think the problem was introduced after RC1.

I, too, see this problem. I've rebuilt my modprobe.conf and hwconf ad nauseam. It seems pretty clear to me that the initrd is hosed, but I don't see how to fix it.

See comment #3. We need an mkinitrd update with a newer 'sh' for FC3.

To test that, I built a "hybrid" mkinitrd package using the nash source from FC4 and the rest from the current FC3 package (I just replaced the nash directory tarball). After installing the new kernel, I still had the problem. Next I updated mkinitrd and, as required, udev to the FC-4 versions and rebuilt the initrds. This time the kernel boots. Now I have to see what could have been broken by the udev update.

Just to note: I have a similar problem on P4 2.8GHz HT machines. The SMP kernel fails as shown below, with hdb being a Zip disk. The non-SMP kernel gets through fine until it shows a black screen that ends with "Enabling swap space: [ OK ]", and then it just sits there. Others have said the non-SMP kernel works, but it doesn't on these systems. Older kernels boot just fine.

    booting with kernel 2.6.12-1.1372_FC3smp
    Red Hat nash version 4.1.18 starting
    hdb: No disk in drive
    mount: error 6 mounting ext3
    mount: error 2 mounting none
    switchroot: mount failed 22
    umount /initrd/dev failed: 2
    Kernel panic - not syncing: Attempted to kill init!

*** Bug 163672 has been marked as a duplicate of this bug. ***

*** Bug 162770 has been marked as a duplicate of this bug. ***

Updating udev seems to have messed up various pieces of the system: inserted USB devices don't show up in /etc/fstab, and /dev/dsp isn't being created, to name two. So I'd say that updating to the FC-4 mkinitrd (which requires that udev be updated as well) is not a viable solution.

If anyone has any suggestions for other things I can try, I'll be happy to investigate them.

As suggested in other reports which are now duplicates of this one, I unpacked my initrd and copied in a copy of nash unpacked from the current rawhide RPM. The behavior did not change.

I'm at a complete loss now. Updating mkinitrd and udev makes things work, but the udev update breaks piles of things. So there's a good chance that either mkinitrd or udev is causing the problems. Swapping in the latest version of the most important piece of mkinitrd (/sbin/nash) doesn't help. Could it be udev instead that needs to be updated? If so, how can I do that without the breakage it causes to the rest of the machine?

Using an updated udev (058-1) and mkinitrd (4.2.15-1) from the FC4 distribution, I was able to get my FC3 system to boot with the 2.6.12-1.1372_FC3smp kernel. I have also noticed problems with what seems to be udev, namely that /dev/floppy (or fd0) no longer exists. Right now this isn't a problem, but I'm wondering whether it is worth filing another Bugzilla bug to address the udev issue specifically?

The udev issue isn't a bug; it's not expected that you can just install FC-4 packages on an FC-3 machine and have them work.

*** Bug 163863 has been marked as a duplicate of this bug. ***

I can't believe we haven't seen more work from the FC kernel team on this bug (posted to this bugzilla). The CC list here is getting quite large and the dupes are starting to pop up like flies (I suspect we'll start seeing an explosion soon). The 1372 errata release should be withdrawn, or fixed and obsoleted.
It appears that anyone with SMP (or HT, which is a LOT of people) will have a non-booting machine after installing 1372. Is there no method in the FC process to pull a faulty errata or put a crack team on a critical bug to get it fixed ASAP? I feel sorry for all the people (like myself) with remote/colocated servers running automatic (or manual) yum updates, who are then completely screwed when they issue the reboot and discover they're going to have to drive 50 miles (or more) to fix it. It's good that this bug was found and posted almost right away, but what good is that if nothing is done about it and the number of people updating to the new package is allowed to grow exponentially? And the follow-up to the errata email post was poorly worded and gave people who weren't running obscure NVIDIA RPMs false confidence in the update. Sorry if I sound bitchy; I'm not really upset for myself, because I got caught with this on a local machine that I could easily reboot, but there are lots of people who aren't going to be so lucky, including lots of noobs who will really panic when their machine doesn't boot and shows cryptic errors that could be mistaken for disk corruption.

Yes, this is quite severe. I know many who run this SMP kernel with FC3 even on semi-production-level stable servers (for the poor, so to speak), which often need remote reboot capability. I've fixed two cases of it already. On one, I had done a yum update manually and it all seemed fine and dandy, but after some cron job did a reboot around 5 AM a few days later, there it was: the server was down for almost 4 hours without me knowing it. The stable yum releases have been pretty reliable, but I would retract this kernel release from the yum repositories fast, because this is pretty bad for Fedora Core's reputation as a whole.
(In reply to comment #18)
> Updating udev seems to have messed up various pieces of the system: inserted USB
> devices don't show up in /etc/fstab, and /dev/dsp isn't being created, to name
> two. So I'd say that updating to the FC-4 mkinitrd (which requires that udev
> be updated as well) is not a viable solution.
>
> If anyone has any suggestions for other things I can try, I'll be happy to
> investigate them.

(In reply to comment #19)
> As suggested in other reports which are now duplicates of this one, I unpacked
> my initrd and copied in a copy of nash unpacked from the current rawhide RPM.
> The behavior did not change.
>
> I'm at a complete loss now. Updating mkinitrd and udev makes things work, but
> the udev update breaks piles of things. So there's a good chance that either
> mkinitrd or udev is causing the problems. Swapping in the latest version of the
> most important piece of mkinitrd (/sbin/nash) doesn't help. Could it be udev
> instead that needs to be updated? If so, how can I do that without the
> breakage it causes to the rest of the machine?

Here are the updates I performed. They fix all of the reported behaviors, from the kernel panic to the /dev device disruption. All devices are recognized and activated at boot, including a Midisport 2x2 USB device that never loaded its firmware at boot before! Note that these are all from FC4 and all dependencies are satisfied:

    checkpolicy-1.23.1-1.i386.rpm
    initscripts-8.11.1-1.i386.rpm
    libselinux-1.23.10-2.i386.rpm
    libselinux-devel-1.23.10-2.i386.rpm
    libsepol-1.5.9-2.i386.rpm
    libsepol-devel-1.5.9-2.i386.rpm
    mkinitrd-4.2.15-1.i386.rpm
    SysVinit-2.85-39.i386.rpm
    udev-058-1.i386.rpm

Frank

(In reply to comment #19)
> As suggested in other reports which are now duplicates of this one, I unpacked
> my initrd and copied in a copy of nash unpacked from the current rawhide RPM.
> The behavior did not change.
>
> I'm at a complete loss now. Updating mkinitrd and udev makes things work, but
> the udev update breaks piles of things. So there's a good chance that either
> mkinitrd or udev is causing the problems. Swapping in the latest version of the
> most important piece of mkinitrd (/sbin/nash) doesn't help. Could it be udev
> instead that needs to be updated? If so, how can I do that without the
> breakage it causes to the rest of the machine?

Please mark Bug#164108 as a dupe of this one.

*** Bug 164108 has been marked as a duplicate of this bug. ***

I also concur: when I booted 2.6.12-1.1372_FC3smp on a hyperthreaded 3.0 GHz P4, I experienced the same kernel panic as everyone else and had to revert to the prior kernel. Question: if/when the fix is discovered, *and* if it involves mkinitrd and not the kernel itself, will uninstalling and reinstalling 2.6.12-1.1372_FC3smp still be required? I know this is putting the cart before the horse.

I have had the same experience on a dual PIII. I never tried the uniprocessor version of 12-1.1372, as I was keen to get the box back up (it's not one I can mess with, really). I got the same output as the original reporter, and succeeded in booting the prior kernel (kernel-smp-2.6.11-1.35_FC3).

Same bug here on SMP dual Xeons, no hyperthreading. I need a 2.6.12 or later kernel with SMP because 2.6.12 seems to cure some libata sata sil 3112/3114 problems (I haven't ruled out that 2.6.12 single-processor is the reason; I'm going to have to check whether the sata sil problem exists in 2.6.9 single-processor). I guess I'm getting on the CC list.

The mkinitrd update that just appeared in the testing repository solves the problem for me. Install the update, then remove the 1372 kernel package and reinstall it (so that the initrd is recreated). Your system should then boot fine.

I can confirm that mkinitrd-4.1.18.1-1 solves the problem for me too.

I can confirm that mkinitrd-4.1.18.1-1 solves the problem for me too.

I can confirm that mkinitrd-4.1.18.1-1 solves the problem for me too.

mkinitrd-4.1.18.1-1 works for me on the first system I tested. I have to try this on one other hardware type, but it looks good.

When does the mkinitrd fix hit stable?
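Several comments in this thread note that the fixed mkinitrd only helps once the initrd itself is regenerated, either by removing and reinstalling the kernel-smp package or by rerunning mkinitrd by hand. As a hedged sketch (not an official Fedora procedure), the hypothetical Python helpers below build that mkinitrd command for one kernel. They assume Red Hat's naming, in which the kernel-smp package version 2.6.12-1.1372_FC3 corresponds to the kernel release 2.6.12-1.1372_FC3smp (the /lib/modules directory name mentioned in this thread) and the image /boot/initrd-2.6.12-1.1372_FC3smp.img (the name shown in the GRUB output in this thread), and they assume mkinitrd's `-f` option to overwrite an existing image.

```python
import subprocess


def mkinitrd_argv(package_version, flavor="smp", boot_dir="/boot"):
    """Build the argv that regenerates the initrd for one installed kernel.

    Assumption: the kernel release is the package version plus the flavor
    suffix (kernel-smp 2.6.12-1.1372_FC3 -> 2.6.12-1.1372_FC3smp), and the
    image lives at <boot_dir>/initrd-<release>.img.
    """
    release = package_version + flavor
    image = f"{boot_dir}/initrd-{release}.img"
    # -f lets mkinitrd overwrite the existing (broken) image
    return ["mkinitrd", "-f", image, release]


def rebuild_initrd(package_version, flavor="smp", dry_run=True):
    """Print the command (dry run) or execute it.

    Executing needs root and the fixed mkinitrd (4.1.18.1-1 or later,
    per the reports in this thread) already installed.
    """
    argv = mkinitrd_argv(package_version, flavor)
    if dry_run:
        print(" ".join(argv))
    else:
        subprocess.run(argv, check=True)
    return argv


if __name__ == "__main__":
    rebuild_initrd("2.6.12-1.1372_FC3")
```

With the default `dry_run=True` the helper only prints `mkinitrd -f /boot/initrd-2.6.12-1.1372_FC3smp.img 2.6.12-1.1372_FC3smp`, so the command can be checked before anything in /boot is overwritten.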
mkinitrd-4.1.18.1-1 likewise works for me.

Me too! mkinitrd-4.1.18.1-1 produces bootable 2.6.12.xxx kernels. However, the instructions ("update information") say "reinstall recent kernels", when just rerunning "mkinitrd" appears to be sufficient. K.O.

Now all is OK:
1. rpm -Fvh mkinitrd-4...
2. rpm -e kernel-smp.. --nodeps
3. rpm -ivh kernel-smp...

This morning mkinitrd-4.1.18.1-1 showed up in the FC3 yum update repository, and I can confirm that the update fixed the kernel panic on kernel-smp-2.6.12-1.1372_FC3. If you do the update and then uninstall/reinstall the kernel (and its initrd), this replaces the bad mkinitrd and the bad initrd and stops the kernel panic on dual-processor machines. Some claim the kernel does not need to be reinstalled, just mkinitrd. I do not know the answer to that question, but I do know what worked for me:
1. yum update mkinitrd (make sure mkinitrd-4.1.18.1-1 updates on top of 4.1.18-2)
2. yum remove kernel-smp-2.6.12-1.1372_FC3 kernel-2.6.12-1.1372_FC3
3. yum update (yum should retrieve and install kernel-smp-2.6.12-1.1372_FC3 again and correctly create the initrd, so that the SMP version is bootable again)
4. Take a breath and reboot.
Thanks to all who helped solve this one.
-BK

Confirming that the mkinitrd update works.

I can confirm that, for me, you DO have to uninstall the SMP kernel and then reinstall it (or rebuild the init image manually); just updating mkinitrd is not enough. You don't *have* to uninstall the single-processor kernel, just the SMP one, although you may want to anyway.

    rpm -U mkinitrd-4.1.18.1-1
    rpm -e kernel-smp-2.6.12-1.1372_FC3
    rpm -i kernel-smp-2.6.12-1.1372_FC3

(or the yum equivalents that Brad already gave). Otherwise this bug seems fixed. Thanks.

Here's a question: if you have a box without 1372 yet and you do a yum update that grabs 1372 and mkinitrd-4.1.18.1-1 at the same time, can we assume it will apply them in the proper order (mkinitrd first) so that you will have a working system?
I don't think there's anything in the dependency information that would ensure that. So yes, that could be a serious problem on new installs. The next kernel update should probably have "Requires: mkinitrd >= 4.1.18.1".

When I applied the fix, it broke VMware for me. When I tried to compile the modules again, VMware reported that the kernel was compiled with gcc 3.4.3 whereas the modules were compiled with 3.4.4. Could the kernel be compiled with gcc 3.4.4 to get around this problem? I realize that I could install gcc 3.4.3 and the header files and recompile the modules, but if 3.4.4 is going to be the compiler for the next kernel, I would have to recompile the modules again. Am I making sense? Thanks for all the hard work.

Same here, except no SMP: just the latest kernel 2.6.12-1.1372_FC3 / mkinitrd 4.1.18. Adding the GRUB parameter selinux=0 fixed it. The error at the kernel panic was:

    Enforced mode requested but no policy loaded

Correction to my last post: the reason I cannot rebuild the modules for VMware is that the /lib/modules/2.6.12-1.1372_FC3smp directory is not correct. The build directory points to a nonexistent directory under /usr/src... Is there a way I can correct this problem myself? Thanks

Can anyone help me fix the problem? I'm new to this, so I really don't have a clue :( BTW, here is my boot sequence:

    Booting 'Fedora Core (2.6.12-1.1372_FC3smp)'
    root (hd0,0)
    Filesystem type is ext2fs, partition type 0x83
    kernel vmlinuz-2.6.12-1.1372_FC3smp ro root=/dev/VolGroup00/LogVl00 rhgb quiet
    [Linux-bzImage, setup=0x1e00, size=0x17e869]
    initrd /initrd-2.6.12-1.1372_FC3smp.img
    [Linux-initrd @ 0x1fef2000, 0xed0c5 bytes]
    Uncompressing Linux... Ok, booting the kernel.
    Red Hat nash version 4.1.18 starting
    mount: error 6 mounting ext3
    mount: error 2 mounting none
    switchroot: mount failed: 22
    umount /initrd/dev failed: 2
    Kernel panic - not syncing: Attempted to kill init!

Having similar problems.
Kernel version 2.6.12-1.1372_FC3smp, obtained through yum update, was compiled with gcc 3.4.3. When I compiled the source with gcc 3.4.4, it kernel panicked (mkinitrd --version = 4.1.18.1).

For everyone's reference: due to an interruption in the power supply, my production machine was accidentally rebooted and froze yesterday afternoon. I googled for a while and finally found that I, too, am a victim of bug #163437. I checked yum.log this morning and found:

    ...
    Jul 16 04:27:15 Installed: kernel-smp.i686 2.6.12-1.1372_FC3
    ...
    Jul 31 05:03:29 Updated: mkinitrd.i386 4.1.18.1-1
    ...
    Nov 03 07:11:26 Installed: kernel.i686 2.6.12-1.1381_FC3
    ...

So it seems the yum update to mkinitrd.i386 4.1.18.1-1 doesn't help systems that are left unattended: the 1372 SMP kernel, and therefore its initrd, was installed before the fixed mkinitrd arrived, and nothing rebuilt the initrd afterwards. Any of those systems running the SMP kernel that rebooted between 2005-07-16 and 2005-11-03 will be paralyzed.

Today a new kernel was installed, and I hope that the next time my machine is rebooted (either accidentally or deliberately) it will come up again smoothly (but I would definitely like to arrange a reboot in the coming few days to see whether it's really bug-free...). So if the "power interruption" happened this afternoon, my production system could probably be up and running again within a minute or two.

The point I would like to raise is this: if it's clear that the newer version of mkinitrd doesn't help much in this situation, especially for newbies and for lazy admins like me who leave a system running somewhere in a data center, why didn't Fedora try to help solve the problem until a new kernel came out today? Is it feasible to "push" a new kernel out (with an increased minor/maintenance version number, say 2.6.12-1.1372.1_FC3smp) so that yum/up2date will install the kernel again and bug #163437 goes away?
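The yum.log reasoning in the last comment (a kernel installed before the fixed mkinitrd keeps its broken initrd) can be checked mechanically. The sketch below is a hypothetical helper, not a Fedora tool. It parses lines in the format of the excerpt above, treats any mkinitrd at or above 4.1.18.1-1 (the version this thread reports as working) as fixed, and flags kernel-smp installs that happened before a fixed mkinitrd was present. It assumes all entries fall in one calendar year, since yum.log lines carry no year, and it uses a simplified version comparison rather than RPM's own rpmvercmp.

```python
import re
from datetime import datetime

# Matches lines like "Jul 16 04:27:15 Installed: kernel-smp.i686 2.6.12-1.1372_FC3"
LINE_RE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) (?:Installed|Updated): (\S+)\.(\w+) (\S+)$"
)
FIXED_MKINITRD = "4.1.18.1-1"  # first mkinitrd reported fixed in this thread


def _segments(text):
    """Split into digit and letter runs; digit runs compare numerically and
    sort above letter runs. A simplification of rpmvercmp, not a full port."""
    return [(1, int(p)) if p.isdigit() else (0, p)
            for p in re.findall(r"\d+|[A-Za-z]+", text)]


def evr_key(evr):
    """Order 'version-release' strings by version first, then release."""
    version, sep, release = evr.rpartition("-")
    if not sep:  # no release part present
        version, release = evr, ""
    return (_segments(version), _segments(release))


def stale_kernel_installs(lines, year=2005, kernel_name="kernel-smp"):
    """Return (time, version) for kernel installs made before a fixed mkinitrd."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip "..." and unrelated lines
        stamp, name, _arch, evr = m.groups()
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        events.append((when, name, evr))
    events.sort()

    fixed_present = False
    stale = []
    for when, name, evr in events:
        if name == "mkinitrd" and evr_key(evr) >= evr_key(FIXED_MKINITRD):
            fixed_present = True
        elif name == kernel_name and not fixed_present:
            stale.append((when, evr))
    return stale


if __name__ == "__main__":
    excerpt = [
        "Jul 16 04:27:15 Installed: kernel-smp.i686 2.6.12-1.1372_FC3",
        "...",
        "Jul 31 05:03:29 Updated: mkinitrd.i386 4.1.18.1-1",
        "Nov 03 07:11:26 Installed: kernel.i686 2.6.12-1.1381_FC3",
    ]
    for when, evr in stale_kernel_installs(excerpt):
        print(f"{when:%b %d %H:%M:%S}  kernel-smp {evr}: initrd predates fixed mkinitrd")
```

Splitting version from release before comparing matters here: a naive segment-by-segment comparison would rank 4.1.18.1-1 below the previously installed 4.1.18-2, which is the wrong answer for this bug.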