Bug 163437

Summary: Using new 2.6.12-1.1372_FC3smp kernel causes kernel panic while booting
Product: [Fedora] Fedora
Reporter: Philip Pearson <prgp1976>
Component: mkinitrd
Assignee: Peter Jones <pjones>
Status: CLOSED ERRATA
QA Contact: David Lawrence <dkl>
Severity: high
Docs Contact:
Priority: medium
Version: 3
CC: bugzilla, cmarco, deron.meranda, dgunchev, d.lesca, fche, ihok, jansen, j, kerryn.wood, lbyrd, linux_forum, luke, mail, matt, mattdm, matthew, menscher, mikes, mrsam, nathan-redhatbugzilla, nnc, pcoene1, prigault, smalenfant, trevor, ttaylor, wtogami
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2005-07-31 12:30:08 UTC
Bug Blocks: 161059    

Description Philip Pearson 2005-07-16 15:54:13 UTC
From Bugzilla Helper:
User-Agent: Opera/8.01 (X11; Linux i686; U; en)

Description of problem:
I have just updated my kernel from 2.6.11-1.35_FC3 (smp) to 2.6.12-1.1372_FC3 
(smp).  When I reset the machine I was greeted with a kernel panic during the 
early stages of booting.  Removing the 'quiet' option from grub produced the 
following (copied by hand, so I apologise for any typos):

insmod: error inserting '/lib/ata_piix.ko': -1 unknown symbol in module
ERROR: /bin/insmod exited abnormally
Creating root device
umount /sys failed: 16
Mounting root filesystem
mount: error 19 mounting ext3
mount: error 2 mounting none
Switching to new root
switchroot: mount failed: 22
umount /initrd/dev failed: 2
Kernel panic - not syncing: Attempted to kill init!
 [<c0120e85>] panic+0x42/0x1ca
 [<c0121ff1>] profile_task_exit+0x31/0x45
 [<c0123d8d>] do_exit+0x252/0x35a
 [<c0123eb5>] next_thread+0x0/0xc
 [<c0103fd9>] syscall_call+0x7/0xb

It is interesting to note that the non-smp version of the kernel
2.6.12-1.1372_FC3 does boot successfully.  Hence, the problem seems to be only
in the smp version.
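
The numbers in the nash messages above look like standard Linux errno values. Assuming that is what nash prints (the report does not confirm it), a minimal sketch for decoding them follows; the helper name `errno_name` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: map the numeric codes seen in the boot log to their
# standard Linux errno names. Assumes nash reports raw errno values.
errno_name() {
    case "$1" in
        2)  echo "ENOENT (No such file or directory)" ;;
        6)  echo "ENXIO (No such device or address)" ;;
        16) echo "EBUSY (Device or resource busy)" ;;
        19) echo "ENODEV (No such device)" ;;
        22) echo "EINVAL (Invalid argument)" ;;
        *)  echo "unknown ($1)" ;;
    esac
}

# Decode the codes from the log in the order they appear.
for code in 16 19 2 22; do
    printf 'error %s: %s\n' "$code" "$(errno_name "$code")"
done
```

Read that way, error 19 on the ext3 mount would fit a module that never loaded: mount(2) documents ENODEV as "filesystem type not configured in the kernel".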

I had recently done a clean install of FC3 which I updated straight away (to 
kernel 2.6.11-1.35_FC3).  The machine is a fairly standard Dell GX280 (2.8GHz 
Intel Pentium 4 with HT, thus the smp version of the kernel) with 1GB memory.

Version-Release number of selected component (if applicable):
kernel-smp-2.6.12-1.1372_FC3

How reproducible:
Always

Steps to Reproduce:
1. Turn on machine / reboot
2. Allow grub to start "Fedora Core (2.6.12-1.1372_FC3smp)"
3. Watch as the kernel panics ...
  

Actual Results:  Kernel Panics

Expected Results:  Kernel should not panic (system should continue to boot properly)

Additional info:

Comment 1 Neil Carlson 2005-07-16 17:32:08 UTC
I see precisely the same behavior with the x86_64 version of
kernel-smp-2.6.12-1.1372_FC3.  Immediately after the output 'Red Hat nash
version 4.1.18 starting' I see errors when various modules are inserted.  The
specific modules vary from boot to boot (modules must be loaded asynchronously),
but I've seen dm-snapshot.ko, dm-mirror.ko, dm-zero.ko, ext3.ko, sata_nv.ko.

As with the original reporter, the uniprocessor 2.6.12 kernel works fine for me,
as does the previous 2.6.11-1.35_FC3 kernel (smp, x86_64).

Looks like a bad build/packaging to me.

Comment 2 Sam Varshavchik 2005-07-16 19:59:33 UTC
I agree that this looks like a bad build.  The x86_64 non-smp kernel boots fine,
the smp kernel panics:


Bootdata ok (command line is ro root=/dev/md1 console=ttyS0,9600 console=tty0)
Linux version 2.6.12-1.1372_FC3smp (bhcompile.redhat.com) (gcc
version 3.4.3 20050227 (Red Hat 3.4.3-22)) #1 SMP Fri Jul 15 01:08:54 EDT 2005
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009b400 (usable)
 BIOS-e820: 000000000009b400 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000d6000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000007ff70000 (usable)
 BIOS-e820: 000000007ff70000 - 000000007ff76000 (ACPI data)
 BIOS-e820: 000000007ff76000 - 000000007ff80000 (ACPI NVS)
 BIOS-e820: 000000007ff80000 - 0000000080000000 (reserved)
 BIOS-e820: 00000000fec00000 - 00000000fec00400 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
Scanning NUMA topology in Northbridge 24
Number of nodes 2
Node 0 using interleaving mode 1/0
No NUMA configuration found
Faking a node at 0000000000000000-000000007ff70000
Bootmem setup node 0 0000000000000000-000000007ff70000
ACPI: PM-Timer IO Port: 0x8008
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 15:5 APIC version 16
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
Processor #1 15:5 APIC version 16
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
ACPI: IOAPIC (id[0x03] address[0xfe500000] gsi_base[24])
IOAPIC[1]: apic_id 3, version 17, address 0xfe500000, GSI 24-27
ACPI: IOAPIC (id[0x04] address[0xfe501000] gsi_base[28])
IOAPIC[2]: apic_id 4, version 17, address 0xfe501000, GSI 28-31
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)
Setting APIC routing to flat
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 80000000 (gap: 80000000:7ec00000)
Checking aperture...
CPU 0: aperture @ 0 size 32 MB
No AGP bridge found
Built 1 zonelists
Kernel command line: ro root=/dev/md1 console=ttyS0,9600 console=tty0
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 131072 bytes)
time.c: Using 3.579545 MHz PM timer.
time.c: Detected 1403.221 MHz processor.
Console: colour VGA+ 80x25
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
Memory: 2054124k/2096576k available (2378k kernel code, 0k reserved, 1292k data,
228k init)
Security Framework v1.0.0 initialized
SELinux:  Initializing.
SELinux:  Starting in permissive mode
selinux_register_security:  Registering secondary module capability
Capability LSM initialized as secondary
Mount-cache hash table entries: 256
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
Using local APIC timer interrupts.
Detected 12.528 MHz APIC timer.
Booting processor 1/1 rip 6000 rsp ffff81007ff33f58
Initializing CPU#1
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
AMD Opteron(tm) Processor 240 stepping 01
CPU 1: Syncing TSC to CPU 0.
Brought up 2 CPUs
Disabling vsyscall due to use of PM timer
time.c: Using PM based timekeeping.
testing NMI watchdog ... OK.
checking if image is initramfs... it is
CPU 1: synchronized TSC with CPU 0 (last diff -27 cycles, maxerr 712 cycles)
NET: Registered protocol family 16
PCI: Using configuration type 1
mtrr: v2.0 (20020519)
ACPI: Subsystem revision 20050309
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
PCI: Probing PCI hardware (bus 00)
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 5 10 *11)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 *5 10 11)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 5 *10 11)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 5 10 *11)
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI init
pnp: PnP ACPI: found 12 devices
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq".  If it helps, post a report
PCI-DMA: Disabling IOMMU.
pnp: 00:04: ioport range 0x4d0-0x4d1 has been reserved
pnp: 00:04: ioport range 0x1100-0x117f has been reserved
pnp: 00:04: ioport range 0x1180-0x11ff has been reserved
IA32 emulation $Id: sys_ia32.c,v 1.32 2002/03/24 13:02:28 ak Exp $
audit: initializing netlink socket (disabled)
audit(1121529062.434:1): initialized
Total HugeTLB memory allocated, 0
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
SELinux:  Registering netfilter hooks
Initializing Cryptographic API
ksign: Installing public key data
Loading keyring
- Added public key 7C7615FA604FC717
- User ID: Red Hat, Inc. (Kernel Module GPG key)
PCI: MSI quirk detected. pci_msi_quirk set.
PCI: MSI quirk detected. pci_msi_quirk set.
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
Real Time Clock Driver v1.12
Linux agpgart interface v0.101 (c) Dave Jones
PNP: PS/2 Controller [PNP0303:KBC,PNP0f13:PS2M] at 0x60,0x64 irq 1,12
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
Serial: 8250/16550 driver $Revision: 1.90 $ 76 ports, IRQ sharing enabled
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered
RAMDISK driver initialized: 16 RAM disks of 16384K size 1024 blocksize
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
AMD8111: IDE controller at PCI slot 0000:00:07.1
AMD8111: chipset revision 3
AMD8111: not 100% native mode: will probe irqs later
AMD8111: 0000:00:07.1 (rev 03) UDMA133 controller
    ide0: BM-DMA at 0x1020-0x1027, BIOS settings: hda:DMA, hdb:pio
    ide1: BM-DMA at 0x1028-0x102f, BIOS settings: hdc:DMA, hdd:pio
hda: SAMSUNG SP1213N, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hdc: LITE-ON DVDRW LDW-811S, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
hda: max request size: 1024KiB
hda: 234493056 sectors (120060 MB) w/8192KiB Cache, CHS=16383/255/63, UDMA(100)
hda: cache flushes supported
 hda: hda1
hdc: ATAPI 40X DVD-ROM CD-R/RW drive, 2048kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
ide-floppy driver 0.99.newide
usbcore: registered new driver hiddev
usbcore: registered new driver usbhid
drivers/usb/input/hid-core.c: v2.01:USB HID core driver
mice: PS/2 mouse device common for all mice
md: md driver 0.90.1 MAX_MD_DEVS=256, MD_SB_DISKS=27
NET: Registered protocol family 2
IP: routing cache hash table of 8192 buckets, 128Kbytes
TCP established hash table entries: 262144 (order: 10, 4194304 bytes)
TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
TCP: Hash tables configured (established 262144 bind 65536)
Initializing IPsec netlink socket
NET: Registered protocol family 1
NET: Registered protocol family 17
powernow-k8: Power state transitions not supported
powernow-k8: Power state transitions not supported
ACPI wakeup devices: 
TP2P USB0 USB1 G0PA <7>Losing some ticks... checking if CPU frequency changed.
LAN0 LAN1 G0PB 
ACPI: (supports S0 S1 S4 S5)
Freeing unused kernel memory: 228k freed
input: AT Translated Set 2 keyboard on isa0060/serio0
SCSI subsystem initialized
sd_mod: Unknown symbol scsi_device_get
sd_mod: Unknown symbol scsi_wait_req
sd_mod: Unknown symbol scsi_get_sense_info_fld
sd_mod: Unknown symbol scsicam_bios_param
sd_mod: Unknown symbol scsi_command_normalize_sense
sd_mod: Unknown symbol scsi_test_unit_ready
sd_mod: Unknown symbol scsi_block_when_processing_errors
sd_mod: Unknown symbol scsi_register_driver
sd_mod: Unknown symbol scsi_ioctl
sd_mod: Unknown symbol scsi_nonblockable_ioctl
sd_mod: Unknown symbol scsi_device_put
sd_mod: Unknown symbol scsi_request_normalize_sense
sd_mod: Unknown symbol __scsi_mode_sense
sd_mod: Unknown symbol scsi_logging_level
sd_mod: Unknown symbol scsi_print_req_sense
sd_mod: Unknown symbol scsi_release_request
sd_mod: Unknown symbol scsi_print_sense
sd_mod: Unknown symbol scsi_allocate_request
sd_mod: Unknown symbol scsi_io_completion
sd_mod: Unknown symbol scsi_set_medium_removal
ACPI: PCI Interrupt 0000:03:01.0[A] -> GSI 29 (level, low) -> IRQ 169
ACPI: PCI Interrupt 0000:03:01.1[B] -> GSI 30 (level, low) -> IRQ 177
scsi0 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.11
        <Adaptec 29320 Ultra320 SCSI adapter>
        aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI-X 101-133Mhz, 512 SCBs

md: raid1 personality registered as nr 3
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
Kernel panic - not syncing: Attempted to kill init!

Call Trace:<ffffffff80138164>{panic+196} <ffffffff8034f811>{__down_read+49}
       <ffffffff80207ef1>{__up_read+33} <ffffffff8013ae53>{do_exit+99}
       <ffffffff80207db1>{__up_write+49} <ffffffff8013ba8f>{do_group_exit+239}
       <ffffffff8010eaa6>{system_call+126} 
 

Comment 3 Dan Carpenter 2005-07-16 21:47:40 UTC
This is the same as bug 160652

It's a race condition in the mkinitrd package.


Comment 4 Trevor Cordes 2005-07-16 22:16:48 UTC
This bug just hit me on a 1372 kernel I had just rebuilt with rpmbuild, with a
few extra patches.  The machine is an old dual P2-400 SMP with onboard (but
unused) AIC7xxx and a normal IDE HD.  Nothing fancy.

I did check the modprobe.conf and hwconf ahead of time as per the mailing list
notice and they looked ok (and after a rm+kudzu they were identical with the
previous versions).

Sounds like the fix is to wait for an errata mkinitrd+kernel and then rebuild
the RPM myself?  Will the errata be issued soon?



Comment 5 Frank Büttner 2005-07-17 09:11:37 UTC
Same problem with a P4 HT 3GHz. :(
Chipset: Intel 875

Comment 6 Radek Liboska 2005-07-17 14:55:33 UTC
Same problem. Bad kernel. It should never be released.

Comment 7 Jason Tibbitts 2005-07-17 16:34:58 UTC
It would not have been released if somebody had reported problems with the
2.6.12-1.1371 kernel that was in the testing repository.  I feel somewhat
responsible because I saw this problem with the first testing kernel in this
series that was released a couple of weeks ago.  Unfortunately then I thought
the issue was due to a broken selinux dependency and didn't make a full report.
 I was actually writing up a report on 1371 when I saw the 1372 announcement.

But of course anyone else could have tested it.  I know I'll have a machine set
aside and tracking the testing repository from now on.

Comment 8 Volker Schäfer 2005-07-18 10:47:01 UTC
Same here on my Dell box.
Interestingly I saw the exact same symptoms a few days earlier when trying to
build a vanilla 2.6.12.2 kernel directly from kernel.org and blamed my inability
to get a vanilla kernel working in FC3 ...

Comment 9 Trevor Cordes 2005-07-18 20:35:15 UTC
I just want to note that the comments regarding the broken kernel rpm point out
a possible link to nvidia drivers, but the 2 systems I had this bug hit me on
were NOT running NVIDIA cards nor drivers.  Nor are we using any non-FC yum
repos (except crash-hat for clamav antivirus).


Comment 10 Volker Schäfer 2005-07-19 11:04:31 UTC
This seems to be a "generic" kernel bug introduced with 2.6.12-RC1 somewhere in
the ACPI code (or the ACPI code interacting with something else) on SMP builds
and does not seem related to bug 160652.
Setting acpi=off works for me.

Comment 11 Stuart Anderson 2005-07-19 16:46:27 UTC
I am running 2.6.12 RC4 without any problem, however, I have the same kernel
panic reported here with the FC3 2.6.12 SMP kernels, see,
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=162859
so I think the problem was introduced after RC1.

Comment 12 Joe Christy 2005-07-19 18:08:02 UTC
I, too, see this problem. I've rebuilt my modprobe.conf and hwconf ad nauseam.
It seems pretty clear to me that the initrd is hosed, but I don't see how to fix it.

Comment 13 Dave Jones 2005-07-19 19:34:04 UTC
see comment #3.  We need an mkinitrd update with a newer 'sh' for FC3.


Comment 14 Jason Tibbitts 2005-07-20 17:28:52 UTC
To test that, I built a "hybrid" mkinitrd package using the nash source from FC4
and the rest from the current FC3 package.  (I just replaced the nash directory
tarball.)  After installing the new kernel, I still had the problem.

Next I updated mkinitrd and, as required, udev, to the FC-4 versions and rebuilt
the initrds.  This time the kernel boots.

Now I have to see what could have been busted by the udev update.

Comment 15 Michael Setzer II 2005-07-21 05:58:34 UTC
Just to note: I have a similar problem on P4 2.8GHz HT machines.
The smp kernel fails as below, with hdb being a zip disk. The non-smp 
kernel runs fine until it pops up a black screen that ends with 
Enabling swap space: [ OK ]

Then it just sits there. Others have said the non-smp kernel works, but it doesn't
for these systems. Older kernels boot just fine. 

booting with kernel  2.6.12-1.1372_FC3smp

Red Hat nash version 4.1.18 starting
hdb: No disk in drive
mount: error 6 mounting ext3
mount: error 2 mounting none
switchroot: mount failed 22
umount /initrd/dev failed: 2
Kernel panic - not syncing: Attempted to kill init!

Comment 16 Frank Ch. Eigler 2005-07-21 14:12:40 UTC
*** Bug 163672 has been marked as a duplicate of this bug. ***

Comment 17 Frank Ch. Eigler 2005-07-21 14:15:07 UTC
*** Bug 162770 has been marked as a duplicate of this bug. ***

Comment 18 Jason Tibbitts 2005-07-21 14:20:45 UTC
Updating udev seems to have messed up various pieces of the system; inserted USB
devices don't show up in /etc/fstab and /dev/dsp isn't being created, to name
two.  So I'd say that updating to the FC-4 mkinitrd (which requires that udev
be updated as well) is not a viable solution.

If anyone has any suggestions for other things I can try, I'll be happy to
investigate them.

Comment 19 Jason Tibbitts 2005-07-21 15:46:44 UTC
As suggested in other reports which are now duplicates of this one, I unpacked
my initrd and copied in a copy of nash unpacked from the current rawhide RPM. 
The behavior did not change.

I'm at a complete loss now.  Updating mkinitrd and udev makes things work, but
the udev update breaks piles of things.  So there's a good chance that either
mkinitrd or udev is causing the problems.  Swapping in the latest version of the
important contribution from mkinitrd (/sbin/nash) doesn't help.  Could it be
udev instead that needs to be updated?  If so, how can I do that without the
breakage it causes to the rest of the machine?
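
For anyone wanting to repeat the experiment of unpacking an initrd and swapping in a different nash, the following is a minimal sketch. It assumes the image is a gzip-compressed cpio archive (the boot log in comment 2 reports "checking if image is initramfs... it is") and that GNU cpio is installed. The function names and the paths in the usage comment are illustrative only.

```shell
#!/bin/sh
# Sketch: unpack a gzip-compressed cpio initrd into a directory and repack it.
# Function names are made up; they are not part of mkinitrd.

# unpack_initrd IMAGE DIR: extract IMAGE into DIR (created if missing).
unpack_initrd() {
    mkdir -p "$2" &&
    gzip -dc "$1" | (cd "$2" && cpio -idm --quiet)
}

# repack_initrd DIR IMAGE: build a new "newc" format image from the tree in DIR.
repack_initrd() {
    (cd "$1" && find . | cpio -o -H newc --quiet) | gzip -9 > "$2"
}

# Example (paths are assumptions, adjust to your system):
#   unpack_initrd /boot/initrd-2.6.12-1.1372_FC3smp.img /tmp/initrd-work
#   ... replace the nash binary inside /tmp/initrd-work ...
#   repack_initrd /tmp/initrd-work /tmp/initrd-test.img
```

Writing the result to a separate file and pointing a spare grub entry at it keeps the original image available as a fallback.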

Comment 20 Nick Fitzkee 2005-07-21 15:54:07 UTC
Using an updated udev (058-1) and mkinitrd (4.2.15-1) from the FC4 distro, I was
able to get my FC3 system to boot with the 2.6.12-1.1372_FC3smp kernel.  I also
have noticed problems with what seems to be udev, namely, /dev/floppy (or fd0)
no longer exists.  Right now this isn't a problem, but I'm wondering if this is
worth posting another bugzilla bug to address the udev issue specifically?

Comment 21 Jason Tibbitts 2005-07-21 16:04:29 UTC
The udev issue isn't a bug; it's not expected that you can just install FC-4
packages on an FC-3 machine and have them work.

Comment 22 Forest Wilkinson 2005-07-21 17:34:27 UTC
*** Bug 163863 has been marked as a duplicate of this bug. ***

Comment 23 Trevor Cordes 2005-07-21 18:33:04 UTC
I can't believe we haven't seen more work from the FC kernel team on this bug
(posted to this bugzilla).  The CC list here is getting quite large and the
dupes are starting to pop up like flies (I suspect we'll start seeing an
explosion soon).  The 1372 errata release should be withdrawn, or fixed and
obsoleted.  It appears that anyone with SMP (or HT, which is a LOT of people)
will have a non-booting machine after putting in 1372.  Is there not a method in
the FC process to pull a faulty errata or put a crack team on a critical bug to
get it fixed asap?

I feel sorry for all the people (like myself) with remote/colocated servers
running auto (or manual) yum updates who are then completely screwed when they
issue the reboot and discover they're going to have to drive 50 miles (or more)
to fix it.

It's good that this bug was found and posted almost right away, but what good is
that if nothing is done about it and the exponential increase in the number of
people updating to the new package is allowed to occur?  And that follow-up to
the errata email post was poorly worded and gave people who weren't running
obscure nvidia rpm's false confidence in the update.

Sorry if I sound bitchy, I'm not really upset for myself, because I got caught
with this on a local machine that I could easily reboot, but there's lots of
people who aren't going to be so lucky, including lots of noobs that will really
panic when their machine doesn't boot with cryptic errors that could be
perceived as disk corruption errors.


Comment 24 Julius Thyssen 2005-07-23 01:53:43 UTC
Yes, this is quite severe, I know many who run this SMP kernel with FC3 even on
semi-production-level stable servers (for the poor, so to speak), which often
need remote reboot capability.

I've been fixing two cases of it already, on one I had done a yum update
manually and it all seemed fine and dandy, but after some cron-job did a reboot
around 5 AM a few days later, there it was; The server was down for almost 4
hours without me knowing it. The stable yum releases have been pretty reliable,
but I would retract this kernel release from yum repositories fast, because this
is pretty bad for fedora core's reputation as a whole.

Comment 25 frank pirrone 2005-07-23 03:38:57 UTC
(In reply to comment #18)
> Updating udev seems to have messed up various pieces of the system; inserted USB
> devices don't show up in /etc/fstab and /dev/dsp isn't being created, to name
> two.  So I'd say that updating to the FC-4 mkinitrd (which requiires that udev
> be updated as well) is not a viable solution.
> 
> If anyone has any suggestions for other things I can try, I'll be happy to
> investigate them.

(In reply to comment #19)
> As suggested in other reports which are now duplicates of this one, I unpacked
> my initrd and copied in a copy of nash unpacked from the current rawhide RPM. 
> The behavior did not change.
> 
> I'm at a complete loss now.  Updating mkinitrd and udev makes things work, but
> the udev update breaks piles of things.  So there's a good chance that either
> mkinitrd or udev is causing the problems.  Swapping in the latest version the
> important contribution from mkinitrd (/sbin/hash) doesn't help.  Could it be
> udev instead that needs to be updated?  If so, how can I do that without the
> breakage it causes to the rest of the machine?

Here are the updates I performed.  These fix all the behaviors reported, from the
kernel panic to the /dev device disruption.  All devices are recognized and
activated at bootup, including a Midisport 2x2 USB device that had never
initialized and loaded its firmware at bootup before!  Note these are all
from FC4 and all dependencies are satisfied:

checkpolicy-1.23.1-1.i386.rpm
initscripts-8.11.1-1.i386.rpm
libselinux-1.23.10-2.i386.rpm
libselinux-devel-1.23.10-2.i386.rpm
libsepol-1.5.9-2.i386.rpm
libsepol-devel-1.5.9-2.i386.rpm
mkinitrd-4.2.15-1.i386.rpm
SysVinit-2.85-39.i386.rpm
udev-058-1.i386.rpm

Frank


Comment 27 Philippe Rigault 2005-07-27 15:00:08 UTC
Please mark Bug#164108 as a dupe of this one. 

Comment 28 Frank Ch. Eigler 2005-07-27 15:06:58 UTC
*** Bug 164108 has been marked as a duplicate of this bug. ***

Comment 29 BK Broiler 2005-07-28 16:37:03 UTC
I also concur that when I booted 2.6.12-1.1372_FC3smp on a hyperthreaded 3.0 GHz P4 I experienced the 
same kernel panic as everyone else, and had to revert back to the prior kernel.  Question: If/when the fix 
is discovered, *and* if it involves mkinitrd and not the kernel itself, will uninstalling and reinstalling 
2.6.12-1.1372_FC3smp still be required?  I know this is putting the cart in front of the horse.

Comment 30 Matt Thompson 2005-07-28 18:33:41 UTC
I have had the same experience on a dual PIII.  Never tried the uniproc version
of 12-1.1372, as I was keen to get the box back up (not one I can mess with,
really).  Got same output as original reporter, and succeeded in boot to prior
kernel (kernel-smp-2.6.11-1.35_FC3).

Comment 31 Andrew J. Gristina 2005-07-28 23:26:18 UTC
Same bug here on SMP dual Xeons, no hyperthreading.

I need a 2.6.12 or later kernel with smp because the 2.6.12 seems to cure some
libata sata sil 3112/3114 problems (I haven't ruled out that 2.6.12 single proc
is the reason, I'm going to have to see if the problem with the sata sil is in
2.6.9 single proc).


I guess I'm getting on the Cc list.



Comment 32 Jason Tibbitts 2005-07-28 23:38:40 UTC
The mkinitrd update that just appeared in the testing repository solves the
problem for me.  Install the update, then remove the 1372 kernel package and
reinstall it (so that the initrd is recreated).  Your system should hopefully
boot fine.

Comment 33 Philippe Rigault 2005-07-29 00:47:15 UTC
I can confirm that mkinitrd-4.1.18.1-1 solves the problem for me too. 

Comment 34 Radek Liboska 2005-07-29 11:13:13 UTC
I can confirm that mkinitrd-4.1.18.1-1 solves the problem for me too. 

Comment 35 Steve Malenfant 2005-07-29 15:18:33 UTC
I can confirm that mkinitrd-4.1.18.1-1 solves the problem for me too.

Comment 36 Andrew J. Gristina 2005-07-29 17:15:29 UTC
mkinitrd-4.1.18.1-1 works for me on the first system I tested.  I have to try
this on one other hardware type, but it looks good.  When does the mkinitrd fix
hit stable?

Comment 37 Michael Carney 2005-07-29 23:42:38 UTC
mkinitrd-4.1.18.1-1 likewise works for me.

Comment 38 Konstantin Olchanski 2005-07-30 01:00:43 UTC
Me too! mkinitrd-4.1.18.1-1 produces bootable 2.6.12.xxx kernels. However, the
instructions ("update information") say "reinstall recent kernels" when just
rerunning "mkinitrd" appears to be sufficient. K.O.

Comment 39 Frank Büttner 2005-07-30 07:17:53 UTC
Now all is ok.
1. rpm -Fvh mkinitrd-4...
2. rpm -e kernel-smp.. --nodeps
3. rpm -ivh kernel-smp...

Comment 40 BK Broiler 2005-07-30 18:12:29 UTC
This morning mkinitrd-4.1.18.1-1 showed up in the FC3 yum update repository, and
I can confirm that the update fixed the kernel panic on
kernel-smp-2.6.12-1.1372_FC3.  If you do the update and uninstall/reinstall the
kernel (and its initrd), this will replace the bad mkinitrd and stop the kernel
panic on the duals.

Some claim the kernel does not need to be reinstalled, just mkinitrd.  I do not
know the answer to that question, but I do know what worked for me:

1. yum update mkinitrd
(make sure mkinitrd-4.1.18.1-1 updates on top of 4.1.18-2)
2. yum remove kernel-smp-2.6.12-1.1372_FC3 kernel-2.6.12-1.1372_FC3
3. yum update
yum should retrieve and install kernel-smp-2.6.12-1.1372_FC3 again and correctly
create the initrd for the smp version to be bootable again.
4. Take a breath and reboot.

Thanks to all who helped solve this one.

-BK

Comment 41 Frank Ch. Eigler 2005-07-31 12:30:08 UTC
confirming mkinitrd update works

Comment 42 Deron Meranda 2005-08-01 14:56:47 UTC
I can confirm for me that you DO have to uninstall the SMP kernel
and then re-install it (or rebuild the init image manually), just
updating mkinitrd is not enough.  You don't *have* to uninstall
the single-processor kernel version, just the SMP one, although
you may want to anyway.

 rpm -U mkinitrd-4.1.18.1-1
 rpm -e kernel-smp-2.6.12-1.1372_FC3
 rpm -i kernel-smp-2.6.12-1.1372_FC3

(or yum equivalents that Brad already gave).  Otherwise this bug
seems fixed.  Thanks.
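
For the "rebuild the init image manually" route, a minimal sketch that only prints the command rather than running it. It assumes the Fedora naming /boot/initrd-<version>.img that matches the grub entries quoted in this bug; `initrd_rebuild_cmd` is a made-up helper name.

```shell
#!/bin/sh
# Sketch: print (not run) the mkinitrd command that would regenerate the
# initrd for one kernel version. -f tells mkinitrd to overwrite an existing
# image. The /boot/initrd-<version>.img path is an assumption.
initrd_rebuild_cmd() {
    kver=$1
    echo "mkinitrd -f /boot/initrd-${kver}.img ${kver}"
}

# Show the command for the affected SMP kernel.
initrd_rebuild_cmd 2.6.12-1.1372_FC3smp
```

Running the printed command as root, after the mkinitrd update is installed, should have the same effect on the initrd as removing and reinstalling the kernel package.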


Comment 43 Trevor Cordes 2005-08-01 17:25:42 UTC
Here's a question: if you have a box without 1372 yet and you do a yum update
which grabs 1372 and mkinitrd-4.1.18.1-1 at the same time, can we assume it will
apply them in the proper order (mkinitrd first) so that you will have a working
system?


Comment 44 Matthew Miller 2005-08-01 17:33:19 UTC
I don't think there's anything in the dependency information which would ensure
that. So yeah, that could be a serious problem on new installs. 

The next kernel update should probably have "Requires: mkinitrd >= 4.1.18.1".
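
Until such a dependency exists, a script could check the mkinitrd version itself before installing a kernel. The following is a minimal sketch of the version comparison only; `version_ge` is a hypothetical helper, and the rpm query shown in the comment is one assumed way of obtaining the installed version.

```shell
#!/bin/sh
# Sketch: compare dotted numeric versions field by field.
# version_ge A B succeeds (exit 0) when A >= B; a missing field counts as 0.
version_ge() {
    v1=$1 v2=$2
    while [ -n "$v1" ] || [ -n "$v2" ]; do
        f1=${v1%%.*} f2=${v2%%.*}
        # Drop the field just read from each version string.
        case $v1 in *.*) v1=${v1#*.} ;; *) v1= ;; esac
        case $v2 in *.*) v2=${v2#*.} ;; *) v2= ;; esac
        [ "${f1:-0}" -gt "${f2:-0}" ] && return 0
        [ "${f1:-0}" -lt "${f2:-0}" ] && return 1
    done
    return 0
}

# On a real system the installed version could come from, for example:
#   installed=$(rpm -q --qf '%{VERSION}' mkinitrd)
installed=4.1.18
if version_ge "$installed" 4.1.18.1; then
    echo "mkinitrd is new enough"
else
    echo "update mkinitrd first"
fi
```

With the old 4.1.18 hard-coded as above, the script takes the "update mkinitrd first" branch.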

Comment 45 Lyman Byrd 2005-08-01 17:52:17 UTC
When I applied the fix, it broke VMware for me. When I tried to compile the
modules again, VMware reported that the kernel was compiled with gcc 3.4.3
whereas the modules were compiled with 3.4.4.  Can the kernel be compiled with
gcc 3.4.4 by chance to get around this problem? I realize that I could install
gcc 3.4.3 and the header files and recompile the modules, but if 3.4.4 is going to
be the compiler for the next kernel then I would have to recompile the modules
again. Am I making sense? Thanks for all the hard work.

Comment 46 Clay Campbell 2005-08-02 12:38:53 UTC
Same here except no SMP, just the latest kernel 2.6.12-1.1372_FC3 / mkinitrd 4.1.18

grub param selinux=0 fixed it 

error at kernel panic

Enforced mode requested but no policy loaded

Comment 47 Lyman Byrd 2005-08-02 14:41:57 UTC
Correction to my last post: the reason I cannot rebuild the modules for VMware
is that the /lib/modules/2.6.12-1.1372_FC3smp directory is not correct. The
build directory points to a nonexistent directory under /usr/src... Is there a
way I can correct this problem myself?  Thanks


Comment 48 Roy O. 2005-08-10 07:46:47 UTC
Can anyone help me fix the problem? I'm new to this so I really don't have a 
clue :(

Comment 49 Roy O. 2005-08-10 08:50:28 UTC
BTW, here is my bootup sequence:

Booting 'Fedora Core (2.6.12-1.1372_FC3smp)'

root (hd0,0)
 Filesystem type is ext2fs, partition type 0x83
kernel vmlinuz-2.6.12-1.1372_FC3smp ro root=/dev/VolGroup00/LogVl00 rhgb quiet
	[Linux-bzImage, setup=0x1e00, size=0x17e869]
initrd /initrd-2.6.12-1.1372_FC3smp.img
	[Linux-initrd @ 0x1fef2000, 0xed0c5 bytes]

Uncompressing Linux... Ok, booting the kernel.
Red Hat nash version 4.1.18 starting
mount: error 6 mounting ext3
mount: error 2 mounting none
switchroot: mount failed: 22
umount /initrd/dev failed: 2
Kernel panic - not syncing: Attempted to kill init!

Comment 50 Jim Mauroff 2005-08-23 14:01:37 UTC
Having similar problems. Kernel version 2.6.12-1.1372_FC3smp, obtained through
yum update, was compiled under gcc version 3.4.3.  When I compiled the source with
gcc version 3.4.4 it kernel panicked (mkinitrd --version = 4.1.18.1).

Comment 51 Matthew Wong 2005-11-03 03:40:25 UTC
For everyone's reference:

Due to an interruption of the power source, my production machine was accidentally
rebooted and froze up yesterday afternoon.

Googled for a while and finally found that I too am a victim of this bug#163437.

Checked the yum.log this morning and found:

   ...
   Jul 16 04:27:15 Installed: kernel-smp.i686 2.6.12-1.1372_FC3
   ...
   Jul 31 05:03:29 Updated: mkinitrd.i386 4.1.18.1-1
   ...
   Nov 03 07:11:26 Installed: kernel.i686 2.6.12-1.1381_FC3
   ...

So it seems the yum update to mkinitrd.i386 4.1.18.1-1 won't help systems
that are left unattended. Any such system using the smp kernel that rebooted
between 2005-07-16 and 2005-11-03 will be paralyzed.
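
A minimal sketch of how that check could be scripted: it scans a yum.log in the line layout quoted above and lists kernel packages installed before the fixed mkinitrd arrived. The function name is made up, and the script assumes the log is in chronological order.

```shell
#!/bin/sh
# Sketch: list kernel packages that yum installed before mkinitrd 4.1.18.1
# shows up in the log; their initrds were built by the old mkinitrd.
# Reads the files given as arguments, or standard input if there are none.
kernels_before_fix() {
    awk '
        # Stop at the first line that records the fixed mkinitrd.
        $5 ~ /^mkinitrd\./ && $6 ~ /^4\.1\.18\.1/ { exit }
        # Report kernel installs seen before that point.
        $4 == "Installed:" && $5 ~ /^kernel(-smp)?\./ { print $5, $6 }
    ' "$@"
}

# Example (log path is an assumption):
#   kernels_before_fix /var/log/yum.log
```

Any kernel it lists would still need its initrd regenerated, or the package removed and reinstalled, before the next reboot.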

And today a new kernel was installed, and I hope that the next time my
machine is rebooted (either accidentally or deliberately), it will come up and run
smoothly again. (But I would definitely like to arrange a reboot in
the coming few days to see if it's really bug free...)

So if the "power interruption" happens this afternoon, I guess my production
system can probably be up and running again within a minute or two.

The point I would like to raise is: if it's clear that the newer version of
mkinitrd won't help much in this situation, especially for newbies and for lazy
admins like me who let the system run somewhere in the data center, why did
Fedora not try to help solve the problem until today, when a new kernel came?

Is it feasible to "push" a new kernel out (with an increased minor/maintenance
version number, say 2.6.12-1.1372.1_FC3smp) so that yum/up2date will install the
kernel again and make bug#163437 go away?