Bug 156664

Summary: kernel smp > 1275 fails boot at 'Switching to new root'
Product: [Fedora] Fedora
Reporter: gene c <gjunk>
Component: kernel
Assignee: Dave Jones <davej>
Status: CLOSED DUPLICATE
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 4
CC: graham.hudspith, pfrields, pjones, roland, wtogami
Target Milestone: ---
Target Release: ---
Hardware: i386
OS: Linux
Last Closed: 2005-05-23 05:13:12 UTC
Attachments:
  Photo of screen with oops (flags: none)

Description gene c 2005-05-03 00:30:13 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8b2) Gecko/20050430 Firefox/1.0+

Description of problem:
   All was well up to 1268 - starting with 1275 and 1276
   the smp kernel fails to boot - it hangs here:

   Mounting root filesystem
   kjournald starting. Commit interval 5 seconds
   EXT3-fs: mounted filesystems with ordered data mode
   Switching to new root.

   Then it just hangs. No more info

   Non-smp boots fine as do all versions of smp prior to 1275.

   Machine is a Dell Precision 370 (Intel based): 1 GiB memory,
   3.6 GHz P4 with HT, SATA disk (Intel ICH6R/ICH6RW).

Version-Release number of selected component (if applicable):
kernel-smp-2.6.11-1.1276_FC4

How reproducible:
Always

Steps to Reproduce:
1. Reboot with the smp kernel (1275 or newer)

Actual Results:  Hangs at Switching to new root.


Expected Results:  boot normally

Additional info:

 Following is /proc/cpuinfo and lspci -vv.

  # cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      : Intel(R) Pentium(R) 4 CPU 3.60GHz
stepping        : 1
cpu MHz         : 3591.578
cache size      : 1024 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 3
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm pni monitor ds_cpl tm2 cid cx16 xtpr
bogomips        : 7127.04

  ------------------------------------------------------------
  # lspci
00:00.0 Host bridge: Intel Corporation 925X/XE Memory Controller Hub (rev 04)
00:01.0 PCI bridge: Intel Corporation 925X/XE PCI Express Root Port (rev 04)
00:1c.0 PCI bridge: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) PCI Express Port 1 (rev 03)
00:1c.1 PCI bridge: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) PCI Express Port 2 (rev 03)
00:1d.0 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB UHCI #1 (rev 03)
00:1d.1 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB UHCI #2 (rev 03)
00:1d.2 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB UHCI #3 (rev 03)
00:1d.3 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB UHCI #4 (rev 03)
00:1d.7 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB2 EHCI Controller (rev 03)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d3)
00:1e.2 Multimedia audio controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) AC'97 Audio Controller (rev 03)
00:1f.0 ISA bridge: Intel Corporation 82801FB/FR (ICH6/ICH6R) LPC Interface Bridge (rev 03)
00:1f.1 IDE interface: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) IDE Controller (rev 03)
00:1f.2 Class 0106: Intel Corporation 82801FR/FRW (ICH6R/ICH6RW) SATA Controller (rev 03)
00:1f.3 SMBus: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) SMBus Controller (rev 03)
01:00.0 VGA compatible controller: nVidia Corporation NV37GL [Quadro FX 330] (rev a2)
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5751 Gigabit Ethernet PCI Express (rev 01)
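
Note on reading the /proc/cpuinfo dump above: "siblings" is 2 while "cpu cores" is 1, which is how Hyper-Threading shows up on this P4 (two logical CPUs on one physical core). The following is only a minimal sketch of that check, not a tool used in this report; the function names and the trimmed SAMPLE text are made up for illustration, and a missing "cpu cores" field is assumed to mean one core.

```python
def parse_cpuinfo(text):
    """Split /proc/cpuinfo-style text into one dict per logical processor."""
    cpus, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current processor block.
            if current:
                cpus.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:  # skip wrapped continuation lines that carry no "key : value"
            current[key.strip()] = value.strip()
    if current:
        cpus.append(current)
    return cpus


def hyperthreading_active(cpu):
    """True when the package exposes more logical CPUs than physical cores."""
    siblings = int(cpu.get("siblings", 1))
    cores = int(cpu.get("cpu cores", 1))  # assumption: absent field means 1 core
    return siblings > cores


# Trimmed copy of the fields that matter from the dump above (illustrative).
SAMPLE = """processor       : 0
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
"""

if __name__ == "__main__":
    for cpu in parse_cpuinfo(SAMPLE):
        print(cpu["processor"], hyperthreading_active(cpu))
```

On a live system the same functions could be fed the contents of /proc/cpuinfo instead of SAMPLE.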

Comment 1 gene c 2005-05-04 03:51:35 UTC
Problem persists using kernel-smp-2.6.11-1.1284_FC4

from

http://people.redhat.com/davej/kernels/Fedora/FC4

Hangs at  Switching to new root.



Comment 2 Graham Hudspith 2005-05-08 11:16:21 UTC
Same happens to me. Machine is a Dell Dimension 5000: 1 GB RAM, 3.00 GHz P4 with
HT, and a SATA disk.

spiceisland root # cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      : Intel(R) Pentium(R) 4 CPU 3.00GHz
stepping        : 1
cpu MHz         : 2992.768
cache size      : 1024 KB
physical id     : 0
siblings        : 2
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 3
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx pni monitor ds_cpl
cid xtpr
bogomips        : 5931.00

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      : Intel(R) Pentium(R) 4 CPU 3.00GHz
stepping        : 1
cpu MHz         : 2992.768
cache size      : 1024 KB
physical id     : 0
siblings        : 2
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 3
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx pni monitor ds_cpl
cid xtpr
bogomips        : 5980.16

spiceisland root # lspci
0000:00:00.0 Host bridge: Intel Corporation 915G/P/GV/GL/PL/910GL Processor to
I/O Controller (rev 04)
0000:00:01.0 PCI bridge: Intel Corporation 915G/P/GV/GL/PL/910GL PCI Express
Root Port (rev 04)
0000:00:02.0 VGA compatible controller: Intel Corporation 82915G/GV/910GL
Express Chipset Family Graphics Controller (rev 04)
0000:00:02.1 Display controller: Intel Corporation 82915G Express Chipset Family
Graphics Controller (rev 04)
0000:00:1c.0 PCI bridge: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family)
PCI Express Port 1 (rev 03)
0000:00:1c.1 PCI bridge: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family)
PCI Express Port 2 (rev 03)
0000:00:1d.0 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6
Family) USB UHCI #1 (rev 03)
0000:00:1d.1 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6
Family) USB UHCI #2 (rev 03)
0000:00:1d.2 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6
Family) USB UHCI #3 (rev 03)
0000:00:1d.3 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6
Family) USB UHCI #4 (rev 03)
0000:00:1d.7 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6
Family) USB2 EHCI Controller (rev 03)
0000:00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d3)
0000:00:1e.2 Multimedia audio controller: Intel Corporation
82801FB/FBM/FR/FW/FRW (ICH6 Family) AC'97 Audio Controller (rev 03)
0000:00:1f.0 ISA bridge: Intel Corporation 82801FB/FR (ICH6/ICH6R) LPC Interface
Bridge (rev 03)
0000:00:1f.1 IDE interface: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6
Family) IDE Controller (rev 03)
0000:00:1f.2 IDE interface: Intel Corporation 82801FB/FW (ICH6/ICH6W) SATA
Controller (rev 03)
0000:00:1f.3 SMBus: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) SMBus
Controller (rev 03)
0000:04:03.0 Ethernet controller: Broadcom Corporation BCM4401 100Base-T (rev 01)


Comment 3 gene c 2005-05-08 14:58:58 UTC
Seems it may be HT related.
It works fine on a dual (non-HT) P4 for me.
Something changed at kernel 1275 for sure to make it stop working.

g/

Comment 4 gene c 2005-05-08 20:30:10 UTC
Created attachment 114143 [details]
Photo of screen with oops

Comment 5 gene c 2005-05-08 20:33:59 UTC
kernel-smp 1287 hangs as well.
Hangs at same place.

This time I got the OOPS info from Alt-SysRq-P

(I attached a photo of the oops as well, but hand-transcribed some of it here)

  Pid: 1, comm:  init
  EIP: 0060:[<c03088f5>] CPU: 0
  EIP is at _spin_unlock_irqrestore+0x1b/0x30
  EFLAGS 00000296 Not tainted ...
  ..
  ... force_sig_info+0x5b/0xa7
  ... do_page_fault+0x359/0x6a7
  ... schedule+0x405/0xc5e
  ... schedule+0x431/0xc5e
  ... scheduler_tick+0x23b/0x414
  ... do_page_fault+0x0/0x6a7
  ... error_code+0x4f/0x54


Let me know if there's any more I can do.
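
Note on the trace format in this comment: each entry is written as symbol+offset/size, where the first hex number is the offset into the function and the second is the function's total length. Below is a rough sketch of pulling those fields apart; the regular expression and the parse_frames name are illustrative assumptions, not part of any kernel tooling.

```python
import re

# symbol+0xOFFSET/0xSIZE, as printed in kernel oops and SysRq-P output.
FRAME_RE = re.compile(
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-fA-F]+)/0x(?P<size>[0-9a-fA-F]+)"
)


def parse_frames(text):
    """Return (symbol, offset, size) tuples for every frame found in text."""
    return [
        (m.group("symbol"), int(m.group("offset"), 16), int(m.group("size"), 16))
        for m in FRAME_RE.finditer(text)
    ]


# A few lines copied from the hand-transcribed trace in this comment.
TRACE = """
  ... force_sig_info+0x5b/0xa7
  ... do_page_fault+0x359/0x6a7
  ... schedule+0x405/0xc5e
"""

if __name__ == "__main__":
    for symbol, offset, size in parse_frames(TRACE):
        print(f"{symbol}: {offset} bytes into a {size}-byte function")
```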


Comment 6 gene c 2005-05-12 03:02:23 UTC
kernel-smp-1290 

Same thing.

g/

Comment 7 gene c 2005-05-14 04:36:54 UTC
kernel-smp-1303

same problem.

g/

Comment 8 gene c 2005-05-15 02:37:33 UTC
I turned off HT in the BIOS -
this had no effect at all - same problem.

g/

Comment 9 Dave Jones 2005-05-18 00:01:51 UTC
1268 was fine
1275 (and newer) is broken

The only changes between those two are:

revision 1.1274
date: 2005/04/27 17:58:26;  author: davej;  state: Exp;  lines: +8 -2
Hopefully fix the random reboots some folks saw on x86-64
----------------------------
revision 1.1273
date: 2005/04/27 17:03:41;  author: katzj;  state: Exp;  lines: +12 -5
* Wed Apr 27 2005 Jeremy Katz <katzj>
- fix prereqs for -devel packages
----------------------------
revision 1.1272
date: 2005/04/27 16:58:47;  author: riel;  state: Exp;  lines: +6 -2
- Fix up the vdso stuff so kernel-xen* compile again
- Import upstream bugfix so xenU domains can be started again
----------------------------
revision 1.1271
date: 2005/04/27 01:38:11;  author: davej;  state: Exp;  lines: +32 -2
xen vdso magick
----------------------------
revision 1.1270
date: 2005/04/27 00:31:49;  author: davej;  state: Exp;  lines: +6 -2
apply vdso patch
----------------------------
revision 1.1269
date: 2005/04/27 00:28:45;  author: davej;  state: Exp;  lines: +3 -2
fix up the vdso again.

We can rule out the x86-64 changes as this is seen on i386, leaving only the
vdso bits.  Roland, any ideas?


Comment 10 Roland McGrath 2005-05-18 05:32:45 UTC
1275 works fine for me.  1312 dies, but unchanged with vdso=0.
Apparently the execve of /sbin/init done by nash in the initrd /init is failing.
On only one iteration did nash get its error message out on my serial console
before the panic, and that time it said 14 (EFAULT).  Has anything in
exec/binfmt_elf changed recently?
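
Note on the error number: 14 is EFAULT ("Bad address") on Linux, meaning the kernel hit an invalid address while handling the call. A quick way to decode such a number is sketched below; describe_errno is a hypothetical helper name, and the mapping comes from the errno table of whatever platform runs the snippet.

```python
import errno
import os


def describe_errno(code):
    """Map a numeric errno to its symbolic name and the C library's message."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    name, message = describe_errno(14)
    print(f"errno 14 is {name}: {message}")
```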

Comment 11 Peter Jones 2005-05-18 20:22:17 UTC
Can you try installing mkinitrd 4.2.15, rebuilding the initrd for one of the
broken kernels, and then rebooting?

I think this looks like something we saw with mkinitrd, and it should be fixed.

There are mkinitrd packages at http://people.redhat.com/pjones/mkinitrd/ which
should hit rawhide tomorrow morning; I'll probably take them down later in the week.

Comment 12 gene c 2005-05-19 01:12:36 UTC
installed mkinitrd 4.2.15 and rebuilt 1303 smp initrd - no difference unfortunately.

At some point (maybe 1280, I forget) the messages on screen changed to something
like
Switching to new root
umounting old /proc
unmounting old /sys
 (or similar).
i.e. 2 additional lines.

After that it hangs. Alt-SysRq-P shows a trace similar to the previous one. In
the UP boot the next message is an SELinux one; SELinux is disabled.

Sorry it did not help. Let me know if there's any more I can do to help.


Comment 13 Warren Togami 2005-05-23 05:13:12 UTC

*** This bug has been marked as a duplicate of 158413 ***