Bug 179601

Summary: kernel hang "Starting udev:_" on Athlon SiS 730
Product: [Fedora] Fedora
Component: kernel
Version: 5
Hardware: athlon
OS: Linux
Status: CLOSED RAWHIDE
Severity: medium
Priority: medium
Reporter: John Reiser <jreiser>
Assignee: Dave Jones <davej>
QA Contact: Brian Brock <bbrock>
CC: pfrields, sundaram, tmraz, wtogami
Doc Type: Bug Fix
Last Closed: 2006-03-04 22:28:14 UTC
Attachments:
  dmesg output with 'lapic'
  .jpg photograph of VGA console at hang
  serial console boot capture
  log from git bisect runs
  sis900_wol_fix.diff

Description John Reiser 2006-02-01 16:46:28 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050922 Fedora/1.0.7-1.1.fc4 Firefox/1.0.7

Description of problem:
The kernel hangs with a hard freeze during init on a clone box with a plain Athlon CPU and a Silicon Integrated Systems [SiS] 730 chipset.  The VGA console shows "Starting udev:_" where the '_' indicates that the cursor has stopped blinking.  The serial console shows no Oops.  There is no response to any keyboard input, including <ctrl><alt><del>, <alt><sysrq>T, etc., even when "debug" or "debug=7" is appended to the boot command line.  The same symptom appears when "single" is appended to the boot command line.

This was initially reported as
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=178708#c2
but is now being entered separately because the symptoms seem to be distinct, and specific to this configuration and/or hardware.  The same kernel versions boot successfully on an x86_64 Athlon64 with nForce3.  The last kernel that works on the plain Athlon with SiS 730 is 2.6.15-1.1826.2.10_FC5.

Version-Release number of selected component (if applicable):
kernel-2.6.15-1.1884_FC5

How reproducible:
Always

Steps to Reproduce:
1. Boot kernel-2.6.15-1.1884_FC5, or any kernel after 2.6.15-1.1826.2.10_FC5.

Actual Results:  Hang "Starting udev:_" and no response to anything except hardware reset.

Expected Results:  Successful boot and init.


Additional info:

$ /sbin/lspci
00:00.0 Host bridge: Silicon Integrated Systems [SiS] 730 Host (rev 02)
00:00.1 IDE interface: Silicon Integrated Systems [SiS] 5513 [IDE] (rev d0)
00:01.0 ISA bridge: Silicon Integrated Systems [SiS] SiS85C503/5513 (LPC Bridge)
00:01.1 Ethernet controller: Silicon Integrated Systems [SiS] SiS900 PCI Fast Ethernet (rev 82)
00:01.2 USB Controller: Silicon Integrated Systems [SiS] USB 1.0 Controller (rev 07)
00:01.3 USB Controller: Silicon Integrated Systems [SiS] USB 1.0 Controller (rev 07)
00:01.4 Multimedia audio controller: Silicon Integrated Systems [SiS] SiS PCI Audio Accelerator (rev 02)
00:02.0 PCI bridge: Silicon Integrated Systems [SiS] Virtual PCI-to-PCI bridge (AGP)
00:09.0 Ethernet controller: Intel Corporation 82559 InBusiness 10/100 (rev 08)
01:00.0 VGA compatible controller: ATI Technologies Inc Rage 128 Pro Ultra TF
$

Comment 1 John Reiser 2006-02-01 21:59:15 UTC
Created attachment 124000 [details]
dmesg output with 'lapic'

Adding 'lapic' to the boot command line does not change the basic results. 
kernel-2.6.15-1.1884_FC5 still hangs; kernel-2.6.15-1.1826.2.10_FC5 still
works.
Current packages include
  udev-078-8
  initscripts-8.23-1
  MAKEDEV-3.21-1

$ cat /proc/cpuinfo
processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 6
model		: 4
model name	: AMD Athlon(tm) Processor
stepping	: 2
cpu MHz 	: 1095.237
cache size	: 256 KB
fdiv_bug	: no
hlt_bug 	: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 1
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 mmx fxsr syscall mmxext 3dnowext 3dnow
bogomips	: 2191.86
$

Comment 2 John Reiser 2006-02-03 23:21:49 UTC
Created attachment 124133 [details]
.jpg photograph of VGA console at hang

By putting "set -x" at the beginning of /sbin/start_udev, I saw on the VGA
console that the hang was somewhere around the shell procedure 'wait_for_queue'.
I tried to capture the output on the serial console, but neither
"exec >/dev/console 2>&1" nor "exec >>/dev/console 2>&1" at the beginning of
/sbin/start_udev captured anything.  Running strace on /sbin/udevstart, which is
the significant operation immediately before wait_for_queue, shows where it
hangs; see the attached photograph.  Attempting to redirect the output to the
serial console using "strace -o /dev/console /sbin/udevstart" gave nothing on
the serial console.

The strace of /sbin/udevstart suggests that there may be a leak of file
descriptors, because the fd number returned by open() keeps growing from time to
time, instead of each descriptor being closed and the same number reused.
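One rough way to check the descriptor-growth observation against a saved strace log is sketched below. The helper name fd_growth is hypothetical and not from the report; it assumes the log was written with "strace -o FILE" without -f or timestamps, so each successful open looks like open("/sys/...", O_RDONLY) = 5. If descriptors are closed and reused, new_highs stays small; if new_highs is close to opens, the fd number is climbing as described.

```shell
#!/bin/sh
# Hypothetical helper: summarize the fds returned by successful open() calls
# in an strace log.  Assumes "strace -o FILE" output without -f or -t, so a
# successful call ends in " = N"; failed calls end in "= -1 ERRNO (...)".
fd_growth() {
    awk '
        /^open(at)?\(/ && / = [0-9]+$/ {
            fd = $NF + 0          # descriptor returned by this open()
            n++                   # count of successful opens
            if (fd > max) {       # a new highest fd: nothing lower was free
                max = fd
                rises++
            }
        }
        END {
            printf "opens=%d max_fd=%d new_highs=%d\n", n, max, rises
        }
    ' "$1"
}
```

For example, "strace -o /tmp/udevstart.trace /sbin/udevstart" followed by "fd_growth /tmp/udevstart.trace" (the trace path is illustrative), run under the older kernel that still boots.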

This strace also places the hang somewhere while udevstart is looking at the USB
controllers.  The last group of operations is on
/sys//bus/pci/devices/0000:00:01.2/modalias, where /sbin/lspci shows
-----
00:00.0 Host bridge: Silicon Integrated Systems [SiS] 730 Host (rev 02)
00:00.1 IDE interface: Silicon Integrated Systems [SiS] 5513 [IDE] (rev d0)
00:01.0 ISA bridge: Silicon Integrated Systems [SiS] SiS85C503/5513 (LPC Bridge)
00:01.1 Ethernet controller: Silicon Integrated Systems [SiS] SiS900 PCI Fast Ethernet (rev 82)
00:01.2 USB Controller: Silicon Integrated Systems [SiS] USB 1.0 Controller (rev 07)
00:01.3 USB Controller: Silicon Integrated Systems [SiS] USB 1.0 Controller (rev 07)
00:01.4 Multimedia audio controller: Silicon Integrated Systems [SiS] SiS PCI Audio Accelerator (rev 02)
00:02.0 PCI bridge: Silicon Integrated Systems [SiS] Virtual PCI-to-PCI bridge (AGP)
00:09.0 Ethernet controller: Intel Corporation 82559 InBusiness 10/100 (rev 08)
01:00.0 VGA compatible controller: ATI Technologies Inc Rage 128 Pro Ultra TF
-----

Comment 3 Jeremy Katz 2006-02-14 20:14:46 UTC
Is it fine if you boot the installer?  (no need to actually go through the
installer, just see if the second stage will come up).

Comment 4 John Reiser 2006-02-14 20:39:31 UTC
Booting i386/images/boot.iso of 2006-02-13 (yesterday's, which is what my local
mirror has) from a physical CD, then pressing <Enter> at the boot: prompt, gets
to "running /sbin/loader" and then hangs.  The VGA cursor is blinking, but
nothing has happened after 5 minutes.  There is no response to any keyboard
command: Ctrl+Alt+DEL, Alt+Fn, Ctrl+Alt+Fn, Alt+SysRq+<any>, etc.  A hardware
reset is required.

Comment 5 John Reiser 2006-02-14 20:45:02 UTC
The original problem persists with today's packages:
  kernel-2.6.15-1.1948_FC5
  udev-084-1.1
  mkinitrd-5.0.23-1
  initscripts-8.28-1
The old kernel-2.6.15-1.1826.2.10_FC5 still boots with those packages.


Comment 6 Tomas Mraz 2006-02-14 20:50:16 UTC
I cannot reproduce the problem with kernel-2.6.15-1.1948_FC5 udev-084-1.1 on a
SiS 745 board, so it is probably specific to the SiS 730.


Comment 7 Jeremy Katz 2006-02-14 21:01:32 UTC
Hrmm... does adding 'nousb' to the kernel args help booting the installer?  

Comment 8 John Reiser 2006-02-14 21:32:36 UTC
Created attachment 124641 [details]
serial console boot capture

boot: linux lapic nousb console=ttyS0,9600 console=tty0
still hangs in the same place ("running /sbin/loader"), with no keyboard
response after several minutes.  The last message on the serial console was
"Write protecting the kernel read-only data: 344k"  [The "Greetings" and later
messages from anaconda were not captured on the serial console.]

Comment 9 Rahul Sundaram 2006-02-20 11:30:57 UTC

These bugs are being closed since a large number of updates have been released
after the FC5 test1 and test2 releases. Kindly update your system by running
"yum update" as the root user, or try out the third and final test version of
FC5, being released shortly, and verify whether the bugs are still present on
the system. Reopen or file new bug reports as appropriate after confirming the
presence of this issue. Thanks

Comment 10 John Reiser 2006-02-21 17:06:27 UTC
Booting from the FC5test3 install DVD hangs "running /sbin/loader".

Comment 11 John Reiser 2006-02-23 19:43:55 UTC
Created attachment 125130 [details]
log from git bisect runs

This is a log of "git bisect" runs.  Git claims to have found the bad kernel
patch.

However, the last boot succeeded only because that kernel was too old for
udev-084-1.1, so perhaps git's finger is off by one patch or so.

Methodology: Clone the Linux kernel git repository for linux-2.6.  Then bisect
away.  At each stage:
    git bisect {good|bad}
    make mrproper
    cp kernel-2.6.15-1.1826.2.10_FC5/.config .	# from good rpmbuild
    make oldconfig  # give <Enter> default answers if necessary
    make
    cp ./arch/i386/boot/bzImage /boot/vmlinuz-$VER
    make modules_install
    /sbin/mkinitrd /boot/initrd-$VER.img $VER
    {boot single-user, see if udev starts}
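The per-stage steps above can be wrapped in a small shell function so each bisect round is a single command. This is a hypothetical sketch, not the script used for these runs: bisect_stage, GOOD_CONFIG and DRY_RUN are illustrative names; it assumes it runs as root from the top of a linux-2.6 checkout on i386, and that VERSION matches the /lib/modules directory that "make modules_install" creates. It adds "-f" to mkinitrd so a later stage can overwrite an initrd with the same version string.

```shell
#!/bin/sh
# Hypothetical sketch of one bisect stage (names are illustrative).
# GOOD_CONFIG: .config saved from the working 2.6.15-1.1826.2.10_FC5 rpmbuild.
# DRY_RUN=1:   print each command instead of running it.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

bisect_stage() {
    verdict=$1   # result of booting the previous build: good or bad
    ver=$2       # must match the /lib/modules/<ver> name for mkinitrd
    case $verdict in
        good|bad) ;;
        *) echo "usage: bisect_stage good|bad VERSION" >&2; return 2 ;;
    esac
    if [ -z "$ver" ] || [ -z "$GOOD_CONFIG" ]; then
        echo "bisect_stage: need VERSION and GOOD_CONFIG" >&2
        return 2
    fi

    run git bisect "$verdict" || return
    run make mrproper || return
    run cp "$GOOD_CONFIG" .config || return
    # Feed empty lines so oldconfig takes the default for any new option.
    run sh -c "yes '' | make oldconfig" || return
    run make || return
    run cp arch/i386/boot/bzImage "/boot/vmlinuz-$ver" || return
    run make modules_install || return
    # -f: overwrite an initrd left by an earlier stage with the same version.
    run /sbin/mkinitrd -f "/boot/initrd-$ver.img" "$ver" || return
    echo "next: boot $ver single-user and see whether udev starts"
}
```

Running it once with DRY_RUN=1 is a cheap way to review the command sequence before committing to a 1.5 to 2 hour build.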

In the beginning each stage took 2 hours; at the end, 1.5 hours, on the 1.1 GHz
plain Athlon.

What else can I do to help diagnose and fix this problem?

Comment 12 Dave Jones 2006-02-24 02:52:31 UTC
Great work!  Let's take this upstream and get it sorted out there.
In the meantime, I'll drop it from the Fedora kernel.


Comment 13 John Reiser 2006-02-25 22:48:34 UTC
Created attachment 125261 [details]
sis900_wol_fix.diff

This patch from Daniele Venzano <venza> in LKML 2006-02-24 works
for me (my SiS 730 board boots again) when applied to kernel-2.6.15-1.1977_FC5.

Comment 14 Dave Jones 2006-03-04 22:28:14 UTC
The patch is merged upstream, and is in the latest builds.