Bug 165620

Summary: 2.6.12-1.1398_FC fails to boot frequently
Product: [Fedora] Fedora
Reporter: Bruce vanNorman <bruce>
Component: xorg-x11
Assignee: X/OpenGL Maintenance List <xgl-maint>
Status: CLOSED CURRENTRELEASE
QA Contact: David Lawrence <dkl>
Severity: medium
Docs Contact:
Priority: medium
Version: 4
CC: wtogami
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version: FC5
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2006-06-27 16:01:40 UTC
Type: ---

Description Bruce vanNorman 2005-08-10 20:38:59 UTC
Description of problem: IBM T22 Thinkpad hangs during boot process. Confirmed on
two different T22 platforms.


Version-Release number of selected component (if applicable):
kernel-2.6.12-1.1398_FC4 - doesn't seem to happen with kernel-2.6.11-1.1369_FC4


How reproducible: Intermittent. Of 39 boots, 15 hung just after "initializing
hardware   [OK]". Of the 24 that got past that point, 6 hung just after
"starting HAL daemon   [OK]".

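The counts above can be turned into rates with a little arithmetic. A minimal
sketch in Python, using only the numbers reported in this section (the variable
and function names are illustrative, not part of the report):

```python
# Boot counts as reported in "How reproducible".
total_boots = 39
hung_after_init_hw = 15                          # hung after "initializing hardware [OK]"
reached_hal = total_boots - hung_after_init_hw   # boots that got past that point
hung_after_hal = 6                               # hung after "starting HAL daemon [OK]"

total_hangs = hung_after_init_hw + hung_after_hal
successful = total_boots - total_hangs


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


print(f"hung at first point:  {hung_after_init_hw}/{total_boots} ({pct(hung_after_init_hw, total_boots)}%)")
print(f"hung at second point: {hung_after_hal}/{reached_hal} ({pct(hung_after_hal, reached_hal)}%)")
print(f"hung overall:         {total_hangs}/{total_boots} ({pct(total_hangs, total_boots)}%)")
print(f"booted successfully:  {successful}/{total_boots}")
```

Taken together, a little over half of the 39 boots hung at one of the two points.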

Steps to Reproduce:
1. Power T22 on
2. Let it boot to the default kernel (1398)
3. If T22 appears dead, wait 2 minutes, then force power off
  
Actual results: See "How reproducible" section


Expected results: User logon screen appears within 60 seconds of power on.


Additional info: After about 1/3 of the hangs, the next successful boot displays
the following:
Red Hat nash version 4.2.15 starting
mknod:  failed to create /dev/console:  17
mknod:  failed to create /dev/null:  17
mknod:  failed to create /dev/zero:  17
INIT version 2.85 booting
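
The output does not say what the trailing 17 means. Assuming it is a Linux
errno value (an assumption, not something the messages state), it can be looked
up with Python's standard library:

```python
import errno
import os

code = 17  # the number printed after each mknod failure above

# Symbolic name and human-readable text for this errno value.
print(errno.errorcode[code])
print(os.strerror(code))
```

On Linux, errno 17 is EEXIST. If that reading is right, the device nodes were
already present when nash tried to create them.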

Comment 1 Bruce vanNorman 2005-08-10 21:04:53 UTC
- My T22's have Intel P-III 900 MHz mobile processors, 256 MB or more of PC-100
DRAM, and S3 Graphics Savage/IX 1024 video adapters (Windows device description)
/ S3 Inc. 86C270-294 Savage/IX-MV (FC4 description).
- My instincts tell me that X Windows is at the heart of the matter. The points
of hang (when no hang occurs) have a lot of video activity - always random junk,
which straightens out a couple of seconds later. A boot hang never has video
activity; the screen is always completely blank.
- The IBM T22 Thinkpad is described at http://www.ibm.com | products | certified
used equipment | notebooks | support & downloads.
- The explicit IBM model numbers (necessary to get the hardware PDFs) of my
T22's are T2648-ESU and T2647-8EU - either gets to the same documentation.
- I am quite willing to do any legwork you might want.

Comment 2 Bruce vanNorman 2005-08-10 22:07:17 UTC
I tried a number of boots of kernel-2.6.11-1.1369_FC4. After three tries, I got
one to fail. The "mknod" messages appeared on re-boot after the failure.

Comment 3 Dan Carpenter 2005-08-13 06:48:06 UTC
To be honest, the more interesting lines are probably the ones slightly prior to
the mknod errors.

Edit your /etc/grub.conf file and remove all the places that say "quiet".  Take
a digital photo of the crash.

Also make sure that you're using the most recent version of udev.
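
For anyone who would rather script the "quiet" removal than edit by hand, here
is a rough sketch in Python. It assumes GRUB legacy syntax, where boot options
sit on lines beginning with "kernel". The function name and the sample stanza
(including its root=LABEL=/ and rhgb options) are made up for illustration.

```python
def strip_quiet(grub_conf_text):
    """Remove the 'quiet' option from every kernel line of a GRUB legacy config."""
    out = []
    for line in grub_conf_text.splitlines():
        words = line.split()
        if words and words[0] == "kernel":
            # Keep the original indentation; runs of spaces between the
            # remaining options collapse to single spaces.
            indent = line[: len(line) - len(line.lstrip())]
            line = indent + " ".join(w for w in words if w != "quiet")
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical stanza for illustration only; a real grub.conf will differ.
sample = (
    "title Fedora Core (2.6.12-1.1398_FC4)\n"
    "\troot (hd0,0)\n"
    "\tkernel /vmlinuz-2.6.12-1.1398_FC4 ro root=LABEL=/ rhgb quiet\n"
    "\tinitrd /initrd-2.6.12-1.1398_FC4.img\n"
)
print(strip_quiet(sample), end="")
```

Work on a copy of the file and compare it with the original before replacing
anything, since a broken grub.conf can leave the machine unbootable.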



Comment 4 Bruce vanNorman 2005-08-16 22:10:57 UTC
(In reply to comment #3)
> To be honest, the more interesting lines are probably the slightly prior to the
> mknod errors.
> 
> Edit your /etc/grub.conf file and remove all the places that say "quiet".  Take
> a digital photo of the crash.
> 
> Also make sure that you're using the most recent version of udev.
===> 
Did the no quiet thing. Got lots of pages of output, which brings me to - how
does one take a digital photo of this? I am a newbie to Linux, but have been
around computers since 1953. I can also RTFM, if I knew which one :-)



Comment 5 Bruce vanNorman 2005-08-17 00:47:40 UTC
Re: up2date says that there are no packages available. rpm -q udev reports
udev-058-1 is installed. I have no way to know if this is the most recent version.

Comment 6 Dan Carpenter 2005-08-17 05:05:29 UTC
Forget about taking a photograph of the screen, that only is useful for people
with a digital camera.

udev-058-1 is the newest version.

I think you may have the same bug as bug 157129. That bug is older so let's move
discussion over there.

Comment 7 Bruce vanNorman 2005-08-17 16:32:31 UTC
Let's NOT move it over there. My report focuses on intermittent boot failures
which are not mentioned in the 157129 thread. The FC4 messages common to both
threads appear in the T22 reboot and herald a strong possibility of a successful
boot. I interpreted them as a symptom that FC4 was correcting the previous boot
problem so that the current boot might work. In the initial bug report I
included the mknod messages because they always followed the hang and MIGHT be
related. Given the evidence presented in the other thread (thanks for pointing
it out), they probably are unrelated to my bug report.
---
I have the same build on four machines (now). The RPM's I've checked are all on
the same release. Only the IBM T22 laptops have the boot hang problem. I have no
objective evidence that the following factoids are related to my reported bug. I
have had some FC4 / ACPI problems with the T22's which I have been able to work
around with BIOS settings. If I use APM instead of ACPI (editing the GRUB
command line with ACPI=NO) I seem to not have the boot hang problem. I need
more data to verify this. I will develop a sample set comparable to the one
already included in this report. The ACPI hangs did not occur with FC4 until I
updated the distribution. When going back to the original kernel build (via
GRUB) the bug is still present - which sort of indicates that the kernel is not
the problem. The provided list (bugzilla web page) of possible failing
components provided no other viable alternative.
---
I have just ordered another IBM refurbished laptop - a T23 this time. It is
almost identical to the T22 - 15% faster and PC133 SDRAM instead of PC100. When
it arrives I will install the FC4 CD's and test for my problem. If it is not
present, I will update the kernel, but nothing else and test again. Its arrival
is a couple of weeks off.
---
Regarding my misinterpretation of the "digital capture" - the text scrolls past
much too fast for a photograph of any kind, even on a pathetic (by today's
standards) 900 MHz mobile Pentium III. Is there any way to route or pipe the text
stream to a file for post-mortem analysis?

Comment 8 Bruce vanNorman 2005-08-17 19:08:14 UTC
ACPI vs. APM testing
===
Adding "acpi=off" to the GRUB command line from a cold start (using the power
button to initiate the boot) results in a repeatable 100% hang just after
"starting HAL daemon .... [OK]" (17 attempts). Note: I visually confirmed that
the APMD daemon was started instead of ACPID.
---
Booting with "acpi=off" from a warm start (FC4 requested restart) works every
time (9 attempts).
---
We cannot remember any boot hangs with either APM or ACPI when we were back on
FC3. I sort of recall that FC4 seemed to work better with ACPI than FC3, which
is why I began the FC4 migration. Unfortunately I do not have detailed notes on
that. Any ideas as to what else we can try, so as to identify the offending component?

Comment 9 Dave Jones 2005-08-26 22:38:47 UTC
Can you try the latest errata kernel in updates-testing please?

Comment 10 Bruce vanNorman 2005-09-18 00:27:04 UTC
Have done this for bug 168062 (a different set of symptoms). Since these are the
same machines, will do.
---
Latest observations - this problem may be x-org related. I can always avoid this
hang by moving the mouse just before x-org initializes: just after the INIT msg
and just after the HAL started   [OK] msg. Without the mouse movement, the same
odds of a hang seem to prevail. Grandson came up with this :-).
---
This problem does not happen on my T23, which is slightly faster and has a
different BIOS.

Comment 11 Bruce vanNorman 2005-09-18 03:04:29 UTC
5 out of 5 tries booting kernel 1455 were successful without mousing. Tried both
restart, and cold boot. Will keep a watch on it. xorg-x11 is 6.8.2-37.FC4.48.1. 

Comment 12 Bruce vanNorman 2005-09-18 18:29:03 UTC
xorg-x11 & friends becoming a more likely suspect than kernel. 12 out of 12 boots
(warm & cold) with kernel 1398 have been successful. I was trying to get a
control sample to better identify which package change removed the problem. I
think that the kernel is absolved. What do you think?
---
Wish I knew how to roll back the xorg-x11 bundle of packages to whatever version
I was running when I reported this bug. Assuming that RPM is the way to go, how
does one determine what version was running on a specific date? Also, how does
one know the bundle of RPM's to roll back? I would like to do a more credible
job of testing. I was a mainframe software developer for over 30 years. My areas
were OS interrupt handlers (FLIC & SLIC) and later on, RTL (Run Time Library)
maintenance for compilers and interpreters. I know what I needed out of our
product testers and I assume you would want the same. TTFN
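
On the "what version was running on a specific date" question: RPM records an
install time for each package, and rpm's --queryformat option can print it via
the INSTALLTIME tag (seconds since the epoch). Below is a rough sketch in Python
that filters such output for packages installed on or after a cutoff date. The
function names, the sample lines and their timestamps are invented for
illustration. It only shows what was installed when; it does not cover rolling
anything back.

```python
import subprocess
from datetime import datetime, timezone

# rpm query printing "<install time in epoch seconds> <name-version-release>".
QUERY = ["rpm", "-qa", "--queryformat",
         "%{INSTALLTIME} %{NAME}-%{VERSION}-%{RELEASE}\n"]


def installed_since(rpm_output, cutoff):
    """Return (install time, package) pairs installed at or after cutoff, oldest first."""
    hits = []
    for line in rpm_output.splitlines():
        stamp, _, package = line.partition(" ")
        if not stamp.isdigit():
            continue  # skip blank or unexpected lines
        when = datetime.fromtimestamp(int(stamp), tz=timezone.utc)
        if when >= cutoff:
            hits.append((when, package))
    return sorted(hits)


def query_rpm():
    """Run the rpm query and return its output; only works on an RPM-based system."""
    return subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout


# Hypothetical sample output; the timestamps are invented for illustration.
sample = (
    "1118000000 udev-058-1\n"
    "1126000000 xorg-x11-6.8.2-37.FC4.48.1\n"
)
cutoff = datetime(2005, 8, 10, tzinfo=timezone.utc)  # date this bug was filed
for when, package in installed_since(sample, cutoff):
    print(when.date(), package)
```

On a live system, pass query_rpm() in place of the sample. rpm -qa --last gives
a similar listing straight from the command line, newest first.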

Comment 13 Dave Jones 2005-09-30 05:46:48 UTC
Reassigning to xorg team. The summary of this bug should probably be updated to
reflect that this is more likely an xorg bug than a kernel bug.


Comment 14 Mike A. Harris 2006-06-27 16:01:40 UTC
Since this bugzilla report was filed, there have been several major
updates to the X Window System, which may resolve this issue.  Users
who have experienced this problem are encouraged to upgrade to the
latest version of Fedora Core, which can be obtained from:

        http://fedora.redhat.com/download

If this issue turns out to still be reproducible in the latest
version of Fedora Core, please file a bug report in the X.Org
bugzilla located at http://bugs.freedesktop.org in the "xorg"
component.

Once you've filed your bug report to X.Org, if you paste the new
bug URL here, Red Hat will continue to track the issue in the
centralized X.Org bug tracker, and will review any bug fixes that
become available for consideration in future updates.

Setting status to "CURRENTRELEASE".