Bug 101160

Summary:	RH9 install aborts during package installation
Product:	[Retired] Red Hat Linux	Reporter:	Bob Hockney <bhockney>
Component:	kernel	Assignee:	Dave Jones <davej>
Status:	CLOSED WONTFIX	QA Contact:	Brian Brock <bbrock>
Severity:	high	Docs Contact:
Priority:	medium
Version:	9	CC:	jhmail, pfrields, russell.c
Target Milestone:	---
Target Release:	---
Hardware:	i386
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2004-08-25 16:11:35 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---

Description Bob Hockney 2003-07-29 18:49:27 UTC

From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)

Description of problem:
When I try to do a new install of RH9, the process aborts during package 
installation with a message saying it had a problem installing a package and 
suggesting a disk problem.  There is also garbage on the display about an rpmdb 
problem, suggesting that recovery be run.  I can consistently reproduce the 
problem.

I have checked the media, and have also used it successfully on other machines.  
I have also checked the hard drive by booting a CD-based distro.

I am really not sure what is causing the problem, but here is a brief 
description of my system and procedure:

AMD Athlon 2000
VIA Apollo KT266 chipset
512MB RAM
Recent BIOS update (Phoenix)
ATI Radeon 7000 AGP video
Two Maxtor HDDs

Another OS is installed with multiple partitions.

Floppy boot, FTP install

linux dd ide=nodma noathlon

The install proceeds normally until one of two points.  Originally I used an 
existing partition and reformatted it, in which case the install fails while 
installing packages, with the messages described above.  When I tried to check 
for bad blocks before formatting, it failed with a similar message about bad 
media.  However, I thoroughly checked the drive and found no problems.

I can consistently reproduce this behavior.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Attempt install on my particular machine as described above.

Additional info:

Comment 1 Michael Fulbright 2003-07-30 18:18:08 UTC

If you switch to VC4 (Ctrl-Alt-F4) when the error occurs, see whether you get a
lot of read/write errors on one of your hard drives.

Comment 2 Joe Harrington 2003-08-25 21:07:49 UTC

I have had the same behavior.  Athlon 2800+, Giga-Byte GA-7N400 Pro motherboard
(nVidia nForce 2 chipset, latest bios version F11), both new.  I have made about
100 attempts and have not yet successfully installed Linux (I've done it many
times on other systems).

For this box, I have tried several different hard drives, some new and some used
in other machines for a long time.  I have checked all install media, both
during the install and by installing other machines with the same media.  I have
used two different sets of media.  I have tried installing from a new SONY USB
DVD writer and from an older IDE DVD-ROM drive.  I could not boot from the USB
CD directly (it worked perhaps 1 in 100 attempts).  I was able to boot from a
floppy and install until disk 2, when I got a nice error suggesting a full disk
or a disk failure.  Another attempt crashed in the middle of disk 1.  All other
attempts have crashed in disk 1 as well.  Either I get an MD5 error and a nice
box saying to start over, or a kernel panic.

After the panic, switching to C-A-F2 shows that the prompt is still there and
responds to Enter, but it hangs as soon as a command is typed.

The C-A-F4 messages (last screen) of the most recent failure:

Code: 8a 18 74 24 8b 42 04 89 43 04 89 18 c7 42 00 00 00 00 c7
 <1>Unable to handle kernel paging request at virtual address a3e6c1a4
<4> printing eip:
<4>f8893636
<1>*pde = 00000000
<4>Oops: 0000
<4>CPU:    0
<4>EIP:    0060:[<f8893636>]    Not tainted
<4>EFLAGS: 00010287
<4>
<4>EIP is at  (2.4.20-8BOOT)
<4>eax: a3e6c1a4   ebx: e1268e40   ecx: f63f5870   edx: a3e6c1a4
<4>esi: a3e6c1a4   edi: 00000000   ebp: 00000001   esp: f5241e90
<4>ds: 0068   es: 0068   ss: 0068
<4>Process kjournald (pid: 143, stackpage=f5241000)
<4>Stack: f63f58e4 00000000 00000f44 f378d0bc 00000000 e3e6c150 dd796720 0000099d
<4>       e71571e0 e1268e40 e1268ea0 e1268de0 e1268d80 e1268d20 e1268cc0 e1268420
<4>       e12683c0 e1268360 e1268300 e12682a0 e1268240 e12681e0 e1268180 e1268120
<4>Call Trace:   [<f8895d9c>]  (0xf5241fc0))
<4>[<f8895c6c>]  (0xf5241fd8))
<4>[<f8895c7c>]  (0xf5241fe8))
<4>[<c0106fc1>]  (0xf5241ff0))
<4>
<4>
<4>Code: 8a 18 74 24 8b 42 04 89 43 04 89 18 c7 42 00 00 00 00 c7
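
When comparing oopses like the one above across reports, it can help to pull out the faulting address, EIP, process, and registers in one place. The following is a minimal sketch, assuming the oops was copied off VC4 as plain text in the layout shown here; `parse_oops` and `regs_near_fault` are hypothetical helpers, not part of any kernel tool (ksymoops is the usual tool for decoding 2.4 oopses against a symbol map).

```python
import re

# Hypothetical helper: extracts a few fields from a 2.4-era i386 oops copied
# off VC4 as plain text. The patterns assume the line layout quoted in this
# report and may need adjusting for other kernels.
REG_RE = re.compile(r"\b(e[a-d]x|e[sd]i|ebp|esp): ([0-9a-f]{8})")

def parse_oops(text):
    fields = {}
    m = re.search(r"paging request at virtual address ([0-9a-f]{8})", text)
    if m:
        fields["fault_addr"] = int(m.group(1), 16)
    m = re.search(r"EIP:\s+[0-9a-f]{4}:\[<([0-9a-f]{8})>\]", text)
    if m:
        fields["eip"] = int(m.group(1), 16)
    m = re.search(r"Process (\S+) \(pid: (\d+)", text)
    if m:
        fields["process"] = m.group(1)
        fields["pid"] = int(m.group(2))
    # General-purpose registers, e.g. "eax: a3e6c1a4".
    fields["regs"] = {name: int(val, 16) for name, val in REG_RE.findall(text)}
    return fields

def regs_near_fault(fields, window=0x100):
    """Registers holding a value at most `window` bytes below the faulting
    address, i.e. plausible base pointers for the bad dereference."""
    fault = fields["fault_addr"]
    return sorted(name for name, val in fields["regs"].items()
                  if 0 <= fault - val <= window)
```

For the oops above, eax, edx, and esi all equal the faulting address a3e6c1a4, which is consistent with one of those registers being dereferenced directly.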

Note also that when I did this from the USB CD, one of the other screens
included messages about the USB filesystem being unstable.  I didn't understand
why this might be the case.

In searching for this bug, I noticed similar complaints about prior RH releases,
all from Athlon users.  This is my first AMD CPU, and I haven't seen similar
problems before.

--jh--

Comment 3 Bob Hockney 2003-08-26 15:52:11 UTC

In my particular case there were no error messages on VC4.

With some experimentation I first found that I could install if I formatted the 
drive as ext2 instead of ext3.  However, after only a short period of use (30 
minutes) I had a partition corrupted beyond meaningful use or repair.  There 
were no hardware related messages in /var/log/messages.

I then found I could install if I slowed down the hard drive at the first 
opportunity during the install (hdparm -X68 on VC2), from ATA 100 (mode 5, its 
maximum) to ATA 66 (mode 4).  Even though I gave the nodma parameter on the boot 
line, DMA was apparently re-enabled.  In any case, at this speed I am able to 
install, and I do not experience corruption during use as long as I slow the 
drive down at startup.

I have two Maxtor drives in the system: hda is capable of ATA 100 and hdb is 
capable of ATA 133.  I can install and use Red Hat on hdb at ATA 133 (and ATA 
100) without problems, but I cannot use hda at ATA 100 under Red Hat.

I never saw any log messages associated with the problems.

I am able to use hda at ATA 100 with another OS on the same machine without problems.

-Bob

Comment 4 Joe Harrington 2003-08-26 18:27:15 UTC

I just tried reducing the UDMA mode with hdparm as suggested, to no avail: it
still fails, most recently with Oops: 0002 in process anaconda.  I suspect
hardware, since this install is so generic that others would have seen a
software problem.

--jh--

Comment 5 Michael Fulbright 2003-09-03 19:17:59 UTC

This appears to be related to the kernel drivers.

Comment 6 Joe Harrington 2003-10-28 23:51:00 UTC

In the meantime, I sent back that motherboard and tried another one: same
problem.  The only common hardware component is the CPU, but it runs for days on
Morphix.  The new board is an Asus A7V600 with the VIA KT600 chipset; the old
board had an nVidia nForce 2 chipset.  The crashes were variously blamed (in the
VC messages) on anaconda, mini-wm, and kupdated.

Arjan, are we likely to see any alternative boot images any time soon?  I can't
install Red Hat, and I've been trying since early August!  Since this problem
keeps people from even installing Red Hat, I suggest raising the priority.

--jh--

Comment 7 Joe Harrington 2003-10-29 00:01:00 UTC

...also, some (not all) of the crash messages said something like "Kernel BUG in
buffer.c"

--jh--

Comment 8 Russell Cutcliffe 2003-12-21 11:32:52 UTC

I have also been having crashes during the Disk 1 install.  In Anaconda, 
the Anaconda traceback was hiding any panic message, but when I switched
to text mode I got results similar to those shown above:
<1>Unable to handle kernel paging request at virtual address 236ece21
<4> printing eip:
<4>cc892254
<1>*pde = 00000000
<4>Oops: 0000
<4>CPU:    0
<4>EIP:    0060:[<cc892254>]    Not tainted
<4>EFLAGS: 00010202
<4>
<4>EIP is at  (2.4.20-8BOOT)
<4>eax: 00000001   ebx: 236ece09   ecx: c53ef550   edx: c53ef550
<4>esi: c0009150   edi: c0009150   ebp: c1af63f0   esp: c7551e90
<4>ds: 0068   es: 0068   ss: 0068
<4>Process kjournald (pid: 132, stackpage=c7551000)
<4>Stack: c17f7ed4 00000000 00000000 00000000 00000000 cac2e820 c1af6240 0000022d
<4>       c539d150 c539d1b0 c539d210 c539d270 c539d2d0 c534b490 c534b4f0 c53ef610
<4>       c53ef670 c53ef6d0 c53ef730 c53ef790 c53ef7f0 c53ef850 c53ef8b0 c539dab0
<4>Call Trace:   [<cc894d9c>]  (0xc7551fc0))
<4>[<cc894c6c>]  (0xc7551fd8))
<4>[<cc894c7c>]  (0xc7551fe8))
<4>[<c0106fc1>]  (0xc7551ff0))
<4>
<4>
<4>Code: 8b 43 18 a9 04 00 00 00 8b 7f 1c 75 19 83 e0 02 0f 84 3a 0d
<4>

My machine is quite different from the others here, being a Celeron
333 with an Intel BX chipset.  Disks are SCSI via a MegaRAID card, and the
CD-ROM is SCSI via an AHA1542.  The drivers for both are loaded from the
driver disk.  The hardware passes many test cycles with no problems.
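
The call trace in this oops differs from the one in Comment 2 in absolute addresses, but each of the first three entries is offset by the same amount (0x2c001000), which is what you would expect if the same module were loaded at a different base address on each machine; the last entry (c0106fc1) is identical in both. A minimal sketch of that comparison follows, using a hypothetical helper `same_call_chain` and the trace addresses copied from the two oopses. It is a heuristic, not a substitute for decoding the traces with symbols.

```python
def same_call_chain(trace_a, trace_b):
    """Return True if two call traces match once a relocated module is allowed for.

    Each entry must either be identical (e.g. core kernel text, which does not
    move) or sit at the same distance from the first entry of its own trace
    (i.e. inside a module loaded at a different base address).
    """
    if len(trace_a) != len(trace_b) or not trace_a:
        return False
    base_a, base_b = trace_a[0], trace_b[0]
    for a, b in zip(trace_a, trace_b):
        if a == b:
            continue  # same absolute address
        if a - base_a != b - base_b:
            return False  # differs even after allowing for relocation
    return True

# Call Trace entries copied from the oopses in Comment 2 and Comment 8.
comment2_trace = [0xF8895D9C, 0xF8895C6C, 0xF8895C7C, 0xC0106FC1]
comment8_trace = [0xCC894D9C, 0xCC894C6C, 0xCC894C7C, 0xC0106FC1]

print(same_call_chain(comment2_trace, comment8_trace))  # True
print(hex(comment2_trace[0] - comment8_trace[0]))      # 0x2c001000
```

The same offset does not hold for the EIPs (f8893636 vs cc892254), so the two oopses appear to share a call path through kjournald but fault at different points within the module.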