Bug 242536 - [pata_via] Anaconda failed after formatting the / partition
Summary: [pata_via] Anaconda failed after formatting the / partition
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 8
Hardware: i386
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Assignee: Alan Cox
QA Contact: Brian Brock
URL:
Whiteboard:
Duplicates: 249209
Depends On:
Blocks:
Reported: 2007-06-04 18:28 UTC by Craig Butler
Modified: 2009-01-09 07:07 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-01-09 07:07:18 UTC
Type: ---
Embargoed:


Attachments
Dump File (51.56 KB, text/plain)
2007-06-04 18:28 UTC, Craig Butler
Anaconda dump from 7.91 install attempt (51.17 KB, text/plain)
2007-09-16 08:13 UTC, Jeff Schultz
lspci -vvxx on affected K7 system (7.11 KB, text/plain)
2007-09-18 03:10 UTC, Jeff Schultz
Anaconda Crash Dump (97.17 KB, text/plain)
2007-10-05 06:06 UTC, Craig Butler
Anaconda dump from attempt to upgrade to 7.92 (88.68 KB, text/plain)
2007-10-06 11:39 UTC, Jeff Schultz
Anaconda Crash Dump (55.61 KB, text/plain)
2007-12-22 07:01 UTC, Craig Butler

Description Craig Butler 2007-06-04 18:28:57 UTC
Description of problem:
Anaconda failed after formatting the / partition, produced a dump file, and
instructed me to upload it to Bugzilla.

Comment 1 Craig Butler 2007-06-04 18:28:57 UTC
Created attachment 156106
Dump File

Comment 2 Craig Butler 2007-06-04 18:30:13 UTC
Running a 3ware Escalade 7506 ATA RAID controller. Worked fine under FC6.

Comment 3 Jeremy Katz 2007-06-04 18:35:38 UTC
Did you verify your media according to the instructions at
http://rhlinux.redhat.com/anaconda/mediacheck.html?

Comment 4 Jeff Schultz 2007-06-05 05:50:58 UTC
I have a similar problem with an old K7 system. It worked fine under FC6, but
fails in various ways during or immediately after Anaconda attempts to format
the drive.

Failures include the format hanging at about 90% done, an "uncaught exception",
and a pop-up telling me that I've "probably run out of disk space." In the last
of these, the install attempted to continue, but failed later.

My media was verified before installation.

Comment 5 Craig Butler 2007-06-05 13:01:47 UTC
To answer Jeremy Katz: yes, I verified my media before my first installation
attempt. I also checked the SHA1SUM before burning the DVD.

Additional info: I decided to re-install FC6 (which again worked flawlessly)
and then perform an upgrade to F7, which fails frequently during the installation
of packages. Sometimes it will install a few hundred packages; once, only TWO
before failing. I wonder if there is a problem with the 3w-xxxx driver in F7.
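For anyone repeating the SHA1SUM check mentioned above, it can be reproduced with Python's standard hashlib. This is a minimal sketch, assuming the usual SHA1SUM layout of one "<digest>  <file name>" pair per line (file names containing spaces are not handled); the helper names are hypothetical, not part of any Fedora tool.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-1 of a file, reading 1 MiB at a time so a DVD image is never fully in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_sha1(sha1sum_text: str, filename: str):
    """Find filename in SHA1SUM-style text and return its digest, or None if it is not listed."""
    for line in sha1sum_text.splitlines():
        parts = line.split()
        # Accepts "<digest>  <name>" and the binary-mode form "<digest> *<name>".
        if len(parts) == 2 and parts[1].lstrip("*") == filename:
            return parts[0].lower()
    return None


def image_matches(path, sha1sum_text: str) -> bool:
    """True only if the file's name is listed and its SHA-1 equals the listed digest."""
    want = expected_sha1(sha1sum_text, Path(path).name)
    return want is not None and sha1_of_file(path) == want
```

Usage would look like `image_matches("path/to/dvd.iso", Path("SHA1SUM").read_text())`. Note that this only confirms the downloaded image; the burned disc itself is what Anaconda's mediacheck verifies.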

Comment 6 Christopher Brown 2007-09-13 20:33:48 UTC
Hello Craig,

I'm reviewing this bug as part of the kernel bug triage project, an attempt to
isolate current bugs in the Fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

I am CC'ing myself on this bug and will try to assist you in resolving it if I can.

There hasn't been much activity on this bug for a while. In the run-up to Fedora
8 it would be good if you could test this with Fedora 8 test 2, available from:

http://fedoraproject.org/wiki/Distribution/Download

This will help developers iron out installer bugs such as yours which are of
particular importance as they cannot be resolved in an update.

If the problem no longer exists then please close this bug or I'll do so in a
few days if there is no additional information lodged.

Cheers
Chris

Comment 7 Craig Butler 2007-09-13 21:03:05 UTC
Hi Chris,

I'm glad to see you looking at this bug. I'm not a programmer myself and thus
cannot do much to address it; however, I'd be happy to test possible fixes.

As I mentioned in my earlier comments, I no longer suspect this is a bug in
Anaconda, but rather in the 3ware driver or its support in the kernel. I'm not
absolutely certain, however. Perhaps the bug summary line should be changed?

In order to keep my machine running, I rolled back to FC6 and have been running
that since.

Thanks
Craig

Comment 8 Christopher Brown 2007-09-13 21:16:32 UTC
(In reply to comment #7)
> Hi Chris,
> 
> I'm glad to see you looking at this bug.  I'm not a programmer myself and thus
> cannot do much to address it, however I'd be happy to test possible fixes.

...and in the same way that a triage nurse is not a doctor, neither am I much
good at coding. :)

However, there are a lot of kernel bugs filed, and with Fedora 8 coming up it's
important to whittle things down and remove ones that have been resolved.

> As I mentioned in my earlier comments, I no longer suspect this is a bug with
> anaconda, but rather the 3ware driver or its support in the kernel.  I'm not
> absolutely certain however.  Perhaps the bug summary line should be changed?
> 
> In order to keep my machine running, I rolled back to FC6 and have been running
> that since.

Is there a chance you can test Fedora 8 T2? It might even be helpful to see if
the live CD boots for you.

I've added a blocker bug for the next kernel release which will be a heads up
for the kernel developers who can then weigh in and ask you for more info if
possible.

Cheers
Chris

Comment 9 Jeff Schultz 2007-09-16 08:07:56 UTC
I've just tried 7.91 on the hardware that failed previously for me. Same
symptoms, I'm afraid. Anaconda failed copying the installation image to the
disk, then threw an exception. I'll attach that.

Comment 10 Jeff Schultz 2007-09-16 08:13:42 UTC
Created attachment 196661
Anaconda dump from 7.91 install attempt

Comment 11 Christopher Brown 2007-09-16 19:56:59 UTC
Thanks for that Jeff. Looks like this is the error that causes things to fail:

07:43:24 CRITICAL: error transferring stage2.img: [Errno 5] Input/output error

although:

07:43:20 ERROR   : unable to set timezone

and

<6>sr 1:0:0:0: [sr0] Add. Sense: Logical unit communication CRC error (Ultra-DMA/32)
<4>end_request: I/O error, dev sr0, sector 5541080
<3>Buffer I/O error on device sr0, logical block 1385270
<3>Buffer I/O error on device sr0, logical block 1385271
<3>Buffer I/O error on device sr0, logical block 1385272
<3>Buffer I/O error on device sr0, logical block 1385273
<3>Buffer I/O error on device sr0, logical block 1385274
<3>Buffer I/O error on device sr0, logical block 1385275
<3>Buffer I/O error on device sr0, logical block 1385276
<3>Buffer I/O error on device sr0, logical block 1385277
<3>Buffer I/O error on device sr0, logical block 1385278
<3>Buffer I/O error on device sr0, logical block 1385279

indicate, respectively, that perhaps your CMOS battery has failed (although the
power supply should keep system time until power-off) and that there may be a
bad data cable, perhaps truncating the install partition and causing the
installation to fail with too little space. These are just speculations, however.

I see the hard drive is 40GB - are you using all of this for the install,
formatting and re-installing?
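The sr0 errors quoted in comment 11 are internally consistent: end_request reports a 512-byte kernel sector, while the buffer errors count 2048-byte CD/DVD logical blocks, so sector 5541080 corresponds to block 5541080 / 4 = 1385270, the first block listed. A minimal Python sketch of that conversion, plus a hypothetical helper (failed_block_range, not part of any Fedora or kernel tool) that collapses the repeated messages into one range, assuming the log format shown above:

```python
import re

# The block layer counts 512-byte sectors; CD/DVD media use 2048-byte logical blocks.
KERNEL_SECTOR_BYTES = 512
OPTICAL_BLOCK_BYTES = 2048

BUF_ERR = re.compile(r"Buffer I/O error on device (\w+), logical block (\d+)")


def sector_to_block(sector: int) -> int:
    """Map an end_request sector number to the matching 2048-byte logical block."""
    return sector * KERNEL_SECTOR_BYTES // OPTICAL_BLOCK_BYTES


def failed_block_range(log_text: str):
    """Collapse repeated 'Buffer I/O error' lines into (device, first, last, count).

    Returns None if no such lines are present. This sketch assumes every
    matching line names the same device and raises ValueError otherwise.
    """
    hits = [(m.group(1), int(m.group(2))) for m in BUF_ERR.finditer(log_text)]
    if not hits:
        return None
    devices = {dev for dev, _ in hits}
    if len(devices) != 1:
        raise ValueError(f"expected one device, found {sorted(devices)}")
    blocks = sorted(block for _, block in hits)
    return devices.pop(), blocks[0], blocks[-1], len(blocks)


sample = "\n".join(
    f"<3>Buffer I/O error on device sr0, logical block {b}" for b in range(1385270, 1385280)
)
print(sector_to_block(5541080))     # 1385270
print(failed_block_range(sample))   # ('sr0', 1385270, 1385279, 10)
```

Because the count (10) equals last - first + 1, the quoted errors form one contiguous 20 KiB run (10 x 2048 bytes) starting at the failed request's first block. The kernel may rate-limit these messages, so the real failing run could be longer than the excerpt shows.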

Comment 12 Jeff Schultz 2007-09-17 00:12:43 UTC
Thanks for that.  The machine is about seven years old so a dead CMOS battery
would be unsurprising.  However, it keeps time, even when off for a day or more,
so if its battery is dying, it's not quite dead yet.  The machine's in use (with
FC6) so doing trial F7/8 installs on it is a nuisance.  Reinstalling FC6 after
Anaconda's mucked around with the disk works flawlessly.

Disk is fully allocated.  Usual /boot, 768MB swap, and the rest is /.

It looks like some driver problem with the old hardware.  I can swap cables and
battery, but I'm reluctant to have to do a full FC6 install and configure again
on that basis.  If there's some way to test the F7/8 drivers without trashing
the disk I'll give that a go.

Comment 13 Christopher Brown 2007-09-17 09:49:21 UTC
Other than a live CD, it's difficult to test and replicate. If it's keeping time
after being turned off (and unplugged), then it won't be the CMOS battery; those
things can last for ages, to be fair.

You can get the latest live CD from:

http://fedoraproject.org/wiki/Distribution/Download

It'd be disappointing if something has changed to *stop* it running on your
hardware. Are you running the same 3ware controller Craig mentioned? It would be
good to get an lspci -vvxx output attachment if you can.

Cheers
Chris

Comment 14 Chuck Ebbert 2007-09-17 18:39:21 UTC
(In reply to comment #10)
> Created an attachment (id=196661)
> Anaconda dump from 7.91 install attempt
> 

It is having problems reading the DVD. Was the media verified on the machine
that was attempting the install? And is there really a 40-wire cable on the hard
drive and an 80-wire cable on the DVD?


Comment 15 Jeff Schultz 2007-09-18 03:10:26 UTC
Created attachment 198011 [details]
lspci -vvxx on affected K7 system

Here's the output from lspci -vvxx, running FC6.  I'll try booting it from a
F7.91 Live CD later.

Comment 16 Jeff Schultz 2007-09-18 05:21:19 UTC
Tried booting from F7.91 Live CD. It failed the first time: I got a login screen,
but although it claimed to be about to log in as fedora "in 0 seconds", it just
sat there. The second attempt worked. I could mount /boot from the FC6
installation, but not /. It just complained that it was "busy."

Comment 17 Jeff Schultz 2007-09-18 05:58:34 UTC
Oops.  Forgot that / was a logical volume.  It mounts okay, so the Live CD can
see the disk well enough.

Comment 18 Christopher Brown 2007-09-18 09:34:55 UTC
I would recommend ensuring that you do have the 80-wire cable Chuck mentioned, as
well as doing a media check (boot with linux mediacheck) to make sure the disc
is good. If either you or Craig can test an install once this is done, it
would be great, as at the moment this is a release blocker.

Cheers
Chris

Comment 19 Jeff Schultz 2007-09-18 10:19:16 UTC
They're different machines, so whether mine has a particular cable is not
relevant to Craig's. In any case, I've pulled it out of the stack and checked.
Both the hard disk and the DVD are on separate 40-wire IDE cables. As it's
happily run FC6, FC5, FC4, and various RH predecessors, and can't install F7/8, I
think its cabling is not the problem.

Comment 20 Christopher Brown 2007-09-18 11:25:17 UTC
Okay, I'm re-assigning to the relevant maintainer and they will hopefully be
able to shed some more light on the problem.

Cheers
Chris

Comment 21 Alan Cox 2007-09-18 14:43:33 UTC
<6>sr 4:0:0:0: SCSI error: return code = 0x08000002
<6>sr0: Current: sense key: Hardware Error
<6>    Additional sense: Timeout on logical u

That's a media error or drive failure.

While 
<6>sr 1:0:0:0: [sr0] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
<6>sr 1:0:0:0: [sr0] Sense Key : Hardware Error [current] 
<6>sr 1:0:0:0: [sr0] Add. Sense: Logical unit communication CRC error (Ultra-DMA/32)

looks like a pata_via problem or drive hiccup


Comment 22 Chuck Ebbert 2007-09-18 14:51:04 UTC
(In reply to comment #19)
> They're different machines, so whether mine has a particular cable is not
> relevant to Craig's.  In any case, I've pulled it out of the stack and checked.
>  Both the hard disk and the DVD are on separate 40 wire IDE cables.  As it's
> happily run FC6, FC5, FC4, and various RH predecessors, and can't install F7/8 I
> think its cabling is not the problem.

It looks like the cable is misdetected. It detects 80-wire on the second channel
but you said it is really 40-wire. There was a bug in there that was fixed very
recently (Sep 10), so this may be okay now.
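Chuck's point is that the driver believes the second channel has an 80-wire cable when it really has a 40-wire one. A 40-conductor cable is only rated up to UDMA mode 2 (UDMA/33), so a misdetection lets the driver choose a faster mode than the cable supports, and UDMA transfers over an out-of-spec cable can produce CRC errors like the "Logical unit communication CRC error" seen in Jeff's log. The sketch below models only that capping rule; the function and the drive/controller limits in the example are hypothetical, and it is not the pata_via implementation.

```python
# Nominal UDMA burst rates in MB/s, indexed by UDMA mode (0-6).
UDMA_MB_PER_S = [16.7, 25.0, 33.3, 44.4, 66.7, 100.0, 133.3]

# A 40-conductor cable is only rated up to UDMA mode 2 (UDMA/33);
# modes 3 and above need an 80-conductor cable.
MAX_UDMA_ON_40_WIRE = 2


def select_udma_mode(device_max: int, host_max: int, cable_is_80_wire: bool) -> int:
    """Simplified, hypothetical model of PATA UDMA mode selection.

    The mode is capped by what both the drive and the controller support,
    and further capped at UDMA/33 when the driver believes the cable is 40-wire.
    """
    mode = min(device_max, host_max)
    if not cable_is_80_wire:
        mode = min(mode, MAX_UDMA_ON_40_WIRE)
    return mode


# Hypothetical limits: a UDMA/66 drive (mode 4) on a UDMA/100 controller (mode 5).
correct = select_udma_mode(4, 5, cable_is_80_wire=False)
misdetected = select_udma_mode(4, 5, cable_is_80_wire=True)
print(correct, UDMA_MB_PER_S[correct])          # 2 33.3
print(misdetected, UDMA_MB_PER_S[misdetected])  # 4 66.7
```

With correct detection the drive is held to 33 MB/s; with the misdetection it would run at twice that over a cable not rated for it.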

Comment 23 Jeff Schultz 2007-09-18 22:45:18 UTC
The installation media tests okay. (Same symptoms with F7 media when I tried
that.)

About all I can add about the hardware is that the DVD drive is fairly recent
compared with the rest of it and I suppose it may not be understood properly by
the BIOS.

If there's a potential fix from the 10th, how do I test it?

Comment 24 Chuck Ebbert 2007-09-18 22:55:10 UTC
(In reply to comment #23)
> The installation media tests okay.  (Same symptoms with F7 media when I tried
> that .)
> 
> About all I can add about the hardware is that the DVD drive is fairly recent
> compared with the rest of it and I suppose it may not be understood properly by
> the BIOS.
> 
> If there's a potential fix from the 10th, how do I test it?

Try the rawhide boot/rescue discs from

http://download.fedora.redhat.com/pub/fedora/linux/development/i386/os

(Some days there are no boot iso images, just try again the next day.)


Comment 25 Jeff Schultz 2007-09-18 23:07:14 UTC
Thanks Chuck.  I'll try that then.  There are no images there at the moment though.


Comment 26 Jeff Schultz 2007-09-19 13:56:56 UTC
Please forgive my ignorance, but there's a 9.7MB images/boot.iso there now.  Is
that what you mean?

Comment 27 Christopher Brown 2007-09-19 14:45:36 UTC
That's the one. Burn it like any other .iso file and boot from it. You can then
install from a list of sources such as HTTP, a local NFS share, etc.

Cheers
Chris

Comment 28 Craig Butler 2007-09-19 17:35:28 UTC
I've tried the LiveCD on my server with the 3ware RAID controller which is
currently running FC6 / LVM / ext3 without a problem.  While it does recognize
the controller and the logical drives, and creates sdb1 and sdb2 (/boot and /),
it does not recognize any filesystem on sdb2. 

Comment 29 Christopher Brown 2007-09-23 18:58:34 UTC
*** Bug 249209 has been marked as a duplicate of this bug. ***

Comment 30 Craig Butler 2007-10-05 06:06:32 UTC
Created attachment 217071
Anaconda Crash Dump

Attempted to UPGRADE my working FC6 server with 7.91 and it failed attempting
to copy the image file to the disk.  Attached is the file.

Comment 31 Jeff Schultz 2007-10-06 11:39:53 UTC
Created attachment 218381 [details]
Anaconda dump from attempt to upgrade to 7.92

There was some possibility that the driver problem was fixed in 7.92. The
attached dump is the result of trying to upgrade a working FC6 install on the
target system with (tested) 7.92 media.

Comment 32 Will Woods 2007-10-24 21:29:55 UTC
Can you try booting the media with 'libata.dma=1' on the boot command line? And
if that doesn't work, try 'libata.dma=0'? These can be used to help work around
problems with some flaky optical drives.
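Will's suggestion makes more sense once libata.dma= is read as a bitmask. Per the upstream kernel-parameters documentation (assumed here, not confirmed, to match the Fedora 7.92/8 kernels discussed in this bug), the value 1 enables DMA for PATA/SATA disks, 2 for ATAPI (CD/DVD) devices and 4 for CompactFlash. So libata.dma=1 keeps disk DMA but drops the DVD drive to PIO, while libata.dma=0 disables DMA everywhere. A small decoder sketch:

```python
# Bit values of the libata.dma= boot parameter, per the upstream
# kernel-parameters documentation (assumed to apply to the kernels here).
DMA_DISK = 1    # PATA and SATA disks
DMA_ATAPI = 2   # ATAPI devices such as CD/DVD drives
DMA_CF = 4      # CompactFlash


def decode_libata_dma(value: int) -> dict:
    """Report which device classes keep DMA for a given libata.dma= value."""
    # This sketch only accepts the documented 3-bit range.
    if not 0 <= value <= 7:
        raise ValueError("libata.dma is treated here as a 3-bit mask (0-7)")
    return {
        "disk": bool(value & DMA_DISK),
        "atapi": bool(value & DMA_ATAPI),
        "cf": bool(value & DMA_CF),
    }


print(decode_libata_dma(1))  # {'disk': True, 'atapi': False, 'cf': False}
print(decode_libata_dma(0))  # {'disk': False, 'atapi': False, 'cf': False}
```

Under that reading, combinations also work: libata.dma=3 would keep DMA for both disks and optical drives but not CompactFlash.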

Comment 33 Alan Cox 2007-10-24 23:30:10 UTC
Note that there isn't any point putting this in the blocker list; it's a
collection of assorted, random, unrelated bug reports.


Comment 34 Alan Cox 2007-10-24 23:33:59 UTC
Jeff - your trace seems to be the drive saying the media is faulty, and Craig's
is the drive reporting a hardware error and timeout, which again is probably a
media problem.

It would still be interesting to know what libata.dma=1 does.


Comment 35 Jeff Schultz 2007-10-24 23:46:32 UTC
Alan, I'll try the libata.dma=0/1 experiment and let you know.  It's clear that
there's something at best "unexpected" about this hardware, but it's also clear
that between FC6 and F7 something was changed that's exposing the problem.  (If
I had to guess, I'd look to the 1999 BIOS it's running.)

I would note that the machine in question has so far failed four or five DVDs,
from F7 release on, and that that F7 disk has been used to install on other
hardware without trouble.  Of course, the fault could be in the DVD drive
itself, but each time anaconda has trashed the hard disk I've successfully
reinstalled FC6 from DVD.

Comment 36 Craig Butler 2007-10-25 02:28:46 UTC
I add my voice to Jeff's.  My RAID controller and drives are working just fine
with FC6, but something changed in 7+ that keeps my storage subsystem from being
stable.

Comment 37 Bill Nottingham 2007-10-25 14:13:01 UTC
FWIW, I have a box that uses pata_via for a hard drive and a CD drive (K8M800
chipset), and it works OK in F8; so it's not something common to all pata_via
installs.

Comment 38 Will Woods 2007-10-25 15:43:08 UTC
I'm going to have to defer to Alan here - this is kind of a hodgepodge of bug
reports and available evidence suggests that most pata_via systems work fine.
Removing from blocker.

Comment 39 Jeff Schultz 2007-10-26 13:07:06 UTC
I've tried F7.92 with libata.dma=1 (and =0) and all that happens is that I get a
driver selection dialogue.  None of the PATA drivers offered work, dropping me
back into the selection dialogue after loading.  Any other suggestions welcome.

Comment 40 Alan Cox 2007-10-26 17:14:51 UTC
Jeff's trace clearly shows that the drivers found the hardware and that the CD
drive reported an error transferring the bits for stage2.img. Unless the
libata.dma= trace looks different (care to attach it?), I'm putting it down as a
media/hardware fault for now.

Comment 41 Jeff Schultz 2007-10-28 02:08:44 UTC
Alan, it's definitely not a media fault.  Every post-FC6 installer
disk I try gives the same family of problems and some of these are
known-good media that have been used for installs on other machines.

Could be a DVD drive fault, but it works flawlessly for the FC6
installer, so I doubt it. This is a regression that occurred with F7.
Whether it matters enough to F8 is not my call, but as FC6 is about to
EOL, being able to upgrade has a certain necessity to it.

(It would not surprise me if the combination of a relatively recent
DVD drive and a 1999 MB and BIOS on this machine is exercising some
less common pathways.  I've actually got a replacement CPU/MB ready to
go in this, but I've left it as is in case Fedora wants to track down
the regression.)


I'll happily attach any traces you tell me how to get.  I didn't see
any option to save anything from the installer dialogues yesterday.
With either boot option of libata.dma=0/1, anaconda asks where to get
the installer files from and then announces that it can't find such a
device and puts up a chooser for device drivers.


Comment 42 Craig Butler 2007-10-28 05:40:19 UTC
As this is the bug I reported originally, I rearranged my schedule so I could
spend some time on it.  I migrated all my data off the server with the 3ware
controller (a very popular controller for Linux use) and tried again.

Keep in mind that it was working perfectly with FC6 and prior.  Multiple
installs with FC6 media and prior have worked without a hitch every time.

First try: I attempted an Upgrade install over FC6.  This surprised me because
it ran to completion without dying.  After rebooting however, I had nothing but
kernel exceptions.

Second try: I attempted a Full install, removing ALL partitions on the (logical)
drive. It died during formatting of the root partition. I didn't get a crash
report, just a message saying: "An error occurred trying to format
VolGroup00/LogVol00. This problem is serious, and the install cannot continue.
Press <Enter> to exit the installer."

Third and Fourth try: I attempted a Full install specifying libata.dma=0/1
during boot and got the same issues Jeff Schultz had. Anaconda stayed in text
mode and asked me where to get the installer files from; I specified local
CD/DVD, and it then announced that it couldn't find such a device and put up a
chooser for device drivers. No luck.

After the third and fourth tries, the system was left in a state such that a
warm reboot would not initialize the 3ware card and BIOS init halted there.  I
had to power-off to get it to initialize normally.  Thinking that perhaps the
controller was already in a weird state for try #2, I power-cycled again and
re-tried a Full install, removing all Linux partitions this time.  Same exact
error as #2 above.

Looks like I'm stuck at FC6.

Comment 43 Chuck Ebbert 2007-10-29 19:06:22 UTC
Try the workarounds:

https://fedoraproject.org/wiki/KernelCommonProblems

Mainly nomsi/nommconf, and clocksource/nohz/highres
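When trying these workarounds, it helps to confirm that the options actually reached the kernel: a running Linux system exposes the command line it booted with in /proc/cmdline. A minimal parsing sketch, with hypothetical helper names; it splits on whitespace only, so quoted values containing spaces are not handled, and when a key repeats it simply keeps the last occurrence. The example command line is illustrative, not taken from either reporter's system.

```python
def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line (e.g. the text of /proc/cmdline) into options.

    Bare flags map to True; key=value options keep everything after the first
    '=' as a string. A repeated key keeps its last value in this sketch.
    """
    options = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        options[key] = value if sep else True
    return options


def pci_flags(options: dict) -> set:
    """Return the comma-separated sub-options given to pci=, or an empty set."""
    value = options.get("pci")
    return set(value.split(",")) if isinstance(value, str) and value else set()


opts = parse_cmdline("ro root=/dev/VolGroup00/LogVol00 pci=nomsi,nommconf nohz=off highres=off")
print(sorted(pci_flags(opts)))        # ['nommconf', 'nomsi']
print(opts["nohz"], opts["highres"])  # off off
```

On the affected machine, the same check would use `parse_cmdline(open("/proc/cmdline").read())` in place of the example string.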



Comment 44 Jeff Schultz 2007-10-30 08:25:07 UTC
pci=nomsi,nommconf produces the same sort of error.



Comment 45 Craig Butler 2007-12-22 07:01:26 UTC
Created attachment 290286
Anaconda Crash Dump

Comment 46 Craig Butler 2007-12-22 07:03:10 UTC
Above new dump file came from an install attempt with the release version of
Fedora 8

Comment 47 Chuck Ebbert 2008-01-02 18:31:32 UTC
Can you get a dump when using libata.dma=1? Somehow that is breaking DVD drive
detection but we need to see the kernel messages.

Comment 48 Craig Butler 2008-01-23 18:55:47 UTC
(In reply to comment #47)
> Can you get a dump when using libata.dma=1? Somehow that is breaking DVD drive
> detection but we need to see the kernel messages.

When I try this, I get the following error dialog while formatting:
"An error occurred trying to format VolGroup00/LogVol00.  This problem is
serious, and the install cannot continue.  Press <Enter> to exit the installer."

When I press <Enter> it reboots the system.

If you could give me the proper procedure for getting a dump file with this
scenario, I'd be glad to.

I have discovered Novell SLES 10 SP1 works perfectly.


Comment 49 Bug Zapper 2008-05-14 12:47:44 UTC
This message is a reminder that Fedora 7 is nearing the end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 7. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '7'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 7's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 7 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug. If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. If possible, it is recommended that you try the newest available Fedora distribution to see if your bug still exists.

Please read the Release Notes for the newest Fedora distribution to make sure it will meet your needs:
http://docs.fedoraproject.org/release-notes/

The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 50 Bug Zapper 2008-11-26 07:17:21 UTC
This message is a reminder that Fedora 8 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 8.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '8'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 8's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 8 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 51 Bug Zapper 2009-01-09 07:07:18 UTC
Fedora 8 changed to end-of-life (EOL) status on 2009-01-07. Fedora 8 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

