Bug 157175 - LVM Input/Output error during boot with recent kernel. (ALI IDE regression)
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: kernel
Version: 4
Hardware: i686 Linux
Priority: medium, Severity: medium
Assigned To: Alan Cox
QA Contact: Brian Brock
Reported: 2005-05-08 06:02 EDT by Martin Garton
Modified: 2007-11-30 17:11 EST (History)
Doc Type: Bug Fix
Last Closed: 2005-06-05 21:56:34 EDT
Description Martin Garton 2005-05-08 06:02:14 EDT
The kernel fails to boot properly due to an input/output error.

This started with kernel-2.6.11-1.1282_FC4 and is still a problem with
kernel-2.6.11-1.1287_FC4.

The last working kernel for me was kernel-2.6.11-1.1276_FC4 (which I am using now).

During boot, I get the message:

Setting up Logical Volume Manager /dev/hda2: read failed after 0 of 512 at
2989876384: Input/output error

This message appears first just after the message "Reading all physical volumes.
This may take a while" and again just before the "Checking filesystems" message.

It then goes on to do a fsck (even though the system was previously shut down
cleanly). The automatic fsck fails, and a manual fsck reports various errors
before hanging completely part way through (I left it for over two hours). Once
hung, the IDE light stays on but no disc activity seems to be going on.

Looking at possible causes, I see "Patch1710: linux-2.6.12rc-ac-ide-fixes.patch"
which looks like a possibility. However, I have not yet managed to rebuild a
kernel-2.6.11-1.1287_FC4 rpm (with or without that patch) due to build errors
that I don't yet understand.

My machine is a thinkpad r40e. hda is a "FUJITSU MHT2030AT, ATA DISK drive"
Comment 1 Martin Garton 2005-05-08 12:45:39 EDT
After overcoming my build problem I can confirm that kernel-2.6.11-1.1287_FC4
boots and works fine if I revert "Patch1710: linux-2.6.12rc-ac-ide-fixes.patch"

With help, I would be willing to help narrow this down further. Looking at the
patch I'm not sure where to start.

Additional possibly useful information snippet (from lspci):

00:0f.0 IDE interface: ALi Corporation M5229 IDE (rev c4)
Comment 2 Martin Garton 2005-05-18 05:52:31 EDT
Is anyone willing to help me narrow down the cause of this problem? I would like
to try and fix it before FC4 final if possible.  If this is (as it seems) an IDE
problem, it could prevent FC4 final from installing at all on some machines.
Comment 3 Martin Garton 2005-05-28 07:56:11 EDT
I can confirm this is still a problem in kernel-2.6.11-1.1363_FC4. It still
breaks my filesystem, and reverting "Patch1710:
linux-2.6.12rc-ac-ide-fixes.patch" fixes it again.

Any suggestions?
Comment 4 Alan Cox 2005-05-28 08:19:49 EDT
Thanks.  The ALi IDE controller has some serious problems with large disks. The
kernel tries to handle this and I'll go check whether I've got a mismerge
somewhere in the patch sets. What drive is attached ?
Comment 5 Alan Cox 2005-05-28 08:29:41 EDT
Hold on - 2.6.12rc IDE patches are being applied to 2.6.11? That's going to be
interesting, to say the least.

In the 2.6.12rc case the only merge material related to the LBA limit is the
following two chunks. Can you drop these two chunks and see if it makes any
difference? If it does, can you see which one makes the difference?

I also need to know what is logged in the kernel log on the other consoles when
it fails.

(Bugzilla will whack the formatting but they should be easy to find in the real
diff to edit)

diff -u --new-file --recursive --exclude-from /usr/src/exclude linux.vanilla-2.
--- linux.vanilla-2.6.12rc5/drivers/ide/ide-disk.c      2005-05-27 15:37:06.000
+++ linux-2.6.12-rc5/drivers/ide/ide-disk.c     2005-05-27 15:42:21.000000000 +
@@ -475,13 +479,14 @@
                                                                                
 static inline void idedisk_check_hpa(ide_drive_t *drive)
 {
-       unsigned long long capacity, set_max;
+       unsigned long long capacity, set_max = 0;
        int lba48 = idedisk_supports_lba48(drive->id);
                                                                                
+
        capacity = drive->capacity64;
        if (lba48)
                set_max = idedisk_read_native_max_address_ext(drive);
-       else
+       if (set_max == 0)       /* LBA28 or LBA48 failed */
                set_max = idedisk_read_native_max_address(drive);
                                                                                
        if (set_max <= capacity)
@@ -494,7 +499,8 @@
                         capacity, sectors_to_MB(capacity),
                         set_max, sectors_to_MB(set_max));
                                                                                
-       if (lba48)
+       /* Some maxtor support LBA48 but do not accept LBA48  set max... */
+       if (lba48 || set_max < (1ULL << 28))
                set_max = idedisk_set_max_address_ext(drive, set_max);
        else
                set_max = idedisk_set_max_address(drive, set_max);
                                                                                
                                                                                
                                                                                
                                                                                
Comment 6 Alan Cox 2005-05-28 08:49:17 EDT
OK, not sure what the underlying trigger is, but the base bug is a known 2.6.11
bug and I'm surprised it worked without that diff (though I don't know what DaveJ
has and has not applied). I can't duplicate the problem here with 2.6.12rc, and
that would make sense as the underlying bug is fixed there and in 2.6.11.11.

What -ac does do is get the geometry data right for large disks that have been
clipped, and maybe that's what makes the difference to hitting the error case.

Anyway the fix needed is already in 2.6.12-rc* and 2.6.11.11 and quoted below
(see the 2.6.11.11 patch for the actual diff pieces). Applying 2.6.12rc -ac
diffs to 2.6.11 wouldn't include this fix because the 2.6.12-rc diff is against
a 2.6.12rc with the bug already fixed.

--(From 2.6.11.11 message)

diff --git a/drivers/ide/ide-disk.c b/drivers/ide/ide-disk.c
--- a/drivers/ide/ide-disk.c
+++ b/drivers/ide/ide-disk.c
@@ -133,6 +133,8 @@ static ide_startstop_t __ide_do_rw_disk(
        if (hwif->no_lba48_dma && lba48 && dma) {
                if (block + rq->nr_sectors > 1ULL << 28)
                        dma = 0;
+               else
+                       lba48 = 0;
        }
 
        if (!dma) {
@@ -146,7 +148,7 @@ static ide_startstop_t __ide_do_rw_disk(
        /* FIXME: SELECT_MASK(drive, 0) ? */
 
        if (drive->select.b.lba) {
-               if (drive->addressing == 1) {
+               if (lba48) {
                        task_ioreg_t tasklets[10];
 
                        pr_debug("%s: LBA=0x%012llx\n", drive->name, block);
Comment 7 Martin Garton 2005-05-28 09:40:48 EDT
Thanks for the help Alan.

Your first patch does indeed give me the same problem. I am testing to see which
part of it now and will report back shortly.

Your second patch seems not to be the underlying cause, because it is
already included in kernel-2.6.11-1.1363_FC4 (davej has applied
patch-2.6.12-rc2.bz2, which includes it).

In case it is still useful info my drive is:
hda: FUJITSU MHT2030AT, ATA DISK drive

Comment 8 Martin Garton 2005-05-28 09:46:40 EDT
Actually, I probably don't mean just patch-2.6.12-rc2.bz2 (patch-2.6.12-rc[12345]
are all included anyway, and it's in there somewhere).
Comment 9 Martin Garton 2005-05-28 12:09:03 EDT
With just the first part of your first patch applied, things are fine.

With just the second part things break.

This part seems to be what is breaking it:

@@ -494,7 +499,8 @@
                         capacity, sectors_to_MB(capacity),
                         set_max, sectors_to_MB(set_max));
                                                                                
-       if (lba48)
+       /* Some maxtor support LBA48 but do not accept LBA48  set max... */
+       if (lba48 || set_max < (1ULL << 28))
                set_max = idedisk_set_max_address_ext(drive, set_max);
        else
                set_max = idedisk_set_max_address(drive, set_max);


I am about to recheck my findings just to be sure.
Comment 10 Martin Garton 2005-05-28 14:53:54 EDT
I have tested this again and confirmed the same result starting from
kernel-2.6.11-1.1363_FC4 and just reverting that one line change.  It works fine.
Comment 11 Alan Cox 2005-05-28 20:11:28 EDT
Thanks, that's what I was hoping after I looked at it, because the if is wrong.
Please change the if to read

    if (lba48 && set_max >= (1ULL << 28))

and I think your drive will be happier

Alan
Comment 12 Martin Garton 2005-05-29 06:50:06 EDT
My drive is much happier now.

Thanks very much Alan.

Presumably this will be included in the FC packages before FC4 release? (Dave?)
Comment 13 Martin Garton 2005-06-01 11:49:24 EDT
I'm guessing this fix has missed the FC4 release now.

Is there a workaround that would at least allow installing without this fix?
Comment 14 Dave Jones 2005-06-01 17:25:07 EDT
Does booting with ide=nodma work around this ?
Comment 15 Martin Garton 2005-06-02 12:37:53 EDT
Apparently not.

I didn't see the original error, but instead I got several like:

hda: task_in_intr: error=0x10 { SectorIdNotFound }, LBAsect=58604877,
sector=58604877
ide: failed opcode was: unknown

and then I get the usual fsck and failed boot.
Comment 16 Martin Garton 2005-06-02 12:39:46 EDT
Sorry, ignore that last comment.

It still does not work, and with ide=nodma the problem is the same as before.

Comment 17 Martin Garton 2005-06-05 12:32:48 EDT
I spotted that this fix has been applied in kernel-2.6.11-1.1369_FC4. I just want
to say thanks and confirm that it now Works For Me.
Comment 18 Dave Jones 2005-06-05 21:56:34 EDT
This got accepted for the final FC4 release kernel too, so I'll close this out.

Thanks for testing.
