Bug 125524

Summary:

kernel panic when attempting to umount a pulled USB floppy with ext2

Product:

Red Hat Enterprise Linux 3

Reporter:

Gary Lerhaupt <gary_lerhaupt>

Component:

kernel

Assignee:

Pete Zaitcev <zaitcev>

Status:

CLOSED WONTFIX

Severity:

high

Priority:

medium

Version:

3.0

CC:

aviro, barryn, fhirtz, petrides, riel, sct, tao, us_linux_engineering

Hardware:

All

OS:

Linux

Doc Type:

Bug Fix

Last Closed:

2004-09-01 22:15:59 UTC

Attachments:

Description	Flags
Kernel Panic	none

Description Gary Lerhaupt 2004-06-08 14:59:38 UTC

From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; 
H010818; .NET CLR 1.0.3705)

Description of problem:
If you mount an ext2 formatted USB floppy drive on RHEL3, pull the 
floppy usb cable out of the system, and then try to umount the 
floppy, a kernel panic will result.

It is caused by an explicit BUG() in drivers/block/ll_rw_blk.c, line
1014, in __make_request():

if (!buffer_mapped(bh))
    BUG();

However, even though the same BUG() is present in RHEL 2.1, this issue
did not occur on the 2.4.9-e.34 kernel. We have seen it on both
RHEL 3 Gold and U2.

If the floppy is formatted FAT, the kernel panic does not occur.

Version-Release number of selected component (if applicable):
2.4.21-15.EL

How reproducible:
Always

Steps to Reproduce:
1. Plug in USB floppy with ext2 filesystem
2. mount USB floppy
3. Pull floppy cable
4. umount floppy
5. observe kernel panic

Actual Results:  kernel panic

Expected Results:  no kernel panic

Additional info:

Comment 1 Gary Lerhaupt 2004-06-08 15:00:53 UTC

Note that this is Issue Tracker #40475, which has been escalated
by dmaley. However, since I have seen no status on it, I created this
bugzilla.

Comment 2 Pete Zaitcev 2004-06-23 21:57:08 UTC

I see we've already focused on the reproduction. Whatever happened
to our vaunted first-fault first-look first-fix strategy?
Dell did not even collect an oops traceback for us, let alone
a netdump core.

Obviously, everything works dandy here [see below], so I expect
a little bit of trouble for the engineers reproducing this in
Centennial once they find the USB floppy.

[root@ithil /]# mount /dev/sdc1 /mnt/tmp
[root@ithil /]# cp /boot/vmlinuz-2.4.21-15.ELsmp /mnt/tmp
[root@ithil /]# df -k
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda2              4032124   1663896   2163400  44% /
none                   1999292         0   1999292   0% /dev/shm
/dev/sdb3            233856796   2267784 219709748   2% /q
/dev/sdc1                15475      1331     13345  10% /mnt/tmp
[root@ithil /]# sync
[root@ithil /]# 
  <=============================== pulled here
[root@ithil /]# dmesg | tail
 I/O error: dev 08:20, sector 0
 unable to read partition table
sdc : READ CAPACITY failed.
sdc : status = 1, message = 00, host = 0, driver = 08
Info fld=0xa00 (nonstd), Current sd00:00: sense key Not Ready
sdc : block size assumed to be 512 bytes, disk size 1GB.
sdc: test WP failed, assume Write Enabled
 sdc: I/O error: dev 08:20, sector 0
 I/O error: dev 08:20, sector 0
 unable to read partition table
[root@ithil /]# umount /mnt/tmp
[root@ithil /]#
[root@ithil /]# dmesg | tail -5
sdc: test WP failed, assume Write Enabled
 sdc: I/O error: dev 08:20, sector 0
 I/O error: dev 08:20, sector 0
 unable to read partition table
 I/O error: dev 08:21, sector 2
[root@ithil /]# cat /proc/version
Linux version 2.4.21-15.ELsmp (bhcompile.redhat.com) (gcc
version 3.2.3 20030502 (Red Hat Linux 3.2.3-34)) #1 SMP Thu Apr 22
00:18:24 EDT 2004
[root@ithil /]#

As you can see from the capacity, it's a USB key, not a floppy.
I also tried the test without syncing, to make sure that outstanding
data is not what causes the failure.

The detective work Dell did to pinpoint the BUG() location
is welcome, but to develop a fix we need the precise context.
I need that oops message!! (with the exact kernel version)

Comment 3 Gary Lerhaupt 2004-06-23 22:09:24 UTC

Your engineers in Issue Tracker #40475 have already reproduced this. 

What happened to the vaunted right-hand-talks-to-left-hand strategy?
All kidding aside, if you still need this information and it's not
available from them, let me know and I'll provide it.

Comment 4 Pete Zaitcev 2004-06-23 22:27:51 UTC

Gary, please give me the terminal trace with commands and the
console trace (dmesg). The netdump might be good but I understand
it's a lot of hassle, so let's start simple. The version will let
me start binary diffing.

IT #40475 doesn't have any substantial info, or else Issue
Tracker doesn't show it to me.

Comment 5 Gary Lerhaupt 2004-06-24 15:39:39 UTC

Created attachment 101376 [details]
Kernel Panic

Comment 6 Pete Zaitcev 2004-07-13 20:52:16 UTC

Trying to engage Stephen again.

Also, I didn't see "Busy inodes after unmounting" in my testing.
I am going to redo it and create dirty metadata before the disconnect,
not just dirty data.

Comment 8 Stephen Tweedie 2004-07-20 20:30:47 UTC

I was able to get the device explosion this time.  Log is:

        VFS: busy inodes on changed media.
        sdc : READ CAPACITY failed.
        sdc : status = 1, message = 00, host = 0, driver = 08 
        Info fld=0xa00 (nonstd), Current sd00:00: sense key Not Ready
        sdc : block size assumed to be 512 bytes, disk size 1GB.  
        sdc: test WP failed, assume Write Enabled
         I/O error: dev 08:20, sector 0
         unable to read partition table
        VFS: busy inodes on changed media.
        sdc : READ CAPACITY failed.
        sdc : status = 1, message = 00, host = 0, driver = 08 
        Info fld=0xa00 (nonstd), Current sd00:00: sense key Not Ready
        sdc : block size assumed to be 512 bytes, disk size 1GB.  
        sdc: test WP failed, assume Write Enabled
         I/O error: dev 08:20, sector 0
         unable to read partition table
        Kernel BUG at ll_rw_blk:1014
        
then the oops.

I think I can see what's happening.  sd.c sees the removed media as a
disk change.  check_disk_change(kdev_t dev) calls the revalidate code:

        if (bdops->revalidate)
                bdops->revalidate(dev);

which gets to fop_revalidate_scsidisk in sd.c:

static int fop_revalidate_scsidisk(kdev_t dev)
{
        return revalidate_scsidisk(dev, 0);
}

which in turn tries grok_partitions(), which calls check_partitions(),
which calls (e.g.) msdos_partition(), which fails with EIO:

                res = check_part[i](hd, bdev, first_sector, first_part_minor);
                if (res) {
                        if (res < 0 && warn_no_part)
                                printk(" unable to read partition table\n");
                        goto setup_devfs;
                }

which we can see from the log --- we get exactly this printk quite
early in the process.
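To make the revalidation path above concrete, here is a minimal user-space sketch in C. It is a hypothetical model, not kernel code: every model_* name is invented for illustration, the device number is a plain int, and the fop_revalidate_scsidisk() -> revalidate_scsidisk() -> grok_partitions() chain is collapsed into one call. What it shows is that an EIO from the first partition checker does not abort the rescan early; it jumps straight to the setup_devfs exit code, which runs whether or not a partition table was read.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical user-space model of the revalidation path. Names mirror the
 * kernel functions for readability but are NOT kernel API. */

typedef int (*model_part_checker)(int dev);

/* Stand-in for msdos_partition(): the media is gone, so reading
 * sector 0 fails with an I/O error. */
static int model_msdos_partition(int dev)
{
    (void)dev;
    return -EIO;
}

static model_part_checker model_check_part[] = { model_msdos_partition, NULL };

/* Counts how often the exit path ran. In the kernel, the setup_devfs exit
 * code is where the block device's buffers are invalidated and its page
 * cache truncated. */
static int model_setup_devfs_runs;

/* Mirrors the quoted loop: >0 means a table was recognised, 0 means try
 * the next checker, <0 means an I/O error. */
static int model_check_partitions(int dev, int warn_no_part)
{
    int res = 0;

    for (int i = 0; model_check_part[i]; i++) {
        res = model_check_part[i](dev);
        if (res) {
            if (res < 0 && warn_no_part)
                printf(" unable to read partition table\n");
            goto setup_devfs;
        }
    }
setup_devfs:
    model_setup_devfs_runs++;
    return res;
}

struct model_bdops {
    int (*revalidate)(int dev);
};

/* Collapsed stand-in for fop_revalidate_scsidisk() and what it calls. */
static int model_fop_revalidate(int dev)
{
    return model_check_partitions(dev, 1);
}

/* Stand-in for the tail of check_disk_change(): after a media change,
 * call the driver's revalidate hook if it has one. */
static int model_check_disk_change(const struct model_bdops *bdops, int dev)
{
    if (bdops->revalidate)
        bdops->revalidate(dev);
    return 1; /* media changed */
}
```

Calling model_check_disk_change() with an ops table whose revalidate hook is model_fop_revalidate prints the same " unable to read partition table" line seen in the log and still increments model_setup_devfs_runs, which is the point of the model: the failed read does not skip the exit code.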

Then the exit code tries:

setup_devfs:
        invalidate_bdev(bdev, 1);
        truncate_inode_pages(bdev->bd_inode->i_mapping, 0);

and it's the latter which is the problem: truncate_inode_pages() calls
truncate_list_pages(), which calls truncate_complete_page(), which calls
do_flushpage(), which calls block_flushpage(), a macro expanding to
discard_bh_page(), which calls discard_buffer(), which (PHEW) clears
BH_Mapped.

Basically, it's the attempt to rescan the partitioning on an
already-mounted device which is killing us.
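The consequence can be sketched as a second hypothetical user-space model in C (again, all model_* names are invented and this is not kernel code). It assumes the mounted filesystem still holds a dirty buffer for the device that gets written back at unmount time, and it models only the BH_Mapped bit that discard_buffer() clears; the real buffer_head carries more state, and any other effects of discard_buffer() are deliberately left out.

```c
#include <stdio.h>

/* Hypothetical user-space model, NOT kernel code: two state bits stand in
 * for the buffer_head flags that matter here. */
#define MODEL_BH_DIRTY  (1UL << 0)
#define MODEL_BH_MAPPED (1UL << 1)

struct model_bh {
    unsigned long state;
};

/* Stand-in for discard_buffer(): the step identified in the analysis is
 * clearing BH_Mapped. Nothing else is modelled. */
static void model_discard_buffer(struct model_bh *bh)
{
    bh->state &= ~MODEL_BH_MAPPED;
}

/* Stand-in for the check in __make_request(): returns -1 where the
 * kernel would hit "if (!buffer_mapped(bh)) BUG();". */
static int model_make_request(const struct model_bh *bh)
{
    if (!(bh->state & MODEL_BH_MAPPED)) {
        fprintf(stderr, "model: kernel BUG() would fire here\n");
        return -1;
    }
    return 0;
}

/* Replays the scenario and returns the result of the unmount-time write.
 * rescan_while_mounted = 1 is the failing case; 0 skips the rescan. */
static int model_pull_then_umount(int rescan_while_mounted)
{
    /* Mounted: the filesystem holds a dirty, mapped buffer (assumption). */
    struct model_bh held = { MODEL_BH_DIRTY | MODEL_BH_MAPPED };

    /* Media pulled: revalidation truncates the block device's pages,
     * which reaches discard_buffer() for the held buffer. */
    if (rescan_while_mounted)
        model_discard_buffer(&held);

    /* umount: the dirty buffer is submitted for writeback. */
    return model_make_request(&held);
}
```

Passing 0 skips the rescan, which is the condition singled out above; the model only illustrates why the rescan is fatal and does not suggest how the real revalidation code should avoid it.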

Comment 11 Gary Lerhaupt 2004-07-30 14:15:28 UTC

Any progress on coding out the BUG() call?

Comment 15 Tim Burke 2004-08-11 13:59:22 UTC

Here are some related comments following an internal audit of the
proposed U3 blocking issues that are still unresolved...

In practice, this issue only applies to ext2 on floppy.
- ext3 on floppy is an unlikely combination because it needs a large
journal file
- msdos/vfat and iso9660 filesystems are more synchronous and don't
need to write anything on unmount
- USB pen drives are almost always partitioned, so the filesystem sits
on a partition rather than on the whole-disk device (/dev/sda), and
the problem doesn't arise there

Resolving this issue would entail substantial modification of the
buffer invalidation logic in the device revalidation path. This has
historically been a very complex and delicate codepath, and obviously
not something to change shortly before code freeze. For this reason,
we won't be able to address this in U3.

I'll concede that this is an annoying problem, and that ideally it
shouldn't occur. However, I don't consider it a release-stopper
item. That's because to some extent it's a user error (yes, ideally
the system would protect against it). The problem isn't blocking the
ability to support a new hardware platform, it isn't a regression, and
it isn't a data corruptor.

Comment 16 Matt Domsch 2004-09-01 22:15:59 UTC

Per above, we will close this as Obsolete.  Dell documented this in 
our tech sheet, sysadmins should know better anyhow, and RHEL4 won't 
have this problem.