Bug 125524
| Field | Value |
|---|---|
| Summary | kernel panic when attempting to umount a pulled USB floppy with ext2 |
| Product | Red Hat Enterprise Linux 3 |
| Component | kernel |
| Version | 3.0 |
| Hardware | All |
| OS | Linux |
| Status | CLOSED WONTFIX |
| Severity | high |
| Priority | medium |
| Reporter | Gary Lerhaupt <gary_lerhaupt> |
| Assignee | Pete Zaitcev <zaitcev> |
| CC | aviro, barryn, fhirtz, petrides, riel, sct, tao, us_linux_engineering |
| Doc Type | Bug Fix |
| Last Closed | 2004-09-01 22:15:59 UTC |

Description
Gary Lerhaupt
2004-06-08 14:59:38 UTC
Note that this is Issue Tracker issue #40475, which has been escalated by dmaley. However, since I have seen no status, I created this bugzilla.

I see we've already focused on the reproduction. Whatever happened to our vaunted first-fault first-look first-fix strategy? Dell did not even collect an oops traceback for us, let alone a netdump core. Obviously, everything works dandy here [see below], so I expect a little bit of trouble for the reproducing engineers in Centennial once they find the USB floppy.

    [root@ithil /]# mount /dev/sdc1 /mnt/tmp
    [root@ithil /]# cp /boot/vmlinuz-2.4.21-15.ELsmp /mnt/tmp
    [root@ithil /]# df -k
    Filesystem           1K-blocks      Used Available Use% Mounted on
    /dev/sda2              4032124   1663896   2163400  44% /
    none                   1999292         0   1999292   0% /dev/shm
    /dev/sdb3            233856796   2267784 219709748   2% /q
    /dev/sdc1                15475      1331     13345  10% /mnt/tmp
    [root@ithil /]# sync
    [root@ithil /]# <=============================== pulled here
    [root@ithil /]# dmesg | tail
     I/O error: dev 08:20, sector 0
     unable to read partition table
    sdc : READ CAPACITY failed.
    sdc : status = 1, message = 00, host = 0, driver = 08
    Info fld=0xa00 (nonstd), Current sd00:00: sense key Not Ready
    sdc : block size assumed to be 512 bytes, disk size 1GB.
    sdc: test WP failed, assume Write Enabled
     sdc: I/O error: dev 08:20, sector 0
     I/O error: dev 08:20, sector 0
     unable to read partition table
    [root@ithil /]# umount /mnt/tmp
    [root@ithil /]#
    [root@ithil /]# dmesg | tail -5
    sdc: test WP failed, assume Write Enabled
     sdc: I/O error: dev 08:20, sector 0
     I/O error: dev 08:20, sector 0
     unable to read partition table
     I/O error: dev 08:21, sector 2
    [root@ithil /]# cat /proc/version
    Linux version 2.4.21-15.ELsmp (bhcompile.redhat.com) (gcc version 3.2.3 20030502 (Red Hat Linux 3.2.3-34)) #1 SMP Thu Apr 22 00:18:24 EDT 2004
    [root@ithil /]#

As you can see from the capacity, it's a USB key, not a floppy. Also, I tried the test without syncing, to make sure that it's not outstanding data that causes the failure.

The detective work Dell did to pinpoint the BUG() location is welcome, but to develop a fix we need the precise context. I need that oops message!! (with the exact kernel version)

Your engineers in Issue Tracker #40475 have already reproduced this. What happened to the vaunted right-hand talk to left-hand strategy? All kidding aside, if you still need me to provide this information because it's not available from them, then let me know and I'll do so.

Gary, please give me the terminal trace with commands and the console trace (dmesg). The netdump might be good but I understand it's a lot of hassle, so let's start simple. The version will let me start binary diffing. Issue Tracker #40475 hasn't got any substantial info, or else the Issue Tracker doesn't show it to me.

Created attachment 101376: Kernel Panic

Trying to engage Stephen again. Also, I didn't see "Busy inodes after unmounting" in my testing. I am going to redo it and create dirty metadata before the disconnect, not just dirty data...

I was able to get the device explosion this time. Log is:

    VFS: busy inodes on changed media.
    sdc : READ CAPACITY failed.
    sdc : status = 1, message = 00, host = 0, driver = 08
    Info fld=0xa00 (nonstd), Current sd00:00: sense key Not Ready
    sdc : block size assumed to be 512 bytes, disk size 1GB.
    sdc: test WP failed, assume Write Enabled
     I/O error: dev 08:20, sector 0
     unable to read partition table
    VFS: busy inodes on changed media.
    sdc : READ CAPACITY failed.
    sdc : status = 1, message = 00, host = 0, driver = 08
    Info fld=0xa00 (nonstd), Current sd00:00: sense key Not Ready
    sdc : block size assumed to be 512 bytes, disk size 1GB.
    sdc: test WP failed, assume Write Enabled
     I/O error: dev 08:20, sector 0
     unable to read partition table
    Kernel BUG at ll_rw_blk:1014

then the oops.

I think I can see what's happening. sd.c sees the removed media as a disk change. check_disk_change(kdev_t dev) calls the revalidate code:

    if (bdops->revalidate)
            bdops->revalidate(dev);

which gets to fop_revalidate_scsidisk in sd.c:

    static int fop_revalidate_scsidisk(kdev_t dev)
    {
            return revalidate_scsidisk(dev, 0);
    }

which in turn tries to grok_partitions(), which calls check_partitions, which calls (e.g.) msdos_partition and fails with EIO:

    res = check_part[i](hd, bdev, first_sector, first_part_minor);
    if (res) {
            if (res < 0 && warn_no_part)
                    printk(" unable to read partition table\n");
            goto setup_devfs;
    }

which we can see from the log: we get exactly this printk quite early in the process.

Then the exit code tries:

    setup_devfs:
            invalidate_bdev(bdev, 1);
            truncate_inode_pages(bdev->bd_inode->i_mapping, 0);

and it's the latter which is the problem: truncate_inode_pages() calls truncate_list_pages() which calls truncate_complete_page() which calls do_flushpage() which calls block_flushpages() which is a macro expanding to discard_bh_page() which calls discard_buffer() which (PHEW) clears BH_Mapped.

Basically, it's the attempt to rescan the partitioning on an already-mounted device which is killing us.

Any progress on coding out the BUG() call?

Here are some related comments following an internal audit of the proposed U3 blocking issues which are still unresolved...

This issue is in practice only applicable to ext2 on floppy.

- ext3 on floppy is an unlikely combination because it needs a large journal file
- msdos/vfat and iso9660 filesystems are more synchronous and don't need to write anything on unmount
- USB pen-drives are almost always partitioned, so the filesystem doesn't end up on the whole-disk device (/dev/sda) and the problem doesn't arise here

The resolution to this issue would entail substantial modification of the buffer invalidation logic in the device revalidation path. This is historically a very complex and delicate codepath. Obviously not something to be done shortly before code freeze. For this reason, we won't be able to address this in U3.

I'll concede that this is an annoying problem, and that ideally it shouldn't occur. However, I don't consider it to be a release stopper item. That's because to some extent it's a user error (yes, the system would ideally protect against it). The problem isn't blocking the ability to support a new hardware platform, it isn't a regression, and it is not a data corruptor.

Per above, we will close this as Obsolete. Dell documented this in our tech sheet, sysadmins should know better anyhow, and RHEL4 won't have this problem.
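
For anyone reading this bug later, the failure sequence analysed earlier in the thread (revalidation discarding buffers underneath a still-mounted filesystem) can be sketched as a small user-space model. This is not kernel code. Every `model_` name is hypothetical, and two things are assumptions the thread implies but never quotes: that the BUG() at ll_rw_blk:1014 is a check that a submitted buffer is still mapped, and that ext2 writes a metadata buffer it still holds when it is unmounted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's BH_Mapped state bit. */
#define MODEL_BH_MAPPED 0x1u

/* Hypothetical, heavily reduced buffer head: only the state word. */
struct model_bh {
    unsigned int state;
};

/* Mount: the filesystem maps a metadata buffer and keeps a reference to it. */
static void model_fs_mount(struct model_bh *bh)
{
    bh->state |= MODEL_BH_MAPPED;
}

/* Stand-in for discard_buffer(): clears the mapped bit. */
static void model_discard_buffer(struct model_bh *bh)
{
    bh->state &= ~MODEL_BH_MAPPED;
}

/*
 * Stand-in for the revalidation path after a media-change event.
 * The partition read fails with -EIO when the media is gone, and the
 * cleanup at the end runs on both the success and the failure path.
 */
static int model_revalidate(struct model_bh *bh, bool media_present)
{
    int res;

    assert(bh != NULL);
    res = media_present ? 0 : -EIO;
    if (res < 0)
        printf(" unable to read partition table\n");

    /* Models the setup_devfs: cleanup that truncates the page cache. */
    model_discard_buffer(bh);
    return res;
}

/* Assumed submit-time check: false marks where the kernel would BUG(). */
static bool model_submit_bh(const struct model_bh *bh)
{
    return (bh->state & MODEL_BH_MAPPED) != 0;
}

/* Unmount: assumed to write back the metadata buffer the filesystem holds. */
static bool model_umount(const struct model_bh *bh)
{
    return model_submit_bh(bh);
}
```

In this model, `model_fs_mount()` followed directly by `model_umount()` returns true, while calling `model_revalidate(&bh, false)` in between makes `model_umount()` return false, which corresponds to the point where the real kernel panics. It also shows why the rescan itself is the hazard: the buffer is discarded whether or not the partition read succeeds.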