Bug 177951 - kernel 2.6.15-1.185*_FC5 eats my filesystem
Status: CLOSED UPSTREAM
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: All
OS: Linux
Priority: medium
Severity: high
: ---
: ---
Assigned To: Jeff Garzik
Brian Brock
:
Depends On:
Blocks: FC5Blocker FCMETA_SATA
Reported: 2006-01-16 14:26 EST by Nicolas Mailhot
Modified: 2013-07-02 22:26 EDT
7 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2006-02-03 08:18:56 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments
lspci (22.05 KB, text/plain), 2006-01-16 14:26 EST, Nicolas Mailhot
/var/log/dmesg with working kernel (27.34 KB, text/plain), 2006-01-16 14:38 EST, Nicolas Mailhot
mdadm for /dev/md0 (716 bytes, text/plain), 2006-01-16 14:40 EST, Nicolas Mailhot
mdadm for /dev/md1 (715 bytes, text/plain), 2006-01-16 14:41 EST, Nicolas Mailhot
lvm info (998 bytes, text/plain), 2006-01-16 14:42 EST, Nicolas Mailhot
lsmod on working system (2.90 KB, text/plain), 2006-01-16 14:43 EST, Nicolas Mailhot
dmesg for one problem kernel (kernel-2.6.15-1.1859_FC5) (34.20 KB, text/plain), 2006-01-17 18:18 EST, Nicolas Mailhot
smart info for sda (5.19 KB, text/plain), 2006-01-24 02:30 EST, Nicolas Mailhot
smart info for sdb (5.16 KB, text/plain), 2006-01-24 02:31 EST, Nicolas Mailhot
Simple patch to disable fua (524 bytes, patch), 2006-01-27 17:46 EST, Nicolas Mailhot
Fua blacklisting (1.38 KB, patch), 2006-01-31 17:38 EST, Nicolas Mailhot
dmesg for kernel patched with patch #123940 (21.68 KB, text/plain), 2006-01-31 17:41 EST, Nicolas Mailhot


External Trackers
Linux Kernel, Tracker ID 5914 (Priority: None; Status: None; Summary: None; Last Updated: Never)

Description Nicolas Mailhot 2006-01-16 14:26:27 EST
Description of problem:

After 6 days of uptime I decided to try the latest rawhide kernel.
Result: instant corruption (it starts by refusing to use some RAID array
members, then barfs about ATA, and more info may have ended up in the logs, except
they were eaten by the last attempted boot).

My current kernel works fine (after cleaning up the mess).
It's kernel-2.6.15-1.1819_FC5.nim (meaning built from the 2.6.15-1.1819 SRPM
with the latest v4l patches applied, at about the time 2.6.15-1.1819 was released).

The last changelog entry says:
* Tue Jan 03 2006 Dave Jones <davej@redhat.com>
- Silence some gcc4.1 warnings.

I don't really have all the intermediate kernels here to test, and I have little
wish to play Russian roulette until an important file is nuked, so if you could
fix this without more testing on my part that would be great ;)

This is an x86_64 RAID + LVM system.

Version-Release number of selected component (if applicable):

kernel-2.6.15-1.1857_FC5 is bad bad bad,
as is the previous one (I think), except I didn't remember to note its number and my
system logs are a mess.

How reproducible:
Always (but I won't try again)

Steps to Reproduce:
1. boot the rawhide kernel
2. watch the error messages scroll by
3. reboot into the trusty kernel and get dumped at the "filesystem b0rked" admin
rescue prompt
Comment 1 Nicolas Mailhot 2006-01-16 14:26:27 EST
Created attachment 123251 [details]
lspci
Comment 2 Nicolas Mailhot 2006-01-16 14:38:57 EST
Created attachment 123252 [details]
/var/log/dmesg with working kernel
Comment 3 Nicolas Mailhot 2006-01-16 14:40:07 EST
Created attachment 123253 [details]
mdadm for /dev/md0
Comment 4 Nicolas Mailhot 2006-01-16 14:41:07 EST
Created attachment 123254 [details]
mdadm for /dev/md1
Comment 5 Nicolas Mailhot 2006-01-16 14:42:10 EST
Created attachment 123255 [details]
lvm info
Comment 6 Nicolas Mailhot 2006-01-16 14:43:25 EST
Created attachment 123256 [details]
lsmod on working system
Comment 7 Nicolas Mailhot 2006-01-17 18:18:54 EST
Created attachment 123343 [details]
dmesg for one problem kernel (kernel-2.6.15-1.1859_FC5)

I hope this helps: this just cost me 2h of cleanup after the attempted boot
(single-user mode) corrupted the filesystem again.
Comment 8 Dave Jones 2006-01-24 00:28:10 EST
This really looks like a hardware problem: either a bad cable or, worse, a dying
drive. Those ATA warnings are a really big sign.

"Unrecovered read error - auto reallocate failed"

It means it couldn't read a sector, and when it tried to reallocate it from the
spare pool, it couldn't, which usually means it's already reallocated a bunch of
sectors.
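The reading of the error message above can be illustrated with a toy model of a drive's spare-sector pool. This is a minimal sketch under stated assumptions, not how any real drive firmware works: the class `ToyDrive`, the pool size, and every return string except the quoted error text are hypothetical. It only shows why an exhausted spare pool would turn an unreadable sector into an "auto reallocate failed" outcome.

```python
class ToyDrive:
    """Hypothetical toy model of a spare-sector pool (not real firmware)."""

    def __init__(self, spare_sectors: int) -> None:
        self.spare_sectors = spare_sectors  # assumed size of the spare pool
        self.remapped: dict[int, int] = {}  # bad LBA -> spare slot index
        self.next_spare = 0                 # next unused spare slot

    def handle_unreadable(self, lba: int) -> str:
        """Try to remap an unreadable sector to a spare and report the outcome."""
        if lba in self.remapped:
            return "already reallocated"
        if self.next_spare >= self.spare_sectors:
            # Spare pool exhausted: the situation the error message is read as describing.
            return "Unrecovered read error - auto reallocate failed"
        self.remapped[lba] = self.next_spare
        self.next_spare += 1
        return "reallocated"


drive = ToyDrive(spare_sectors=2)
for lba in (100, 200, 300):
    print(lba, drive.handle_unreadable(lba))
```

In this model the third bad sector is the first one that cannot be remapped, which is why a failed reallocation is taken as a hint that many sectors were already remapped. The SMART data attached later in this report is the real-world check on that hypothesis.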

Looks like RMA time.
Comment 10 Nicolas Mailhot 2006-01-24 01:54:25 EST
It may look like a dying drive, but:
1. SMART reports 0 errors
2. the system is solid when rebooted with the 2.6.15 kernel, even after several days
of I/O
3. the drives are new (OK, weak point)
4. and anyway, what's the probability of *two* new drives going bad at *exactly*
the same moment? (Being SATA, BTW, they don't share cabling.)
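Point 4 can be stated a little more precisely. As an illustrative assumption (the symbols are not from the report, and no failure rates are given in it), suppose sda and sdb fail independently within some short window Δ with probabilities p_a and p_b. Then:

```latex
P(\text{sda and sdb both fail within } \Delta) = p_a \, p_b \;\le\; \min(p_a, p_b)
```

With small per-drive probabilities the product is much smaller still, so two drives failing at the same moment is more plausibly explained by a shared cause than by two coincident hardware faults. Separate SATA cabling rules out one obvious shared hardware cause, which leaves the kernel change as the remaining common factor in this argument.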
Comment 11 Nicolas Mailhot 2006-01-24 02:30:25 EST
Created attachment 123604 [details]
smart info for sda
Comment 12 Nicolas Mailhot 2006-01-24 02:31:09 EST
Created attachment 123605 [details]
smart info for sdb
Comment 13 Nicolas Mailhot 2006-01-24 15:19:18 EST
Just let me know if you need more logs / test results
Comment 14 Nicolas Mailhot 2006-01-26 16:03:47 EST
2.6.15-1.1872_FC5 patched to disable FUA (as suggested by Tejun Heo here:
http://marc.theaimsgroup.com/?l=linux-ide&m=113825474609128) boots fine.
Comment 15 Dave Jones 2006-01-27 15:49:57 EST
I've been unable to connect to marc.theaimsgroup.com for weeks, from multiple
locations around the world.  Can you attach that patch to the bugzilla please ?
Comment 16 Nicolas Mailhot 2006-01-27 17:43:48 EST
Strange, it works fine from here. You can find the whole thread on any other
linux-ide archive (title: "regarding bug #5914 - fs corruption on SATA").

I'll attach the patch, but it's very preliminary and mainly useful for checking whether
FUA is causing problems on a system (it short-circuits FUA). People are now talking
about drive-specific FUA blacklisting (but the fuller patch isn't ready yet).
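What "short-circuiting" FUA changes can be sketched at the command level. This is a simplified model, not the attached patch: the function `write_commands` and its flags are hypothetical, and it ignores the pre-flush and ordering rules of real barrier handling. It only contrasts a durable write issued as a single FUA command with the fallback of a plain write followed by an explicit cache flush (the command names are the ATA commands WRITE DMA FUA EXT, WRITE DMA EXT and FLUSH CACHE EXT).

```python
def write_commands(drive_supports_fua: bool, fua_disabled: bool) -> list[str]:
    """Return a simplified ATA command sequence for a write that must be durable.

    Hypothetical sketch: with FUA, one command writes and bypasses the volatile
    write cache; without it, a plain write is followed by a cache flush.
    """
    if drive_supports_fua and not fua_disabled:
        return ["WRITE DMA FUA EXT"]
    # FUA unavailable or short-circuited: fall back to write + explicit flush.
    return ["WRITE DMA EXT", "FLUSH CACHE EXT"]


print(write_commands(drive_supports_fua=True, fua_disabled=False))
print(write_commands(drive_supports_fua=True, fua_disabled=True))
```

A global switch like `fua_disabled` is a blunt diagnostic: if corruption disappears when every drive takes the flush path, FUA handling becomes the prime suspect, at the cost of giving up FUA on drives that handle it correctly.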

Comment 17 Nicolas Mailhot 2006-01-27 17:46:27 EST
Created attachment 123808 [details]
Simple patch to disable fua
Comment 18 Nicolas Mailhot 2006-01-31 17:38:50 EST
Created attachment 123940 [details]
Fua blacklisting

The following (tested) patch implements FUA drive blacklisting (specifically for
my drive model). It was posted in the aforementioned thread.
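The idea behind per-drive blacklisting can be sketched as a model-string lookup. This is a minimal sketch, not the attached patch: `FUA_BLACKLIST`, `fua_allowed`, and the model string `EXAMPLE-MODEL-1234` are hypothetical placeholders, since the report does not name the drive model. It assumes the model string comes from the drive's ATA IDENTIFY data, which is space-padded, so it is trimmed before comparison.

```python
# Hypothetical blacklist; the real patch's drive model string is not reproduced here.
FUA_BLACKLIST = frozenset({"EXAMPLE-MODEL-1234"})


def fua_allowed(model: str, advertises_fua: bool) -> bool:
    """Use FUA only if the drive advertises it and its model isn't blacklisted."""
    # IDENTIFY model strings are padded with spaces, so compare the trimmed text.
    return advertises_fua and model.strip() not in FUA_BLACKLIST


print(fua_allowed("OTHER-MODEL", advertises_fua=True))
print(fua_allowed("EXAMPLE-MODEL-1234      ", advertises_fua=True))
```

Compared with the simple patch that disables FUA globally, a per-model list keeps FUA enabled on drives that handle it correctly and only routes the affected models through the non-FUA path.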
Comment 19 Nicolas Mailhot 2006-01-31 17:41:32 EST
Created attachment 123941 [details]
dmesg for kernel patched with patch #123940
Comment 20 Nicolas Mailhot 2006-02-03 08:18:56 EST
Closing, as the blacklisting patch was merged upstream in the latest git snapshot.
