Bug 408521 - Kernel-2.6.23 hosing my Fedora ext3 / Partition
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 7
Hardware: i386
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2007-12-03 10:22 UTC by Declan Moriarty
Modified: 2008-01-17 16:33 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2008-01-17 16:33:38 UTC
Type: ---
Embargoed:


Attachments
error from syslog (1.17 KB, text/plain)
2007-12-03 10:22 UTC, Declan Moriarty
Tree of unusual lost+found stuff (16.80 KB, application/octet-stream)
2007-12-07 09:17 UTC, Declan Moriarty
Dmesg output (17.41 KB, application/octet-stream)
2007-12-07 15:44 UTC, Declan Moriarty
grep -C10 frozen /var/log/messages > errors.out (21.58 KB, application/octet-stream)
2007-12-07 16:22 UTC, Declan Moriarty
dmidecode, dmesg, and lspci outputs (8.84 KB, application/octet-stream)
2008-01-17 09:24 UTC, Declan Moriarty

Description Declan Moriarty 2007-12-03 10:22:59 UTC
Description of problem:
Since kernel 2.6.23.1-21, the ide driver seems to be malfunctioning. The
attached error repeats; 'frozen' is on line 1 of each repeat. For the time
between errors, the box freezes. Fedora 7's lost+found now has 41 Megs:
Evolution files and most of firefox, including the /usr/lib/firefox<version>/
data files, dictionaries, jar files, xpt files, plugins, and config &
localisation directories. There is also a pile of libraries and weird things
like ChangeLog files (all readable), and some cache or icon pics. There is also
wine stuff, some compile stuff, and some openoffice directories. The distro is
entirely hosed at this stage. I have been compiling (with F7) for other
partitions, but all of Fedora itself is rpms.

Some interesting points:  

Fuller description on
http://forums.fedoraforum.org/showthread.php?p=914454#post914454

This is a lightly used box. I use Evolution (lightly), Firefox, Openoffice, and
a bit of wine stuff. No multimedia really. Much compiling has been done under
Fedora on sda8 & sda9, but those partitions came through virtually unscathed.
The hard disk is quite new. Memory has been tested. All files in lost+found
seem complete and readable. This doesn't seem to be a hard disk crash. The disk
behaves on the old ide driver, so I suspect the new legacy ide driver in the
kernel. The old ide driver resets and manages better than the new. E2fsck gave
me every conceivable error - cross linked inode blocks, count errors,
allocation errors, orphans, illegal inode blocks - and went back and restarted
itself more than once. F7 has a separate /home partition, but that came through
OK damage-wise; its lost+found is empty.

Material hardware is the Via 8235 southbridge, for which I have a datasheet :-D.
The hard disk is an ST38215a, 80 Gig, on an 80-wire ribbon cable. Bus speed is
166MHz (in bios). This will probably give 33MHz ide & pci (which are
separated). No SATA.

Version-Release number of selected component (if applicable):
Since kernel 2.6.23.1-21

How reproducible:
Just use the box!

Steps to Reproduce:
1. Boot up kernel 2.6.23
2. If you're impatient, start X
3. Do things - browse a little, & answer email. Play minesweeper.
  
Actual results:
Box freezes periodically, with an error shown on stdout which repeats until you
kill it, by either the three fingered salute or switching off. Then it freezes
on bootup mounting / until another distro is loaded and the disk is exorcised.
To date I have not seen one bad block on this disk. The old ide driver, asked
to read the same disk, goes through a couple of errors, resets, and picks up
pretty well.

Expected results:
None of the above.

Additional info:
I'll leave it as is for the moment until I hear back on this in case somebody
wants data.

Comment 1 Declan Moriarty 2007-12-03 10:22:59 UTC
Created attachment 275601
error from syslog

Comment 2 Declan Moriarty 2007-12-06 09:21:55 UTC
Hmmmph!

Nobody has even read this, by the look of it. Nobody cares that F7 is about as
reliable here as windoze. Here's "/usr/lib/libwine.so.1" from the offending
partition: 
#!/bin/sh

for i in $HOME/.config/menus/applications-merged/*menu ; do
     sed -i -e 's:<Name>wine-wine</Name>:<Name>Wine</Name>:g' \
        -e ':<Directory>wine-wine</Directory>:<Directory>wine</Directory>:g' "$i"
done

Somebody tell me that's normal? Oh, I forgot - nobody's reading this stuff
/goes off to selectively reinstall - windows style :-(.
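
A quick way to spot files like this one is to check whether something named like a shared library actually starts with the ELF magic bytes. The following is a minimal sketch, not a Fedora tool; the function names `classify` and `suspicious_libraries` are made up for illustration.

```python
from pathlib import Path

ELF_MAGIC = b"\x7fELF"  # the first four bytes of every ELF object


def classify(path):
    """Return 'elf', 'script', or 'other' based on the file's first bytes."""
    with open(path, "rb") as fh:
        head = fh.read(4)
    if head == ELF_MAGIC:
        return "elf"
    if head[:2] == b"#!":
        return "script"
    return "other"


def suspicious_libraries(libdir):
    """List names of files that look like shared objects but are not ELF."""
    bad = []
    for p in sorted(Path(libdir).iterdir()):
        # ".so" in the name covers both libfoo.so and libfoo.so.1
        if p.is_file() and ".so" in p.name and classify(p) != "elf":
            bad.append(p.name)
    return bad
```

Calling `suspicious_libraries("/usr/lib")` would list candidates for a closer look. A hit is only a hint: some legitimate `.so` files (GNU ld linker scripts, for example) are plain text, so each result needs checking against the package that owns it.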

Comment 3 Chuck Ebbert 2007-12-06 22:53:30 UTC
Can you post the entire dmesg from bootup until it errors?

Comment 4 Declan Moriarty 2007-12-07 09:17:43 UTC
Created attachment 280811
Tree of unusual lost+found stuff

Comment 5 Declan Moriarty 2007-12-07 15:44:55 UTC
Created attachment 281301
Dmesg output

Good to hear from you. I am not able to reproduce the error, as I reinstalled,
and it now doesn't run, it limps along :-(. The tree output shows unusual &
complete files in lost+found. dmesg also attached, but no error today.
This is an ext3 filesystem problem affecting X programs on a new disk. Affected:
Mozilla, wine (minesweeper and Watchtower Library, both aok in wine),
Openoffice, python or yum, and some init symlinks. It also picked up stray
files from some compiling going on on a separate partition. The zips there
(mozilla bits
afaict) expand without error. A copy of the MIT license from python is readable
and perfect. There's a disproportionate amount of xml, xul, and java there. 
Much of this stuff may have been read, but should _never_ have been written to.
I certainly didn't. But huge toolchain compiles were behaving faultlessly on a
different partition of the same hard disk. What I can't do is give you the
other missing ingredient on a plate. I will also attach the errors from
/var/log/messages with the context.

Comment 6 Declan Moriarty 2007-12-07 16:22:40 UTC
Created attachment 281341
grep -C10 frozen /var/log/messages > errors.out

The word 'frozen' is on the 1st line of the error output. In this file, when
occurrences are fewer than 20 lines apart, the error is continuing. Watch the
UDMA speed.
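
The "fewer than 20 lines apart" rule above can be applied mechanically. This is a rough sketch under that stated rule only; `frozen_episodes` and its `max_gap` parameter are hypothetical names, and nothing here depends on the exact wording of the kernel messages beyond the word 'frozen'.

```python
def frozen_episodes(lines, max_gap=20):
    """Group 1-based numbers of lines containing 'frozen' into episodes.

    Two hits belong to the same episode when they are fewer than
    max_gap lines apart, i.e. the error is still continuing.
    """
    episodes = []
    for number, line in enumerate(lines, start=1):
        if "frozen" not in line:
            continue
        if episodes and number - episodes[-1][-1] < max_gap:
            episodes[-1].append(number)   # same run of errors
        else:
            episodes.append([number])     # a new, separate freeze
    return episodes
```

Feeding it the lines of errors.out (for example `frozen_episodes(open("errors.out").read().splitlines())`) gives one list per freeze, so long lists mark the periods where the error kept repeating.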

Comment 7 Declan Moriarty 2007-12-15 14:57:26 UTC
Footnote to this:

Nobody here paid any attention. It eventually got picked up through nabble.com
by the kernel guys. If you have a similar bug, post it on bugzilla.kernel.org
and reference this one.

Comment 8 Christopher Brown 2008-01-16 04:38:24 UTC
Hello,

I'm reviewing this bug as part of the kernel bug triage project, an attempt to
isolate current bugs in the Fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

Thank you for filing the bug - I am CC'ing myself to this bug and will try and
assist you in resolving it if I can.

There hasn't been much activity on this for a while. Could you tell me if you
are still having problems with the latest kernel?

Does comment #7 indicate this has been resolved or is there any upstream bug
filed at the kernel.org bugzilla which can be referenced?

Comment 9 Declan Moriarty 2008-01-16 09:32:05 UTC
Comment #7 was the outcome of my discussions with Theodore Ts'o on the kernel
list as he wants to run this down.

The current situation is this: I gather the via section of the PATA ide driver
has bugs which Theodore wants to sort if he can. I failed to reproduce the problem
after a reinstall, and Fedora 7 failed to reinstall properly. Tired of the
deafening silence here, and such a dodgy system, I formatted and overwrote the
F-7 partition with a Slackware 12.0. Slack and I are making friends.

My comment #4 has a tree of the 41 Megs of stuff in lost+found. That is now your
best information. This tree shows X apps were hammered (/usr/lib/libwine.so.1
became a script, /usr/lib/mozilla/ went awol altogether) while at the same time
heavy compiling of an lfs-like distribution behaved faultlessly in a console.
Mozilla, openoffice, evolution were always open, and wine a good bit of the time.

Fedora had sda5 as /home, but that was untouched. This is odd considering
~/.wine & ~/.evolution were base dirs. Many other partitions were mounted, but
only / got borked. The chipset is Via KT400-333 MHz (=166 MHz) which is a
relative of the famous MVP3 southbridges with the hardware problem. Only
read-only stuff got done in. Data was unharmed. I would get a disk crash, a
reboot would follow, the system wouldn't mount the disk, and froze. I would boot
into another system and run e2fsck -cvy or somesuch. Lost+found would fill up
some more, and we'd be going again.
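
To track how much lost+found grows after each e2fsck run, the file count and total size can be recorded each time. A minimal sketch, assuming the partition is mounted and readable (lost+found normally needs root); `summarize_tree` is an illustrative name, not an existing utility.

```python
import os


def summarize_tree(root):
    """Return (file_count, total_bytes) for regular files under root."""
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so a link's target is not counted
            if os.path.isfile(path) and not os.path.islink(path):
                count += 1
                total += os.path.getsize(path)
    return count, total
```

Running `summarize_tree("/lost+found")` before and after a repair shows how much was reconnected by that pass.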

Comment 10 Christopher Brown 2008-01-16 13:35:11 UTC
Hello Declan,

Okay, thanks for the update. Is the Slackware system exhibiting the same issues?
Are you still able to test a Fedora system? I can assign this over to the PATA
kernel maintainers but only if you are still able to test with a Fedora system -
the latest release would be good. In the first instance they would need the
following as separate text/plain attachments to this bug:

# dmidecode
# lspci -vvxxx
# dmesg

Cheers
Chris
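
The three outputs requested above can be gathered into separate plain-text files in one go. This is only a sketch: `collect_reports` and the `REPORTS` table are invented for illustration, and it assumes the three tools are installed and that the script runs as root (dmidecode, and the config-space dump from lspci, normally need it).

```python
import subprocess
from pathlib import Path

# The commands asked for in the comment above, keyed by output file name.
REPORTS = {
    "dmidecode": ["dmidecode"],
    "lspci": ["lspci", "-vvxxx"],
    "dmesg": ["dmesg"],
}


def collect_reports(outdir, commands=REPORTS):
    """Run each command and save its stdout to <outdir>/<name>.txt."""
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name, argv in commands.items():
        # check=False: keep whatever output there is even on a non-zero exit
        result = subprocess.run(argv, capture_output=True, text=True, check=False)
        target = out / f"{name}.txt"
        target.write_text(result.stdout)
        written.append(target)
    return written
```

Each resulting .txt file can then be attached to the bug on its own as text/plain.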

Comment 11 Declan Moriarty 2008-01-17 09:24:15 UTC
Created attachment 291974
dmidecode, dmesg, and lspci outputs

dmidecode, lspci and dmesg outputs - attached (from slackware). As for your
questions:
No other system (I have a few) displays these symptoms. Fedora 7 before the
2.6.23.1 update didn't either. The old driver (hda) is fine, it was the new
one(sda). I can clear a partition and install any version of Fedora. Testing
for this is another matter, as I was simply living in the system, doing basic
stuff while compiling in a console (usually not an xterm). I now believe that
something in the few hundred megabytes of  updates I pulled in with the
2.6.23.1 kernel made the system unstable, and that this, along with whatever
bugs exist in the new driver caused the loss of binaries. I sat here with the
system hosed for a week waiting for you guys to wake up, but it never happened.
If you have something automatic to test for such a problem it's worth doing.

Comment 12 Christopher Brown 2008-01-17 15:06:18 UTC
Hello Declan,

I appreciate it's frustrating (and thank you for providing the requested
information) but the fact is that the Fedora project is run by people who mostly
volunteer their time (me included) and if you persist in being rude I'm happy to
focus my time elsewhere. Now then, a few things to test:

1) It would be better to try with a Fedora 8 install or even an updated spin
from the fedora unity project:

http://spins.fedoraunity.org/spins

as the latter provide critical updates to the original released Fedora 8 which
resolve some installation issues.

2) http://fedoraproject.org/wiki/KernelCommonProblems

This is a website which has numerous suggestions on how to resolve install and
boot problems - I'm afraid there always will be some.

3) If an attempted install is still failing for you then try adding:

libata.dma=0

to the boot options
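
After rebooting with an extra option, it is worth confirming that the option really reached the kernel by reading /proc/cmdline. A small sketch; `boot_option` is a hypothetical helper, and whether a given installed kernel honours libata.dma is not something this thread confirms.

```python
def boot_option(cmdline, key):
    """Look up key on a kernel command line.

    Returns the value for key=value, '' for a bare flag, None if absent.
    """
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        if name == key:
            return value if sep else ""
    return None
```

For example, `boot_option(open("/proc/cmdline").read(), "libata.dma")` should return "0" if the option was typed correctly at the boot prompt, and None if it was lost along the way.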

Let me know how you get on. Also in future, please attach information as I
indicated as separate text/plain attachments to this bug. tgz files just create
several more steps for us to access the debugging information.

Comment 13 Declan Moriarty 2008-01-17 16:27:34 UTC
I apologise if I'm seen as rude, which was not my intention. I suppose I was
trying to say that if notice had been taken of the bug report, I would have been
able to avoid the reinstall, and this would have been easier.

If nobody else is having this issue, I feel we should let it die, and close this
bug. Time is not particularly on my side ATM. I have a peculiar early revision
of the Via 82c686 Southbridge in a 2004 board. It has 6 usb ports (vs the usual
4) and two of these have log spam on them because they are non compliant. That
resulted in a revision to ehci-hcd adding the ignore_oc=1 option to the module
around 2.6.20. Via's response was to disable 2 of the ports quietly. The APIC is
broken and every device gets the same 2 halfassed IRQs so nothing works unless I
disable it on bootup. ACPI is dodgy at best. So it's pretty clear where the
problem really is - MY HARDWARE.

OTOH, if there's some issue you know is there and you just need a box like mine
showing the symptoms (like the ehci-hcd maintainer did), I'm game. Whatever you
want installed or tried I will do. In that case, the code was patched to read
the vital registers into syslog, and I would boot and plug/unplug into usb. I
added notes on what I was doing and sent it all off to the maintainer. 

I personally liked Fedora. I feel sure that I could install F-8 like I installed
FC5 & F7 and they would work fine for some months. Then after some update, I'd
be in a mess again, exactly as happened with F7 and FC5. The F7 one was
spectacular - hence the bug report. 



Comment 14 Christopher Brown 2008-01-17 16:33:38 UTC
Okay, as you feel your hardware is at issue I am closing NOTABUG. I would
recommend checking for a BIOS upgrade if one is available and also testing what
I suggested above.

Cheers
Chris

