Red Hat Bugzilla – Bug 502970
ext3 filesystem error reported on f10 system with crypt/soft-raid1 config
Last modified: 2009-12-18 04:29:53 EST
Description of problem:
System stops with filesystem error.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Run a Fedora 10 system for a while.
2. Note that the mouse pointer slows down and lags in movement.
3. Quit all apps.
4. Note that the system stalls on the console and eventually produces a filesystem error message (see attachment as an example).
As nobody else seems to have reported such a problem, it might be caused by my unusual system setup.
I've been running Fedora 10 on my primary notebook for 6 months.
Whenever I use an f10 kernel, I run into this bug very soon.
When using a "Fedora *9*" kernel, it runs perfectly stable.
Therefore I used the f9 kernel most of the time.
Description of system setup
/dev/sda is the internal notebook harddisk.
fdisk -l /dev/sda
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1        2295    18434556    7  HPFS/NTFS
/dev/sda2            2296        2320      200812+  83  Linux
/dev/sda3            2321        4310    15984675   83  Linux
/dev/sda4            4311       24321   160738357+   5  Extended
/dev/sda5            4311       23580   154786243+  fd  Linux raid autodetect
/dev/sda6           23581       24065     3895731   fd  Linux raid autodetect
/dev/sda7           24066       24193     1028128+  82  Linux swap / Solaris
/dev/sda8           24194       24321     1028128+  83  Linux
/etc/fstab:
UUID=407b5126-2e34-4103-ad24-5ac38e4bfcb0 / ext3 defaults 1 1
UUID=7c516bbb-4306-432a-a57b-14eabb92d594 /boot ext3 defaults 1 2
/dev/mapper/tmpcrypt /tmp ext2 defaults 0 0
/dev/mapper/swapcrypt swap swap defaults 0 0
/dev/mapper/luks-24990861-ec21-46a7-9155-5a1ffe86bbef /private ext3 defaults 0 0
/dev/mapper/luks-4c0a967e-48c8-4cc3-8bf1-ebb5ff9f929f /home ext3 defaults 0 0
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
/proc/mdstat:
Personalities : [raid1]
md0 : active raid1 sda5
154786176 blocks [2/1] [U_]
md1 : active raid1 sda6
3895616 blocks [2/1] [U_]
unused devices: <none>
/etc/crypttab:
luks-24990861-ec21-46a7-9155-5a1ffe86bbef /dev/md1 none
luks-4c0a967e-48c8-4cc3-8bf1-ebb5ff9f929f /dev/md0 none
tmpcrypt /dev/sda8 /dev/urandom tmp
swapcrypt /dev/sda7 /dev/urandom swap
Text description of setup:
- both /tmp and swap are encrypted partitions, with random keys.
- /boot is plain partition with ext3
- / is plain partition with ext3
- /dev/md0 is software RAID 1
- /dev/md1 is software RAID 1
- /private is ext3 on top of /dev/md1
- /home is ext3 on top of /dev/md0
The harddrive which hosts the mirror partitions for the RAID 1 arrays is only temporarily available. I use this as a kind of easy backup.
Whenever I want to synchronize the partitions to the second disk, I insert it, change to runlevel 1, and use "mdadm" to add the partitions back to the RAIDs.
After synchronization is done, when I want to unplug the second disk, I go back to runlevel 1, unmount the partitions (to ensure the mirrored filesystem is sane), then use "mdadm" with --fail and --remove.
This explains why my /proc/mdstat lists the RAID 1 drives with a missing disk. It's intended.
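For anyone checking a similar setup: a mirror that is running without its second member shows up in /proc/mdstat as a [2/1] counter. The following is a minimal Python sketch, not part of the original report. The function name degraded_arrays is made up for illustration, and the sample text is the mdstat output quoted above, with the indentation the kernel normally prints added back.

```python
import re


def degraded_arrays(mdstat_text):
    """Return {array_name: (expected, active)} for arrays missing members."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        # Array header, e.g. "md0 : active raid1 sda5".
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # Status line, e.g. "154786176 blocks [2/1] [U_]".
        status = re.search(r"\[(\d+)/(\d+)\]", line)
        if current and status:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected:
                result[current] = (expected, active)
            current = None
    return result


# Sample input: the mdstat contents from this report.
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sda5
      154786176 blocks [2/1] [U_]
md1 : active raid1 sda6
      3895616 blocks [2/1] [U_]
unused devices: <none>
"""

if __name__ == "__main__":
    for name, (expected, active) in sorted(degraded_arrays(SAMPLE).items()):
        print(f"{name}: {active} of {expected} members active")
```

On a live system the text would come from reading /proc/mdstat instead of SAMPLE. In this setup both arrays are expected to be reported, since the second disk is normally absent.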
The original setup was made with Fedora 9.
When I upgraded to Fedora 10, I changed the above config files to use the UUID (instead of md0/md1).
The hardware is a Lenovo R61, less than 1.5 years old, Intel Core Duo, T7500, 4 GB physical RAM, running the 32 bit version of Fedora.
The system works rock solid with the Fedora 9 kernel.
I see problems only when using a Fedora 10 kernel.
Note that in the past, early during the F10 lifetime, this once even crashed my whole filesystem. I had to go back to a backup. I had talked about it on IRC, but didn't have sufficient debug info.
I've now uploaded an external camera screenshot that shows the console errors.
*** Bug 489783 has been marked as a duplicate of this bug. ***
When I last experienced this (a couple of weeks ago), I booted a live CD and ran a manual fsck (forced) on the file system. No problems were reported, everything clean.
When I booted (today) with the f9 kernel (directly after the crash), I didn't see any reports about filesystem trouble either (during the startup messages). The only thing I saw was an "inode time future" or similar, which got fixed.
So, this time, the first reported problem was in /dev/sda3, the plain partition containing the ext3 filesystem mounted at / (no crypt, no raid).
Afterwards we also got problem reports for dm-1 and dm-2, two of the encrypted partitions.
The current Fedora 9 and Fedora 10 kernels are both based on 188.8.131.52, so it doesn't seem right that one should work and the other not work. Or were you using an older Fedora 9 kernel when everything worked?
The working Fedora 9 kernel is 184.108.40.206-78.2.30.fc9
I had problems with all the fc10 kernels.
Kernels like 220.127.116.11-170.2.56.fc10 and earlier gave me problems, too.
Can you post the contents of /proc/interrupts with the f10 kernel running, and also from the f9 kernel if they are different?
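To compare two /proc/interrupts listings without reading them side by side, something like the following Python sketch could be used. It ignores the per-CPU counters and reports only IRQs whose controller or device assignment differs. The function names and the two sample listings are hypothetical; the samples are made up and are not taken from the attachments in this bug.

```python
def irq_devices(interrupts_text):
    """Map each IRQ label to its trailing description (controller, devices)."""
    lines = interrupts_text.splitlines()
    if not lines:
        return {}
    cpu_count = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        # Drop the per-CPU counters; rows such as ERR have fewer columns
        # and end up with an empty description.
        table[label.strip()] = " ".join(fields[cpu_count:])
    return table


def changed_irqs(old_text, new_text):
    """Return {irq: (old_description, new_description)} where they differ."""
    old, new = irq_devices(old_text), irq_devices(new_text)
    return {
        irq: (old.get(irq), new.get(irq))
        for irq in sorted(set(old) | set(new))
        if old.get(irq) != new.get(irq)
    }


# Made-up sample listings, for illustration only.
OLD = """\
           CPU0       CPU1
  0:        100        200   IO-APIC-edge      timer
 16:         10         20   IO-APIC-fasteoi   uhci_hcd:usb3
ERR:          0
"""
NEW = """\
           CPU0       CPU1
  0:        150        250   IO-APIC-edge      timer
 16:         30         40   IO-APIC-fasteoi   uhci_hcd:usb3, ahci
ERR:          0
"""

if __name__ == "__main__":
    for irq, (before, after) in changed_irqs(OLD, NEW).items():
        print(f"IRQ {irq}: {before!r} -> {after!r}")
```

Only the routing is compared here because the counters depend on uptime and would differ between any two boots.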
Created attachment 346473
/proc/interrupts - fedora 9 kernel
Created attachment 346474
/proc/interrupts - fedora 10 kernel
Note, this kernel has been running only a short period of time.
(I didn't wait for the problem to show up.)
I decided to do another experiment.
I'm now running the latest f9 kernel available from updates, and will give feedback in a couple of days.
This should answer the question:
Is the problem in the kernel version later than 18.104.22.168 ?
Or is the problem in the patches that are different between f9 and f10 ?
(I guess the applied patches differ between f9 and f10)
22.214.171.124-78.2.53.fc9 ... currently testing
So far the latest f9 kernel runs stably.
I suspect the cause is contained in the differences between
Could this problem be related to unstable clocksource?
I'm on pentium-m, dual core.
My system reported "clocksource tsc unstable".
Elsewhere on the web I read that it's expected the kernel will automatically switch to a more reliable clocksource. But my system didn't report any such switching to a different one.
I changed my grub.conf to include clocksource=hpet on the kernel line.
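Whether a clocksource= setting took effect can be checked through sysfs after booting. The following is a minimal Python sketch, assuming the standard sysfs location /sys/devices/system/clocksource/clocksource0; the function name clocksource_status is hypothetical and not part of any tool mentioned in this bug.

```python
from pathlib import Path

# Standard sysfs directory for the system clocksource.
SYSFS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def clocksource_status(sysfs_dir=SYSFS_DIR):
    """Return (current clocksource, list of available clocksources)."""
    sysfs_dir = Path(sysfs_dir)
    current = (sysfs_dir / "current_clocksource").read_text().strip()
    available = (sysfs_dir / "available_clocksource").read_text().split()
    return current, available


if __name__ == "__main__":
    try:
        current, available = clocksource_status()
    except OSError as exc:
        print(f"clocksource information not readable: {exc}")
    else:
        print(f"current:   {current}")
        print(f"available: {' '.join(available)}")
```

If the kernel accepted clocksource=hpet, the current value should read hpet. If it still reads tsc, the parameter was not applied or hpet is not in the available list.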
I just realized that I've been running the most recent f10 kernel since yesterday... Maybe that fixed my problem?
(I'll report back later if the system keeps being stable.)
This message is a reminder that Fedora 10 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 10. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '10'.
Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 10's end of life.
Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 10 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora please change the 'version' of this
bug to the applicable version. If you are unable to change the version,
please add a comment here and someone will do it for you.
Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.
The process we are following is described here:
Fedora 10 changed to end-of-life (EOL) status on 2009-12-17. Fedora 10 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.
If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version.
Thank you for reporting this bug and we are sorry it could not be fixed.