Bug 502970

Summary: ext3 filesystem error reported on f10 system with crypt/soft-raid1 config
Product: Fedora
Component: kernel
Version: 10
Hardware: All
OS: Linux
Status: CLOSED WONTFIX
Severity: high
Priority: low
Keywords: Triaged
Reporter: Kai Engert (:kaie) (inactive account) <kengert>
Assignee: Kernel Maintainer List <kernel-maint>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: agk, itamar, kernel-maint
Doc Type: Bug Fix
Last Closed: 2009-12-18 09:29:53 UTC

Attachments:
/proc/interrupts - fedora 9 kernel
/proc/interrupts - fedora 10 kernel

Description Kai Engert (:kaie) (inactive account) 2009-05-28 01:01:05 UTC
Description of problem:
System stops with filesystem error.

Version-Release number of selected component (if applicable):
kernel-2.6.27.24-170.2.68.fc10.i686


How reproducible:
Run a Fedora 10 system for a while.
Notice that the mouse pointer slows down and lags in movement.
Quit all apps.
Type poweroff.
The system stalls on the console and eventually produces a filesystem error message (see attachment as an example).

Additional info:


As nobody else seems to have reported such a problem, it might be caused by my unusual system setup.

I've been running Fedora 10 on my primary notebook for 6 months.
Whenever I use an f10 kernel, I run into this bug very soon.

When using a "Fedora *9*" kernel, it runs perfectly stable.
Therefore I used the f9 kernel most of the time.


Description of system setup
===========================
/dev/sda is the internal notebook harddisk.

fdisk -l /dev/sda

/dev/sda1   *           1        2295    18434556    7  HPFS/NTFS
/dev/sda2            2296        2320      200812+  83  Linux
/dev/sda3            2321        4310    15984675   83  Linux
/dev/sda4            4311       24321   160738357+   5  Extended
/dev/sda5            4311       23580   154786243+  fd  Linux raid autodetect
/dev/sda6           23581       24065     3895731   fd  Linux raid autodetect
/dev/sda7           24066       24193     1028128+  82  Linux swap / Solaris
/dev/sda8           24194       24321     1028128+  83  Linux


/etc/fstab

UUID=407b5126-2e34-4103-ad24-5ac38e4bfcb0 /                       ext3    defaults        1 1
UUID=7c516bbb-4306-432a-a57b-14eabb92d594 /boot                   ext3    defaults        1 2
/dev/mapper/tmpcrypt    /tmp                    ext2    defaults        0 0
/dev/mapper/swapcrypt   swap                    swap    defaults        0 0
/dev/mapper/luks-24990861-ec21-46a7-9155-5a1ffe86bbef /private ext3 defaults 0 0
/dev/mapper/luks-4c0a967e-48c8-4cc3-8bf1-ebb5ff9f929f /home ext3 defaults 0 0
tmpfs                   /dev/shm                tmpfs   defaults        0 0
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
sysfs                   /sys                    sysfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0


/proc/mdstat

Personalities : [raid1]
md0 : active raid1 sda5[0]
      154786176 blocks [2/1] [U_]

md1 : active raid1 sda6[0]
      3895616 blocks [2/1] [U_]

unused devices: <none>
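
The "[2/1] [U_]" markers above mean each array expects two members but has only one active. As a rough illustration (not a tool used in this report), a minimal Python sketch could pick out such degraded arrays from the quoted output. The function name degraded_arrays is made up for this example.

```python
import re

# Sample copied from the /proc/mdstat output quoted in this report.
MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sda5[0]
      154786176 blocks [2/1] [U_]

md1 : active raid1 sda6[0]
      3895616 blocks [2/1] [U_]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return {array: (expected, active)} for arrays with a missing member."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            # Remember which array the following status line belongs to.
            current = header.group(1)
            continue
        counts = re.search(r"\[(\d+)/(\d+)\]", line)
        if current and counts:
            expected, active = int(counts.group(1)), int(counts.group(2))
            if active < expected:
                result[current] = (expected, active)
            current = None
    return result


if __name__ == "__main__":
    for name, (expected, active) in degraded_arrays(MDSTAT).items():
        print(f"{name}: {active} of {expected} members active")
```

Both md0 and md1 are reported as degraded here, which matches the intentional single-disk operation described further down.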


/etc/crypttab

luks-24990861-ec21-46a7-9155-5a1ffe86bbef /dev/md1 none
luks-4c0a967e-48c8-4cc3-8bf1-ebb5ff9f929f /dev/md0 none
tmpcrypt                /dev/sda8       /dev/urandom tmp
swapcrypt               /dev/sda7       /dev/urandom swap


Text description of setup:

- both /tmp and swap are encrypted partitions, with random keys.
- /boot is plain partition with ext3
- / is plain partition with ext3
- /dev/md0 is software RAID 1
- /dev/md1 is software RAID 1
- /private is ext3 on top of /dev/md1
- /home is ext3 on top of /dev/md0
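
To make the layering explicit, the following hedged sketch joins the /etc/crypttab and /etc/fstab entries quoted above and shows which block device sits underneath each /dev/mapper mount. The helper names (fields, storage_stack) are invented for this illustration; the input lines are copied from this report.

```python
# Entries copied from the /etc/crypttab and /etc/fstab listings in this report.
CRYPTTAB = """\
luks-24990861-ec21-46a7-9155-5a1ffe86bbef /dev/md1 none
luks-4c0a967e-48c8-4cc3-8bf1-ebb5ff9f929f /dev/md0 none
tmpcrypt                /dev/sda8       /dev/urandom tmp
swapcrypt               /dev/sda7       /dev/urandom swap
"""

FSTAB = """\
/dev/mapper/tmpcrypt    /tmp                    ext2    defaults        0 0
/dev/mapper/swapcrypt   swap                    swap    defaults        0 0
/dev/mapper/luks-24990861-ec21-46a7-9155-5a1ffe86bbef /private ext3 defaults 0 0
/dev/mapper/luks-4c0a967e-48c8-4cc3-8bf1-ebb5ff9f929f /home ext3 defaults 0 0
"""

MAPPER_PREFIX = "/dev/mapper/"


def fields(text):
    """Yield whitespace-separated fields for each non-empty, non-comment line."""
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            yield line.split()


def storage_stack(crypttab, fstab):
    """Map each mount point to (filesystem type, device below the crypt layer)."""
    backing = {entry[0]: entry[1] for entry in fields(crypttab)}
    stack = {}
    for device, mount_point, fstype, *_ in fields(fstab):
        if device.startswith(MAPPER_PREFIX):
            name = device[len(MAPPER_PREFIX):]
            if name in backing:
                stack[mount_point] = (fstype, backing[name])
    return stack


if __name__ == "__main__":
    for mount_point, (fstype, device) in storage_stack(CRYPTTAB, FSTAB).items():
        print(f"{mount_point}: {fstype} on dm-crypt on {device}")
```

According to these files, /home and /private are ext3 on an encrypted mapping on top of md0 and md1 respectively, while / and /boot involve neither crypt nor RAID.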

The hard drive which hosts the mirror partitions for the RAID 1 arrays is only temporarily available. I use this as a kind of easy backup.

Whenever I want to synchronize the partitions to the second disk, I insert it, change to runlevel 1, and use "mdadm" to add the partitions back to the RAIDs.

After synchronization is done, when I want to unplug the second disk, I go back to runlevel 1, unmount the partitions (to ensure the mirrored filesystem is sane), then use "mdadm" with --fail and --remove.

This explains why my /proc/mdstat lists the RAID 1 drives with a missing disk. It's intended.
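
For readers unfamiliar with this attach/detach cycle, here is a minimal sketch of the mdadm manage-mode command lines it corresponds to. The sketch only builds and prints the commands rather than running them. The member name /dev/sdb5 is hypothetical, since the report does not say how the second disk is named, and the helper names are invented for this example.

```python
def attach_commands(array, member):
    """Command to add a mirror partition back, which starts a resync."""
    return [["mdadm", array, "--add", member]]


def detach_commands(array, member):
    """Command to mark the mirror partition failed and then remove it."""
    return [["mdadm", array, "--fail", member, "--remove", member]]


if __name__ == "__main__":
    # /dev/sdb5 is an assumed device name for the partition on the second disk.
    for command in attach_commands("/dev/md0", "/dev/sdb5"):
        print(" ".join(command))
    for command in detach_commands("/dev/md0", "/dev/sdb5"):
        print(" ".join(command))
```

Running these for real requires root, and the array would normally be left to finish resynchronizing (visible in /proc/mdstat) before the detach step.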


The original setup was made with Fedora 9.
When I upgraded to Fedora 10, I changed the above config files to use the UUID (instead of md0/md1).


The hardware is a Lenovo R61, less than 1.5 years old, with an Intel Core 2 Duo T7500 and 4 GB of physical RAM, running the 32-bit version of Fedora.

The system works rock solid with the Fedora 9 kernel.
I see problems only when using a Fedora 10 kernel.

Note that in the past, early during the F10 lifetime, this once even crashed my whole filesystem. I had to go back to a backup. I had talked about it on IRC, but didn't have sufficient debug info.

I've now uploaded an external camera screenshot that shows the console errors.
http://kuix.de/misc/2009-05-28-01_20-1.jpg

Comment 1 Kai Engert (:kaie) (inactive account) 2009-05-28 01:06:23 UTC
*** Bug 489783 has been marked as a duplicate of this bug. ***

Comment 2 Kai Engert (:kaie) (inactive account) 2009-05-28 01:11:20 UTC
When I last experienced this (a couple of weeks ago), I booted a live CD and ran a forced manual fsck on the file system. No problems were reported; everything was clean.

When I booted with the f9 kernel today (directly after the crash), I didn't see any reports of filesystem trouble during the startup messages either. The only thing I saw was an "inode time future" message or similar, which got fixed.

Comment 3 Kai Engert (:kaie) (inactive account) 2009-05-28 01:16:42 UTC
So, this time, the first reported problem was in /dev/sda3, the plain partition containing the ext3 filesystem mounted at / (no crypt, no raid).

Afterwards we also got problem reports for dm-1 and dm-2, two of the encrypted partitions.

Comment 4 Chuck Ebbert 2009-05-28 22:48:24 UTC
The current Fedora 9 and Fedora 10 kernels are both based on 2.6.27.24, so it doesn't seem right that one should work and the other not work. Or were you using an older Fedora 9 kernel when everything worked?

Comment 5 Kai Engert (:kaie) (inactive account) 2009-05-29 04:45:34 UTC
The working Fedora 9 kernel is 2.6.27.19-78.2.30.fc9

I had problems with all the fc10 kernels.
Kernels like 2.6.27.21-170.2.56.fc10 and earlier gave me problems, too.

Comment 6 Chuck Ebbert 2009-05-31 20:37:42 UTC
Can you post the contents of /proc/interrupts with the f10 kernel running, and also from the f9 kernel if they are different?
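
One way to compare two /proc/interrupts snapshots is sketched below. It lists interrupt lines whose controller/device description differs between the two captures, or that appear in only one of them. The sample snapshots are invented placeholders, not the contents of the attachments on this bug, and the function names are made up for this example.

```python
# Invented placeholder snapshots; NOT the data attached to this bug.
SNAPSHOT_A = """\
           CPU0       CPU1
  0:        100          0   IO-APIC-edge      timer
 16:         50         20   IO-APIC-fasteoi   example-a
ERR:          0
"""

SNAPSHOT_B = """\
           CPU0       CPU1
  0:        120          0   IO-APIC-edge      timer
 16:         40         10   PCI-MSI-edge      example-a
ERR:          0
"""


def parse_interrupts(text):
    """Return {label: (total count, description)} from /proc/interrupts text."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header row names one column per CPU
    table = {}
    for line in lines[1:]:
        if ":" not in line:
            continue
        label, rest = line.split(":", 1)
        tokens = rest.split()
        counts = []
        # Leading numeric tokens are per-CPU counts; the rest is descriptive.
        while tokens and len(counts) < ncpu and tokens[0].isdigit():
            counts.append(int(tokens.pop(0)))
        table[label.strip()] = (sum(counts), " ".join(tokens))
    return table


def changed_sources(first, second):
    """Labels whose description differs or that exist in only one snapshot."""
    diff = {}
    for label in sorted(set(first) | set(second)):
        desc_first = first.get(label, (0, None))[1]
        desc_second = second.get(label, (0, None))[1]
        if desc_first != desc_second:
            diff[label] = (desc_first, desc_second)
    return diff


if __name__ == "__main__":
    a = parse_interrupts(SNAPSHOT_A)
    b = parse_interrupts(SNAPSHOT_B)
    for label, (old, new) in changed_sources(a, b).items():
        print(f"IRQ {label}: {old!r} -> {new!r}")
```

Raw counts are deliberately ignored in the comparison, because they depend on uptime and would differ between any two boots.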

Comment 7 Kai Engert (:kaie) (inactive account) 2009-06-03 23:18:30 UTC
Created attachment 346473
/proc/interrupts - fedora 9 kernel

Comment 8 Kai Engert (:kaie) (inactive account) 2009-06-03 23:19:25 UTC
Created attachment 346474
/proc/interrupts - fedora 10 kernel


Note, this kernel has been running only a short period of time.
(I didn't wait for the problem to show up.)

Comment 9 Kai Engert (:kaie) (inactive account) 2009-06-03 23:25:22 UTC
I decided to do another experiment.
I'm now running the latest f9 kernel available from updates, and will give feedback in a couple of days.
This should answer the question:
  Is the problem in the kernel versions later than 2.6.27.19?
  Or is the problem in the patches that differ between f9 and f10?

(I guess the applied patches differ between f9 and f10)


2.6.27.19-78.2.30.fc9    good

2.6.27.24-78.2.53.fc9    ... currently testing

2.6.27.24-170.2.68.fc10  bad
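
The reasoning behind this experiment can be sketched in a few lines of Python: split each package version into its upstream base and its Fedora release, then check which pairs share an upstream base. The parsing pattern is an assumption that fits the three version strings listed here, not a general RPM version parser, and the function names are invented for this example.

```python
import re

# Version strings and statuses as listed in this comment.
KERNELS = {
    "2.6.27.19-78.2.30.fc9": "good",
    "2.6.27.24-78.2.53.fc9": "testing",
    "2.6.27.24-170.2.68.fc10": "bad",
}


def split_kernel(version):
    """Split 'upstream-release.dist' into (upstream, release, dist)."""
    match = re.fullmatch(r"([\d.]+)-([\d.]+)\.(fc\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized kernel version: {version}")
    return match.groups()


def same_upstream(first, second):
    """True if both packages are built on the same upstream kernel release."""
    return split_kernel(first)[0] == split_kernel(second)[0]


if __name__ == "__main__":
    for version, status in KERNELS.items():
        upstream, release, dist = split_kernel(version)
        print(f"{upstream:<10} {release:<10} {dist:<5} {status}")
```

If the fc9 and fc10 builds of 2.6.27.24 behave differently, the upstream base is the same in both, so the difference would have to come from the distribution-specific patches or build configuration.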

Comment 10 Kai Engert (:kaie) (inactive account) 2009-06-04 22:37:39 UTC
So far the latest f9 kernel runs stably.

I suspect the cause is contained in the differences between

2.6.27.24-78.2.53.fc9
and
2.6.27.24-170.2.68.fc10

Comment 11 Kai Engert (:kaie) (inactive account) 2009-06-18 01:14:27 UTC
Could this problem be related to unstable clocksource?

I'm on pentium-m, dual core.
My system reported "clocksource tsc unstable".

Elsewhere on the web I read that it's expected the kernel will automatically switch to a more reliable clocksource. But my system didn't report any such switching to a different one.

I changed my grub.conf to include clocksource=hpet on the kernel line.
I just realized that I've been running the most recent f10 kernel since yesterday... Maybe that fixed my problem?

(I'll report back later if the system keeps being stable.)
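
Whether the kernel actually switched away from tsc can be checked through sysfs rather than by scanning the boot messages. The sketch below assumes the standard clocksource files under /sys/devices/system/clocksource/clocksource0 are present on the running kernel; the function name read_clocksource is made up for this example.

```python
from pathlib import Path

# Standard sysfs location; assumed to exist on the kernel being checked.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksource(base=CLOCKSOURCE_DIR):
    """Return (current clocksource, list of available clocksources)."""
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


if __name__ == "__main__":
    try:
        current, available = read_clocksource()
    except OSError as exc:
        print(f"clocksource sysfs files not readable: {exc}")
    else:
        print(f"current: {current}; available: {', '.join(available)}")
```

With clocksource=hpet on the kernel line, the current value would be expected to read hpet if the override took effect.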

Comment 12 Bug Zapper 2009-11-18 10:03:37 UTC
This message is a reminder that Fedora 10 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 10.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '10'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 10's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 10 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 13 Bug Zapper 2009-12-18 09:29:53 UTC
Fedora 10 changed to end-of-life (EOL) status on 2009-12-17. Fedora 10 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.