Bug 454848
Summary: unstable file system after kernel update
Product: [Fedora] Fedora
Component: kernel
Version: 8
Hardware: x86_64
OS: Linux
Status: CLOSED INSUFFICIENT_DATA
Severity: high
Priority: low
Reporter: Ralf W. Grosse-Kunstleve <rwgk>
Assignee: Kernel Maintainer List <kernel-maint>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: esandeen
URL: https://www.redhat.com/archives/fedora-list/2008-July/msg01127.html
Doc Type: Bug Fix
Last Closed: 2008-07-15 00:03:30 UTC

Description
Ralf W. Grosse-Kunstleve
2008-07-10 06:49:27 UTC
Created attachment 311450: example output of 99 bin/libtbx.scons runs

I'll take a quick look over the testcase, thanks. For those playing along at home, you need tcsh to run the testcase.

.... and python-devel. :)

Anyway, I ran 100 builds on 2.6.25.9-40.fc8 and saw no errors. On a whim, can you add default_relatime=0 to your boot commandline and see if it makes any difference? Post-boot, grep "default relative atime updates" in dmesg to be sure...

Thanks,
-Eric

> On a whim, can you add default_relatime=0 to your boot commandline and see if it
> makes any difference?
Will do that over the weekend; the machines are too busy right now.
In the meantime, I did this:
1. install fedora 8 from dvd, starting from scratch (i.e. disk repartitioned)
2. yum install tcsh; adduser ralf, no other customizations
3. run 100*bin/libtbx.scons
4. yum update + reboot
5. run 100*bin/libtbx.scons
Step 3. shows no errors.
(I got stuck at step 4 with this: ERROR with rpm_check_debug
but got around this via: yum remove NetworkManager; yum update)
To my surprise, step 5. also shows no errors!
The kernel version after yum update is still as reported yesterday:
2.6.25.9-40.fc8
The only difference to what I did before are the missing customizations
of the system that I usually do: activate NFS (server & client),
NIS, automount, some other misc. things. I'll do that asap and will
report what happens.
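
For anyone trying to reproduce steps 3 and 5 above, a repeated-run harness along the following lines may help. This is only a minimal sketch, not the script used in this report: the function name `run_repeatedly` is hypothetical, and it assumes that a bad run shows up as a non-zero exit status, which may not hold for the failures in the attachment (they may only be visible in the build output).

```python
import subprocess
import sys
from typing import List, Sequence


def run_repeatedly(cmd: Sequence[str], runs: int) -> List[int]:
    """Run cmd `runs` times and return the 1-based indices of failed runs.

    A run counts as failed when the command exits with a non-zero status
    (an assumption; the original failures may need output inspection).
    """
    failed = []
    for i in range(1, runs + 1):
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failed.append(i)
    return failed


if __name__ == "__main__":
    # Usage (hypothetical): python repeat.py <runs> <command> [args...]
    if len(sys.argv) < 3:
        print("usage: repeat.py <runs> <command> [args...]")
    else:
        bad = run_repeatedly(sys.argv[2:], int(sys.argv[1]))
        print(f"{len(bad)} failed run(s): {bad}")
```

Running it once before and once after "yum update + reboot" with the same run count gives two failure lists that can be compared directly.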
Followup to comment #5:

I applied all customizations step by step, rebooting and testing several times. The machine remained stable all the way to the end.

Saturday I ran "yum update" on the Fedora 9 machine, and it is stable now, too, for the first time.

Today I rebuilt another unstable Fedora 8 system from scratch, exactly the same way as the one in comment #5, only that I did everything in one go and didn't reboot+test after each step. At the end it was NOT stable. I have absolutely no clue how this can be, except that "yum update" may have given me different updates on Friday (first machine) and today (second machine). But then again, why should it be "NOT OK", "OK", "NOT OK"? Nothing here really makes sense.

Rebooting the second FC8 machine with the original kernel (2.6.23.1-42.fc8) made it stable again. This is similar to what I observed on another machine mentioned in my original posting.

For completeness, eventually I ran

rpm --install --force kernel-2.6.23.15-137.fc8.x86_64.rpm kernel-devel-2.6.23.15-137.fc8.x86_64.rpm kernel-headers-2.6.23.15-137.fc8.x86_64.rpm

on the machine I did today (just because we've been using that kernel for several months without problems) and it is still stable.

Summary: a few days ago I had three broken machines, today I have them all fixed somehow in three different ways, and I have zero explanations. I'll keep my hands off the systems now, quietly hoping that somebody will somehow find and fix the root cause of the problem, without ever knowing how much trouble it has caused me.

If you wind up with a problematic system again, please do try turning off the default relatime and see how that goes.

Thanks,
-Eric

I'm not sure there's a lot we can do w/o a reproducer but please do keep us up to date, and re-open, if you get more info.

Thanks,
-Eric
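
To confirm after a reboot that default_relatime=0 actually reached the kernel, the boot command line can be inspected. The sketch below is an illustration under stated assumptions: the helper names `parse_cmdline` and `relatime_disabled` are hypothetical, and the parser splits on whitespace only, so it does not handle quoted values that contain spaces. It complements, rather than replaces, the dmesg grep for "default relative atime updates" suggested earlier in this report.

```python
import os
from typing import Dict, Optional


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {name: value}.

    Parameters given without '=' map to None. Quoted values containing
    spaces are not handled (simplifying assumption).
    """
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def relatime_disabled(cmdline: str) -> bool:
    """Return True if default_relatime=0 appears on the command line."""
    return parse_cmdline(cmdline).get("default_relatime") == "0"


if __name__ == "__main__":
    path = "/proc/cmdline"  # standard Linux location of the boot command line
    if os.path.exists(path):
        with open(path) as f:
            print("default_relatime=0 present:", relatime_disabled(f.read()))
    else:
        print(f"{path} not found; is this a Linux system?")
```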