Red Hat Bugzilla – Bug 306801
System freeze with file damage
Last modified: 2008-01-13 21:57:57 EST
Description of problem:
System freezes completely. Scanning the logs only turns up kernel null-pointer bugs
that appear to be linked to the sensors files... see my post
Version-Release number of selected component (if applicable):
Random, but a couple of times per day. Causes file system damage when the system
has locked up and needs a hard reset.
Steps to Reproduce:
1. Impossible to tell; however, I am usually doing something fairly intensive when it
occurs, but never in the same package. I am always listening to music over an NFS
shared mount at the time.
I'm not sure if this is an lm_sensors fault, but it seems that the last sys file
used every time is the hardware monitor file:
This is not an lm_sensors issue, but rather a kernel issue. Changing component to
match. This might be caused by the ACPI code and the w83627hf driver both trying to
access the w83627hf at the same time, in which case the only solution is not
using the w83627hf driver. Have you tried disabling your lm_sensors startup
service, and not loading the w83627hf module later on in any way?
Anyway, this is best handled upstream. Please check whether disabling the w83627hf
driver helps and send a detailed report to firstname.lastname@example.org, where all
lm-sensors development (including the kernel part) gets discussed.
OK, that's all good. I've temporarily disabled lm_sensors completely to see if
life improves, and I'll keep a log of errors. Thanks.
Incidentally, how can I determine this for myself (if at all possible) so I don't
post these bugs to the wrong package?
(In reply to comment #2)
> Incidentally, how can I determine this for myself (if at all possible) so I don't
> post these bugs to the wrong package?
Choosing exactly the right component can be hard, so don't worry about it. You
were pretty close, and you provided a reasonable level of detail in the bug
report, which is really all we ask for.
I see no evidence that the w83627hf driver has anything to do with the crash.
The fact that the last open sysfs file at the time of the crash is
/devices/platform/w83627hf.656/temp2_input simply suggests that you have some
monitoring application (ksensors?) repeatedly polling temperature values; this
does not mean that the w83627hf driver is causing the crash.
Just try running without the w83627hf driver loaded and I guess that your system
will still freeze randomly. The backtraces in the logs suggest a network or
filesystem issue. Maybe a bug in NFS?
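To see which sysfs file each logged oops names, the log can be summarised with a short shell function. This is only a sketch: the function name `last_sysfs_files` is made up for illustration, and it assumes the oops text contains a "last sysfs file:" line and that the log lives in /var/log/messages.

```shell
# last_sysfs_files [LOGFILE]
# Print each distinct path reported after "last sysfs file:" together with
# the number of times it appears, most frequent first.
# Assumption: oopses are logged to /var/log/messages unless LOGFILE is given.
last_sysfs_files() {
    log=${1:-/var/log/messages}
    # Keep only the path that follows the marker, then count duplicates.
    sed -n 's/.*last sysfs file: *//p' "$log" | sort | uniq -c | sort -rn
}
```

If the same hwmon file tops the list every time, that is consistent with a monitoring application polling it constantly, and does not by itself point at the driver.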
This bug should be reopened (for some reason I cannot do that myself.) There's
nothing that can be done "upstream" until we know for sure which "upstream" it is.
Reopening per comment #4.
It has been running flawlessly for several hours, while using the NFS mount and amarok,
with ksensors removed and the lm_ service stopped. I'll reload lm_ but not ksensors
and see how it gets on.
(In reply to comment #6)
> It has been running flawlessly for several hours, while using the NFS mount and amarok,
> with ksensors removed and the lm_ service stopped. I'll reload lm_ but not ksensors
> and see how it gets on.
That's of little use, as hwmon (lm_sensors) drivers do little (nothing, actually)
unless an application is using them.
Please keep testing without the w83627hf driver. It is possible that the bug
you're hitting triggers faster when sysfs is stressed, without the w83627hf
driver being directly responsible.
One way to test this would be to write a simple shell script repeatedly polling
arbitrary sysfs files. Something like:
--- 8< ---
while [ 1 ]; do
	cat /sys/class/net/eth0/statistics/* > /dev/null
	sleep 1    # seconds between polls
done
--- 8< ---
Adjust the sleep value to match what you had in ksensors. Let this script run in
the background and see if the bug triggers again.
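A bounded variant of the same idea can make test runs easier to compare, since it stops after a known number of passes. The following is only a sketch: the function name `poll_sysfs`, its arguments and its defaults are assumptions made for illustration, not anything ksensors itself does.

```shell
# poll_sysfs DIR [INTERVAL] [COUNT]
# Read every regular file directly under DIR once per pass, sleep INTERVAL
# seconds between passes, and stop after COUNT passes.
# Prints the number of passes completed.
poll_sysfs() {
    dir=$1
    interval=${2:-1}    # assumed default: 1 second between passes
    count=${3:-10}      # assumed default: 10 passes
    i=0
    while [ "$i" -lt "$count" ]; do
        for f in "$dir"/*; do
            # Some sysfs attributes are write-only or unreadable; ignore errors.
            [ -f "$f" ] && cat "$f" > /dev/null 2>&1
        done
        i=$((i + 1))
        sleep "$interval"
    done
    echo "$i"
}
```

For example, `poll_sysfs /sys/class/net/eth0/statistics 1 3600 &` would poll the same files as the loop above for roughly an hour in the background.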
OK, I've set the script up and it's running in the background. I'll continue to work
as normal with all other apps etc. I'll keep you updated...
Created attachment 209231 [details]
Latest log from /var/log/messages
This occurred about 1 hr after terminating the above script. I was just about to save
a document when X crashed, hard-locking the system again. This is the log from
after the event.
Can you confirm that the w83627hf driver was not loaded when the last crash
happened? That would mean that the problem is definitely not related to
lm-sensors at all.
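One way to check this before the next crash is to look the module up in /proc/modules. This is a minimal sketch; the helper name `module_loaded` and its optional file argument are made up for illustration.

```shell
# module_loaded NAME [FILE]
# Succeeds (exit status 0) if NAME is listed as a loaded module in FILE.
# FILE defaults to /proc/modules; the argument exists mainly for testing.
module_loaded() {
    name=$1
    file=${2:-/proc/modules}
    # Each line of /proc/modules starts with the module name followed by a space.
    grep -q "^${name} " "$file"
}
```

For example, `module_loaded w83627hf && echo "w83627hf is loaded" || echo "w83627hf is not loaded"` reports the current state; recording that result alongside the time of each freeze would answer the question directly.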
That is correct: no lm_sensors or ksensors were in use, and I had also stopped polling
the sys files. Recent failures have not listed the w83627hf driver. It has
failed twice this morning, so I've switched back to the last known good kernel to
try to prevent data corruption.
OK, can either you or Hans change the summary line to no longer point at
lm-sensors then? Thanks.
I'm reviewing this bug as part of the kernel bug triage project, an attempt to
isolate current bugs in the Fedora kernel.
I am CC'ing myself to this bug and will try and assist you in resolving it if I can.
There hasn't been much activity on this bug for a while. Could you tell me if
you are still having problems with the latest kernel?
If the problem no longer exists then please close this bug or I'll do so in a
few days if there is no additional information lodged.
I was never able to cure this problem and eventually gave up and installed F8.
This appeared to cure the file damage problems, but then I had a catastrophic RAM
failure. I've not had the problem since, but I have not used the system more than
twice either, so I don't feel qualified to say "it's cured" or whether or not the RAM
was the original fault.
Sorry to hear it. As you're no longer able to troubleshoot this machine, I'll
take the liberty of closing this bug as INSUFFICIENT_DATA, but thanks very much for taking
the time to file the original bug report. It is appreciated.