Bug 306801

Summary: System freeze with file damage
Product: Fedora
Reporter: James Buckle <coyoteboyuk>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: low
Version: 7
CC: chris.brown, hdegoede, jdelvare
Keywords: Reopened
Hardware: i386
OS: Linux
Last Closed: 2008-01-14 02:57:57 UTC
Attachments: Latest log from /var/log/messages

Description James Buckle 2007-09-26 11:55:42 UTC
Description of problem:
The system freezes completely. The only errors I can find in the logs are kernel
NULL pointer bugs that appear to be linked to the sensors files; see my post:
http://www.fedoraforum.org/forum/showthread.php?p=871271#post871271

Version-Release number of selected component (if applicable):
lm_sensors-2.10.4-1.fc7

How reproducible:
Random, but a couple of times per day. The file system gets damaged when the
system locks up and needs a hard reset.

Steps to Reproduce:
1. Impossible to tell. I am usually doing something fairly intensive when it
occurs, but never in the same package. I am always listening to music over an
NFS-shared mount at the time.
  
Actual results:
-

Expected results:
-

Additional info:
I'm not sure whether this is an lm_sensors fault, but the last sysfs file
accessed each time seems to be the hardware monitor file:
/devices/platform/w83627hf.656/....

Comment 1 Hans de Goede 2007-09-26 12:14:42 UTC
This is not an lm_sensors issue, but rather a kernel issue. Changing component
to match. This might be caused by the ACPI code and the w83627hf driver both
trying to access the w83627hf chip at the same time, in which case the only
solution is not to use the w83627hf driver. Have you tried disabling your
lm_sensors startup service and making sure the w83627hf module is not loaded
later on in any way?

Anyway, this is best handled upstream. Please check whether disabling the
w83627hf driver helps, and send a detailed report to the lm-sensors mailing
list, where all lm-sensors development (including the kernel part) is discussed.


Comment 2 James Buckle 2007-09-26 13:02:56 UTC
OK, that's all good. I've temporarily disabled lm_sensors completely to see if
things improve; I'll keep a log of errors. Thanks.

Incidentally, how can I determine this for myself (if at all possible), so that
I don't file these bugs against the wrong package?

Comment 3 Hans de Goede 2007-09-26 13:31:17 UTC
(In reply to comment #2)
> Incidentally, how can I determine this for myself (if at all possible), so that
> I don't file these bugs against the wrong package?

Choosing exactly the right component can be hard, so don't worry about it. You
were pretty close, and you provided a reasonable level of detail in the bug
report, which is really all we ask for.


Comment 4 Jean Delvare 2007-09-26 14:58:50 UTC
I see no evidence that the w83627hf driver has anything to do with the crash.
The fact that the last open sysfs file at the time of the crash is
/devices/platform/w83627hf.656/temp2_input simply suggests that you have some
monitoring application (ksensors?) repeatedly polling temperature values. It
does not mean that the w83627hf driver is causing the crash.

Just try running without the w83627hf driver loaded; I suspect your system will
still freeze randomly. The backtraces in the logs suggest a network or
filesystem issue. Maybe a bug in NFS?

This bug should be reopened (for some reason I cannot do that myself). There's
nothing that can be done "upstream" until we know for sure which "upstream" it is.


Comment 5 Hans de Goede 2007-09-26 15:09:42 UTC
Reopening per comment #4.


Comment 6 James Buckle 2007-09-26 20:31:49 UTC
It has been running flawlessly for several hours while using the NFS mount and
Amarok, with ksensors removed and the lm_sensors service stopped. I'll restart
lm_sensors without ksensors and see how it gets on.

Comment 7 Hans de Goede 2007-09-26 20:37:28 UTC
(In reply to comment #6)
> It has been running flawlessly for several hours while using the NFS mount and
> Amarok, with ksensors removed and the lm_sensors service stopped. I'll restart
> lm_sensors without ksensors and see how it gets on.

That's of little use, as hwmon (lm_sensors) drivers do little (nothing,
actually) unless an application is using them.

Comment 8 Jean Delvare 2007-09-27 09:55:43 UTC
Please keep testing without the w83627hf driver. It is possible that the bug
you're hitting triggers faster when sysfs is stressed, without the w83627hf
driver being directly responsible.

One way to test this would be to write a simple shell script repeatedly polling
arbitrary sysfs files. Something like:

--- 8< ---
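#!/bin/sh
# Read unrelated sysfs files every 2 seconds to stress sysfs
# without involving the w83627hf driver.
while true
do
        cat /sys/class/net/eth0/statistics/* > /dev/null
        sleep 2
done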
--- 8< ---

Adjust the sleep value to match what you had in ksensors. Let this script run in
the background and see if the bug triggers again.
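If you want to try several polling intervals, a parameterized version of the
script above may be handier. This is a minimal sketch assuming a POSIX `sh`;
the function name `poll_sysfs` and its arguments are hypothetical, not part of
any existing tool. The interval is in whole seconds, since fractional `sleep`
values are not guaranteed by POSIX.

```shell
#!/bin/sh
# poll_sysfs DIR INTERVAL [COUNT]
# Hypothetical helper: read every file in DIR, sleep INTERVAL seconds, repeat.
# COUNT limits the number of passes; 0 or omitted means run forever.
# Prints the number of completed passes when it stops.
# Note: POSIX sh has no "local", so these variables are global.
poll_sysfs() {
    dir=$1
    interval=$2
    count=${3:-0}
    passes=0
    while [ "$count" -eq 0 ] || [ "$passes" -lt "$count" ]; do
        # Only the read matters; discard contents and read errors.
        cat "$dir"/* > /dev/null 2>&1
        passes=$((passes + 1))
        # Do not sleep after the final pass when COUNT was given.
        if [ "$count" -eq 0 ] || [ "$passes" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
    echo "$passes"
}
```

After defining the function, `poll_sysfs /sys/class/net/eth0/statistics 2 &`
would behave like the original loop; replace `2` with whatever interval
ksensors was using.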


Comment 9 James Buckle 2007-09-27 11:00:40 UTC
OK, I've set the script up and it's running in the background. I'll continue to
work as normal with all other apps, etc. I'll keep you updated.

Comment 10 James Buckle 2007-09-28 01:05:02 UTC
Created attachment 209231
Latest log from /var/log/messages

This occurred about an hour after I stopped the above script. I was just about
to save a document when X crashed, hard-locking the system again. This is the
log from after the event.

Comment 11 Jean Delvare 2007-09-28 12:23:41 UTC
Can you confirm that the w83627hf driver was not loaded when the last crash
happened? That would mean that the problem is definitely not related to
lm-sensors at all.
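One way to check this from a shell is sketched below, assuming a Linux system
with /proc mounted. `module_loaded` is a hypothetical helper name; it searches
`/proc/modules`, the same list that `lsmod` prints. It only covers the case
where w83627hf is a loadable module, as the comments above assume; a driver
compiled into the kernel would not show up there.

```shell
#!/bin/sh
# module_loaded NAME
# Hypothetical helper: exit status 0 if kernel module NAME is listed in
# /proc/modules (one module per line, name in the first column).
module_loaded() {
    grep -q "^$1 " /proc/modules 2>/dev/null
}

if module_loaded w83627hf; then
    echo "w83627hf is loaded"
else
    echo "w83627hf is not loaded"
fi
```

Running this right after a crash and reboot only shows the current state, so it
is most useful checked periodically (or logged) while waiting for the next
freeze.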


Comment 12 James Buckle 2007-09-28 14:16:49 UTC
That is correct: neither lm_sensors nor ksensors was in use, and I had also
stopped polling the sysfs files. Recent failures have not listed the w83627hf
driver. It failed twice this morning, so I've switched back to the last known
good kernel to try to prevent data corruption.

Comment 13 Jean Delvare 2007-09-28 17:35:35 UTC
OK, can either you or Hans change the summary line to no longer point at
lm-sensors then? Thanks.


Comment 14 Christopher Brown 2008-01-14 00:55:17 UTC
Hello,

I'm reviewing this bug as part of the kernel bug triage project, an attempt to
isolate current bugs in the Fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

I am CC'ing myself to this bug and will try and assist you in resolving it if I can.

There hasn't been much activity on this bug for a while. Could you tell me if
you are still having problems with the latest kernel?

If the problem no longer exists then please close this bug or I'll do so in a
few days if there is no additional information lodged.

Comment 15 James Buckle 2008-01-14 01:31:38 UTC
I was never able to cure this problem and eventually gave up and installed F8.
This appeared to cure the file damage problems, but then I had a catastrophic
RAM failure. I've not had the problem since, but I have not used the system more
than twice either, so I don't feel qualified to say "it's cured" or whether the
RAM was the original fault.

Comment 16 Christopher Brown 2008-01-14 02:57:57 UTC
Sorry to hear that. As you're no longer able to troubleshoot this machine, I'll
take the liberty of closing this bug as INSUFFICIENT_DATA, but thanks very much
for taking the time to file the original bug report; it is appreciated.

Cheers
Chris