Bug 190214

Summary: Kernel page faults
Product: [Fedora] Fedora
Reporter: Darwin H. Webb <thethirddoorontheleft>
Component: kernel
Assignee: Dave Jones <davej>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Brian Brock <bbrock>
Severity: medium
Docs Contact:
Priority: medium
Version: 5
CC: pfrields, wtogami
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2008-03-11 17:11:20 UTC
Attachments:
Description: Shows the error messages of the kernel fault.
Flags: none

Description Darwin H. Webb 2006-04-28 20:45:09 UTC
Description of problem:
The kernel crashes on a page fault while running Folding@Home advanced cores in
user space. This process is supposedly using SSE2 and double precision FT.
I have tried all kernels for FC5 and all updates.
The FAH program runs under its own user on tty 3, and the main user may or may not
be logged on.

Version-Release number of selected component (if applicable):
kernel-2.6.16-2101

How reproducible: Every time


Steps to Reproduce:
1. Configure the FAH program for > 5 MB downloads on a Pentium 4
2.
3.
  
Actual results: 


Expected results:


Additional info:

Comment 1 Darwin H. Webb 2006-04-28 20:45:09 UTC
Created attachment 128376 [details]
Shows the error messages of the kernel fault.

Comment 2 Dave Jones 2006-09-11 00:19:13 UTC
When CPU-intensive applications such as FAH are run, they push the hardware
to its limits. If any part of the system is marginal, then sooner or later,
something will break. A typical failure mode of such systems under stress is
single bits of memory flipping state. The output you show could be explained by
this.  Furthermore, this is an isolated case, which adds further credibility to
this theory. (This sort of problem is something that should either affect a lot
of people, or no-one. 1-2 cases of it are pretty much guaranteed to be hardware
problems of some kind).

You may find that running memtest86+ for a day or so yields further info.


Comment 3 Dave Jones 2006-10-16 21:25:01 UTC
A new kernel update has been released (Version: 2.6.18-1.2200.fc5)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks' time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

In the last few updates, some users upgrading from FC4->FC5
have reported that installing a kernel update has left their
systems unbootable. If you have been affected by this problem
please check you only have one version of device-mapper & lvm2
installed.  See bug 207474 for further details.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

If this bug has been fixed, but you are now experiencing a different
problem, please file a separate bug for the new problem.

Thank you.

Comment 4 Darwin H. Webb 2006-10-17 01:12:30 UTC
I have run memtest for 4 hrs, 11 passes - no errors.
I have run the other kernels, including the initial 2.6.18.

I did a yum update to the latest FC5, including kernel 2.6.18-1.2200.
The system is running well. No errors or problems on SMP HT.
I rebooted into the non-SMP kernel.
gnome-terminal failed and needed a restart.

The machine is now running FAH (a big unit) and time will tell if it helped.
About 24-72 hrs.

thank you

Darwin

Comment 5 Darwin H. Webb 2006-10-17 02:53:07 UTC
Well that didn't take long.

The first FAH WU exited with code 0x0 after 50 min.
A FAH unknown error.

The next FAH WU was using 3DNow!.
It ran for 20 min and dumped as usual on the tty screen, indicating a page fault.
The computer was locked up.
Some text referred to a spin lock, then to a big spin lock.

Upon reboot, many inode orphans were recovered (I get this a lot in FC6, not so much in
FC5. This is FC5).

I looked through the logs but found no sign of the dump.

There was some message about ioctl on dbus but no detail with it.

Some avc activity could not be logged, but I'm not sure what or when those happened.

The only other activity was the beagle index, which ran for a minute or two.

There was not enough time for the CPU to overheat.
It usually runs for a few hours before crashing on FAH.

Darwin

Comment 6 Dave Jones 2006-10-17 05:27:25 UTC
I need to see the exact messages to make any progress on this:
backtraces similar to what you attached in comment 1.

Comment 7 petrosyan 2008-03-11 17:11:20 UTC
The information we've requested above is required in order
to review this problem report further and diagnose/fix the
issue if it is still present.  Since there have been no
updates to the report for thirty (30) days or more since we
requested additional information, we're assuming the problem
is either no longer present in the current Fedora release, or
that there is no longer any interest in tracking the problem.

Setting status to "INSUFFICIENT_DATA".  If you still
experience this problem after updating to our latest Fedora
release and can provide the information previously requested, 
please feel free to reopen the bug report.

Thank you in advance.