Bug 447826 - Segfaults & Recursive Faults
Status: CLOSED NOTABUG
Product: Fedora
Classification: Fedora
Component: kernel
Version: 9
Hardware: All Linux
Priority: low
Severity: low
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2008-05-21 19:13 EDT by Nigel Jones
Modified: 2008-05-23 09:33 EDT
Last Closed: 2008-05-23 09:33:46 EDT

Attachments:
dmesg output after Recursive Fault (49.41 KB, text/plain), 2008-05-21 19:13 EDT, Nigel Jones
dmesg | mcelog --ascii (41.96 KB, text/plain), 2008-05-23 01:54 EDT, Nigel Jones

Description Nigel Jones 2008-05-21 19:13:20 EDT
Description of problem:
I'm getting a few segfaults with the latest kernel. I also suspect they share a
cause with a weird X bug that I've experienced on a fully up-to-date Fedora 9
(and LiveCDs), where I can't start X more than once.  I managed to catch the
dmesg (attached) when I was rebooting the other day; I hadn't done anything
'special' either.

There are no third-party modules and no Livna kmods/akmods either.

Version-Release number of selected component (if applicable):
 2.6.25.3-18.fc9.x86_64

How reproducible:
I have no idea, but I suspect always for my desktop machine.

Additional Info:
[root@localhost ~]# lspci
00:00.0 Host bridge: Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller (rev 02)
00:01.0 PCI bridge: Intel Corporation 82G33/G31/P35/P31 Express PCI Express Root Port (rev 02)
00:1a.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 02)
00:1a.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 02)
00:1a.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 (rev 02)
00:1a.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 02)
00:1b.0 Audio device: Intel Corporation 82801I (ICH9 Family) HD Audio Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 (rev 02)
00:1c.4 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 5 (rev 02)
00:1c.5 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 6 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)
00:1f.0 ISA bridge: Intel Corporation 82801IB (ICH9) LPC Interface Controller (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801IB (ICH9) 2 port SATA IDE Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02)
00:1f.5 IDE interface: Intel Corporation 82801I (ICH9 Family) 2 port SATA IDE Controller (rev 02)
01:00.0 VGA compatible controller: nVidia Corporation G71 [GeForce 7300 GS] (rev a1)
02:00.0 Ethernet controller: Attansic Technology Corp. L1 Gigabit Ethernet Adapter (rev b0)
03:00.0 IDE interface: Marvell Technology Group Ltd. 88SE6121 SATA II Controller (rev b1)

[root@localhost ~]# lsmod
Module                  Size  Used by
ppdev                  15624  0
parport_pc             33816  0
lp                     19300  0
parport                42784  3 ppdev,parport_pc,lp
bridge                 59304  0
bnep                   21632  2
rfcomm                 44448  4
l2cap                  29312  16 bnep,rfcomm
bluetooth              59044  5 bnep,rfcomm,l2cap
fuse                   51008  1
sunrpc                185000  3
ipt_REJECT             11776  2
nf_conntrack_ipv4      17416  2
iptable_filter         11392  1
ip_tables              25232  1 iptable_filter
ip6t_REJECT            12544  2
xt_tcpudp              11648  2
nf_conntrack_ipv6      22984  2
xt_state               10752  4
nf_conntrack           64528  3 nf_conntrack_ipv4,nf_conntrack_ipv6,xt_state
ip6table_filter        11264  1
ip6_tables             26640  1 ip6table_filter
x_tables               26248  6 ipt_REJECT,ip_tables,ip6t_REJECT,xt_tcpudp,xt_state,ip6_tables
ipv6                  276232  38 ip6t_REJECT,nf_conntrack_ipv6
cpufreq_ondemand       15760  1
acpi_cpufreq           16656  1
freq_table             13440  2 cpufreq_ondemand,acpi_cpufreq
dm_mirror              32004  0
dm_multipath           24976  0
dm_mod                 62104  2 dm_mirror,dm_multipath
snd_hda_intel         447540  3
snd_seq_dummy          11524  0
snd_seq_oss            39232  0
ahci                   35976  0
snd_seq_midi_event     15104  1 snd_seq_oss
snd_seq                61840  5 snd_seq_dummy,snd_seq_oss,snd_seq_midi_event
floppy                 66216  0
snd_seq_device         15508  3 snd_seq_dummy,snd_seq_oss,snd_seq
i2c_i801               17692  0
snd_pcm_oss            52096  0
sg                     40528  0
i2c_core               28448  1 i2c_i801
pcspkr                 11136  0
iTCO_wdt               19920  0
snd_mixer_oss          23296  1 snd_pcm_oss
snd_pcm                86024  2 snd_hda_intel,snd_pcm_oss
iTCO_vendor_support    11780  1 iTCO_wdt
atl1                   39052  0
snd_timer              29584  2 snd_seq,snd_pcm
mii                    13184  1 atl1
snd_page_alloc         16912  2 snd_hda_intel,snd_pcm
usb_storage            95008  0
button                 15776  0
snd_hwdep              16520  1 snd_hda_intel
pata_marvell           13696  0
sr_mod                 23732  0
snd                    66808  16 snd_hda_intel,snd_seq_dummy,snd_seq_oss,snd_seq,snd_seq_device,snd_pcm_oss,snd_mixer_oss,snd_pcm,snd_timer,snd_hwdep
soundcore              14864  1 snd
cdrom                  40616  1 sr_mod
ata_piix               29188  4
pata_acpi              13824  0
ata_generic            14724  0
libata                149280  5 ahci,pata_marvell,ata_piix,pata_acpi,ata_generic
sd_mod                 33200  6
scsi_mod              150744  5 sg,usb_storage,sr_mod,libata,sd_mod
ext3                  130320  3
jbd                    53160  1 ext3
mbcache                15876  1 ext3
uhci_hcd               29984  0
ohci_hcd               28932  0
ehci_hcd               40588  0
Comment 1 Nigel Jones 2008-05-21 19:13:20 EDT
Created attachment 306326
dmesg output after Recursive Fault
Comment 2 Chuck Ebbert 2008-05-22 02:57:15 EDT
kernel BUG at mm/filemap.c:126!
        BUG_ON(page_mapped(page));


Your machine has taken a machine check error:
"Tainted: G   M"

Can you run the mcelog program when that happens and see what the error is?
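
As a rough illustration of the check described above, the sketch below scans kernel log text for a "Tainted:" marker and reports whether the M (machine check) flag is set. The function names and the sample log line are hypothetical and are not taken from the attached dmesg; the regular expression assumes the flags appear as uppercase letters separated by spaces, as in the string quoted above.

```python
import re

# Capture the flag letters that follow "Tainted:", e.g. "Tainted: G   M".
# Assumption: flags are runs of uppercase letters separated by whitespace.
TAINT_RE = re.compile(r"Tainted:((?:\s+[A-Z]+\b)+)")


def taint_flags(log_text):
    """Return the set of taint flag letters found in kernel log text."""
    flags = set()
    for match in TAINT_RE.finditer(log_text):
        flags.update(ch for ch in match.group(1) if not ch.isspace())
    return flags


def saw_machine_check(log_text):
    """True if any 'Tainted:' marker in the text carries the M flag."""
    return "M" in taint_flags(log_text)


# Hypothetical sample line, made up for illustration only.
sample = "Pid: 1234, comm: Xorg Tainted: G   M 2.6.25.3-18.fc9.x86_64 #1"

if __name__ == "__main__":
    print(sorted(taint_flags(sample)))
    print(saw_machine_check(sample))
```

A saved log could be fed to saw_machine_check() as a string; a True result only says that a machine check was recorded, not which component caused it.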
Comment 3 Dave Jones 2008-05-22 09:54:23 EDT
It might be worth a run of memtest86 for a while too, just to rule out bad RAM.
These things are common indications of hardware problems of some kind (bad
RAM/insufficient power/cooling, or just general flakiness).
Comment 4 Nigel Jones 2008-05-23 01:50:35 EDT
(In reply to comment #2)
> kernel BUG at mm/filemap.c:126!
>         BUG_ON(page_mapped(page));
> 
> 
> Your machine has taken a machine check error:
> "Tainted: G   M"
> 
> Can you run the mcelog program when that happens and see what the error is?
> 
I shall attempt this the next time I see it; I've already had two segfaults today.

I also just noticed another recursive fault (Kernel Oops spotted it in all
fairness), so how am I meant to run mcelog?

(In reply to comment #3)
> It might be worth a run of memtest86 for a while too, just to rule out bad RAM.
> These things are common indications of hardware problems of some kind (bad
> RAM/insufficient power/cooling, or just general flakiness).
I was considering bad RAM, since I recently installed an extra 2 GB, but dare I
say it, it seems to run Vista okay, and I ran Fedora 8 quite happily until
recently. I can understand your other points too, although I'll give it credit:
it's worked pretty well for the last 8ish months with both Linux and Windows.
I'll run memtest86 when I go to sleep tonight or have dinner though.
Comment 5 Nigel Jones 2008-05-23 01:54:09 EDT
Created attachment 306449
dmesg | mcelog --ascii

Okay, scrap my last comment. I googled and found 'dmesg | mcelog --ascii'; the
attachment is the result.

"HARDWARE ERROR" seems to be the telltale sign. I take it this is referring to
my nice dual-core processor and is in fact not a kernel bug?  I'm a little
confused here, so a pointer in the right direction would be most appreciated.
Comment 6 Chuck Ebbert 2008-05-23 02:14:18 EDT
Looking at the mcelog manpage, I think you just want to run mcelog without any
arguments. The old method of writing events to the syslog is obsolete.
Comment 7 Nigel Jones 2008-05-23 06:29:31 EDT
(In reply to comment #6)
> Looking at the mcelog manpage, I think you just want to run mcelog without any
> arguments. The old method of writing events to the syslog is obsolete.
That returned absolutely nothing.

(In reply to comment #3)
> It might be worth a run of memtest86 for a while too, just to rule out bad RAM.
> These things are common indications of hardware problems of some kind (bad
> RAM/insufficient power/cooling, or just general flakiness).
I think you might be right. I gave up memtest86+'ing it after 900 errors (spread
over all 4 slots) in 40 minutes.  Looks like I need to have a fiddle with the
RAM config etc. and work out what's going on.

IMO it's a 'notabug', agree?
Comment 8 Dave Jones 2008-05-23 09:33:46 EDT
Yeah, sounds like a hardware fault of some sort, and given the recent addition
of RAM, that's a likely suspect.  We frequently see things like this where
Windows runs just fine. It's purely by luck really. The access patterns of the
two operating systems are completely different, and perhaps Linux employs more
aggressive caching of data (or maybe we just read more of the disk, or ...).
There are so many variables that it's not really a data point worth putting any
faith in.
