Red Hat Bugzilla – Bug 164091
(bad pmd) mmap bug detected in kernel shortly after crond run
Last modified: 2015-01-04 17:21:00 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.10) Gecko/20050720 Fedora/1.0.6-1.1.fc4 Firefox/1.0.6
Description of problem:
Machine was locked up and, fortunately, after a reset produced a log of problems and stack dumps. The log itself reports an mmap bug some seconds after crond runs; there may or may not be a causal relationship. Machine was "idle" when this happened, and had a (GNOME) desktop running.
Version-Release number of selected component (if applicable):
System was installed with FC3, anaconda upgraded to FC4.
Jul 24 04:22:01 xxxxxx crond(pam_unix): session opened for user root by (uid=0)
Jul 24 04:22:08 xxxxxx kernel: mm/memory.c:105: bad pmd ffff810060eea0f0(00007fffff8cefd6).
Jul 24 04:22:08 xxxxxx kernel: sh: segfault at 0000003340d13b1a rip 0000003340d13b1a rsp 00007fffff8cc438 error 14
Jul 24 04:22:08 xxxxxx kernel: ----------- [cut here ] --------- [please bite here ] ---------
Jul 24 04:22:08 xxxxxx kernel: Kernel BUG at "mm/mmap.c":2026
Jul 24 04:22:08 xxxxxx kernel: invalid operand: 0000  SMP
Jul 24 04:22:08 xxxxxx kernel: CPU 0
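For anyone reading the segfault line in the log above, the "error 14" value can be decoded with a small sketch like the following. It assumes the kernel prints the page-fault error code in hexadecimal (so 14 means 0x14) and uses the architectural x86 bit layout; the function name decode_pf_error is made up for illustration and is not a kernel or library API.

```python
# Hypothetical helper: decode an x86 page-fault error code.
# Assumption: the "error NN" field in the segfault log line is hexadecimal.
# Each entry is (bit mask, meaning if set, meaning if clear or None).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in a page-table entry", None),
    (0x10, "instruction fetch", None),
]


def decode_pf_error(code):
    """Return a list of descriptions, one per meaningful bit of the code."""
    out = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            out.append(if_set)
        elif if_clear is not None:
            out.append(if_clear)
    return out


if __name__ == "__main__":
    # "error 14" from the log, read as hex under the assumption above.
    for line in decode_pf_error(int("14", 16)):
        print(line)
```

Under that assumption the code decodes to a user-mode instruction fetch from a page that was not present, which would be consistent with the fault address and rip being identical in the log line. It does not by itself say why the page was missing.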
We saw 'bad pmd' messages on x86-64 around 2.6.11, but these seemed to have
disappeared with 2.6.12. Can you run memtest86 on this box for a few hours just
to check the most obvious thing?
If it passes the memtest, this bug is sadly still with us.
Sorry about the delay. Took me a while, ran memtest86 through a few passes: no
It took about a month of continuous runtime, but the same exact problem (with
the same kernel) happened again, in exactly the same sequence (4:22am crond and
this time 30 seconds later the bad pmd showed up, machine locked).
Please try the latest updates-testing kernel, which contains a workaround for a
hardware bug in the CPU.
Mass update to all FC4 bugs:
An update has been released (2.6.13-1.1526_FC4) which rebases to a new upstream
kernel. As there were ~3500 changes upstream between this and the
previous kernel, it's possible your bug has been fixed already.
Please retest with this update, and update this bug if necessary.
I stuck with 2.6.12-1.1435_FC4smp (updates-testing) and it has not hit this
problem in over 2 months of continuous runtime. Barring regressions in the newer
kernels (including *1526_FC4smp), it appears we're okay against this bug.
Please change the status of this bug to the appropriate resolved state.