Bug 458748 - Repeated segfaults in a wide range of processes.
Status: CLOSED NOTABUG
Product: Fedora
Classification: Fedora
Component: kernel
Version: 9
Hardware: All
OS: Linux
Priority: medium
Severity: urgent
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2008-08-11 22:57 EDT by Danny Yee
Modified: 2008-08-13 21:33 EDT
0 users

Doc Type: Bug Fix
Last Closed: 2008-08-13 21:33:09 EDT


Attachments: None
Description Danny Yee 2008-08-11 22:57:02 EDT
Repeated segfaults in a wide range of processes.

Upgraded a mail server to Fedora 9 with kernel 2.6.25.11-97.fc9.i686 #1 SMP

Segfaults started about ten hours after the upgrade, early in the morning.  There were nearly 400 in total over four hours, affecting common processes: mostly dovecot-auth and spamd, but also a few imap-login processes.  Then there was a 13-hour gap with no segfaults, followed in the evening by segfaults in a Networker backup process and in postgrey (that was bad).  The following morning (six hours later), there were segfaults mostly in postfix processes: smtpd, local, trivial-rewrite, qmgr, etc.

Aug 11 02:27:39 mail kernel: dovecot-auth[16512]: segfault at a01d4c ip 00126c4a sp bf87ade0 error 4 in ld-2.8.so[110000+1c000]
Aug 11 02:27:44 mail kernel: dovecot-auth[16521]: segfault at a01d4c ip 00126c4a sp bf87ade0 error 4 in ld-2.8.so[110000+1c000]
   .
   .
Aug 11 02:30:33 mail kernel: spamd[12374]: segfault at 7f ip 00233007 sp bfe98670 error 4 in libperl.so[12f000+26a000]
Aug 11 02:31:17 mail kernel: dovecot-auth[17196]: segfault at a01d4c ip 00126c4a sp bf87ade0 error 4 in ld-2.8.so[110000+1c000]
Aug 11 02:34:17 mail kernel: dovecot-auth[17956]: segfault at a01d4c ip 00126c4a sp bf87ade0 error 4 in ld-2.8.so[110000+1c000]
Aug 11 02:36:33 mail kernel: spamd[18520]: segfault at 0 ip 001f7057 sp bfe998d0 error 4 in libperl.so[12f000+26a000]
  .
  .
Aug 11 06:47:02 mail kernel: dovecot-auth[4240]: segfault at 976eb6 ip 00976eb6 sp bf87bbac error 4 in dovecot-auth[8048000+3e000]
Aug 11 06:47:02 mail kernel: imap-login[7393]: segfault at fddbf4c2 ip 0044a047 sp bfb9248c error 6 in libgssapi_krb5.so.2.2.#prelink#.PpOFcj (deleted)[44a000+2d000]
  .
  .

Aug 12 02:02:45 mail kernel: smtpd[1353]: segfault at 4 ip 00460662 sp bf9ed3a0 error 4 in libcrypto.so.0.9.8g[39d000+137000]
Aug 12 02:10:58 mail kernel: smtpd[32430]: segfault at b66db78c ip b7f02c7b sp bfc24098 error 6 in smtpd[b7eb2000+73000]
Aug 12 02:32:10 mail kernel: local[6608]: segfault at 96889d5 ip 00119985 sp bfad3abc error 4 in ld-2.8.so[110000+1c000]
Aug 12 02:32:10 mail kernel: local[31800]: segfault at 96889d5 ip 00119985 sp bfad39ac error 4 in ld-2.8.so[110000+1c000]
Aug 12 03:27:48 mail kernel: trivial-rewrite[2901]: segfault at b68d063d ip b68d063d sp bfde7a1c error 4
Aug 12 07:03:31 mail kernel: smtpd[9909]: segfault at b6800dbc ip 0044eacb sp bfa68760 error 4 in libcrypto.so.0.9.8g[39d000+137000]
Aug 12 07:41:01 mail kernel: imap-login[10515]: segfault at 0 ip 00000000 sp bfa905ec error 4
Aug 12 11:58:25 mail kernel: qmgr[2276]: segfault at 8fb20fd ip 00119985 sp bfd9e49c error 4 in ld-2.8.so[110000+1c000]
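One way to triage a log like this is to decode each kernel "segfault at" line. The sketch below is a minimal Python parser, assuming the exact message layout quoted in this report (the regular expression is fitted to these lines, not taken from the kernel source). It reads the "error" value as the x86 page-fault error code (bit 0: protection fault vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode), so "error 4" is a user-mode read of an unmapped address and "error 6" a user-mode write. It also reports `ip` relative to the load base of the object named in brackets, which makes it easier to see whether faults repeat at the same code location. The names `Segfault`, `parse` and `decode_error` are illustrative only, not part of any existing tool.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Fitted to the kernel lines quoted in this report, e.g.
#   dovecot-auth[16512]: segfault at a01d4c ip 00126c4a sp bf87ade0 error 4 in ld-2.8.so[110000+1c000]
# The trailing "in <object>[<base>+<size>]" part is optional (some lines lack it).
SEGFAULT_RE = re.compile(
    r"(?P<comm>[^\s\[]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+)"
    r"(?: in (?P<obj>.+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


@dataclass
class Segfault:
    comm: str               # process name as logged by the kernel
    pid: int
    addr: int               # faulting data address
    ip: int                 # instruction pointer at the time of the fault
    error: int              # x86 page-fault error code
    obj: Optional[str]      # mapped object containing ip, if the kernel named one
    offset: Optional[int]   # ip relative to that object's load base


def decode_error(code: int) -> str:
    """Decode the low bits of the x86 page-fault error code."""
    parts = [
        "protection violation" if code & 1 else "page not present",
        "write" if code & 2 else "read",
        "user mode" if code & 4 else "kernel mode",
    ]
    if code & 8:
        parts.append("reserved bit set")
    if code & 16:
        parts.append("instruction fetch")
    return ", ".join(parts)


def parse(line: str) -> Optional[Segfault]:
    """Return a Segfault for a matching syslog line, or None otherwise."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m["ip"], 16)
    base = int(m["base"], 16) if m["base"] else None
    return Segfault(
        comm=m["comm"],
        pid=int(m["pid"]),
        addr=int(m["addr"], 16),
        ip=ip,
        error=int(m["err"]),
        obj=m["obj"],
        offset=ip - base if base is not None else None,
    )


if __name__ == "__main__":
    line = ("Aug 11 02:27:39 mail kernel: dovecot-auth[16512]: segfault at a01d4c "
            "ip 00126c4a sp bf87ade0 error 4 in ld-2.8.so[110000+1c000]")
    sf = parse(line)
    print(sf.comm, sf.obj, hex(sf.offset), decode_error(sf.error))
    # dovecot-auth ld-2.8.so 0x16c4a page not present, read, user mode
```

Lines without an "in <object>[...]" suffix, such as the trivial-rewrite and the second imap-login entries, come back with `obj` and `offset` set to `None`.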

At the moment I'm still trying to stop/fix this, and I suspect it won't be reproducible once I've done that (I'm not prepared to play with a production server).

Possible clues:

The failures cluster in the early morning, when a tape dump runs, and in the evening, when a Networker backup runs, which suggests a disk access issue.
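The clustering around backup times can be checked by counting segfault messages per hour of day. This is a minimal Python sketch, assuming classic syslog timestamps ("Aug 11 02:27:39 ...") as in the log excerpts in this report. The `WINDOWS` hours are hypothetical placeholders, since the report does not give exact start and end times for the tape dump or the Networker backup; `segfaults_by_hour` and `fraction_in_windows` are illustrative names.

```python
from collections import Counter

# Hypothetical backup windows as [start_hour, end_hour) pairs; these are
# placeholders, not the real tape-dump or Networker schedules.
WINDOWS = [(2, 7), (18, 23)]


def segfaults_by_hour(lines):
    """Count kernel 'segfault at' messages per hour of day (0-23).

    Assumes syslog-style timestamps such as "Aug 11 02:27:39 host ...";
    only the hour from the third whitespace-separated field is used.
    """
    counts = Counter()
    for line in lines:
        if "segfault at" not in line:
            continue
        fields = line.split()
        if len(fields) < 3:
            continue
        hour = int(fields[2].split(":")[0])
        counts[hour] += 1
    return counts


def fraction_in_windows(counts, windows):
    """Share of counted segfaults whose hour falls inside any [start, end) window."""
    total = sum(counts.values())
    inside = sum(n for hour, n in counts.items()
                 if any(start <= hour < end for start, end in windows))
    return inside / total if total else 0.0


if __name__ == "__main__":
    sample = [
        "Aug 11 02:27:39 mail kernel: dovecot-auth[16512]: segfault at a01d4c ip 00126c4a sp bf87ade0 error 4 in ld-2.8.so[110000+1c000]",
        "Aug 11 06:47:02 mail kernel: dovecot-auth[4240]: segfault at 976eb6 ip 00976eb6 sp bf87bbac error 4 in dovecot-auth[8048000+3e000]",
        "Aug 12 11:58:25 mail kernel: qmgr[2276]: segfault at 8fb20fd ip 00119985 sp bfd9e49c error 4 in ld-2.8.so[110000+1c000]",
    ]
    counts = segfaults_by_hour(sample)
    print(sorted(counts.items()))                # [(2, 1), (6, 1), (11, 1)]
    print(fraction_in_windows(counts, WINDOWS))  # 0.6666666666666666
```

A high fraction only shows correlation with the busy periods; as the reply below points out, heavy load can expose a hardware fault just as well as a disk or driver problem.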

The server has i2o RAID arrays.  They used to have driver problems but have worked flawlessly for a couple of years now.

I upgraded from Fedora 8 to Fedora 9 using yum.  Could something critical not have been updated?

Any suggestions would be most welcome.  If I get another segfault I will probably just revert to the last "known to work" Fedora 8 kernel.
Comment 1 Dave Jones 2008-08-12 00:49:33 EDT
The first thing I'd suggest is to try running memtest86 for a while.
That it only seems to trigger under high disk activity smells like bad memory or a similar hardware problem.  Also, to the best of my knowledge, we've had no similar reports.

If memtest doesn't turn anything up, it would be interesting to know if the f8 kernel still works, as they're quite similar (pretty much the same code, but with different config options).
Comment 2 Danny Yee 2008-08-13 21:33:09 EDT
Yes, it was bad memory - just a coincidence that it started after my upgrade!  Sorry to trouble you all with this.
