Description of problem:
I have recently built a new mail server (Postfix & Cyrus-Imap) which
authenticates users from my Windows 2003 AD. Prior to installation, I
ran the full suite of memory tests from the install cd with no errors.
During testing, the system had 2-3 NMI errors and locked up. These
seemed to happen most often when accessing the CD/DVD drive. Adding
"acpi=off" seemed to have fixed that problem.
I migrated our old, failing mail system to this machine and put it
into production. This morning, on its 4th day of operation, I found
it locked up and had to do a hard reset. No video, no keyboard LEDs,
nothing. /var/log/messages had no indication of the problem, just an
IMAP login (or logoff), then the kernel boot messages from my reboot.
This afternoon, while cleaning up the backup routine, I tried to
rsync files from one directory to another and the system locked up
again. This time, the caps-lock and scroll-lock LEDs were blinking.
In addition to "acpi=off", I have added "nmi_watchdog=1" to my kernel
boot options even though I no longer get the NMI error. However, I
think that option may be what allowed the keyboard LEDs to blink this
time.
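Since the behaviour described above depends on both options actually being active, it may be worth confirming them against the running kernel's command line after each reboot. The following is only a minimal sketch: the helper name has_boot_opt is hypothetical, and it assumes the command line is readable from /proc/cmdline.

```shell
#!/bin/sh
# has_boot_opt OPTION CMDLINE
# Succeeds if OPTION appears as a whole, space-separated word in CMDLINE.
# (has_boot_opt is a hypothetical helper name, not a standard tool.)
has_boot_opt() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Read the command line of the running kernel; empty if /proc is unavailable.
cmdline=$(cat /proc/cmdline 2>/dev/null)

for opt in acpi=off nmi_watchdog=1; do
    if has_boot_opt "$opt" "$cmdline"; then
        echo "$opt: present"
    else
        echo "$opt: MISSING"
    fi
done
```

Matching on whole words avoids a false positive from an option that merely ends in the same text.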
This is a dual PIII-1GHz with 2GB RAM (Dell/Crucial), Dell PERC, one
unused e100 interface and an add-in e1000 card.
It's a basic server install with Postfix, Cyrus-imapd, squirrelmail.
I have added Trend Micro's Interscan VirusWall for Unix and applied
all available updates. Also installed is awstats and keychain.
I've applied all RHN updates EXCEPT for krb5-libs and
krb5-workstation. kernel-smp-2.6.9-5.0.3.EL was applied but not
booted until after the first lockup today. It's now the default.
SELinux was disabled after the first lockup; it had been running in
permissive mode, but winbind generated a lot of warnings.
Most non-essential services have been disabled (PCMCIA, ISDN, etc.).
The same system ran with no problems under Windows 2000 and Red Hat 9.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
Created attachment 111873
lspci and /var/log/messages
A bit more experimenting (actually, just trying to back up the
machine) and I feel I can reproduce the problem fairly reliably now.
I have a simple shell script that stops Cyrus-Imapd and Postfix,
rsyncs the mail data and config directories to a backup directory,
restarts the services, then creates an ISO image from the backup
directory to be burned onto DVD.
Running this script locks up the system during the rsync phase.
Sometimes it locks up immediately, before any files are copied; other
times it gets a small way into the transfer.
As the machine locked up when I ran it this morning, I decided to try
again in runlevel 1 with both the e1000 and e100 drivers unloaded.
Running the script in this environment worked, although I stopped it
during the mkisofs phase to get e-mail back on line for the users.
Searching on-line does reveal a history of lockups related to the
# postfix rc.d script also starts/stops TrendMicro Interscan Viruswall
# Added verbose and progress options to rsync commands to trace when
partition=`grep "^partition-default:" /etc/imapd.conf | cut -f2 -d" "`
config=`grep "^configdirectory:" /etc/imapd.conf | cut -f2 -d" "`
service postfix stop
service cyrus-imapd stop
su - cyrus -c "/usr/lib/cyrus-imapd/ctl_mboxlist -d" >
rsync -avR --progress --delete $partition $BACKUPDIR
rsync -avR --progress --delete $config $BACKUPDIR
rsync -avR --progress --delete /etc/imapd.conf $BACKUPDIR
rsync -avR --progress --delete /etc/cyrus.conf $BACKUPDIR
rsync -avR --progress --delete /etc/postfix $BACKUPDIR
rsync -avR --progress --delete /var/spool/postfix $BACKUPDIR
service cyrus-imapd start
service postfix start
mkisofs -R -J -o /backup/mail/mail-backup.iso /backup/mail/*
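The two lookups at the top of the script use cut with a single space as the delimiter, so they return an empty string if /etc/imapd.conf separates the key and value with a tab or several spaces, and rsync would then be called with no source path. A more tolerant lookup might look like the sketch below; conf_value is a hypothetical helper name, and the sample file contents are made up for illustration only.

```shell
#!/bin/sh
# conf_value KEY FILE
# Prints the first value for "KEY:" in FILE, accepting any run of spaces
# or tabs between the colon and the value.
# (conf_value is a hypothetical helper, not part of Cyrus.)
conf_value() {
    awk -v key="$1:" '$1 == key { print $2; exit }' "$2"
}

# Demonstration against a throwaway file rather than the real config.
tmp=$(mktemp)
printf 'configdirectory:\t/var/lib/imap\npartition-default:   /var/spool/imap\n' > "$tmp"

partition=$(conf_value partition-default "$tmp")
config=$(conf_value configdirectory "$tmp")
echo "partition=$partition"
echo "config=$config"

rm -f "$tmp"
```

This still assumes whitespace follows the colon; a line written as key:value with no space would not match.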
I have switched from the e1000 NIC to the e100, rebooted and unloaded
the e1000 module (for some reason it loaded after reboot anyway), then
attempted a backup. This time the backup worked without a problem.
Based on this, I'd say there is a problem with the e1000 module that
causes certain configurations to lock up.
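Because the e1000 module loaded again after the reboot, it may help to verify which of the two drivers is actually loaded immediately before each test run. This is a minimal sketch that assumes the usual /proc/modules layout (module name in the first column); module_loaded is a hypothetical helper name.

```shell
#!/bin/sh
# module_loaded NAME [FILE]
# Succeeds if NAME is listed in the first column of FILE
# (FILE defaults to /proc/modules).
# (module_loaded is a hypothetical helper, not a standard command.)
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' \
        "${2:-/proc/modules}" 2>/dev/null
}

for mod in e1000 e100; do
    if module_loaded "$mod"; then
        echo "$mod: loaded"
    else
        echo "$mod: not loaded"
    fi
done
```

The exact comparison on the first column matters here: a plain grep for e100 would also match e1000.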
Please let me know if more information is needed or additional testing
I'm not sure if this is a request for support, or simply an attempt to let us
know of a bug. Bugzilla is simply a bug reporting tool, and not a support
If you require support, please contact support by calling 800-REDHAT1 or by
going to http://www.redhat.com/support.
Otherwise, thank you for letting us know about the problem!
Hmm. We've updated the e1000 in U2 to version 6.0.54-k2-NAPI. If you want to
test the beta kernel, it's at: http://people.redhat.com/~jbaron/rhel4/
"Searching on-line does reveal a history of lockups related to the
Could you provide a pointer to this information? It may help to identify what
you are seeing...thanks!
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release for which you requested us to review is now End of Life.
Please see https://access.redhat.com/support/policy/updates/errata/
If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.