Bug 97471 - System locks up weekly
Summary: System locks up weekly
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: sendmail
Version: 9
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Florian La Roche
QA Contact: David Lawrence
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2003-06-16 14:01 UTC by Need Real Name
Modified: 2007-04-18 16:54 UTC
CC List: 0 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2003-07-08 13:34:32 UTC
Embargoed:


Description Need Real Name 2003-06-16 14:01:45 UTC
Description of problem:
The system seems to start writing heavily to the hard drive and shows a very 
high load average at various times on Sunday.  By Monday morning I have to hit 
the reset button to reboot the system and regain access.  I cannot log in at 
the console or over SSH.

[1024010]  watson1.medna.com.cpu red Sun Jun 15 12:37:16 CDT 2003 up: 6 days, 0 
users, 69 procs, load=312
LOAD AVG on watson1,medna,com is 312

Jun 15 12:55:37 watson1 sendmail[1686]: rejecting connections on daemon MTA: 
load average: 12
Jun 15 12:55:51 watson1 sendmail[1686]: rejecting connections on daemon MTA: 
load average: 12
Jun 15 12:56:09 watson1 sendmail[1686]: rejecting connections on daemon MTA: 
load average: 12
Jun 15 12:56:28 watson1 sendmail[1686]: rejecting connections on daemon MTA: 
load average: 12
Jun 15 12:57:37 watson1 sendmail[1686]: rejecting connections on daemon MTA: 
load average: 12
Jun 15 12:57:54 watson1 sendmail[1686]: rejecting connections on daemon MTA: 
load average: 14
Jun 15 12:58:08 watson1 sendmail[1686]: rejecting connections on daemon MTA: 
load average: 15

Monitor Report: 
Sun Jun 15 12:37:17 2003 red  19:17:12 
Sun Jun 15 04:06:20 2003 red 0:09:37 
Sun Jun 8 06:34:07 2003 red  1 day 01:20:22 


Version-Release number of selected component (if applicable):


How reproducible:
Every Sunday - various times

Steps to Reproduce:
1.  Install RH 9.0 
2.  Apply updates with up2date
3.  Run Snort 2.0, Syslog, Daemon, Big Brother, MRTG
    
Actual results:


Expected results:


Additional info:
/etc/cron.daily/date:

15 Jun 04:02:43 ntpdate[26515]: step time server 192.43.244.18 offset -1.639247 
sec


 04:02:45  up 5 days, 20:10,  0 users,  load average: 1.57, 0.76, 0.51
65 processes: 64 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:   7.0% user   3.8% system   0.0% nice   0.0% iowait  89.0% idle
Mem:   125984k av,  123820k used,    2164k free,       0k shrd,    3920k buff
                      9192k actv,    2320k in_d,    1716k in_c
Swap:  514032k av,   59640k used,  454392k free                    6776k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
26517 bb        24   0  1036 1036   848 R     1.9  0.8   0:00   0 top
    1 root      15   0    80   52    24 S     0.0  0.0   0:04   0 init
    2 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 keventd
    3 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 kapmd
    4 root      34  19     0    0     0 SWN   0.0  0.0   0:00   0 ksoftirqd_CPU
    9 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 bdflush
    5 root      15   0     0    0     0 DW    0.0  0.0   0:07   0 kswapd
    6 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 kscand/DMA
    7 root      15   0     0    0     0 SW    0.0  0.0   0:03   0 kscand/Normal
    8 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 kscand/HighMe
   10 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 kupdated
   11 root      25   0     0    0     0 SW    0.0  0.0   0:00   0 mdrecoveryd
   17 root      25   0     0    0     0 SW    0.0  0.0   0:00   0 scsi_eh_0
   20 root      15   0     0    0     0 SW    0.0  0.0   0:32   0 kjournald
   78 root      25   0     0    0     0 SW    0.0  0.0   0:00   0 khubd
 1165 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 kjournald
 1166 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 kjournald
 1507 root      15   0   348  320   252 S     0.0  0.2   0:36   0 syslogd
 1511 root      15   0    56    4     0 S     0.0  0.0   0:00   0 klogd
 1548 rpcuser   25   0    76    4     0 S     0.0  0.0   0:00   0 rpc.statd
 1616 root      24   0    52    4     0 S     0.0  0.0   0:00   0 apmd
 1653 root      15   0   244    4     0 S     0.0  0.0   0:02   0 sshd
 1667 root      23   0   152    4     0 S     0.0  0.0   0:00   0 xinetd
 1686 root      15   0   804  320   152 S     0.0  0.2   0:01   0 sendmail
 1695 smmsp     15   0   532    4     0 S     0.0  0.0   0:00   0 sendmail
 1705 root      15   0    56    4     0 S     0.0  0.0   0:00   0 gpm
 1714 root      15   0    72    4     0 S     0.0  0.0   0:00   0 crond
 1806 xfs       15   0  2300    4     0 S     0.0  0.0   0:00   0 xfs
 1826 daemon    15   0    60    4     0 S     0.0  0.0   0:00   0 atd
 1836 root      15   0    48    4     0 S     0.0  0.0   0:00   0 rhnsd
 1849 root      15   0    68   20     4 S     0.0  0.0   0:38   0 portsentry
 1851 root      15   0    84   32    16 S     0.0  0.0   0:02   0 portsentry
 1951 bb        16   0   232   68    20 S     0.0  0.0   1:38   0 bbd
 1979 bb        15   0   244    4     0 S     0.0  0.0   0:00   0 runbb.sh
 1981 bb        15   0   244    4     0 S     0.0  0.0   0:00   0 runbb.sh
 1985 bb        24   0   244    4     0 S     0.0  0.0   0:00   0 runbb.sh
 1986 bb        20   0   180    4     0 S     0.0  0.0   0:06   0 bbrun
 2255 root      15   0  2540   80    48 S     0.0  0.0   0:13   0 httpd
 2273 root      18   0  5960    4     0 S     0.0  0.0   2:21   0 mrtg
 2276 root      21   0  5992    4     0 S     0.0  0.0   8:27   0 mrtg
 2277 root      21   0    52    4     0 S     0.0  0.0   0:00   0 mingetty
 2278 root      21   0    52    4     0 S     0.0  0.0   0:00   0 mingetty
 2279 root      21   0    52    4     0 S     0.0  0.0   0:00   0 mingetty
 2280 root      21   0    52    4     0 S     0.0  0.0   0:00   0 mingetty
 2281 root      21   0    52    4     0 S     0.0  0.0   0:00   0 mingetty
 2282 root      21   0    52    4     0 S     0.0  0.0   0:00   0 mingetty
 2368 bb        19   0   176    4     0 S     0.0  0.0   0:04   0 bbrun
 4784 bb        19   0   176    4     0 S     0.0  0.0   0:04   0 bbrun
26910 root      15   0 39208 2404   168 S     0.0  1.9  54:04   0 snort
23963 bb        20   0    76    4     0 S     0.0  0.0   0:00   0 sleep
25790 bb        23   0    80    4     0 S     0.0  0.0   0:00   0 sleep
26250 bb        19   0    76    4     0 S     0.0  0.0   0:00   0 sleep
26255 root      20   0    72    4     0 S     0.0  0.0   0:00   0 crond
26256 root      18   0   392  372   268 S     0.0  0.2   0:00   0 run-parts
26513 root      22   0   696  696   580 S     0.0  0.5   0:00   0 date
26514 root      21   0   476  472   380 S     0.0  0.3   0:00   0 awk
26516 root      23   0   548  548   388 S     0.0  0.4   0:00   0 su
root     pts/0        ghostsrvr.medna. Fri Jun 13 08:12 - 17:20  (09:08)    
root     pts/0        ghostsrvr.medna. Wed Jun 11 08:44 - 17:00  (08:16)    
root     pts/0        ghostsrvr.medna. Tue Jun 10 07:56 - 11:25  (03:29)    
root     pts/0        ghostsrvr.medna. Mon Jun  9 15:59 - 16:52  (00:53)    
root     pts/0        ghostsrvr.medna. Mon Jun  9 08:01 - 13:57  (05:55)    
reboot   system boot  2.4.20-13.9      Mon Jun  9 07:53         (5+20:08)   
root     pts/1        ghostsrvr.medna. Fri Jun  6 09:16 - 09:30  (00:13)    
root     pts/0        ghostsrvr.medna. Thu Jun  5 08:33 - 13:32  (04:58)    
root     pts/0        ghostsrvr.medna. Wed Jun  4 08:38 - 11:37  (02:58)    
root     pts/0        ghostsrvr.medna. Wed Jun  4 08:32 - 08:38  (00:05)    
root     pts/0        ghostsrvr.medna. Mon Jun  2 08:30 - 16:32  (08:01)    
reboot   system boot  2.4.20-13.9      Mon Jun  2 08:27         (12+19:35)  

wtmp begins Mon Jun  2 08:23:00 2003

btmp begins Thu Jun  5 09:09:38 2003
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 *:1984                  *:*                     LISTEN      
tcp        0      0 *:1024                  *:*                     LISTEN      
tcp        0      0 watson1.medna.com:1025  *:*                     LISTEN      
tcp        0      0 *:http                  *:*                     LISTEN      
tcp        0      0 *:ssh                   *:*                     LISTEN      
tcp        0      0 *:smtp                  *:*                     LISTEN      
tcp        0      0 *:https                 *:*                     LISTEN      
tcp        0      0 watson1.medna.com:http  monitor03.medna.co:4075 TIME_WAIT   
tcp        0      0 watson1.medna.com:http  monitor03.medna.co:4074 TIME_WAIT   
tcp        0      0 watson1.medna.com:2689  mail.medna.com:smtp     TIME_WAIT   
tcp        0      0 watson1.medna.com:2688  watson1.medna.com:smtp  TIME_WAIT   
tcp        0      0 watson1.medna.com:http  monitor03.medna.co:4079 TIME_WAIT   
tcp        0      0 watson1.medna.com:http  monitor03.medna.co:4078 TIME_WAIT   
tcp        0      0 watson1.medna.com:http  monitor03.medna.co:4076 TIME_WAIT   
tcp        0      0 watson1.medna.com:http  monitor03.medna.co:4083 TIME_WAIT   
tcp        0      0 watson1.medna.com:http  monitor03.medna.co:4082 TIME_WAIT   
tcp        0      0 watson1.medna.com:http  monitor03.medna.co:4081 TIME_WAIT   
tcp        0      0 watson1.medna.com:http  monitor03.medna.co:4080 TIME_WAIT   
tcp        0      0 watson1.medna.com:http  monitor03.medna.co:4070 TIME_WAIT   
tcp        0      0 watson1.medna.com:http  monitor03.medna.co:4084 TIME_WAIT   
udp        0      0 *:1024                  *:*                                 
udp        0      0 *:1025                  *:*                                 
udp        0      0 *:syslog                *:*                                 
udp        0      0 *:1027                  *:*                                 
udp        0      0 *:876                   *:*                                 
raw        0      0 *:tcp                   *:*                     7           
raw        0      0 *:udp                   *:*                     7           
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags       Type       State         I-Node Path
unix  2      [ ACC ]     STREAM     LISTENING     2244   /dev/gpmctl
unix  12     [ ]         DGRAM                    1668   /dev/log
unix  2      [ ACC ]     STREAM     LISTENING     2435   /tmp/.font-unix/fs7100
unix  2      [ ]         DGRAM                    4121232 
unix  2      [ ]         DGRAM                    2510   
unix  2      [ ]         DGRAM                    2495   
unix  2      [ ]         DGRAM                    2243   
unix  2      [ ]         DGRAM                    2210   
unix  2      [ ]         DGRAM                    2196   
unix  2      [ ]         DGRAM                    2144   
unix  2      [ ]         DGRAM                    1878   
unix  2      [ ]         DGRAM                    1730   
unix  2      [ ]         DGRAM                    1680   
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda6             11803992   2342608   8861760  21% /
/dev/sda5               101089     14438     81432  16% /boot
/dev/sda2              5036316   1764232   3016252  37% /old
none                     62992         0     62992   0% /dev/shm
  PID TTY          TIME CMD
    1 ?        00:00:04 init
    2 ?        00:00:00 keventd
    3 ?        00:00:00 kapmd
    4 ?        00:00:00 ksoftirqd_CPU0
    9 ?        00:00:00 bdflush
    5 ?        00:00:07 kswapd
    6 ?        00:00:00 kscand/DMA
    7 ?        00:00:03 kscand/Normal
    8 ?        00:00:00 kscand/HighMem
   10 ?        00:00:00 kupdated
   11 ?        00:00:00 mdrecoveryd
   17 ?        00:00:00 scsi_eh_0
   20 ?        00:00:32 kjournald
   78 ?        00:00:00 khubd
 1165 ?        00:00:00 kjournald
 1166 ?        00:00:00 kjournald
 1507 ?        00:00:36 syslogd
 1511 ?        00:00:00 klogd
 1548 ?        00:00:00 rpc.statd
 1616 ?        00:00:00 apmd
 1653 ?        00:00:02 sshd
 1667 ?        00:00:00 xinetd
 1686 ?        00:00:01 sendmail
 1695 ?        00:00:00 sendmail
 1705 ?        00:00:00 gpm
 1714 ?        00:00:00 crond
 1806 ?        00:00:00 xfs
 1826 ?        00:00:00 atd
 1836 ?        00:00:00 rhnsd
 1849 ?        00:00:38 portsentry
 1851 ?        00:00:02 portsentry
 1951 ?        00:01:38 bbd
 1979 ?        00:00:00 runbb.sh
 1981 ?        00:00:00 runbb.sh
 1985 ?        00:00:00 runbb.sh
 1986 ?        00:00:06 bbrun
 2255 ?        00:00:04 httpd
 2273 ?        00:02:21 mrtg
 2276 ?        00:08:27 mrtg
 2277 tty1     00:00:00 mingetty
 2278 tty2     00:00:00 mingetty
 2279 tty3     00:00:00 mingetty
 2280 tty4     00:00:00 mingetty
 2281 tty5     00:00:00 mingetty
 2282 tty6     00:00:00 mingetty
 2368 ?        00:00:04 bbrun
 4784 ?        00:00:04 bbrun
26910 ?        00:54:04 snort
17214 ?        00:00:01 httpd
31933 ?        00:00:01 httpd
15763 ?        00:00:01 httpd
18350 ?        00:00:01 httpd
23081 ?        00:00:01 httpd
12286 ?        00:00:01 httpd
 5197 ?        00:00:00 httpd
32330 ?        00:00:00 httpd
23963 ?        00:00:00 sleep
25790 ?        00:00:00 sleep
26250 ?        00:00:00 sleep
26255 ?        00:00:00 crond
26256 ?        00:00:00 run-parts
26513 ?        00:00:00 date
26514 ?        00:00:00 awk
26547 ?        00:00:00 sendmail
26549 ?        00:00:00 ps
/etc/cron.daily/logrotate:

Null message body; hope that's ok
Null message body; hope that's ok

Comment 1 Need Real Name 2003-06-16 14:12:24 UTC
CRONTAB-----------------------
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=support
HOME=/

# run-parts
01 * * * * root run-parts /etc/cron.hourly
16 * * * * root run-parts /etc/cron.minute
32 * * * * root run-parts /etc/cron.minute
47 * * * * root run-parts /etc/cron.minute
02 4 * * * root run-parts /etc/cron.daily
22 4 * * 0 root run-parts /etc/cron.weekly
42 4 1 * * root run-parts /etc/cron.monthly
-----------------------------------
[root@watson1 cron.weekly]# ls -al
total 76
drwxr-xr-x    2 root     root         4096 May  2 18:16 .
drwxr-xr-x   61 root     root         4096 Jun 16 08:46 ..
-rwxr-xr-x    1 root     root          277 Jan 24 15:26 0anacron
-rwxr-xr-x    1 root     root          414 Feb 10 09:20 makewhatis.cron
-rwxr-xr-x    1 root     root          455 May 21  2002 snortlogs
-rw-r--r--    1 root     root        50908 Jun 15 04:24 test
---------------------------------------------


Comment 2 Florian La Roche 2003-06-30 14:39:58 UTC
sendmail stops accepting email when the load average is too high. The
thresholds are configuration parameters and can be raised if the machine should
still accept new email under very high load.

Overall, this looks fine from a sendmail perspective.

Thanks for your bug-report,

Florian La Roche
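
For illustration: the log lines in the description show connections being
refused once the reported load average reached 12, which is the RefuseLA
default quoted from sendmail.cf in comment 3. A minimal shell sketch of that
kind of threshold check follows. It assumes a Linux /proc/loadavg and that the
one-minute figure is the relevant one; the function name load_at_or_above and
its optional file argument are hypothetical and are not part of sendmail.

```shell
# Succeed (exit status 0) when the one-minute load average is at or above
# the given threshold, fail (exit status 1) otherwise.
# The second argument defaults to /proc/loadavg; it exists only so the
# function can be tried against a saved sample (an assumption for this sketch).
load_at_or_above() {
    threshold=$1
    loadavg_file=${2:-/proc/loadavg}
    # The first field of the first line is the one-minute load average.
    awk -v t="$threshold" '
        NR == 1 { found = ($1 + 0 >= t + 0) }
        END     { exit !found }
    ' "$loadavg_file"
}
```

For example, "load_at_or_above 12 && echo refusing" prints "refusing" only
while the current load is 12 or higher.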


Comment 3 Need Real Name 2003-06-30 21:46:21 UTC
I have changed the settings in sendmail.mc, but sendmail is still rejecting 
connections at load averages in the teens.  I don't think the settings are 
being carried over from sendmail.mc to sendmail.cf.  Yes, I ran the make command.

---------------from sendmail.mc
dnl define(`confQUEUE_LA', `20')dnl
dnl define(`confREFUSE_LA', `26')dnl
-------------------

---------------from sendmail.cf
# load average at which we just queue messages
#O QueueLA=8

# load average at which we refuse connections
#O RefuseLA=12
-----------------


Jun 30 15:44:49 watson1 sendmail[1666]: rejecting connections on daemon MTA: 
load average: 16
Jun 30 15:45:09 watson1 sendmail[1666]: rejecting connections on daemon MTA: 
load average: 17
Jun 30 15:45:23 watson1 sendmail[1666]: rejecting connections on daemon MTA: 
load average: 18
Jun 30 15:45:42 watson1 sendmail[1666]: rejecting connections on daemon MTA: 
load average: 17
Jun 30 15:46:02 watson1 sendmail[1666]: rejecting connections on daemon MTA: 
load average: 19
Jun 30 15:46:20 watson1 sendmail[1666]: rejecting connections on daemon MTA: 
load average: 18
Jun 30 15:46:38 watson1 sendmail[1666]: rejecting connections on daemon MTA: 
load average: 18
Jun 30 15:46:56 watson1 sendmail[1666]: rejecting connections on daemon MTA: 
load average: 17
Jun 30 15:47:15 watson1 sendmail[1666]: rejecting connections on daemon MTA: 
load average: 17
Jun 30 15:47:30 watson1 sendmail[1666]: rejecting connections on daemon MTA: 
load average: 18
Jun 30 15:47:50 watson1 sendmail[1666]: rejecting connections on daemon MTA: 
load average: 18


Comment 4 Need Real Name 2003-07-08 13:34:32 UTC
Oops - my bad.
---------------from sendmail.mc
define(`confQUEUE_LA', `20')dnl
define(`confREFUSE_LA', `26')dnl
-------------------

Removed the leading "dnl" so it could compile.  The settings now show in sendmail.cf.
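
The mix-up above comes from m4: "dnl" discards everything up to the end of the
line, so a define that starts with "dnl" never reaches sendmail.cf. A small
shell sketch for checking both files is below. The function names
active_la_defines and active_la_options are hypothetical helpers, not part of
sendmail, and the patterns only cover the two settings discussed in this bug.

```shell
# Print the load-average defines in an .mc file that m4 will actually expand,
# i.e. lines that begin with define( rather than with "dnl define(".
active_la_defines() {
    grep -E '^[[:space:]]*define\(`conf(QUEUE|REFUSE)_LA' "$1"
}

# Print the load-average options that are active in a .cf file,
# i.e. "O QueueLA=..." or "O RefuseLA=..." lines not commented out with "#".
active_la_options() {
    grep -E '^O[[:space:]]+(QueueLA|RefuseLA)=' "$1"
}
```

Assuming the files live in /etc/mail, running
"active_la_defines /etc/mail/sendmail.mc" and
"active_la_options /etc/mail/sendmail.cf" should each print two lines once the
change has taken effect. Empty output means the settings are still commented
out.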

