Bug 144133

Summary: crond in loop creating new crond processes and hanging the system.
Product: Red Hat Enterprise Linux 3
Component: vixie-cron
Version: 3.0
Hardware: i686
OS: Linux
Status: CLOSED CANTFIX
Severity: medium
Priority: high
Keywords: FutureFeature
Reporter: Celso Medina Kern <celso.kern>
Assignee: Marcela Mašláňová <mmaslano>
QA Contact: Brock Organ <borgan>
CC: celso.kern
Fixed In Version: vixie-cron-4.1-1_EL3
Doc Type: Enhancement
Last Closed: 2006-10-30 09:42:47 UTC

Description Celso Medina Kern 2005-01-04 19:33:36 UTC
Description of problem: 
The system hangs about once a day, at different times. SysRq-t reports
a large number of crond processes (up to 5800 seen in one dump). This
exhausts system resources and hangs the machine.

If cron jobs are removed from the user and system crontabs, the hangs
disappear. I left only sa1 and mrtg scheduled (/etc/cron.d) and the
system still hung. I then commented those out and reinserted the other
cron jobs, and the system hung as well, which leads me to believe the
problem is not related to a specific job but to crond itself.

Version-Release number of selected component (if applicable):
vixie-cron-3.0.1-74

How reproducible:
Every time

Steps to Reproduce:
1. Enable the system crontab jobs to run at the system's predefined times.
2. Wait for two days.

Actual results:
The system hangs. In the SysRq-t dump I always see many crond
processes being started in a loop.

Expected results:
No crond loop.

Additional info:

This is the second system on which I have seen this problem. On the
first one, crond was simply disabled. Both were HP ProLiant DL580
servers running Red Hat EL AS 3.0.

Linux localhost 2.4.21-15.ELsmp #1 SMP Thu Apr 22 00:18:24 EDT 2004 i686 i686 i386 GNU/Linux

Red Hat Enterprise Linux AS release 3 (Taroon Update 2)

/etc/crontab:
=============
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
HOME=/

# run-parts
01 * * * * root run-parts /etc/cron.hourly
02 4 * * * root run-parts /etc/cron.daily
22 4 * * 0 root run-parts /etc/cron.weekly
42 4 1 * * root run-parts /etc/cron.monthly

/etc/cron.d:
============
mrtg file: 
0-59/5 * * * * root /usr/bin/mrtg /etc/mrtg/mrtg.cfg

sysstat:
# run system activity accounting tool every 10 minutes
*/10 * * * * root /usr/lib/sa/sa1 1 1
# generate a daily summary of process accounting at 23:53
53 23 * * * root /usr/lib/sa/sa2 -A

/etc/cron.daily:
================
00webalizer, certwatch, logrotate, makewhatis.cron, prelink, rpm, 
srotate.cron, tetex.cron, tmpwatch

/etc/cron.weekly:
=================
makewhatis.cron

The first occurrences of crond processes in the SysRq output are:
SysRq : Show State

                         free                        sibling
  task             PC    stack   pid father child younger older
crond         S 00000002  4424  1190      1  2612    1214  1180 
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xf4e77f20)
[<c01340b5>] schedule_timeout [kernel] 0x65 (0xf4e77f64)
[<c0134040>] process_timeout [kernel] 0x0 (0xf4e77f84)
[<c01341f3>] sys_nanosleep [kernel] 0xd3 (0xf4e77f9c)

crond         S 00000000     0  2612   1190  2613    2624       
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xde7aded0)
[<c016d4e0>] pipe_wait [kernel] 0x70 (0xde7adf14)
[<c016d5d4>] pipe_read [kernel] 0xc4 (0xde7adf68)
[<c0160767>] sys_read [kernel] 0x97 (0xde7adf94)

sadc          D 00000003  1792  2613   2612                     
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xd102be90)
[<c0123a92>] sleep_on [kernel] 0x52 (0xd102bed4)
[<f88fcb58>] log_wait_commit_Rsmp_c80020b3 [jbd] 0x68 (0xd102bf04)
[<f88f7bd3>] journal_stop_Rsmp_74af6844 [jbd] 0x193 (0xd102bf1c)
[<f88f6445>] journal_start_Rsmp_25661df5 [jbd] 0xa5 (0xd102bf28)
[<c0162d69>] fsync_buffers_list [kernel] 0xe9 (0xd102bf38)
[<f88f7ccc>] journal_force_commit_Rsmp_2a9443c3 [jbd] 0x7c 
(0xd102bf4c)
[<f89140b1>] ext3_force_commit [ext3] 0x51 (0xd102bf5c)
[<f8908fb4>] ext3_sync_file [ext3] 0x84 (0xd102bf68)
[<f890c270>] ext3_writepage [ext3] 0x0 (0xd102bf70)
[<c0162600>] do_fdatasync [kernel] 0x50 (0xd102bf88)
[<c0162684>] sys_fdatasync [kernel] 0x44 (0xd102bfa8)

crond         S 00000000  5192  2624   1190  2625    2628  2612 
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xf40c5ed0)
[<c016d4e0>] pipe_wait [kernel] 0x70 (0xf40c5f14)
[<c016d5d4>] pipe_read [kernel] 0xc4 (0xf40c5f68)
[<c0160767>] sys_read [kernel] 0x97 (0xf40c5f94)

sh            D 00000001     0  2625   2624                     
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xdecffc0c)
[<f88f6ea8>] do_get_write_access [jbd] 0x508 (0xdecffc50)
[<f88f7032>] journal_get_write_access_Rsmp_ba6f366a [jbd] 0x52 
(0xdecffca8)
[<f890ea1e>] ext3_reserve_inode_write [ext3] 0x7e (0xdecffcc8)
[<f890ea9b>] ext3_mark_inode_dirty [ext3] 0x2b (0xdecffcf0)
[<f890ebdc>] ext3_dirty_inode [ext3] 0x10c (0xdecffd1c)
[<c017b5f6>] __mark_inode_dirty [kernel] 0xb6 (0xdecffd38)
[<c017cefb>] update_atime [kernel] 0x6b (0xdecffd4c)
[<c0146a19>] do_generic_file_read [kernel] 0x369 (0xdecffd5c)
[<c01473e5>] generic_file_new_read [kernel] 0xc5 (0xdecffd9c)
[<c0147220>] file_read_actor [kernel] 0x0 (0xdecffdac)
[<c016f3e6>] link_path_walk [kernel] 0x656 (0xdecffdb4)
[<c014750f>] generic_file_read [kernel] 0x2f (0xdecffde8)
[<c016b8c2>] kernel_read [kernel] 0x72 (0xdecffe00)
[<c016bd96>] prepare_binprm [kernel] 0x136 (0xdecffe24)
[<c016c38c>] do_execve [kernel] 0xec (0xdecffe44)
[<c0109db0>] sys_execve [kernel] 0x50 (0xdecfffa4)

crond         S 00000002     0  2628   1190  2629    2630  2624 
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xdeb49ed0)
[<c016d4e0>] pipe_wait [kernel] 0x70 (0xdeb49f14)
[<c016d5d4>] pipe_read [kernel] 0xc4 (0xdeb49f68)
[<c0160767>] sys_read [kernel] 0x97 (0xdeb49f94)

sh            D 00000002     0  2629   2628                     
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xdf1c3c0c)
[<f88f6ea8>] do_get_write_access [jbd] 0x508 (0xdf1c3c50)
[<f88f7032>] journal_get_write_access_Rsmp_ba6f366a [jbd] 0x52 
(0xdf1c3ca8)
[<f890ea1e>] ext3_reserve_inode_write [ext3] 0x7e (0xdf1c3cc8)
[<f890ea9b>] ext3_mark_inode_dirty [ext3] 0x2b (0xdf1c3cf0)
[<f890ebdc>] ext3_dirty_inode [ext3] 0x10c (0xdf1c3d1c)
[<c017b5f6>] __mark_inode_dirty [kernel] 0xb6 (0xdf1c3d38)
[<c017cefb>] update_atime [kernel] 0x6b (0xdf1c3d4c)
[<c0146a19>] do_generic_file_read [kernel] 0x369 (0xdf1c3d5c)
[<c01473e5>] generic_file_new_read [kernel] 0xc5 (0xdf1c3d9c)
[<c0147220>] file_read_actor [kernel] 0x0 (0xdf1c3dac)
[<c016f3e6>] link_path_walk [kernel] 0x656 (0xdf1c3db4)
[<c014750f>] generic_file_read [kernel] 0x2f (0xdf1c3de8)
[<c016b8c2>] kernel_read [kernel] 0x72 (0xdf1c3e00)
[<c016bd96>] prepare_binprm [kernel] 0x136 (0xdf1c3e24)
[<c016c38c>] do_execve [kernel] 0xec (0xdf1c3e44)
[<c0109db0>] sys_execve [kernel] 0x50 (0xdf1c3fa4)

crond         S 00000002     0  2630   1190  2631    2632  2628 
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xded3ded0)
[<c016d4e0>] pipe_wait [kernel] 0x70 (0xded3df14)
[<c016d5d4>] pipe_read [kernel] 0xc4 (0xded3df68)
[<c0160767>] sys_read [kernel] 0x97 (0xded3df94)

sh            D 00000000     0  2631   2630                     
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xdbe3bc0c)
[<f88f6ea8>] do_get_write_access [jbd] 0x508 (0xdbe3bc50)
[<f88f7032>] journal_get_write_access_Rsmp_ba6f366a [jbd] 0x52 
(0xdbe3bca8)
[<f890ea1e>] ext3_reserve_inode_write [ext3] 0x7e (0xdbe3bcc8)
[<f890ea9b>] ext3_mark_inode_dirty [ext3] 0x2b (0xdbe3bcf0)
[<f890ebdc>] ext3_dirty_inode [ext3] 0x10c (0xdbe3bd1c)
[<c017b5f6>] __mark_inode_dirty [kernel] 0xb6 (0xdbe3bd38)
[<c017cefb>] update_atime [kernel] 0x6b (0xdbe3bd4c)
[<c0146a19>] do_generic_file_read [kernel] 0x369 (0xdbe3bd5c)
[<c01473e5>] generic_file_new_read [kernel] 0xc5 (0xdbe3bd9c)
[<c0147220>] file_read_actor [kernel] 0x0 (0xdbe3bdac)
[<c016f3e6>] link_path_walk [kernel] 0x656 (0xdbe3bdb4)
[<c014750f>] generic_file_read [kernel] 0x2f (0xdbe3bde8)
[<c016b8c2>] kernel_read [kernel] 0x72 (0xdbe3be00)
[<c016bd96>] prepare_binprm [kernel] 0x136 (0xdbe3be24)
[<c016c38c>] do_execve [kernel] 0xec (0xdbe3be44)
[<c0109db0>] sys_execve [kernel] 0x50 (0xdbe3bfa4)

crond         S 00000000  1788  2632   1190  2633    2635  2630 
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xce197ed0)
[<c016d4e0>] pipe_wait [kernel] 0x70 (0xce197f14)
[<c016d5d4>] pipe_read [kernel] 0xc4 (0xce197f68)
[<c0160767>] sys_read [kernel] 0x97 (0xce197f94)

sh            S 00000000     0  2633   2632  2634               
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xdf193efc)
[<c012d194>] sys_wait4 [kernel] 0x1b4 (0xdf193f40)
[<c012d2f7>] sys_waitpid [kernel] 0x27 (0xdf193fac)

sh            D 00000002     0  2634   2633                     
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xde855c0c)
[<f88f6ea8>] do_get_write_access [jbd] 0x508 (0xde855c50)
[<f88f7032>] journal_get_write_access_Rsmp_ba6f366a [jbd] 0x52 
(0xde855ca8)
[<f890ea1e>] ext3_reserve_inode_write [ext3] 0x7e (0xde855cc8)
[<f890ea9b>] ext3_mark_inode_dirty [ext3] 0x2b (0xde855cf0)
[<f890ebdc>] ext3_dirty_inode [ext3] 0x10c (0xde855d1c)
[<c017b5f6>] __mark_inode_dirty [kernel] 0xb6 (0xde855d38)
[<c017cefb>] update_atime [kernel] 0x6b (0xde855d4c)
[<c0146a19>] do_generic_file_read [kernel] 0x369 (0xde855d5c)
[<c01473e5>] generic_file_new_read [kernel] 0xc5 (0xde855d9c)
[<c0147220>] file_read_actor [kernel] 0x0 (0xde855dac)
[<c016f3e6>] link_path_walk [kernel] 0x656 (0xde855db4)
[<c014750f>] generic_file_read [kernel] 0x2f (0xde855de8)
[<c016b8c2>] kernel_read [kernel] 0x72 (0xde855e00)
[<c016bd96>] prepare_binprm [kernel] 0x136 (0xde855e24)
[<c016c38c>] do_execve [kernel] 0xec (0xde855e44)
[<c0109db0>] sys_execve [kernel] 0x50 (0xde855fa4)

crond         S 00000002     0  2635   1190  2636    2637  2632 
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xdebebed0)
[<c016d4e0>] pipe_wait [kernel] 0x70 (0xdebebf14)
[<c016d5d4>] pipe_read [kernel] 0xc4 (0xdebebf68)
[<c0160767>] sys_read [kernel] 0x97 (0xdebebf94)

bash          D 00000000     0  2636   2635                     
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xdec53cd4)
[<c01245aa>] io_schedule [kernel] 0x2a (0xdec53d18)
[<c0145d49>] ___wait_on_page [kernel] 0x89 (0xdec53d24)
[<c0146ad8>] do_generic_file_read [kernel] 0x428 (0xdec53d5c)
[<c01473e5>] generic_file_new_read [kernel] 0xc5 (0xdec53d9c)
[<c0147220>] file_read_actor [kernel] 0x0 (0xdec53dac)
[<c016f3e6>] link_path_walk [kernel] 0x656 (0xdec53db4)
[<c014750f>] generic_file_read [kernel] 0x2f (0xdec53de8)
[<c016b8c2>] kernel_read [kernel] 0x72 (0xdec53e00)
[<c016bd96>] prepare_binprm [kernel] 0x136 (0xdec53e24)
[<c016c38c>] do_execve [kernel] 0xec (0xdec53e44)
[<c0109db0>] sys_execve [kernel] 0x50 (0xdec53fa4)

crond         S 00000000     0  2637   1190  2638    2639  2635 
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xde71fed0)
[<c016d4e0>] pipe_wait [kernel] 0x70 (0xde71ff14)
[<c016d5d4>] pipe_read [kernel] 0xc4 (0xde71ff68)
[<c0160767>] sys_read [kernel] 0x97 (0xde71ff94)

sh            D 00000002     0  2638   2637                     
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xdebddc0c)
[<f88f6ea8>] do_get_write_access [jbd] 0x508 (0xdebddc50)
[<f88f7032>] journal_get_write_access_Rsmp_ba6f366a [jbd] 0x52 
(0xdebddca8)
[<f890ea1e>] ext3_reserve_inode_write [ext3] 0x7e (0xdebddcc8)
[<f890ea9b>] ext3_mark_inode_dirty [ext3] 0x2b (0xdebddcf0)
[<f890ebdc>] ext3_dirty_inode [ext3] 0x10c (0xdebddd1c)
[<c017b5f6>] __mark_inode_dirty [kernel] 0xb6 (0xdebddd38)
[<c017cefb>] update_atime [kernel] 0x6b (0xdebddd4c)
[<c0146a19>] do_generic_file_read [kernel] 0x369 (0xdebddd5c)
[<c01473e5>] generic_file_new_read [kernel] 0xc5 (0xdebddd9c)
[<c0147220>] file_read_actor [kernel] 0x0 (0xdebdddac)
[<c016f3e6>] link_path_walk [kernel] 0x656 (0xdebdddb4)
[<c014750f>] generic_file_read [kernel] 0x2f (0xdebddde8)
[<c016b8c2>] kernel_read [kernel] 0x72 (0xdebdde00)
[<c016bd96>] prepare_binprm [kernel] 0x136 (0xdebdde24)
[<c016c38c>] do_execve [kernel] 0xec (0xdebdde44)
[<c0109db0>] sys_execve [kernel] 0x50 (0xdebddfa4)

crond         S 00000001     0  2639   1190  2640    2641  2637 
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xdf171ed0)
[<c016d4e0>] pipe_wait [kernel] 0x70 (0xdf171f14)
[<c016d5d4>] pipe_read [kernel] 0xc4 (0xdf171f68)
[<c0160767>] sys_read [kernel] 0x97 (0xdf171f94)

sh            D 00000001     0  2640   2639                     
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xdf2f9c0c)
[<f88f6ea8>] do_get_write_access [jbd] 0x508 (0xdf2f9c50)
[<f88f7032>] journal_get_write_access_Rsmp_ba6f366a [jbd] 0x52 
(0xdf2f9ca8)
[<f890ea1e>] ext3_reserve_inode_write [ext3] 0x7e (0xdf2f9cc8)
[<f890ea9b>] ext3_mark_inode_dirty [ext3] 0x2b (0xdf2f9cf0)
[<f890ebdc>] ext3_dirty_inode [ext3] 0x10c (0xdf2f9d1c)
[<c017b5f6>] __mark_inode_dirty [kernel] 0xb6 (0xdf2f9d38)
[<c017cefb>] update_atime [kernel] 0x6b (0xdf2f9d4c)
[<c0146a19>] do_generic_file_read [kernel] 0x369 (0xdf2f9d5c)
[<c01473e5>] generic_file_new_read [kernel] 0xc5 (0xdf2f9d9c)
[<c0147220>] file_read_actor [kernel] 0x0 (0xdf2f9dac)
[<c016f3e6>] link_path_walk [kernel] 0x656 (0xdf2f9db4)
[<c014750f>] generic_file_read [kernel] 0x2f (0xdf2f9de8)
[<c016b8c2>] kernel_read [kernel] 0x72 (0xdf2f9e00)
[<c016bd96>] prepare_binprm [kernel] 0x136 (0xdf2f9e24)
[<c016c38c>] do_execve [kernel] 0xec (0xdf2f9e44)
[<c0109db0>] sys_execve [kernel] 0x50 (0xdf2f9fa4)

crond         S 00000001     0  2641   1190  2642    2643  2639 
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xdf5f7ed0)
[<c016d4e0>] pipe_wait [kernel] 0x70 (0xdf5f7f14)
[<c016d5d4>] pipe_read [kernel] 0xc4 (0xdf5f7f68)
[<c0160767>] sys_read [kernel] 0x97 (0xdf5f7f94)

sh            D 00000000     0  2642   2641                     
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xdf381c0c)
[<f88f6ea8>] do_get_write_access [jbd] 0x508 (0xdf381c50)
[<f88f7032>] journal_get_write_access_Rsmp_ba6f366a [jbd] 0x52 
(0xdf381ca8)
[<f890ea1e>] ext3_reserve_inode_write [ext3] 0x7e (0xdf381cc8)
[<f890ea9b>] ext3_mark_inode_dirty [ext3] 0x2b (0xdf381cf0)
[<f890ebdc>] ext3_dirty_inode [ext3] 0x10c (0xdf381d1c)
[<c017b5f6>] __mark_inode_dirty [kernel] 0xb6 (0xdf381d38)
[<c017cefb>] update_atime [kernel] 0x6b (0xdf381d4c)
[<c0146a19>] do_generic_file_read [kernel] 0x369 (0xdf381d5c)
[<c01473e5>] generic_file_new_read [kernel] 0xc5 (0xdf381d9c)
[<c0147220>] file_read_actor [kernel] 0x0 (0xdf381dac)
[<c016f3e6>] link_path_walk [kernel] 0x656 (0xdf381db4)
[<c014750f>] generic_file_read [kernel] 0x2f (0xdf381de8)
[<c016b8c2>] kernel_read [kernel] 0x72 (0xdf381e00)
[<c016bd96>] prepare_binprm [kernel] 0x136 (0xdf381e24)
[<c016c38c>] do_execve [kernel] 0xec (0xdf381e44)
[<c0109db0>] sys_execve [kernel] 0x50 (0xdf381fa4)

crond         S 00000001     0  2643   1190  2644    2646  2641 
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xdefd7ed0)
[<c016d4e0>] pipe_wait [kernel] 0x70 (0xdefd7f14)
[<c016d5d4>] pipe_read [kernel] 0xc4 (0xdefd7f68)
[<c0160767>] sys_read [kernel] 0x97 (0xdefd7f94)

sh            S 00000003     0  2644   2643  2645               
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xdf19befc)
[<c012d194>] sys_wait4 [kernel] 0x1b4 (0xdf19bf40)
[<c012d2f7>] sys_waitpid [kernel] 0x27 (0xdf19bfac)

sh            D 00000003     0  2645   2644                     
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xdec3bc0c)
[<f88f6ea8>] do_get_write_access [jbd] 0x508 (0xdec3bc50)
[<f88f7032>] journal_get_write_access_Rsmp_ba6f366a [jbd] 0x52 
(0xdec3bca8)
[<f890ea1e>] ext3_reserve_inode_write [ext3] 0x7e (0xdec3bcc8)
[<f890ea9b>] ext3_mark_inode_dirty [ext3] 0x2b (0xdec3bcf0)
[<f890ebdc>] ext3_dirty_inode [ext3] 0x10c (0xdec3bd1c)
[<c017b5f6>] __mark_inode_dirty [kernel] 0xb6 (0xdec3bd38)
[<c017cefb>] update_atime [kernel] 0x6b (0xdec3bd4c)
[<c0146a19>] do_generic_file_read [kernel] 0x369 (0xdec3bd5c)
[<c01473e5>] generic_file_new_read [kernel] 0xc5 (0xdec3bd9c)
[<c0147220>] file_read_actor [kernel] 0x0 (0xdec3bdac)
[<c016f3e6>] link_path_walk [kernel] 0x656 (0xdec3bdb4)
[<c014750f>] generic_file_read [kernel] 0x2f (0xdec3bde8)
[<c016b8c2>] kernel_read [kernel] 0x72 (0xdec3be00)
[<c016bd96>] prepare_binprm [kernel] 0x136 (0xdec3be24)
[<c016c38c>] do_execve [kernel] 0xec (0xdec3be44)
[<c0109db0>] sys_execve [kernel] 0x50 (0xdec3bfa4)

crond         D 00000001     0  2646   1190  2647    2648  2643 
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xdef3fed0)
[<c01238e1>] wait_for_completion [kernel] 0x71 (0xdef3ff14)
[<c0126fa9>] do_fork [kernel] 0x109 (0xdef3ff68)
[<c0109d57>] sys_vfork [kernel] 0x37 (0xdef3ffa4)

crond         S 00000001     0  2647   2646                     
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xdeaffd48)
[<c013410c>] schedule_timeout [kernel] 0xbc (0xdeaffd8c)
[<c028b05d>] unix_wait_for_peer [kernel] 0xbd (0xdeaffdc4)
[<c02216ae>] sock_alloc_send_pskb [kernel] 0xce (0xdeaffde0)
[<c028ba89>] unix_dgram_sendmsg [kernel] 0x229 (0xdeaffe18)
[<c0155d17>] __alloc_pages [kernel] 0x97 (0xdeaffe40)
[<c021e6e8>] sock_sendmsg [kernel] 0x78 (0xdeaffe54)
[<c021e4bc>] sockfd_lookup [kernel] 0x1c (0xdeaffe84)
[<c021f993>] sys_sendto [kernel] 0xe3 (0xdeaffe98)
[<c011f5ac>] do_page_fault [kernel] 0x14c (0xdeaffef4)
[<c021f9e7>] sys_send [kernel] 0x37 (0xdeafff64)
[<c02202a7>] sys_socketcall [kernel] 0x147 (0xdeafff80)

crond         D 00000001     0  2648   1190  2649    2650  2646 
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xdf1b3ed0)
[<c029abf7>] vsnprintf [kernel] 0x207 (0xdf1b3ee4)
[<c01238e1>] wait_for_completion [kernel] 0x71 (0xdf1b3f14)
[<c0126fa9>] do_fork [kernel] 0x109 (0xdf1b3f68)
[<c0109d57>] sys_vfork [kernel] 0x37 (0xdf1b3fa4)

crond         S 00000000     0  2649   2648                     
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xdf159d48)
[<c013410c>] schedule_timeout [kernel] 0xbc (0xdf159d8c)
[<c028b05d>] unix_wait_for_peer [kernel] 0xbd (0xdf159dc4)
[<c02216ae>] sock_alloc_send_pskb [kernel] 0xce (0xdf159de0)
[<c028ba89>] unix_dgram_sendmsg [kernel] 0x229 (0xdf159e18)
[<c0155d17>] __alloc_pages [kernel] 0x97 (0xdf159e40)
[<c021e6e8>] sock_sendmsg [kernel] 0x78 (0xdf159e54)
[<c021e4bc>] sockfd_lookup [kernel] 0x1c (0xdf159e84)
[<c021f993>] sys_sendto [kernel] 0xe3 (0xdf159e98)
[<c011f5ac>] do_page_fault [kernel] 0x14c (0xdf159ef4)
[<c021f9e7>] sys_send [kernel] 0x37 (0xdf159f64)
[<c02202a7>] sys_socketcall [kernel] 0x147 (0xdf159f80)

crond         D 00000001     0  2650   1190  2652    2651  2648 
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xdf079ed0)
[<c029abf7>] vsnprintf [kernel] 0x207 (0xdf079ee4)
[<c01238e1>] wait_for_completion [kernel] 0x71 (0xdf079f14)
[<c0126fa9>] do_fork [kernel] 0x109 (0xdf079f68)
[<c0109d57>] sys_vfork [kernel] 0x37 (0xdf079fa4)

crond         S 00000001     0  2652   2650                     
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xdf3f7d48)
[<c013410c>] schedule_timeout [kernel] 0xbc (0xdf3f7d8c)
[<c028b05d>] unix_wait_for_peer [kernel] 0xbd (0xdf3f7dc4)
[<c02216ae>] sock_alloc_send_pskb [kernel] 0xce (0xdf3f7de0)
[<c028ba89>] unix_dgram_sendmsg [kernel] 0x229 (0xdf3f7e18)
[<c0155d17>] __alloc_pages [kernel] 0x97 (0xdf3f7e40)
[<c021e6e8>] sock_sendmsg [kernel] 0x78 (0xdf3f7e54)
[<c021e4bc>] sockfd_lookup [kernel] 0x1c (0xdf3f7e84)
[<c021f993>] sys_sendto [kernel] 0xe3 (0xdf3f7e98)
[<c011f5ac>] do_page_fault [kernel] 0x14c (0xdf3f7ef4)
[<c021f9e7>] sys_send [kernel] 0x37 (0xdf3f7f64)
[<c02202a7>] sys_socketcall [kernel] 0x147 (0xdf3f7f80)

crond         D 00000000     0  2651   1190  2653    2656  2650 
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xdeddfed0)
[<c029abf7>] vsnprintf [kernel] 0x207 (0xdeddfee4)
[<c01238e1>] wait_for_completion [kernel] 0x71 (0xdeddff14)
[<c0126fa9>] do_fork [kernel] 0x109 (0xdeddff68)
[<c0109d57>] sys_vfork [kernel] 0x37 (0xdeddffa4)

crond         S 00000000     0  2653   2651                     
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xdf2e3d48)
[<c013410c>] schedule_timeout [kernel] 0xbc (0xdf2e3d8c)
[<c028b05d>] unix_wait_for_peer [kernel] 0xbd (0xdf2e3dc4)
[<c02216ae>] sock_alloc_send_pskb [kernel] 0xce (0xdf2e3de0)
[<c028ba89>] unix_dgram_sendmsg [kernel] 0x229 (0xdf2e3e18)
[<c0155d17>] __alloc_pages [kernel] 0x97 (0xdf2e3e40)
[<c021e6e8>] sock_sendmsg [kernel] 0x78 (0xdf2e3e54)
[<c021e4bc>] sockfd_lookup [kernel] 0x1c (0xdf2e3e84)
[<c021f993>] sys_sendto [kernel] 0xe3 (0xdf2e3e98)
[<c011f5ac>] do_page_fault [kernel] 0x14c (0xdf2e3ef4)
[<c021f9e7>] sys_send [kernel] 0x37 (0xdf2e3f64)
[<c02202a7>] sys_socketcall [kernel] 0x147 (0xdf2e3f80)

crond         D 00000002     0  2656   1190  2657    2660  2651 
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xdf3eded0)
[<c029abf7>] vsnprintf [kernel] 0x207 (0xdf3edee4)
[<c01238e1>] wait_for_completion [kernel] 0x71 (0xdf3edf14)
[<c0126fa9>] do_fork [kernel] 0x109 (0xdf3edf68)
[<c0109d57>] sys_vfork [kernel] 0x37 (0xdf3edfa4)

crond         S 00000002     0  2657   2656                     
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xdf257d48)
[<c013410c>] schedule_timeout [kernel] 0xbc (0xdf257d8c)
[<c028b05d>] unix_wait_for_peer [kernel] 0xbd (0xdf257dc4)
[<c02216ae>] sock_alloc_send_pskb [kernel] 0xce (0xdf257de0)
[<c028ba89>] unix_dgram_sendmsg [kernel] 0x229 (0xdf257e18)
[<c0155d17>] __alloc_pages [kernel] 0x97 (0xdf257e40)
[<c021e6e8>] sock_sendmsg [kernel] 0x78 (0xdf257e54)
[<c021e4bc>] sockfd_lookup [kernel] 0x1c (0xdf257e84)
[<c021f993>] sys_sendto [kernel] 0xe3 (0xdf257e98)
[<c011f5ac>] do_page_fault [kernel] 0x14c (0xdf257ef4)
[<c021f9e7>] sys_send [kernel] 0x37 (0xdf257f64)
[<c02202a7>] sys_socketcall [kernel] 0x147 (0xdf257f80)

crond         D 00000002     0  2660   1190  2661    2662  2656 
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xdf09ded0)
[<c029abf7>] vsnprintf [kernel] 0x207 (0xdf09dee4)
[<c01238e1>] wait_for_completion [kernel] 0x71 (0xdf09df14)
[<c0126fa9>] do_fork [kernel] 0x109 (0xdf09df68)
[<c0109d57>] sys_vfork [kernel] 0x37 (0xdf09dfa4)

crond         S 00000003     0  2661   2660                     
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xdf613d48)
[<c013410c>] schedule_timeout [kernel] 0xbc (0xdf613d8c)
[<c028b05d>] unix_wait_for_peer [kernel] 0xbd (0xdf613dc4)
[<c02216ae>] sock_alloc_send_pskb [kernel] 0xce (0xdf613de0)
[<c028ba89>] unix_dgram_sendmsg [kernel] 0x229 (0xdf613e18)
[<c0155d17>] __alloc_pages [kernel] 0x97 (0xdf613e40)
[<c021e6e8>] sock_sendmsg [kernel] 0x78 (0xdf613e54)
[<c021e4bc>] sockfd_lookup [kernel] 0x1c (0xdf613e84)
[<c021f993>] sys_sendto [kernel] 0xe3 (0xdf613e98)
[<c011f5ac>] do_page_fault [kernel] 0x14c (0xdf613ef4)
[<c021f9e7>] sys_send [kernel] 0x37 (0xdf613f64)
[<c02202a7>] sys_socketcall [kernel] 0x147 (0xdf613f80)

crond         D 00000002  3216  2662   1190  2663    2664  2660 
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xf16e7ed0)
[<c029abf7>] vsnprintf [kernel] 0x207 (0xf16e7ee4)
[<c01238e1>] wait_for_completion [kernel] 0x71 (0xf16e7f14)
[<c0126fa9>] do_fork [kernel] 0x109 (0xf16e7f68)
[<c0109d57>] sys_vfork [kernel] 0x37 (0xf16e7fa4)

crond         S 00000003  3788  2663   2662                     
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xdd63dd48)
[<c02483ff>] ip_finish_output2 [kernel] 0xcf (0xdd63dd5c)
[<c013410c>] schedule_timeout [kernel] 0xbc (0xdd63dd8c)
[<c0246560>] ip_queue_xmit [kernel] 0x310 (0xdd63dd9c)
[<c028b05d>] unix_wait_for_peer [kernel] 0xbd (0xdd63ddc4)
[<c02216ae>] sock_alloc_send_pskb [kernel] 0xce (0xdd63dde0)
[<c028ba89>] unix_dgram_sendmsg [kernel] 0x229 (0xdd63de18)
[<c0155d17>] __alloc_pages [kernel] 0x97 (0xdd63de40)
[<c021e6e8>] sock_sendmsg [kernel] 0x78 (0xdd63de54)
[<c021e4bc>] sockfd_lookup [kernel] 0x1c (0xdd63de84)
[<c021f993>] sys_sendto [kernel] 0xe3 (0xdd63de98)
[<c011f5ac>] do_page_fault [kernel] 0x14c (0xdd63def4)
[<c0123274>] schedule [kernel] 0x2f4 (0xdd63df20)
[<c021f9e7>] sys_send [kernel] 0x37 (0xdd63df64)
[<c02202a7>] sys_socketcall [kernel] 0x147 (0xdd63df80)

crond         D 00000002     0  2664   1190  2665    2666  2662 
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xe36dbed0)
[<c029abf7>] vsnprintf [kernel] 0x207 (0xe36dbee4)
[<c01238e1>] wait_for_completion [kernel] 0x71 (0xe36dbf14)
[<c0126fa9>] do_fork [kernel] 0x109 (0xe36dbf68)
[<c0109d57>] sys_vfork [kernel] 0x37 (0xe36dbfa4)

crond         S 00000002  1792  2665   2664                     
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xdf0cbd48)
[<c013410c>] schedule_timeout [kernel] 0xbc (0xdf0cbd8c)
[<c028b05d>] unix_wait_for_peer [kernel] 0xbd (0xdf0cbdc4)
[<c02216ae>] sock_alloc_send_pskb [kernel] 0xce (0xdf0cbde0)
[<c028ba89>] unix_dgram_sendmsg [kernel] 0x229 (0xdf0cbe18)
[<c0155d17>] __alloc_pages [kernel] 0x97 (0xdf0cbe40)
[<c021e6e8>] sock_sendmsg [kernel] 0x78 (0xdf0cbe54)
[<c021e4bc>] sockfd_lookup [kernel] 0x1c (0xdf0cbe84)
[<c021f993>] sys_sendto [kernel] 0xe3 (0xdf0cbe98)
[<c011f5ac>] do_page_fault [kernel] 0x14c (0xdf0cbef4)
[<c021f9e7>] sys_send [kernel] 0x37 (0xdf0cbf64)
[<c02202a7>] sys_socketcall [kernel] 0x147 (0xdf0cbf80)

crond         D 00000001     0  2666   1190  2667    2668  2664 
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xdf43ded0)
[<c01238e1>] wait_for_completion [kernel] 0x71 (0xdf43df14)
[<c0126fa9>] do_fork [kernel] 0x109 (0xdf43df68)
[<c0109d57>] sys_vfork [kernel] 0x37 (0xdf43dfa4)

crond         S 00000001     0  2667   2666                     
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xdf0b1d48)
[<c013410c>] schedule_timeout [kernel] 0xbc (0xdf0b1d8c)
[<c028b05d>] unix_wait_for_peer [kernel] 0xbd (0xdf0b1dc4)
[<c02216ae>] sock_alloc_send_pskb [kernel] 0xce (0xdf0b1de0)
[<c028ba89>] unix_dgram_sendmsg [kernel] 0x229 (0xdf0b1e18)
[<c0155d17>] __alloc_pages [kernel] 0x97 (0xdf0b1e40)
[<c021e6e8>] sock_sendmsg [kernel] 0x78 (0xdf0b1e54)
[<c021e4bc>] sockfd_lookup [kernel] 0x1c (0xdf0b1e84)
[<c021f993>] sys_sendto [kernel] 0xe3 (0xdf0b1e98)
[<c011f5ac>] do_page_fault [kernel] 0x14c (0xdf0b1ef4)
[<c021f9e7>] sys_send [kernel] 0x37 (0xdf0b1f64)
[<c02202a7>] sys_socketcall [kernel] 0x147 (0xdf0b1f80)

crond         D 00000001  4144  2668   1190  2669    2670  2666 
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xf16efed0)
[<c01238e1>] wait_for_completion [kernel] 0x71 (0xf16eff14)
[<c0126fa9>] do_fork [kernel] 0x109 (0xf16eff68)
[<c0109d57>] sys_vfork [kernel] 0x37 (0xf16effa4)

crond         S 00000001  5168  2669   2668                     
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xdd56bd48)
[<c013410c>] schedule_timeout [kernel] 0xbc (0xdd56bd8c)
[<c028b05d>] unix_wait_for_peer [kernel] 0xbd (0xdd56bdc4)
[<c02216ae>] sock_alloc_send_pskb [kernel] 0xce (0xdd56bde0)
[<c028ba89>] unix_dgram_sendmsg [kernel] 0x229 (0xdd56be18)
[<c0155d17>] __alloc_pages [kernel] 0x97 (0xdd56be40)
[<c021e6e8>] sock_sendmsg [kernel] 0x78 (0xdd56be54)
[<c021e4bc>] sockfd_lookup [kernel] 0x1c (0xdd56be84)
[<c021f993>] sys_sendto [kernel] 0xe3 (0xdd56be98)
[<c011f5ac>] do_page_fault [kernel] 0x14c (0xdd56bef4)
[<c021f9e7>] sys_send [kernel] 0x37 (0xdd56bf64)
[<c02202a7>] sys_socketcall [kernel] 0x147 (0xdd56bf80)

crond         D 00000001  5308  2670   1190  2671    2672  2668 
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xdd693ed0)
[<c029abf7>] vsnprintf [kernel] 0x207 (0xdd693ee4)
[<c01238e1>] wait_for_completion [kernel] 0x71 (0xdd693f14)
[<c0126fa9>] do_fork [kernel] 0x109 (0xdd693f68)
[<c0109d57>] sys_vfork [kernel] 0x37 (0xdd693fa4)

crond         S 00000001  1792  2671   2670                     
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xdd651d48)
[<c013410c>] schedule_timeout [kernel] 0xbc (0xdd651d8c)
[<c028b05d>] unix_wait_for_peer [kernel] 0xbd (0xdd651dc4)
[<c02216ae>] sock_alloc_send_pskb [kernel] 0xce (0xdd651de0)
[<c028ba89>] unix_dgram_sendmsg [kernel] 0x229 (0xdd651e18)
[<c0155d17>] __alloc_pages [kernel] 0x97 (0xdd651e40)
[<c021e6e8>] sock_sendmsg [kernel] 0x78 (0xdd651e54)
[<c021e4bc>] sockfd_lookup [kernel] 0x1c (0xdd651e84)
[<c021f993>] sys_sendto [kernel] 0xe3 (0xdd651e98)
[<c011f5ac>] do_page_fault [kernel] 0x14c (0xdd651ef4)
[<c021f9e7>] sys_send [kernel] 0x37 (0xdd651f64)
[<c02202a7>] sys_socketcall [kernel] 0x147 (0xdd651f80)

crond         D 00000001  1788  2672   1190  2673    2674  2670 
(NOTLB)
Call Trace:   [<c0123274>] schedule [kernel] 0x2f4 (0xdd695ed0)
[<c029abf7>] vsnprintf [kernel] 0x207 (0xdd695ee4)
[<c01238e1>] wait_for_completion [kernel] 0x71 (0xdd695f14)
[<c0126fa9>] do_fork [kernel] 0x109 (0xdd695f68)
[<c0109d57>] sys_vfork [kernel] 0x37 (0xdd695fa4)
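
A dump of this size is easier to read once it is reduced to a count of tasks per name and state. The following is a minimal sketch, not part of the original report: `summarize_sysrq` is a hypothetical helper name, and it assumes the 2.4-style layout shown above, where each task header line starts with the task name, a one-letter state and an 8-hex-digit PC value.

```shell
#!/bin/sh
# summarize_sysrq: hypothetical helper (not part of any Red Hat tool).
# Reads "SysRq : Show State" text on stdin and prints one line per
# task name and state, in the form "<count> <task> <state>".
summarize_sysrq() {
    awk '
        # A task header has: name, one-letter state, 8-hex-digit PC, ...
        # Call trace lines and wrapped continuation lines do not match.
        $2 ~ /^[RSDTZ]$/ && length($3) == 8 && $3 ~ /^[0-9a-fA-F]+$/ {
            count[$1 " " $2]++
        }
        END {
            for (key in count) print count[key], key
        }
    ' | sort -k1,1nr -k2
}
```

Feeding a saved dump to the function (for example `summarize_sysrq < sysrq.txt`, where the file name is an assumption) shows at a glance how many crond tasks are sleeping (S) and how many are in uninterruptible sleep (D).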

Comment 1 Jason Vas Dias 2005-01-04 19:54:04 UTC
Please upgrade your cron to the latest version for RHEL-3, which is
vixie-cron-4.1-1_EL3 - this version does not have this problem, and
should be in RHEL-3-U5. Meanwhile, you can download it from:
   http://people.redhat.com/~jvdias/cron/RHEL-3/
By doing an RHN update to RHEL-3-U4, you would obtain 
vixie-cron-3.0.1-75.1, which I think might also fix this problem -
please try vixie-cron-4.1-1_EL3 and let me know if it works OK.


Comment 2 Celso Medina Kern 2005-01-12 12:52:28 UTC
We installed vixie-cron 4.1-1 on January 7th. On January 11th the
system hung again. This time we could not collect a SysRq dump,
because the serial console was hung as well. The only thing the
customer could do was switch from the graphical console to virtual
console F1, but the system was not responsive after that.

We intend to disable cron entirely and monitor whether the hangs
stop.

Comment 3 Jason Vas Dias 2005-05-26 17:19:27 UTC
It would appear that this problem still can occur with the latest cron releases -
I just got another report of it today: 
> > I was never able to reproduce this bug here - I just
> > suggested that people try the latest version, and it
> > seemed to fix the problem - now it appears not.
> 
> Ah. Well, it's not reproducible here. It just seems to take down Oracle
> machines from time to time.
>  
> > It sounds like you have a cron job that is executed
> > frequently but which never completes. The parent
> > cron process will wait for completion of the cron
> > job child; if this never occurs, then a situation
> > as you describe could result. The problem is with
> > the cron job that never completes. I'm also working
> > on some major enhancements for cron - one of them
> > should be that if the process from a previous run
> > of the job is still active, it should not initiate
> > another run of the job - this would be a major
> > change in behavior from all previous cron releases,
> > and needs extensive testing.
> 
> Yeh. I found a lot of instances of the mailman qrunner running. I just
> got hold of a top -d output from the machine, from the day it crashed.
> 
> > Please can you send me:
> >  - The compressed /var/log/cron file from the system
> >    and your cron configuration
> >    # tar -czpf - /var/log/cron /etc/cron.d /etc/crontab /var/spool/cron > /tmp/cron.tar.gz

> > The latest cron version for RHEL-3 is vixie-cron-4.1-6_EL3,
> > available from:
> >    http://people.redhat.com/~jvdias/cron/RHEL-3
> > If possible, please try out this version and let me know
> > if you can reproduce the problem with it.
> 


Yes, it is possible to create an ever-increasing number of crond processes,
e.g. with this job:

  * * * * *  root while /bin/true; do sleep 62; done

The problem is with the job that never completes.

If cron finds a previous run of a job still running when it comes time to
start the next run, what should it do?
o kill the previous process 
  - but what if there is a process that depends on being run at periodic
    intervals, but sometimes takes a bit longer than its interval ?
o not run the next process 
  - again, some processes might really depend on being kicked off at 
    regular intervals.

So I don't think it should be the default for all cron jobs to be treated
this way.

The best way of fixing this might be to create an explicit tag in the
cron job file, such as:
  ?* * * * * root while /bin/true; do sleep 62; done
meaning "Don't run this job if a previous instance is still running", and
  ?!* * * * * root while /bin/true; do sleep 62; done
meaning "Kill the previous job instance and run the next instance".

I'll investigate such an enhancement for the next cron version - it would
need extensive testing, as it would be a major departure from the behaviour
of all previous cron releases.
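
Until cron itself offers something like this, a job can enforce the "don't run if a previous instance is still running" policy on its own. The following is a minimal sketch under stated assumptions, not a feature of vixie-cron: `run_once` is a hypothetical helper name, and the lock is a directory whose path the caller chooses.

```shell
#!/bin/sh
# run_once: hypothetical helper, not part of vixie-cron.
# Usage: run_once LOCKDIR COMMAND [ARGS...]
# Runs COMMAND only if LOCKDIR does not exist yet; otherwise skips the run.
run_once() {
    lockdir=$1
    shift
    # mkdir is atomic: it fails if the directory already exists, which is
    # taken here to mean that a previous run is still active.
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "previous run still active, skipping" >&2
        return 75
    fi
    "$@"
    run_rc=$?
    # Release the lock so that the next scheduled run can start.
    rmdir "$lockdir"
    return $run_rc
}
```

The exit status 75 is an arbitrary choice for "skipped". If the job is killed or the machine reboots before the `rmdir`, the lock directory stays behind and has to be removed by hand, so this sketch trades robustness for simplicity.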

Really, the best short-term solution is to make cron jobs ensure that 
they complete.
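
One way for a job to guarantee that it completes is to run it under a time limit. The following is a minimal sketch, assuming a system whose coreutils provides the `timeout` command (it appeared in coreutils releases newer than the one in RHEL 3, so it is not expected to be available there); `bounded` is a hypothetical wrapper name.

```shell
#!/bin/sh
# bounded: hypothetical wrapper around coreutils timeout(1).
# Usage: bounded SECONDS COMMAND [ARGS...]
# Runs COMMAND and terminates it if it is still running after SECONDS.
bounded() {
    limit=$1
    shift
    # timeout exits with status 124 when the limit was reached,
    # otherwise with the exit status of COMMAND.
    timeout "$limit" "$@"
}
```

With a limit shorter than the cron interval, for example `bounded 50` in front of a job scheduled every minute, a run that never finishes on its own is terminated before the next one starts.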





Comment 4 Jason Vas Dias 2005-05-26 18:31:57 UTC
NOTE: There is a problem with some configurations of the "mailman" 
      system which can cause this problem to occur.
      The mailman cron job SHOULD NOT contain this line:
"
* * * * * /usr/bin/python -S /var/mailman/cron/qrunner
"
      Versions of mailman that install this crontab have never
      been shipped by Red Hat for RHEL-3 and are not supported
      by Red Hat.

      The qrunner process should be run from the mailman controller
      daemon, not from cron.
 
      Please ensure that you have a supported version of mailman
      installed (> 2.1), e.g. mailman-2.1.5-25.rhel3.





Comment 5 Marcela Mašláňová 2006-10-30 09:42:47 UTC
Since there are insufficient details provided in this report for us to 
investigate the issue further, and we have not received the feedback we 
requested, we will assume the problem was not reproducible or has been fixed 
in a later update for this product.