Description of problem:
The system hangs about once a day, at different times. SysRq-t reports a large number of crond processes (up to 5800 seen in one dump). This exhausts system resources and hangs the machine. If all cron jobs are removed from the user and system crontabs, the hangs disappear. I left only sa1 and mrtg scheduled (in /etc/cron.d) and the system still hung. I then commented those out and reinstated the other cron jobs, and the system hung as well, which leads me to believe the problem is not related to a specific job but to crond itself.

Version-Release number of selected component (if applicable):
vixie-cron-3.0.1-74

How reproducible:
Every time

Steps to Reproduce:
1. Enable the system crontab jobs to run at the system predefined times.
2. Wait for two days.

Actual results:
The system hangs. When collecting a SysRq-t dump, I always see many crond processes being started in a loop.

Expected results:
No crond loop.

Additional info:
This is the second system I have seen with this problem. On the first one, crond was simply disabled. Both were HP ProLiant DL 580 machines running Red Hat EL AS 3.0.

Linux localhost 2.4.21-15.ELsmp #1 SMP Thu Apr 22 00:18:24 EDT 2004 i686 i686 i386 GNU/Linux
Red Hat Enterprise Linux AS release 3 (Taroon Update 2)

/etc/crontab:
=============
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
HOME=/

# run-parts
01 * * * * root run-parts /etc/cron.hourly
02 4 * * * root run-parts /etc/cron.daily
22 4 * * 0 root run-parts /etc/cron.weekly
42 4 1 * * root run-parts /etc/cron.monthly

/etc/cron.d:
============
mrtg file:
0-59/5 * * * * root /usr/bin/mrtg /etc/mrtg/mrtg.cfg

sysstat:
# run system activity accounting tool every 10 minutes
*/10 * * * * root /usr/lib/sa/sa1 1 1
# generate a daily summary of process accounting at 23:53
53 23 * * * root /usr/lib/sa/sa2 -A

/etc/cron.daily:
================
00webalizer, certwatch, logrotate, makewhatis.cron, prelink, rpm, srotate.cron, tetex.cron, tmpwatch

/etc/cron.weekly:
=================
makewhatis.cron

The first occurrences of crond processes in the SysRq dump are:

SysRq : Show State
free sibling task PC stack pid father child younger older crond S 00000002 4424 1190 1 2612 1214 1180 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xf4e77f20) [<c01340b5>] schedule_timeout [kernel] 0x65 (0xf4e77f64) [<c0134040>] process_timeout [kernel] 0x0 (0xf4e77f84) [<c01341f3>] sys_nanosleep [kernel] 0xd3 (0xf4e77f9c) crond S 00000000 0 2612 1190 2613 2624 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xde7aded0) [<c016d4e0>] pipe_wait [kernel] 0x70 (0xde7adf14) [<c016d5d4>] pipe_read [kernel] 0xc4 (0xde7adf68) [<c0160767>] sys_read [kernel] 0x97 (0xde7adf94) sadc D 00000003 1792 2613 2612 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xd102be90) [<c0123a92>] sleep_on [kernel] 0x52 (0xd102bed4) [<f88fcb58>] log_wait_commit_Rsmp_c80020b3 [jbd] 0x68 (0xd102bf04) [<f88f7bd3>] journal_stop_Rsmp_74af6844 [jbd] 0x193 (0xd102bf1c) [<f88f6445>] journal_start_Rsmp_25661df5 [jbd] 0xa5 (0xd102bf28) [<c0162d69>] fsync_buffers_list [kernel] 0xe9 (0xd102bf38) [<f88f7ccc>] journal_force_commit_Rsmp_2a9443c3 [jbd] 0x7c (0xd102bf4c) [<f89140b1>] ext3_force_commit [ext3] 0x51 (0xd102bf5c) [<f8908fb4>] ext3_sync_file [ext3] 0x84 (0xd102bf68) [<f890c270>] ext3_writepage [ext3] 0x0 (0xd102bf70) [<c0162600>] do_fdatasync [kernel] 0x50 (0xd102bf88) [<c0162684>] sys_fdatasync [kernel] 0x44 (0xd102bfa8) crond S 00000000 5192 2624 1190 2625 2628 2612 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xf40c5ed0) [<c016d4e0>] pipe_wait [kernel] 0x70 (0xf40c5f14) [<c016d5d4>] pipe_read [kernel] 0xc4 (0xf40c5f68) [<c0160767>] sys_read [kernel] 0x97 (0xf40c5f94) sh D 00000001 0 2625 2624 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xdecffc0c) [<f88f6ea8>] do_get_write_access [jbd] 0x508 (0xdecffc50) [<f88f7032>] journal_get_write_access_Rsmp_ba6f366a [jbd] 0x52 (0xdecffca8) [<f890ea1e>] ext3_reserve_inode_write [ext3] 0x7e (0xdecffcc8) [<f890ea9b>] ext3_mark_inode_dirty [ext3] 0x2b (0xdecffcf0) [<f890ebdc>] ext3_dirty_inode [ext3] 
0x10c (0xdecffd1c) [<c017b5f6>] __mark_inode_dirty [kernel] 0xb6 (0xdecffd38) [<c017cefb>] update_atime [kernel] 0x6b (0xdecffd4c) [<c0146a19>] do_generic_file_read [kernel] 0x369 (0xdecffd5c) [<c01473e5>] generic_file_new_read [kernel] 0xc5 (0xdecffd9c) [<c0147220>] file_read_actor [kernel] 0x0 (0xdecffdac) [<c016f3e6>] link_path_walk [kernel] 0x656 (0xdecffdb4) [<c014750f>] generic_file_read [kernel] 0x2f (0xdecffde8) [<c016b8c2>] kernel_read [kernel] 0x72 (0xdecffe00) [<c016bd96>] prepare_binprm [kernel] 0x136 (0xdecffe24) [<c016c38c>] do_execve [kernel] 0xec (0xdecffe44) [<c0109db0>] sys_execve [kernel] 0x50 (0xdecfffa4) crond S 00000002 0 2628 1190 2629 2630 2624 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xdeb49ed0) [<c016d4e0>] pipe_wait [kernel] 0x70 (0xdeb49f14) [<c016d5d4>] pipe_read [kernel] 0xc4 (0xdeb49f68) [<c0160767>] sys_read [kernel] 0x97 (0xdeb49f94) sh D 00000002 0 2629 2628 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xdf1c3c0c) [<f88f6ea8>] do_get_write_access [jbd] 0x508 (0xdf1c3c50) [<f88f7032>] journal_get_write_access_Rsmp_ba6f366a [jbd] 0x52 (0xdf1c3ca8) [<f890ea1e>] ext3_reserve_inode_write [ext3] 0x7e (0xdf1c3cc8) [<f890ea9b>] ext3_mark_inode_dirty [ext3] 0x2b (0xdf1c3cf0) [<f890ebdc>] ext3_dirty_inode [ext3] 0x10c (0xdf1c3d1c) [<c017b5f6>] __mark_inode_dirty [kernel] 0xb6 (0xdf1c3d38) [<c017cefb>] update_atime [kernel] 0x6b (0xdf1c3d4c) [<c0146a19>] do_generic_file_read [kernel] 0x369 (0xdf1c3d5c) [<c01473e5>] generic_file_new_read [kernel] 0xc5 (0xdf1c3d9c) [<c0147220>] file_read_actor [kernel] 0x0 (0xdf1c3dac) [<c016f3e6>] link_path_walk [kernel] 0x656 (0xdf1c3db4) [<c014750f>] generic_file_read [kernel] 0x2f (0xdf1c3de8) [<c016b8c2>] kernel_read [kernel] 0x72 (0xdf1c3e00) [<c016bd96>] prepare_binprm [kernel] 0x136 (0xdf1c3e24) [<c016c38c>] do_execve [kernel] 0xec (0xdf1c3e44) [<c0109db0>] sys_execve [kernel] 0x50 (0xdf1c3fa4) crond S 00000002 0 2630 1190 2631 2632 2628 (NOTLB) Call Trace: [<c0123274>] 
schedule [kernel] 0x2f4 (0xded3ded0) [<c016d4e0>] pipe_wait [kernel] 0x70 (0xded3df14) [<c016d5d4>] pipe_read [kernel] 0xc4 (0xded3df68) [<c0160767>] sys_read [kernel] 0x97 (0xded3df94) sh D 00000000 0 2631 2630 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xdbe3bc0c) [<f88f6ea8>] do_get_write_access [jbd] 0x508 (0xdbe3bc50) [<f88f7032>] journal_get_write_access_Rsmp_ba6f366a [jbd] 0x52 (0xdbe3bca8) [<f890ea1e>] ext3_reserve_inode_write [ext3] 0x7e (0xdbe3bcc8) [<f890ea9b>] ext3_mark_inode_dirty [ext3] 0x2b (0xdbe3bcf0) [<f890ebdc>] ext3_dirty_inode [ext3] 0x10c (0xdbe3bd1c) [<c017b5f6>] __mark_inode_dirty [kernel] 0xb6 (0xdbe3bd38) [<c017cefb>] update_atime [kernel] 0x6b (0xdbe3bd4c) [<c0146a19>] do_generic_file_read [kernel] 0x369 (0xdbe3bd5c) [<c01473e5>] generic_file_new_read [kernel] 0xc5 (0xdbe3bd9c) [<c0147220>] file_read_actor [kernel] 0x0 (0xdbe3bdac) [<c016f3e6>] link_path_walk [kernel] 0x656 (0xdbe3bdb4) [<c014750f>] generic_file_read [kernel] 0x2f (0xdbe3bde8) [<c016b8c2>] kernel_read [kernel] 0x72 (0xdbe3be00) [<c016bd96>] prepare_binprm [kernel] 0x136 (0xdbe3be24) [<c016c38c>] do_execve [kernel] 0xec (0xdbe3be44) [<c0109db0>] sys_execve [kernel] 0x50 (0xdbe3bfa4) crond S 00000000 1788 2632 1190 2633 2635 2630 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xce197ed0) [<c016d4e0>] pipe_wait [kernel] 0x70 (0xce197f14) [<c016d5d4>] pipe_read [kernel] 0xc4 (0xce197f68) [<c0160767>] sys_read [kernel] 0x97 (0xce197f94) sh S 00000000 0 2633 2632 2634 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xdf193efc) [<c012d194>] sys_wait4 [kernel] 0x1b4 (0xdf193f40) [<c012d2f7>] sys_waitpid [kernel] 0x27 (0xdf193fac) sh D 00000002 0 2634 2633 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xde855c0c) [<f88f6ea8>] do_get_write_access [jbd] 0x508 (0xde855c50) [<f88f7032>] journal_get_write_access_Rsmp_ba6f366a [jbd] 0x52 (0xde855ca8) [<f890ea1e>] ext3_reserve_inode_write [ext3] 0x7e (0xde855cc8) [<f890ea9b>] 
ext3_mark_inode_dirty [ext3] 0x2b (0xde855cf0) [<f890ebdc>] ext3_dirty_inode [ext3] 0x10c (0xde855d1c) [<c017b5f6>] __mark_inode_dirty [kernel] 0xb6 (0xde855d38) [<c017cefb>] update_atime [kernel] 0x6b (0xde855d4c) [<c0146a19>] do_generic_file_read [kernel] 0x369 (0xde855d5c) [<c01473e5>] generic_file_new_read [kernel] 0xc5 (0xde855d9c) [<c0147220>] file_read_actor [kernel] 0x0 (0xde855dac) [<c016f3e6>] link_path_walk [kernel] 0x656 (0xde855db4) [<c014750f>] generic_file_read [kernel] 0x2f (0xde855de8) [<c016b8c2>] kernel_read [kernel] 0x72 (0xde855e00) [<c016bd96>] prepare_binprm [kernel] 0x136 (0xde855e24) [<c016c38c>] do_execve [kernel] 0xec (0xde855e44) [<c0109db0>] sys_execve [kernel] 0x50 (0xde855fa4) crond S 00000002 0 2635 1190 2636 2637 2632 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xdebebed0) [<c016d4e0>] pipe_wait [kernel] 0x70 (0xdebebf14) [<c016d5d4>] pipe_read [kernel] 0xc4 (0xdebebf68) [<c0160767>] sys_read [kernel] 0x97 (0xdebebf94) bash D 00000000 0 2636 2635 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xdec53cd4) [<c01245aa>] io_schedule [kernel] 0x2a (0xdec53d18) [<c0145d49>] ___wait_on_page [kernel] 0x89 (0xdec53d24) [<c0146ad8>] do_generic_file_read [kernel] 0x428 (0xdec53d5c) [<c01473e5>] generic_file_new_read [kernel] 0xc5 (0xdec53d9c) [<c0147220>] file_read_actor [kernel] 0x0 (0xdec53dac) [<c016f3e6>] link_path_walk [kernel] 0x656 (0xdec53db4) [<c014750f>] generic_file_read [kernel] 0x2f (0xdec53de8) [<c016b8c2>] kernel_read [kernel] 0x72 (0xdec53e00) [<c016bd96>] prepare_binprm [kernel] 0x136 (0xdec53e24) [<c016c38c>] do_execve [kernel] 0xec (0xdec53e44) [<c0109db0>] sys_execve [kernel] 0x50 (0xdec53fa4) crond S 00000000 0 2637 1190 2638 2639 2635 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xde71fed0) [<c016d4e0>] pipe_wait [kernel] 0x70 (0xde71ff14) [<c016d5d4>] pipe_read [kernel] 0xc4 (0xde71ff68) [<c0160767>] sys_read [kernel] 0x97 (0xde71ff94) sh D 00000002 0 2638 2637 (NOTLB) Call Trace: 
[<c0123274>] schedule [kernel] 0x2f4 (0xdebddc0c) [<f88f6ea8>] do_get_write_access [jbd] 0x508 (0xdebddc50) [<f88f7032>] journal_get_write_access_Rsmp_ba6f366a [jbd] 0x52 (0xdebddca8) [<f890ea1e>] ext3_reserve_inode_write [ext3] 0x7e (0xdebddcc8) [<f890ea9b>] ext3_mark_inode_dirty [ext3] 0x2b (0xdebddcf0) [<f890ebdc>] ext3_dirty_inode [ext3] 0x10c (0xdebddd1c) [<c017b5f6>] __mark_inode_dirty [kernel] 0xb6 (0xdebddd38) [<c017cefb>] update_atime [kernel] 0x6b (0xdebddd4c) [<c0146a19>] do_generic_file_read [kernel] 0x369 (0xdebddd5c) [<c01473e5>] generic_file_new_read [kernel] 0xc5 (0xdebddd9c) [<c0147220>] file_read_actor [kernel] 0x0 (0xdebdddac) [<c016f3e6>] link_path_walk [kernel] 0x656 (0xdebdddb4) [<c014750f>] generic_file_read [kernel] 0x2f (0xdebddde8) [<c016b8c2>] kernel_read [kernel] 0x72 (0xdebdde00) [<c016bd96>] prepare_binprm [kernel] 0x136 (0xdebdde24) [<c016c38c>] do_execve [kernel] 0xec (0xdebdde44) [<c0109db0>] sys_execve [kernel] 0x50 (0xdebddfa4) crond S 00000001 0 2639 1190 2640 2641 2637 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xdf171ed0) [<c016d4e0>] pipe_wait [kernel] 0x70 (0xdf171f14) [<c016d5d4>] pipe_read [kernel] 0xc4 (0xdf171f68) [<c0160767>] sys_read [kernel] 0x97 (0xdf171f94) sh D 00000001 0 2640 2639 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xdf2f9c0c) [<f88f6ea8>] do_get_write_access [jbd] 0x508 (0xdf2f9c50) [<f88f7032>] journal_get_write_access_Rsmp_ba6f366a [jbd] 0x52 (0xdf2f9ca8) [<f890ea1e>] ext3_reserve_inode_write [ext3] 0x7e (0xdf2f9cc8) [<f890ea9b>] ext3_mark_inode_dirty [ext3] 0x2b (0xdf2f9cf0) [<f890ebdc>] ext3_dirty_inode [ext3] 0x10c (0xdf2f9d1c) [<c017b5f6>] __mark_inode_dirty [kernel] 0xb6 (0xdf2f9d38) [<c017cefb>] update_atime [kernel] 0x6b (0xdf2f9d4c) [<c0146a19>] do_generic_file_read [kernel] 0x369 (0xdf2f9d5c) [<c01473e5>] generic_file_new_read [kernel] 0xc5 (0xdf2f9d9c) [<c0147220>] file_read_actor [kernel] 0x0 (0xdf2f9dac) [<c016f3e6>] link_path_walk [kernel] 0x656 (0xdf2f9db4) 
[<c014750f>] generic_file_read [kernel] 0x2f (0xdf2f9de8) [<c016b8c2>] kernel_read [kernel] 0x72 (0xdf2f9e00) [<c016bd96>] prepare_binprm [kernel] 0x136 (0xdf2f9e24) [<c016c38c>] do_execve [kernel] 0xec (0xdf2f9e44) [<c0109db0>] sys_execve [kernel] 0x50 (0xdf2f9fa4) crond S 00000001 0 2641 1190 2642 2643 2639 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xdf5f7ed0) [<c016d4e0>] pipe_wait [kernel] 0x70 (0xdf5f7f14) [<c016d5d4>] pipe_read [kernel] 0xc4 (0xdf5f7f68) [<c0160767>] sys_read [kernel] 0x97 (0xdf5f7f94) sh D 00000000 0 2642 2641 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xdf381c0c) [<f88f6ea8>] do_get_write_access [jbd] 0x508 (0xdf381c50) [<f88f7032>] journal_get_write_access_Rsmp_ba6f366a [jbd] 0x52 (0xdf381ca8) [<f890ea1e>] ext3_reserve_inode_write [ext3] 0x7e (0xdf381cc8) [<f890ea9b>] ext3_mark_inode_dirty [ext3] 0x2b (0xdf381cf0) [<f890ebdc>] ext3_dirty_inode [ext3] 0x10c (0xdf381d1c) [<c017b5f6>] __mark_inode_dirty [kernel] 0xb6 (0xdf381d38) [<c017cefb>] update_atime [kernel] 0x6b (0xdf381d4c) [<c0146a19>] do_generic_file_read [kernel] 0x369 (0xdf381d5c) [<c01473e5>] generic_file_new_read [kernel] 0xc5 (0xdf381d9c) [<c0147220>] file_read_actor [kernel] 0x0 (0xdf381dac) [<c016f3e6>] link_path_walk [kernel] 0x656 (0xdf381db4) [<c014750f>] generic_file_read [kernel] 0x2f (0xdf381de8) [<c016b8c2>] kernel_read [kernel] 0x72 (0xdf381e00) [<c016bd96>] prepare_binprm [kernel] 0x136 (0xdf381e24) [<c016c38c>] do_execve [kernel] 0xec (0xdf381e44) [<c0109db0>] sys_execve [kernel] 0x50 (0xdf381fa4) crond S 00000001 0 2643 1190 2644 2646 2641 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xdefd7ed0) [<c016d4e0>] pipe_wait [kernel] 0x70 (0xdefd7f14) [<c016d5d4>] pipe_read [kernel] 0xc4 (0xdefd7f68) [<c0160767>] sys_read [kernel] 0x97 (0xdefd7f94) sh S 00000003 0 2644 2643 2645 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xdf19befc) [<c012d194>] sys_wait4 [kernel] 0x1b4 (0xdf19bf40) [<c012d2f7>] sys_waitpid 
[kernel] 0x27 (0xdf19bfac) sh D 00000003 0 2645 2644 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xdec3bc0c) [<f88f6ea8>] do_get_write_access [jbd] 0x508 (0xdec3bc50) [<f88f7032>] journal_get_write_access_Rsmp_ba6f366a [jbd] 0x52 (0xdec3bca8) [<f890ea1e>] ext3_reserve_inode_write [ext3] 0x7e (0xdec3bcc8) [<f890ea9b>] ext3_mark_inode_dirty [ext3] 0x2b (0xdec3bcf0) [<f890ebdc>] ext3_dirty_inode [ext3] 0x10c (0xdec3bd1c) [<c017b5f6>] __mark_inode_dirty [kernel] 0xb6 (0xdec3bd38) [<c017cefb>] update_atime [kernel] 0x6b (0xdec3bd4c) [<c0146a19>] do_generic_file_read [kernel] 0x369 (0xdec3bd5c) [<c01473e5>] generic_file_new_read [kernel] 0xc5 (0xdec3bd9c) [<c0147220>] file_read_actor [kernel] 0x0 (0xdec3bdac) [<c016f3e6>] link_path_walk [kernel] 0x656 (0xdec3bdb4) [<c014750f>] generic_file_read [kernel] 0x2f (0xdec3bde8) [<c016b8c2>] kernel_read [kernel] 0x72 (0xdec3be00) [<c016bd96>] prepare_binprm [kernel] 0x136 (0xdec3be24) [<c016c38c>] do_execve [kernel] 0xec (0xdec3be44) [<c0109db0>] sys_execve [kernel] 0x50 (0xdec3bfa4) crond D 00000001 0 2646 1190 2647 2648 2643 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xdef3fed0) [<c01238e1>] wait_for_completion [kernel] 0x71 (0xdef3ff14) [<c0126fa9>] do_fork [kernel] 0x109 (0xdef3ff68) [<c0109d57>] sys_vfork [kernel] 0x37 (0xdef3ffa4) crond S 00000001 0 2647 2646 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xdeaffd48) [<c013410c>] schedule_timeout [kernel] 0xbc (0xdeaffd8c) [<c028b05d>] unix_wait_for_peer [kernel] 0xbd (0xdeaffdc4) [<c02216ae>] sock_alloc_send_pskb [kernel] 0xce (0xdeaffde0) [<c028ba89>] unix_dgram_sendmsg [kernel] 0x229 (0xdeaffe18) [<c0155d17>] __alloc_pages [kernel] 0x97 (0xdeaffe40) [<c021e6e8>] sock_sendmsg [kernel] 0x78 (0xdeaffe54) [<c021e4bc>] sockfd_lookup [kernel] 0x1c (0xdeaffe84) [<c021f993>] sys_sendto [kernel] 0xe3 (0xdeaffe98) [<c011f5ac>] do_page_fault [kernel] 0x14c (0xdeaffef4) [<c021f9e7>] sys_send [kernel] 0x37 (0xdeafff64) [<c02202a7>] 
sys_socketcall [kernel] 0x147 (0xdeafff80) crond D 00000001 0 2648 1190 2649 2650 2646 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xdf1b3ed0) [<c029abf7>] vsnprintf [kernel] 0x207 (0xdf1b3ee4) [<c01238e1>] wait_for_completion [kernel] 0x71 (0xdf1b3f14) [<c0126fa9>] do_fork [kernel] 0x109 (0xdf1b3f68) [<c0109d57>] sys_vfork [kernel] 0x37 (0xdf1b3fa4) crond S 00000000 0 2649 2648 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xdf159d48) [<c013410c>] schedule_timeout [kernel] 0xbc (0xdf159d8c) [<c028b05d>] unix_wait_for_peer [kernel] 0xbd (0xdf159dc4) [<c02216ae>] sock_alloc_send_pskb [kernel] 0xce (0xdf159de0) [<c028ba89>] unix_dgram_sendmsg [kernel] 0x229 (0xdf159e18) [<c0155d17>] __alloc_pages [kernel] 0x97 (0xdf159e40) [<c021e6e8>] sock_sendmsg [kernel] 0x78 (0xdf159e54) [<c021e4bc>] sockfd_lookup [kernel] 0x1c (0xdf159e84) [<c021f993>] sys_sendto [kernel] 0xe3 (0xdf159e98) [<c011f5ac>] do_page_fault [kernel] 0x14c (0xdf159ef4) [<c021f9e7>] sys_send [kernel] 0x37 (0xdf159f64) [<c02202a7>] sys_socketcall [kernel] 0x147 (0xdf159f80) crond D 00000001 0 2650 1190 2652 2651 2648 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xdf079ed0) [<c029abf7>] vsnprintf [kernel] 0x207 (0xdf079ee4) [<c01238e1>] wait_for_completion [kernel] 0x71 (0xdf079f14) [<c0126fa9>] do_fork [kernel] 0x109 (0xdf079f68) [<c0109d57>] sys_vfork [kernel] 0x37 (0xdf079fa4) crond S 00000001 0 2652 2650 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xdf3f7d48) [<c013410c>] schedule_timeout [kernel] 0xbc (0xdf3f7d8c) [<c028b05d>] unix_wait_for_peer [kernel] 0xbd (0xdf3f7dc4) [<c02216ae>] sock_alloc_send_pskb [kernel] 0xce (0xdf3f7de0) [<c028ba89>] unix_dgram_sendmsg [kernel] 0x229 (0xdf3f7e18) [<c0155d17>] __alloc_pages [kernel] 0x97 (0xdf3f7e40) [<c021e6e8>] sock_sendmsg [kernel] 0x78 (0xdf3f7e54) [<c021e4bc>] sockfd_lookup [kernel] 0x1c (0xdf3f7e84) [<c021f993>] sys_sendto [kernel] 0xe3 (0xdf3f7e98) [<c011f5ac>] do_page_fault [kernel] 0x14c 
(0xdf3f7ef4) [<c021f9e7>] sys_send [kernel] 0x37 (0xdf3f7f64) [<c02202a7>] sys_socketcall [kernel] 0x147 (0xdf3f7f80) crond D 00000000 0 2651 1190 2653 2656 2650 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xdeddfed0) [<c029abf7>] vsnprintf [kernel] 0x207 (0xdeddfee4) [<c01238e1>] wait_for_completion [kernel] 0x71 (0xdeddff14) [<c0126fa9>] do_fork [kernel] 0x109 (0xdeddff68) [<c0109d57>] sys_vfork [kernel] 0x37 (0xdeddffa4) crond S 00000000 0 2653 2651 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xdf2e3d48) [<c013410c>] schedule_timeout [kernel] 0xbc (0xdf2e3d8c) [<c028b05d>] unix_wait_for_peer [kernel] 0xbd (0xdf2e3dc4) [<c02216ae>] sock_alloc_send_pskb [kernel] 0xce (0xdf2e3de0) [<c028ba89>] unix_dgram_sendmsg [kernel] 0x229 (0xdf2e3e18) [<c0155d17>] __alloc_pages [kernel] 0x97 (0xdf2e3e40) [<c021e6e8>] sock_sendmsg [kernel] 0x78 (0xdf2e3e54) [<c021e4bc>] sockfd_lookup [kernel] 0x1c (0xdf2e3e84) [<c021f993>] sys_sendto [kernel] 0xe3 (0xdf2e3e98) [<c011f5ac>] do_page_fault [kernel] 0x14c (0xdf2e3ef4) [<c021f9e7>] sys_send [kernel] 0x37 (0xdf2e3f64) [<c02202a7>] sys_socketcall [kernel] 0x147 (0xdf2e3f80) crond D 00000002 0 2656 1190 2657 2660 2651 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xdf3eded0) [<c029abf7>] vsnprintf [kernel] 0x207 (0xdf3edee4) [<c01238e1>] wait_for_completion [kernel] 0x71 (0xdf3edf14) [<c0126fa9>] do_fork [kernel] 0x109 (0xdf3edf68) [<c0109d57>] sys_vfork [kernel] 0x37 (0xdf3edfa4) crond S 00000002 0 2657 2656 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xdf257d48) [<c013410c>] schedule_timeout [kernel] 0xbc (0xdf257d8c) [<c028b05d>] unix_wait_for_peer [kernel] 0xbd (0xdf257dc4) [<c02216ae>] sock_alloc_send_pskb [kernel] 0xce (0xdf257de0) [<c028ba89>] unix_dgram_sendmsg [kernel] 0x229 (0xdf257e18) [<c0155d17>] __alloc_pages [kernel] 0x97 (0xdf257e40) [<c021e6e8>] sock_sendmsg [kernel] 0x78 (0xdf257e54) [<c021e4bc>] sockfd_lookup [kernel] 0x1c (0xdf257e84) [<c021f993>] sys_sendto 
[kernel] 0xe3 (0xdf257e98) [<c011f5ac>] do_page_fault [kernel] 0x14c (0xdf257ef4) [<c021f9e7>] sys_send [kernel] 0x37 (0xdf257f64) [<c02202a7>] sys_socketcall [kernel] 0x147 (0xdf257f80) crond D 00000002 0 2660 1190 2661 2662 2656 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xdf09ded0) [<c029abf7>] vsnprintf [kernel] 0x207 (0xdf09dee4) [<c01238e1>] wait_for_completion [kernel] 0x71 (0xdf09df14) [<c0126fa9>] do_fork [kernel] 0x109 (0xdf09df68) [<c0109d57>] sys_vfork [kernel] 0x37 (0xdf09dfa4) crond S 00000003 0 2661 2660 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xdf613d48) [<c013410c>] schedule_timeout [kernel] 0xbc (0xdf613d8c) [<c028b05d>] unix_wait_for_peer [kernel] 0xbd (0xdf613dc4) [<c02216ae>] sock_alloc_send_pskb [kernel] 0xce (0xdf613de0) [<c028ba89>] unix_dgram_sendmsg [kernel] 0x229 (0xdf613e18) [<c0155d17>] __alloc_pages [kernel] 0x97 (0xdf613e40) [<c021e6e8>] sock_sendmsg [kernel] 0x78 (0xdf613e54) [<c021e4bc>] sockfd_lookup [kernel] 0x1c (0xdf613e84) [<c021f993>] sys_sendto [kernel] 0xe3 (0xdf613e98) [<c011f5ac>] do_page_fault [kernel] 0x14c (0xdf613ef4) [<c021f9e7>] sys_send [kernel] 0x37 (0xdf613f64) [<c02202a7>] sys_socketcall [kernel] 0x147 (0xdf613f80) crond D 00000002 3216 2662 1190 2663 2664 2660 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xf16e7ed0) [<c029abf7>] vsnprintf [kernel] 0x207 (0xf16e7ee4) [<c01238e1>] wait_for_completion [kernel] 0x71 (0xf16e7f14) [<c0126fa9>] do_fork [kernel] 0x109 (0xf16e7f68) [<c0109d57>] sys_vfork [kernel] 0x37 (0xf16e7fa4) crond S 00000003 3788 2663 2662 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xdd63dd48) [<c02483ff>] ip_finish_output2 [kernel] 0xcf (0xdd63dd5c) [<c013410c>] schedule_timeout [kernel] 0xbc (0xdd63dd8c) [<c0246560>] ip_queue_xmit [kernel] 0x310 (0xdd63dd9c) [<c028b05d>] unix_wait_for_peer [kernel] 0xbd (0xdd63ddc4) [<c02216ae>] sock_alloc_send_pskb [kernel] 0xce (0xdd63dde0) [<c028ba89>] unix_dgram_sendmsg [kernel] 0x229 (0xdd63de18) 
[<c0155d17>] __alloc_pages [kernel] 0x97 (0xdd63de40) [<c021e6e8>] sock_sendmsg [kernel] 0x78 (0xdd63de54) [<c021e4bc>] sockfd_lookup [kernel] 0x1c (0xdd63de84) [<c021f993>] sys_sendto [kernel] 0xe3 (0xdd63de98) [<c011f5ac>] do_page_fault [kernel] 0x14c (0xdd63def4) [<c0123274>] schedule [kernel] 0x2f4 (0xdd63df20) [<c021f9e7>] sys_send [kernel] 0x37 (0xdd63df64) [<c02202a7>] sys_socketcall [kernel] 0x147 (0xdd63df80) crond D 00000002 0 2664 1190 2665 2666 2662 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xe36dbed0) [<c029abf7>] vsnprintf [kernel] 0x207 (0xe36dbee4) [<c01238e1>] wait_for_completion [kernel] 0x71 (0xe36dbf14) [<c0126fa9>] do_fork [kernel] 0x109 (0xe36dbf68) [<c0109d57>] sys_vfork [kernel] 0x37 (0xe36dbfa4) crond S 00000002 1792 2665 2664 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xdf0cbd48) [<c013410c>] schedule_timeout [kernel] 0xbc (0xdf0cbd8c) [<c028b05d>] unix_wait_for_peer [kernel] 0xbd (0xdf0cbdc4) [<c02216ae>] sock_alloc_send_pskb [kernel] 0xce (0xdf0cbde0) [<c028ba89>] unix_dgram_sendmsg [kernel] 0x229 (0xdf0cbe18) [<c0155d17>] __alloc_pages [kernel] 0x97 (0xdf0cbe40) [<c021e6e8>] sock_sendmsg [kernel] 0x78 (0xdf0cbe54) [<c021e4bc>] sockfd_lookup [kernel] 0x1c (0xdf0cbe84) [<c021f993>] sys_sendto [kernel] 0xe3 (0xdf0cbe98) [<c011f5ac>] do_page_fault [kernel] 0x14c (0xdf0cbef4) [<c021f9e7>] sys_send [kernel] 0x37 (0xdf0cbf64) [<c02202a7>] sys_socketcall [kernel] 0x147 (0xdf0cbf80) crond D 00000001 0 2666 1190 2667 2668 2664 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xdf43ded0) [<c01238e1>] wait_for_completion [kernel] 0x71 (0xdf43df14) [<c0126fa9>] do_fork [kernel] 0x109 (0xdf43df68) [<c0109d57>] sys_vfork [kernel] 0x37 (0xdf43dfa4) crond S 00000001 0 2667 2666 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xdf0b1d48) [<c013410c>] schedule_timeout [kernel] 0xbc (0xdf0b1d8c) [<c028b05d>] unix_wait_for_peer [kernel] 0xbd (0xdf0b1dc4) [<c02216ae>] sock_alloc_send_pskb [kernel] 0xce 
(0xdf0b1de0) [<c028ba89>] unix_dgram_sendmsg [kernel] 0x229 (0xdf0b1e18) [<c0155d17>] __alloc_pages [kernel] 0x97 (0xdf0b1e40) [<c021e6e8>] sock_sendmsg [kernel] 0x78 (0xdf0b1e54) [<c021e4bc>] sockfd_lookup [kernel] 0x1c (0xdf0b1e84) [<c021f993>] sys_sendto [kernel] 0xe3 (0xdf0b1e98) [<c011f5ac>] do_page_fault [kernel] 0x14c (0xdf0b1ef4) [<c021f9e7>] sys_send [kernel] 0x37 (0xdf0b1f64) [<c02202a7>] sys_socketcall [kernel] 0x147 (0xdf0b1f80) crond D 00000001 4144 2668 1190 2669 2670 2666 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xf16efed0) [<c01238e1>] wait_for_completion [kernel] 0x71 (0xf16eff14) [<c0126fa9>] do_fork [kernel] 0x109 (0xf16eff68) [<c0109d57>] sys_vfork [kernel] 0x37 (0xf16effa4) crond S 00000001 5168 2669 2668 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xdd56bd48) [<c013410c>] schedule_timeout [kernel] 0xbc (0xdd56bd8c) [<c028b05d>] unix_wait_for_peer [kernel] 0xbd (0xdd56bdc4) [<c02216ae>] sock_alloc_send_pskb [kernel] 0xce (0xdd56bde0) [<c028ba89>] unix_dgram_sendmsg [kernel] 0x229 (0xdd56be18) [<c0155d17>] __alloc_pages [kernel] 0x97 (0xdd56be40) [<c021e6e8>] sock_sendmsg [kernel] 0x78 (0xdd56be54) [<c021e4bc>] sockfd_lookup [kernel] 0x1c (0xdd56be84) [<c021f993>] sys_sendto [kernel] 0xe3 (0xdd56be98) [<c011f5ac>] do_page_fault [kernel] 0x14c (0xdd56bef4) [<c021f9e7>] sys_send [kernel] 0x37 (0xdd56bf64) [<c02202a7>] sys_socketcall [kernel] 0x147 (0xdd56bf80) crond D 00000001 5308 2670 1190 2671 2672 2668 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xdd693ed0) [<c029abf7>] vsnprintf [kernel] 0x207 (0xdd693ee4) [<c01238e1>] wait_for_completion [kernel] 0x71 (0xdd693f14) [<c0126fa9>] do_fork [kernel] 0x109 (0xdd693f68) [<c0109d57>] sys_vfork [kernel] 0x37 (0xdd693fa4) crond S 00000001 1792 2671 2670 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xdd651d48) [<c013410c>] schedule_timeout [kernel] 0xbc (0xdd651d8c) [<c028b05d>] unix_wait_for_peer [kernel] 0xbd (0xdd651dc4) [<c02216ae>] 
sock_alloc_send_pskb [kernel] 0xce (0xdd651de0) [<c028ba89>] unix_dgram_sendmsg [kernel] 0x229 (0xdd651e18) [<c0155d17>] __alloc_pages [kernel] 0x97 (0xdd651e40) [<c021e6e8>] sock_sendmsg [kernel] 0x78 (0xdd651e54) [<c021e4bc>] sockfd_lookup [kernel] 0x1c (0xdd651e84) [<c021f993>] sys_sendto [kernel] 0xe3 (0xdd651e98) [<c011f5ac>] do_page_fault [kernel] 0x14c (0xdd651ef4) [<c021f9e7>] sys_send [kernel] 0x37 (0xdd651f64) [<c02202a7>] sys_socketcall [kernel] 0x147 (0xdd651f80) crond D 00000001 1788 2672 1190 2673 2674 2670 (NOTLB) Call Trace: [<c0123274>] schedule [kernel] 0x2f4 (0xdd695ed0) [<c029abf7>] vsnprintf [kernel] 0x207 (0xdd695ee4) [<c01238e1>] wait_for_completion [kernel] 0x71 (0xdd695f14) [<c0126fa9>] do_fork [kernel] 0x109 (0xdd695f68) [<c0109d57>] sys_vfork [kernel] 0x37 (0xdd695fa4)
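When triaging a dump like the one above, it can help to tally tasks by name and state and to list the children of a given parent (here, the main crond, pid 1190). The following is only a minimal sketch and was not part of the original report: the names `summarize_tasks` and `children_of` are hypothetical, and the regular expression assumes the 2.4-kernel task header layout seen in this dump (name, state, PC, free stack, pid, father).

```python
import re
from collections import Counter

# Assumed task header layout (taken from the dump in this report), e.g.
#   crond S 00000002 4424 1190 1 2612 1214 1180 (NOTLB)
# Fields used: name, state, PC (8 hex digits), free stack, pid, father.
TASK_RE = re.compile(
    r"(?<!\S)(?P<name>\S+)\s+(?P<state>[RSDTZ])\s+[0-9A-Fa-f]{8}\s+\d+\s+"
    r"(?P<pid>\d+)\s+(?P<father>\d+)\b"
)


def summarize_tasks(dump_text):
    """Count tasks in a 'Show State' dump, keyed by (name, state)."""
    counts = Counter()
    for match in TASK_RE.finditer(dump_text):
        counts[(match.group("name"), match.group("state"))] += 1
    return counts


def children_of(dump_text, father_pid):
    """Return the pids of all tasks whose father field equals father_pid."""
    return [
        int(match.group("pid"))
        for match in TASK_RE.finditer(dump_text)
        if int(match.group("father")) == father_pid
    ]
```

Calling `summarize_tasks` on the full dump text and looking at the `("crond", "S")` and `("crond", "D")` entries gives a quick measure of how far the pile-up has progressed; `children_of(dump_text, 1190)` lists the crond children still attached to the main daemon.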
Please upgrade your cron to the latest version for RHEL-3, which is vixie-cron-4.1-1_EL3 - this version does not have this problem, and should be in RHEL-3-U5. Meanwhile, you can download it from: http://people.redhat.com/~jvdias/cron/RHEL-3/ By doing an RHN update to RHEL-3-U4, you would obtain vixie-cron-3.0.1-75.1, which I think might also fix this problem - please try vixie-cron-4.1-1_EL3 and let me know if it works OK.
We installed vixie-cron 4.1.1 on January 7th. On January 11th the system hung again. This time we could not collect a SysRq dump, because the serial console was hung as well. The only thing the customer could do was switch from the graphical console to virtual console F1, but the system was unresponsive after that. We intend to disable cron entirely and monitor the system to see whether the hangs stop.
It would appear that this problem can still occur with the latest cron releases - I just got another report of it today:

> > I was never able to reproduce this bug here - I just
> > suggested that people try the latest version, and it
> > seemed to fix the problem - now it appears not .
>
> Ah. Well it's not re-produceable here. It just seems to take down oracle
> machines from time to time.
>
> > It sounds like you have a cron job that is executed
> > frequently but which never completes . The parent
> > cron process will wait for completion of the cron
> > job child; if this never occurs, then a situation
> > as you describe could result. The problem is with
> > the cron job that never completes. I'm also working
> > on some major enhancements for cron - one of them
> > should be that if the process from a previous run
> > of the job is still active, it should not initiate
> > another run of the job - this would be a major
> > change in behavior from all previous cron releases,
> > and needs extensive testing.
>
> Yeh. I found a lot of instances of the mailman qrunner running. I just
> got hold of a top -d output from the machine, from the day it crashed.
>
> > Please can you send me:
> > - The compressed /var/log/cron file from the system
> > and your cron configuration
> > # tar -cpf - /var/log/cron /etc/cron.d /etc/crontab /var/spool/cron > /tmp/cron.tar.gz
> > The latest cron version for RHEL-3 is vixie-cron-4.1-6_EL3,
> > available from:
> > http://people.redhat.com/~jvdias/cron/RHEL-3
> > If possible, please try out this version and let me know
> > if you can reproduce the problem with it.
>

Yes, it is possible to create an ever-increasing number of crond processes, e.g. with this job:

  * * * * * root while /bin/true; do sleep 62; done

The problem is with the job that never completes. If cron finds a previous run of a job still running when it comes time to start the next run, what should it do?
  o kill the previous process - but what if there is a process that depends on being run at periodic intervals, but sometimes takes a bit longer than its interval?
  o not run the next process - again, some processes might really depend on being kicked off at regular intervals.

So I don't think it should be the default for all cron jobs to be treated this way. The best way of fixing this might be to create an explicit tag in the cron job file, such as:

  ?* * * * * root while /bin/true; do sleep 62; done

meaning "Don't run this job if a previous instance is still running", and

  ?!* * * * * root while /bin/true; do sleep 62; done

meaning "Kill the previous job instance and run the next instance".

I'll investigate such an enhancement for the next cron version - it would need extensive testing, as it would be a major departure from the behaviour of all previous cron releases. Really, the best short-term solution is to make cron jobs ensure that they complete.
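Until cron itself offers something like the "don't run this job if a previous instance is still running" behaviour discussed above, a job can approximate it by taking a lock before doing its work. What follows is a minimal sketch, assuming a Unix system and Python 3; `run_exclusive` and the lock path passed to it are hypothetical names chosen for illustration, and this is not how any vixie-cron release implements the idea.

```python
import fcntl
import os
import subprocess
import sys


def run_exclusive(lock_path, argv):
    """Run argv only if no other process holds a lock on lock_path.

    Returns the command's exit code, or None if the run was skipped
    because a previous instance still holds the lock.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        try:
            # Non-blocking exclusive lock: fails at once if already held.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None
        # The lock is held for as long as the command runs.
        return subprocess.call(argv)
    finally:
        # Closing the descriptor releases the lock, even after an error.
        os.close(fd)
```

With a wrapper of this kind, a job that overruns its interval causes later runs to exit immediately instead of piling up. The lock belongs to the wrapper process, so the kernel drops it when the wrapper exits or is killed, and no stale lock file needs cleaning up. The "kill the previous instance" variant is deliberately not sketched here, since it carries the risks described above.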
NOTE: There is a problem with some configurations of the "mailman" system which can cause this problem to occur. The mailman cron job SHOULD NOT contain this line:

  * * * * * /usr/bin/python -S /var/mailman/cron/qrunner

Versions of mailman that install this crontab have never been shipped by Red Hat for RHEL-3 and are not supported by Red Hat. The qrunner process should be run from the mailman controller daemon, not from cron. Please ensure that you have a supported version of mailman installed (> 2.1), e.g. mailman-2.1.5-25.rhel3.
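To check whether a system carries the problematic entry, the crontab text can be scanned for active lines that invoke qrunner. This is a rough sketch only: `find_qrunner_entries` is a hypothetical helper, it does a plain substring match on `cron/qrunner`, and where the mailman crontab lives on a given system is left to the caller.

```python
def find_qrunner_entries(crontab_text):
    """Return (line_number, line) pairs for active crontab lines that run qrunner."""
    hits = []
    for lineno, line in enumerate(crontab_text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blank lines and comments; only active entries matter.
        if not stripped or stripped.startswith("#"):
            continue
        if "cron/qrunner" in stripped:
            hits.append((lineno, stripped))
    return hits
```

An empty result means no active qrunner entry was found in the text supplied; it says nothing about other crontabs on the machine.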
Since this report does not contain enough detail for us to investigate the issue further, and we have not received the feedback we requested, we will assume the problem was not reproducible or has been fixed in a later update for this product.