Description of problem:
Running an EC2E job on RHEL4, the mailer dumps during the finalization process.

Version-Release number of selected component (if applicable):
condor-7.2.0-0.7

How reproducible:

universe = vanilla
executable = multi_output.sh
output = stdout.$(PROCESS)
error = /tmp/job.stderr.$(PROCESS)
requirements = Arch == "INTEL"
log = ulog.$(PROCESS)
transfer_output_files = out1,/tmp/out2
should_transfer_files = yes
when_to_transfer_output = on_exit
+WantAWS = True
+WantArch = "INTEL"
+WantCPUs = 1
+AmazonAccessKey = "/home/rsquared/.ec2/access_key"
+AmazonSecretKey = "/home/rsquared/.ec2/secret_access_key"
+AmazonPublicKey = "/home/rsquared/.ec2/cert-B4R2HOPR74CAW5EGDKWTQEUND6EX6Y2G.pem"
+AmazonPrivateKey = "/home/rsquared/.ec2/pk-B4R2HOPR74CAW5EGDKWTQEUND6EX6Y2G.pem"
+AmazonKeyPairFile = "/tmp/keypair-$(PROCESS)"
queue

Steps to Reproduce:
1. Submit EC2E job
2. Tail /var/lib/condor/log/JobRouterLog
3.

Actual results:

12/5 00:17:32 FileLock object is updating timestamp on: /home/testmonkey/ec2e/ulog.0
12/5 00:17:32 FileLock::obtain(1) - @1228454252.800580 lock on /home/testmonkey/ec2e/ulog.0 now WRITE
12/5 00:17:32 FileLock::obtain(2) - @1228454252.804652 lock on /home/testmonkey/ec2e/ulog.0 now UNLOCKED
12/5 00:17:32 Forking Mailer process...
Stack dump for process 14584 at timestamp 1228454252 (22 frames)
condor_job_router(dprintf_dump_stack+0x9b)[0x4bfddc]
condor_job_router[0x4c0053]
/lib64/tls/libc.so.6[0x32c0d2e300]
12/5 00:17:32 JobRouter (src=4.0,dest=5.0,route=Amazon Small): finalized job
/lib64/tls/libc.so.6(gsignal+0x3d)[0x32c0d2e26d]
/lib64/tls/libc.so.6(abort+0xfe)[0x32c0d2fa6e]
condor_job_router(_EXCEPT_+0x178)[0x4be76a]
condor_job_router[0x4c1462]
condor_job_router(email_open+0x35c)[0x4c10cc]
condor_job_router[0x4cc667]
condor_job_router(_Z15email_user_openP7ClassAdPKc+0x59)[0x4cc525]
condor_job_router(_Z19EmailTerminateEventP7ClassAdb+0xf4)[0x476b2c]
condor_job_router(_Z19EmailTerminateEventRKN7classad7ClassAdE+0x8f)[0x4770cf]
condor_job_router(_ZN9JobRouter17FinishFinalizeJobEP9RoutedJob+0x1ca)[0x481b28]
condor_job_router(_ZN10ExitClient10hookExitedEi+0x130)[0x49c88e]
condor_job_router(_ZN13HookClientMgr12reaperOutputEii+0x96)[0x4ba42e]
condor_job_router(_ZN10DaemonCore10CallReaperEiPKcii+0x193)[0x4b0051]
condor_job_router(_ZN10DaemonCore17HandleProcessExitEii+0x1de)[0x4b0264]
condor_job_router(_ZN10DaemonCore24HandleDC_SERVICEWAITPIDSEi+0x3f)[0x4afe85]
condor_job_router(_ZN10DaemonCore6DriverEv+0x627)[0x4a58ff]
condor_job_router(main+0x1770)[0x4b908a]
/lib64/tls/libc.so.6(__libc_start_main+0xdb)[0x32c0d1c40b]
condor_job_router(__gxx_personality_v0+0x162)[0x4728fa]

Expected results:

Additional info:

This is a misconfiguration: MAIL=/usr/bin/mail is set, but mail lives in /bin. This will be fixed in 7.2.0-0.9 as a patch to /etc/condor/condor_config.
In package condor-7.2.0-0.9.el5.i386.rpm there is still MAIL=/usr/bin/mail in /etc/condor/condor_config
The src/condor_example/condor_config.generic is processed by a perl script (.../customize) when it is installed. That script steps on MAIL, always setting it to /usr/bin/mail on Linux. Talk about annoying. This is fixed for 7.2.0-0.10. Until then, everyone should put MAIL=/bin/mail in their condor_config.local.
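For anyone applying the workaround by hand, here is a minimal sketch of how to find out which of the two locations mentioned in this report actually holds the mail binary on a given machine. The helper name find_mailer is made up for illustration and is not part of Condor. The script only prints a suggested line and does not edit any config file.

```shell
#!/bin/sh
# Hypothetical helper (not part of Condor): print the first path from the
# argument list that exists and is executable; return 1 if none qualifies.
find_mailer() {
    for candidate in "$@"; do
        if [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Candidate locations taken from this bug report.
mailer=$(find_mailer /bin/mail /usr/bin/mail) || mailer=""

if [ -n "$mailer" ]; then
    # Suggested line for condor_config.local.
    echo "MAIL = $mailer"
else
    echo "no mail binary found in /bin or /usr/bin" >&2
fi
```

After adding the printed line to condor_config.local and reconfiguring, running condor_config_val MAIL should report the value that is in effect.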
Sorry to say that, but it is still the same, MAIL=/usr/bin/mail. Tested on both (just to make sure) condor-7.2.0-0.9.el5.{i386,x86_64}.rpm
Oops, of course I meant that I tested it on both 7.2.0-0.10.el5 packages.
Tested on condor-7.2.0-0.11.el5.i386.rpm. The MAIL line in /etc/condor/condor_config reads '/bin/mail'. Well done!
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2009-0036.html