Description of problem:
On a stock installation of Fedora 15, I installed torque-server and am trying to configure it. Every time qmgr contacts pbs_server, pbs_server crashes. Running pbs_server under gdb gives the following backtrace:

(gdb) bt
#0  0x0000003632435285 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x0000003632436b9b in abort () at abort.c:93
#2  0x0000003632470b7e in __libc_message (do_abort=2, fmt=0x363255afde "*** %s ***: %s terminated\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:198
#3  0x00000036324f7b87 in __fortify_fail (msg=0x363255afc6 "stack smashing detected") at fortify_fail.c:32
#4  0x00000036324f7b50 in __stack_chk_fail () at stack_chk_fail.c:29
#5  0x0000000000421a29 in unmunge_request (s=11, preq=0x11ac920) at req_getcred.c:372
#6  0x0000000000421c1a in req_altauthenuser (preq=0x2020202020202020) at req_getcred.c:468
#7  0x000000000041e02a in process_request (sfds=11) at process_request.c:713
#8  0x00007ffff7cfcaed in wait_request (waittime=<optimized out>, SState=0x72dcf8) at ../Libnet/net_server.c:506
#9  0x000000000041c174 in main_loop () at pbsd_main.c:1226
#10 0x0000000000407452 in main (argc=<optimized out>, argv=<optimized out>) at pbsd_main.c:1781

The problem seems to be munge, which had to be run manually as it too crashes when started by systemd.

Version-Release number of selected component (if applicable):
torque-server-3.0.1-1.fc15.x86_64
munge-0.5.10-1.fc15.x86_64

How reproducible:
Every time

Steps to Reproduce:
1. Start pbs_server
2. Run any qmgr command against it, e.g., the torque.setup script

Actual results:
pbs_server crashes, apparently smashing its stack

Expected results:
pbs_server works

Additional info:
Addendum: munge starts fine from systemd and problem persists with pbs_server.
I've spent some time debugging pbs_server, and I think I've found the problem. On line 306 of req_getcred.c, there is a call to "memcpy(ptr, buf, bytes_read);", which results in a buffer overflow: ptr points into munge_buf, which is only MUNGE_SIZE (256) bytes long, but on my system this loop writes 368 bytes, smashing the stack.
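To illustrate the overflow described above, here is a minimal sketch of a bounds-checked version of such a copy loop. It is not the actual req_getcred.c code: append_bounded and simulate_read are hypothetical names, and the 64-byte chunk size is an assumption chosen only for the demonstration. Only MUNGE_SIZE (256) and the 368-byte figure come from this report.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MUNGE_SIZE 256 /* buffer size quoted in this report */

/*
 * Hypothetical helper (not part of torque): append len bytes from src to
 * dst at offset *used, always leaving room for a terminating NUL.
 * Returns 0 on success, -1 if the data would not fit in cap bytes.
 */
int append_bounded(char *dst, size_t cap, size_t *used,
                   const char *src, size_t len)
{
    assert(dst != NULL && used != NULL && src != NULL && cap > 0);

    /* Reject instead of writing past the end of dst. */
    if (*used >= cap || len > cap - 1 - *used)
        return -1;

    memcpy(dst + *used, src, len);
    *used += len;
    dst[*used] = '\0';
    return 0;
}

/*
 * Hypothetical stand-in for the read loop: feed `total` bytes into a
 * MUNGE_SIZE stack buffer in chunks of up to 64 bytes (assumed size).
 * Returns 0 if everything fit, -1 if the input was too large.
 */
int simulate_read(size_t total)
{
    char munge_buf[MUNGE_SIZE];
    char chunk[64];
    size_t used = 0;

    memset(chunk, 'x', sizeof chunk);
    munge_buf[0] = '\0';

    while (total > 0) {
        size_t n = total < sizeof chunk ? total : sizeof chunk;

        if (append_bounded(munge_buf, sizeof munge_buf, &used, chunk, n) != 0)
            return -1; /* would have overflowed munge_buf */
        total -= n;
    }
    return 0;
}
```

With this check in place, simulate_read(128) succeeds, while simulate_read(368) fails cleanly with -1 rather than writing past the end of the stack buffer.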
#define MUNGE_SIZE 256 /* I do not know what the proper size of this should be. My testing with munge shows it creates a string of 128 bytes */

Yes, but the req_getcred.c code uses MUNGE_SIZE for the output of unmunge, not munge.
Created attachment 505309
Increase MUNGE_SIZE to avoid smashing stack

Increase MUNGE_SIZE to accommodate the potentially large output from unmunge. This is ~300 bytes on my system; the next power of two would be 512, but use 1024 "just in case".
Hi Alex,

Thanks for the bug report and diagnosis. The upstream bug is now here:
http://www.clusterresources.com/bugzilla/show_bug.cgi?id=136

New packages are coming very shortly.

Steve.
torque-3.0.1-3.fc15 has been submitted as an update for Fedora 15. https://admin.fedoraproject.org/updates/torque-3.0.1-3.fc15
Package torque-3.0.1-3.fc15:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing torque-3.0.1-3.fc15'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/torque-3.0.1-3.fc15
then log in and leave karma (feedback).
Package torque-3.0.1-4.fc15:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing torque-3.0.1-4.fc15'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/torque-3.0.1-4.fc15
then log in and leave karma (feedback).
torque-3.0.1-4.fc15 has been pushed to the Fedora 15 stable repository. If problems still persist, please make note of it in this bug report.