Bug 713996
| Summary: | pbs_server crashes whenever anything connects | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Alex Chernyakhovsky <achernya> |
| Component: | torque | Assignee: | Steve Traylen <steve.traylen> |
| Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 15 | CC: | fotis, steve.traylen |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | torque-3.0.1-4.fc15 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2011-07-12 22:01:43 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Addendum: munge starts fine from systemd, and the problem persists with pbs_server.

I've spent some time debugging pbs_server, and I think I've found the problem. On line 306 of req_getcred.c, there is a call to "memcpy(ptr, buf, bytes_read);", which results in a buffer overflow: ptr is munge_buf, which is only MUNGE_SIZE (256) bytes, but this loop (on my system) writes 368 bytes, smashing the stack.

#define MUNGE_SIZE 256 /* I do not know what the proper size of this should be. My testing with munge shows it creates a string of 128 bytes */

Yes, but the req_getcred.c code uses MUNGE_SIZE for the output of unmunge, not munge.
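For readers following the diagnosis, the overflow pattern described above can be illustrated with a minimal sketch. This is not the Torque source: `append_chunk` and `collect_output` are hypothetical names, the 64-byte chunk size is an assumption, and only `MUNGE_SIZE 256` and the 368-byte figure come from this report. It shows one way a copy loop could refuse data that does not fit instead of writing past the end of a fixed-size stack buffer.

```c
#include <stddef.h>
#include <string.h>

#define MUNGE_SIZE 256 /* value quoted in the report */

/*
 * Hypothetical helper: append chunk_len bytes to dst, which holds dst_size
 * bytes and already contains *used bytes. One byte is reserved for a
 * terminating NUL. Returns 0 on success, -1 if the chunk would not fit.
 */
int append_chunk(char *dst, size_t dst_size, size_t *used,
                 const char *chunk, size_t chunk_len)
{
    if (*used >= dst_size || chunk_len >= dst_size - *used)
        return -1; /* would overflow: reject instead of smashing the stack */
    memcpy(dst + *used, chunk, chunk_len);
    *used += chunk_len;
    dst[*used] = '\0';
    return 0;
}

/*
 * Hypothetical stand-in for a read loop: feeds total_bytes of dummy data
 * into a MUNGE_SIZE stack buffer in chunks of up to 64 bytes (an assumed
 * chunk size). Returns the number of bytes stored, or -1 if the data did
 * not fit.
 */
long collect_output(size_t total_bytes)
{
    char munge_buf[MUNGE_SIZE];
    char chunk[64];
    size_t used = 0;

    memset(chunk, 'x', sizeof chunk);
    while (total_bytes > 0) {
        size_t n = total_bytes < sizeof chunk ? total_bytes : sizeof chunk;
        if (append_chunk(munge_buf, sizeof munge_buf, &used, chunk, n) != 0)
            return -1;
        total_bytes -= n;
    }
    return (long)used;
}
```

With these definitions, `collect_output(128)` stores all 128 bytes, while `collect_output(368)` returns -1 rather than overrunning `munge_buf`. Enlarging the buffer raises the limit; a bounds check of this kind turns any remaining oversize input into an error return instead of memory corruption.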
Created attachment 505309 [details]
Increase MUNGE_SIZE to avoid smashing stack
Increase MUNGE_SIZE to accommodate potentially large output from unmunge -- this is ~300 bytes on my system, next power of two would be 512, but use 1024 "just in case".
Hi Alex,

Thanks for the bug and diagnosis. Upstream bug now here: http://www.clusterresources.com/bugzilla/show_bug.cgi?id=136

New packages very shortly.

Steve.

torque-3.0.1-3.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/torque-3.0.1-3.fc15

Package torque-3.0.1-3.fc15:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing torque-3.0.1-3.fc15'
as soon as you are able to. Please go to the following url:
https://admin.fedoraproject.org/updates/torque-3.0.1-3.fc15
then log in and leave karma (feedback).

Package torque-3.0.1-4.fc15:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing torque-3.0.1-4.fc15'
as soon as you are able to. Please go to the following url:
https://admin.fedoraproject.org/updates/torque-3.0.1-4.fc15
then log in and leave karma (feedback).

torque-3.0.1-4.fc15 has been pushed to the Fedora 15 stable repository. If problems still persist, please make note of it in this bug report.
Description of problem:
On a stock installation of Fedora 15, torque-server was installed and I am attempting to configure it. Every time qmgr touches pbs_server, pbs_server crashes. When run under gdb, the following backtrace is obtained:

(gdb) bt
#0 0x0000003632435285 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x0000003632436b9b in abort () at abort.c:93
#2 0x0000003632470b7e in __libc_message (do_abort=2, fmt=0x363255afde "*** %s ***: %s terminated\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:198
#3 0x00000036324f7b87 in __fortify_fail (msg=0x363255afc6 "stack smashing detected") at fortify_fail.c:32
#4 0x00000036324f7b50 in __stack_chk_fail () at stack_chk_fail.c:29
#5 0x0000000000421a29 in unmunge_request (s=11, preq=0x11ac920) at req_getcred.c:372
#6 0x0000000000421c1a in req_altauthenuser (preq=0x2020202020202020) at req_getcred.c:468
#7 0x000000000041e02a in process_request (sfds=11) at process_request.c:713
#8 0x00007ffff7cfcaed in wait_request (waittime=<optimized out>, SState=0x72dcf8) at ../Libnet/net_server.c:506
#9 0x000000000041c174 in main_loop () at pbsd_main.c:1226
#10 0x0000000000407452 in main (argc=<optimized out>, argv=<optimized out>) at pbsd_main.c:1781

The problem seems to be munge, which had to be run manually as it too crashes when started by systemd.

Version-Release number of selected component (if applicable):
torque-server-3.0.1-1.fc15.x86_64
munge-0.5.10-1.fc15.x86_64

How reproducible:
Every time

Steps to Reproduce:
1. Start pbs_server
2. Run any qmgr command against it, e.g., the torque.setup script

Actual results:
pbs_server crashes, apparently smashing its stack

Expected results:
pbs_server works

Additional info: