Bug 713996

Summary: pbs_server crashes whenever anything connects
Product: [Fedora] Fedora
Reporter: Alex Chernyakhovsky <achernya>
Component: torque
Assignee: Steve Traylen <steve.traylen>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 15
CC: fotis, steve.traylen
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Fixed In Version: torque-3.0.1-4.fc15
Doc Type: Bug Fix
Last Closed: 2011-07-12 18:01:43 EDT
Attachments:
Increase MUNGE_SIZE to avoid smashing stack (flags: none)

Description Alex Chernyakhovsky 2011-06-16 21:39:39 EDT
Description of problem:
On a stock installation of Fedora 15, I installed torque-server and am trying to configure it. Every time qmgr contacts pbs_server, pbs_server crashes. Running it under gdb gives the following backtrace:

(gdb) bt
#0  0x0000003632435285 in raise (sig=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x0000003632436b9b in abort () at abort.c:93
#2  0x0000003632470b7e in __libc_message (do_abort=2,
    fmt=0x363255afde "*** %s ***: %s terminated\n")
    at ../sysdeps/unix/sysv/linux/libc_fatal.c:198
#3  0x00000036324f7b87 in __fortify_fail (
    msg=0x363255afc6 "stack smashing detected") at fortify_fail.c:32
#4  0x00000036324f7b50 in __stack_chk_fail () at stack_chk_fail.c:29
#5  0x0000000000421a29 in unmunge_request (s=11, preq=0x11ac920)
    at req_getcred.c:372
#6  0x0000000000421c1a in req_altauthenuser (preq=0x2020202020202020)
    at req_getcred.c:468
#7  0x000000000041e02a in process_request (sfds=11) at process_request.c:713
#8  0x00007ffff7cfcaed in wait_request (waittime=<optimized out>,
    SState=0x72dcf8) at ../Libnet/net_server.c:506
#9  0x000000000041c174 in main_loop () at pbsd_main.c:1226
#10 0x0000000000407452 in main (argc=<optimized out>, argv=<optimized out>)
    at pbsd_main.c:1781

The problem seems to be munge, which had to be run manually because it too crashes when started by systemd.

Version-Release number of selected component (if applicable):
torque-server-3.0.1-1.fc15.x86_64
munge-0.5.10-1.fc15.x86_64

How reproducible:
Every time

Steps to Reproduce:
1. Start pbs_server
2. Send any qmgr command to it, e.g., by running the torque.setup script
  
Actual results:
pbs_server crashes, apparently smashing its stack

Expected results:
pbs_server works

Additional info:
Comment 1 Alex Chernyakhovsky 2011-06-16 22:10:08 EDT
Addendum: munge starts fine from systemd and problem persists with pbs_server.
Comment 2 Alex Chernyakhovsky 2011-06-17 09:30:55 EDT
I've spent some time debugging pbs_server, and I think I've found the problem. On line 306 of req_getcred.c there is a call to "memcpy(ptr, buf, bytes_read);" that overflows a buffer: ptr is munge_buf, which is only MUNGE_SIZE (256) bytes, but this loop (on my system) writes 368 bytes, smashing the stack.
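
For illustration only, here is a minimal sketch of a bounded copy that would refuse the overflowing write instead of smashing the stack. It is not the torque code and not the fix that was applied: append_bounded and simulate_read_loop are hypothetical names, and the 128-byte chunk buffer is an assumption. Only MUNGE_SIZE (256) and the memcpy pattern come from this report.

```c
#include <stddef.h>
#include <string.h>

#define MUNGE_SIZE 256 /* value quoted in this report */

/* Hypothetical helper: append n bytes from src to dst at offset *used,
 * refusing to write past cap. Returns 0 on success, -1 if the chunk
 * would not fit. Callers must keep *used <= cap. */
static int append_bounded(char *dst, size_t cap, size_t *used,
                          const char *src, size_t n)
{
    if (n > cap - *used)
        return -1;              /* would overflow: reject the chunk */
    memcpy(dst + *used, src, n);
    *used += n;
    return 0;
}

/* Hypothetical stand-in for the read loop: feeds `count` chunks of
 * `chunk_len` bytes (capped at 128) into a MUNGE_SIZE buffer and
 * returns how many bytes were stored before a chunk was rejected. */
static size_t simulate_read_loop(size_t chunk_len, int count)
{
    char munge_buf[MUNGE_SIZE];
    char chunk[128];
    size_t used = 0;
    int i;

    if (chunk_len > sizeof chunk)
        chunk_len = sizeof chunk;
    memset(chunk, 'x', sizeof chunk);

    for (i = 0; i < count; i++) {
        if (append_bounded(munge_buf, sizeof munge_buf, &used,
                           chunk, chunk_len) != 0)
            break;              /* stop instead of overrunning munge_buf */
    }
    return used;
}
```

As an example, simulate_read_loop(92, 4) models 368 bytes arriving as four 92-byte reads (the split into reads is an assumption) and stops after storing 184 bytes, because the third chunk would not fit in 256.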
Comment 3 Alex Chernyakhovsky 2011-06-17 09:53:11 EDT
#define MUNGE_SIZE 256 /* I do not know what the proper size of this should be. My
                          testing with munge shows it creates a string of 128 bytes */

Yes, but the req_getcred.c code uses MUNGE_SIZE for the output of unmunge, not munge.
Comment 4 Alex Chernyakhovsky 2011-06-17 12:21:27 EDT
Created attachment 505309 [details]
Increase MUNGE_SIZE to avoid smashing stack

Increase MUNGE_SIZE to accommodate potentially large output from unmunge. This is ~300 bytes on my system; the next power of two would be 512, but the patch uses 1024 "just in case".
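
The sizing arithmetic above can be checked with a small sketch. next_pow2 is a hypothetical helper written for this note, not part of torque or munge.

```c
#include <stddef.h>

/* Hypothetical helper: smallest power of two >= n.
 * Assumes 0 < n <= SIZE_MAX / 2 + 1 so the shift cannot wrap. */
static size_t next_pow2(size_t n)
{
    size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}
```

Both the ~300 bytes mentioned here and the 368 bytes seen in the debugger round up to 512, so 1024 leaves one further doubling of headroom.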
Comment 5 Steve Traylen 2011-06-17 13:18:41 EDT
Hi Alex,

Thanks for the bug and diagnosis.

The upstream bug is now here:
 http://www.clusterresources.com/bugzilla/show_bug.cgi?id=136

New packages will follow very shortly.

Steve.
Comment 6 Fedora Update System 2011-06-17 14:18:51 EDT
torque-3.0.1-3.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/torque-3.0.1-3.fc15
Comment 7 Fedora Update System 2011-06-21 13:20:31 EDT
Package torque-3.0.1-3.fc15:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing torque-3.0.1-3.fc15'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/torque-3.0.1-3.fc15
then log in and leave karma (feedback).
Comment 8 Fedora Update System 2011-06-27 19:52:22 EDT
Package torque-3.0.1-4.fc15:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing torque-3.0.1-4.fc15'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/torque-3.0.1-4.fc15
then log in and leave karma (feedback).
Comment 9 Fedora Update System 2011-07-12 18:01:27 EDT
torque-3.0.1-4.fc15 has been pushed to the Fedora 15 stable repository. If problems still persist, please make note of it in this bug report.