Bug 713996 - pbs_server crashes whenever anything connects
Summary: pbs_server crashes whenever anything connects
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: torque
Version: 15
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Steve Traylen
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-06-17 01:39 UTC by Alex Chernyakhovsky
Modified: 2011-07-12 22:01 UTC (History)

Fixed In Version: torque-3.0.1-4.fc15
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-07-12 22:01:43 UTC
Type: ---
Embargoed:


Attachments
Increase MUNGE_SIZE to avoid smashing stack (510 bytes, patch)
2011-06-17 16:21 UTC, Alex Chernyakhovsky

Description Alex Chernyakhovsky 2011-06-17 01:39:39 UTC
Description of problem:
On a stock installation of Fedora 15, I installed torque-server and am trying to configure it. Every time qmgr contacts pbs_server, pbs_server crashes. Running pbs_server under gdb gives the following backtrace:

(gdb) bt
#0  0x0000003632435285 in raise (sig=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x0000003632436b9b in abort () at abort.c:93
#2  0x0000003632470b7e in __libc_message (do_abort=2,
    fmt=0x363255afde "*** %s ***: %s terminated\n")
    at ../sysdeps/unix/sysv/linux/libc_fatal.c:198
#3  0x00000036324f7b87 in __fortify_fail (
    msg=0x363255afc6 "stack smashing detected") at fortify_fail.c:32
#4  0x00000036324f7b50 in __stack_chk_fail () at stack_chk_fail.c:29
#5  0x0000000000421a29 in unmunge_request (s=11, preq=0x11ac920)
    at req_getcred.c:372
#6  0x0000000000421c1a in req_altauthenuser (preq=0x2020202020202020)
    at req_getcred.c:468
#7  0x000000000041e02a in process_request (sfds=11) at process_request.c:713
#8  0x00007ffff7cfcaed in wait_request (waittime=<optimized out>,
    SState=0x72dcf8) at ../Libnet/net_server.c:506
#9  0x000000000041c174 in main_loop () at pbsd_main.c:1226
#10 0x0000000000407452 in main (argc=<optimized out>, argv=<optimized out>)
    at pbsd_main.c:1781
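
A side note on frame #6: the value preq=0x2020202020202020 is eight bytes of 0x20, the ASCII code for a space, while frame #5 still shows a plausible pointer (0x11ac920). That is consistent with text data having been written over the caller's stack frame. The small check below is only an illustration of that reading; the helper name all_bytes_equal is made up for this note and is not part of TORQUE.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not TORQUE code): returns 1 if every one of the
 * eight bytes in v equals b, otherwise 0.
 */
static int all_bytes_equal(uint64_t v, unsigned char b)
{
    for (int i = 0; i < 8; i++) {
        if (((v >> (8 * i)) & 0xff) != b)
            return 0;
    }
    return 1;
}
```

Calling all_bytes_equal(0x2020202020202020ULL, ' ') reports a match, whereas the frame #5 pointer 0x11ac920 does not match.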

The problem seems to be munge, which I had to run manually because it also crashes when started by systemd.

Version-Release number of selected component (if applicable):
torque-server-3.0.1-1.fc15.x86_64
munge-0.5.10-1.fc15.x86_64

How reproducible:
Every time

Steps to Reproduce:
1. Start pbs_server
2. Send it any command with qmgr, e.g., by running the torque.setup script
  
Actual results:
pbs_server crashes, apparently smashing its stack

Expected results:
pbs_server works

Additional info:

Comment 1 Alex Chernyakhovsky 2011-06-17 02:10:08 UTC
Addendum: munge starts fine from systemd and problem persists with pbs_server.

Comment 2 Alex Chernyakhovsky 2011-06-17 13:30:55 UTC
I've spent some time debugging pbs_server, and I think I've found the problem. On line 306 of req_getcred.c, there is a call to "memcpy(ptr, buf, bytes_read);", which results in a buffer overflow: ptr points into munge_buf, which is only MUNGE_SIZE (256) bytes long, but this loop (on my system) writes 368 bytes, smashing the stack.
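
To make the failure mode concrete, here is a minimal sketch of a copy loop that checks the remaining space before each memcpy. It is not the code from req_getcred.c: the function name read_bounded, the 64-byte chunk size, and the assumption that the data arrives on a file descriptor are all made up for illustration. Only MUNGE_SIZE, its value of 256, and the memcpy pattern come from this report.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

#define MUNGE_SIZE 256  /* value quoted in this report */

/*
 * Hypothetical helper (not TORQUE code): read everything from fd into
 * dst, which holds cap bytes, and NUL-terminate it.
 * Returns the number of bytes stored, or -1 on a read error or when the
 * input would not fit (instead of writing past the end of dst).
 */
static long read_bounded(int fd, char *dst, size_t cap)
{
    char buf[64];           /* assumed chunk size */
    size_t used = 0;
    ssize_t n;

    if (cap == 0)
        return -1;

    while ((n = read(fd, buf, sizeof buf)) > 0) {
        /* Keep one byte in reserve for the terminating '\0'. */
        if ((size_t)n > cap - used - 1)
            return -1;
        memcpy(dst + used, buf, (size_t)n);
        used += (size_t)n;
    }
    if (n < 0)
        return -1;

    dst[used] = '\0';
    return (long)used;
}
```

With a 368-byte input, a MUNGE_SIZE-byte destination makes this sketch return -1 rather than overrun the buffer, while a 1024-byte destination receives all 368 bytes.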

Comment 3 Alex Chernyakhovsky 2011-06-17 13:53:11 UTC
#define MUNGE_SIZE 256 /* I do not know what the proper size of this should be. My
                          testing with munge shows it creates a string of 128 bytes */

Yes, but the req_getcred.c code uses MUNGE_SIZE for the output of unmunge, not munge.

Comment 4 Alex Chernyakhovsky 2011-06-17 16:21:27 UTC
Created attachment 505309
Increase MUNGE_SIZE to avoid smashing stack

Increase MUNGE_SIZE to accommodate potentially large output from unmunge. This is ~300 bytes on my system; the next power of two would be 512, but use 1024 "just in case".
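
For comparison, an alternative to picking a larger constant would be to grow the buffer on demand. The sketch below is not what the attached patch does and is not TORQUE code; the function name read_all, the starting capacity of 512, and reading from a file descriptor are assumptions made for illustration.

```c
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper (not TORQUE code): read all data from fd into a
 * heap buffer that doubles in size whenever it fills up.
 * On success returns a NUL-terminated buffer the caller must free() and
 * stores its length in *len_out (if non-NULL). Returns NULL on error.
 */
static char *read_all(int fd, size_t *len_out)
{
    size_t cap = 512;       /* assumed starting capacity */
    size_t used = 0;
    char *out = malloc(cap);
    ssize_t n;

    if (out == NULL)
        return NULL;

    /* Always leave one byte free for the terminating '\0'. */
    while ((n = read(fd, out + used, cap - used - 1)) > 0) {
        used += (size_t)n;
        if (used == cap - 1) {
            char *tmp = realloc(out, cap * 2);
            if (tmp == NULL) {
                free(out);
                return NULL;
            }
            out = tmp;
            cap *= 2;
        }
    }
    if (n < 0) {
        free(out);
        return NULL;
    }

    out[used] = '\0';
    if (len_out != NULL)
        *len_out = used;
    return out;
}
```

The trade-off is a heap allocation per request in exchange for not having to guess an upper bound on the unmunge output.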

Comment 5 Steve Traylen 2011-06-17 17:18:41 UTC
Hi Alex,

Thanks for the bug and diagnosis.

Upstream bug now here:
 http://www.clusterresources.com/bugzilla/show_bug.cgi?id=136

New packages will follow very shortly.

Steve.

Comment 6 Fedora Update System 2011-06-17 18:18:51 UTC
torque-3.0.1-3.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/torque-3.0.1-3.fc15

Comment 7 Fedora Update System 2011-06-21 17:20:31 UTC
Package torque-3.0.1-3.fc15:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing torque-3.0.1-3.fc15'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/torque-3.0.1-3.fc15
then log in and leave karma (feedback).

Comment 8 Fedora Update System 2011-06-27 23:52:22 UTC
Package torque-3.0.1-4.fc15:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing torque-3.0.1-4.fc15'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/torque-3.0.1-4.fc15
then log in and leave karma (feedback).

Comment 9 Fedora Update System 2011-07-12 22:01:27 UTC
torque-3.0.1-4.fc15 has been pushed to the Fedora 15 stable repository. If problems persist, please note them in this bug report.

