Bug 713996 - pbs_server crashes whenever anything connects
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: torque
Version: 15
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Assigned To: Steve Traylen
QA Contact: Fedora Extras Quality Assurance

Reported: 2011-06-16 21:39 EDT by Alex Chernyakhovsky
Modified: 2011-07-12 18:01 EDT
CC List: 2 users

Fixed In Version: torque-3.0.1-4.fc15
Doc Type: Bug Fix
Last Closed: 2011-07-12 18:01:43 EDT


Attachments
Increase MUNGE_SIZE to avoid smashing stack (510 bytes, patch)
2011-06-17 12:21 EDT, Alex Chernyakhovsky


Description Alex Chernyakhovsky 2011-06-16 21:39:39 EDT
Description of problem:
On a stock installation of Fedora 15, torque-server was installed and is being configured. Every time qmgr connects to pbs_server, pbs_server crashes. When it is run under gdb, the following backtrace is obtained:

(gdb) bt
#0  0x0000003632435285 in raise (sig=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x0000003632436b9b in abort () at abort.c:93
#2  0x0000003632470b7e in __libc_message (do_abort=2,
    fmt=0x363255afde "*** %s ***: %s terminated\n")
    at ../sysdeps/unix/sysv/linux/libc_fatal.c:198
#3  0x00000036324f7b87 in __fortify_fail (
    msg=0x363255afc6 "stack smashing detected") at fortify_fail.c:32
#4  0x00000036324f7b50 in __stack_chk_fail () at stack_chk_fail.c:29
#5  0x0000000000421a29 in unmunge_request (s=11, preq=0x11ac920)
    at req_getcred.c:372
#6  0x0000000000421c1a in req_altauthenuser (preq=0x2020202020202020)
    at req_getcred.c:468
#7  0x000000000041e02a in process_request (sfds=11) at process_request.c:713
#8  0x00007ffff7cfcaed in wait_request (waittime=<optimized out>,
    SState=0x72dcf8) at ../Libnet/net_server.c:506
#9  0x000000000041c174 in main_loop () at pbsd_main.c:1226
#10 0x0000000000407452 in main (argc=<optimized out>, argv=<optimized out>)
    at pbsd_main.c:1781

The problem seems to involve munge, which had to be started manually because it also crashes when started by systemd.

Version-Release number of selected component (if applicable):
torque-server-3.0.1-1.fc15.x86_64
munge-0.5.10-1.fc15.x86_64

How reproducible:
Every time

Steps to Reproduce:
1. Start pbs_server
2. Send it any command with qmgr (e.g., by running the torque.setup script)
  
Actual results:
pbs_server crashes; glibc aborts it with "stack smashing detected"

Expected results:
pbs_server handles the request and keeps running

Additional info:
Comment 1 Alex Chernyakhovsky 2011-06-16 22:10:08 EDT
Addendum: munge does start fine from systemd, and the problem with pbs_server persists.
Comment 2 Alex Chernyakhovsky 2011-06-17 09:30:55 EDT
I've spent some time debugging pbs_server, and I think I've found the problem. On line 306 of req_getcred.c there is a call to "memcpy(ptr, buf, bytes_read);", which overflows a buffer: ptr points into munge_buf, which is only MUNGE_SIZE (256) bytes long, but the surrounding read loop writes 368 bytes on my system, smashing the stack.
Comment 3 Alex Chernyakhovsky 2011-06-17 09:53:11 EDT
#define MUNGE_SIZE 256 /* I do not know what the proper size of this should be. My
                          testing with munge shows it creates a string of 128 bytes */

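Yes, but req_getcred.c uses MUNGE_SIZE for the output of unmunge, not munge. unmunge prints the decoded credential's metadata (status, encode host, timestamps, UID, GID, and so on) in addition to the payload, so its output is longer than the credential itself.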
Comment 4 Alex Chernyakhovsky 2011-06-17 12:21:27 EDT
Created attachment 505309
Increase MUNGE_SIZE to avoid smashing stack

Increase MUNGE_SIZE to accommodate potentially large output from unmunge. This is ~300 bytes on my system; the next power of two would be 512, but use 1024 "just in case".
Comment 5 Steve Traylen 2011-06-17 13:18:41 EDT
Hi Alex,

Thanks for the bug and diagnosis.

The upstream bug is now here:
 http://www.clusterresources.com/bugzilla/show_bug.cgi?id=136

New packages will follow very shortly.

Steve.
Comment 6 Fedora Update System 2011-06-17 14:18:51 EDT
torque-3.0.1-3.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/torque-3.0.1-3.fc15
Comment 7 Fedora Update System 2011-06-21 13:20:31 EDT
Package torque-3.0.1-3.fc15:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing torque-3.0.1-3.fc15'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/torque-3.0.1-3.fc15
then log in and leave karma (feedback).
Comment 8 Fedora Update System 2011-06-27 19:52:22 EDT
Package torque-3.0.1-4.fc15:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing torque-3.0.1-4.fc15'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/torque-3.0.1-4.fc15
then log in and leave karma (feedback).
Comment 9 Fedora Update System 2011-07-12 18:01:27 EDT
torque-3.0.1-4.fc15 has been pushed to the Fedora 15 stable repository.  If problems still persist, please make note of it in this bug report.
