Bug 620856

Summary: Large POST may cause loop in mod_nss
Product: [Fedora] Fedora Reporter: Ade Lee <alee>
Component: mod_nssAssignee: Rob Crittenden <rcritten>
Status: CLOSED CURRENTRELEASE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: high    
Version: 13CC: alee, jgalipea, jmagne, rcritten
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 634685 634687 (view as bug list) Environment:
Last Closed: 2011-06-01 15:36:33 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 445047, 634685, 634687    
Attachments:
Description Flags
gdb traces and some valgrind info
none
test module none

Description Ade Lee 2010-08-03 16:23:59 UTC
Description of problem:

After editing a large profile like userKey - and saving the changes multiple times - a couple of the threads appear to be spinning waiting on a mutex and the cpu usage goes through the roof.  

We need to fix this.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 2 Ade Lee 2010-08-26 15:58:12 UTC
Created attachment 441265 [details]
gdb traces and some valgrind info

Comment 3 Ade Lee 2010-08-26 21:04:45 UTC
Some more findings:

As I follow through using gdb, I see that after one post, we have a thread that is looping within mod_nss.

in ./mod_nss-1.0.8/nss_engine_io.c

343             if (!inctx->filter_ctx->pssl) { 
351             rc = PR_Read(inctx->filter_ctx->pssl, buf + bytes, wanted - bytes);
353             if (rc > 0) {
361             else if (rc == 0) {
390                 int nss_err = PR_GetError();
392                 if (nss_err == PR_WOULD_BLOCK_ERROR) {
403                     if (*len > 0) {
401                     inctx->rc = APR_EAGAIN;
403                     if (*len > 0) {
407                     if (inctx->block == APR_NONBLOCK_READ) {
343             if (!inctx->filter_ctx->pssl) {

so .. 
!inctx->filter_ctx->pssl == 0
rc = -1 (from PR_Read)
nss_err = PR_WOULD_BLOCK_ERROR
*len =0
inctx->block = APR_BLOCK_READ
continue ..

this is the culprit ..

Comment 4 Rob Crittenden 2010-08-26 21:59:18 UTC
Trying to narrow down the scope of this. It seems like the brigade is confused. It thinks there is more data when perhaps there isn't and so it waits forever.

I tested both threaded and forked Apache with a simple echoing CGI with 15k or so of data and that works ok:

#!/usr/bin/perl
binmode(STDOUT);
binmode(STDIN);
print "Content-Type: text/plain\r\n";
print "\r\n";

while (<>) {
    print $_;
}

exit (0);

$ curl -kv --negotiate -u : https://server.example.com/cgi-bin/echo.pl -d @in > out
$ diff -u in out

(this is a web server configured in an IPA install hence the Kerberos negotiation)

Comment 5 Ade Lee 2010-09-02 21:20:54 UTC
Created attachment 442731 [details]
test module

I've done some investigation and found that changing the mechanism by which the posted data is read to use buckets and brigades does not affect the behavior.
rcrit: 

This really is a problem in mod_nss.

I have created a test module which just basically reads in some posted data and echoes it back.  This uses the old method -- although I can create one using brigades if needed.

To compile, use the following:

apxs -c mod_helloworld.c

To install (as root):

apxs -i mod_helloworld.c

Comment 7 Rob Crittenden 2010-09-16 03:42:27 UTC
The problem was reproducable on F-13 with the latest dogtag bits. Ade set up a demonstration server for me. It is rather complex, the sample module doesn't actually show the problem.

I think I have it nailed down though I haven't done thorough regression testing yet. I think this patch does the trick:

diff -u -r1.10 nss_engine_io.c
--- nss_engine_io.c     11 Nov 2009 18:20:39 -0000      1.10
+++ nss_engine_io.c     16 Sep 2010 03:37:24 -0000
@@ -259,7 +259,8 @@
          */
         if (APR_STATUS_IS_EAGAIN(inctx->rc) || APR_STATUS_IS_EINTR(inctx->rc)
                || (inctx->rc == APR_SUCCESS && APR_BRIGADE_EMPTY(inctx->bb))) {
-            PR_SetError(PR_WOULD_BLOCK_ERROR, 0);
+            nspr_filter_out_ctx_t *outctx = filter_ctx->outctx;
+            inctx->rc = outctx->rc;
             return -1;
         }

The PR_SetError() code was added relatively recently to fix SSL reverse proxy over SSL. I did a quick smoke test to ensure that it still worked but I need to look at it more closely.

Ade, this works for me on the instance you set up. Can you hammer it a bit doing some more realistic operations to see if it falls over? I did a bunch of updates as you described (with ~22k content-lengths) and saw no sustained > 100% CPU usage on httpd.worker. I left my updated mod_nss installed.

Comment 8 Rob Crittenden 2010-09-16 15:24:05 UTC
Change subject from "editing large profiles using UI results in loop in mod_nss" to "Large POST may cause loop in mod_nss"

Comment 9 Rob Crittenden 2010-09-17 19:40:13 UTC
committed upstream:

Checking in nss_engine_io.c;
/cvs/dirsec/mod_nss/nss_engine_io.c,v  <--  nss_engine_io.c
new revision: 1.11; previous revision: 1.10
done

Comment 10 Fedora Update System 2010-09-20 14:33:55 UTC
mod_nss-1.0.8-7.fc13 has been submitted as an update for Fedora 13.
https://admin.fedoraproject.org/updates/mod_nss-1.0.8-7.fc13

Comment 11 Fedora Update System 2010-09-20 14:34:07 UTC
mod_nss-1.0.8-7.fc12 has been submitted as an update for Fedora 12.
https://admin.fedoraproject.org/updates/mod_nss-1.0.8-7.fc12

Comment 12 Fedora Update System 2010-09-20 14:34:16 UTC
mod_nss-1.0.8-8.fc14 has been submitted as an update for Fedora 14.
https://admin.fedoraproject.org/updates/mod_nss-1.0.8-8.fc14

Comment 13 Fedora Update System 2010-09-20 18:40:42 UTC
mod_nss-1.0.8-8.fc14 has been pushed to the Fedora 14 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update mod_nss'.  You can provide feedback for this update here: https://admin.fedoraproject.org/updates/mod_nss-1.0.8-8.fc14

Comment 14 Rob Crittenden 2010-09-24 02:08:23 UTC
My first fix was bogus, reverted. Pulled and deleted the bohdi updates.

Seems that the NSPR error wasn't being cleared so was always being returned. I reset the error to 0 before calling PR_Read() and that seems to have really
fixed it.

Checking in nss_engine_io.c;
/cvs/dirsec/mod_nss/nss_engine_io.c,v  <--  nss_engine_io.c
new revision: 1.12; previous revision: 1.11
done

Comment 15 Bug Zapper 2011-06-01 12:12:03 UTC
This message is a reminder that Fedora 13 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 13.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '13'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 13's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 13 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 16 Rob Crittenden 2011-06-01 15:36:33 UTC
Fixed in mod_nss-1.0.8-9