Description of problem: After editing a large profile like userKey - and saving the changes multiple times - a couple of the threads appear to be spinning waiting on a mutex and the cpu usage goes through the roof. We need to fix this. Version-Release number of selected component (if applicable): How reproducible: Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info:
Created attachment 441265 [details] gdb traces and some valgrind info
Some more findings: As I follow through using gdb, I see that after one post, we have a thread that is looping within mod_nss. in ./mod_nss-1.0.8/nss_engine_io.c 343 if (!inctx->filter_ctx->pssl) { 351 rc = PR_Read(inctx->filter_ctx->pssl, buf + bytes, wanted - bytes); 353 if (rc > 0) { 361 else if (rc == 0) { 390 int nss_err = PR_GetError(); 392 if (nss_err == PR_WOULD_BLOCK_ERROR) { 403 if (*len > 0) { 401 inctx->rc = APR_EAGAIN; 403 if (*len > 0) { 407 if (inctx->block == APR_NONBLOCK_READ) { 343 if (!inctx->filter_ctx->pssl) { so .. !inctx->filter_ctx->pssl == 0 rc = -1 (from PR_Read) nss_err = PR_WOULD_BLOCK_ERROR *len =0 inctx->block = APR_BLOCK_READ continue .. this is the culprit ..
Trying to narrow down the scope of this. It seems like the brigade is confused. It thinks there is more data when perhaps there isn't and so it waits forever. I tested both threaded and forked Apache with a simple echoing CGI with 15k or so of data and that works ok: #!/usr/bin/perl binmode(STDOUT); binmode(STDIN); print "Content-Type: text/plain\r\n"; print "\r\n"; while (<>) { print $_; } exit (0); $ curl -kv --negotiate -u : https://server.example.com/cgi-bin/echo.pl -d @in > out $ diff -u in out (this is a web server configured in an IPA install hence the Kerberos negotiation)
Created attachment 442731 [details] test module I've done some investigation and found that changing the mechanism by which the posted data is read to use buckets and brigades does not affect the behavior. rcrit: This really is a problem in mod_nss. I have created a test module which just basically reads in some posted data and echoes it back. This uses the old method -- although I can create one using brigades if needed. To compile, use the following: apxs -c mod_helloworld.c To install (as root): apxs -i mod_helloworld.c
The problem was reproducable on F-13 with the latest dogtag bits. Ade set up a demonstration server for me. It is rather complex, the sample module doesn't actually show the problem. I think I have it nailed down though I haven't done thorough regression testing yet. I think this patch does the trick: diff -u -r1.10 nss_engine_io.c --- nss_engine_io.c 11 Nov 2009 18:20:39 -0000 1.10 +++ nss_engine_io.c 16 Sep 2010 03:37:24 -0000 @@ -259,7 +259,8 @@ */ if (APR_STATUS_IS_EAGAIN(inctx->rc) || APR_STATUS_IS_EINTR(inctx->rc) || (inctx->rc == APR_SUCCESS && APR_BRIGADE_EMPTY(inctx->bb))) { - PR_SetError(PR_WOULD_BLOCK_ERROR, 0); + nspr_filter_out_ctx_t *outctx = filter_ctx->outctx; + inctx->rc = outctx->rc; return -1; } The PR_SetError() code was added relatively recently to fix SSL reverse proxy over SSL. I did a quick smoke test to ensure that it still worked but I need to look at it more closely. Ade, this works for me on the instance you set up. Can you hammer it a bit doing some more realistic operations to see if it falls over? I did a bunch of updates as you described (with ~22k content-lengths) and saw no sustained > 100% CPU usage on httpd.worker. I left my updated mod_nss installed.
Change subject from "editing large profiles using UI results in loop in mod_nss" to "Large POST may cause loop in mod_nss"
committed upstream: Checking in nss_engine_io.c; /cvs/dirsec/mod_nss/nss_engine_io.c,v <-- nss_engine_io.c new revision: 1.11; previous revision: 1.10 done
mod_nss-1.0.8-7.fc13 has been submitted as an update for Fedora 13. https://admin.fedoraproject.org/updates/mod_nss-1.0.8-7.fc13
mod_nss-1.0.8-7.fc12 has been submitted as an update for Fedora 12. https://admin.fedoraproject.org/updates/mod_nss-1.0.8-7.fc12
mod_nss-1.0.8-8.fc14 has been submitted as an update for Fedora 14. https://admin.fedoraproject.org/updates/mod_nss-1.0.8-8.fc14
mod_nss-1.0.8-8.fc14 has been pushed to the Fedora 14 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update mod_nss'. You can provide feedback for this update here: https://admin.fedoraproject.org/updates/mod_nss-1.0.8-8.fc14
My first fix was bogus, reverted. Pulled and deleted the bohdi updates. Seems that the NSPR error wasn't being cleared so was always being returned. I reset the error to 0 before calling PR_Read() and that seems to have really fixed it. Checking in nss_engine_io.c; /cvs/dirsec/mod_nss/nss_engine_io.c,v <-- nss_engine_io.c new revision: 1.12; previous revision: 1.11 done
This message is a reminder that Fedora 13 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 13. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '13'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 13's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 13 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Fixed in mod_nss-1.0.8-9