Description of problem: there is a problem with the combination of subversion and apache where the checkout or update of a project fails with an error when large volumes of data are transferred. Basically, an error like this occurs: REPORT request failed on '/svn/srt/!svn/vcc/default' REPORT of '/svn/srt/!svn/vcc/default': Could not read chunk size: connection was closed Version-Release number of selected component (if applicable): Apache/2.2.2 svn, version 1.3.2 How reproducible: somewhat reproducable. Steps to Reproduce: when a bunch of large files (multi MB) are checked into a repository and then you proceed to get them using a subversion client via checkout or update (slower network connection exacerbates the problem, i.e. wireless), it will regularly fail. This happens on clients for windows, mac, and linux (fc5) in the same manner, so the server side is the suspect here. Apache with mod_dav and subversion are the server side components. Actual results: svn: REPORT request failed on '/svn/srt/!svn/vcc/default' svn: REPORT of '/svn/srt/!svn/vcc/default': Could not read chunk size: connection was closed by server. (other similar error messages appeared with the 'Could not read chunk size' message) Expected results: finished the checkout/update without errors Additional info: This is described on the tigris site as well here: http://subversion.tigris.org/faq.html#tiger-apr-0.9.6 Suggested solution: (quoted from the tigris faq above for completeness) Currently, the best solution is to rebuild Apache and Subversion from their respective sources, setting the following environment variable before running configure for Apache: setenv ac_cv_func_poll no or in Bourne shell syntax, like this: ac_cv_func_poll=no; export ac_cv_func_poll If you built APR / APRUTIL separately (i.e., you did not use the ones that come as part of the Apache tarball), you must set that environment variable before running configure for APR, as this is where the problem lies.
A connection truncation like this is generally caused by the server doing something wrong. Do you have a reliable reproduction case for the error? The error_log will likely contain a mod_dav error (or a segfault) around where the report was truncated. Can you provide error_log details?
I am able to reproduce this on our internal server quite easily. I ran an update on my project which takes several minutes to do, and it caused the error 2 times during 2 sequential updates. I show both error messages below. This is interesting, too, as it indicates php as a source of the error/failure (although I don't know why php would be involked for subversion update). [I was the only person using the server at the time.] from the error_log: PHP Fatal error: Maximum execution time of 30 seconds exceeded in Unknown on line 0 [Tue Aug 29 14:37:59 2006] [notice] child pid 20552 exit signal Segmentation fault (11) PHP Fatal error: Maximum execution time of 30 seconds exceeded in Unknown on line 0 [Tue Aug 29 14:39:08 2006] [notice] child pid 4570 exit signal Segmentation fault (11) I suspect I could increase the php timeout to hide the problem. ;-)
That's pretty odd. Can you capture a backtrace? Please do: yum install httpd-debuginfo subversion-debuginfo php-debuginfo echo CoreDumpDirectory /tmp > /etc/httpd/conf.d/coredump.conf service httpd restart <... trigger segfault...> gdb /usr/sbin/httpd /tmp/core.<pid>
I installed: yum install httpd-devel subversion-devel php-devel and followed the rest of your instructions, however, it seems to have changed its frequency of occurance. (haven't yet had it fail), so the *devel libraries have affected this problem (in a positive way?). If/when it does fail again I will follow up on this.
It's the *debuginfo* packages not -devel - these will simply give a better backtrace when you can get a core dump so will aid diagnosis, not fix the issue.
Created attachment 135615 [details] debug output from failures running subversion This is the requested debug information when the system fails during a subversion update, commit or other functions.
Thanks very much, this is very useful. It seems like the PHP alarm signal is not getting reset after a previous request in this child. Can you check your access_log and see what PHP requests had been processed in the seconds/minutes prior up to the segfault? And give me an idea of what these PHP scripts are doing? Are there any abnormal PHP errors logged in error_log prior to the "Maximum execution time" error?
You're welcome. There aren't many errors and it is hard to know if they are related. I see these type of errors before some of the failures - this pair occurred 5 minutes before pid=21010 error (from the error log): [Tue Sep 05 10:53:49 2006] [error] [client 192.168.1.100] Provider encountered an error while streaming a REPORT response. [500, #0] [Tue Sep 05 10:53:49 2006] [error] [client 192.168.1.100] A failure occurred while driving the update report editor [500, #190004] The PHP code that was called prior was 6 minutes earlier (from the access_log). 192.168.1.1 - name [05/Sep/2006:10:51:37 -0600] "GET /bugs/view_all_bug_page.php?page_number=1 HTTP/1.1" 200 27162 192.168.1.1 - name [05/Sep/2006:10:51:38 -0600] "GET /bugs/images/mantis_logo_button.gif HTTP/1.1" 304 - and the activitity just before the crash was like this: 192.168.1.1 - - [05/Sep/2006:10:58:48 -0600] "PROPFIND /svn/srt/branches/version-0.6 HTTP/1.1" 401 491 192.168.1.1 - name [05/Sep/2006:10:58:48 -0600] "PROPFIND /svn/srt/branches/version-0.6 HTTP/1.1" 207 714 192.168.1.1 - name [05/Sep/2006:10:58:48 -0600] "PROPFIND /svn/srt/!svn/vcc/default HTTP/1.1" 207 399 192.168.1.1 - name [05/Sep/2006:10:58:48 -0600] "PROPFIND /svn/srt/!svn/bln/3387 HTTP/1.1" 207 456 192.168.1.1 - name [05/Sep/2006:10:58:48 -0600] "PROPFIND /svn/srt/branches/version-0.6 HTTP/1.1" 207 714 192.168.1.1 - name [05/Sep/2006:10:58:48 -0600] "REPORT /svn/srt/!svn/vcc/default HTTP/1.1" 200 260372 192.168.1.100 - - [05/Sep/2006:10:58:50 -0600] "PROPFIND /svn/srt/branches/version-0.6 HTTP/1.1" 401 479 192.168.1.100 - name2 [05/Sep/2006:10:58:50 -0600] "PROPFIND /svn/srt/branches/version-0.6 HTTP/1.1" 207 714 192.168.1.100 - name2 [05/Sep/2006:10:58:50 -0600] "PROPFIND /svn/srt/!svn/vcc/default HTTP/1.1" 207 399 192.168.1.100 - name2 [05/Sep/2006:10:58:50 -0600] "PROPFIND /svn/srt/!svn/bln/3387 HTTP/1.1" 207 456 192.168.1.100 - name2 [05/Sep/2006:10:58:50 -0600] "PROPFIND /svn/srt/branches/version-0.6 HTTP/1.1" 207 714
Thanks. Reproducing this will be very difficult. Would you be willing to try running this diagnostic module for a while to help track this down further? http://people.apache.org/~jorton/mod_traptimer.c To use: download the C file, and run "apxs -cia mod_traptimer.c" then restart httpd. This will abort() a child process if PHP leaves its timer set after the request has been completed, which might help narrow the issue down. With this loaded you should get errors like; "profiling timer still set for /blah" in the error_log, and more core dumps.
Created attachment 136212 [details] backtrace for pid 11731 Got an error, and this is it: PHP Fatal error: Maximum execution time of 30 seconds exceeded in Unknown on line 0 [Wed Sep 13 15:01:51 2006] [notice] child pid 11731 exit signal Segmentation fault (11), possible coredump in /tmp debug output from failure attached
Created attachment 136275 [details] additional backtrace an additional backtrace (for pid 11732) I'm not seeing the additional information regarding the errors in the logs. Should I be doing something more than just restarting apache?
No - can you just check you have an uncommented "LoadModule" line for mod_traptimer in the httpd.conf?
Created attachment 137422 [details] backtrace I'm getting all the time Hi, I was waiting for the original problem to surface and it seems that it is being interfered with by the "mod_traptimer" code. What happens is that I am getting a large number of core dumps with the "mod_traptimer" activated (It's generating an error related to an "rv" variable.) I haven't witnessed the originally reported problem but it kind-of makes sense if the current httpd thread is being closed early and regularly. I don't quite see why the rv is not being accessed. Perhaps you have some insight into this. :-)
That's mod_traptimer catching exactly the bad state we're looking for! Can you bzip2-up and attach the core file from one of those segfaults? (the "rv" thing is just gdb being confused, ignore that)
Created attachment 137568 [details] core dump with the 'rv' error found Hi, you're insight is quite a bit better! ;-) Cool... I've attached a core dump example. (I have many and it is rather significant in size: 1.2MB) I hope this is the bees knees!
Created attachment 139369 [details] Another core dump An additional core dump that might help in sorting through this. Let me know if you need anything.
Updated from FC5 to FC6 - problem has been resolved.
Oops, I replied too soon. core dumps are still happening in FC6 - just didn't see anything until larger xfers I guess.
Fedora apologizes that these issues have not been resolved yet. We're sorry it's taken so long for your bug to be properly triaged and acted on. We appreciate the time you took to report this issue and want to make sure no important bugs slip through the cracks. If you're currently running a version of Fedora Core between 1 and 6, please note that Fedora no longer maintains these releases. We strongly encourage you to upgrade to a current Fedora release. In order to refocus our efforts as a project we are flagging all of the open bugs for releases which are no longer maintained and closing them. http://fedoraproject.org/wiki/LifeCycle/EOL If this bug is still open against Fedora Core 1 through 6, thirty days from now, it will be closed 'WONTFIX'. If you can reporduce this bug in the latest Fedora version, please change to the respective version. If you are unable to do this, please add a comment to this bug requesting the change. Thanks for your help, and we apologize again that we haven't handled these issues to this point. The process we are following is outlined here: http://fedoraproject.org/wiki/BugZappers/F9CleanUp We will be following the process here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this doesn't happen again. And if you'd like to join the bug triage team to help make things better, check out http://fedoraproject.org/wiki/BugZappers
This bug is open for a Fedora version that is no longer maintained and will not be fixed by Fedora. Therefore we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen thus bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.