Bug 899402 (JBEWS-83)
| Summary: | Failover triggers 500 Internal Server Error, 503 Service Temporarily Unavailable | ||
|---|---|---|---|
| Product: | [JBoss] JBoss Enterprise Web Server 1 | Reporter: | Michal Karm Babacek <mbabacek> |
| Component: | unspecified | Assignee: | Jean-Frederic Clere <jfclere> |
| Status: | CLOSED NOTABUG | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | EWS 1.0.2 | CC: | ddoyle, jfclere, mbabacek, rajesh.rajasekaran |
| Target Milestone: | --- | ||
| Target Release: | TBD EWS | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| URL: | http://jira.jboss.org/jira/browse/JBEWS-83 | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: |
rhel, solaris
|
|
| Last Closed: | 2011-05-26 16:10:19 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Attachments: | |||
|
Description
Michal Karm Babacek
2011-05-04 08:13:48 UTC
Link: Added: This issue Cloned from JBPAPP-3614 Writer: Removed: mistysj Added: mbabacek Affects: Removed: [Documentation (Ref Guide, User Guide, etc.)] Release Notes Docs Status: Removed: Documented as Known Issue Release Notes Text: Removed: Due to a race condition caused by a bug in mod_proxy, a <literal>503</literal> error may be returned when a node is killed, instead of failing over to another node. Log from the driver node perf17. Attachment: Added: 8-eap-5x-mod_cluster-rhel6-x86_64-failover-log.zip Attachment: Removed: 8-eap-5x-mod_cluster-rhel6-x86_64-failover-log.zip Load driver (client) log from perf17. Attachment: Added: 9-eap-5x-mod_cluster-rhel6-x86_64-failover-load-driver.zip Attachment: Added: 9-eap-5x-mod_cluster-rhel6-x86_64-failover-access_log.zip Attachment: Added: 9-eap-5x-mod_cluster-rhel6-x86_64-failover-error_log.zip [jfclere@jfcpc TMP]$ grep 503 access_log 10.16.90.52 - - [04/May/2011:04:45:41 -0400] "GET /st/SessionTestServlet?invalidate=true HTTP/1.1" 503 399 "-" "Jakarta Commons-HttpClient/3.1" 10.16.90.52 - - [04/May/2011:04:51:05 -0400] "GET /st/SessionTestServlet?invalidate=true HTTP/1.1" 503 399 "-" "Jakarta Commons-HttpClient/3.1" [jfclere@jfcpc TMP]$ wc -l access_log 367040 access_log Hm that is NOT 1% @Jean-Frederic Clere: Correction: Around 0.0005449% of sessions result in 503. :-) I can't find any clues in the error_log... The error comes from httpd I would be need a debug of httpd to find what goes wrong. I've encountered this behaviour in several other scenarios on some other platforms as well, however I haven't been able to reproduce it deterministically. I'm gonna provide additional data as soon as I figure it out. You may download these results: *Httpd configuration:* http://scratch.englab.brq.redhat.com/mbabacek/34-eap-5x-mod_cluster-rhel6-i386-failover-conf.zip *Httpd logs:* http://scratch.englab.brq.redhat.com/mbabacek/34-eap-5x-mod_cluster-rhel6-i386-failover-logs.zip *Client's log:* http://scratch.englab.brq.redhat.com/mbabacek/34-eap-5x-mod_cluster-rhel6-i386-failover-console.zip *The actual run link:* http://hudson.qa.jboss.com/hudson/job/eap-5x-mod_cluster-rhel6-i386-failover/34/ *The client's log shows the following types of errors:* {noformat} Error sampling data: <org.jboss.smartfrog.loaddriver.RequestProcessingException: Invalid response code: 500 Content: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> Error sampling data: <org.jboss.smartfrog.loaddriver.RequestProcessingException: Invalid response code: 503 Content: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> Error sampling data: <org.jboss.smartfrog.loaddriver.RequestProcessingException: IO error: java.net.BindException: Address already in use> Error sampling data: <org.jboss.smartfrog.loaddriver.RequestProcessingException: Stale session data received. Expected 10, received 0, Runner: 1034> {noformat} *Httpd error_log contains these types of errors:* {noformat} [error] (104)Connection reset by peer: ajp_ilink_receive() can't receive header [error] (111)Connection refused: proxy: ajp: attempt to connect to 10.16.88.179:8009 (10.16.88.179) failed [error] (111)Connection refused: proxy: AJP: attempt to connect to 10.16.88.179:8009 (10.16.88.179) failed [error] (111)Connection refused: proxy: ajp: attempt to connect to 10.16.88.180:8009 (10.16.88.180) failed [error] (111)Connection refused: proxy: AJP: attempt to connect to 10.16.88.180:8009 (10.16.88.180) failed [error] (111)Connection refused: proxy: ajp: attempt to connect to 10.16.88.181:8009 (10.16.88.181) failed [error] (111)Connection refused: proxy: AJP: attempt to connect to 10.16.88.181:8009 (10.16.88.181) failed [error] (111)Connection refused: proxy: ajp: attempt to connect to 10.16.88.182:8009 (10.16.88.182) failed [error] (120006)APR does not understand this error code: proxy: AJP: cping/cpong failed to 10.16.88.179:8009 (10.16.88.179) [error] (120006)APR does not understand this error code: proxy: AJP: cping/cpong failed to 10.16.88.181:8009 (10.16.88.181) [error] (120006)APR does not understand this error code: proxy: dialog to 10.16.88.179:8009 (10.16.88.179) failed [error] (70014)End of file found: ajp_ilink_receive() can't receive header [error] ajp_cping_cpong: apr_socket_recv failed [error] ajp_handle_cping_cpong: ajp_ilink_receive failed [error] ajp_read_header: ajp_ilink_receive failed [error] ap_proxy_connect_backend disabling worker for (10.16.88.179) [error] ap_proxy_connect_backend disabling worker for (10.16.88.180) [error] ap_proxy_connect_backend disabling worker for (10.16.88.181) [error] ap_proxy_connect_backend disabling worker for (10.16.88.182) [error] proxy: ajp: disabled connection for (10.16.88.179) [error] proxy: AJP: disabled connection for (10.16.88.179) [error] proxy: AJP: failed to make connection to backend: 10.16.88.179 [error] proxy: AJP: failed to make connection to backend: 10.16.88.180 [error] proxy: AJP: failed to make connection to backend: 10.16.88.181 [error] proxy: dialog to 10.16.88.179:8009 (10.16.88.179) failed {noformat} The test was a failover scenario, so "failed to make connection" are related to that... [jfclere@jfcpc TMP]$ grep 500 access_log 10.16.88.178 - - [09/May/2011:10:00:54 -0400] "GET /st/SessionTestServlet? HTTP/1.1" 500 609 "-" "Jakarta Commons-HttpClient/3.1" 10.16.88.178 - - [09/May/2011:10:01:01 -0400] "GET /st/SessionTestServlet? HTTP/1.1" 500 609 "-" "Jakarta Commons-HttpClient/3.1" 10.16.88.178 - - [09/May/2011:10:05:23 -0400] "GET /st/SessionTestServlet? HTTP/1.1" 500 609 "-" "Jakarta Commons-HttpClient/3.1" [jfclere@jfcpc TMP]$ grep 503 access_log 10.16.88.178 - - [09/May/2011:10:02:40 -0400] "GET /st/SessionTestServlet? HTTP/1.1" 503 398 "-" "Jakarta Commons-HttpClient/3.1" the 500 correspond to something like: +++ [Mon May 09 10:05:23 2011] [debug] proxy_util.c(2011): proxy: AJP: has acquired connection for (10.16.88.179) [Mon May 09 10:05:23 2011] [debug] proxy_util.c(2067): proxy: connecting ajp://10.16.88.179:8009/st/SessionTestServlet? to 10.16.88.179:8009 [Mon May 09 10:05:23 2011] [debug] proxy_util.c(2193): proxy: connected /st/SessionTestServlet? to 10.16.88.179:8009 [Mon May 09 10:05:23 2011] [debug] ajp_utils.c(31): Into ajp_handle_cping_cpong [Mon May 09 10:05:23 2011] [debug] ajp_utils.c(102): ajp_handle_cping_cpong: Done [Mon May 09 10:05:23 2011] [debug] ajp_header.c(224): Into ajp_marshal_into_msgb [Mon May 09 10:05:23 2011] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[0] [User-Agent] = [Jakarta Commons-HttpClient/3.1] [Mon May 09 10:05:23 2011] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[1] [Host] = [dev89:8888] [Mon May 09 10:05:23 2011] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[2] [Cookie] = [$Version=0; JSESSIONID=jBhmXOxp9S191fthutFcNw__.node1; $Path=/st] [Mon May 09 10:05:23 2011] [debug] ajp_header.c(450): ajp_marshal_into_msgb: Done [Mon May 09 10:05:23 2011] [debug] mod_proxy_ajp.c(265): proxy: APR_BUCKET_IS_EOS [Mon May 09 10:05:23 2011] [debug] mod_proxy_ajp.c(270): proxy: data to read (max 8186 at 4) [Mon May 09 10:05:23 2011] [debug] mod_proxy_ajp.c(285): proxy: got 0 bytes of data [Mon May 09 10:05:23 2011] [debug] ajp_header.c(687): ajp_read_header: ajp_ilink_received 09 [Mon May 09 10:05:23 2011] [debug] ajp_header.c(697): ajp_parse_type: got 09 [Mon May 09 10:05:23 2011] [debug] mod_proxy_ajp.c(546): proxy: Processing of request failed backend: 1, output: 0 [Mon May 09 10:05:23 2011] [error] proxy: dialog to 10.16.88.179:8009 (10.16.88.179) failed +++ the backend seems to send a CPONG instead the headers. I've just verified that the same behaviour takes place on Solaris 10 Sparc64 as well: *Logs* - [31-eap-5x-mod_cluster-solaris10-sparc64-access_log.zip|http://scratch.englab.brq.redhat.com/mbabacek/31-eap-5x-mod_cluster-solaris10-sparc64-access_log.zip] - [31-eap-5x-mod_cluster-solaris10-sparc64-error_log.zip|http://scratch.englab.brq.redhat.com/mbabacek/31-eap-5x-mod_cluster-solaris10-sparc64-error_log.zip] *Client's errors* {noformat} 2011/05/17 11:54:40:794 EDT [WARN ][Runner - 20] Invalid response code: 500 Content: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> 2011/05/17 11:54:46:010 EDT [WARN ][Runner - 194] Invalid response code: 500 Content: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> 2011/05/17 12:02:24:789 EDT [WARN ][Runner - 225] Invalid response code: 503 Content: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> 2011/05/17 12:02:24:819 EDT [WARN ][Runner - 225] SFCORE_LOG - Error sampling data: <Invalid response code: 503 Content: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> 2011/05/17 12:02:24:820 EDT [WARN ][Runner - 96] Invalid response code: 503 Content: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> 2011/05/17 12:02:24:821 EDT [WARN ][Runner - 96] SFCORE_LOG - Error sampling data: <Invalid response code: 503 Content: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> {noformat} *Error_log digest* {noformat} [hudson@dev89 logs]$ cat error_log | grep -A4 -B3 "failed backend" [Mon May 09 10:00:54 2011] [debug] mod_proxy_ajp.c(285): proxy: got 0 bytes of data [Mon May 09 10:00:54 2011] [debug] ajp_header.c(687): ajp_read_header: ajp_ilink_received 09 [Mon May 09 10:00:54 2011] [debug] ajp_header.c(697): ajp_parse_type: got 09 [Mon May 09 10:00:54 2011] [debug] mod_proxy_ajp.c(546): proxy: Processing of request failed backend: 1, output: 0 [Mon May 09 10:00:54 2011] [error] proxy: dialog to 10.16.88.179:8009 (10.16.88.179) failed [Mon May 09 10:00:54 2011] [debug] proxy_util.c(2029): proxy: AJP: has released connection for (10.16.88.179) [Mon May 09 10:00:54 2011] [debug] mod_proxy_cluster.c(1245): proxy: byrequests balancer DONE (ajp://10.16.88.181:8009) [Mon May 09 10:00:54 2011] [debug] mod_proxy.c(1015): Running scheme balancer handler (attempt 0) -- [Mon May 09 10:01:01 2011] [debug] mod_proxy_ajp.c(285): proxy: got 0 bytes of data [Mon May 09 10:01:01 2011] [debug] ajp_header.c(687): ajp_read_header: ajp_ilink_received 09 [Mon May 09 10:01:01 2011] [debug] ajp_header.c(697): ajp_parse_type: got 09 [Mon May 09 10:01:01 2011] [debug] mod_proxy_ajp.c(546): proxy: Processing of request failed backend: 1, output: 0 [Mon May 09 10:01:01 2011] [error] proxy: dialog to 10.16.88.179:8009 (10.16.88.179) failed [Mon May 09 10:01:01 2011] [debug] proxy_util.c(2029): proxy: AJP: has released connection for (10.16.88.179) [Mon May 09 10:01:01 2011] [debug] mod_proxy_cluster.c(1739): cluster: Found value aVjGbiLTA6+bo4Kx9OmqDA__.node1 for stickysession JSESSIONID|jsessionid [Mon May 09 10:01:01 2011] [debug] mod_proxy_cluster.c(2108): cluster: Using route node1 -- [Mon May 09 10:02:40 2011] [error] (104)Connection reset by peer: ajp_ilink_receive() can't receive header [Mon May 09 10:02:40 2011] [debug] mod_proxy_ajp.c(531): (120006)APR does not understand this error code: ajp_read_header failed [Mon May 09 10:02:40 2011] [error] ajp_handle_cping_cpong: ajp_ilink_receive failed [Mon May 09 10:02:40 2011] [debug] mod_proxy_ajp.c(546): proxy: Processing of request failed backend: 1, output: 0 [Mon May 09 10:02:40 2011] [debug] ajp_utils.c(102): ajp_handle_cping_cpong: Done [Mon May 09 10:02:40 2011] [error] (120006)APR does not understand this error code: proxy: dialog to 10.16.88.179:8009 (10.16.88.179) failed [Mon May 09 10:02:40 2011] [error] (120006)APR does not understand this error code: proxy: AJP: cping/cpong failed to 10.16.88.179:8009 (10.16.88.179) [Mon May 09 10:02:40 2011] [debug] proxy_util.c(2067): proxy: connecting ajp://10.16.88.179:8009/st/SessionTestServlet? to 10.16.88.179:8009 -- [Mon May 09 10:05:23 2011] [debug] mod_proxy_ajp.c(285): proxy: got 0 bytes of data [Mon May 09 10:05:23 2011] [debug] ajp_header.c(687): ajp_read_header: ajp_ilink_received 09 [Mon May 09 10:05:23 2011] [debug] ajp_header.c(697): ajp_parse_type: got 09 [Mon May 09 10:05:23 2011] [debug] mod_proxy_ajp.c(546): proxy: Processing of request failed backend: 1, output: 0 [Mon May 09 10:05:23 2011] [error] proxy: dialog to 10.16.88.179:8009 (10.16.88.179) failed [Mon May 09 10:05:23 2011] [debug] proxy_util.c(2029): proxy: AJP: has released connection for (10.16.88.179) [Mon May 09 10:05:23 2011] [debug] ajp_utils.c(102): ajp_handle_cping_cpong: Done [Mon May 09 10:05:23 2011] [debug] ajp_header.c(224): Into ajp_marshal_into_msgb {noformat} Link: Added: This issue Cloned to JBPAPP-6578 Michal, it would have been better to have reopened JBPAPP-3614 than continue the discussion in a new JIRA. Or is this a new bug after fixing the mod_proxy in httpd2.2.17 as I see from the comments on JBPAPP-3614? @[~rrajesh] I guess that this is a new, or at least slightly different problem. @[~jfclere] What do you think? I have reproduced JBPAPP-6451, It looks like something different and a bug in the log on the ajp module will cause me more work to investigate it. @[~jfclere] Is there anything I can do for you to facilitate the debug process? I'm talking not only about [JBPAPP-6451|https://issues.jboss.org/browse/JBPAPP-6451], the [JBPAPP-6578|https://issues.jboss.org/browse/JBPAPP-6578] is an issue as well. As I see it, the 500 Http Error affects all platforms (didn't try Windoz) as long as the balancer is stressed a bit, or to be more specific, as long as there are enough sessions for a certain time. Regarding the [JBPAPP-6578|https://issues.jboss.org/browse/JBPAPP-6578], would it help you if I run the same scenario with a very small number of sessions, quite high number of sessions and a very high number of sessions and cross-reference the *mod_proxy_ajp* logs? the 503 correspond to normal errors: we can't replay a request interrupted while waiting for the answer. The 500 are JBPAPP-6578 The 503 are normal errors. The 500 are in JBPAPP-6578 Writer: Removed: mbabacek |