Bug 900559 (JBPAPP6-1407) - mod_cluster: HTTP 503 on node shutdown with pure IPv6 setup
Summary: mod_cluster: HTTP 503 on node shutdown with pure IPv6 setup
Keywords:
Status: CLOSED DUPLICATE of bug 2040112
Alias: JBPAPP6-1407
Product: JBoss Enterprise Application Platform 6
Classification: JBoss
Component: Apache Server (httpd) and Connectors, mod_cluster
Version: 6.0.0
Hardware: Unspecified
OS: Unspecified
urgent
urgent
Target Milestone: ---
: TBD EAP 6
Assignee: Paul Ferraro
QA Contact:
URL: http://jira.jboss.org/jira/browse/JBP...
Whiteboard: eap6 ipv6
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2012-05-30 22:51 UTC by Michal Karm Babacek
Modified: 2014-06-28 12:33 UTC (History)
3 users (show)

Fixed In Version:
Clone Of:
Environment:
RHEL6 x86_64, pure IPv6 (vmg35, vmg36)
Last Closed: 2012-11-20 14:19:11 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
access_log-IPv4-OK.zip (2.47 KB, application/zip)
2012-05-30 22:52 UTC, Michal Karm Babacek
no flags Details
error_log-IPv4-OK.zip (62.44 KB, application/zip)
2012-05-30 22:52 UTC, Michal Karm Babacek
no flags Details
error_log-IPv6-NOK.zip (3.34 KB, application/zip)
2012-05-30 22:52 UTC, Michal Karm Babacek
no flags Details
perf07-Ctrl+C-log-IPv4.zip (14.94 KB, application/zip)
2012-05-30 22:52 UTC, Michal Karm Babacek
no flags Details
perf08-Ctrl+C-log-IPv4.zip (15.31 KB, application/zip)
2012-05-30 22:52 UTC, Michal Karm Babacek
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 900052 0 high CLOSED mod_cluster: Failover on worker shutdown takes too much time 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 900484 1 None None None 2021-01-20 06:05:38 UTC
Red Hat Bugzilla 900606 0 high CLOSED CLONE - mod_cluster: HTTP 404 on node shutdown 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 901197 1 None None None 2021-01-20 06:05:38 UTC
Red Hat Issue Tracker JBPAPP6-1407 0 Critical Closed mod_cluster: HTTP 503 on node shutdown with pure IPv6 setup 2017-02-01 12:54:31 UTC
Red Hat Issue Tracker MODCLUSTER-314 0 Critical Closed mod_cluster: HTTP 404 on node shutdown with pure IPv6 setup 2017-02-01 12:54:31 UTC

Internal Links: 900052 900484 900606 901197

Description Michal Karm Babacek 2012-05-30 22:51:17 UTC
project_key: JBPAPP6

[~jkudrnac] reported she had experienced HTTP 503 errors with mod_cluster & pure IPv6 setup on node shutdown. I have done some investigation and I can reproduce the problem.

h3. Http client
I have a curl client issuing requests to [2620:52:0:105f::ffff:c]:8888/SessionTest/hostname periodically, delay being 1 s. Note that there is always a new session for each request (no JSESSIONID stuff anywhere). There are two nodes I switch off and on randomly, always giving enough time so as the starting one may take off safely. After several dozens of this start-ups and shut-downs a 503 error occurs.
{noformat}
Wed May 30 14:07:15 EDT 2012 [Balancer]:8888 0
Wed May 30 14:07:16 EDT 2012 [Balancer]:8888 0
Wed May 30 14:07:17 EDT 2012 HTTP Status 503 - This application is not currently available
Wed May 30 14:07:18 EDT 2012 [Balancer]:8888 0
Wed May 30 14:07:19 EDT 2012 [Balancer]:8888 0
{noformat}
please, note the time *14:07:17* as it is very well synchronized with the other node and it will show us what happened.

h3. Worker nodes
h4. Node vmg36 [2620:52:0:105f::ffff:3] (/)
running normally, there were requests served by it before and after the incident.
{noformat}
14:07:11,920 INFO  [org.jboss.test.cluster.servlet.SessionHostnameServlet] (ajp-/2620:52:0:105f:0:0:ffff:0%2:8009-3) Setting up new session - QpLxi52+8NmJNFMW2YDZm2MT.dc7bd552-a020-3d08-acee-4ae3e0f178a8
14:07:12,985 INFO  [org.jboss.test.cluster.servlet.SessionHostnameServlet] (ajp-/2620:52:0:105f:0:0:ffff:0%2:8009-1) Setting up new session - 16FUOxeUglbTql5+PzEJSLno.dc7bd552-a020-3d08-acee-4ae3e0f178a8
14:07:18,187 INFO  [org.jboss.as.clustering] (Incoming-20,null) JBAS010225: New cluster view for partition web (id: 8, delta: -1, merge: false) : [vmg36/web]
14:07:18,192 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-20,null) ISPN000094: Received new cluster view: [vmg36/web|8] [vmg36/web]
14:07:18,342 INFO  [org.jboss.test.cluster.servlet.SessionHostnameServlet] (ajp-/2620:52:0:105f:0:0:ffff:0%2:8009-2) Setting up new session - gdiFjJDwfFUSL2vlkqweUGzi.dc7bd552-a020-3d08-acee-4ae3e0f178a8
{noformat}

h4. Node vmg35 [2620:52:0:105f::ffff:c] (x)
Please, note the moment when *^C* was pressed at *^C14:07:17,110*:
{noformat}
14:07:15,113 INFO  [org.jboss.test.cluster.servlet.SessionHostnameServlet] (ajp-/2620:52:0:105f:0:0:ffff:c%2:8009-8) Setting up new session - XGBdEy3g22SQZI0nyOQT7kRF.f49689d6-cdbb-3015-a642-f8200ea456ff
14:07:16,184 INFO  [org.jboss.test.cluster.servlet.SessionHostnameServlet] (ajp-/2620:52:0:105f:0:0:ffff:c%2:8009-7) Setting up new session - 9l8RklnPOpvG5D2BKRHNjKV2.f49689d6-cdbb-3015-a642-f8200ea456ff
^C14:07:17,110 INFO  [org.apache.coyote.http11.Http11Protocol] (MSC service thread 1-3) Pausing Coyote HTTP/1.1 on http-/2620:52:0:105f:0:0:ffff:c%2:8080
14:07:17,122 INFO  [org.apache.coyote.http11.Http11Protocol] (MSC service thread 1-3) Stopping Coyote HTTP/1.1 on http-/2620:52:0:105f:0:0:ffff:c%2:8080
14:07:17,291 INFO  [org.jboss.as.logging] JBAS011503: Restored bootstrap log handlers
14:07:17,272 INFO  [org.jboss.modcluster.ModClusterService] All active sessions drained from context [/SessionTest] in 0 seconds
14:07:17,408 INFO  [org.apache.coyote.ajp.AjpProtocol] Pausing Coyote AJP/1.3 on ajp-/2620:52:0:105f:0:0:ffff:c%2:8009
14:07:17,421 INFO  [org.apache.coyote.ajp.AjpProtocol] Stopping Coyote AJP/1.3 on ajp-/2620:52:0:105f:0:0:ffff:c%2:8009
+++
14:07:17,810 INFO  [org.infinispan.eviction.PassivationManagerImpl] ISPN000029: Passivating all entries to disk
14:07:17,811 INFO  [org.infinispan.eviction.PassivationManagerImpl] ISPN000030: Passivated 0 entries in 0 milliseconds
14:07:17,848 INFO  [org.infinispan.eviction.PassivationManagerImpl] ISPN000029: Passivating all entries to disk
14:07:17,869 INFO  [com.arjuna.ats.jbossatx] ARJUNA032018: Destroying TransactionManagerService
14:07:17,873 INFO  [org.infinispan.eviction.PassivationManagerImpl] ISPN000030: Passivated 1 entries in 24 milliseconds
14:07:17,895 INFO  [com.arjuna.ats.jbossatx] ARJUNA032014: Stopping transaction recovery manager
14:07:17,911 INFO  [org.jboss.as.clustering.infinispan] JBAS010282: Stopped default-host/SessionTest cache from web container
14:07:17,925 INFO  [org.jboss.as.clustering.infinispan] JBAS010282: Stopped repl cache from web container
14:07:18,005 INFO  [org.jboss.as.server.deployment] JBAS015877: Stopped deployment SessionTest.war in 960ms
14:07:18,152 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] ISPN000082: Stopping the RpcDispatcher
14:07:18,506 INFO  [org.jboss.as] JBAS015950: JBoss EAP 6.0.0.GA (AS 7.1.2.Final-redhat-1) stopped in 1424ms
{noformat}

h3. HTTPD
In *Access log*, there is the 503 (in the middle) as the aforementioned http client has shown already:
{noformat}
2620:52:0:105f::ffff:3 - - [30/May/2012:14:07:13 -0400] "STATUS / HTTP/1.1" 200 85 "-" "ClusterListener/1.0"
2620:52:0:105f:5054:ff:fe52:e008 - - [30/May/2012:14:07:15 -0400] "GET /SessionTest/hostname HTTP/1.1" 200 31 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.12.9.0 zlib/1.2.3 libidn/1.18 libssh2/1.2.2"
2620:52:0:105f:5054:ff:fe52:e008 - - [30/May/2012:14:07:16 -0400] "GET /SessionTest/hostname HTTP/1.1" 200 31 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.12.9.0 zlib/1.2.3 libidn/1.18 libssh2/1.2.2"
2620:52:0:105f::ffff:c - - [30/May/2012:14:07:17 -0400] "DISABLE-APP / HTTP/1.1" 200 - "-" "ClusterListener/1.0"
2620:52:0:105f::ffff:c - - [30/May/2012:14:07:17 -0400] "STOP-APP / HTTP/1.1" 200 107 "-" "ClusterListener/1.0"
2620:52:0:105f:5054:ff:fe52:e008 - - [30/May/2012:14:07:17 -0400] "GET /SessionTest/hostname HTTP/1.1" 503 1114 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.12.9.0 zlib/1.2.3 libidn/1.18 libssh2/1.2.2"
2620:52:0:105f::ffff:c - - [30/May/2012:14:07:17 -0400] "REMOVE-APP / HTTP/1.1" 200 - "-" "ClusterListener/1.0"
2620:52:0:105f::ffff:c - - [30/May/2012:14:07:17 -0400] "REMOVE-APP /* HTTP/1.1" 200 - "-" "ClusterListener/1.0"
2620:52:0:105f:5054:ff:fe52:e008 - - [30/May/2012:14:07:18 -0400] "GET /SessionTest/hostname HTTP/1.1" 200 31 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.12.9.0 zlib/1.2.3 libidn/1.18 libssh2/1.2.2"
2620:52:0:105f:5054:ff:fe52:e008 - - [30/May/2012:14:07:19 -0400] "GET /SessionTest/hostname HTTP/1.1" 200 31 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.12.9.0 zlib/1.2.3 libidn/1.18 libssh2/1.2.2"
{noformat}

finally, we may observe the exact part of the *Error log* that differs from the healthy behaviour:
{noformat}
[Wed May 30 14:07:17 2012] [debug] ajp_header.c(224): Into ajp_marshal_into_msgb
[Wed May 30 14:07:17 2012] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[0] [User-Agent] = [curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.12.9.0 zlib/1.2.3 libidn/1.18 libssh2/1.2.2]
[Wed May 30 14:07:17 2012] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[1] [Host] = [[2620:52:0:105f::ffff:c]:8888]
[Wed May 30 14:07:17 2012] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[2] [Accept] = [*/*]
[Wed May 30 14:07:17 2012] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[3] [Via] = [1.1 proxy-01.mw.lab.eng.bos.redhat.com (squid/3.1.10)]
[Wed May 30 14:07:17 2012] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[4] [X-Forwarded-For] = [2620:52:0:105f:5054:ff:fe9e:282e]
[Wed May 30 14:07:17 2012] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[5] [Cache-Control] = [max-age=259200]
[Wed May 30 14:07:17 2012] [debug] ajp_header.c(290): ajp_marshal_into_msgb: Header[6] [Connection] = [keep-alive]
[Wed May 30 14:07:17 2012] [debug] ajp_header.c(450): ajp_marshal_into_msgb: Done
[Wed May 30 14:07:17 2012] [debug] mod_proxy_ajp.c(265): proxy: APR_BUCKET_IS_EOS
[Wed May 30 14:07:17 2012] [debug] mod_proxy_ajp.c(270): proxy: data to read (max 8186 at 4)
[Wed May 30 14:07:17 2012] [debug] mod_proxy_ajp.c(285): proxy: got 0 bytes of data
[Wed May 30 14:07:17 2012] [debug] mod_manager.c(1817): manager_trans DISABLE-APP (/)
[Wed May 30 14:07:17 2012] [debug] mod_manager.c(2473): manager_handler DISABLE-APP (/) processing: "JVMRoute=f49689d6-cdbb-3015-a642-f8200ea456ff&Alias=default-host%2Clocalhost%2Cexample.com&Context=%2FSessionTest"
[Wed May 30 14:07:17 2012] [debug] mod_manager.c(2522): manager_handler DISABLE-APP  OK
[Wed May 30 14:07:17 2012] [debug] mod_manager.c(1817): manager_trans STOP-APP (/)
[Wed May 30 14:07:17 2012] [debug] mod_manager.c(2473): manager_handler STOP-APP (/) processing: "JVMRoute=f49689d6-cdbb-3015-a642-f8200ea456ff&Alias=default-host%2Clocalhost%2Cexample.com&Context=%2FSessionTest"
[Wed May 30 14:07:17 2012] [debug] mod_manager.c(2522): manager_handler STOP-APP  OK
[Wed May 30 14:07:17 2012] [debug] ajp_header.c(687): ajp_read_header: ajp_ilink_received 04
[Wed May 30 14:07:17 2012] [debug] ajp_header.c(697): ajp_parse_type: got 04
[Wed May 30 14:07:17 2012] [debug] ajp_header.c(516): ajp_unmarshal_response: status = 503
[Wed May 30 14:07:17 2012] [debug] ajp_header.c(537): ajp_unmarshal_response: Number of headers is = 2
[Wed May 30 14:07:17 2012] [debug] ajp_header.c(599): ajp_unmarshal_response: Header[0] [Content-Type] = [text/html;charset=utf-8]
[Wed May 30 14:07:17 2012] [debug] ajp_header.c(609): ajp_unmarshal_response: ap_set_content_type done
[Wed May 30 14:07:17 2012] [debug] ajp_header.c(599): ajp_unmarshal_response: Header[1] [Content-Length] = [1114]
[Wed May 30 14:07:17 2012] [debug] ajp_header.c(687): ajp_read_header: ajp_ilink_received 03
[Wed May 30 14:07:17 2012] [debug] ajp_header.c(697): ajp_parse_type: got 03
[Wed May 30 14:07:17 2012] [debug] ajp_header.c(687): ajp_read_header: ajp_ilink_received 05
[Wed May 30 14:07:17 2012] [debug] ajp_header.c(697): ajp_parse_type: got 05
[Wed May 30 14:07:17 2012] [debug] mod_proxy_ajp.c(577): proxy: got response from [2620:52:0:105f::ffff:c]:8009 (2620:52:0:105f:0:0:ffff:c%2)
[Wed May 30 14:07:17 2012] [debug] proxy_util.c(2029): proxy: AJP: has released connection for (2620:52:0:105f:0:0:ffff:c%2)
[Wed May 30 14:07:17 2012] [debug] mod_manager.c(1817): manager_trans REMOVE-APP (/)
[Wed May 30 14:07:17 2012] [debug] mod_manager.c(2473): manager_handler REMOVE-APP (/) processing: "JVMRoute=f49689d6-cdbb-3015-a642-f8200ea456ff&Alias=default-host%2Clocalhost%2Cexample.com&Context=%2FSessionTest"
[Wed May 30 14:07:17 2012] [debug] mod_manager.c(2522): manager_handler REMOVE-APP  OK
[Wed May 30 14:07:17 2012] [debug] mod_manager.c(1817): manager_trans REMOVE-APP (/*)
[Wed May 30 14:07:17 2012] [debug] mod_manager.c(2473): manager_handler REMOVE-APP (/NODE_COMMAND) processing: "JVMRoute=f49689d6-cdbb-3015-a642-f8200ea456ff"
[Wed May 30 14:07:17 2012] [debug] mod_manager.c(1169): process_node_cmd 4 processing node: 2
[Wed May 30 14:07:17 2012] [debug] mod_manager.c(2522): manager_handler REMOVE-APP  OK
[Wed May 30 14:07:17 2012] [debug] mod_proxy_cluster.c(678): update_workers_node starting
{noformat}
It looks like the *manager_handler* did not make it in time so as to avoid the choice of wrong node.
(i) *Note:* Does the fact that nodes are talking via *proxy-01.mw.lab.eng.bos.redhat.com (squid/3.1.10)* anything to do with the problem on hand?

h3. Configuration
Configs are pretty simple, no silly tricks here:
h4. HTTPD
{noformat}
<IfModule manager_module>
Listen [2620:52:0:105f::ffff:c]:8888
<VirtualHost [2620:52:0:105f::ffff:c]:8888>
   ServerName vmg35
   ServerAlias *.vmg35
  <Directory />
    Order deny,allow
    Deny from all
    Allow from all
  </Directory>
  AdvertiseGroup [ff01::1]:23364 
  ServerAdvertise off
  KeepAliveTimeout 60
  MaxKeepAliveRequests 0
 
  ManagerBalancerName mbabaceksCluster
  AdvertiseFrequency 5
  EnableMCPMReceive

  <Location /mcm>
     SetHandler mod_cluster-manager
     Order deny,allow
     Deny from all
     Allow from all
  </Location>
</VirtualHost>
</IfModule>
{noformat}

h4. AS7
{noformat}
<subsystem xmlns="urn:jboss:domain:modcluster:1.1">
  <mod-cluster-config advertise="false" proxy-list="[2620:52:0:105f::ffff:c]:8888" sticky-session="true" sticky-session-force="false" sticky-session-remove="false" connector="ajp" >
    <dynamic-load-provider>
      <load-metric type="busyness"/>
    </dynamic-load-provider>
  </mod-cluster-config>
</subsystem>
{noformat}

h3. IPv4
I tried to reproduce this problem on IPv4 with the exactly same instances and configurations (only IP addresses being different)  and I was not able to get any 503 errors.
There are some error-free IPv4 logs attached for comparison:

 * access_log-IPv4-OK.zip
 * error_log-IPv4-OK.zip
 * error_log-IPv6-NOK.zip
 * perf07-Ctrl+C-log-IPv4.zip
 * perf08-Ctrl+C-log-IPv4.zip

Comment 1 Michal Karm Babacek 2012-05-30 22:52:13 UTC
Attachment: Added: access_log-IPv4-OK.zip
Attachment: Added: error_log-IPv4-OK.zip
Attachment: Added: error_log-IPv6-NOK.zip
Attachment: Added: perf07-Ctrl+C-log-IPv4.zip
Attachment: Added: perf08-Ctrl+C-log-IPv4.zip


Comment 2 Michal Karm Babacek 2012-05-30 22:58:46 UTC
* Is it possible it has anything to do with [JBPAPP-9087], provided *ServerAdvertise off* ?

 * How about the good old [JBPAPP-8466] ?

Comment 3 Michal Karm Babacek 2012-05-30 22:58:46 UTC
Link: Added: This issue is related to JBPAPP-8466


Comment 4 Michal Karm Babacek 2012-05-30 22:58:46 UTC
Link: Added: This issue is related to JBPAPP-9087


Comment 5 Michal Karm Babacek 2012-05-31 00:36:16 UTC
Link: Added: This issue relates to MODCLUSTER-314


Comment 6 Jean-Frederic Clere 2012-05-31 06:24:53 UTC
that is a kind of race condition the request is been processed when the application is stopped.
The 503 is coming from AS7.

Comment 7 Jean-Frederic Clere 2012-05-31 06:30:37 UTC
@Michal:
I don't think it is related to JBPAPP-9087 nor JBPAPP-8466 as the error comes for AS7.

Comment 8 Jean-Frederic Clere 2012-05-31 06:32:56 UTC
that is JBPAPP-8466

Comment 9 Michal Karm Babacek 2012-06-06 13:50:20 UTC
Link: Added: This issue relates to MODCLUSTER-316


Comment 10 Anne-Louise Tangring 2012-11-13 20:57:55 UTC
Docs QE Status: Removed: NEW 


Comment 11 Permaine Cheung 2012-11-20 14:55:11 UTC
Link: Added: This issue is related to JBEWS-496



Note You need to log in before you can comment on or make changes to this bug.