Bug 900052 (JBPAPP6-833) - mod_cluster: Failover on worker shutdown takes too much time
Summary: mod_cluster: Failover on worker shutdown takes too much time
Keywords:
Status: CLOSED NEXTRELEASE
Alias: JBPAPP6-833
Product: JBoss Enterprise Application Platform 6
Classification: JBoss
Component: unspecified
Version: 6.0.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: EAP 6.0.0
Assignee: Paul Ferraro
QA Contact:
URL: http://jira.jboss.org/jira/browse/JBP...
Whiteboard: eap6_need_triage
Depends On:
Blocks:
Reported: 2012-03-14 20:35 UTC by Michal Karm Babacek
Modified: 2014-06-28 12:35 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
mod_cluster 1.2.Final Windows, RHEL, Solaris
Last Closed: 2012-11-20 12:53:30 UTC
Type: Bug
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 900559 | 0 | urgent | CLOSED | mod_cluster: HTTP 503 on node shutdown with pure IPv6 setup | 2021-02-22 00:41:40 UTC
Red Hat Bugzilla | 900606 | 0 | high | CLOSED | CLONE - mod_cluster: HTTP 404 on node shutdown | 2021-02-22 00:41:40 UTC
Red Hat Issue Tracker | JBPAPP-8502 | 0 | Major | Closed | 503 errors on failover: Comparing:EAP5.1.2+m_c1.0.10.GA_CP02 with EAP6ER3+m_c1.2.Final | 2017-07-26 14:45:46 UTC
Red Hat Issue Tracker | JBPAPP6-833 | 0 | Major | Closed | mod_cluster: Failover on worker shutdown takes too much time | 2017-07-26 14:45:46 UTC
Red Hat Issue Tracker | MODCLUSTER-314 | 0 | Critical | Closed | mod_cluster: HTTP 404 on node shutdown with pure IPv6 setup | 2017-07-26 14:45:46 UTC

Internal Links: 900559 900606

Description Michal Karm Babacek 2012-03-14 20:35:03 UTC
Affects: Release Notes
project_key: JBPAPP6

This JIRA captures the fact that failover, even with shutdown (not kill), is quite slow.
What do you think about this:
{noformat}
10.16.89.39 - - [14/Mar/2012:16:12:24 -0400] "GET /SessionTest/SessionTestServlet HTTP/1.1" 200 2
10.16.89.39 - - [14/Mar/2012:16:12:24 -0400] "GET /SessionTest/SessionTestServlet HTTP/1.1" 200 2
10.16.88.188 - - [14/Mar/2012:16:12:27 -0400] "DISABLE-APP / HTTP/1.1" 200 -
10.16.88.188 - - [14/Mar/2012:16:12:27 -0400] "DISABLE-APP / HTTP/1.1" 200 -
10.16.88.188 - - [14/Mar/2012:16:12:27 -0400] "STOP-APP / HTTP/1.1" 200 74
10.16.88.188 - - [14/Mar/2012:16:12:27 -0400] "STOP-APP / HTTP/1.1" 200 81
10.16.88.188 - - [14/Mar/2012:16:12:27 -0400] "REMOVE-APP / HTTP/1.1" 200 -
10.16.88.188 - - [14/Mar/2012:16:12:27 -0400] "REMOVE-APP /* HTTP/1.1" 200 -
10.16.89.39 - - [14/Mar/2012:16:12:28 -0400] "GET /SessionTest/SessionTestServlet HTTP/1.1" 503 323
10.16.89.39 - - [14/Mar/2012:16:12:29 -0400] "GET /SessionTest/SessionTestServlet HTTP/1.1" 503 323
10.16.89.39 - - [14/Mar/2012:16:12:30 -0400] "GET /SessionTest/SessionTestServlet HTTP/1.1" 503 323
10.16.89.39 - - [14/Mar/2012:16:12:31 -0400] "GET /SessionTest/SessionTestServlet HTTP/1.1" 503 323
10.16.89.39 - - [14/Mar/2012:16:12:33 -0400] "GET /SessionTest/SessionTestServlet HTTP/1.1" 503 323
10.16.89.39 - - [14/Mar/2012:16:12:35 -0400] "GET /SessionTest/SessionTestServlet HTTP/1.1" 503 323
10.16.89.39 - - [14/Mar/2012:16:12:36 -0400] "GET /SessionTest/SessionTestServlet HTTP/1.1" 503 323
10.16.89.39 - - [14/Mar/2012:16:12:39 -0400] "GET /SessionTest/SessionTestServlet HTTP/1.1" 200 2
10.16.89.39 - - [14/Mar/2012:16:12:40 -0400] "GET /SessionTest/SessionTestServlet HTTP/1.1" 200 2
10.16.89.39 - - [14/Mar/2012:16:12:41 -0400] "GET /SessionTest/SessionTestServlet HTTP/1.1" 200 2
10.16.89.39 - - [14/Mar/2012:16:12:41 -0400] "GET /SessionTest/SessionTestServlet HTTP/1.1" 200 2
{noformat} 
There were 7 HTTP 503 errors in a 15-second time span, despite the fact that the balancer had received the *REMOVE-APP /\** message... [Error_log on pastebin|http://pastebin.com/aF7P2iSn].
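
The 503 window in the excerpt above can be measured with a short script. This is only a sketch for reading the log, not part of the original report: the helper names `parse_access_log` and `outage_summary` are hypothetical, and it assumes the entries are in chronological order and that month names parse under the default (English/C) locale.

```python
import re
from datetime import datetime

# Lines copied from the access-log excerpt above.
LOG = """\
10.16.89.39 - - [14/Mar/2012:16:12:24 -0400] "GET /SessionTest/SessionTestServlet HTTP/1.1" 200 2
10.16.89.39 - - [14/Mar/2012:16:12:24 -0400] "GET /SessionTest/SessionTestServlet HTTP/1.1" 200 2
10.16.88.188 - - [14/Mar/2012:16:12:27 -0400] "DISABLE-APP / HTTP/1.1" 200 -
10.16.88.188 - - [14/Mar/2012:16:12:27 -0400] "DISABLE-APP / HTTP/1.1" 200 -
10.16.88.188 - - [14/Mar/2012:16:12:27 -0400] "STOP-APP / HTTP/1.1" 200 74
10.16.88.188 - - [14/Mar/2012:16:12:27 -0400] "STOP-APP / HTTP/1.1" 200 81
10.16.88.188 - - [14/Mar/2012:16:12:27 -0400] "REMOVE-APP / HTTP/1.1" 200 -
10.16.88.188 - - [14/Mar/2012:16:12:27 -0400] "REMOVE-APP /* HTTP/1.1" 200 -
10.16.89.39 - - [14/Mar/2012:16:12:28 -0400] "GET /SessionTest/SessionTestServlet HTTP/1.1" 503 323
10.16.89.39 - - [14/Mar/2012:16:12:29 -0400] "GET /SessionTest/SessionTestServlet HTTP/1.1" 503 323
10.16.89.39 - - [14/Mar/2012:16:12:30 -0400] "GET /SessionTest/SessionTestServlet HTTP/1.1" 503 323
10.16.89.39 - - [14/Mar/2012:16:12:31 -0400] "GET /SessionTest/SessionTestServlet HTTP/1.1" 503 323
10.16.89.39 - - [14/Mar/2012:16:12:33 -0400] "GET /SessionTest/SessionTestServlet HTTP/1.1" 503 323
10.16.89.39 - - [14/Mar/2012:16:12:35 -0400] "GET /SessionTest/SessionTestServlet HTTP/1.1" 503 323
10.16.89.39 - - [14/Mar/2012:16:12:36 -0400] "GET /SessionTest/SessionTestServlet HTTP/1.1" 503 323
10.16.89.39 - - [14/Mar/2012:16:12:39 -0400] "GET /SessionTest/SessionTestServlet HTTP/1.1" 200 2
10.16.89.39 - - [14/Mar/2012:16:12:40 -0400] "GET /SessionTest/SessionTestServlet HTTP/1.1" 200 2
10.16.89.39 - - [14/Mar/2012:16:12:41 -0400] "GET /SessionTest/SessionTestServlet HTTP/1.1" 200 2
10.16.89.39 - - [14/Mar/2012:16:12:41 -0400] "GET /SessionTest/SessionTestServlet HTTP/1.1" 200 2
"""

# Matches: [timestamp] "METHOD path protocol" status
LINE_RE = re.compile(
    r'\[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)


def parse_access_log(text):
    """Return (timestamp, method, path, status) tuples for every parsable line."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            ts = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
            entries.append(
                (ts, match.group("method"), match.group("path"), int(match.group("status")))
            )
    return entries


def outage_summary(entries):
    """Summarise the 503 window relative to the last REMOVE-APP message."""
    errors = [ts for ts, method, _, status in entries if method == "GET" and status == 503]
    removes = [ts for ts, method, _, _ in entries if method == "REMOVE-APP"]
    summary = {
        "count": len(errors),
        "error_span_seconds": None,
        "remove_to_recovery_seconds": None,
    }
    if not errors:
        return summary
    # Time between the first and the last 503 response.
    summary["error_span_seconds"] = (errors[-1] - errors[0]).total_seconds()
    # First successful GET after the last 503, measured from the last REMOVE-APP.
    recoveries = [
        ts for ts, method, _, status in entries
        if method == "GET" and status == 200 and ts > errors[-1]
    ]
    if removes and recoveries:
        summary["remove_to_recovery_seconds"] = (recoveries[0] - removes[-1]).total_seconds()
    return summary


print(outage_summary(parse_access_log(LOG)))
```

On this excerpt the script counts seven 503 responses, 8 seconds between the first and the last of them (16:12:28 to 16:12:36), and 12 seconds between the last REMOVE-APP (16:12:27) and the first successful request afterwards (16:12:39).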

Is it OK that there was no DISABLE-APP or STOP-APP for context */\**?
Mod_cluster 1.1.3 with EAP5 did not exhibit this behaviour :-(

(i) Note: We are talking about just manual testing with a Windows balancer and 2 RHEL workers here, using only Ctrl+F5 in Firefox and Ctrl+C in the terminal. No hundreds of thousands of requests and no killing of the JVM with -9.

Comment 1 Rajesh Rajasekaran 2012-03-19 19:07:58 UTC
Labels: Added: eap6_need_triage


Comment 2 Michal Karm Babacek 2012-03-21 17:09:36 UTC
Link: Added: This issue relates to JBPAPP-8502


Comment 3 Paul Ferraro 2012-03-26 20:57:13 UTC
This sounds like a race condition between the Connector stopping and the deployment stopping.
Connectors don't actually have any dependent services; therefore, nothing prevents a Service<Connector> from stopping before an application undeploys during server shutdown.
A simple fix would be to add a dependency on the requisite Service<Connector> to Service<ModCluster>.  This would trigger mod_cluster's shutdown hook before the web connectors are shut down.  This would require a change to the mod_cluster subsystem schema, to identify the dependent connector, and ideally a change to mod_cluster upstream, to allow AS7 to indicate which connector mod_cluster should use, in lieu of the current logic, which tries to figure out which connector is most suitable.
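
The ordering argument above can be illustrated with a toy model. This is a sketch only and does not use the JBoss MSC API: the `Service` class and `stop_order` function are hypothetical, services are assumed to stop in the reverse of their start order, and there is no cycle detection. In a real container, services without a dependency between them may stop in any order, which is the race described above; the toy simply produces one such order.

```python
class Service:
    """A named service with the names of the services it depends on."""

    def __init__(self, name, depends_on=()):
        self.name = name
        self.depends_on = list(depends_on)


def stop_order(services):
    """Return service names so that each service stops before its dependencies.

    The start order is computed depth-first (dependencies first) and then
    reversed, so a dependent always stops before the service it depends on.
    """
    by_name = {service.name: service for service in services}
    start_order = []
    visited = set()

    def visit(name):
        if name in visited:
            return
        visited.add(name)
        for dependency in by_name[name].depends_on:
            visit(dependency)
        start_order.append(name)

    for service in services:
        visit(service.name)
    return list(reversed(start_order))


# No dependency declared: nothing forces mod_cluster to stop first.
# Here the connector happens to stop first, so requests can fail
# before the proxy is told that the application is gone.
print(stop_order([Service("mod_cluster"), Service("connector")]))

# With the proposed dependency, mod_cluster always stops before the connector,
# so its shutdown hook can notify the proxy while the connector is still up.
print(stop_order([Service("mod_cluster", ["connector"]), Service("connector")]))
```

The first call yields the connector before mod_cluster; the second yields mod_cluster before the connector, which is the ordering the proposed dependency is meant to guarantee.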

Comment 4 Paul Ferraro 2012-03-26 20:59:35 UTC
The reason this isn't an issue in EAP5/AS5/AS6 is that mod_cluster listens for JMX notifications emitted by the server before it shuts down.

Comment 5 Paul Ferraro 2012-04-10 16:46:51 UTC
Link: Added: This issue depends on AS7-4448


Comment 6 Rajesh Rajasekaran 2012-04-30 22:55:31 UTC
Michal, looks like the upstream issue has been fixed (ER6 build). Do you still see this issue?

Comment 7 Michal Karm Babacek 2012-05-30 22:58:46 UTC
Link: Added: This issue relates to JBPAPP-9195


Comment 8 Michal Karm Babacek 2012-05-31 00:36:15 UTC
Link: Added: This issue relates to MODCLUSTER-314


Comment 10 Michal Karm Babacek 2012-06-06 13:50:20 UTC
Link: Added: This issue relates to MODCLUSTER-316


Comment 11 Jean-Frederic Clere 2012-06-07 15:55:04 UTC
Fixed by MODCLUSTER-302 and AS7-4448.

Comment 12 Rajesh Rajasekaran 2012-06-07 16:57:40 UTC
Setting the fix version to CR1 even though this issue was resolved today, as the linked fixes might have been available a few builds back.
Michal, you also indicated an affects version of CR1 on JBPAPP-9195, which is basically the same issue.
Can you check with dev on how this issue was resolved and cross-check your test setup?

Comment 13 Dana Mison 2012-06-12 02:04:25 UTC
Release Notes Docs Status: Added: Documented as Resolved Issue
Release Notes Text: Added: During server shutdown, the connector used for mod_cluster communication was shut down before the mod_cluster service itself was stopped.  This resulted in many failed mod_cluster requests in the intervening time period.  A service dependency has now been added on the web connector service and the mod_cluster subsystem must declare which connector it is using. This means that the mod_cluster service will automatically be stopped when the connector is shut down.



Comment 14 Dana Mison 2012-06-12 02:04:52 UTC
Docs QE Status: Removed: NEW Added: ASSIGNED


Comment 15 Dana Mison 2012-06-12 02:08:28 UTC
Writer: Added: Darrin
Release Notes Text: Removed: During server shutdown, the connector used for mod_cluster communication was shut down before the mod_cluster service itself was stopped.  This resulted in many failed mod_cluster requests in the intervening time period.  A service dependency has now been added on the web connector service and the mod_cluster subsystem must declare which connector it is using. This means that the mod_cluster service will automatically be stopped when the connector is shut down.
 Added: During server shutdown, the connector used for mod_cluster communication was shut down before the mod_cluster service was stopped.  This resulted in many failed mod_cluster requests in the intervening time period.  A service dependency has now been added on the web connector service and the mod_cluster subsystem must declare which connector it is using. This means that the mod_cluster service will automatically be stopped when the connector is shut down.



Comment 16 Dana Mison 2012-06-12 02:09:23 UTC
Release Notes Text: Removed: During server shutdown, the connector used for mod_cluster communication was shut down before the mod_cluster service was stopped.  This resulted in many failed mod_cluster requests in the intervening time period.  A service dependency has now been added on the web connector service and the mod_cluster subsystem must declare which connector it is using. This means that the mod_cluster service will automatically be stopped when the connector is shut down.
 Added: During server shutdown, the connector used for mod_cluster communication was shut down before the mod_cluster service was stopped.  This resulted in many failed mod_cluster requests in the intervening time period.  A service dependency has now been added on the web connector service and the mod_cluster subsystem must now declare which connector it is using. This means that the mod_cluster service will automatically be stopped when the connector is shut down.



Comment 17 Michal Karm Babacek 2012-06-14 13:42:09 UTC
@Rajesh: I am keeping an eye on this issue, I am gonna verify as soon as possible (in the scope of the related JIRAs as well).

Comment 18 Radim Hatlapatka 2012-08-22 12:04:56 UTC
RemoteIssueLink: Added: This issue links to "Failover on worker (tomcat) causes non 200 HTTP codes for few seconds (Web Link)"


Comment 19 Michal Karm Babacek 2012-08-22 15:23:03 UTC
RemoteIssueLink: Added: This issue links to "Bug 850769 - Failover on worker (tomcat) causes non 200 HTTP codes for few seconds (Web Link)"


Comment 20 Michal Karm Babacek 2012-08-22 15:23:42 UTC
RemoteIssueLink: Removed: This issue links to "Failover on worker (tomcat) causes non 200 HTTP codes for few seconds (Web Link)" 


Comment 21 Dana Mison 2012-10-19 04:25:34 UTC
Affects: Added: Release Notes


Comment 22 Anne-Louise Tangring 2012-11-05 17:55:50 UTC
Release Notes Docs Status: Removed: Documented as Resolved Issue 
Writer: Removed: Darrin 
Release Notes Text: Removed: During server shutdown, the connector used for mod_cluster communication was shut down before the mod_cluster service was stopped.  This resulted in many failed mod_cluster requests in the intervening time period.  A service dependency has now been added on the web connector service and the mod_cluster subsystem must now declare which connector it is using. This means that the mod_cluster service will automatically be stopped when the connector is shut down.
 
Docs QE Status: Removed: ASSIGNED 


Comment 23 Michal Karm Babacek 2012-11-20 12:53:30 UTC
Closing. Can't reproduce with the current code base.

