Bug 987259 - hot-standby for mod_cluster
Summary: hot-standby for mod_cluster
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: JBoss Enterprise Application Platform 6
Classification: JBoss
Component: mod_cluster
Version: 6.1.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ER9
Target Release: EAP 6.3.0
Assignee: Vaclav Tunka
QA Contact: Michal Karm Babacek
URL:
Whiteboard:
Depends On: 1101681 1107551
Blocks:
Reported: 2013-07-23 05:58 UTC by Hisanobu Okuda
Modified: 2018-12-09 17:06 UTC (History)

Fixed In Version:
Doc Type: Enhancement
Doc Text:
This release of JBoss EAP 6 introduces a 'hot-standby' feature to mod_cluster.
Clone Of:
Environment:
Last Closed: 2014-08-06 14:36:26 UTC
Type: Feature Request
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker EAP6-172 0 Major Closed hot-standby for mod_cluster 2020-02-14 15:49:17 UTC
Red Hat Issue Tracker MODCLUSTER-235 0 Major Closed Support hot-standby worker nodes 2020-02-14 15:49:17 UTC
Red Hat Issue Tracker MODCLUSTER-391 0 Major Resolved mod_cluster and mod_proxy integration 2020-02-14 15:49:18 UTC
Red Hat Issue Tracker PRODMGT-494 0 Major Resolved Need fail_on_status and hot-standby of mod_jk for mod_cluster 2020-02-14 15:49:18 UTC

Description Hisanobu Okuda 2013-07-23 05:58:35 UTC
Description of problem:
Need hot-standby feature for mod_cluster

Version-Release number of selected component (if applicable):
EAP 6.x

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 JBoss JIRA Server 2013-07-26 19:04:38 UTC
John Doyle <jdoyle> made a comment on jira PRODMGT-494

We can look at this RFE, but we don't have a release for EAP that could deliver the capability by the October 2013 date specified by the user.

Comment 2 JBoss JIRA Server 2013-07-26 19:08:18 UTC
John Doyle <jdoyle> made a comment on jira PRODMGT-494

I see you have this in mod_cluster already and have been moving the target forward.  Do you have a release in mind?

Comment 3 JBoss JIRA Server 2013-07-29 07:47:28 UTC
Hisanobu Okuda <hokuda> made a comment on jira PRODMGT-494

If you meant https://issues.jboss.org/browse/MODCLUSTER-235 , it is not implemented yet.

Comment 7 Jean-frederic Clere 2014-02-07 15:44:37 UTC
I think the feature is already there; just configure the node as:
<simple-load-provider factor="0"/>

Comment 8 John Doyle 2014-02-07 16:13:22 UTC
Do you know when that came into code so we can mark a fixed version?  Or has it always been there?

Comment 9 Jean-frederic Clere 2014-02-07 16:16:20 UTC
In fact it is crippled by the description of the modcluster subsystem.
"JBAS014708: 0 is an invalid value for parameter factor. A minimum value of 1 is required"
That needs to be changed and the feature should be tested.

Comment 11 JBoss JIRA Server 2014-02-11 08:28:35 UTC
Jean-Frederic Clere <jfclere> updated the status of jira MODCLUSTER-235 to Resolved

Comment 14 Hisanobu Okuda 2014-02-12 00:15:45 UTC
https://issues.jboss.org/browse/EAP6-172 was filed for this request.

Comment 16 Jean-frederic Clere 2014-02-12 08:55:00 UTC
mod_cluster won't send any request to a hot-standby node unless:
1 - The node is changed to a normal node (factor > 0)
2 - All the other nodes are in error or have been removed
Note that as soon as another node starts, requests will be directed to the new node.
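
The routing rule above can be sketched as follows. This is only an illustrative model, not mod_cluster's actual C code: the `Node` class, the `pick_node` function and the "highest factor wins" tie-break are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    route: str     # jvmRoute of the worker (hypothetical field name)
    factor: int    # configured load factor; 0 marks a hot-standby node
    healthy: bool  # False when the node is in error or has been removed


def pick_node(nodes: List[Node]) -> Optional[Node]:
    """Return the node that should receive the next request, or None."""
    usable = [n for n in nodes if n.healthy]
    active = [n for n in usable if n.factor > 0]
    if active:
        # Normal nodes always win. Preferring the highest factor is a
        # simplification; the real balancer uses more inputs.
        return max(active, key=lambda n: n.factor)
    # Standby nodes are considered only when no normal node is left.
    standby = [n for n in usable if n.factor == 0]
    return standby[0] if standby else None
```

With two normal nodes and one standby node, the standby node is returned only after both normal nodes are marked unhealthy, and it is dropped again as soon as a new normal node is added to the list.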

Comment 18 Kabir Khan 2014-02-16 09:59:56 UTC
It is not clear to me if https://github.com/jbossas/jboss-eap/pull/916 completely solves the issue. If that is not the case, please move this issue back to the ASSIGNED state.

Comment 20 Jean-frederic Clere 2014-02-17 07:00:24 UTC
The issue also requires a fix in the C part (mod_cluster-1.2.8.Final).

Comment 23 Rostislav Svoboda 2014-02-20 18:10:30 UTC
https://issues.jboss.org/browse/EAP6-172 not yet acked, removing ack

Comment 26 Michal Karm Babacek 2014-03-10 13:32:50 UTC
QA_ACK, thanks Hisanobu for the cooperation.

Comment 28 JBoss JIRA Server 2014-04-18 17:02:50 UTC
Michal Babacek <mbabacek> updated the status of jira MODCLUSTER-235 to Closed

Comment 29 Michal Karm Babacek 2014-06-18 14:56:56 UTC
Bad news: It's broken.

Good news: Here is a patch that fixes it: https://github.com/modcluster/mod_cluster/pull/95

What's wrong:

The hot-standby node appears as a "Load: -1" node. That's wrong; it must be Load: 0 so that requests can be forwarded to it when no other nodes are available.

i.e.:

 * load > 0  : a load factor.
 * load = 0  : standby worker.
 * load = -1 : errored worker.
 * load = -2 : just do a cping/cpong.
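
For reference, the four cases listed above could be expressed as a small lookup. This is a sketch based only on the list in this comment; the function name `describe_load` and the returned labels are made up for illustration and do not come from the mod_cluster sources.

```python
def describe_load(load: int) -> str:
    """Map a reported Load value to the meaning given in the list above."""
    if load > 0:
        return "load factor"
    if load == 0:
        return "standby worker"
    if load == -1:
        return "errored worker"
    if load == -2:
        return "cping/cpong only"
    # Values below -2 are not described in this report.
    raise ValueError(f"unexpected load value: {load}")
```

A hot-standby node configured with factor="0" would therefore be expected to report "standby worker", not "errored worker".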

Comment 30 Jean-frederic Clere 2014-06-18 15:33:07 UTC
The requests should be forwarded even if the load is -1.

Comment 31 Michal Karm Babacek 2014-06-18 15:37:33 UTC
I get 

Balancer: qacluster,LBGroup: ,Flushpackets: Off,Flushwait: 10000,Ping: 10000000,Smax: 1,Ttl: 60000000,Status: OK,Elected: 0,Read: 0,Transferred: 0,Connected: 0,Load: -1 

<title>503 Service Temporarily Unavailable</title>

with Load: -1
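
To check the reported load automatically, the status line above can be split into key/value pairs. This is a minimal sketch that assumes the comma-separated "Key: value" layout shown in this comment and that no value contains a comma; `parse_status` is a hypothetical helper, not part of mod_cluster.

```python
def parse_status(line: str) -> dict:
    """Split a mod_cluster-manager status line into a dict of strings."""
    fields = {}
    for part in line.strip().split(","):
        # partition keeps everything after the first colon as the value
        key, _, value = part.partition(":")
        fields[key.strip()] = value.strip()
    return fields


line = ("Balancer: qacluster,LBGroup: ,Flushpackets: Off,Flushwait: 10000,"
        "Ping: 10000000,Smax: 1,Ttl: 60000000,Status: OK,Elected: 0,Read: 0,"
        "Transferred: 0,Connected: 0,Load: -1 ")
status = parse_status(line)
load = int(status["Load"])  # the value a hot-standby check would inspect
```

A test could then assert that `load` is 0 for a node configured with factor="0".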

Comment 32 Radoslav Husar 2014-06-18 15:40:19 UTC
Hi Michal, could you elaborate on how you tested? (I couldn't reproduce it with master, which should have very similar code in this area.)

Comment 33 Jean-frederic Clere 2014-06-18 15:51:03 UTC
See https://bugzilla.redhat.com/show_bug.cgi?id=1074550#c2

Comment 34 Michal Karm Babacek 2014-06-18 16:12:26 UTC
Sure, let's have a look:

- grab the closest RHEL7 x86_64 box (haven't tried to reproduce elsewhere yet)
- download these (sha1sum):

  29b1578ade041492cd18db9f225f4de1bf025a7f  jboss-eap-6.3.0.ER7.zip
  2ba91fed7bf2ea830e1469f66baeabb0dd05701e  httpd/jboss-ews-httpd-2.1.0-RHEL7-x86_64.zip
  9807dcabcb91f1f102e278d33f503c399f985264  jboss-eap-native-webserver-connectors-6.3.0.ER7-RHEL7-x86_64.zip

- start httpd, my config: 

MemManagerFile "/dev/shm/mod_cluster-eapx/jboss-ews-2.1/httpd/cache/mod_cluster"
ServerName 192.168.122.78:2181
<IfModule manager_module>
  Listen 192.168.122.78:8847
  LogLevel debug
  <VirtualHost 192.168.122.78:8847>
    ServerName 192.168.122.78:8847
    <Directory />
      Order deny,allow
      Deny from all
      Allow from all
    </Directory>
    KeepAliveTimeout 60
    MaxKeepAliveRequests 0
    ServerAdvertise on
    AdvertiseFrequency 5
    ManagerBalancerName qacluster
    AdvertiseGroup 224.0.5.12:65409
    EnableMCPMReceive
    <Location /mcm>
      SetHandler mod_cluster-manager
      Order deny,allow
      Deny from all
      Allow from all
    </Location>
  </VirtualHost>
</IfModule>

- configure two ordinary standalone-ha.xml EAP instances and set their jvmRoutes:
  <system-properties>
    <property name="jboss.mod_cluster.jvmRoute" value="jboss-eap-6.3"/>
    <property name="jboss.node.name" value="jboss-eap-6.3"/>
  </system-properties>
  <socket-binding name="modcluster" port="0" multicast-address="224.0.5.12" multicast-port="65409"/>

- configure one hot-standby instance (jvmRoute set via property as above...)
  <subsystem xmlns="urn:jboss:domain:modcluster:1.2">
      <mod-cluster-config advertise-socket="modcluster" connector="ajp">
          <simple-load-provider factor="0"/>
       </mod-cluster-config>
  </subsystem>

- start it

- you have, e.g.:

Node jboss-eap-6.3 (ajp://192.168.122.78:8009): 
+++ SNAP +++ Status: OK,Elected: 0,Read: 0,Transferred: 0,Connected: 0,Load: 100 

Node jboss-eap-6.3-2 (ajp://192.168.122.78:8110): 
+++ SNAP +++ Status: OK,Elected: 0,Read: 0,Transferred: 0,Connected: 0,Load: 100 

Node jboss-eap-6.3-3 (ajp://192.168.122.78:8215): 
+++ SNAP +++ Status: OK,Elected: 0,Read: 0,Transferred: 0,Connected: 0,Load: -1

- curling:

  repeatedly accessing: curl 'http://rhel7x86-64:8847/clusterbench/requestinfo;jsessionid=awyGRY5SLkVYqhQuhSoHqwGp.jboss-eap-6.3-2';


JVM route: jboss-eap-6.3-2
Session ID: awyGRY5SLkVYqhQuhSoHqwGp.jboss-eap-6.3-2
Session isNew: false

JVM route: jboss-eap-6.3-2
Session ID: awyGRY5SLkVYqhQuhSoHqwGp.jboss-eap-6.3-2
Session isNew: false

JVM route: jboss-eap-6.3-2
Session ID: awyGRY5SLkVYqhQuhSoHqwGp.jboss-eap-6.3-2
Session isNew: false

-- Node jboss-eap-6.3-2 stopped --

JVM route: jboss-eap-6.3
Session ID: awyGRY5SLkVYqhQuhSoHqwGp.jboss-eap-6.3
Session isNew: false

JVM route: jboss-eap-6.3
Session ID: awyGRY5SLkVYqhQuhSoHqwGp.jboss-eap-6.3
Session isNew: false

-- Node jboss-eap-6.3 stopped --


<title>503 Service Temporarily Unavailable</title>
+++ SNAP +++
<address>Apache/2.2.26 (Red Hat Enterprise Web Server) Server at rhel7x86-64 Port 8847</address>

<title>503 Service Temporarily Unavailable</title>
+++ SNAP +++
<address>Apache/2.2.26 (Red Hat Enterprise Web Server) Server at rhel7x86-64 Port 8847</address>





The problem in the code is the lbfactor, as you might observe. Take a look at these macros and maybe run the preprocessor to see what C code you are actually going to compile...

Comment 35 Michal Karm Babacek 2014-06-18 16:18:54 UTC
@All: Load: -1 for a hot-standby node is wrong, it shows that lbfactor is -1. -1 is an error code for lbfactor -- i.e. node won't be used at all.

Comment 36 Jean-frederic Clere 2014-06-19 17:34:29 UTC
+++
 Node 4e6189af-0502-3305-8ff3-fad7fee8b516 (ajp://127.0.0.1:8009):
Enable Contexts Disable Contexts Stop Contexts
Balancer: mycluster,LBGroup: ,Flushpackets: Off,Flushwait: 10000,Ping: 10000000,Smax: 1,Ttl: 60000000,Status: OK,Elected: 0,Read: 0,Transferred: 0,Connected: 0,Load: -1
Virtual Host 1:
Contexts:

/compileFailure, Status: ENABLED Request: 0 Disable Stop
+++

+++
[jfclere@jfcpc APACHE-2.2.21]$ curl -v http://localhost:8080/compileFailure/
.....
HTTP/1.1 200 OK
....
+++

It works for me; I don't understand....

Comment 37 Jean-frederic Clere 2014-06-19 17:59:40 UTC
With 2.2.21 it works.
With 2.2.26 it fails :-(
Something is wrong :-(

Comment 39 Jean-frederic Clere 2014-06-20 07:27:04 UTC
I have merged https://github.com/modcluster/mod_cluster/pull/95

I can't explain why it worked for me in 2.2.21... I was probably compiling with another httpd than 2.2.21 :-(

Comment 40 Kabir Khan 2014-06-23 09:03:13 UTC
Speaking to Jean-Frederic, this will be a native upgrade

Comment 42 Michal Karm Babacek 2014-07-02 10:59:45 UTC
This issue will be verified in ER9 as a part of mod_cluster 1.2.9.Final + comment 29 patch. There won't be any component upgrade to 1.2.10.Final.

Regarding documentation and release notes:

Please,
See BZ 1074550
See https://bugzilla.redhat.com/show_bug.cgi?id=1115083#c0, Paragraph 4.

This new feature should be mentioned in release notes as per BZ 1115083.

Comment 43 Michal Karm Babacek 2014-07-11 18:42:51 UTC
Uff...it's been a long journey :-)

Comment 44 Michal Karm Babacek 2014-07-11 18:43:41 UTC
EAP 6.3.0.ER9

Comment 45 Michal Karm Babacek 2014-07-11 19:29:04 UTC
I was too quick in my judgement in comment 43.

While RHEL, HP-UX and Solaris builds work, the Windows binaries still present the "Load: -1" error.
It looks like the patch wasn't applied on mod_cluster 1.2.9.Final on Windows :-(

Comment 46 Jean-frederic Clere 2014-07-15 08:01:54 UTC
According to my investigations, the patch is in fact applied, but something went wrong in the production build.
Could you please test with https://brewweb.devel.redhat.com/buildinfo?buildID=368750

Comment 47 Vaclav Tunka 2014-07-15 10:45:04 UTC
This is an RCM issue, I created RT for them:
#306213 Investigate: extreme repo-regen delays, bad NVRs picked by Brew/Win builds

It seems Brew incorrectly picked the old mod_cluster-native-1.2.9-4.Final.win6 version, even though mod_cluster-native-1.2.9-5.Final.win6 had been built more than an hour before the compose was run.

2014-07-08 09:45:44              mod_cluster-native-1.2.9-5.Final.win6 
Tue, 08 Jul 2014 10:58:10 EDT    jboss-eap-native-webserver-connectors-6.3.0-7.win6

Log for connectors:
http://download.devel.redhat.com/brewroot/packages/jboss-eap-native-webserver-connectors/6.3.0/7.win6/data/logs/win/build.log

Here you can see the version that was picked up (the wrong one):
2014-07-08 10:55:00,265 [INFO] koji.vm: Retrieved /tmp/build/buildreqs/mod_cluster-native/win/mod_cluster-native-1.2.9-4.Final.win6.x86_64.zip (70762 bytes, md5: 7c7280627c5021d9abdbe176c9f0d4a7)
2014-07-08 10:55:00,296 [INFO] koji.vm: Retrieved /tmp/build/buildreqs/mod_cluster-native/win/mod_cluster-native-1.2.9-4.Final.win6.i686.zip (66322 bytes, md5: baa6fd0347cb6cfe1fd3b9c8c74ecbc1)

In Brew/Win there is no way to specify an exact version. The latest version is always used for dependencies.

I will respin the jboss-eap-native-webserver-connectors and run the compose again today.
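
A check like the following could catch this kind of mismatch before handoff. It is only a sketch: it assumes the "Retrieved .../<package>-<version>.<arch>.zip" log format quoted above, and `retrieved_versions` is a hypothetical helper, not a Brew or koji API.

```python
import re


def retrieved_versions(log_text: str, package: str) -> set:
    """Collect the version-release strings of `package` seen in a build log."""
    pattern = re.compile(
        r"Retrieved \S*/" + re.escape(package) + r"-(\S+?)\.(?:x86_64|i686)\.zip"
    )
    return set(pattern.findall(log_text))


# Two log lines shortened from the build log excerpt in this comment.
log = (
    "[INFO] koji.vm: Retrieved /tmp/build/buildreqs/mod_cluster-native/win/"
    "mod_cluster-native-1.2.9-4.Final.win6.x86_64.zip (70762 bytes)\n"
    "[INFO] koji.vm: Retrieved /tmp/build/buildreqs/mod_cluster-native/win/"
    "mod_cluster-native-1.2.9-4.Final.win6.i686.zip (66322 bytes)\n"
)
found = retrieved_versions(log, "mod_cluster-native")
expected = "1.2.9-5.Final.win6"
wrong_build = expected not in found  # True here: only 1.2.9-4 was retrieved
```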

Comment 48 Vaclav Tunka 2014-07-15 10:46:26 UTC
The same is true for the src compose:
http://download.devel.redhat.com/brewroot/packages/jboss-eap/6.3.0/9.win6/data/logs/win/build.log

I will run that one as well.

Comment 49 Vaclav Tunka 2014-07-15 13:10:36 UTC
Windows containers (now verified to contain mod_cluster-native-1.2.9-5.Final.win6)

jboss-eap-6.3.0-10.win6
https://brewweb.devel.redhat.com/buildinfo?buildID=369683

jboss-eap-native-webserver-connectors-6.3.0-8.win6
https://brewweb.devel.redhat.com/buildinfo?buildID=369682

top level compose - httpd (also contains mod_cluster-native)
jboss-eap6-httpd-natives-6.3.0-8.ep6.el6
https://brewweb.devel.redhat.com/buildinfo?buildID=369695

top level compose for EAP 6.x handoff (running)
https://brewweb.devel.redhat.com/taskinfo?taskID=7701185

Comment 50 Michal Karm Babacek 2014-07-18 10:54:58 UTC
It works on Solaris and Windows now.
Verified with EAP 6.3.0.ER10.

Comment 51 JBoss JIRA Server 2014-08-01 11:05:45 UTC
Michal Babacek <mbabacek> updated the status of jira EAP6-172 to Resolved

Comment 52 JBoss JIRA Server 2015-04-28 15:09:42 UTC
John Doyle <jdoyle> updated the status of jira EAP6-172 to Closed

