Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 987259

Summary:	hot-standby for mod_cluster
Product:	[JBoss] JBoss Enterprise Application Platform 6	Reporter:	Hisanobu Okuda <hokuda>
Component:	mod_cluster	Assignee:	Vaclav Tunka <vtunka>
Status:	CLOSED CURRENTRELEASE	QA Contact:	Michal Karm Babacek <mbabacek>
Severity:	high	Docs Contact:
Priority:	high
Version:	6.1.0	CC:	chuffman, hokuda, jclere, jdoyle, kkhan, mbabacek, myarboro, patrick.martin, rhusar, rsvoboda, smumford, vtunka
Target Milestone:	ER9
Target Release:	EAP 6.3.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Enhancement
Doc Text:	This release of JBoss EAP 6 introduces a 'hot-standby' feature to mod_cluster.	Story Points:	---
Clone Of:		Environment:
Last Closed:	2014-08-06 14:36:26 UTC	Type:	Feature Request
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1101681, 1107551
Bug Blocks:

Description Hisanobu Okuda 2013-07-23 05:58:35 UTC

Description of problem:
Need hot-standby feature for mod_cluster

Version-Release number of selected component (if applicable):
EAP 6.x

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 JBoss JIRA Server 2013-07-26 19:04:38 UTC

John Doyle <jdoyle> made a comment on jira PRODMGT-494

We can look at this RFE, but we don't have a release for EAP that could deliver the capability by the October 2013 date specified by the user.

Comment 2 JBoss JIRA Server 2013-07-26 19:08:18 UTC

John Doyle <jdoyle> made a comment on jira PRODMGT-494

I see you have this in mod_cluster already and have been moving the target forward.  Do you have a release in mind?

Comment 3 JBoss JIRA Server 2013-07-29 07:47:28 UTC

Hisanobu Okuda <hokuda> made a comment on jira PRODMGT-494

If you meant https://issues.jboss.org/browse/MODCLUSTER-235 , it is not implemented yet.

Comment 7 Jean-frederic Clere 2014-02-07 15:44:37 UTC

I think the feature is already there; just configure the node with:
<simple-load-provider factor="0"/>

Comment 8 John Doyle 2014-02-07 16:13:22 UTC

Do you know when that came into the code so we can mark a fixed version?  Or has it always been there?

Comment 9 Jean-frederic Clere 2014-02-07 16:16:20 UTC

In fact, it is crippled by the description of the modcluster subsystem, which rejects the value:
"JBAS014708: 0 is an invalid value for parameter factor. A minimum value of 1 is required"
That needs to be changed and the feature should be tested.

Comment 11 JBoss JIRA Server 2014-02-11 08:28:35 UTC

Jean-Frederic Clere <jfclere> updated the status of jira MODCLUSTER-235 to Resolved

Comment 14 Hisanobu Okuda 2014-02-12 00:15:45 UTC

https://issues.jboss.org/browse/EAP6-172 was filed for this request.

Comment 16 Jean-frederic Clere 2014-02-12 08:55:00 UTC

mod_cluster won't send any request to a hot-standby node unless:
1 - The node is changed to a normal node (factor > 0)
2 - All the other nodes are in error or have been removed
Note that as soon as another node starts, requests will be directed to the new node.
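
The rule above can be sketched as follows. This is an illustrative Python model only, not mod_cluster's C implementation; the Node class, its fields and the candidates() helper are hypothetical names made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    route: str
    factor: int             # hypothetical field: > 0 normal node, 0 hot standby
    in_error: bool = False  # hypothetical flag: node is in error or was removed


def candidates(nodes):
    """Return the nodes that may receive requests under the rule above."""
    usable = [n for n in nodes if not n.in_error]
    normal = [n for n in usable if n.factor > 0]
    # Hot-standby nodes are used only when no normal node is left.
    return normal if normal else usable


nodes = [Node("node-1", 1), Node("node-2", 1), Node("standby", 0)]
print([n.route for n in candidates(nodes)])  # normal nodes only

nodes[0].in_error = True
nodes[1].in_error = True
print([n.route for n in candidates(nodes)])  # the standby takes over

nodes[0].in_error = False
print([n.route for n in candidates(nodes)])  # back to the normal node
```

The last call mirrors the note above: once a normal node is available again, the standby stops receiving requests.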

Comment 18 Kabir Khan 2014-02-16 09:59:56 UTC

It is not clear to me if https://github.com/jbossas/jboss-eap/pull/916 completely solves the issue. If that is not the case, please move this issue back to the ASSIGNED state.

Comment 20 Jean-frederic Clere 2014-02-17 07:00:24 UTC

The issue also requires a fix in the C part (mod_cluster-1.2.8.Final).

Comment 23 Rostislav Svoboda 2014-02-20 18:10:30 UTC

https://issues.jboss.org/browse/EAP6-172 not yet acked, removing ack

Comment 26 Michal Karm Babacek 2014-03-10 13:32:50 UTC

QA_ACK, thanks Hisanobu for the cooperation.

Comment 28 JBoss JIRA Server 2014-04-18 17:02:50 UTC

Michal Babacek <mbabacek> updated the status of jira MODCLUSTER-235 to Closed

Comment 29 Michal Karm Babacek 2014-06-18 14:56:56 UTC

Bad news: It's broken.

Good news: Here is a patch that fixes it: https://github.com/modcluster/mod_cluster/pull/95

What's wrong:

The hot-standby node appears as a "Load: -1" node. That's wrong; it must be Load: 0 so that requests can be forwarded to it when no other nodes are available.

i.e.:

 * load > 0  : a load factor.
 * load = 0  : standby worker.
 * load = -1 : errored worker.
 * load = -2 : just do a cping/cpong.
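
The convention listed above can be written down as a small lookup. This is only an illustrative Python sketch, not mod_cluster's C code; the function name describe_load is made up for the example.

```python
def describe_load(load: int) -> str:
    """Map a reported Load value to its meaning, per the list above."""
    if load > 0:
        return "load factor"
    if load == 0:
        return "standby worker"
    if load == -1:
        return "errored worker"
    if load == -2:
        return "cping/cpong only"
    # Values below -2 are not covered by the list above.
    raise ValueError(f"unexpected load value: {load}")


for value in (100, 0, -1, -2):
    print(value, "->", describe_load(value))
```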

Comment 30 Jean-frederic Clere 2014-06-18 15:33:07 UTC

The requests should be forwarded even if the load is -1.

Comment 31 Michal Karm Babacek 2014-06-18 15:37:33 UTC

I get 

Balancer: qacluster,LBGroup: ,Flushpackets: Off,Flushwait: 10000,Ping: 10000000,Smax: 1,Ttl: 60000000,Status: OK,Elected: 0,Read: 0,Transferred: 0,Connected: 0,Load: -1 

<title>503 Service Temporarily Unavailable</title>

with Load: -1
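
For scripted checks, the Load value can be pulled out of the mod_cluster-manager status line shown above. A minimal Python sketch, assuming the line keeps the comma-separated "Key: value" layout seen here and that values contain no commas; parse_status is a hypothetical helper, not part of mod_cluster.

```python
def parse_status(line: str) -> dict:
    """Split a status line into a {key: value} dict of strings."""
    fields = {}
    for part in line.strip().split(","):
        key, sep, value = part.partition(":")
        if sep:  # skip fragments without a "key: value" shape
            fields[key.strip()] = value.strip()
    return fields


line = (
    "Balancer: qacluster,LBGroup: ,Flushpackets: Off,Flushwait: 10000,"
    "Ping: 10000000,Smax: 1,Ttl: 60000000,Status: OK,Elected: 0,Read: 0,"
    "Transferred: 0,Connected: 0,Load: -1 "
)
status = parse_status(line)
load = int(status["Load"])
print(status["Balancer"], status["Status"], load)
```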

Comment 32 Radoslav Husar 2014-06-18 15:40:19 UTC

Hi Michal, could you elaborate on how you tested? (I couldn't reproduce it with master, which should have very similar code in this area.)

Comment 33 Jean-frederic Clere 2014-06-18 15:51:03 UTC

See https://bugzilla.redhat.com/show_bug.cgi?id=1074550#c2

Comment 34 Michal Karm Babacek 2014-06-18 16:12:26 UTC

Sure, let's have a look:

- grab the closest RHEL7 x86_64 box (haven't tried to reproduce elsewhere yet)
- download these (sha1sum):

  29b1578ade041492cd18db9f225f4de1bf025a7f  jboss-eap-6.3.0.ER7.zip
  2ba91fed7bf2ea830e1469f66baeabb0dd05701e  httpd/jboss-ews-httpd-2.1.0-RHEL7-x86_64.zip
  9807dcabcb91f1f102e278d33f503c399f985264  jboss-eap-native-webserver-connectors-6.3.0.ER7-RHEL7-x86_64.zip

- start httpd, my config: 

MemManagerFile "/dev/shm/mod_cluster-eapx/jboss-ews-2.1/httpd/cache/mod_cluster"
ServerName 192.168.122.78:2181
<IfModule manager_module>
  Listen 192.168.122.78:8847
  LogLevel debug
  <VirtualHost 192.168.122.78:8847>
    ServerName 192.168.122.78:8847
    <Directory />
      Order deny,allow
      Deny from all
      Allow from all
    </Directory>
    KeepAliveTimeout 60
    MaxKeepAliveRequests 0
    ServerAdvertise on
    AdvertiseFrequency 5
    ManagerBalancerName qacluster
    AdvertiseGroup 224.0.5.12:65409
    EnableMCPMReceive
    <Location /mcm>
      SetHandler mod_cluster-manager
      Order deny,allow
      Deny from all
      Allow from all
    </Location>
  </VirtualHost>
</IfModule>

- configure two ordinary standalone-ha.xml EAP instances and set their jvmRoutes
  <system-properties>
    <property name="jboss.mod_cluster.jvmRoute" value="jboss-eap-6.3"/>
    <property name="jboss.node.name" value="jboss-eap-6.3"/>
  </system-properties>
  <socket-binding name="modcluster" port="0" multicast-address="224.0.5.12" multicast-port="65409"/>

- configure one hot-standby instance (jvmRoute set via property as above...)
  <subsystem xmlns="urn:jboss:domain:modcluster:1.2">
      <mod-cluster-config advertise-socket="modcluster" connector="ajp">
          <simple-load-provider factor="0"/>
      </mod-cluster-config>
  </subsystem>

- start it

- you have, e.g.:

Node jboss-eap-6.3 (ajp://192.168.122.78:8009): 
+++ SNAP +++ Status: OK,Elected: 0,Read: 0,Transferred: 0,Connected: 0,Load: 100 

Node jboss-eap-6.3-2 (ajp://192.168.122.78:8110): 
+++ SNAP +++ Status: OK,Elected: 0,Read: 0,Transferred: 0,Connected: 0,Load: 100 

Node jboss-eap-6.3-3 (ajp://192.168.122.78:8215): 
+++ SNAP +++ Status: OK,Elected: 0,Read: 0,Transferred: 0,Connected: 0,Load: -1

- curling:

  repeatedly accessing: curl 'http://rhel7x86-64:8847/clusterbench/requestinfo;jsessionid=awyGRY5SLkVYqhQuhSoHqwGp.jboss-eap-6.3-2';


JVM route: jboss-eap-6.3-2
Session ID: awyGRY5SLkVYqhQuhSoHqwGp.jboss-eap-6.3-2
Session isNew: false

JVM route: jboss-eap-6.3-2
Session ID: awyGRY5SLkVYqhQuhSoHqwGp.jboss-eap-6.3-2
Session isNew: false

JVM route: jboss-eap-6.3-2
Session ID: awyGRY5SLkVYqhQuhSoHqwGp.jboss-eap-6.3-2
Session isNew: false

-- Node jboss-eap-6.3-2 stopped --

JVM route: jboss-eap-6.3
Session ID: awyGRY5SLkVYqhQuhSoHqwGp.jboss-eap-6.3
Session isNew: false

JVM route: jboss-eap-6.3
Session ID: awyGRY5SLkVYqhQuhSoHqwGp.jboss-eap-6.3
Session isNew: false

-- Node jboss-eap-6.3 stopped --


<title>503 Service Temporarily Unavailable</title>
+++ SNAP +++
<address>Apache/2.2.26 (Red Hat Enterprise Web Server) Server at rhel7x86-64 Port 8847</address>

<title>503 Service Temporarily Unavailable</title>
+++ SNAP +++
<address>Apache/2.2.26 (Red Hat Enterprise Web Server) Server at rhel7x86-64 Port 8847</address>





The problem in the code is the lbfactor, as you might observe. Take a look at these macros, and maybe run the preprocessor to see what C code you are actually going to compile...

Comment 35 Michal Karm Babacek 2014-06-18 16:18:54 UTC

@All: Load: -1 for a hot-standby node is wrong; it shows that lbfactor is -1. -1 is an error code for lbfactor, i.e. the node won't be used at all.

Comment 36 Jean-frederic Clere 2014-06-19 17:34:29 UTC

+++
 Node 4e6189af-0502-3305-8ff3-fad7fee8b516 (ajp://127.0.0.1:8009):
Enable Contexts Disable Contexts Stop Contexts
Balancer: mycluster,LBGroup: ,Flushpackets: Off,Flushwait: 10000,Ping: 10000000,Smax: 1,Ttl: 60000000,Status: OK,Elected: 0,Read: 0,Transferred: 0,Connected: 0,Load: -1
Virtual Host 1:
Contexts:

/compileFailure, Status: ENABLED Request: 0 Disable Stop
+++

+++
[jfclere@jfcpc APACHE-2.2.21]$ curl -v http://localhost:8080/compileFailure/
.....
HTTP/1.1 200 OK
....
+++

It works for me; I don't understand....

Comment 37 Jean-frederic Clere 2014-06-19 17:59:40 UTC

With httpd 2.2.21 it works.
With 2.2.26 it fails :-(
Something is wrong :-(

Comment 39 Jean-frederic Clere 2014-06-20 07:27:04 UTC

I have merged https://github.com/modcluster/mod_cluster/pull/95

I can't explain why it worked for me in 2.2.21... I was probably compiling with another httpd than 2.2.21 :-(

Comment 40 Kabir Khan 2014-06-23 09:03:13 UTC

After speaking to Jean-Frederic: this will be a native upgrade.

Comment 42 Michal Karm Babacek 2014-07-02 10:59:45 UTC

This issue will be verified in ER9 as a part of mod_cluster 1.2.9.Final + comment 29 patch. There won't be any component upgrade to 1.2.10.Final.

Regarding documentation and release notes:

Please,
See BZ 1074550
See https://bugzilla.redhat.com/show_bug.cgi?id=1115083#c0, Paragraph 4.

This new feature should be mentioned in release notes as per BZ 1115083.

Comment 43 Michal Karm Babacek 2014-07-11 18:42:51 UTC

Uff...it's been a long journey :-)

Comment 44 Michal Karm Babacek 2014-07-11 18:43:41 UTC

EAP 6.3.0.ER9

Comment 45 Michal Karm Babacek 2014-07-11 19:29:04 UTC

I was too quick in my judgement in comment 43.

While RHEL, HP-UX and Solaris builds work, the Windows binaries still present the "Load: -1" error.
It looks like the patch wasn't applied to mod_cluster 1.2.9.Final on Windows :-(

Comment 46 Jean-frederic Clere 2014-07-15 08:01:54 UTC

According to my investigation, the patch is in fact applied, but something went wrong during production of the build.
Could you please test with https://brewweb.devel.redhat.com/buildinfo?buildID=368750

Comment 47 Vaclav Tunka 2014-07-15 10:45:04 UTC

This is an RCM issue, I created RT for them:
#306213 Investigate: extreme repo-regen delays, bad NVRs picked by Brew/Win builds

It seems Brew incorrectly picked the old mod_cluster-native-1.2.9-4.Final.win6 version, even though mod_cluster-native-1.2.9-5.Final.win6 had been built more than an hour before the compose was run.

2014-07-08 09:45:44              mod_cluster-native-1.2.9-5.Final.win6 
Tue, 08 Jul 2014 10:58:10 EDT    jboss-eap-native-webserver-connectors-6.3.0-7.win6

Log for connectors:
http://download.devel.redhat.com/brewroot/packages/jboss-eap-native-webserver-connectors/6.3.0/7.win6/data/logs/win/build.log

Here you can see the version that was picked up (the wrong one):
2014-07-08 10:55:00,265 [INFO] koji.vm: Retrieved /tmp/build/buildreqs/mod_cluster-native/win/mod_cluster-native-1.2.9-4.Final.win6.x86_64.zip (70762 bytes, md5: 7c7280627c5021d9abdbe176c9f0d4a7)
2014-07-08 10:55:00,296 [INFO] koji.vm: Retrieved /tmp/build/buildreqs/mod_cluster-native/win/mod_cluster-native-1.2.9-4.Final.win6.i686.zip (66322 bytes, md5: baa6fd0347cb6cfe1fd3b9c8c74ecbc1)

In Brew/Win there is no way to specify an exact version; the latest version is always used for dependencies.

I will respin the jboss-eap-native-webserver-connectors and run the compose again today.
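
A check along these lines could flag this kind of mix-up before handoff. It is only a sketch: it assumes the build log keeps the "Retrieved <path>/<NVR>.<arch>.zip" line format shown above, and the helper names and the expected-NVR argument are illustrative, not part of Brew or koji.

```python
import re

# Matches e.g. ".../mod_cluster-native-1.2.9-4.Final.win6.x86_64.zip"
RETRIEVED = re.compile(
    r"Retrieved \S*/(?P<nvr>mod_cluster-native-[\w.\-]+?)\.(?:x86_64|i686)\.zip"
)


def retrieved_nvrs(log_text: str) -> list:
    """Return the distinct mod_cluster-native NVRs fetched in a build log."""
    return sorted({m.group("nvr") for m in RETRIEVED.finditer(log_text)})


def unexpected_nvrs(log_text: str, expected: str) -> list:
    """Return every fetched NVR that differs from the expected one."""
    return [nvr for nvr in retrieved_nvrs(log_text) if nvr != expected]


log = """\
2014-07-08 10:55:00,265 [INFO] koji.vm: Retrieved /tmp/build/buildreqs/mod_cluster-native/win/mod_cluster-native-1.2.9-4.Final.win6.x86_64.zip (70762 bytes, md5: 7c7280627c5021d9abdbe176c9f0d4a7)
2014-07-08 10:55:00,296 [INFO] koji.vm: Retrieved /tmp/build/buildreqs/mod_cluster-native/win/mod_cluster-native-1.2.9-4.Final.win6.i686.zip (66322 bytes, md5: baa6fd0347cb6cfe1fd3b9c8c74ecbc1)
"""
print(unexpected_nvrs(log, "mod_cluster-native-1.2.9-5.Final.win6"))
```

An empty list would mean that only the expected build was retrieved.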

Comment 48 Vaclav Tunka 2014-07-15 10:46:26 UTC

The same is true for the src compose:
http://download.devel.redhat.com/brewroot/packages/jboss-eap/6.3.0/9.win6/data/logs/win/build.log

I will run that one as well.

Comment 49 Vaclav Tunka 2014-07-15 13:10:36 UTC

Windows containers (now verified to contain mod_cluster-native-1.2.9-5.Final.win6)

jboss-eap-6.3.0-10.win6
https://brewweb.devel.redhat.com/buildinfo?buildID=369683

jboss-eap-native-webserver-connectors-6.3.0-8.win6
https://brewweb.devel.redhat.com/buildinfo?buildID=369682

top level compose - httpd (also contains mod_cluster-native)
jboss-eap6-httpd-natives-6.3.0-8.ep6.el6
https://brewweb.devel.redhat.com/buildinfo?buildID=369695

top level compose for EAP 6.x handoff (running)
https://brewweb.devel.redhat.com/taskinfo?taskID=7701185

Comment 50 Michal Karm Babacek 2014-07-18 10:54:58 UTC

It works on Solaris and Windows now.
Verified with EAP 6.3.0.ER10.

Comment 51 JBoss JIRA Server 2014-08-01 11:05:45 UTC

Michal Babacek <mbabacek> updated the status of jira EAP6-172 to Resolved

Comment 52 JBoss JIRA Server 2015-04-28 15:09:42 UTC

John Doyle <jdoyle> updated the status of jira EAP6-172 to Closed