Description of problem: A hot-standby feature is needed for mod_cluster.
Version-Release number of selected component (if applicable): EAP 6.x
John Doyle <jdoyle> made a comment on jira PRODMGT-494 We can look at this RFE, but we don't have a release for EAP that could deliver the capability by the October 2013 date specified by the user.
John Doyle <jdoyle> made a comment on jira PRODMGT-494 I see you have this in mod_cluster already and have been moving the target forward. Do you have a release in mind?
Hisanobu Okuda <hokuda> made a comment on jira PRODMGT-494 If you meant https://issues.jboss.org/browse/MODCLUSTER-235 , it is not implemented yet.
I think the feature is already there; just configure the node with: <simple-load-provider factor="0"/>
Do you know when that came into the code so we can mark a fixed version? Or has it always been there?
In fact, it is crippled by the modcluster subsystem description, which rejects 0: "JBAS014708: 0 is an invalid value for parameter factor. A minimum value of 1 is required". That needs to be changed, and the feature should be tested.
Jean-Frederic Clere <jfclere> updated the status of jira MODCLUSTER-235 to Resolved
https://issues.jboss.org/browse/EAP6-172 was filed for this request.
mod_cluster won't send any request to a hot-standby node except when:
1 - The node is changed to a normal node (factor > 0).
2 - All the other nodes are in error or have been removed.
Note that as soon as another node starts, requests will be directed to the new node.
It is not clear to me if https://github.com/jbossas/jboss-eap/pull/916 completely solves the issue. If that is not the case, please move this issue back to the ASSIGNED state.
The issue also requires a fix in the C part (mod_cluster-1.2.8.Final).
https://issues.jboss.org/browse/EAP6-172 is not yet acked; removing the ack.
QA_ACK, thanks Hisanobu for the cooperation.
Michal Babacek <mbabacek> updated the status of jira MODCLUSTER-235 to Closed
Bad news: It's broken. Good news: Here is a patch that fixes it: https://github.com/modcluster/mod_cluster/pull/95
What's wrong: The hot-standby node appears as a "Load: -1" node. That's wrong; it must be Load: 0 so that requests can be forwarded to it when no other nodes are available, i.e.:
* load > 0 : a load factor.
* load = 0 : standby worker.
* load = -1 : errored worker.
* load = -2 : just do a cping/cpong.
The requests should be forwarded even if the load is -1.
With Load: -1 I get:
Balancer: qacluster,LBGroup: ,Flushpackets: Off,Flushwait: 10000,Ping: 10000000,Smax: 1,Ttl: 60000000,Status: OK,Elected: 0,Read: 0,Transferred: 0,Connected: 0,Load: -1
<title>503 Service Temporarily Unavailable</title>
Hi Michal, could you elaborate on how you tested? (I couldn't reproduce it with master, which should have very similar code in this area.)
See https://bugzilla.redhat.com/show_bug.cgi?id=1074550#c2
Sure, let's have a look:
- grab the closest RHEL7 x86_64 box (I haven't tried to reproduce it elsewhere yet)
- download these (sha1sum):
  29b1578ade041492cd18db9f225f4de1bf025a7f  jboss-eap-6.3.0.ER7.zip
  2ba91fed7bf2ea830e1469f66baeabb0dd05701e  httpd/jboss-ews-httpd-2.1.0-RHEL7-x86_64.zip
  9807dcabcb91f1f102e278d33f503c399f985264  jboss-eap-native-webserver-connectors-6.3.0.ER7-RHEL7-x86_64.zip
- start httpd, my config:
  MemManagerFile "/dev/shm/mod_cluster-eapx/jboss-ews-2.1/httpd/cache/mod_cluster"
  ServerName 192.168.122.78:2181
  <IfModule manager_module>
    Listen 192.168.122.78:8847
    LogLevel debug
    <VirtualHost 192.168.122.78:8847>
      ServerName 192.168.122.78:8847
      <Directory />
        Order deny,allow
        Deny from all
        Allow from all
      </Directory>
      KeepAliveTimeout 60
      MaxKeepAliveRequests 0
      ServerAdvertise on
      AdvertiseFrequency 5
      ManagerBalancerName qacluster
      AdvertiseGroup 224.0.5.12:65409
      EnableMCPMReceive
      <Location /mcm>
        SetHandler mod_cluster-manager
        Order deny,allow
        Deny from all
        Allow from all
      </Location>
    </VirtualHost>
  </IfModule>
- configure two ordinary standalone-ha.xml EAP instances and set their jvmRoutes:
  <system-properties>
    <property name="jboss.mod_cluster.jvmRoute" value="jboss-eap-6.3"/>
    <property name="jboss.node.name" value="jboss-eap-6.3"/>
  </system-properties>
  <socket-binding name="modcluster" port="0" multicast-address="224.0.5.12" multicast-port="65409"/>
- configure one hot-standby instance (jvmRoute set via property as above...):
  <subsystem xmlns="urn:jboss:domain:modcluster:1.2">
    <mod-cluster-config advertise-socket="modcluster" connector="ajp">
      <simple-load-provider factor="0"/>
    </mod-cluster-config>
  </subsystem>
- start it
- you have, e.g.:
  Node jboss-eap-6.3 (ajp://192.168.122.78:8009): +++ SNAP +++ Status: OK,Elected: 0,Read: 0,Transferred: 0,Connected: 0,Load: 100
  Node jboss-eap-6.3-2 (ajp://192.168.122.78:8110): +++ SNAP +++ Status: OK,Elected: 0,Read: 0,Transferred: 0,Connected: 0,Load: 100
  Node jboss-eap-6.3-3 (ajp://192.168.122.78:8215): +++ SNAP +++ Status: OK,Elected: 0,Read: 0,Transferred: 0,Connected: 0,Load: -1
- curl, repeatedly accessing:
  curl 'http://rhel7x86-64:8847/clusterbench/requestinfo;jsessionid=awyGRY5SLkVYqhQuhSoHqwGp.jboss-eap-6.3-2';
  JVM route: jboss-eap-6.3-2 Session ID: awyGRY5SLkVYqhQuhSoHqwGp.jboss-eap-6.3-2 Session isNew: false
  JVM route: jboss-eap-6.3-2 Session ID: awyGRY5SLkVYqhQuhSoHqwGp.jboss-eap-6.3-2 Session isNew: false
  JVM route: jboss-eap-6.3-2 Session ID: awyGRY5SLkVYqhQuhSoHqwGp.jboss-eap-6.3-2 Session isNew: false
  -- Node jboss-eap-6.3-2 stopped --
  JVM route: jboss-eap-6.3 Session ID: awyGRY5SLkVYqhQuhSoHqwGp.jboss-eap-6.3 Session isNew: false
  JVM route: jboss-eap-6.3 Session ID: awyGRY5SLkVYqhQuhSoHqwGp.jboss-eap-6.3 Session isNew: false
  -- Node jboss-eap-6.3 stopped --
  <title>503 Service Temporarily Unavailable</title>
  +++ SNAP +++
  <address>Apache/2.2.26 (Red Hat Enterprise Web Server) Server at rhel7x86-64 Port 8847</address>
  <title>503 Service Temporarily Unavailable</title>
  +++ SNAP +++
  <address>Apache/2.2.26 (Red Hat Enterprise Web Server) Server at rhel7x86-64 Port 8847</address>
The problem in the code is the lbfactor, as you might observe. Take a look at these macros and maybe run the preprocessor to see what C code you are actually going to compile...
@All: Load: -1 for a hot-standby node is wrong; it shows that lbfactor is -1. -1 is an error code for lbfactor, i.e. the node won't be used at all.
+++
Node 4e6189af-0502-3305-8ff3-fad7fee8b516 (ajp://127.0.0.1:8009): Enable Contexts Disable Contexts Stop Contexts
Balancer: mycluster,LBGroup: ,Flushpackets: Off,Flushwait: 10000,Ping: 10000000,Smax: 1,Ttl: 60000000,Status: OK,Elected: 0,Read: 0,Transferred: 0,Connected: 0,Load: -1
Virtual Host 1:
Contexts:
/compileFailure, Status: ENABLED Request: 0 Disable Stop
+++
+++
[jfclere@jfcpc APACHE-2.2.21]$ curl -v http://localhost:8080/compileFailure/
.....
HTTP/1.1 200 OK
....
+++
It works for me; I don't understand....
I am using 2.2.21 and it works. With 2.2.26 it fails :-( Something is wrong :-(
I have merged https://github.com/modcluster/mod_cluster/pull/95. I can't explain why it worked for me in 2.2.21... I was probably compiling against an httpd other than 2.2.21 :-(
After speaking to Jean-Frederic: this will be a native upgrade.
This issue will be verified in ER9 as part of mod_cluster 1.2.9.Final plus the patch from comment 29. There won't be any component upgrade to 1.2.10.Final. Regarding documentation and release notes, please see BZ 1074550 and https://bugzilla.redhat.com/show_bug.cgi?id=1115083#c0, paragraph 4. This new feature should be mentioned in the release notes as per BZ 1115083.
Uff...it's been a long journey :-)
EAP 6.3.0.ER9
I was too quick in my judgement in comment 43. While the RHEL, HP-UX and Solaris builds work, the Windows binaries still present the "Load: -1" error. It looks like the patch wasn't applied to mod_cluster 1.2.9.Final on Windows :-(
According to my investigation, the patch is in fact applied, but something went wrong in the production build. Could you please test with https://brewweb.devel.redhat.com/buildinfo?buildID=368750
This is an RCM issue; I created an RT ticket for them: #306213 Investigate: extreme repo-regen delays, bad NVRs picked by Brew/Win builds
It seems Brew incorrectly picked the old mod_cluster-native-1.2.9-4.Final.win6 version, even though mod_cluster-native-1.2.9-5.Final.win6 had been built more than an hour before the compose was run:
2014-07-08 09:45:44 mod_cluster-native-1.2.9-5.Final.win6
Tue, 08 Jul 2014 10:58:10 EDT jboss-eap-native-webserver-connectors-6.3.0-7.win6
Log for connectors: http://download.devel.redhat.com/brewroot/packages/jboss-eap-native-webserver-connectors/6.3.0/7.win6/data/logs/win/build.log
Here you can see the version that was picked up (the wrong one):
2014-07-08 10:55:00,265 [INFO] koji.vm: Retrieved /tmp/build/buildreqs/mod_cluster-native/win/mod_cluster-native-1.2.9-4.Final.win6.x86_64.zip (70762 bytes, md5: 7c7280627c5021d9abdbe176c9f0d4a7)
2014-07-08 10:55:00,296 [INFO] koji.vm: Retrieved /tmp/build/buildreqs/mod_cluster-native/win/mod_cluster-native-1.2.9-4.Final.win6.i686.zip (66322 bytes, md5: baa6fd0347cb6cfe1fd3b9c8c74ecbc1)
In Brew/Win there is no way to specify an exact version; the latest version is always used for dependencies. I will respin jboss-eap-native-webserver-connectors and run the compose again today.
The same is true for the src compose: http://download.devel.redhat.com/brewroot/packages/jboss-eap/6.3.0/9.win6/data/logs/win/build.log I will run that one as well.
Windows containers (now verified to contain mod_cluster-native-1.2.9-5.Final.win6):
jboss-eap-6.3.0-10.win6 https://brewweb.devel.redhat.com/buildinfo?buildID=369683
jboss-eap-native-webserver-connectors-6.3.0-8.win6 https://brewweb.devel.redhat.com/buildinfo?buildID=369682
Top-level compose - httpd (also contains mod_cluster-native):
jboss-eap6-httpd-natives-6.3.0-8.ep6.el6 https://brewweb.devel.redhat.com/buildinfo?buildID=369695
Top-level compose for EAP 6.x handoff (running): https://brewweb.devel.redhat.com/taskinfo?taskID=7701185
It works on Solaris and Windows now. Verified with EAP 6.3.0.ER10.
Michal Babacek <mbabacek> updated the status of jira EAP6-172 to Resolved
John Doyle <jdoyle> updated the status of jira EAP6-172 to Closed