Description of problem: Under normal circumstances a restart of one node will signal to the client that it becomes unavailable and available. But if the client-server try to invoke the EJB during the restart phase, it seem to be after the server-core (JGroups) is up but the application is not complete deployed, the client notice that the server is up but there is no application and it gets dropped from the list of the cluster forever.
Wolf Just to be clear, is this the scenario in which you see this problem occurring? - EJBClient interacting with a cluster X with members {A,B} - application Y deployed on nodes A,B - before the restart of node B: -- invocations for application Y successfully processed on nodes A,B - after the restart of node B: -- invocations only processed on node A, but not B In other words, this is a problem which (i) persists across invocations and (ii) is specific to invocations on clusters?
The cluster and application is correct deployed on both nodes. - load will be balanced to both nodes - node B will be restarted - invocation is continued (with high frequency) to node A - seems like server is started - but node selection will see node B before app startup has been finished - result is that nodeB will be removed from the cluster view this seems like a race condition during startup, I try to reproduce it but if there is no invocation for the EJB the restart works correct.
Thanks for the clarification.
Wolf, i'm going to have to create a test case to be able to reduce this issue for debugging. I tried looking through the code but there are two many places where things could go wrong. Because you say the load is balanced on both nodes, I assume that your example makes use of a SLSB application?
By the way, there is a test case which tests against exactly this issue in the EAP 6.3.0 testsuite: org.jboss.as.test.clustering.cluster.ejb3.stateless.RemoteStatelessFailoverTestCase.testLoadBalancing() But in running the test case locally, it cliams to pass, but it does not seem to function as expected - i'm not seeing output where there should be. Currently trying to find out why.
OK, it seems the test case works fine as long as you run it as part of the larger suite; using test= has a problem with output collection. The current test testLoadBalancing() actually stops client invocations while the server is being stopped and restarted. So i'm going to create an additional test case called testLoadBalancingUnderContinousLoad() which keeps the load ongoing while the server is stopped and restarted to try to reproduce the problem.
Wolf, I created a modified version of the load balancing test where the load to be balanced is not stopped during server shutdown / restart but maintained. I have run this modified test several times and can't reproduce the problem on EAP 6.3.0. I'm attaching a log of the latest run of the test. The frequency of invocations from the client is as fast as possible on a sequential basis: as soon as one invocation returns, we fire of another one. You can see from the logs that normal load balancing returns after the server comes back up.
Created attachment 994055 [details] Output from modified test case This test case output shows how a two node cluster behaves under load while one node is taken down and then brought up again.
I think I have found the root cause of the problem: A) Invocation where app-oneA is the only node (after stop of app-oneB and before restart) B) invocation where cluster topology update and module availability update arrive (the restart happened shortly before) C) invocation where both nodes now have ClusterNodeManagers, but one is marked as not available for the locator and is excluded D) invocation where app-oneA is the only node ------------------ (A) ClientAppOne: begin invocation EJBInvocationHandler: sendRequestWithRetries(first) EJBClientInvocationContext: sendRequest(0) Affinity: NONE / Weak Affinity: clusterAffinity with cluster ejb ReceiverInterceptor: getting EJBReceiverContext Got cluster context for cluster name: ejb ClusterContext: getting EJBReceiverContext: excludedNodes (from invocationContext) = [] ClusterContext:getEJBReceiverContext(9): nodeManagers: = [master:app-oneA]available nodes: [master:app-oneA]; excludedNodes = [] ClusterContext:getEJBReceiverContext:org.jboss.ejb.client.RandomClusterNodeSelector@137bb0ff has selected node master:app-oneA, in cluster ejb ClusterContext: got EJBReceiverContext ReceiverInterceptor: got EJBReceiverContext (clusterContext) EJBClientInvocationContext: sendRequest(1) EJBClientInvocationContext: sendRequest(2) EJBClientInvocationContext: sendRequest(3) EJBClientInvocationContext: sendRequest(4) master:app-oneA ClientAppOne: end invocation ---------------------- (B) ClientAppOne: begin invocation EJBInvocationHandler: sendRequestWithRetries(first) EJBClientInvocationContext: sendRequest(0) Affinity: NONE / Weak Affinity: clusterAffinity with cluster ejb ReceiverInterceptor: getting EJBReceiverContext Got cluster context for cluster name: ejb ClusterContext: getting EJBReceiverContext: excludedNodes (from invocationContext) = [] ClusterContext:getEJBReceiverContext(9): nodeManagers: = [master:app-oneA]available nodes: [master:app-oneA]; excludedNodes = [] ClusterContext:getEJBReceiverContext:org.jboss.ejb.client.RandomClusterNodeSelector@137bb0ff has selected node master:app-oneA, in cluster ejb ClusterContext: got EJBReceiverContext ReceiverInterceptor: got EJBReceiverContext (clusterContext) EJBClientInvocationContext: sendRequest(1) EJBClientInvocationContext: sendRequest(2) EJBClientInvocationContext: sendRequest(3) EJBClientInvocationContext: sendRequest(4) Calling clusterTopologyMessageHandler Received a cluster node(s) addition message, for cluster named ejb with 1 nodes [ClusterNode{clusterName='ejb', nodeName='master:app-oneB', clientMappings=[ClientMapping{sourceNetworkAddress=/0:0:0:0:0:0:0:0, sourceNetworkMaskBits=0, destinationAddress='127.0.0.1', destinationPort=5147}], resolvedDestination=[Destination address=127.0.0.1, destination port=5147]}] Calling addNodesToClusterContext Calling addClusterNodes: Adding cluster node manager for node: master:app-oneB Received module availability report for 1 modules Registering module EJBModuleIdentifier{appName='jboss-ejb-multi-server-app-one', moduleName='jboss-ejb-multi-server-app-one', distinctName=''} availability for receiver context EJBReceiverContext{clientContext=org.jboss.ejb.client.EJBClientContext@69f75644, receiver=Remoting connection EJB receiver [connection=org.jboss.ejb.client.remoting.ConnectionPool$PooledConnection@6f110f5c,channel=jboss.ejb,nodename=master:app-oneB]} master:app-oneA ClientAppOne: end invocation ------------------ (C) ClientAppOne: begin invocation EJBInvocationHandler: sendRequestWithRetries(first) EJBClientInvocationContext: sendRequest(0) Affinity: NONE / Weak Affinity: clusterAffinity with cluster ejb ReceiverInterceptor: getting EJBReceiverContext Got cluster context for cluster name: ejb ClusterContext: getting EJBReceiverContext: excludedNodes (from invocationContext) = [] ClusterContext:getEJBReceiverContext(9): nodeManagers: = [master:app-oneA, master:app-oneB]available nodes: [master:app-oneA, master:app-oneB]; excludedNodes = [] ClusterContext:getEJBReceiverContext:org.jboss.ejb.client.RandomClusterNodeSelector@137bb0ff has selected node master:app-oneB, in cluster ejb ClusterContext.getEJBReceiverContext: Ignoring node master:app-oneB since it cannot handle appName=jboss-ejb-multi-server-app-one,moduleName=ejb,distinct-name= Calling clusterTopologyMessageHandler Received a cluster node(s) addition message, for cluster named ejb with 2 nodes [ClusterNode{clusterName='ejb', nodeName='master:app-oneA', clientMappings=[ClientMapping{sourceNetworkAddress=/0:0:0:0:0:0:0:0, sourceNetworkMaskBits=0, destinationAddress='127.0.0.1', destinationPort=4547}], resolvedDestination=[Destination address=127.0.0.1, destination port=4547]}, ClusterNode{clusterName='ejb', nodeName='master:app-oneB', clientMappings=[ClientMapping{sourceNetworkAddress=/0:0:0:0:0:0:0:0, sourceNetworkMaskBits=0, destinationAddress='127.0.0.1', destinationPort=5147}], resolvedDestination=[Destination address=127.0.0.1, destination port=5147]}] Calling addNodesToClusterContext Calling addClusterNodes: Adding cluster node manager for node: master:app-oneA Skipping duplicate addition of cluster node Adding cluster node manager for node: master:app-oneB Skipping duplicate addition of cluster node ClusterContext:getEJBReceiverContext(9): nodeManagers: = [master:app-oneA]available nodes: [master:app-oneA]; excludedNodes = [master:app-oneB] ClusterContext:getEJBReceiverContext:org.jboss.ejb.client.RandomClusterNodeSelector@137bb0ff has selected node master:app-oneA, in cluster ejb ClusterContext: got EJBReceiverContext ReceiverInterceptor: got EJBReceiverContext (clusterContext) EJBClientInvocationContext: sendRequest(1) EJBClientInvocationContext: sendRequest(2) EJBClientInvocationContext: sendRequest(3) EJBClientInvocationContext: sendRequest(4) master:app-oneA ClientAppOne: end invocation --------------- (D) ClientAppOne: begin invocation EJBInvocationHandler: sendRequestWithRetries(first) EJBClientInvocationContext: sendRequest(0) Affinity: NONE / Weak Affinity: clusterAffinity with cluster ejb ReceiverInterceptor: getting EJBReceiverContext Got cluster context for cluster name: ejb ClusterContext: getting EJBReceiverContext: excludedNodes (from invocationContext) = [] ClusterContext:getEJBReceiverContext(9): nodeManagers: = [master:app-oneA]available nodes: [master:app-oneA]; excludedNodes = [] ClusterContext:getEJBReceiverContext:org.jboss.ejb.client.RandomClusterNodeSelector@137bb0ff has selected node master:app-oneA, in cluster ejb ClusterContext: got EJBReceiverContext ReceiverInterceptor: got EJBReceiverContext (clusterContext) EJBClientInvocationContext: sendRequest(1) EJBClientInvocationContext: sendRequest(2) EJBClientInvocationContext: sendRequest(3) EJBClientInvocationContext: sendRequest(4) master:app-oneA ClientAppOne: end invocation ClientAppOne: begin invocation EJBInvocationHandler: sendRequestWithRetries(first) EJBClientInvocationContext: sendRequest(0) Affinity: NONE / Weak Affinity: clusterAffinity with cluster ejb ReceiverInterceptor: getting EJBReceiverContext Got cluster context for cluster name: ejb ClusterContext: getting EJBReceiverContext: excludedNodes (from invocationContext) = [] ClusterContext:getEJBReceiverContext(9): nodeManagers: = [master:app-oneA]available nodes: [master:app-oneA]; excludedNodes = [] ClusterContext:getEJBReceiverContext:org.jboss.ejb.client.RandomClusterNodeSelector@137bb0ff has selected node master:app-oneA, in cluster ejb ClusterContext: got EJBReceiverContext ReceiverInterceptor: got EJBReceiverContext (clusterContext) EJBClientInvocationContext: sendRequest(1) EJBClientInvocationContext: sendRequest(2) EJBClientInvocationContext: sendRequest(3) EJBClientInvocationContext: sendRequest(4) master:app-oneA ClientAppOne: end invocation
In fact, this example shows it a little better: there are two separate module availability reports which arrive from the server after restart: (i) appName=jboss-ejb-multi-server-app-one,moduleName=jboss-ejb-multi-server-app-one and (ii) appName=jboss-ejb-multi-server-app-one,moduleName=ejb The first one arrives with the cluster topology message (see A), the second arrives much later (see E). The cluster topology message ensures that a ClusterNodeManager is installed, and because of this, app-oneB is actually chosen to handle the invocation (see C). But because its module availability report has not yet arrived, it gets excluded from the invocation (and it seems, from the EJBClientContext itself. This shouldn't happen. -------------------- (A) ClientAppOne: begin invocation EJBInvocationHandler: sendRequestWithRetries(first) EJBClientInvocationContext: sendRequest(0) Affinity: NONE / Weak Affinity: clusterAffinity with cluster ejb ReceiverInterceptor: getting EJBReceiverContext Got cluster context for cluster name: ejb ClusterContext: getting EJBReceiverContext: excludedNodes (from invocationContext) = [] ClusterContext:getEJBReceiverContext(9): nodeManagers: = [master:app-oneA]available nodes: [master:app-oneA]; excludedNodes = [] ClusterContext:getEJBReceiverContext:org.jboss.ejb.client.RandomClusterNodeSelector@1b998709 has selected node master:app-oneA, in cluster ejb ClusterContext.getEJBReceiverContext: got non-null EJBReceiverContext for node master:app-oneA EJBLocator = StatelessEJBLocator{appName='jboss-ejb-multi-server-app-one', moduleName='ejb', distinctName='', beanName='AppOneBean', view='interface org.jboss.as.quickstarts.ejb.multi.server.app.AppOne'} EJBReceiver: acceptsModule: appName = jboss-ejb-multi-server-app-one : moduleName = ejb : distinctName = -> result = true ClusterContext: got EJBReceiverContext ReceiverInterceptor: got EJBReceiverContext (clusterContext) EJBClientInvocationContext: sendRequest(1) EJBClientInvocationContext: sendRequest(2) EJBClientInvocationContext: sendRequest(3) EJBClientInvocationContext: sendRequest(4) Calling clusterTopologyMessageHandler Received a cluster node(s) addition message, for cluster named ejb with 1 nodes [ClusterNode{clusterName='ejb', nodeName='master:app-oneB', clientMappings=[ClientMapping{sourceNetworkAddress=/0:0:0:0:0:0:0:0, sourceNetworkMaskBits=0, destinationAddress='127.0.0.1', destinationPort=5147}], resolvedDestination=[Destination address=127.0.0.1, destination port=5147]}] Calling addNodesToClusterContext Calling addClusterNodes: Adding cluster node manager for node: master:app-oneB Received module availability report for 1 modules Registering module EJBModuleIdentifier{appName='jboss-ejb-multi-server-app-one', moduleName='jboss-ejb-multi-server-app-one', distinctName=''} availability for receiver context EJBReceiverContext{clientContext=org.jboss.ejb.client.EJBClientContext@69bcf8f6, receiver=Remoting connection EJB receiver [connection=org.jboss.ejb.client.remoting.ConnectionPool$PooledConnection@786de56e,channel=jboss.ejb,nodename=master:app-oneB]} EJBReceiver: registerModule: appName = jboss-ejb-multi-server-app-one : moduleName = jboss-ejb-multi-server-app-one : distinctName = master:app-oneA ClientAppOne: end invocation -------------------- (B) ClientAppOne: begin invocation EJBInvocationHandler: sendRequestWithRetries(first) EJBClientInvocationContext: sendRequest(0) Affinity: NONE / Weak Affinity: clusterAffinity with cluster ejb ReceiverInterceptor: getting EJBReceiverContext Got cluster context for cluster name: ejb ClusterContext: getting EJBReceiverContext: excludedNodes (from invocationContext) = [] ClusterContext:getEJBReceiverContext(9): nodeManagers: = [master:app-oneA, master:app-oneB]available nodes: [master:app-oneA, master:app-oneB]; excludedNodes = [] ClusterContext:getEJBReceiverContext:org.jboss.ejb.client.RandomClusterNodeSelector@1b998709 has selected node master:app-oneA, in cluster ejb ClusterContext.getEJBReceiverContext: got non-null EJBReceiverContext for node master:app-oneA EJBLocator = StatelessEJBLocator{appName='jboss-ejb-multi-server-app-one', moduleName='ejb', distinctName='', beanName='AppOneBean', view='interface org.jboss.as.quickstarts.ejb.multi.server.app.AppOne'} EJBReceiver: acceptsModule: appName = jboss-ejb-multi-server-app-one : moduleName = ejb : distinctName = -> result = true ClusterContext: got EJBReceiverContext ReceiverInterceptor: got EJBReceiverContext (clusterContext) EJBClientInvocationContext: sendRequest(1) EJBClientInvocationContext: sendRequest(2) EJBClientInvocationContext: sendRequest(3) EJBClientInvocationContext: sendRequest(4) master:app-oneA ClientAppOne: end invocation ------------------ (C) ClientAppOne: begin invocation EJBInvocationHandler: sendRequestWithRetries(first) EJBClientInvocationContext: sendRequest(0) Affinity: NONE / Weak Affinity: clusterAffinity with cluster ejb ReceiverInterceptor: getting EJBReceiverContext Got cluster context for cluster name: ejb ClusterContext: getting EJBReceiverContext: excludedNodes (from invocationContext) = [] ClusterContext:getEJBReceiverContext(9): nodeManagers: = [master:app-oneA, master:app-oneB]available nodes: [master:app-oneA, master:app-oneB]; excludedNodes = [] ClusterContext:getEJBReceiverContext:org.jboss.ejb.client.RandomClusterNodeSelector@1b998709 has selected node master:app-oneB, in cluster ejb ClusterContext.getEJBReceiverContext: got non-null EJBReceiverContext for node master:app-oneB EJBLocator = StatelessEJBLocator{appName='jboss-ejb-multi-server-app-one', moduleName='ejb', distinctName='', beanName='AppOneBean', view='interface org.jboss.as.quickstarts.ejb.multi.server.app.AppOne'} Calling clusterTopologyMessageHandler Received a cluster node(s) addition message, for cluster named ejb with 2 nodes [ClusterNode{clusterName='ejb', nodeName='master:app-oneA', clientMappings=[ClientMapping{sourceNetworkAddress=/0:0:0:0:0:0:0:0, sourceNetworkMaskBits=0, destinationAddress='127.0.0.1', destinationPort=4547}], resolvedDestination=[Destination address=127.0.0.1, destination port=4547]}, ClusterNode{clusterName='ejb', nodeName='master:app-oneB', clientMappings=[ClientMapping{sourceNetworkAddress=/0:0:0:0:0:0:0:0, sourceNetworkMaskBits=0, destinationAddress='127.0.0.1', destinationPort=5147}], resolvedDestination=[Destination address=127.0.0.1, destination port=5147]}] Calling addNodesToClusterContext Calling addClusterNodes: Adding cluster node manager for node: master:app-oneA Skipping duplicate addition of cluster node Adding cluster node manager for node: master:app-oneB Skipping duplicate addition of cluster node EJBReceiver: acceptsModule: appName = jboss-ejb-multi-server-app-one : moduleName = ejb : distinctName = -> result = false ClusterContext.getEJBReceiverContext: Ignoring node master:app-oneB since it cannot handle appName=jboss-ejb-multi-server-app-one,moduleName=ejb,distinct-name= EJBReceiver: acceptsModule: appName = jboss-ejb-multi-server-app-one : moduleName = ejb : distinctName = -> result = false Retrying receiver selection in cluster ejb with excluded nodes [master:app-oneB] ClusterContext:getEJBReceiverContext(9): nodeManagers: = [master:app-oneA]available nodes: [master:app-oneA]; excludedNodes = [master:app-oneB] ClusterContext:getEJBReceiverContext:org.jboss.ejb.client.RandomClusterNodeSelector@1b998709 has selected node master:app-oneA, in cluster ejb ClusterContext.getEJBReceiverContext: got non-null EJBReceiverContext for node master:app-oneA EJBLocator = StatelessEJBLocator{appName='jboss-ejb-multi-server-app-one', moduleName='ejb', distinctName='', beanName='AppOneBean', view='interface org.jboss.as.quickstarts.ejb.multi.server.app.AppOne'} EJBReceiver: acceptsModule: appName = jboss-ejb-multi-server-app-one : moduleName = ejb : distinctName = -> result = true ClusterContext: got EJBReceiverContext ReceiverInterceptor: got EJBReceiverContext (clusterContext) EJBClientInvocationContext: sendRequest(1) EJBClientInvocationContext: sendRequest(2) EJBClientInvocationContext: sendRequest(3) EJBClientInvocationContext: sendRequest(4) master:app-oneA ClientAppOne: end invocation -------------------- (D) ClientAppOne: begin invocation EJBInvocationHandler: sendRequestWithRetries(first) EJBClientInvocationContext: sendRequest(0) Affinity: NONE / Weak Affinity: clusterAffinity with cluster ejb ReceiverInterceptor: getting EJBReceiverContext Got cluster context for cluster name: ejb ClusterContext: getting EJBReceiverContext: excludedNodes (from invocationContext) = [] ClusterContext:getEJBReceiverContext(9): nodeManagers: = [master:app-oneA]available nodes: [master:app-oneA]; excludedNodes = [] ClusterContext:getEJBReceiverContext:org.jboss.ejb.client.RandomClusterNodeSelector@1b998709 has selected node master:app-oneA, in cluster ejb ClusterContext.getEJBReceiverContext: got non-null EJBReceiverContext for node master:app-oneA EJBLocator = StatelessEJBLocator{appName='jboss-ejb-multi-server-app-one', moduleName='ejb', distinctName='', beanName='AppOneBean', view='interface org.jboss.as.quickstarts.ejb.multi.server.app.AppOne'} EJBReceiver: acceptsModule: appName = jboss-ejb-multi-server-app-one : moduleName = ejb : distinctName = -> result = true ClusterContext: got EJBReceiverContext ReceiverInterceptor: got EJBReceiverContext (clusterContext) EJBClientInvocationContext: sendRequest(1) EJBClientInvocationContext: sendRequest(2) EJBClientInvocationContext: sendRequest(3) EJBClientInvocationContext: sendRequest(4) master:app-oneA ClientAppOne: end invocation -------------------- much later ... ---------------- (E) ClientAppOne: begin invocation EJBInvocationHandler: sendRequestWithRetries(first) EJBClientInvocationContext: sendRequest(0) Affinity: NONE / Weak Affinity: clusterAffinity with cluster ejb ReceiverInterceptor: getting EJBReceiverContext Got cluster context for cluster name: ejb ClusterContext: getting EJBReceiverContext: excludedNodes (from invocationContext) = [] ClusterContext:getEJBReceiverContext(9): nodeManagers: = [master:app-oneA]available nodes: [master:app-oneA]; excludedNodes = [] ClusterContext:getEJBReceiverContext:org.jboss.ejb.client.RandomClusterNodeSelector@1b998709 has selected node master:app-oneA, in cluster ejb ClusterContext.getEJBReceiverContext: got non-null EJBReceiverContext for node master:app-oneA EJBLocator = StatelessEJBLocator{appName='jboss-ejb-multi-server-app-one', moduleName='ejb', distinctName='', beanName='AppOneBean', view='interface org.jboss.as.quickstarts.ejb.multi.server.app.AppOne'} EJBReceiver: acceptsModule: appName = jboss-ejb-multi-server-app-one : moduleName = ejb : distinctName = -> result = true ClusterContext: got EJBReceiverContext ReceiverInterceptor: got EJBReceiverContext (clusterContext) EJBClientInvocationContext: sendRequest(1) EJBClientInvocationContext: sendRequest(2) EJBClientInvocationContext: sendRequest(3) EJBClientInvocationContext: sendRequest(4) Received module availability report for 1 modules Registering module EJBModuleIdentifier{appName='jboss-ejb-multi-server-app-one', moduleName='ejb', distinctName=''} availability for receiver context EJBReceiverContext{clientContext=org.jboss.ejb.client.EJBClientContext@69bcf8f6, receiver=Remoting connection EJB receiver [connection=org.jboss.ejb.client.remoting.ConnectionPool$PooledConnection@786de56e,channel=jboss.ejb,nodename=master:app-oneB]} EJBReceiver: registerModule: appName = jboss-ejb-multi-server-app-one : moduleName = ejb : distinctName = master:app-oneA ClientAppOne: end invocation
Why was this not linked to the upgrade until *after* it had been merged? We can't sneakily put things into upgrades at this stage, everything needs to be transparent!
We believe that the fix for this bug causes a behavioral regression and we filed a bug 1204055 for it. For the same reason, we want to keep this bug ON_QA for the time being.
As stated in comment #16, this fix causes a regression, returning this BZ to assigned.
Carlo de Wolf <cdewolf> updated the status of jira EJBCLIENT-132 to Reopened
David Lloyd <david.lloyd> updated the status of jira EJBCLIENT-132 to Resolved
qa_acking as we have one-off BZ 1200406 ready for customer - one-off was verified on 2015-05-04
change is in repo.
Verified with EAP 6.4.1.CP.CR2.
Retroactively bulk-closing issues from released EAP 6.4 cummulative patches.