Bug 959638

Summary: EjbClient: No cluster node manager found for node XY during server restart
Product: [JBoss] JBoss Enterprise Application Platform 6
Reporter: Jitka Kozana <jkudrnac>
Component: EJB
Assignee: Tomas Hofman <thofman>
Status: CLOSED CURRENTRELEASE
QA Contact: Jitka Kozana <jkudrnac>
Severity: medium
Priority: unspecified
Version: 6.1.0, 6.1.1
CC: cdewolf, jkudrnac, jmartisk, lthon, rjanik, rsvoboda, thofman
Target Milestone: CR1
Target Release: EAP 6.4.0
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Bug Blocks: 1198231    
Attachments: jboss-ejb-client.properties used in the test (flags: none)

Description Jitka Kozana 2013-05-04 13:31:25 UTC
Situation: 4-node cluster with one node failing at a time; throughout the test, clients access a clustered stateless EJB3 bean. In this particular case the failure type simulates a server crash: the JVM is killed with kill -9.
While the killed server was starting up again, the client logged this error:
16:05:29,966 ERROR [org.jboss.ejb.client.ClusterContext] (ejb-client-cluster-node-connection-creation-2-thread-17) Cannot create EJBReceiver since no cluster node manager found for node perf20 in cluster context for cluster ejb

Here is the same log output with more context, captured while the server was starting:
16:05:29,966 INFO  [org.jboss.ejb.client.remoting.NoSuchEJBExceptionResponseHandler] (Remoting "config-based-ejb-client-endpoint" task-2) Retrying invocation which failed on node perf20 with exception:
javax.ejb.NoSuchEJBException: No such EJB[appname=clusterbench-ee6,modulename=clusterbench-ee6-ejb,distinctname=,beanname=RemoteStatelessSBImpl]
	at org.jboss.ejb.client.remoting.NoSuchEJBExceptionResponseHandler.processMessage(NoSuchEJBExceptionResponseHandler.java:64)
	at org.jboss.ejb.client.remoting.ChannelAssociation.processResponse(ChannelAssociation.java:366)
	at org.jboss.ejb.client.remoting.ChannelAssociation$ResponseReceiver.handleMessage(ChannelAssociation.java:458)
	at org.jboss.remoting3.remote.RemoteConnectionChannel$4.run(RemoteConnectionChannel.java:373)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
	at java.lang.Thread.run(Thread.java:662)
16:05:29,966 ERROR [org.jboss.ejb.client.ClusterContext] (ejb-client-cluster-node-connection-creation-2-thread-17) Cannot create EJBReceiver since no cluster node manager found for node perf20 in cluster context for cluster ejb
16:05:29,974 INFO  [org.jboss.ejb.client.remoting.NoSuchEJBExceptionResponseHandler] (Remoting "config-based-ejb-client-endpoint" task-2) Retrying invocation which failed on node perf20 with exception:
javax.ejb.NoSuchEJBException: No such EJB[appname=clusterbench-ee6,modulename=clusterbench-ee6-ejb,distinctname=,beanname=RemoteStatelessSBImpl]
	at org.jboss.ejb.client.remoting.NoSuchEJBExceptionResponseHandler.processMessage(NoSuchEJBExceptionResponseHandler.java:64)
	at org.jboss.ejb.client.remoting.ChannelAssociation.processResponse(ChannelAssociation.java:366)
	at org.jboss.ejb.client.remoting.ChannelAssociation$ResponseReceiver.handleMessage(ChannelAssociation.java:458)
	at org.jboss.remoting3.remote.RemoteConnectionChannel$4.run(RemoteConnectionChannel.java:373)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
	at java.lang.Thread.run(Thread.java:662)
16:05:29,974 ERROR [org.jboss.ejb.client.ClusterContext] (ejb-client-cluster-node-connection-creation-2-thread-18) Cannot create EJBReceiver since no cluster node manager found for node perf20 in cluster context for cluster ejb
16:05:29,975 INFO  [org.jboss.ejb.client.remoting.NoSuchEJBExceptionResponseHandler] (Remoting "config-based-ejb-client-endpoint" task-2) Retrying invocation which failed on node perf20 with exception:
javax.ejb.NoSuchEJBException: No such EJB[appname=clusterbench-ee6,modulename=clusterbench-ee6-ejb,distinctname=,beanname=RemoteStatelessSBImpl]
	at org.jboss.ejb.client.remoting.NoSuchEJBExceptionResponseHandler.processMessage(NoSuchEJBExceptionResponseHandler.java:64)
	at org.jboss.ejb.client.remoting.ChannelAssociation.processResponse(ChannelAssociation.java:366)
	at org.jboss.ejb.client.remoting.ChannelAssociation$ResponseReceiver.handleMessage(ChannelAssociation.java:458)
	at org.jboss.remoting3.remote.RemoteConnectionChannel$4.run(RemoteConnectionChannel.java:373)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
	at java.lang.Thread.run(Thread.java:662)
16:05:29,975 INFO  [org.jboss.ejb.client.remoting.NoSuchEJBExceptionResponseHandler] (Remoting "config-based-ejb-client-endpoint" task-2) Retrying invocation which failed on node perf20 with exception:
javax.ejb.NoSuchEJBException: No such EJB[appname=clusterbench-ee6,modulename=clusterbench-ee6-ejb,distinctname=,beanname=RemoteStatelessSBImpl]
	at org.jboss.ejb.client.remoting.NoSuchEJBExceptionResponseHandler.processMessage(NoSuchEJBExceptionResponseHandler.java:64)
	at org.jboss.ejb.client.remoting.ChannelAssociation.processResponse(ChannelAssociation.java:366)
	at org.jboss.ejb.client.remoting.ChannelAssociation$ResponseReceiver.handleMessage(ChannelAssociation.java:458)
	at org.jboss.remoting3.remote.RemoteConnectionChannel$4.run(RemoteConnectionChannel.java:373)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)

During the whole test there were 4 server crashes and restarts (one per node), but only 3 occurrences of the above-mentioned error. The cluster nodes are perf18, perf19, perf20 and perf21; the error was seen only for perf20 (two occurrences) and perf18 (one occurrence).
I did not find anything suspicious in the server.log of perf20.
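To reproduce per-node counts like the ones above from a client log, a small helper can tally the ClusterContext error by node name. This is a hypothetical sketch (`NodeManagerErrorTally` is not part of any JBoss tool); it only assumes the message format quoted in this report.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts "no cluster node manager found" errors per node in a client log.
final class NodeManagerErrorTally {
    // Matches the ClusterContext error message quoted in this report; group 1 is the node name.
    private static final Pattern MISSING_NODE_MANAGER = Pattern.compile(
            "no cluster node manager found for node (\\S+) in cluster context for cluster (\\S+)");

    private NodeManagerErrorTally() {
    }

    /** Returns node name -> number of matching lines, sorted by node name. */
    static Map<String, Integer> countByNode(Iterable<String> logLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : logLines) {
            Matcher m = MISSING_NODE_MANAGER.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java NodeManagerErrorTally client.log
        // ISO-8859-1 never fails to decode, so odd bytes in the log do not abort the scan.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        System.out.println(countByNode(lines));
    }
}
```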

Cache: REPL_ASYNC

Versions: EAP 6.1.0.ER6, ejb-client 1.0.19.Final

Link to Hudson job:
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-failover-ejb-ejbstateless-jvmkill-repl-async/15/

Server log:
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-failover-ejb-ejbstateless-jvmkill-repl-async/15/artifact/report/config/jboss-perf20/server.log

Comment 1 Jaikiran Pai 2013-05-05 02:57:42 UTC
Are the clients standalone applications, or do the server nodes act as clients themselves? What do the jboss-ejb-client.properties and jboss-ejb-client.xml (if any) look like? I couldn't find them in the "configs" that are published for that job.

Comment 2 Jitka Kozana 2013-05-05 07:17:33 UTC
The clients are standalone clients. 

I am attaching the jboss-ejb-client.properties.

Comment 3 Jitka Kozana 2013-05-05 07:18:43 UTC
Created attachment 743698
jboss-ejb-client.properties used in the test
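The attached file itself is not reproduced here. For readers unfamiliar with the format, the sketch below loads an illustrative clustered standalone-client configuration using EAP 6 client property keys. The connection names, host names and port 4447 are assumptions, not the values used in the test. It also shows where a client refers to a cluster by name (here `ejb`, the same name that appears in the error message).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Illustrative only: NOT the attached file. Connection names, hosts and ports are assumptions.
final class EjbClientPropertiesExample {
    static final String EXAMPLE = String.join("\n",
            "remote.connectionprovider.create.options.org.xnio.Options.SSL_ENABLED=false",
            "remote.connections=perf18,perf19",
            "remote.connection.perf18.host=perf18",
            "remote.connection.perf18.port=4447",
            "remote.connection.perf19.host=perf19",
            "remote.connection.perf19.port=4447",
            // Options the client uses when connecting to nodes it learns about for cluster "ejb".
            "remote.clusters=ejb",
            "remote.cluster.ejb.connect.options.org.xnio.Options.SASL_POLICY_NOANONYMOUS=false");

    /** Parses properties text the same way a .properties file is read. */
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Splits a comma-separated value such as remote.connections into trimmed entries. */
    static List<String> listValue(Properties props, String key) {
        List<String> values = new ArrayList<>();
        for (String part : props.getProperty(key, "").split(",")) {
            if (!part.trim().isEmpty()) {
                values.add(part.trim());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        Properties props = load(EXAMPLE);
        System.out.println("connections: " + listValue(props, "remote.connections"));
        System.out.println("clusters: " + listValue(props, "remote.clusters"));
    }
}
```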

Comment 4 Jitka Kozana 2013-05-05 08:09:12 UTC
I would like to add: this is not a release blocker. 

This error shows up only *sometimes* in the client log; apart from the logged error, the client does not seem to be affected in any way.

Once the cluster node is fully up (i.e. startup has completed), the invocations succeed.
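The log excerpts above are consistent with this: each `NoSuchEJBException` from the node that is still starting is followed by an INFO line saying the invocation is being retried. The simplified model below (hypothetical names; not the jboss-ejb-client implementation) shows why the caller still gets a result even though one node rejected the call.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Simplified, hypothetical model of client-side retry; not the jboss-ejb-client implementation.
final class RetryOnOtherNode {
    /** Stands in for javax.ejb.NoSuchEJBException thrown by a node whose deployment is not available yet. */
    static final class NoSuchEjbException extends RuntimeException {
        NoSuchEjbException(String message) {
            super(message);
        }
    }

    /** Tries candidate nodes in order and moves on to the next one when a node reports "no such EJB". */
    static <R> R invoke(List<String> candidateNodes, Function<String, R> call) {
        for (String node : candidateNodes) {
            try {
                return call.apply(node);
            } catch (NoSuchEjbException e) {
                // Mirrors the INFO "Retrying invocation which failed on node ..." lines in the client log.
                System.out.println("Retrying invocation which failed on node " + node + ": " + e.getMessage());
            }
        }
        throw new IllegalStateException("No node could serve the invocation: " + candidateNodes);
    }

    public static void main(String[] args) {
        // perf20 is still starting, so its bean is not deployed yet.
        String result = invoke(Arrays.asList("perf20", "perf21"), node -> {
            if (node.equals("perf20")) {
                throw new NoSuchEjbException("No such EJB[beanname=RemoteStatelessSBImpl]");
            }
            return "served by " + node;
        });
        System.out.println(result);
    }
}
```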

Comment 5 Rostislav Svoboda 2013-05-05 08:13:09 UTC
Adding standard flags; setting jboss-eap-6.1.0 to ? should ensure this won't get lost for EAP 6.1.x or 6.2.x. BZ should automatically migrate open issues to the new flag after EAP 6.1.0 is released.

Comment 6 Jitka Kozana 2013-05-15 06:05:42 UTC
This was seen again during 6.1.0.ER8 testing:

13:21:49,431 INFO  [org.jboss.ejb.client.remoting.NoSuchEJBExceptionResponseHandler] (Remoting "config-based-ejb-client-endpoint" task-2) Retrying invocation which failed on node perf18 with exception:
javax.ejb.NoSuchEJBException: No such EJB[appname=clusterbench-ee6,modulename=clusterbench-ee6-ejb,distinctname=,beanname=RemoteStatelessSBImpl]
	at org.jboss.ejb.client.remoting.NoSuchEJBExceptionResponseHandler.processMessage(NoSuchEJBExceptionResponseHandler.java:64)
	at org.jboss.ejb.client.remoting.ChannelAssociation.processResponse(ChannelAssociation.java:366)
	at org.jboss.ejb.client.remoting.ChannelAssociation$ResponseReceiver.handleMessage(ChannelAssociation.java:458)
	at org.jboss.remoting3.remote.RemoteConnectionChannel$4.run(RemoteConnectionChannel.java:373)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
	at java.lang.Thread.run(Thread.java:662)
13:21:49,432 ERROR [org.jboss.ejb.client.ClusterContext] (ejb-client-cluster-node-connection-creation-2-thread-10) Cannot create EJBReceiver since no cluster node manager found for node perf18 in cluster context for cluster ejb
13:21:49,432 ERROR [org.jboss.ejb.client.ClusterContext] (ejb-client-cluster-node-connection-creation-2-thread-11) Cannot create EJBReceiver since no cluster node manager found for node perf18 in cluster context for cluster ejb
13:21:49,432 INFO  [org.jboss.ejb.client.remoting.NoSuchEJBExceptionResponseHandler] (Remoting "config-based-ejb-client-endpoint" task-2) Retrying invocation which failed on node perf18 with exception:
javax.ejb.NoSuchEJBException: No such EJB[appname=clusterbench-ee6,modulename=clusterbench-ee6-ejb,distinctname=,beanname=RemoteStatelessSBImpl]
	at org.jboss.ejb.client.remoting.NoSuchEJBExceptionResponseHandler.processMessage(NoSuchEJBExceptionResponseHandler.java:64)
	at org.jboss.ejb.client.remoting.ChannelAssociation.processResponse(ChannelAssociation.java:366)
	at org.jboss.ejb.client.remoting.ChannelAssociation$ResponseReceiver.handleMessage(ChannelAssociation.java:458)
	at org.jboss.remoting3.remote.RemoteConnectionChannel$4.run(RemoteConnectionChannel.java:373)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
	at java.lang.Thread.run(Thread.java:662)

Besides the test where it was first seen ([1]), we now also saw it in a test where the failure type was graceful shutdown and the clients were accessing an SFSB [2].

[1] https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-failover-ejb-ejbstateless-jvmkill-repl-async/18/console-perf17
[2] https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-failover-ejb-ejbremote-shutdown-repl-sync/24/console-perf17/

Comment 7 Jitka Kozana 2013-08-22 07:12:52 UTC
Seen again during EAP 6.1.1.ER7 testing, even with graceful shutdown.

Link to log:

https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-failover-ejb-ejbremote-shutdown-repl-sync/29/console-perf17

Link to job (the server logs and configuration are archived as build artifacts):

https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-failover-ejb-ejbremote-shutdown-repl-sync/29/

Comment 9 Dimitris Andreadis 2013-10-24 18:27:25 UTC
Assigning jpai EJB issues to david.lloyd. Please re-assign to Cheng or others as needed.

Comment 10 Dominik Pospisil 2014-09-10 12:42:23 UTC
Still reproducible on 6.3.0.ER7.

Comment 11 Dominik Pospisil 2014-09-10 13:04:26 UTC
Could you please provide working CI links?

Comment 12 Tomas Hofman 2015-02-20 13:40:03 UTC
This is valid behaviour.

The error message is printed in the following situation:

1) The EJB client receives a message with the current list of nodes.
2) ClusterTopologyMessageHandler creates node managers and adds them to the ClusterContext.
3) Asynchronous association tasks are created for the new node managers.

but then:

4) The node is killed.
5) The EJB client receives a message to remove the node, so the corresponding node manager is removed from the context.
6) The association task for that node manager finally executes and checks whether the node manager is still in the context.
7) It is not, so the task prints the error message and terminates.
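The sequence in steps 1-7 can be modelled with a small, deterministic sketch. The class and method names are hypothetical and do not mirror the jboss-ejb-client internals. The association tasks are queued and only run later, so removing a node between steps 3 and 6 reproduces the error for that node alone, while a node that stays in the cluster is associated normally.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executor;

// Hypothetical, simplified model of the race described in steps 1-7; names do not match jboss-ejb-client.
final class ClusterContextRace {
    private final Map<String, Object> nodeManagers = new ConcurrentHashMap<>(); // node name -> node manager stand-in
    private final Executor associationExecutor;
    final List<String> log = new CopyOnWriteArrayList<>();

    ClusterContextRace(Executor associationExecutor) {
        this.associationExecutor = associationExecutor;
    }

    /** Steps 1-3: a topology message registers node managers and schedules asynchronous association tasks. */
    void onTopology(List<String> nodes) {
        for (String node : nodes) {
            nodeManagers.put(node, new Object());
            associationExecutor.execute(() -> associate(node));
        }
    }

    /** Step 5: a node-removal message unregisters the node manager. */
    void onNodeRemoved(String node) {
        nodeManagers.remove(node);
    }

    /** Steps 6-7: the association task only proceeds if the node manager is still registered. */
    private void associate(String node) {
        if (!nodeManagers.containsKey(node)) {
            log.add("ERROR Cannot create EJBReceiver since no cluster node manager found for node " + node);
            return;
        }
        log.add("INFO EJBReceiver created for node " + node);
    }

    public static void main(String[] args) {
        Queue<Runnable> pending = new ArrayDeque<>();
        ClusterContextRace ctx = new ClusterContextRace(pending::add); // tasks wait until drained below
        ctx.onTopology(Arrays.asList("perf18", "perf20"));
        ctx.onNodeRemoved("perf20"); // steps 4-5: perf20 goes away before its task runs
        while (!pending.isEmpty()) {
            pending.poll().run();
        }
        ctx.log.forEach(System.out::println);
    }
}
```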

As this is an expected situation, I propose downgrading the error message to a warning or removing it altogether.
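A minimal sketch of the proposal, assuming `java.util.logging` in place of the logging API used by the client library: the association task treats a missing node manager as an expected condition, logs at WARNING rather than at an error level, and returns. `AssociationTask` and `tryAssociate` are hypothetical names, and the sketch does not claim to reproduce the change shipped in 1.0.30.Final.

```java
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical association task illustrating the proposal; not the jboss-ejb-client class.
final class AssociationTask implements Runnable {
    private static final Logger LOG = Logger.getLogger(AssociationTask.class.getName());

    private final Map<String, ?> nodeManagers; // node name -> node manager, shared with the cluster context
    private final String nodeName;
    private final String clusterName;

    AssociationTask(Map<String, ?> nodeManagers, String nodeName, String clusterName) {
        this.nodeManagers = nodeManagers;
        this.nodeName = nodeName;
        this.clusterName = clusterName;
    }

    /** Returns true if the node manager is still registered and association may proceed. */
    boolean tryAssociate() {
        if (!nodeManagers.containsKey(nodeName)) {
            // Expected race: the node left the cluster before this task ran, so warn instead of logging an error.
            LOG.log(Level.WARNING,
                    "Cannot create EJBReceiver since no cluster node manager found for node {0} in cluster context for cluster {1}",
                    new Object[] {nodeName, clusterName});
            return false;
        }
        // Creating the EJBReceiver for the node would happen here; omitted in this sketch.
        return true;
    }

    @Override
    public void run() {
        tryAssociate();
    }
}
```

Removing the message altogether, the other option mentioned above, would amount to dropping the `LOG.log` call or lowering it to `Level.FINE`.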

Comment 13 Tomas Hofman 2015-03-09 11:02:01 UTC
Fixed in 1.0.30.Final.

Comment 15 Ladislav Thon 2015-04-01 07:41:24 UTC
Verified with EAP 6.4.0.CR2.