Description of problem:
This problem occurs only when two or more new storage nodes are deployed in parallel. Sequential deployment (waiting until the previous storage node has joined, i.e. its status is NORMAL) works correctly.

Version-Release number of selected component (if applicable):
Version : 3.2.0.GA Update 02
Build Number : 055b880:0620403

How reproducible:
Always

Steps to Reproduce:
1. JON server, storage node and agent are installed and running on server1
2. install second storage node (do not start it) on server2
3. install third storage node (do not start it) on server3
4. start both storage nodes on server2 and server3
5. run 'Manual Autodiscovery' operation on platform resources for server2 and server3

Actual results:
- both storage nodes are JOINING
- both storage nodes become NORMAL after a while
- both storage nodes write the following messages to rhq-storage.log at a very high rate (about one every millisecond):

INFO [HANDSHAKE-/10.16.23.185] 2014-06-09 08:24:07,355 OutboundTcpConnection.java (line 399) Handshaking version with /10.16.23.185
TRACE [HANDSHAKE-/10.16.23.185] 2014-06-09 08:24:07,356 OutboundTcpConnection.java (line 406) Cannot handshake version with /10.16.23.185

Note that the two new nodes are trying to handshake with each other: 10.16.23.185 is the IP of server3, and the storage log on server3 contains the same messages except that the IP points to server2.

Expected results:
Handshake should be successful

Additional info:
Trace-level exception:

INFO [HANDSHAKE-/10.16.23.185] 2014-06-09 08:24:07,344 OutboundTcpConnection.java (line 399) Handshaking version with /10.16.23.185
TRACE [HANDSHAKE-/10.16.23.185] 2014-06-09 08:24:07,344 OutboundTcpConnection.java (line 406) Cannot handshake version with /10.16.23.185
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:197)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
    at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:203)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
    at java.io.InputStream.read(InputStream.java:101)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:81)
    at java.io.DataInputStream.readInt(DataInputStream.java:387)
    at org.apache.cassandra.net.OutboundTcpConnection$1.run(OutboundTcpConnection.java:400)
DEBUG [WRITE-/10.16.23.185] 2014-06-09 08:24:07,344 OutboundTcpConnection.java (line 338) Target max version is -2147483648; no version information yet, will retry
TRACE [WRITE-/10.16.23.185] 2014-06-09 08:24:07,345 MessagingService.java (line 826) Assuming current protocol version for /10.16.23.185
INFO [HANDSHAKE-/10.16.23.185] 2014-06-09 08:24:07,345 OutboundTcpConnection.java (line 399) Handshaking version with /10.16.23.185
TRACE [HANDSHAKE-/10.16.23.185] 2014-06-09 08:24:07,345 OutboundTcpConnection.java (line 406) Cannot handshake version with /10.16.23.185
java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.cassandra.net.OutboundTcpConnection$1.run(OutboundTcpConnection.java:400)

The issue doesn't disappear even when both storage nodes are restarted.
I think the safe, conservative approach is to allow only one node to be deployed at a time, in order to avoid problems like schema disagreement. There is currently no mechanism in place to prevent multiple deployments from running simultaneously. I considered implementing some optimistic locking in 3.2.0, but there was not enough time. The locking can probably be done for 3.3.0 because it also affects bug 1102887 and bug 1103841.
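To make the optimistic-locking idea concrete, here is a minimal sketch of how a single "deployment slot" guarded by a version counter could reject a second concurrent deployment. It is not the RHQ/JON implementation: `DeploymentSlot`, `DeploymentRegistry`, `try_begin_deployment` and `finish_deployment` are hypothetical names, and the in-memory registry only stands in for a database row with a version column.

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DeploymentSlot:
    """Snapshot of a cluster-wide deployment record (hypothetical)."""
    version: int
    deploying_node: Optional[str]


class DeploymentRegistry:
    """In-memory stand-in for a database row guarded by a version column."""

    def __init__(self) -> None:
        self._slot = DeploymentSlot(version=0, deploying_node=None)
        # The mutex only emulates the atomicity of a single conditional UPDATE.
        self._mutex = threading.Lock()

    def read(self) -> DeploymentSlot:
        with self._mutex:
            return self._slot

    def compare_and_set(self, expected_version: int, node: Optional[str]) -> bool:
        """Write only if the record is unchanged since it was read."""
        with self._mutex:
            if self._slot.version != expected_version:
                return False  # someone else updated the record first
            self._slot = DeploymentSlot(expected_version + 1, node)
            return True


def try_begin_deployment(registry: DeploymentRegistry, node: str) -> bool:
    """Claim the slot for `node`; fail if another deployment is in progress."""
    slot = registry.read()
    if slot.deploying_node is not None:
        return False  # another node is still joining
    return registry.compare_and_set(slot.version, node)


def finish_deployment(registry: DeploymentRegistry, node: str) -> bool:
    """Release the slot once `node` has reached NORMAL status."""
    slot = registry.read()
    if slot.deploying_node != node:
        return False
    return registry.compare_and_set(slot.version, None)
```

With this shape, a deployment request for server3 that arrives while server2 is still joining would be refused, and the caller could retry after server2 is released. In a real system the version check would need to be part of one atomic database update rather than an in-process lock.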
Bumping the target release due to time constraints. Work has been started though in the storage_workflow branch.
JBoss ON is coming to the end of its product life cycle. For more information regarding this transition, see https://access.redhat.com/articles/3827121. This bug report/request is being closed. If you feel this issue should not be closed or requires further review, please create a new bug report against the latest supported JBoss ON 3.3 version.