https://issues.jboss.org/browse/ISPN-2781
http://www.qa.jboss.com/~mlinhard/hyperion3/run0051-elas-16-32-ER10/report/loganalysis/server/categories/cat8_entry0.txt
Galder Zamarreño <galder.zamarreno> made a comment on jira ISPN-2781 Assigning it to Dan. This seems related to the changes in ISPN-2632 and ISPN-2738, whereby server addresses that can no longer be found in the Hot Rod topology cache are still present in the hash-id/member data of the cache's distribution manager. This can result in null server address instances, which is where the NPE comes from. [~dan.berindei], this is probably a timing issue between the crashed-member detection listener, which removes nodes from the topology cache based on JGroups view changes, and the hash topology header generation logic. This is probably left over from when the transport view id was considered the main indicator of whether a new view had to be sent to clients; with the switch to the cache topology id, this needs adjusting. I still find the new logic a bit fragile, particularly checks like this: https://github.com/infinispan/infinispan/blob/master/server/hotrod/src/main/scala/org/infinispan/server/hotrod/AbstractEncoder1x.scala#L180 Earlier versions of this code kept track of a volatile view id that only got updated once we knew the cache had the right contents (i.e. once a node had been added or removed); only at that point did the view id change and a new view get detected and sent. The new logic, which relies on double-checking whether the cache has all the expected contents whenever a response has to be generated, is IMO both more wasteful (needless collection-contents checks) and more fragile (there's no view/barrier/latch that marks that a new view is available and that the raw data used to generate the view is reliable).
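The failure mode described above can be sketched in isolation. This is a hedged illustration only: `encodeHashTopology`, the string-keyed maps, and all names below are hypothetical stand-ins, not the actual Infinispan API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the race: the consistent hash still lists a member whose address
// has already been removed from the topology cache by the crashed-member
// listener, so the cache lookup returns null, and naively dereferencing that
// null while writing the hash topology header produces the reported NPE.
public class TopologyRaceSketch {

    // chMembers stands in for the distribution manager's CH members;
    // addressCache stands in for the (concurrently modified) topology cache.
    static String encodeHashTopology(List<String> chMembers,
                                     Map<String, String> addressCache) {
        StringBuilder response = new StringBuilder();
        for (String member : chMembers) {
            String serverAddress = addressCache.get(member); // may be null
            if (serverAddress == null) {
                // Without this guard, any dereference of serverAddress would
                // NPE for a member that left between reading the CH and
                // performing this lookup.
                continue;
            }
            response.append(serverAddress).append(';');
        }
        return response.toString();
    }

    public static void main(String[] args) {
        Map<String, String> addressCache = new HashMap<>();
        addressCache.put("node1", "host1:11222");
        addressCache.put("node2", "host2:11222");
        // node3 crashed: still in the CH, already gone from the address cache.
        System.out.println(encodeHashTopology(
                List.of("node1", "node2", "node3"), addressCache));
        // → host1:11222;host2:11222;
    }
}
```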
Dan Berindei <dberinde> made a comment on jira ISPN-2781 Galderz, yes, this is again a problem with the topology cache being modified while we are writing a topology response to the client. Earlier versions of the code had the same problem: we don't hold any lock while generating the client response, so anyone can modify the topology cache while we are generating it, and this has always been the case. Note that a new client always requires a topology response, regardless of when the topology last changed. At one point I did add an extra check to skip a node's hash from the topology response if it wasn't in the topology cache any more, but I thought (wrongly) that the extra membership check you pointed out made that situation impossible, so I deleted it. I'm going to take a snapshot of the topology cache before generating the response and use that copy for everything. Topology changes are very rare anyway, so it shouldn't make a difference performance-wise, and it will make the code easier to reason about.
Dan Berindei <dberinde> made a comment on jira ISPN-2781 Copy the address cache into a regular map before generating the topology response, to protect against concurrent changes from other threads. Also send topology responses even if not all CH members are in the address cache; this allows the client to remove leaving Hot Rod servers from its CH.
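The defensive-copy fix described in the comment above could look roughly like this. It is a hedged sketch under the same assumptions as before: `writeTopologyResponse` and the string-valued map are illustrative placeholders, not Infinispan code.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the fix: snapshot the shared address cache into a regular
// HashMap once, then generate the entire topology response from that local
// copy, so concurrent joins/leaves in the shared cache cannot change the
// data mid-response.
public class TopologySnapshot {

    static String writeTopologyResponse(Map<String, String> sharedAddressCache) {
        // One-shot copy; after this line, other threads may mutate the shared
        // cache freely without affecting this response.
        Map<String, String> snapshot = new HashMap<>(sharedAddressCache);
        StringBuilder response = new StringBuilder();
        for (String serverAddress : snapshot.values()) {
            response.append(serverAddress).append(';');
        }
        return response.toString();
    }

    public static void main(String[] args) {
        Map<String, String> shared = new HashMap<>();
        shared.put("node1", "host1:11222");
        System.out.println(writeTopologyResponse(shared));
        // → host1:11222;
    }
}
```

Since topology changes are rare, the cost of the extra copy is negligible, and every lookup during encoding now reads from data that no other thread can touch.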
Ran one elasticity test: www.qa.jboss.com/~mlinhard/hyperion3/run0053-elas-16-32-ER12/report/loganalysis/server/ — the issue is not present in JDG 6.1.0.ER12.