Bug 907496 - NPE in AbstractTopologyAwareEncoder1x
Summary: NPE in AbstractTopologyAwareEncoder1x
Keywords:
Status: VERIFIED
Alias: None
Product: JBoss Data Grid 6
Classification: JBoss
Component: Infinispan
Version: 6.1.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ER12
Target Release: 6.1.0
Assignee: Tristan Tarrant
QA Contact: Nobody
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-02-04 14:55 UTC by Michal Linhard
Modified: 2014-03-17 04:04 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Embargoed:




Links:
Red Hat Issue Tracker ISPN-2781 (Major, Resolved): NPE in AbstractTopologyAwareEncoder1x (last updated 2013-02-26 12:51:47 UTC)

Description Michal Linhard 2013-02-04 14:55:34 UTC
https://issues.jboss.org/browse/ISPN-2781

Comment 2 JBoss JIRA Server 2013-02-14 10:41:40 UTC
Galder Zamarreño <galder.zamarreno> made a comment on jira ISPN-2781

Assigning it to Dan. This seems related to the changes in ISPN-2632 and ISPN-2738, whereby server addresses that cannot be found in the Hot Rod topology cache are still present in the cache distribution manager's hashId/member data. This can result in null server address instances, which is where the NPE comes from.
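
For illustration, a minimal self-contained Java sketch of that failure mode (the names TopologyNpeSketch, ServerAddress, topologyCache and chMembers are hypothetical, not Infinispan's real classes): the consistent hash still lists a member whose server address has already been removed from the topology cache, and dereferencing the resulting null is the reported NPE.

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Hypothetical, minimal model of the failure mode described above.
    public class TopologyNpeSketch {
        record ServerAddress(String host, int port) {}

        public static void main(String[] args) {
            // Members as seen by the distribution manager's consistent hash:
            // "node2" is still listed.
            List<String> chMembers = List.of("node1", "node2");

            // Hot Rod topology cache: "node2" was already removed by the
            // crashed-member listener after a JGroups view change.
            Map<String, ServerAddress> topologyCache = new HashMap<>();
            topologyCache.put("node1", new ServerAddress("192.168.0.1", 11222));

            for (String member : chMembers) {
                ServerAddress server = topologyCache.get(member); // null for "node2"
                System.out.println(server.host()); // NullPointerException here
            }
        }
    }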

[~dan.berindei], this is probably a timing issue between the crashed-member detection listener, which removes nodes from the topology cache based on JGroups view changes, and the hash topology header generation logic. The logic is probably left over from when the transport view id was the main indicator of whether a new view had to be sent to clients. With the switch to the cache topology id, this needs adjusting.

I still find this new logic a bit fragile, particularly checks like this:
https://github.com/infinispan/infinispan/blob/master/server/hotrod/src/main/scala/org/infinispan/server/hotrod/AbstractEncoder1x.scala#L180
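
To make the criticism concrete, here is an illustrative Java rendering of the kind of check meant (the real code is Scala at the link above; this sketch only approximates its shape and uses hypothetical names):

    import java.util.List;
    import java.util.Map;

    // Illustrative rendering of a membership double-check, not a copy of the
    // linked Scala code.
    public class ViewReadyCheck {
        // Before encoding a response, verify the address cache still contains
        // every CH member. Fragile: the cache can change between this check
        // and the moment the response is actually written.
        static boolean viewReady(List<String> chMembers, Map<String, ?> addressCache) {
            return addressCache.keySet().containsAll(chMembers);
        }
    }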

Earlier versions of this code kept track of a volatile view id that only got updated once we knew the cache had the right contents (i.e. once a node had been added or removed); only at that point did the view id change and a new view get detected and sent.
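
A hedged sketch of that earlier pattern (hypothetical names; what matters is the ordering guarantee, not the exact historical code):

    // Illustrative volatile-view-id pattern: a single writer (the topology
    // listener) bumps the id only after the cache is known to be complete, so
    // readers see a new view only when its backing data is already consistent.
    public class ViewIdTracker {
        private volatile int viewId = 0;

        // Called by the listener thread only after a node add/remove has fully
        // updated the topology cache (single writer, so ++ on a volatile is safe).
        void onTopologyCacheUpdated() { viewId++; }

        // The encoder compares against the client's last-seen id; a change
        // means a complete view is ready to be sent.
        boolean needsTopologyUpdate(int clientViewId) { return clientViewId != viewId; }
    }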

The new logic, which relies on double-checking whether the cache has all the expected contents whenever a response has to be generated, is IMO more wasteful (needless collection content checks) and more fragile (there is no view/barrier/latch that marks that a new view is available and that the raw data used to generate the view is reliable).

Comment 3 JBoss JIRA Server 2013-02-15 13:41:04 UTC
Dan Berindei <dberinde> made a comment on jira ISPN-2781

Galderz, yes, this is again a problem with the topology cache being modified while we are writing a topology response to the client.

Earlier versions of the code had the same problem: we don't hold any lock while generating the client response, so any thread can modify the topology cache while we are generating it; this has always been the case. Note that a new client always requires a topology response, regardless of when the topology last changed.

At one point I did add an extra check to skip a node's hash in the topology response if it was no longer in the topology cache, but I thought (wrongly) that the extra membership check you pointed out made that situation impossible, so I deleted it.

I'm going to take a snapshot of the topology cache before generating the response and use that copy for everything. Topology changes are very rare anyway, so it shouldn't make a difference performance-wise, and it will make the code easier to reason about.
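
A minimal sketch of what that snapshot approach could look like (hypothetical names, not the actual patch):

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Illustrative snapshot approach: copy the shared address cache once, then
    // build the entire topology response from the stable copy, skipping members
    // that have already left (which also lets clients drop them).
    public class SnapshotResponseSketch {
        record ServerAddress(String host, int port) {}

        static String encodeTopology(List<String> chMembers,
                                     Map<String, ServerAddress> addressCache) {
            // One-time copy: concurrent updates to addressCache after this
            // point cannot introduce nulls mid-encoding.
            Map<String, ServerAddress> snapshot = new HashMap<>(addressCache);
            StringBuilder response = new StringBuilder();
            for (String member : chMembers) {
                ServerAddress server = snapshot.get(member);
                if (server == null)
                    continue; // node already left: omit it from the response
                response.append(server.host()).append(':')
                        .append(server.port()).append(' ');
            }
            return response.toString().trim();
        }

        public static void main(String[] args) {
            Map<String, ServerAddress> cache = new ConcurrentHashMap<>();
            cache.put("node1", new ServerAddress("192.168.0.1", 11222));
            System.out.println(encodeTopology(List.of("node1", "node2"), cache));
        }
    }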

Comment 4 JBoss JIRA Server 2013-02-19 10:58:29 UTC
Dan Berindei <dberinde> made a comment on jira ISPN-2781

Copy the address cache into a regular map before generating the topology response, to protect from changes from other threads.

Send topology responses even if not all CH members are in the address cache. This allows the client to remove leaving HotRod servers from its CH.

Comment 5 Michal Linhard 2013-02-26 12:51:42 UTC
Ran one elasticity test:
www.qa.jboss.com/~mlinhard/hyperion3/run0053-elas-16-32-ER12/report/loganalysis/server/

The NPE is not present in JDG 6.1.0.ER12.

