https://issues.jboss.org/browse/ISPN-2781
http://www.qa.jboss.com/~mlinhard/hyperion3/run0051-elas-16-32-ER10/report/loganalysis/server/categories/cat8_entry0.txt
Galder Zamarreño <galder.zamarreno> made a comment on jira ISPN-2781 Assigning it to Dan. This seems related to the changes in ISPN-2632 and ISPN-2738, whereby server addresses that can no longer be found in the Hot Rod topology cache are still present in the hash-id/member data of the cache's distribution manager. This can result in null server address instances, which is where the NPE comes from. [~dan.berindei], this is probably a timing issue between the crashed-member detection listener, which removes nodes from the topology cache based on JGroups view changes, and the hash topology header generation logic. This is probably left over from when the transport view id was considered the main indicator of whether a new view had to be sent to clients; with the switch to the cache topology id, this needs adjusting. I still find the new logic a bit fragile, particularly checks like this: https://github.com/infinispan/infinispan/blob/master/server/hotrod/src/main/scala/org/infinispan/server/hotrod/AbstractEncoder1x.scala#L180 Earlier versions of this code kept track of a volatile view id that only got updated once we knew the cache had the right contents (i.e. once a node had been added or removed); only at that point did the view id change and a new view get detected and sent. The new logic, which relies on double-checking whether the cache has all the expected contents whenever a response has to be generated, is IMO both more wasteful (needless collection-contents checks) and more fragile (there's no view/barrier/latch that marks that a new view is available and that the raw data used to generate the view is reliable).
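The failure mode described above can be sketched in isolation. This is a hedged illustration only: `encodeHashTopology`, the string-keyed maps, and all names below are hypothetical stand-ins, not the actual Infinispan API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the race: the consistent hash still lists a member whose address
// has already been removed from the topology cache by the crashed-member
// listener, so the cache lookup returns null, and naively dereferencing that
// null while writing the hash topology header produces the reported NPE.
public class TopologyRaceSketch {

    // chMembers stands in for the distribution manager's CH members;
    // addressCache stands in for the (concurrently modified) topology cache.
    static String encodeHashTopology(List<String> chMembers,
                                     Map<String, String> addressCache) {
        StringBuilder response = new StringBuilder();
        for (String member : chMembers) {
            String serverAddress = addressCache.get(member); // may be null
            if (serverAddress == null) {
                // Without this guard, any dereference of serverAddress would
                // NPE for a member that left between reading the CH and
                // performing this lookup.
                continue;
            }
            response.append(serverAddress).append(';');
        }
        return response.toString();
    }

    public static void main(String[] args) {
        Map<String, String> addressCache = new HashMap<>();
        addressCache.put("node1", "host1:11222");
        addressCache.put("node2", "host2:11222");
        // node3 crashed: still in the CH, already gone from the address cache.
        System.out.println(encodeHashTopology(
                List.of("node1", "node2", "node3"), addressCache));
        // → host1:11222;host2:11222;
    }
}
```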
Dan Berindei <dberinde> made a comment on jira ISPN-2781 Galderz, yes, this is again a problem with the topology cache being modified while we are writing a topology response to the client. Earlier versions of the code had the same problem: we don't hold any lock while generating the client response, so anyone can modify the topology cache while we are generating it, and this has always been the case. Note that a new client always requires a topology response, regardless of when the topology last changed. At one point I did add an extra check to skip a node's hash from the topology response if it wasn't in the topology cache any more, but I thought (wrongly) that the extra membership check you pointed out made that situation impossible, so I deleted it. I'm going to take a snapshot of the topology cache before generating the response and use that copy for everything. Topology changes are very rare anyway, so it shouldn't make a difference performance-wise, and it will make the code easier to reason about.
Dan Berindei <dberinde> made a comment on jira ISPN-2781 Copy the address cache into a regular map before generating the topology response, to protect against concurrent changes from other threads. Also send topology responses even if not all CH members are in the address cache; this allows the client to remove leaving Hot Rod servers from its CH.
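The defensive-copy fix described in the comment above could look roughly like this. It is a hedged sketch under the same assumptions as before: `writeTopologyResponse` and the string-valued map are illustrative placeholders, not Infinispan code.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the fix: snapshot the shared address cache into a regular
// HashMap once, then generate the entire topology response from that local
// copy, so concurrent joins/leaves in the shared cache cannot change the
// data mid-response.
public class TopologySnapshot {

    static String writeTopologyResponse(Map<String, String> sharedAddressCache) {
        // One-shot copy; after this line, other threads may mutate the shared
        // cache freely without affecting this response.
        Map<String, String> snapshot = new HashMap<>(sharedAddressCache);
        StringBuilder response = new StringBuilder();
        for (String serverAddress : snapshot.values()) {
            response.append(serverAddress).append(';');
        }
        return response.toString();
    }

    public static void main(String[] args) {
        Map<String, String> shared = new HashMap<>();
        shared.put("node1", "host1:11222");
        System.out.println(writeTopologyResponse(shared));
        // → host1:11222;
    }
}
```

Since topology changes are rare, the cost of the extra copy is negligible, and every lookup during encoding now reads from data that no other thread can touch.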
Ran one elasticity test: www.qa.jboss.com/~mlinhard/hyperion3/run0053-elas-16-32-ER12/report/loganalysis/server/ — the issue is not present in JDG 6.1.0.ER12.