Bug 1276206 - [GSS](6.4.z) PageSubscriptionImpl cases a NullPointerException
[GSS](6.4.z) PageSubscriptionImpl cases a NullPointerException
Status: CLOSED CURRENTRELEASE
Product: JBoss Enterprise Application Platform 6
Classification: JBoss
Component: HornetQ (Show other bugs)
6.4.1
x86_64 Linux
high Severity high
: CR1
: EAP 6.4.7
Assigned To: Yong Hao Gao
Miroslav Novak
:
Depends On:
Blocks: 1279271 eap647-payload 1299528
  Show dependency treegraph
 
Reported: 2015-10-29 01:28 EDT by Tyronne Wickramarathne
Modified: 2017-01-17 07:01 EST (History)
12 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-01-17 07:01:15 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 2128411 None None None 2016-01-18 10:11 EST

  None (edit)
Description Tyronne Wickramarathne 2015-10-29 01:28:23 EDT
Description of problem:

The current PageSubscriptionImpl implementation throws a NPE under heavy load.

Version-Release number of selected component (if applicable):
JBoss-EAP-6.4.1
HornetQ-2.3.25

How reproducible:


Steps to Reproduce:

The following NPE has been reported under heavy load :

2015-10-15 18:24:27,531 ERROR [org.hornetq.core.server] (Thread-18 (HornetQ-server-HornetQServerImpl::serverUUID=45745bf8-738b-11e5-9abc-c7191a962d45-884492432)) [U:] [I:] [D:] [N:] [S:] [J:] [A:] HQ224041: Failed to deliver: java.lang.NullPointerException
	at org.hornetq.core.paging.cursor.impl.PageSubscriptionImpl$CursorIterator.remove(PageSubscriptionImpl.java:1445)
	at org.hornetq.core.server.impl.QueueImpl.depage(QueueImpl.java:2214)
	at org.hornetq.core.server.impl.QueueImpl.access$1400(QueueImpl.java:92)
	at org.hornetq.core.server.impl.QueueImpl$DepageRunner.run(QueueImpl.java:2929)
	at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:105)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)




Additional info:

I don't think the PageCurserInfo is causing this. We might be able to get away from this issue if we could synchronise the currentDelivery (PageReference) :

public void remove()
      {
         deliveredCount.incrementAndGet();
         if (currentDelivery != null)
         {
            PageCursorInfo info = PageSubscriptionImpl.this.getPageInfo(currentDelivery.getPosition());
            if (info != null)
            {
               info.remove(currentDelivery.getPosition());
            }
         }
      }
Comment 6 Ondřej Kalman 2016-01-13 08:20:40 EST
QA_FAILED: failed our regression test, you are using wrong instance "currentDelivery" instead of "delivery"

PagedReference delivery = currentDelivery;
         if (delivery != null)
         {
            PageCursorInfo info = PageSubscriptionImpl.this.getPageInfo(currentDelivery.getPosition());
            if (info != null)
            {
               info.remove(currentDelivery.getPosition());
            }
         }
Comment 7 Yong Hao Gao 2016-01-13 08:38:38 EST
Hi,

According to this commit
https://github.com/hornetq/hornetq/commit/54f1cdec9076d0044e54c46d6b41e1849674f970

it does use 'delivery' instead of 'currentDelivery'.

Can you point me the code where you see the wrong instance?

Thanks
Howard
Comment 8 Ondřej Kalman 2016-01-13 08:45:43 EST
If you checkout SP8 tag or download it as zip you will see that implementation is different then your commit.
Comment 9 Ondřej Kalman 2016-01-13 08:52:54 EST
There was some later commit which changed your implementation...
Comment 11 Ondřej Kalman 2016-01-13 09:07:45 EST
yes, that's correct one-off passed, regression test was written for this case (that fix in one-off is included in future CPs)
Comment 12 Yong Hao Gao 2016-01-13 09:14:07 EST
hmm, looks like the fix for this case has been changed by other fixes/commits.
I'll ask clebert as he is the one that changed it.

Thanks
Comment 13 Clebert Suconic 2016-01-13 11:31:03 EST
I made a mistake. on my patch fix for LODH I needed to check for info != null, and I made a mistake there.

this will need a commit to fix it
Comment 14 Clebert Suconic 2016-01-13 11:39:46 EST
How do you want to address this: https://bugzilla.redhat.com/show_bug.cgi?id=1276206

?
Comment 15 Clebert Suconic 2016-01-13 11:40:22 EST
one-off?
Comment 19 Clebert Suconic 2016-01-14 12:10:13 EST
I meant to add this link on a previous commit:

https://github.com/hornetq/hornetq/commit/281a23710eddfc5e770c2f6871d57525583c2f7a


This is the fix I added
Comment 23 Clebert Suconic 2016-02-11 13:03:11 EST
It is. And that's the only real fix I see on this release.


Martin Styk as sent a lot of test fixes to the 2.3.25.x but I don't see any other fix in for this release.
Comment 24 Clebert Suconic 2016-02-11 13:04:07 EST
(actually there are 2 other small fixes)
Comment 29 Peter Mackay 2016-03-09 09:54:20 EST
Verified with EAP 6.4.7.CP.CR2 using the provided test cases.
Comment 30 Petr Penicka 2017-01-17 07:01:15 EST
Retroactively bulk-closing issues from released EAP 6.4 cumulative patches.

Note You need to log in before you can comment on or make changes to this bug.