http://www.qa.jboss.com/~mlinhard/hyperion3/run0015/report/loganalysis/server/ This actually happens in a catch block for Throwable, so it might be a different exception.
testcase:
- start 8 nodes
- wait for cluster
- load some entries into the hotrod test caches
- stop all nodes gracefully
- wait for all java processes to die
And it is a different exception: it's something about the externalizer shutting down. Regarding the incorrect log message in JGroups, I filed https://issues.jboss.org/browse/JGRP-1556
Resolution of JGRP-1556 only helped to show that the root cause behind this is an NPE: http://www.qa.jboss.com/~mlinhard/hyperion3/run0038-startup-ER9/report/loganalysis/server/categories/cat8_entry1.txt The actual NPE cause is ISPN-2751, so this is still present in JDG 6.1.0.ER9; however, the impact of this bug is near zero.
Galder Zamarreño <galder.zamarreno> made a comment on jira ISPN-2751 Michal, do you have the full stacktrace of the NPE?
Michal Linhard <mlinhard> made a comment on jira ISPN-2751 Because this exception is reported by JGroups, it's output only as a one-line ERROR message (see the link). I managed to catch it via an exception breakpoint and see it directly in Eclipse, but I can't find my notes where I pasted the stacktrace, and it's quite a long process to recreate it. The JGroups code triggering it is this one: https://github.com/belaban/JGroups/blob/master/src/org/jgroups/blocks/RequestCorrelator.java#L514
Dan Berindei <dberinde> made a comment on jira ISPN-2751 Interesting, we were just looking at the ISPN-2752 stack trace yesterday, which shows that all outstanding requests are "closed" by JGroups (by returning an exception) before actually shutting down the channel. The ExternalizerTable component only shuts down after the JGroups channel, so it shouldn't be possible to have active requests after the ExternalizerTable has stopped. Do you have TRACE logs?
Michal Linhard <mlinhard> made a comment on jira ISPN-2751 serverlogs.zip - trace logs from a run of 6 nodes on my laptop. The NPE occurred 3 times:
{code}
node05.log 872902 12:15:03,782 ERROR [org.jgroups.blocks.RequestCorrelator] failed marshalling rsp (SuccessfulResponse{responseValue={}} ): java.lang.NullPointerException
node06.log 892500 12:15:03,865 ERROR [org.jgroups.blocks.RequestCorrelator] failed marshalling rsp (SuccessfulResponse{responseValue={}} ): java.lang.NullPointerException
node03.log 1080114 12:15:04,639 ERROR [org.jgroups.blocks.RequestCorrelator] failed marshalling rsp (SuccessfulResponse{responseValue={}} ): java.lang.NullPointerException
{code}
There were also some other issues during the run, but hopefully this run illustrates what you need.
Galder Zamarreño <galder.zamarreno> made a comment on jira ISPN-2751 In the past we had issues where the RPC dispatcher was stopping *after* the cache manager had stopped. This led to NPEs in ExternalizerTable (ISPN-2025) as a result of trying to deserialize stuff after the cache manager had stopped. At the time we added a workaround to handle null commands, which effectively are the result of the externalizer table not being able to deserialize objects (normally due to the sequence of events mentioned), and we reverted it (ISPN-2063) once AS7-3180 had been implemented.

The problem now does not appear to be that we cannot deserialize something, but rather that while sending a response to *something*, the cache manager was stopped, leading to a null externalizer table. The circumstances look very similar to ISPN-2752, and the result seems to be circumstantial:
* In ISPN-2752 we see that the cache manager has started stopping but the externalizer table is still there, because the response is being marshalled.
* In this JIRA, we see that the cache manager has fully stopped and the externalizer table is not there any more, but the JGroups transport (or RPC dispatcher) is still actively trying to send a reply to a command.

Some TRACE logs, including not only org.jgroups and org.infinispan but also org.jboss.as.clustering, would be very handy to try to better understand how to fix this.
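The ordering problem described above can be reduced to a minimal sketch. The class and method names below (CacheManager, Marshaller, sendReply) are hypothetical stand-ins, not the real Infinispan API; the point is only the sequence of events: the manager stops and its externalizer-backed marshaller goes away, then a transport thread still tries to marshal a reply and hits the NPE.

```java
// Hypothetical sketch of the shutdown ordering behind the NPE.
public class ShutdownRaceSketch {
    // Stand-in for the ExternalizerTable-backed marshaller.
    static class Marshaller {
        byte[] marshal(Object rsp) { return rsp.toString().getBytes(); }
    }

    static class CacheManager {
        volatile Marshaller marshaller = new Marshaller();
        void stop() { marshaller = null; } // externalizer table is gone after stop
    }

    // Returns what the transport would log instead of propagating the error.
    static String sendReply(CacheManager cm, Object rsp) {
        try {
            cm.marshaller.marshal(rsp);
            return "reply sent";
        } catch (NullPointerException e) {
            // Mirrors the one-line ERROR logged by RequestCorrelator.
            return "failed marshalling rsp (" + rsp + "): NPE";
        }
    }

    public static void main(String[] args) {
        CacheManager cm = new CacheManager();
        cm.stop(); // cache manager fully stopped, as in the node logs
        // RPC dispatcher is still trying to send a reply afterwards:
        System.out.println(sendReply(cm, "SuccessfulResponse{responseValue={}}"));
    }
}
```

In the real system the two steps run on different threads (main thread stopping, OOB thread replying), which is why the failure is timing-dependent rather than deterministic.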
Michal Linhard <mlinhard> made a comment on jira ISPN-2751 serverlogs2.zip - trace logs with the "org.jboss.as.clustering" category, this time with only 5 nodes. See node04 for the NPE.
Dan Berindei <dberinde> made a comment on jira ISPN-2751 I was actually able to reproduce this while running a test from the IDE. I think I (we) understood the ISPN-2752 stacktrace incorrectly: {{GroupRequest.requests}} represents the requests sent from the local node, so the {{IllegalStateException}} is thrown when the JChannel shuts down while there are still requests waiting for responses from the other nodes. In that case, we send a CH_UPDATE from a background thread and we don't wait for that thread to end before shutting down the JChannel.

There is a somewhat similar mechanism on the receiving end as well: when stopping, JChannel calls {{shutdownNow()}} on the TP thread pools, which interrupts the threads. But if the threads don't do any I/O, they don't get an {{InterruptedException}} and they proceed with their work. In this case the OOB thread was actually calling {{getOrCreateComponent()}} on the GlobalComponentRegistry, which is synchronized. At the same time, the main thread was calling {{stop()}} on the GlobalComponentRegistry, which is also synchronized. So the OOB thread could not proceed to do anything until the main thread finished stopping the cache manager (which included waiting 3 seconds for the OOB thread pool to shut down).

I think the problem is that {{AbstractComponentRegistry.stop()}} holds the component registry locked for too long. It should not hold the lock while it is invoking {{stop()}} on all the components.
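The fix Dan suggests, shrinking the critical section so the monitor is not held while components are stopped, can be sketched as follows. This is a simplified, hypothetical registry, not Infinispan's actual AbstractComponentRegistry: the lock is taken only to flip the stopped flag and snapshot the component list, and the per-component {{stop()}} calls happen outside it, so a concurrent caller of a synchronized lookup fails fast instead of blocking for the whole shutdown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: stop components outside the registry monitor.
public class ComponentRegistrySketch {
    interface Component { void stop(); }

    private final List<Component> components = new ArrayList<>();
    private boolean stopped;

    // Stand-in for the synchronized getOrCreateComponent() path.
    public synchronized void register(Component c) {
        if (stopped) throw new IllegalStateException("registry stopped");
        components.add(c);
    }

    public void stop() {
        List<Component> toStop;
        synchronized (this) {           // hold the lock only to flip state
            if (stopped) return;
            stopped = true;
            toStop = new ArrayList<>(components);
            components.clear();
        }
        // Invoke stop() outside the lock; an OOB thread blocked on a
        // synchronized registry method is no longer stuck behind the
        // entire (possibly seconds-long) component shutdown.
        for (Component c : toStop) c.stop();
    }
}
```

With the original structure, the OOB thread's {{getOrCreateComponent()}} call waits for the whole shutdown (including the 3-second thread-pool wait) and then operates on half-torn-down state; with the narrowed lock it either sees a live registry or an immediate "stopped" error.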
Nominated for 6.2 release notes.
This product has been discontinued or is no longer tracked in Red Hat Bugzilla.