Bug 1192324

Summary: Inconsistent logging for starting and finished rebalance
Product: [JBoss] JBoss Data Grid 6 Reporter: Osamu Nagano <onagano>
Component: InfinispanAssignee: Dan Berindei <dberinde>
Status: CLOSED CURRENTRELEASE QA Contact: Martin Gencur <mgencur>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 6.4.0CC: afield, chuffman, dmehra, dstahl, jdg-bugs, ksuzumur, pzapataf, ttarrant, vjuranek
Target Milestone: ER1   
Target Release: 6.4.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
The finished cluster-wide rebalance event was previously logged at DEBUG level which was not consistent with other events related to re-balancing. This is now fixed and the event is logged on INFO level.
Story Points: ---
Clone Of:
: 1200947 (view as bug list) Environment:
Last Closed: 2015-04-02 12:14:01 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1200947    

Description Osamu Nagano 2015-02-13 07:31:14 UTC
Description of problem:
In JDG 6.4.0, starting and finished cluster-wide (and local, too) rebalance events are logged in INFO level for better administration.

~~~
15:51:46,624 INFO  [org.infinispan.CLUSTER] (remote-thread-0) ISPN000310: Starting cluster-wide rebalance for cache default, topology CacheTopology{id=1, rebalanceId=1, currentCH=DefaultConsistentHash{ns = 20, owners = (1)[svr01/clustered: 20+0]}, pendingCH=DefaultConsistentHash{ns = 20, owners = (2)[svr01/clustered: 10+10, svr02/clustered: 10+10]}, unionCH=null, actualMembers=[svr01/clustered, svr02/clustered]}
15:51:46,629 INFO  [org.infinispan.CLUSTER] (undefined) ISPN000328: Finished local rebalance for cache default on node svr01/clustered, topology id = 1
15:51:46,696 INFO  [org.infinispan.CLUSTER] (remote-thread-2) ISPN000328: Finished local rebalance for cache default on node svr02/clustered, topology id = 1
~~~

But finished cluster-wide rebalance event is logged in DEBUG level, which is not visible by default.

~~~
15:51:46,697 DEBUG [org.infinispan.topology.ClusterCacheStatus] (remote-thread-2) Finished cluster-wide rebalance for cache default, topology id = 1
~~~


Version-Release number of selected component (if applicable):
infinispan-core-6.2.0.Final-redhat-4.jar


How reproducible:
Always.


Steps to Reproduce:
Just start two out-of-the-box clustered instances.  The following debug category should have been set.
~~~
            <logger category="org.infinispan.topology.ClusterCacheStatus">
                <level name="DEBUG"/>
            </logger>
~~~


Actual results:
See "Description of problem".


Expected results:
Finished cluster-wide rebalance event should be in INFO and org.infinispan.CLUSTER category as well.


Additional info:
Thread named "(undefined)" reduces debuggability.  This should be addressed as well.

Comment 4 Alan Field 2015-02-13 08:22:49 UTC
Marking for JDG 6.4.1

Comment 6 Dan Berindei 2015-03-11 17:03:00 UTC
PR: https://github.com/infinispan/jdg/pull/554

The PR does not address the (undefined) thread name, as I couldn't find the cause. The thread is from the "transport executor" thread pool, which seems to be properly defined in the configuration.

Comment 7 Martin Gencur 2015-03-12 13:54:40 UTC
Osamu, could you please check with the customer whether the fix is sufficient for them? Thanks

Comment 8 Vojtech Juranek 2015-03-12 14:02:50 UTC
Hi,
I'm not bale to reproduce "undefined" thread name in log (in my log thread names appear correctly). If customer wants this to be fixed as well, it would be nice to provide a reproducer. Thanks!

Comment 10 Alan Field 2015-03-12 14:21:46 UTC
I saw these log messages while verifying BZ1198211:

10:17:07,284 INFO  [org.infinispan.CLUSTER] (RMI TCP Connection(1)-127.0.0.1) ISPN000309: Rebalancing suspended
10:17:22,450 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-2,shared=udp) ISPN000094: Received new cluster view: [node0/clustered|1] (2) [node0/clustered, node1/clustered]
10:17:38,429 INFO  [org.infinispan.CLUSTER] (RMI TCP Connection(5)-127.0.0.1) ISPN000308: Rebalancing enabled
10:17:38,434 INFO  [org.infinispan.CLUSTER] (RMI TCP Connection(5)-127.0.0.1) ISPN000310: Starting cluster-wide rebalance for cache default, topology CacheTopology{id=1, rebalanceId=1, currentCH=DefaultConsistentHash{ns = 20, owners = (1)[node0/clustered: 20+0]}, pendingCH=DefaultConsistentHash{ns = 20, owners = (2)[node0/clustered: 10+10, node1/clustered: 10+10]}, unionCH=null, actualMembers=[node0/clustered, node1/clustered]}
10:17:38,446 INFO  [org.infinispan.CLUSTER] (RMI TCP Connection(5)-127.0.0.1) ISPN000310: Starting cluster-wide rebalance for cache ___hotRodTopologyCache, topology CacheTopology{id=1, rebalanceId=1, currentCH=ReplicatedConsistentHash{ns = 60, owners = (1)[node0/clustered: 60]}, pendingCH=ReplicatedConsistentHash{ns = 60, owners = (2)[node0/clustered: 30, node1/clustered: 30]}, unionCH=null, actualMembers=[node0/clustered, node1/clustered]}
10:17:38,453 INFO  [org.infinispan.CLUSTER] (RMI TCP Connection(5)-127.0.0.1) ISPN000310: Starting cluster-wide rebalance for cache ___protobuf_metadata, topology CacheTopology{id=1, rebalanceId=1, currentCH=ReplicatedConsistentHash{ns = 60, owners = (1)[node0/clustered: 60]}, pendingCH=ReplicatedConsistentHash{ns = 60, owners = (2)[node0/clustered: 30, node1/clustered: 30]}, unionCH=null, actualMembers=[node0/clustered, node1/clustered]}
10:17:38,456 INFO  [org.infinispan.CLUSTER] (RMI TCP Connection(5)-127.0.0.1) ISPN000310: Starting cluster-wide rebalance for cache memcachedCache, topology CacheTopology{id=1, rebalanceId=1, currentCH=DefaultConsistentHash{ns = 20, owners = (1)[node0/clustered: 20+0]}, pendingCH=DefaultConsistentHash{ns = 20, owners = (2)[node0/clustered: 10+10, node1/clustered: 10+10]}, unionCH=null, actualMembers=[node0/clustered, node1/clustered]}
10:17:38,460 INFO  [org.infinispan.CLUSTER] (RMI TCP Connection(5)-127.0.0.1) ISPN000310: Starting cluster-wide rebalance for cache namedCache, topology CacheTopology{id=1, rebalanceId=1, currentCH=DefaultConsistentHash{ns = 80, owners = (1)[node0/clustered: 80+0]}, pendingCH=DefaultConsistentHash{ns = 80, owners = (2)[node0/clustered: 40+40, node1/clustered: 40+40]}, unionCH=null, actualMembers=[node0/clustered, node1/clustered]}
10:17:38,496 INFO  [org.infinispan.CLUSTER] (remote-thread-0) ISPN000336: Finished cluster-wide rebalance for cache default, topology id = 1
10:17:38,504 INFO  [org.infinispan.CLUSTER] (remote-thread-0) ISPN000336: Finished cluster-wide rebalance for cache memcachedCache, topology id = 1
10:17:38,506 INFO  [org.infinispan.CLUSTER] (remote-thread-1) ISPN000336: Finished cluster-wide rebalance for cache ___hotRodTopologyCache, topology id = 1
10:17:38,520 INFO  [org.infinispan.CLUSTER] (remote-thread-1) ISPN000336: Finished cluster-wide rebalance for cache ___protobuf_metadata, topology id = 1
10:17:38,523 INFO  [org.infinispan.CLUSTER] (remote-thread-0) ISPN000336: Finished cluster-wide rebalance for cache namedCache, topology id = 1

Comment 12 Osamu Nagano 2015-03-13 00:27:50 UTC
I also confirmed the fix in JDG_6.4.1.ER1.  "Finished cluster-wide rebalance" is INFO and "Finished local rebalance" is DEBUG now.

For "undefined" thread name, I found an upstream fix, https://issues.jboss.org/browse/ISPN-4989, which may be related to this.