Bug 919397

Summary:	DIST: ISPN000208: No live owners found for segment at server shutdown
Product:	[JBoss] JBoss Enterprise Application Platform 6	Reporter:	Jitka Kozana <jkudrnac>
Component:	Clustering	Assignee:	Paul Ferraro <paul.ferraro>
Status:	CLOSED NOTABUG	QA Contact:	Jitka Kozana <jkudrnac>
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	6.1.1	CC:	jkudrnac, lthon, myarboro, rjanik, smumford
Target Milestone:	---
Target Release:	EAP 6.3.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2013-12-09 14:44:07 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---

Description Jitka Kozana 2013-03-08 11:01:27 UTC

EAP 6.1.0.ER2:

At server shutdown, we are seeing the following errors:
perf20:
13:34:02,045 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (OOB-60,shared=udp) ISPN000208: No live owners found for segment 17 of cache default-host/clusterbench. Current owners are:  [perf21/web]. Faulty owners: [perf21/web]

perf21 was shutting down at the same time and logged the same error (No live owners found).

See the server logs here:
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EAP6/view/EAP6-Clustering/view/EAP6-Failover/job/eap-6x-failover-http-session-jvmkill-dist-async/35/artifact/report/config/jboss-perf20/server.log
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EAP6/view/EAP6-Clustering/view/EAP6-Failover/job/eap-6x-failover-http-session-jvmkill-dist-async/35/artifact/report/config/jboss-perf21/server.log

This upstream jira seems to be dealing with something similar: https://issues.jboss.org/browse/ISPN-1239

Comment 1 Jitka Kozana 2013-05-13 14:02:32 UTC

Seen again in 6.1.0.ER8.

Link to server log:
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-failover-http-session-undeploy-dist-async/28/artifact/report/config/jboss-perf18/server.log

Link to job:
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-failover-http-session-undeploy-dist-async/28/

Comment 2 Ladislav Thon 2013-08-26 12:02:49 UTC

Still seeing this with EAP 6.1.1.ER7. For example:

https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-failover-http-session-undeploy-dist-sync/28/
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-failover-http-session-jvmkill-dist-sync/47/

Comment 3 Paul Ferraro 2013-12-09 14:44:07 UTC

This happens when all of the owners of a given segment leave the group, so that the segment's data can no longer be rebalanced.  When using DIST, you need to either:
* Increase the number of owners according to the expected availability
* Stagger shutdown of nodes according to the number of owners
* Ignore exceptions like this when you plan to shut down everything
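
To illustrate the condition described above, here is a minimal sketch in Python. It is not Infinispan code and uses no Infinispan API; it only models the rule that a segment is lost once every one of its owners has left. The function names and the ownership map are hypothetical. Only segment 17 and the node name perf21/web come from the log above; segment 18 and its owners are made up for contrast.

```python
from typing import Dict, Iterable, List


def segments_without_live_owners(
    owners_by_segment: Dict[int, List[str]],
    leaving: Iterable[str],
) -> List[int]:
    """Return the segments whose owners are all in the set of leaving nodes.

    Hypothetical helper: models the situation that ISPN000208 reports,
    it does not reproduce Infinispan's state transfer logic.
    """
    leaving_set = set(leaving)
    return sorted(
        segment
        for segment, owners in owners_by_segment.items()
        if owners and set(owners) <= leaving_set
    )


def max_simultaneous_departures(num_owners: int) -> int:
    """Largest number of nodes that can leave at once while every segment
    keeps at least one owner, assuming rebalancing finishes between batches.
    """
    return max(num_owners - 1, 0)


if __name__ == "__main__":
    # Segment 17 mirrors the log line; segment 18 is invented for contrast.
    owners = {
        17: ["perf21/web"],
        18: ["perf20/web", "perf21/web"],
    }
    print(segments_without_live_owners(owners, ["perf21/web"]))
    print(segments_without_live_owners(owners, ["perf20/web", "perf21/web"]))
    print(max_simultaneous_departures(2))
```

Under these assumptions, a segment with a single owner is lost as soon as that node leaves, while a segment with two owners survives until both leave in the same window. That is why either raising the owner count or staggering shutdowns avoids the error.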

Comment 4 Scott Mumford 2014-04-23 02:04:08 UTC

Marking for exclusion from Release Notes documentation as not a bug.