Bug 1711830

Summary: RHV manager spontaneously fences nodes when many concurrent qemu snapshots are executed
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Jay Samson <jpankaja>
Component: core
Assignee: Krutika Dhananjay <kdhananj>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Rahul Hinduja <rhinduja>
Severity: urgent
Docs Contact:
Priority: urgent
Version: rhgs-3.4
CC: amukherj, kdhananj, moagrawa, pdhange, rhs-bugs, sabose, sankarshan, srakonde, storage-qa-internal, sunkumar
Target Milestone: ---
Keywords: Performance
Target Release: ---
Flags: kdhananj: needinfo-
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-06-18 07:48:04 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1614430, 1712654
Bug Blocks:

Comment 3 Sahina Bose 2019-05-20 11:26:20 UTC
Is the fencing of nodes causing quorum loss? Can you ensure that the customer has set the gluster-related fencing policies at the cluster level (i.e., not fencing if a brick is online or if fencing could lead to quorum loss)?


Also, can you confirm whether these are gluster snapshots or qemu snapshots on a gluster volume?
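
For reference, a minimal sketch of how those cluster-level flags could be checked programmatically, assuming the ovirt-engine-sdk4 Python bindings are available; the engine URL, credentials, and CA path below are placeholders, and the fencing_policy attribute names should be verified against the installed SDK version:

#!/usr/bin/env python3
# Hypothetical check of the gluster-related fencing policies on each RHV
# cluster, using the oVirt/RHV Python SDK (ovirtsdk4). Engine URL,
# credentials and CA path are placeholders.
import ovirtsdk4 as sdk

connection = sdk.Connection(
    url='https://rhvm.example.com/ovirt-engine/api',  # placeholder engine URL
    username='admin@internal',
    password='redacted',
    ca_file='/etc/pki/ovirt-engine/ca.pem',           # placeholder CA path
)
try:
    for cluster in connection.system_service().clusters_service().list():
        policy = cluster.fencing_policy
        # These two flags correspond to "skip fencing if gluster bricks are up"
        # and "skip fencing if gluster quorum would be lost".
        print(cluster.name,
              'skip_if_gluster_bricks_up=%s' % policy.skip_if_gluster_bricks_up,
              'skip_if_gluster_quorum_not_met=%s' % policy.skip_if_gluster_quorum_not_met)
finally:
    connection.close()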

Comment 5 Sahina Bose 2019-05-20 14:45:13 UTC
vmstore1 and vmstore2 are distributed-replicate volumes. Whenever concurrent deletes of VM snapshots occur, I/O latency spikes; the sanlock logs show, for example: "2019-05-07 00:19:43 2301165 [25167]: s10 delta_renew long write time 43 sec"

Krutika, could you check the logs to see if there are any gluster issues causing this high latency?
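
As a rough aid for correlating the fencing events with storage latency, here is a small sketch that pulls the "delta_renew long write time" warnings out of a sanlock log and reports the worst write time per lockspace. The line format is taken from the message quoted above; the default log path (/var/log/sanlock.log) is an assumption:

#!/usr/bin/env python3
# Scan a sanlock log for "delta_renew long write time" warnings and report
# the worst observed write latency per lockspace. The regex mirrors the
# log line quoted in this bug.
import re
import sys
from collections import defaultdict

PATTERN = re.compile(
    r'^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \d+ \[\d+\]: '
    r'(?P<lockspace>s\d+) delta_renew long write time (?P<secs>\d+) sec'
)

def scan(path):
    worst = defaultdict(int)
    with open(path) as log:
        for line in log:
            m = PATTERN.match(line.strip())
            if m:
                secs = int(m.group('secs'))
                ls = m.group('lockspace')
                worst[ls] = max(worst[ls], secs)
                print('%s %s write took %d sec' % (m.group('ts'), ls, secs))
    for ls, secs in sorted(worst.items()):
        print('worst for %s: %d sec' % (ls, secs))

if __name__ == '__main__':
    scan(sys.argv[1] if len(sys.argv) > 1 else '/var/log/sanlock.log')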

Comment 30 Red Hat Bugzilla 2023-09-14 05:28:52 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days