Bug 1711830

Summary: RHV manager spontaneously fences nodes when many concurrent qemu snapshots are executed
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Jay Samson <jpankaja>
Component: core
Assignee: Krutika Dhananjay <kdhananj>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Rahul Hinduja <rhinduja>
Severity: urgent
Docs Contact:
Priority: urgent
Version: rhgs-3.4
CC: amukherj, kdhananj, moagrawa, pdhange, rhs-bugs, sabose, sankarshan, srakonde, storage-qa-internal, sunkumar
Target Milestone: ---
Keywords: Performance
Target Release: ---
Flags: kdhananj: needinfo-
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-06-18 07:48:04 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1614430, 1712654
Bug Blocks:

Comment 3 Sahina Bose 2019-05-20 11:26:20 UTC
Is the fencing of nodes causing quorum loss? Can you ensure that the customer has set the gluster-related fencing policies at the cluster level (i.e., not fencing if a brick is online or if fencing could lead to quorum loss)?


Also, can you confirm whether these are gluster snapshots or qemu snapshots on a gluster volume?
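
For reference, a minimal sketch of how those cluster-level flags could be checked programmatically, assuming the ovirt-engine-sdk4 Python bindings are available; the engine URL, credentials, and CA path below are placeholders, and the fencing_policy attribute names should be verified against the installed SDK version:

#!/usr/bin/env python3
# Hypothetical check of the gluster-related fencing policies on each RHV
# cluster, using the oVirt/RHV Python SDK (ovirtsdk4). Engine URL,
# credentials and CA path are placeholders.
import ovirtsdk4 as sdk

connection = sdk.Connection(
    url='https://rhvm.example.com/ovirt-engine/api',  # placeholder engine URL
    username='admin@internal',
    password='redacted',
    ca_file='/etc/pki/ovirt-engine/ca.pem',           # placeholder CA path
)
try:
    for cluster in connection.system_service().clusters_service().list():
        policy = cluster.fencing_policy
        # These two flags correspond to "skip fencing if gluster bricks are up"
        # and "skip fencing if gluster quorum would be lost".
        print(cluster.name,
              'skip_if_gluster_bricks_up=%s' % policy.skip_if_gluster_bricks_up,
              'skip_if_gluster_quorum_not_met=%s' % policy.skip_if_gluster_quorum_not_met)
finally:
    connection.close()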

Comment 5 Sahina Bose 2019-05-20 14:45:13 UTC
vmstore1 and vmstore2 are distributed-replicate volumes. Whenever concurrent deletes of VM snapshots occur, I/O latency spikes; the sanlock logs show, for example: "2019-05-07 00:19:43 2301165 [25167]: s10 delta_renew long write time 43 sec"

Krutika, could you check the logs to see if there are any gluster issues causing this high latency?
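
As a rough aid for correlating the fencing events with storage latency, here is a small sketch that pulls the "delta_renew long write time" warnings out of a sanlock log and reports the worst write time per lockspace. The line format is taken from the message quoted above; the default log path (/var/log/sanlock.log) is an assumption:

#!/usr/bin/env python3
# Scan a sanlock log for "delta_renew long write time" warnings and report
# the worst observed write latency per lockspace. The regex mirrors the
# log line quoted in this bug.
import re
import sys
from collections import defaultdict

PATTERN = re.compile(
    r'^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \d+ \[\d+\]: '
    r'(?P<lockspace>s\d+) delta_renew long write time (?P<secs>\d+) sec'
)

def scan(path):
    worst = defaultdict(int)
    with open(path) as log:
        for line in log:
            m = PATTERN.match(line.strip())
            if m:
                secs = int(m.group('secs'))
                ls = m.group('lockspace')
                worst[ls] = max(worst[ls], secs)
                print('%s %s write took %d sec' % (m.group('ts'), ls, secs))
    for ls, secs in sorted(worst.items()):
        print('worst for %s: %d sec' % (ls, secs))

if __name__ == '__main__':
    scan(sys.argv[1] if len(sys.argv) > 1 else '/var/log/sanlock.log')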

Comment 30 Red Hat Bugzilla 2023-09-14 05:28:52 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days