Bug 1637787

Summary: Another transaction in progress seen in heketi logs while deleting pvcs.
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: RamaKasturi <knarra>
Component: heketi Assignee: John Mulligan <jmulligan>
Status: CLOSED NEXTRELEASE QA Contact: RamaKasturi <knarra>
Severity: high Docs Contact:
Priority: unspecified    
Version: ocs-3.11 CC: hchiramm, jmulligan, kramdoss, madam, rhs-bugs, rtalur, sankarshan, storage-qa-internal
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-01-23 21:30:56 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1636872    
Bug Blocks:    
Attachments:
Description Flags
createLargeScaleTest.sh none

Description RamaKasturi 2018-10-10 06:18:44 UTC
Created attachment 1492345
createLargeScaleTest.sh

Description of problem:
I used the attached script (createLargeScaleTest.sh), which creates a mongodb pod and binds a pvc to it as soon as it is created. Once the pvcs were bound successfully, I used another attached script (deleteLargeScaleTest.sh) to delete the dc, service and pvc that were created. All the pvcs get deleted, but when I run oc get pv, all of the pvs are in Failed state.

oc get pvc:
=================
No resources found

oc get pv:
====================
oc get pv | grep Failed | wc -l
950
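
The grep above matches the string "Failed" anywhere in a line. A minimal sketch of a stricter count, which tallies PVs by the status.phase field of the `oc get pv -o json` output; the helper name count_pv_phases is hypothetical and not part of the attached scripts:

```python
import json
from collections import Counter


def count_pv_phases(pv_list_json: str) -> Counter:
    """Count PersistentVolumes per status.phase.

    Expects the JSON text printed by `oc get pv -o json`.
    Items without a phase are counted under "Unknown" (an assumption of this sketch).
    """
    items = json.loads(pv_list_json).get("items", [])
    return Counter(
        item.get("status", {}).get("phase", "Unknown") for item in items
    )
```

To use it, feed it the captured output of `oc get pv -o json` and read the "Failed" entry of the returned counter.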


Version-Release number of selected component (if applicable):
[root@dhcp46-231 ~]# oc version
oc v3.11.20
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO
heketi-client-7.0.0-14.el7rhgs.x86_64


How reproducible:
Hit it once

Steps to Reproduce:
1. Run the attached script createLargeScaleTest.sh and wait for all the pvcs to be bound.
2. Run the attached script deleteLargeScaleTest.sh to delete all the resources created by the test above.

Actual results:
950 pvs are in Failed state and do not get deleted. The heketi logs show the error "Another transaction in progress".

Expected results:
All the pvs should get deleted.

Additional info:

Comment 2 RamaKasturi 2018-10-10 06:20:19 UTC
heketi db dump is copied to the location below.

http://rhsqe-repo.lab.eng.blr.redhat.com/cns/bugs/BZ-1600042/db_dump_oct8.txt

heketi log is copied to the location below.

http://rhsqe-repo.lab.eng.blr.redhat.com/cns/bugs/BZ-1600042/heketi_delete_oct8.log

Comment 4 Niels de Vos 2018-10-15 13:19:08 UTC
As mentioned in bug 1636872, a delay is required between volume operations. It is important to know whether deleting the volumes with a 2-second delay between deletions is more stable.
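
One way to run that experiment is to pace the deletions from the test side. The sketch below is only an illustration: delete_paced and oc_delete_pvc are hypothetical helpers (not taken from deleteLargeScaleTest.sh), and it assumes that spacing out the `oc delete pvc` requests also spaces out the volume deletes that reach heketi.

```python
import subprocess
import time
from typing import Callable, Iterable, List


def oc_delete_pvc(name: str) -> None:
    """Delete one PVC with the oc CLI; raises CalledProcessError on failure."""
    subprocess.run(["oc", "delete", "pvc", name], check=True)


def delete_paced(
    names: Iterable[str],
    delete: Callable[[str], None] = oc_delete_pvc,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Delete PVCs one at a time, pausing between consecutive requests.

    Returns the names whose delete call failed, so they can be retried.
    """
    pending = list(names)
    failed: List[str] = []
    for index, name in enumerate(pending):
        try:
            delete(name)
        except subprocess.CalledProcessError:
            failed.append(name)
        # No pause is needed after the last request.
        if index < len(pending) - 1:
            sleep(delay_seconds)
    return failed
```

The delete and sleep callables are parameters so the pacing logic can be exercised without a cluster; the 2.0 default mirrors the 2 second delay proposed here.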

Alternatively, we could retry the operation when the gluster CLI times out or returns an "another transaction is in progress" error.
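
A rough sketch of such a retry, for discussion only. It is not heketi code: CommandError, RETRYABLE_MARKERS and run_with_retry are hypothetical names, and the matched substrings are assumptions about the CLI's error text that may differ between gluster versions.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")

# Assumed fragments of retryable CLI errors; compared case-insensitively.
RETRYABLE_MARKERS: Tuple[str, ...] = ("another transaction", "timed out")


class CommandError(Exception):
    """Hypothetical error type carrying the CLI's error text."""


def is_retryable(message: str) -> bool:
    """Return True if the error text looks like a transient lock or timeout."""
    text = message.lower()
    return any(marker in text for marker in RETRYABLE_MARKERS)


def run_with_retry(
    op: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run op, retrying transient errors with exponential backoff.

    Non-retryable errors, and the error from the final attempt, are re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except CommandError as err:
            if attempt == max_attempts or not is_retryable(str(err)):
                raise
            # Wait 2s, 4s, 8s, ... with the default base_delay.
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

The backoff doubles on each attempt so that many operations failing together do not all retry at once against the same lock; a fixed delay would also fit the suggestion above.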