Description of problem:
Heketi block volume creation in a loop fails during a parallel node reboot.

Version-Release number of selected component (if applicable):

# rpm -qa | grep openshift
atomic-openshift-clients-3.10.0-0.67.0.git.0.ccd325f.el7.x86_64
openshift-ansible-roles-3.10.0-0.67.0.git.107.1bd1f01.el7.noarch
atomic-openshift-docker-excluder-3.10.0-0.67.0.git.0.ccd325f.el7.noarch
atomic-openshift-excluder-3.10.0-0.67.0.git.0.ccd325f.el7.noarch
atomic-openshift-3.10.0-0.67.0.git.0.ccd325f.el7.x86_64
openshift-ansible-docs-3.10.0-0.67.0.git.107.1bd1f01.el7.noarch
openshift-ansible-3.10.0-0.67.0.git.107.1bd1f01.el7.noarch
atomic-openshift-hyperkube-3.10.0-0.67.0.git.0.ccd325f.el7.x86_64
atomic-openshift-node-3.10.0-0.67.0.git.0.ccd325f.el7.x86_64
openshift-ansible-playbooks-3.10.0-0.67.0.git.107.1bd1f01.el7.noarch

-- Heketi version in heketi pod
# rpm -qa | grep heketi
python-heketi-7.0.0-1.el7rhgs.x86_64
heketi-client-7.0.0-1.el7rhgs.x86_64
heketi-7.0.0-1.el7rhgs.x86_64

-- Gluster version in gluster pod
# rpm -qa | grep gluster
glusterfs-libs-3.8.4-54.12.el7rhgs.x86_64
glusterfs-3.8.4-54.12.el7rhgs.x86_64
glusterfs-api-3.8.4-54.12.el7rhgs.x86_64
glusterfs-cli-3.8.4-54.12.el7rhgs.x86_64
glusterfs-server-3.8.4-54.12.el7rhgs.x86_64
gluster-block-0.2.1-20.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-54.12.el7rhgs.x86_64
glusterfs-fuse-3.8.4-54.12.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-54.12.el7rhgs.x86_64

How reproducible:
1:1

Steps to Reproduce:
1. Create a 4-node CNS 3.10 setup.
2. Initiate 30 heketi blockvolume creations of 10 GB each (see the sketch after this report).
3. Reboot 1 node.
4. Observe that heketi blockvolume creation fails during the reboot and resumes only after the node is back up, even though there is sufficient space on the other 3 nodes for the blockvolumes to be created.
5. Initiate another reboot of the same node; the same behaviour is observed.

Actual results:
Heketi block volume creation failed during the reboot and resumed after the node came back up.

Expected results:
Heketi block volume creation should proceed unaffected during the reboot of a single node, since the other 3 nodes are still up.

Additional info:
Logs from before initiating volume creation and after are attached.
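For reference, a minimal sketch of the kind of client-side loop used in step 2 (the exact script is not part of this report; it assumes heketi-cli is already pointed at the heketi service via HEKETI_CLI_SERVER and has admin credentials, and the count/size simply mirror the values above):

-- Hypothetical reproduction loop (sketch only, not the original script)
# for i in $(seq 1 30); do heketi-cli blockvolume create --size=10 || echo "blockvolume creation $i failed"; done
# heketi-cli blockvolume list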
Does it happen always? Even without the loop, does creating a single block volume fail when 1 out of 4 nodes is down?
(And could it be in any way related to bug 1595531?)
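A minimal sketch of the single-creation check being asked about here, assuming heketi-cli access from a master node; the glusterfs namespace is a placeholder and may differ on this setup:

-- Sketch only: confirm which node/pod is down, then attempt one creation while it is down
# oc get pods -o wide -n glusterfs
# heketi-cli node list
# heketi-cli blockvolume create --size=10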