Description of problem:
+++++++++++++++++++++
We were executing the following test case: "On different versions of gluster-block between 3 nodes - create and delete PVC in a loop".

As part of the test case, we had an OCP 3.9 + CNS 3.9 async3 setup with 3 nodes X, Y and Z. We created some PVCs and confirmed from "targetcli ls" that node X is the Active-Optimized (AO) target node for all the PVCs. We then upgraded node X to the latest gluster-block version, gluster-block-0.2.1-20.el7rhgs.x86_64 (by re-spinning the glusterfs pod with the latest CNS 3.10 image, 3.3.1-21). Once the node was back up, we checked that gluster-blockd and the other services were up. We then tried creating new PVCs, but PVC creation failed with "unable to run command" on pod X, which has the latest gluster-blockd. The gluster-blockd logs show "capability 'create_load_balance' doesn't exit on 10.70.42.84".

================================================
pvc
++++++++
Normal ExternalProvisioning 3m (x88 over 8m) persistentvolume-controller waiting for a volume to be created, either by external provisioner "gluster.org/glusterblock" or manually created by system administrator
Normal Provisioning 12s (x11 over 8m) gluster.org/glusterblock a34cb863-7fb2-11e8-8a40-0a580a80000a External provisioner is provisioning volume for claim "glusterfs/upg3"
Warning ProvisioningFailed 10s (x11 over 8m) gluster.org/glusterblock a34cb863-7fb2-11e8-8a40-0a580a80000a Failed to provision volume with StorageClass "gluster-block": failed to create volume: [heketi] failed to create volume: Unable to execute command on glusterfs-storage-hwn97:

Gluster-block logs
++++++++++++++
[2018-07-04 18:27:06.167238] INFO: create cli request, volume=vol_72fd8a6c364a0b2d3cd6a4fa3e89791d blockname=blk_glusterfs_upg3_da84519e-7fb7-11e8-8a40-0a580a80000a mpath=3 blockhosts=10.70.42.223,10.70.42.84,10.70.41.217 authmode=1 size=5368709120, rbsize=0 [at block_svc_routines.c+3814 :<block_create_cli_1_svc_st>]
[2018-07-04 18:27:06.167780] ERROR: block remote version failed: RPC: Procedure unavailable on host 10.70.41.217 [at block_svc_routines.c+542 :<glusterBlockCallRPC_1>]
[2018-07-04 18:27:06.167898] ERROR: block remote version failed: RPC: Procedure unavailable on host 10.70.42.84 [at block_svc_routines.c+542 :<glusterBlockCallRPC_1>]
[2018-07-04 18:27:06.167983] ERROR: glusterBlockCapabilityRemoteAsync() failed (capability 'create_load_balance' doesn't exit on 10.70.42.84) [at block_svc_routines.c+1761 :<glusterBlockCheckCapabilities>]
[2018-07-04 18:27:06.290780] INFO: create request, volume=vol_72fd8a6c364a0b2d3cd6a4fa3e89791d blockname=blk_glusterfs_upg3_da84519e-7fb7-11e8-8a40-0a580a80000a blockhosts=10.70.42.223,10.70.42.84,10.70.41.217 filename=33c77079-7ef9-4197-9766-a0d6f93e90b4 authmode=1 passwd=f2d6a92a-f9bf-4f2f-8298-06e38f2aa468 size=5368709120 [at block_svc_routines.c+4227 :<block_create_common>]
[2018-07-04 18:27:06.655525] INFO: command exit code, 0 [at block_svc_routines.c+4403 :<block_create_common>]
[2018-07-04 18:27:06.747205] INFO: Block create request satisfied for target: blk_glusterfs_upg3_da84519e-7fb7-11e8-8a40-0a580a80000a on volume vol_72fd8a6c364a0b2d3cd6a4fa3e89791d with given hosts 10.70.42.223,10.70.42.84,10.70.41.217 [at block_svc_routines.c+3030 :<glusterBlockAuditRequest>]
[2018-07-04 18:27:06.837998] INFO: delete cli request, volume=vol_72fd8a6c364a0b2d3cd6a4fa3e89791d blockname=blk_glusterfs_upg3_da84519e-7fb7-11e8-8a40-0a580a80000a [at block_svc_routines.c+4530 :<block_delete_cli_1_svc_st>]
[2018-07-04 18:27:06.846553] ERROR: block remote version failed: RPC: Procedure unavailable on host 10.70.41.217 [at block_svc_routines.c+542 :<glusterBlockCallRPC_1>]
[2018-07-04 18:27:06.846633] ERROR: block remote version failed: RPC: Procedure unavailable on host 10.70.42.84 [at block_svc_routines.c+542 :<glusterBlockCallRPC_1>]
[2018-07-04 18:27:06.857926] INFO: delete request, blockname=blk_glusterfs_upg3_da84519e-7fb7-11e8-8a40-0a580a80000a filename=33c77079-7ef9-4197-9766-a0d6f93e90b4 [at block_svc_routines.c+4651 :<block_delete_1_svc_st>]
[2018-07-04 18:27:07.307856] INFO: command exit code, 0 [at block_svc_routines.c+4697 :<block_delete_1_svc_st>]
[2018-07-04 18:27:07.590359] INFO: delete cli request, volume=vol_72fd8a6c364a0b2d3cd6a4fa3e89791d blockname=blk_glusterfs_upg3_da84519e-7fb7-11e8-8a40-0a580a80000a [at block_svc_routines.c+4530 :<block_delete_cli_1_svc_st>]
[2018-07-04 18:27:07.594704] ERROR: block with name blk_glusterfs_upg3_da84519e-7fb7-11e8-8a40-0a580a80000a doesn't exist in the volume vol_72fd8a6c364a0b2d3cd6a4fa3e89791d [at block_svc_routines.c+4565 :<block_delete_cli_1_svc_st>]

============================================================
Some points of consideration
++++++++++++++++++++++
1. Load balancing in target portals is supported from gluster-block-0.2.1-19 and above. The gluster-block version in CNS 3.9 async3 is gluster-block-0.2.1-14.1.
2. This issue is seen only when the AO pod is at a higher version than the rest of the pods and PV creates are then performed.
3. On another setup, a non-AO pod, say Y, was upgraded and PVC creates were started. PVC creation succeeded on that setup.
4. Upgrade in presence

How reproducible:
++++++++++++++++++++
2/2

Steps to Reproduce:
+++++++++++++++++++++
1. Create a CNS 3.9 setup with 3 gluster nodes X, Y, Z.
2. Create some PVCs to confirm which node/pod is selected as the Active-Optimized (AO) target portal. Say the node is X. The first node listed in "heketi-cli blockvolume info" is the system-selected AO pod.
3. Using the CNS gluster pod upgrade process, upgrade the pod on X to the latest CNS 3.10 build, i.e. the latest gluster-blockd build.
4. Once the pod comes up fine, try creating PVCs.

As a corner case (when the above issue is not encountered)
+++++++++++++++
1. Create a CNS 3.9 setup with 3 gluster nodes X, Y, Z.
2. Create some PVCs to confirm which node/pod is selected as the Active-Optimized (AO) target portal. Say the node is X. The first node listed in "heketi-cli blockvolume info" is the system-selected AO pod.
3. Using the CNS gluster pod upgrade process, upgrade any pod other than pod X, i.e. upgrade pod Y (which is not the AO pod for the PVCs).
4. Once the pod comes up fine, try creating PVCs.
5. PVC creation and deletion will be successful, as the pod at the higher version is not the one used as Active for the PVCs.

Actual results:
+++++++++++++++
When the Active pod is at a higher version, PVC operations and other config commands fail to execute on the Active pod.

Expected results:
++++++++++++++++++
Since the pods are at different versions, the features of the lowest version should remain functional, and no create or delete tasks should fail.

Additional info:
Version-Release number of selected component (if applicable):
---------------------------------------
# for i in `oc get pods -o wide | grep glusterfs | cut -d " " -f1` ; do echo $i; echo +++++++++++++++++++++++; oc exec $i -- rpm -qa | grep targetcli ; done
glusterfs-storage-8xjjr
+++++++++++++++++++++++
targetcli-2.1.fb46-4.el7_5.noarch
glusterfs-storage-hwn97
+++++++++++++++++++++++
targetcli-2.1.fb46-6.el7_5.noarch
glusterfs-storage-xg2bv
+++++++++++++++++++++++
targetcli-2.1.fb46-4.el7_5.noarch

# for i in `oc get pods -o wide | grep glusterfs | cut -d " " -f1` ; do echo $i; echo +++++++++++++++++++++++; oc exec $i -- rpm -qa | grep gluster-block ; done
glusterfs-storage-8xjjr
+++++++++++++++++++++++
gluster-block-0.2.1-14.1.el7rhgs.x86_64
glusterfs-storage-hwn97
+++++++++++++++++++++++
gluster-block-0.2.1-20.el7rhgs.x86_64
glusterfs-storage-xg2bv
+++++++++++++++++++++++
gluster-block-0.2.1-14.1.el7rhgs.x86_64

# for i in `oc get pods -o wide | grep glusterfs | cut -d " " -f1` ; do echo $i; echo +++++++++++++++++++++++; oc exec $i -- rpm -qa | grep tcmu-runner ; done
glusterfs-storage-8xjjr
+++++++++++++++++++++++
tcmu-runner-1.2.0-16.el7rhgs.x86_64
glusterfs-storage-hwn97
+++++++++++++++++++++++
tcmu-runner-1.2.0-20.el7rhgs.x86_64
glusterfs-storage-xg2bv
+++++++++++++++++++++++
tcmu-runner-1.2.0-16.el7rhgs.x86_64

# for i in `oc get pods -o wide | grep glusterfs | cut -d " " -f1` ; do echo $i; echo +++++++++++++++++++++++; oc exec $i -- rpm -qa | grep python-configshell ; done
glusterfs-storage-8xjjr
+++++++++++++++++++++++
python-configshell-1.1.fb23-4.el7_5.noarch
glusterfs-storage-hwn97
+++++++++++++++++++++++
python-configshell-1.1.fb23-4.el7_5.noarch
glusterfs-storage-xg2bv
+++++++++++++++++++++++
python-configshell-1.1.fb23-4.el7_5.noarch

# for i in `oc get pods -o wide | grep glusterfs | cut -d " " -f1` ; do echo $i; echo +++++++++++++++++++++++; oc exec $i -- rpm -qa | grep python-rtslib ; done
glusterfs-storage-8xjjr
+++++++++++++++++++++++
python-rtslib-2.1.fb63-11.el7_5.noarch
glusterfs-storage-hwn97
+++++++++++++++++++++++
python-rtslib-2.1.fb63-12.el7_5.noarch
glusterfs-storage-xg2bv
+++++++++++++++++++++++
python-rtslib-2.1.fb63-11.el7_5.noarch
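The five per-package loops above can be collapsed into one check; the following is a minimal sketch under the same assumptions as the transcript (an `oc` client logged in to the cluster, glusterfs pod names containing "glusterfs"). The `check_versions` helper name is made up here for illustration.

```shell
# check_versions: print the installed build of each storage-stack package
# in every glusterfs pod, to spot pods running mismatched versions.
check_versions() {
    for pod in $(oc get pods -o wide | grep glusterfs | cut -d " " -f1); do
        echo "$pod"
        echo "+++++++++++++++++++++++"
        for pkg in targetcli gluster-block tcmu-runner python-configshell python-rtslib; do
            oc exec "$pod" -- rpm -qa | grep "$pkg"
        done
    done
}
```

A version column that differs across pods (as with gluster-block 0.2.1-20 vs 0.2.1-14.1 above) flags the mixed-version state that triggers this bug.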
I have determined that heketi does log the stdout and stderr of failed commands. We won't be interpreting the JSON output of failed commands, though. So the only thing pending for this bug is the fix in gluster-block to return the right status.
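Since heketi branches on the exit status rather than parsing the JSON payload of a failed command, gluster-block returning the right status is what matters here. A minimal sketch of that caller-side pattern (the `run_block_cmd` wrapper is hypothetical, not heketi's actual code):

```shell
# run_block_cmd: run a command, trust its exit status; on failure, log
# the raw stdout/stderr verbatim instead of trying to parse its JSON.
run_block_cmd() {
    if out=$("$@" 2>&1); then
        echo "$out"
        return 0
    else
        rc=$?
        echo "command failed (rc=$rc): $out" >&2
        return $rc
    fi
}
```

If the command exits 0 despite an internal failure (the behavior pending a fix here), a wrapper like this has no way to detect the error, which is why the correct status from gluster-block is the remaining piece.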
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2691