Description of problem:
+++++++++++++++++++++
We were executing the following test case: "On different versions of gluster-block between 3 nodes - create and delete PVC in a loop".

As part of the test case, we had an OCP 3.9 + CNS 3.9 async3 setup with 3 nodes X, Y and Z. We created some PVCs and confirmed from "targetcli ls" that node X is the Active-Optimized (AO) target node for all the PVCs. We then upgraded node X to the latest gluster-block version, gluster-block-0.2.1-20.el7rhgs.x86_64 (by re-spinning the glusterfs pod with the latest CNS 3.10 image, 3.3.1-21). Once the node was back up, we checked that gluster-blockd and the other services were up. We then tried creating new PVCs, but PVC creation failed with "unable to run command" on pod X, which has the latest gluster-blockd. The gluster-blockd logs show "capability 'create_load_balance' doesn't exit on 10.70.42.84".

================================================
pvc
++++++++
Normal ExternalProvisioning 3m (x88 over 8m) persistentvolume-controller waiting for a volume to be created, either by external provisioner "gluster.org/glusterblock" or manually created by system administrator
Normal Provisioning 12s (x11 over 8m) gluster.org/glusterblock a34cb863-7fb2-11e8-8a40-0a580a80000a External provisioner is provisioning volume for claim "glusterfs/upg3"
Warning ProvisioningFailed 10s (x11 over 8m) gluster.org/glusterblock a34cb863-7fb2-11e8-8a40-0a580a80000a Failed to provision volume with StorageClass "gluster-block": failed to create volume: [heketi] failed to create volume: Unable to execute command on glusterfs-storage-hwn97:

Gluster-block logs
++++++++++++++
[2018-07-04 18:27:06.167238] INFO: create cli request, volume=vol_72fd8a6c364a0b2d3cd6a4fa3e89791d blockname=blk_glusterfs_upg3_da84519e-7fb7-11e8-8a40-0a580a80000a mpath=3 blockhosts=10.70.42.223,10.70.42.84,10.70.41.217 authmode=1 size=5368709120, rbsize=0 [at block_svc_routines.c+3814 :<block_create_cli_1_svc_st>]
[2018-07-04 18:27:06.167780] ERROR: block remote version failed: RPC: Procedure unavailable on host 10.70.41.217 [at block_svc_routines.c+542 :<glusterBlockCallRPC_1>]
[2018-07-04 18:27:06.167898] ERROR: block remote version failed: RPC: Procedure unavailable on host 10.70.42.84 [at block_svc_routines.c+542 :<glusterBlockCallRPC_1>]
[2018-07-04 18:27:06.167983] ERROR: glusterBlockCapabilityRemoteAsync() failed (capability 'create_load_balance' doesn't exit on 10.70.42.84) [at block_svc_routines.c+1761 :<glusterBlockCheckCapabilities>]
[2018-07-04 18:27:06.290780] INFO: create request, volume=vol_72fd8a6c364a0b2d3cd6a4fa3e89791d blockname=blk_glusterfs_upg3_da84519e-7fb7-11e8-8a40-0a580a80000a blockhosts=10.70.42.223,10.70.42.84,10.70.41.217 filename=33c77079-7ef9-4197-9766-a0d6f93e90b4 authmode=1 passwd=f2d6a92a-f9bf-4f2f-8298-06e38f2aa468 size=5368709120 [at block_svc_routines.c+4227 :<block_create_common>]
[2018-07-04 18:27:06.655525] INFO: command exit code, 0 [at block_svc_routines.c+4403 :<block_create_common>]
[2018-07-04 18:27:06.747205] INFO: Block create request satisfied for target: blk_glusterfs_upg3_da84519e-7fb7-11e8-8a40-0a580a80000a on volume vol_72fd8a6c364a0b2d3cd6a4fa3e89791d with given hosts 10.70.42.223,10.70.42.84,10.70.41.217 [at block_svc_routines.c+3030 :<glusterBlockAuditRequest>]
[2018-07-04 18:27:06.837998] INFO: delete cli request, volume=vol_72fd8a6c364a0b2d3cd6a4fa3e89791d blockname=blk_glusterfs_upg3_da84519e-7fb7-11e8-8a40-0a580a80000a [at block_svc_routines.c+4530 :<block_delete_cli_1_svc_st>]
[2018-07-04 18:27:06.846553] ERROR: block remote version failed: RPC: Procedure unavailable on host 10.70.41.217 [at block_svc_routines.c+542 :<glusterBlockCallRPC_1>]
[2018-07-04 18:27:06.846633] ERROR: block remote version failed: RPC: Procedure unavailable on host 10.70.42.84 [at block_svc_routines.c+542 :<glusterBlockCallRPC_1>]
[2018-07-04 18:27:06.857926] INFO: delete request, blockname=blk_glusterfs_upg3_da84519e-7fb7-11e8-8a40-0a580a80000a filename=33c77079-7ef9-4197-9766-a0d6f93e90b4 [at block_svc_routines.c+4651 :<block_delete_1_svc_st>]
[2018-07-04 18:27:07.307856] INFO: command exit code, 0 [at block_svc_routines.c+4697 :<block_delete_1_svc_st>]
[2018-07-04 18:27:07.590359] INFO: delete cli request, volume=vol_72fd8a6c364a0b2d3cd6a4fa3e89791d blockname=blk_glusterfs_upg3_da84519e-7fb7-11e8-8a40-0a580a80000a [at block_svc_routines.c+4530 :<block_delete_cli_1_svc_st>]
[2018-07-04 18:27:07.594704] ERROR: block with name blk_glusterfs_upg3_da84519e-7fb7-11e8-8a40-0a580a80000a doesn't exist in the volume vol_72fd8a6c364a0b2d3cd6a4fa3e89791d [at block_svc_routines.c+4565 :<block_delete_cli_1_svc_st>]

============================================================
Some points of consideration
++++++++++++++++++++++
1. Load balancing in target portals is supported from gluster-block-0.2.1-19 and above. The gluster-block version in CNS 3.9 async3 is gluster-block-0.2.1-14.1.
2. This issue is seen only when the AO pod is at a higher version than the rest of the pods and PV creates are then performed.
3. On another setup, a non-AO pod, say Y, was upgraded and PVC creates were started. PVC creation succeeded on that setup.
4. Upgrade in presence

How reproducible:
++++++++++++++++++++
2/2

Steps to Reproduce:
+++++++++++++++++++++
1. Create a CNS 3.9 setup with 3 gluster nodes X, Y, Z.
2. Create some PVCs to confirm which node/pod is selected as the Active-Optimized (AO) target portal. Say the node is X. The first node listed in "heketi-cli blockvolume info" is the system-selected AO pod.
3. Using the CNS gluster pod upgrade process, upgrade the pod on X to the latest CNS 3.10 build, i.e. the latest gluster-blockd build.
4. Once the pod comes up fine, try creating PVCs.

As a corner case (when the above issue is not encountered)
+++++++++++++++
1. Create a CNS 3.9 setup with 3 gluster nodes X, Y, Z.
2. Create some PVCs to confirm which node/pod is selected as the Active-Optimized (AO) target portal. Say the node is X. The first node listed in "heketi-cli blockvolume info" is the system-selected AO pod.
3. Using the CNS gluster pod upgrade process, upgrade any pod other than pod X, i.e. upgrade pod Y (which is not the AO pod for the PVCs).
4. Once the pod comes up fine, try creating PVCs.
5. PVC creation and deletion will be successful, as the pod at the higher version is not the one used as Active for the PVCs.

Actual results:
+++++++++++++++
When the Active pod is at a higher version, PVC operations and other config commands fail to execute on the Active pod.

Expected results:
++++++++++++++++++
Since the pods are at different versions, the features of the lowest version should remain functional, and no create or delete tasks should fail.

Additional info:
Version-Release number of selected component (if applicable):
---------------------------------------
# for i in `oc get pods -o wide | grep glusterfs | cut -d " " -f1` ; do echo $i; echo +++++++++++++++++++++++; oc exec $i -- rpm -qa | grep targetcli ; done
glusterfs-storage-8xjjr
+++++++++++++++++++++++
targetcli-2.1.fb46-4.el7_5.noarch
glusterfs-storage-hwn97
+++++++++++++++++++++++
targetcli-2.1.fb46-6.el7_5.noarch
glusterfs-storage-xg2bv
+++++++++++++++++++++++
targetcli-2.1.fb46-4.el7_5.noarch

# for i in `oc get pods -o wide | grep glusterfs | cut -d " " -f1` ; do echo $i; echo +++++++++++++++++++++++; oc exec $i -- rpm -qa | grep gluster-block ; done
glusterfs-storage-8xjjr
+++++++++++++++++++++++
gluster-block-0.2.1-14.1.el7rhgs.x86_64
glusterfs-storage-hwn97
+++++++++++++++++++++++
gluster-block-0.2.1-20.el7rhgs.x86_64
glusterfs-storage-xg2bv
+++++++++++++++++++++++
gluster-block-0.2.1-14.1.el7rhgs.x86_64

# for i in `oc get pods -o wide | grep glusterfs | cut -d " " -f1` ; do echo $i; echo +++++++++++++++++++++++; oc exec $i -- rpm -qa | grep tcmu-runner ; done
glusterfs-storage-8xjjr
+++++++++++++++++++++++
tcmu-runner-1.2.0-16.el7rhgs.x86_64
glusterfs-storage-hwn97
+++++++++++++++++++++++
tcmu-runner-1.2.0-20.el7rhgs.x86_64
glusterfs-storage-xg2bv
+++++++++++++++++++++++
tcmu-runner-1.2.0-16.el7rhgs.x86_64

# for i in `oc get pods -o wide | grep glusterfs | cut -d " " -f1` ; do echo $i; echo +++++++++++++++++++++++; oc exec $i -- rpm -qa | grep python-configshell ; done
glusterfs-storage-8xjjr
+++++++++++++++++++++++
python-configshell-1.1.fb23-4.el7_5.noarch
glusterfs-storage-hwn97
+++++++++++++++++++++++
python-configshell-1.1.fb23-4.el7_5.noarch
glusterfs-storage-xg2bv
+++++++++++++++++++++++
python-configshell-1.1.fb23-4.el7_5.noarch

# for i in `oc get pods -o wide | grep glusterfs | cut -d " " -f1` ; do echo $i; echo +++++++++++++++++++++++; oc exec $i -- rpm -qa | grep python-rtslib ; done
glusterfs-storage-8xjjr
+++++++++++++++++++++++
python-rtslib-2.1.fb63-11.el7_5.noarch
glusterfs-storage-hwn97
+++++++++++++++++++++++
python-rtslib-2.1.fb63-12.el7_5.noarch
glusterfs-storage-xg2bv
+++++++++++++++++++++++
python-rtslib-2.1.fb63-11.el7_5.noarch
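The five per-package loops above can be collapsed into one check; the following is a minimal sketch under the same assumptions as the transcript (an `oc` client logged in to the cluster, glusterfs pod names containing "glusterfs"). The `check_versions` helper name is made up here for illustration.

```shell
# check_versions: print the installed build of each storage-stack package
# in every glusterfs pod, to spot pods running mismatched versions.
check_versions() {
    for pod in $(oc get pods -o wide | grep glusterfs | cut -d " " -f1); do
        echo "$pod"
        echo "+++++++++++++++++++++++"
        for pkg in targetcli gluster-block tcmu-runner python-configshell python-rtslib; do
            oc exec "$pod" -- rpm -qa | grep "$pkg"
        done
    done
}
```

A version column that differs across pods (as with gluster-block 0.2.1-20 vs 0.2.1-14.1 above) flags the mixed-version state that triggers this bug.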
I have determined that heketi does log the stdout and stderr of failed commands. We won't be interpreting the JSON output of failed commands, though. So the only thing pending for this bug is the fix in gluster-block to return the right status.
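Since heketi branches on the exit status rather than parsing the JSON payload of a failed command, gluster-block returning the right status is what matters here. A minimal sketch of that caller-side pattern (the `run_block_cmd` wrapper is hypothetical, not heketi's actual code):

```shell
# run_block_cmd: run a command, trust its exit status; on failure, log
# the raw stdout/stderr verbatim instead of trying to parse its JSON.
run_block_cmd() {
    if out=$("$@" 2>&1); then
        echo "$out"
        return 0
    else
        rc=$?
        echo "command failed (rc=$rc): $out" >&2
        return $rc
    fi
}
```

If the command exits 0 despite an internal failure (the behavior pending a fix here), a wrapper like this has no way to detect the error, which is why the correct status from gluster-block is the remaining piece.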
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2691