Bug 1598748 - Block device deletion returns "SUCCESS" even though deletion has failed on one of the nodes
Summary: Block device deletion returns "SUCCESS" even though deletion has failed on one of the nodes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: gluster-block
Version: cns-3.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: CNS 3.10
Assignee: Prasanna Kumar Kalever
QA Contact: Nitin Goyal
URL:
Whiteboard:
Depends On:
Blocks: 1568862
 
Reported: 2018-07-06 11:33 UTC by Rachael
Modified: 2018-09-12 09:28 UTC
CC List: 11 users

Fixed In Version: gluster-block-0.2.1-21.el7rhgs
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-09-12 09:27:16 UTC
Embargoed:


Attachments


Links:
Red Hat Product Errata RHEA-2018:2691 (last updated 2018-09-12 09:28:22 UTC)

Description Rachael 2018-07-06 11:33:25 UTC
Description of problem:
On a CNS setup with existing block PVCs, a script was run to delete 10 PVCs while the targetcli process was being killed on one of the pods. The deletion of the PVCs was reported as successful and the block devices were removed from both the heketi and gluster backends. However, the following message was seen in the heketi logs:

[kubeexec] DEBUG 2018/07/06 09:21:51 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:246: Host: dhcp46-244.lab.eng.blr.redhat.com Pod: glusterfs-storage-w9jcs Command: gluster-block delete vol_37b50be8ac1fb551ad7f1b2985d8b6a7/test-vol_glusterfs_claim45_2d9c2d2b-80cf-11e8-a4e5-0a580a810203 --json
Result: { "FAILED ON": [ "10.70.46.244" ], "SUCCESSFUL ON": [ "10.70.47.60", "10.70.47.95" ], "RESULT": "SUCCESS" }
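
Note the mismatch in the reply: the per-node "FAILED ON" list is non-empty, yet the aggregate "RESULT" field still reads "SUCCESS", which is what the caller ends up trusting. A minimal sketch of a stricter client-side check (the volume and block names are placeholders, and jq availability in the pod is an assumption) would be to fail whenever "FAILED ON" is non-empty:

# Hypothetical check: treat the delete as failed if any node is listed in "FAILED ON",
# instead of relying on the aggregate "RESULT" field.
gluster-block delete <volume>/<blockname> --json \
  | jq -e '(.["FAILED ON"] // []) | length == 0' > /dev/null \
  || echo "delete failed on at least one node"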

On checking the gluster-blockd logs, the following errors were seen:

[2018-07-06 09:21:49.707366] INFO: delete cli request, volume=vol_37b50be8ac1fb551ad7f1b2985d8b6a7 blockname=test-vol_glusterfs_claim45_2d9c2d2b-80cf-11e8-a4e5-0a580a810203 [at block_svc_routines.c+4530 :<block_delete_cli_1_svc_st>]
[2018-07-06 09:21:49.813627] INFO: delete request, blockname=test-vol_glusterfs_claim45_2d9c2d2b-80cf-11e8-a4e5-0a580a810203 filename=31dbec61-2f92-43d2-afae-8d2620585511 [at block_svc_routines.c+4651 :<block_delete_1_svc_st>]
[2018-07-06 09:21:49.835666] ERROR: No target config for block test-vol_glusterfs_claim45_2d9c2d2b-80cf-11e8-a4e5-0a580a810203. [at block_svc_routines.c+4673 :<block_delete_1_svc_st>]
[2018-07-06 09:21:49.948161] ERROR: failed in remote delete for block test-vol_glusterfs_claim45_2d9c2d2b-80cf-11e8-a4e5-0a580a810203 on host 10.70.46.244 volume vol_37b50be8ac1fb551ad7f1b2985d8b6a7 [at block_svc_routines.c+1031 :<glusterBlockDeleteRemote>]
[2018-07-06 09:21:51.208877] ERROR: failed to delete config on 10.70.46.244 No target config for block test-vol_glusterfs_claim45_2d9c2d2b-80cf-11e8-a4e5-0a580a810203.: on volume vol_37b50be8ac1fb551ad7f1b2985d8b6a7 on host 10.70.46.244 [at block_svc_routines.c+1115 :<glusterBlockCollectAttemptSuccess>]

The saveconfig.json file still has the entries for the deleted block devices.
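
One way to confirm the stale state on the affected node (sketched here with a hypothetical pod and block name; the gluster-blockd log path is an assumption based on the default gluster-block log directory) is:

# Stale entry left behind in the target config for the deleted block:
oc rsh <gluster-pod> grep <blockname> /etc/target/saveconfig.json

# Matching delete failure in the gluster-blockd log:
oc rsh <gluster-pod> grep "failed in remote delete" /var/log/gluster-block/gluster-blockd.log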



Version-Release number of selected component (if applicable):

# oc version
oc v3.10.0-0.67.0
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

# rpm -qa|grep heketi
python-heketi-7.0.0-2.el7rhgs.x86_64
heketi-client-7.0.0-2.el7rhgs.x86_64
heketi-7.0.0-2.el7rhgs.x86_64


# rpm -qa|grep gluster
glusterfs-client-xlators-3.8.4-54.12.el7rhgs.x86_64
glusterfs-fuse-3.8.4-54.12.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-54.12.el7rhgs.x86_64
glusterfs-libs-3.8.4-54.12.el7rhgs.x86_64
glusterfs-3.8.4-54.12.el7rhgs.x86_64
glusterfs-api-3.8.4-54.12.el7rhgs.x86_64
glusterfs-cli-3.8.4-54.12.el7rhgs.x86_64
glusterfs-server-3.8.4-54.12.el7rhgs.x86_64
gluster-block-0.2.1-20.el7rhgs.x86_64


How reproducible: 3/3


Steps to Reproduce:

1. On one of the gluster pods, run the following loop to kill the targetcli process: while(true); do pkill targetcli; done

2. From the master node, delete a block PVC: oc delete pvc <claim_name> (a combined sketch of both steps is shown below)
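
A minimal sketch of the reproduction, assuming ten block PVCs named claim1..claim10 (the pod name and claim names are placeholders, not taken from this report):

# Terminal 1: on one gluster pod, keep killing targetcli:
oc rsh <gluster-pod>
sh-4.2# while(true); do pkill targetcli; done

# Terminal 2: from the master node, delete the block PVCs while the loop above is running:
for i in $(seq 1 10); do oc delete pvc claim$i; done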


Actual results:
The PVC deletion is reported as successful even though the block delete failed on one of the nodes; the stale entries for the deleted block remain in saveconfig.json on that node.

Additional info:
Logs will be attached soon

Comment 6 Pranith Kumar K 2018-07-09 09:26:56 UTC
Rachael,
   Could you provide QE-ack?

Comment 11 Nitin Goyal 2018-07-30 09:26:14 UTC
Hi,

I verified this bug with the RPMs and container images below. It is working fine.

RPMs:
gluster-block-0.2.1-22.el7rhgs.x86_64
glusterfs-libs-3.8.4-54.15.el7rhgs.x86_64
glusterfs-3.8.4-54.15.el7rhgs.x86_64
glusterfs-api-3.8.4-54.15.el7rhgs.x86_64
glusterfs-cli-3.8.4-54.15.el7rhgs.x86_64
glusterfs-server-3.8.4-54.15.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-54.15.el7rhgs.x86_64
glusterfs-fuse-3.8.4-54.15.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-54.15.el7rhgs.x86_64

RHEL 7.5 kernel:
3.10.0-862.11.2.el7.x86_64

Container images:
rhgs-server-rhel7:3.3.1-27
rhgs-gluster-block-prov-rhel7:3.3.1-20

What I observed:

When I delete the PVC while simultaneously running "while(true); do pkill targetcli; done" on one gluster pod, the PVC does not get deleted, but the stale entries do get removed from the other gluster pods.

When I stop the "while(true); do pkill targetcli; done" script, the block device gets deleted and the stale entries are also removed.


Before the PVC was deleted:

[root@dhcp46-180 ~]# oc get pvc | grep c126
c126      Bound     pvc-70f7c93a-919e-11e8-b631-005056a53010   1Gi        RWO            block-sc       2d
[root@dhcp46-180 ~]# 
[root@dhcp46-180 ~]# oc rsh glusterfs-storage-r6r6c gluster-block list vol_355b430ec4dfc1ee674da2e63f12153b | grep c126
blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004
[root@dhcp46-180 ~]# oc rsh glusterfs-storage-rfrx6 gluster-block list vol_355b430ec4dfc1ee674da2e63f12153b | grep c126
blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004
[root@dhcp46-180 ~]# oc rsh glusterfs-storage-x94vq gluster-block list vol_355b430ec4dfc1ee674da2e63f12153b | grep c126
blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004
[root@dhcp46-180 ~]# 
[root@dhcp46-180 ~]# oc rsh glusterfs-storage-r6r6c cat /etc/target/saveconfig.json | grep c126
      "name": "blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004", 
              "storage_object": "/backstores/user/blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004"
              "storage_object": "/backstores/user/blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004"
              "storage_object": "/backstores/user/blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004"
[root@dhcp46-180 ~]# oc rsh glusterfs-storage-rfrx6 cat /etc/target/saveconfig.json | grep c126
      "name": "blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004", 
              "storage_object": "/backstores/user/blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004"
              "storage_object": "/backstores/user/blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004"
              "storage_object": "/backstores/user/blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004"
[root@dhcp46-180 ~]# oc rsh glusterfs-storage-x94vq cat /etc/target/saveconfig.json | grep c126
      "name": "blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004", 
              "storage_object": "/backstores/user/blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004"
              "storage_object": "/backstores/user/blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004"
              "storage_object": "/backstores/user/blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004"



Running the script on one gluster pod:

sh-4.2# while(true); do pkill targetcli; done

After the PVC delete command was given:

[root@dhcp46-180 ~]# oc delete pvc c126
persistentvolumeclaim "c126" deleted
[root@dhcp46-180 ~]# 
[root@dhcp46-180 ~]# oc rsh glusterfs-storage-r6r6c gluster-block list vol_355b430ec4dfc1ee674da2e63f12153b | grep c126
blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004
[root@dhcp46-180 ~]# oc rsh glusterfs-storage-rfrx6 gluster-block list vol_355b430ec4dfc1ee674da2e63f12153b | grep c126
blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004
[root@dhcp46-180 ~]# oc rsh glusterfs-storage-x94vq gluster-block list vol_355b430ec4dfc1ee674da2e63f12153b | grep c126
blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004
[root@dhcp46-180 ~]# oc rsh glusterfs-storage-r6r6c cat /etc/target/saveconfig.json | grep c126
      "name": "blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004", 
              "storage_object": "/backstores/user/blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004"
              "storage_object": "/backstores/user/blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004"
              "storage_object": "/backstores/user/blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004"
[root@dhcp46-180 ~]# oc rsh glusterfs-storage-rfrx6 cat /etc/target/saveconfig.json | grep c126
[root@dhcp46-180 ~]# oc rsh glusterfs-storage-x94vq cat /etc/target/saveconfig.json | grep c126
[root@dhcp46-180 ~]# 

Stopping the script on the gluster pod:
sh-4.2# while(true); do pkill targetcli; done 
^C
sh-4.2# 

After stopping the script:

[root@dhcp46-180 ~]# oc rsh glusterfs-storage-r6r6c gluster-block list vol_355b430ec4dfc1ee674da2e63f12153b | grep c126
[root@dhcp46-180 ~]# oc rsh glusterfs-storage-rfrx6 gluster-block list vol_355b430ec4dfc1ee674da2e63f12153b | grep c126
[root@dhcp46-180 ~]# oc rsh glusterfs-storage-x94vq gluster-block list vol_355b430ec4dfc1ee674da2e63f12153b | grep c126
[root@dhcp46-180 ~]# 
[root@dhcp46-180 ~]# oc rsh glusterfs-storage-r6r6c cat /etc/target/saveconfig.json | grep c126
[root@dhcp46-180 ~]# oc rsh glusterfs-storage-rfrx6 cat /etc/target/saveconfig.json | grep c126
[root@dhcp46-180 ~]# oc rsh glusterfs-storage-x94vq cat /etc/target/saveconfig.json | grep c126
[root@dhcp46-180 ~]#

Comment 13 errata-xmlrpc 2018-09-12 09:27:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2691

