Description of problem:

On a CNS setup with existing block PVCs, a script was run to delete 10 PVCs while the targetcli process was being killed on one of the gluster pods. The deletion of the PVCs was successful and the block devices were removed from both heketi and the gluster backend. However, the following message was seen in the heketi logs:

[kubeexec] DEBUG 2018/07/06 09:21:51 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:246: Host: dhcp46-244.lab.eng.blr.redhat.com Pod: glusterfs-storage-w9jcs Command: gluster-block delete vol_37b50be8ac1fb551ad7f1b2985d8b6a7/test-vol_glusterfs_claim45_2d9c2d2b-80cf-11e8-a4e5-0a580a810203 --json Result: { "FAILED ON": [ "10.70.46.244" ], "SUCCESSFUL ON": [ "10.70.47.60", "10.70.47.95" ], "RESULT": "SUCCESS" }

On checking the gluster-blockd logs, the following errors were seen:

[2018-07-06 09:21:49.707366] INFO: delete cli request, volume=vol_37b50be8ac1fb551ad7f1b2985d8b6a7 blockname=test-vol_glusterfs_claim45_2d9c2d2b-80cf-11e8-a4e5-0a580a810203 [at block_svc_routines.c+4530 :<block_delete_cli_1_svc_st>]
[2018-07-06 09:21:49.813627] INFO: delete request, blockname=test-vol_glusterfs_claim45_2d9c2d2b-80cf-11e8-a4e5-0a580a810203 filename=31dbec61-2f92-43d2-afae-8d2620585511 [at block_svc_routines.c+4651 :<block_delete_1_svc_st>]
[2018-07-06 09:21:49.835666] ERROR: No target config for block test-vol_glusterfs_claim45_2d9c2d2b-80cf-11e8-a4e5-0a580a810203. [at block_svc_routines.c+4673 :<block_delete_1_svc_st>]
[2018-07-06 09:21:49.948161] ERROR: failed in remote delete for block test-vol_glusterfs_claim45_2d9c2d2b-80cf-11e8-a4e5-0a580a810203 on host 10.70.46.244 volume vol_37b50be8ac1fb551ad7f1b2985d8b6a7 [at block_svc_routines.c+1031 :<glusterBlockDeleteRemote>]
[2018-07-06 09:21:51.208877] ERROR: failed to delete config on 10.70.46.244 No target config for block test-vol_glusterfs_claim45_2d9c2d2b-80cf-11e8-a4e5-0a580a810203.: on volume vol_37b50be8ac1fb551ad7f1b2985d8b6a7 on host 10.70.46.244 [at block_svc_routines.c+1115 :<glusterBlockCollectAttemptSuccess>]

The saveconfig.json file still contains the entries for the deleted block devices.

Version-Release number of selected component (if applicable):

# oc version
oc v3.10.0-0.67.0
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

# rpm -qa|grep heketi
python-heketi-7.0.0-2.el7rhgs.x86_64
heketi-client-7.0.0-2.el7rhgs.x86_64
heketi-7.0.0-2.el7rhgs.x86_64

# rpm -qa|grep gluster
glusterfs-client-xlators-3.8.4-54.12.el7rhgs.x86_64
glusterfs-fuse-3.8.4-54.12.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-54.12.el7rhgs.x86_64
glusterfs-libs-3.8.4-54.12.el7rhgs.x86_64
glusterfs-3.8.4-54.12.el7rhgs.x86_64
glusterfs-api-3.8.4-54.12.el7rhgs.x86_64
glusterfs-cli-3.8.4-54.12.el7rhgs.x86_64
glusterfs-server-3.8.4-54.12.el7rhgs.x86_64
gluster-block-0.2.1-20.el7rhgs.x86_64

How reproducible: 3/3

Steps to Reproduce:
1. On one of the gluster pods, run the following loop to kill the targetcli process:
   while(true); do pkill targetcli; done
2. From the master node, delete a block PVC:
   oc delete pvc <claim_name>

Actual results:
The deletion of the PVC is reported as successful even though the delete failed on one of the nodes, leaving stale entries behind in /etc/target/saveconfig.json on that node.

Expected results:
The delete should not be reported as successful while the target configuration still exists on one of the nodes; no stale saveconfig.json entries should remain after the PVC is deleted.

Additional info:
Logs will be attached soon. (A scripted form of the reproduction steps is sketched below.)
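For reference, a minimal scripted form of the reproduction, assuming three gluster pods and sequentially numbered claims; the pod and claim names are placeholders, not the exact script used in this report:

# Terminal 1: inside one of the gluster pods, keep killing targetcli so
# that gluster-block's target config update fails on that node.
oc rsh <gluster-pod> sh -c 'while true; do pkill targetcli; done'

# Terminal 2: from the master node, delete a batch of block PVCs.
for i in $(seq 1 10); do
    oc delete pvc "claim$i"
done

# Afterwards, check each gluster pod for stale target config entries
# left behind by the failed remote delete:
for pod in <pod-1> <pod-2> <pod-3>; do
    echo "== $pod =="
    oc rsh "$pod" grep claim /etc/target/saveconfig.json
done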
Racheal, could you provide QE-ack?
Hi,

I verified this bug on the below RPMs and container images, and it is working fine.

RPMs:
gluster-block-0.2.1-22.el7rhgs.x86_64
glusterfs-libs-3.8.4-54.15.el7rhgs.x86_64
glusterfs-3.8.4-54.15.el7rhgs.x86_64
glusterfs-api-3.8.4-54.15.el7rhgs.x86_64
glusterfs-cli-3.8.4-54.15.el7rhgs.x86_64
glusterfs-server-3.8.4-54.15.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-54.15.el7rhgs.x86_64
glusterfs-fuse-3.8.4-54.15.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-54.15.el7rhgs.x86_64

RHEL 7.5, kernel 3.10.0-862.11.2.el7.x86_64

Container images:
rhgs-server-rhel7:3.3.1-27
rhgs-gluster-block-prov-rhel7:3.3.1-20

What I observed:
While deleting the PVC with "while(true); do pkill targetcli; done" running simultaneously on one gluster pod, the block device was not deleted, but the stale entries were removed from the other gluster pods. Once I stopped the kill loop, the block device was deleted and the remaining stale entries were cleaned up as well.

Before the PVC was deleted:

[root@dhcp46-180 ~]# oc get pvc | grep c126
c126      Bound     pvc-70f7c93a-919e-11e8-b631-005056a53010   1Gi       RWO       block-sc       2d

[root@dhcp46-180 ~]# oc rsh glusterfs-storage-r6r6c gluster-block list vol_355b430ec4dfc1ee674da2e63f12153b | grep c126
blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004
[root@dhcp46-180 ~]# oc rsh glusterfs-storage-rfrx6 gluster-block list vol_355b430ec4dfc1ee674da2e63f12153b | grep c126
blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004
[root@dhcp46-180 ~]# oc rsh glusterfs-storage-x94vq gluster-block list vol_355b430ec4dfc1ee674da2e63f12153b | grep c126
blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004

[root@dhcp46-180 ~]# oc rsh glusterfs-storage-r6r6c cat /etc/target/saveconfig.json | grep c126
    "name": "blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004",
    "storage_object": "/backstores/user/blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004"
    "storage_object": "/backstores/user/blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004"
    "storage_object": "/backstores/user/blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004"
[root@dhcp46-180 ~]# oc rsh glusterfs-storage-rfrx6 cat /etc/target/saveconfig.json | grep c126
    "name": "blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004",
    "storage_object": "/backstores/user/blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004"
    "storage_object": "/backstores/user/blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004"
    "storage_object": "/backstores/user/blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004"
[root@dhcp46-180 ~]# oc rsh glusterfs-storage-x94vq cat /etc/target/saveconfig.json | grep c126
    "name": "blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004",
    "storage_object": "/backstores/user/blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004"
    "storage_object": "/backstores/user/blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004"
    "storage_object": "/backstores/user/blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004"

Running the kill loop on one gluster pod:

sh-4.2# while(true); do pkill targetcli; done

After issuing the PVC delete:

[root@dhcp46-180 ~]# oc delete pvc c126
persistentvolumeclaim "c126" deleted

[root@dhcp46-180 ~]# oc rsh glusterfs-storage-r6r6c gluster-block list vol_355b430ec4dfc1ee674da2e63f12153b | grep c126
blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004
[root@dhcp46-180 ~]# oc rsh glusterfs-storage-rfrx6 gluster-block list vol_355b430ec4dfc1ee674da2e63f12153b | grep c126
blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004
[root@dhcp46-180 ~]# oc rsh glusterfs-storage-x94vq gluster-block list vol_355b430ec4dfc1ee674da2e63f12153b | grep c126
blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004

[root@dhcp46-180 ~]# oc rsh glusterfs-storage-r6r6c cat /etc/target/saveconfig.json | grep c126
    "name": "blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004",
    "storage_object": "/backstores/user/blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004"
    "storage_object": "/backstores/user/blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004"
    "storage_object": "/backstores/user/blk_glusterfs_c126_714eda17-919e-11e8-9b11-0a580a810004"
[root@dhcp46-180 ~]# oc rsh glusterfs-storage-rfrx6 cat /etc/target/saveconfig.json | grep c126
[root@dhcp46-180 ~]# oc rsh glusterfs-storage-x94vq cat /etc/target/saveconfig.json | grep c126
[root@dhcp46-180 ~]#

Stopping the kill loop on the gluster pod:

sh-4.2# while(true); do pkill targetcli; done
^C
sh-4.2#

After stopping the kill loop:

[root@dhcp46-180 ~]# oc rsh glusterfs-storage-r6r6c gluster-block list vol_355b430ec4dfc1ee674da2e63f12153b | grep c126
[root@dhcp46-180 ~]# oc rsh glusterfs-storage-rfrx6 gluster-block list vol_355b430ec4dfc1ee674da2e63f12153b | grep c126
[root@dhcp46-180 ~]# oc rsh glusterfs-storage-x94vq gluster-block list vol_355b430ec4dfc1ee674da2e63f12153b | grep c126
[root@dhcp46-180 ~]#
[root@dhcp46-180 ~]# oc rsh glusterfs-storage-r6r6c cat /etc/target/saveconfig.json | grep c126
[root@dhcp46-180 ~]# oc rsh glusterfs-storage-rfrx6 cat /etc/target/saveconfig.json | grep c126
[root@dhcp46-180 ~]# oc rsh glusterfs-storage-x94vq cat /etc/target/saveconfig.json | grep c126
[root@dhcp46-180 ~]#
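For convenience, the per-pod checks above can be collapsed into a single loop; the pod, volume, and claim names are the ones from this setup, so treat this as an illustrative sketch rather than part of the verification run:

for pod in glusterfs-storage-r6r6c glusterfs-storage-rfrx6 glusterfs-storage-x94vq; do
    echo "== $pod =="
    # Block device as seen by gluster-block on this pod
    oc rsh "$pod" gluster-block list vol_355b430ec4dfc1ee674da2e63f12153b | grep c126
    # Stale target config entries, if any
    oc rsh "$pod" grep c126 /etc/target/saveconfig.json
done

On the fixed build, both commands come back empty on every pod once the kill loop is stopped and the delete completes, confirming no stale entries remain.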
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2691