Created attachment 1449068 [details]
attaching heketi.log

Description of problem:
Difference in volume count between heketi and `gluster volume list`.

Version-Release number of selected component (if applicable):
heketi-client-7.0.0-1.el7rhgs.x86_64
oc v3.10.0-0.63.0

How reproducible:
Once

Steps to Reproduce:
1. Run automated cases on an OCP 3.10 + CNS 3.10 setup.
2. Automated heketi cases started failing; while debugging, found that a few heketi volumes were not deleted.

Additional info:

[root@dhcp46-165 ~]# oc get pods
NAME                                          READY     STATUS    RESTARTS   AGE
glusterblock-storage-provisioner-dc-1-qns6z   1/1       Running   0          3h
glusterfs-storage-gxp7c                       1/1       Running   0          2h
glusterfs-storage-pg4xc                       1/1       Running   0          3h
glusterfs-storage-vsh2m                       1/1       Running   0          3h
heketi-storage-1-wgnwd                        1/1       Running   0          2h

[root@dhcp46-165 ~]# oc get nodes
NAME                                STATUS    ROLES           AGE       VERSION
dhcp46-122.lab.eng.blr.redhat.com   Ready     compute,infra   18h       v1.10.0+b81c8f8
dhcp46-165.lab.eng.blr.redhat.com   Ready     master          18h       v1.10.0+b81c8f8
dhcp46-187.lab.eng.blr.redhat.com   Ready     compute         18h       v1.10.0+b81c8f8
dhcp47-76.lab.eng.blr.redhat.com    Ready     compute         18h       v1.10.0+b81c8f8

[root@dhcp46-165 ~]# oc rsh glusterfs-storage-gxp7c gluster v list | wc -l
34
[root@dhcp46-165 ~]# heketi-cli -s http://172.31.69.60:8080 volume list --secret 'adminkey' --user admin | wc -l
37
Sosreports - http://rhsqe-repo.lab.eng.blr.redhat.com/cns/bugs/BZ-1589070/
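The count mismatch above can be narrowed to the specific volume names with a list diff. A minimal sketch, assuming the two listings have been captured to files (the capture commands in the comments are the real source; sample data stands in for them here):

```shell
# Hypothetical list diff: find volumes heketi knows about that gluster
# no longer has. In the real environment the files would come from e.g.
#   heketi-cli volume list ... | sort > heketi_vols.txt
#   oc rsh <glusterfs-pod> gluster v list | sort > gluster_vols.txt
# Sample (sorted) data stands in for those captures here.
printf 'vol_a\nvol_b\nvol_c\n' > heketi_vols.txt
printf 'vol_a\nvol_c\n'        > gluster_vols.txt

# comm -23 prints lines unique to the first file: present in heketi,
# missing from gluster.
comm -23 heketi_vols.txt gluster_vols.txt
```

On the sample data this prints `vol_b`; on the affected setup it would print the three `vol_*` names identified in the analysis below.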
Madhu, as discussed, please investigate.
Michael, I have done my analysis. There are 3 volumes which are present in heketi but not in gluster:

Id:5db55c483f6984d9917cfe4b3c8b3cbc Cluster:0a73c60efdd4673113b668afea101e6d Name:vol_5db55c483f6984d9917cfe4b3c8b3cbc
Id:9c7d5f0f473cce6914804135f0b8ddcd Cluster:0a73c60efdd4673113b668afea101e6d Name:vol_9c7d5f0f473cce6914804135f0b8ddcd
Id:9c85de66c12db0a72b4d16fe888ff74d Cluster:0a73c60efdd4673113b668afea101e6d Name:vol_9c85de66c12db0a72b4d16fe888ff74d

Analysis:

1) Volume vol_5db55c483f6984d9917cfe4b3c8b3cbc:
The volume deletion itself was successful, but there was a failure in DestroyBricks. Hence the volume was deleted from gluster but not from the heketi db.

Heketi log:

[kubeexec] DEBUG 2018/06/08 08:54:09 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:246: Host: dhcp46-122.lab.eng.blr.redhat.com Pod: glusterfs-storage-pg4xc Command: gluster --mode=script snapshot list vol_5db55c483f6984d9917cfe4b3c8b3cbc --xml
[kubeexec] DEBUG 2018/06/08 08:54:15 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:246: Host: dhcp46-122.lab.eng.blr.redhat.com Pod: glusterfs-storage-pg4xc Command: gluster --mode=script volume stop vol_5db55c483f6984d9917cfe4b3c8b3cbc force
Result: volume stop: vol_5db55c483f6984d9917cfe4b3c8b3cbc: success
[kubeexec] DEBUG 2018/06/08 08:54:19 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:246: Host: dhcp46-122.lab.eng.blr.redhat.com Pod: glusterfs-storage-pg4xc Command: gluster --mode=script volume delete vol_5db55c483f6984d9917cfe4b3c8b3cbc
Result: volume delete: vol_5db55c483f6984d9917cfe4b3c8b3cbc: success

[heketi] ERROR 2018/06/08 08:56:40 /src/github.com/heketi/heketi/apps/glusterfs/brick_create.go:73: Unable to execute command on glusterfs-storage-gxp7c: umount: /var/lib/heketi/mounts/vg_96f1667f2f1ced2c5ef94772922be93b/brick_4f4d753d298c99eac492c32006c74484: target is busy.
        (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1))
[heketi] ERROR 2018/06/08 08:56:40 /src/github.com/heketi/heketi/apps/glusterfs/volume_entry.go:532: Unable to delete bricks: Unable to execute command on glusterfs-storage-gxp7c: umount: /var/lib/heketi/mounts/vg_96f1667f2f1ced2c5ef94772922be93b/brick_4f4d753d298c99eac492c32006c74484: target is busy.
        (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1))
[heketi] ERROR 2018/06/08 08:56:40 /src/github.com/heketi/heketi/apps/glusterfs/operations.go:429: Error executing delete volume: Unable to execute command on glusterfs-storage-gxp7c: umount: /var/lib/heketi/mounts/vg_96f1667f2f1ced2c5ef94772922be93b/brick_4f4d753d298c99eac492c32006c74484: target is busy.
        (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1))
[asynchttp] INFO 2018/06/08 08:56:40 asynchttp.go:292: Completed job 20e932645b6c4b3853e7a46917e686e7 in 2m32.540224696s
[heketi] ERROR 2018/06/08 08:56:40 /src/github.com/heketi/heketi/apps/glusterfs/operations.go:1185: Delete Volume Failed: Unable to execute command on glusterfs-storage-gxp7c: umount: /var/lib/heketi/mounts/vg_96f1667f2f1ced2c5ef94772922be93b/brick_4f4d753d298c99eac492c32006c74484: target is busy.
        (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1))
[kubeexec] DEBUG 2018/06/08 08:56:40 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:246: Host: dhcp47-76.lab.eng.blr.redhat.com Pod: glusterfs-storage-gxp7c Command: lvremove --autobackup=n -f /dev/mapper/vg_96f1667f2f1ced2c5ef94772922be93b-brick_df5559fdf15f6372e19e4e9f8bc1f129

2) Volume vol_9c7d5f0f473cce6914804135f0b8ddcd:
From the heketi logs, there is no record of a volume delete command being sent from heketi to gluster; even listing the volume did not succeed during this deletion. From the cmd_history log, it looks like the volume had already been deleted from gluster.

Heketi log:

[kubeexec] DEBUG 2018/06/08 07:42:20 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:246: Host: dhcp47-76.lab.eng.blr.redhat.com Pod: glusterfs-storage-gxp7c Command: gluster --mode=script snapshot list vol_9c7d5f0f473cce6914804135f0b8ddcd --xml
<opErrstr>Volume (vol_9c7d5f0f473cce6914804135f0b8ddcd) does not exist</opErrstr>
[kubeexec] ERROR 2018/06/08 07:42:21 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:242: Failed to run command [gluster --mode=script volume stop vol_9c7d5f0f473cce6914804135f0b8ddcd force] on glusterfs-storage-gxp7c: Err[command terminated with exit code 1]: Stdout []: Stderr [volume stop: vol_9c7d5f0f473cce6914804135f0b8ddcd: failed: Volume vol_9c7d5f0f473cce6914804135f0b8ddcd does not exist
[cmdexec] ERROR 2018/06/08 07:42:21 /src/github.com/heketi/heketi/executors/cmdexec/volume.go:144: Unable to stop volume vol_9c7d5f0f473cce6914804135f0b8ddcd: Unable to execute command on glusterfs-storage-gxp7c: volume stop: vol_9c7d5f0f473cce6914804135f0b8ddcd: failed: Volume vol_9c7d5f0f473cce6914804135f0b8ddcd does not exist
[kubeexec] ERROR 2018/06/08 07:42:21 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:242: Failed to run command [gluster --mode=script volume delete vol_9c7d5f0f473cce6914804135f0b8ddcd] on glusterfs-storage-gxp7c: Err[command terminated with exit code 1]: Stdout []: Stderr [volume delete: vol_9c7d5f0f473cce6914804135f0b8ddcd: failed: Volume vol_9c7d5f0f473cce6914804135f0b8ddcd does not exist
[cmdexec] ERROR 2018/06/08 07:42:21 /src/github.com/heketi/heketi/executors/cmdexec/volume.go:153: Unable to delete volume vol_9c7d5f0f473cce6914804135f0b8ddcd: Unable to execute command on glusterfs-storage-gxp7c: volume delete: vol_9c7d5f0f473cce6914804135f0b8ddcd: failed: Volume vol_9c7d5f0f473cce6914804135f0b8ddcd does not exist
[heketi] ERROR 2018/06/08 07:42:21 /src/github.com/heketi/heketi/apps/glusterfs/volume_entry.go:525: Unable to delete volume: Unable to delete volume vol_9c7d5f0f473cce6914804135f0b8ddcd: Unable to execute command on glusterfs-storage-gxp7c: volume delete: vol_9c7d5f0f473cce6914804135f0b8ddcd: failed: Volume vol_9c7d5f0f473cce6914804135f0b8ddcd does not exist
[heketi] ERROR 2018/06/08 07:42:21 /src/github.com/heketi/heketi/apps/glusterfs/operations.go:429: Error executing delete volume: Unable to delete volume vol_9c7d5f0f473cce6914804135f0b8ddcd: Unable to execute command on glusterfs-storage-gxp7c: volume delete: vol_9c7d5f0f473cce6914804135f0b8ddcd: failed: Volume vol_9c7d5f0f473cce6914804135f0b8ddcd does not exist

cmd_history log:

[2018-06-08 07:25:30.159615] : volume stop vol_9c7d5f0f473cce6914804135f0b8ddcd force : SUCCESS
[2018-06-08 07:25:30.445577] : volume delete vol_9c7d5f0f473cce6914804135f0b8ddcd : SUCCESS

3) Volume vol_9c85de66c12db0a72b4d16fe888ff74d:
Not able to find a volume delete command in cmd_history on any of the nodes; not sure how this volume got deleted.

Heketi log:

[kubeexec] DEBUG 2018/06/08 08:43:52 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:246: Host: dhcp47-76.lab.eng.blr.redhat.com Pod: glusterfs-storage-gxp7c Command: gluster --mode=script volume stop vol_9c85de66c12db0a72b4d16fe888ff74d force
Result: volume stop: vol_9c85de66c12db0a72b4d16fe888ff74d: success
[kubeexec] ERROR 2018/06/08 08:45:52 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:242: Failed to run command [gluster --mode=script volume delete vol_9c85de66c12db0a72b4d16fe888ff74d] on glusterfs-storage-gxp7c: Err[command terminated with exit code 1]: Stdout [Error : Request timed out
[cmdexec] ERROR 2018/06/08 08:45:52 /src/github.com/heketi/heketi/executors/cmdexec/volume.go:153: Unable to delete volume vol_9c85de66c12db0a72b4d16fe888ff74d: Unable to execute command on glusterfs-storage-gxp7c:
[heketi] ERROR 2018/06/08 08:45:52 /src/github.com/heketi/heketi/apps/glusterfs/volume_entry.go:525: Unable to delete volume: Unable to delete volume vol_9c85de66c12db0a72b4d16fe888ff74d: Unable to execute command on glusterfs-storage-gxp7c:
[heketi] ERROR 2018/06/08 08:45:52 /src/github.com/heketi/heketi/apps/glusterfs/operations.go:429: Error executing delete volume: Unable to delete volume vol_9c85de66c12db0a72b4d16fe888ff74d: Unable to execute command on glusterfs-storage-gxp7c:
[heketi] ERROR 2018/06/08 08:45:52 /src/github.com/heketi/heketi/apps/glusterfs/operations.go:1185: Delete Volume Failed: Unable to delete volume vol_9c85de66c12db0a72b4d16fe888ff74d: Unable to execute command on glusterfs-storage-gxp7c:
[heketi] ERROR 2018/06/08 08:48:06 /src/github.com/heketi/heketi/apps/glusterfs/volume_entry.go:510: Unable to determine snapshot information from volume vol_9c85de66c12db0a72b4d16fe888ff74d: EOF
[heketi] ERROR 2018/06/08 08:48:06 /src/github.com/heketi/heketi/apps/glusterfs/operations.go:429: Error executing delete volume: Unable to determine snapshot information from volume vol_9c85de66c12db0a72b4d16fe888ff74d: EOF
[kubeexec] DEBUG 2018/06/08 08:48:06 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:246: Host: dhcp47-76.lab.eng.blr.redhat.com Pod: glusterfs-storage-gxp7c Command: gluster --mode=script snapshot list vol_9c85de66c12db0a72b4d16fe888ff74d --xml
├─13541 /usr/bin/python /var/lib/glusterd/hooks/1/delete/post/S57glusterfind-delete-post --volname=vol_9c85de66c12db0a72b4d16fe888ff74d
[heketi] ERROR 2018/06/08 08:54:08 /src/github.com/heketi/heketi/apps/glusterfs/operations.go:1185: Delete Volume Failed: Unable to determine snapshot information from volume vol_9c85de66c12db0a72b4d16fe888ff74d: EOF
[kubeexec] DEBUG 2018/06/08 08:54:52 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:246: Host: dhcp47-76.lab.eng.blr.redhat.com Pod: glusterfs-storage-gxp7c Command: gluster --mode=script snapshot list vol_9c85de66c12db0a72b4d16fe888ff74d --xml
<opErrstr>Volume (vol_9c85de66c12db0a72b4d16fe888ff74d) does not exist</opErrstr>

cmd_history log:

[2018-06-08 08:43:52.156891] : volume stop vol_9c85de66c12db0a72b4d16fe888ff74d force : SUCCESS
[2018-06-08 08:53:58.354836] : snapshot list vol_9c85de66c12db0a72b4d16fe888ff74d : FAILED : Volume (vol_9c85de66c12db0a72b4d16fe888ff74d) does not exist
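The case-3 check described above (whether any node's glusterd cmd_history ever recorded a delete for the volume) can be scripted. A minimal sketch; the real file is typically /var/log/glusterfs/cmd_history.log on each node, and the sample lines below (taken from the excerpt above) stand in for it:

```shell
# Hypothetical check: did glusterd ever log a "volume delete" for the
# suspect volume on this node? Sample lines from the cmd_history excerpt
# above stand in for the real log file.
vol=vol_9c85de66c12db0a72b4d16fe888ff74d
cat > cmd_history.sample <<'EOF'
[2018-06-08 08:43:52.156891] : volume stop vol_9c85de66c12db0a72b4d16fe888ff74d force : SUCCESS
[2018-06-08 08:53:58.354836] : snapshot list vol_9c85de66c12db0a72b4d16fe888ff74d : FAILED : Volume (vol_9c85de66c12db0a72b4d16fe888ff74d) does not exist
EOF

if grep -q "volume delete $vol" cmd_history.sample; then
    echo "delete recorded"
else
    # Matches the observation in the analysis: a stop was logged but no
    # delete, yet the volume is gone from gluster.
    echo "no delete recorded"
fi
```

On the sample data this prints `no delete recorded`; repeating the same grep on every node's cmd_history.log is what rules out a heketi-issued delete.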
(In reply to Madhu Rajanna from comment #4)
> 1) Volume vol_5db55c483f6984d9917cfe4b3c8b3cbc,
> volume deletion was successful and there was a failure in DestroyBricks,
> hence the volume was deleted from gluster, not from the heketi db

This analysis looks right.

> 2) Volume vol_9c7d5f0f473cce6914804135f0b8ddcd
>
> From the heketi logs it looks like there was no log for a volume delete
> command from heketi to gluster. [...]
>
> From the cmd_history log:
> it looks like the volume has been deleted from gluster

Please check if there is a possibility of a volume create failure where the volume is deleted by heketi as part of cleanup but the entry remains.

> 3) Volume vol_9c85de66c12db0a72b4d16fe888ff74d
>
> not able to find a volume delete command in cmd_history from all nodes.
> not sure how this volume got deleted [...]

Have you checked the cmd_history on all the nodes?
(In reply to Raghavendra Talur from comment #5)
> > 1) Volume vol_5db55c483f6984d9917cfe4b3c8b3cbc [...]
>
> This analysis looks right.
>
> > 2) Volume vol_9c7d5f0f473cce6914804135f0b8ddcd [...]
>
> Please check if there is a possibility of a volume create failure where the
> volume is deleted by heketi as part of cleanup but the entry remains.

I don't see any volume create or brick creation log for this volume in the heketi.log file.

> > 3) Volume vol_9c85de66c12db0a72b4d16fe888ff74d [...]
>
> Have you checked the cmd_history on all the nodes?

Yes, I have verified cmd_history on all 3 nodes.
Some part of the issue has been fixed in the upstream patch https://github.com/heketi/heketi/pull/1222. John can give more info on this.
(In reply to Madhu Rajanna from comment #8)
> some part of the issue has been fixed in the upstream patch
> https://github.com/heketi/heketi/pull/1222.
> John can give more info on this

The initial error condition is one triggered at the gluster level (a brick multiplexing issue, IIRC), and there is no fix for it in that patch series. This means the "out of sync" state would still be possible as long as the bug persists in gluster. However, the patch series makes it possible to re-run the volume delete on the heketi side, and if the brick mount is no longer in use, the delete should succeed and bring the heketi state into sync with gluster.

FWIW, I quoted "out of sync" because in this case one could argue that heketi is not wrong about the volume not being deleted, since not all the components of the volume were deleted (by design in the current version). It's gluster that doesn't "care" about the underlying storage, and thus a simple comparison of the two lists is a bit misleading when there's an underlying bug in gluster.
Forgot to mention that the unmount issue probably is BZ#1524336, if I am not mistaken.
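As a triage aid for that umount failure, one can first confirm whether the brick path is still mounted before retrying the delete; fuser(1)/lsof(8), as the error message itself suggests, then identify the holder. A minimal sketch, with a sample /proc/mounts-style excerpt (paths taken from the case-1 log) standing in for the real file:

```shell
# Hypothetical triage for "umount: ... target is busy": check whether the
# brick path is still mounted before retrying the heketi volume delete.
# A sample /proc/mounts-style excerpt stands in for the real file here.
brick=/var/lib/heketi/mounts/vg_96f1667f2f1ced2c5ef94772922be93b/brick_4f4d753d298c99eac492c32006c74484
cat > mounts.sample <<'EOF'
/dev/mapper/vg_96f1667f2f1ced2c5ef94772922be93b-brick_4f4d753d298c99eac492c32006c74484 /var/lib/heketi/mounts/vg_96f1667f2f1ced2c5ef94772922be93b/brick_4f4d753d298c99eac492c32006c74484 xfs rw,noatime 0 0
EOF

# Field 2 of /proc/mounts is the mount point.
if awk -v b="$brick" '$2 == b { found = 1 } END { exit !found }' mounts.sample; then
    # On the affected node, fuser -vm "$brick" (or lsof) would show the holder.
    echo "still mounted"
else
    echo "not mounted; safe to retry the delete"
fi
```

On the sample data this prints `still mounted`; once the holder exits and the mount goes away, re-running the heketi-side delete (per comment #9) should bring the two views back in sync.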
I learnt that there are more steps in the automation than just creation and deletion of volumes. It would be good to verify this case with automation, and I have requested Apeksha for the same. It looks like this issue is not being hit in the latest CI runs, but the automation team would like to test it with 3.11 just to confirm.