Bug 1589070 - [Tracker-RHGS-BZ#1631329-BZ#1524336-BZ#1618221] Difference in volume count in heketi and gluster volume list
Keywords:
Status: CLOSED DUPLICATE of bug 1584639
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: rhgs-server-container
Version: cns-3.10
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Madhu Rajanna
QA Contact: Apeksha
URL:
Tags:
Depends On: 1524336 1628651 (...
Whiteboard:
Depends On: 1524336 1618221 1631329
Blocks: OCS-3.11.1-Engineering-Proposed-BZs OCS-3.11.1-devel-triage-done
 
Reported: 2018-06-08 10:25 UTC by Apeksha
Modified: 2019-04-04 03:48 UTC
CC: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-04-04 03:48:06 UTC
Target Upstream Version:


Attachments (Terms of Use)
attaching heketi.log (2.53 MB, text/plain)
2018-06-08 10:25 UTC, Apeksha
no flags

Description Apeksha 2018-06-08 10:25:49 UTC
Created attachment 1449068 [details]
attaching heketi.log

Description of problem:
Difference in volume count in heketi and gluster volume list


Version-Release number of selected component (if applicable):
heketi-client-7.0.0-1.el7rhgs.x86_64
oc v3.10.0-0.63.0

How reproducible: Once

Steps to Reproduce:
1. Run automated cases on OCP3.10+CNS3.10 setup
2. Automated Heketi cases started failing; while debugging, found that a few heketi volumes were not deleted


Additional info:
[root@dhcp46-165 ~]# oc get pods
NAME                                          READY     STATUS    RESTARTS   AGE
glusterblock-storage-provisioner-dc-1-qns6z   1/1       Running   0          3h
glusterfs-storage-gxp7c                       1/1       Running   0          2h
glusterfs-storage-pg4xc                       1/1       Running   0          3h
glusterfs-storage-vsh2m                       1/1       Running   0          3h
heketi-storage-1-wgnwd                        1/1       Running   0          2h
[root@dhcp46-165 ~]# oc get nodes
NAME                                STATUS    ROLES           AGE       VERSION
dhcp46-122.lab.eng.blr.redhat.com   Ready     compute,infra   18h       v1.10.0+b81c8f8
dhcp46-165.lab.eng.blr.redhat.com   Ready     master          18h       v1.10.0+b81c8f8
dhcp46-187.lab.eng.blr.redhat.com   Ready     compute         18h       v1.10.0+b81c8f8
dhcp47-76.lab.eng.blr.redhat.com    Ready     compute         18h       v1.10.0+b81c8f8

[root@dhcp46-165 ~]# oc rsh glusterfs-storage-gxp7c gluster v list | wc -l
34
[root@dhcp46-165 ~]# heketi-cli -s http://172.31.69.60:8080 volume list --secret 'adminkey' --user admin | wc -l
37
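The count mismatch above can be narrowed down name-by-name with comm(1) on sorted lists. A minimal sketch using sample volume names (not this cluster's actual lists; on a live setup the inputs would come from `heketi-cli volume list`, extracting the Name: field, and `gluster volume list`):

```shell
# Sample data standing in for the two volume lists; on a live cluster the
# inputs would come from `heketi-cli volume list` (extracting the Name:
# field) and `gluster volume list`. comm(1) requires sorted input.
heketi_sorted=$(mktemp)
gluster_sorted=$(mktemp)
printf 'vol_a\nvol_b\nvol_c\n' | sort > "$heketi_sorted"
printf 'vol_a\n' | sort > "$gluster_sorted"

# comm -23 prints lines unique to the first file: volumes heketi still
# tracks that gluster no longer has.
missing=$(comm -23 "$heketi_sorted" "$gluster_sorted")
echo "$missing"

rm -f "$heketi_sorted" "$gluster_sorted"
```

With the sample data this prints vol_b and vol_c, i.e. the names present only on the heketi side.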

Comment 2 Apeksha 2018-06-08 11:18:12 UTC
Sosreports - http://rhsqe-repo.lab.eng.blr.redhat.com/cns/bugs/BZ-1589070/

Comment 3 Michael Adam 2018-06-11 13:18:24 UTC
Madhu, as discussed, please investigate.

Comment 4 Madhu Rajanna 2018-06-12 08:23:18 UTC
Michael, I have done my analysis.


There are 3 volumes which are present in heketi but not in gluster:

Id:5db55c483f6984d9917cfe4b3c8b3cbc    Cluster:0a73c60efdd4673113b668afea101e6d    Name:vol_5db55c483f6984d9917cfe4b3c8b3cbc
Id:9c7d5f0f473cce6914804135f0b8ddcd    Cluster:0a73c60efdd4673113b668afea101e6d    Name:vol_9c7d5f0f473cce6914804135f0b8ddcd
Id:9c85de66c12db0a72b4d16fe888ff74d    Cluster:0a73c60efdd4673113b668afea101e6d    Name:vol_9c85de66c12db0a72b4d16fe888ff74d

Analysis:

1) Volume vol_5db55c483f6984d9917cfe4b3c8b3cbc:
Volume deletion was successful, but there was a failure in DestroyBricks. Hence the volume was deleted from gluster but not from the heketi db.

Heketi Log:

[kubeexec] DEBUG 2018/06/08 08:54:09 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:246: Host: dhcp46-122.lab.eng.blr.redhat.com Pod: glusterfs-storage-pg4xc Command: gluster --mode=script snapshot list vol_5db55c483f6984d9917cfe4b3c8b3cbc --xml
[kubeexec] DEBUG 2018/06/08 08:54:15 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:246: Host: dhcp46-122.lab.eng.blr.redhat.com Pod: glusterfs-storage-pg4xc Command: gluster --mode=script volume stop vol_5db55c483f6984d9917cfe4b3c8b3cbc force
Result: volume stop: vol_5db55c483f6984d9917cfe4b3c8b3cbc: success
[kubeexec] DEBUG 2018/06/08 08:54:19 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:246: Host: dhcp46-122.lab.eng.blr.redhat.com Pod: glusterfs-storage-pg4xc Command: gluster --mode=script volume delete vol_5db55c483f6984d9917cfe4b3c8b3cbc
Result: volume delete: vol_5db55c483f6984d9917cfe4b3c8b3cbc: success


[heketi] ERROR 2018/06/08 08:56:40 /src/github.com/heketi/heketi/apps/glusterfs/brick_create.go:73: Unable to execute command on glusterfs-storage-gxp7c: umount: /var/lib/heketi/mounts/vg_96f1667f2f1ced2c5ef94772922be93b/brick_4f4d753d298c99eac492c32006c74484: target is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
[heketi] ERROR 2018/06/08 08:56:40 /src/github.com/heketi/heketi/apps/glusterfs/volume_entry.go:532: Unable to delete bricks: Unable to execute command on glusterfs-storage-gxp7c: umount: /var/lib/heketi/mounts/vg_96f1667f2f1ced2c5ef94772922be93b/brick_4f4d753d298c99eac492c32006c74484: target is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
[heketi] ERROR 2018/06/08 08:56:40 /src/github.com/heketi/heketi/apps/glusterfs/operations.go:429: Error executing delete volume: Unable to execute command on glusterfs-storage-gxp7c: umount: /var/lib/heketi/mounts/vg_96f1667f2f1ced2c5ef94772922be93b/brick_4f4d753d298c99eac492c32006c74484: target is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
[asynchttp] INFO 2018/06/08 08:56:40 asynchttp.go:292: Completed job 20e932645b6c4b3853e7a46917e686e7 in 2m32.540224696s
[heketi] ERROR 2018/06/08 08:56:40 /src/github.com/heketi/heketi/apps/glusterfs/operations.go:1185: Delete Volume Failed: Unable to execute command on glusterfs-storage-gxp7c: umount: /var/lib/heketi/mounts/vg_96f1667f2f1ced2c5ef94772922be93b/brick_4f4d753d298c99eac492c32006c74484: target is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
[kubeexec] DEBUG 2018/06/08 08:56:40 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:246: Host: dhcp47-76.lab.eng.blr.redhat.com Pod: glusterfs-storage-gxp7c Command: lvremove --autobackup=n -f /dev/mapper/vg_96f1667f2f1ced2c5ef94772922be93b-brick_df5559fdf15f6372e19e4e9f8bc1f129
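The "target is busy" failure above is the transient condition that the umount error message itself points at (lsof(8)/fuser(1)). A minimal sketch of a retry wrapper with that diagnostic; `umount_retry` is a hypothetical helper, not part of heketi, and heketi's own retry behavior may differ:

```shell
# Assumption: umount_retry is a hypothetical helper, not part of heketi.
# It retries the unmount and, on failure, runs the fuser(1) diagnostic
# that the umount error message itself suggests.
umount_retry() {
    target=$1
    tries=${2:-3}
    i=0
    while [ "$i" -lt "$tries" ]; do
        if umount "$target" 2>/dev/null; then
            return 0
        fi
        # Diagnostic only: show which processes keep the mount busy.
        fuser -m "$target" 2>/dev/null || true
        i=$((i + 1))
        sleep 1
    done
    return 1
}
```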





2) Volume vol_9c7d5f0f473cce6914804135f0b8ddcd

From the heketi logs, it looks like there is no log of a successful volume delete command from heketi to gluster; even listing the volume did not succeed during this deletion attempt.

From the cmd_history log, it looks like the volume had already been deleted from gluster.

Heketi log:

[kubeexec] DEBUG 2018/06/08 07:42:20 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:246: Host: dhcp47-76.lab.eng.blr.redhat.com Pod: glusterfs-storage-gxp7c Command: gluster --mode=script snapshot list vol_9c7d5f0f473cce6914804135f0b8ddcd --xml
  <opErrstr>Volume (vol_9c7d5f0f473cce6914804135f0b8ddcd) does not exist</opErrstr>
[kubeexec] ERROR 2018/06/08 07:42:21 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:242: Failed to run command [gluster --mode=script volume stop vol_9c7d5f0f473cce6914804135f0b8ddcd force] on glusterfs-storage-gxp7c: Err[command terminated with exit code 1]: Stdout []: Stderr [volume stop: vol_9c7d5f0f473cce6914804135f0b8ddcd: failed: Volume vol_9c7d5f0f473cce6914804135f0b8ddcd does not exist
[cmdexec] ERROR 2018/06/08 07:42:21 /src/github.com/heketi/heketi/executors/cmdexec/volume.go:144: Unable to stop volume vol_9c7d5f0f473cce6914804135f0b8ddcd: Unable to execute command on glusterfs-storage-gxp7c: volume stop: vol_9c7d5f0f473cce6914804135f0b8ddcd: failed: Volume vol_9c7d5f0f473cce6914804135f0b8ddcd does not exist
[kubeexec] ERROR 2018/06/08 07:42:21 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:242: Failed to run command [gluster --mode=script volume delete vol_9c7d5f0f473cce6914804135f0b8ddcd] on glusterfs-storage-gxp7c: Err[command terminated with exit code 1]: Stdout []: Stderr [volume delete: vol_9c7d5f0f473cce6914804135f0b8ddcd: failed: Volume vol_9c7d5f0f473cce6914804135f0b8ddcd does not exist
[cmdexec] ERROR 2018/06/08 07:42:21 /src/github.com/heketi/heketi/executors/cmdexec/volume.go:153: Unable to delete volume vol_9c7d5f0f473cce6914804135f0b8ddcd: Unable to execute command on glusterfs-storage-gxp7c: volume delete: vol_9c7d5f0f473cce6914804135f0b8ddcd: failed: Volume vol_9c7d5f0f473cce6914804135f0b8ddcd does not exist
[heketi] ERROR 2018/06/08 07:42:21 /src/github.com/heketi/heketi/apps/glusterfs/volume_entry.go:525: Unable to delete volume: Unable to delete volume vol_9c7d5f0f473cce6914804135f0b8ddcd: Unable to execute command on glusterfs-storage-gxp7c: volume delete: vol_9c7d5f0f473cce6914804135f0b8ddcd: failed: Volume vol_9c7d5f0f473cce6914804135f0b8ddcd does not exist
[heketi] ERROR 2018/06/08 07:42:21 /src/github.com/heketi/heketi/apps/glusterfs/operations.go:429: Error executing delete volume: Unable to delete volume vol_9c7d5f0f473cce6914804135f0b8ddcd: Unable to execute command on glusterfs-storage-gxp7c: volume delete: vol_9c7d5f0f473cce6914804135f0b8ddcd: failed: Volume vol_9c7d5f0f473cce6914804135f0b8ddcd does not exist


cmd_history Log:

[2018-06-08 07:25:30.159615]  : volume stop vol_9c7d5f0f473cce6914804135f0b8ddcd force : SUCCESS
[2018-06-08 07:25:30.445577]  : volume delete vol_9c7d5f0f473cce6914804135f0b8ddcd : SUCCESS
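Here cmd_history shows the delete already succeeded at 07:25, so the later retry fails only because the volume is already gone. A minimal sketch of treating "does not exist" as success so a retried delete converges; `delete_volume_idempotent` is a hypothetical wrapper, not heketi's actual code path:

```shell
# Assumption: delete_volume_idempotent is a hypothetical wrapper, not
# heketi's actual code path. It treats "does not exist" on stderr as a
# successful delete, so a retry after a partially completed delete
# converges instead of failing forever.
delete_volume_idempotent() {
    vol=$1
    err=$(gluster --mode=script volume delete "$vol" 2>&1)
    status=$?
    [ "$status" -eq 0 ] && return 0
    case "$err" in
        *"does not exist"*)
            return 0 ;;          # already gone: treat as success
        *)
            echo "$err" >&2
            return "$status" ;;
    esac
}
```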


3) Volume vol_9c85de66c12db0a72b4d16fe888ff74d

Not able to find a volume delete command in cmd_history on any of the nodes; not sure how this volume got deleted from gluster.

Heketi Log:

[kubeexec] DEBUG 2018/06/08 08:43:52 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:246: Host: dhcp47-76.lab.eng.blr.redhat.com Pod: glusterfs-storage-gxp7c Command: gluster --mode=script volume stop vol_9c85de66c12db0a72b4d16fe888ff74d force
Result: volume stop: vol_9c85de66c12db0a72b4d16fe888ff74d: success
[kubeexec] ERROR 2018/06/08 08:45:52 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:242: Failed to run command [gluster --mode=script volume delete vol_9c85de66c12db0a72b4d16fe888ff74d] on glusterfs-storage-gxp7c: Err[command terminated with exit code 1]: Stdout [Error : Request timed out
[cmdexec] ERROR 2018/06/08 08:45:52 /src/github.com/heketi/heketi/executors/cmdexec/volume.go:153: Unable to delete volume vol_9c85de66c12db0a72b4d16fe888ff74d: Unable to execute command on glusterfs-storage-gxp7c: 
[heketi] ERROR 2018/06/08 08:45:52 /src/github.com/heketi/heketi/apps/glusterfs/volume_entry.go:525: Unable to delete volume: Unable to delete volume vol_9c85de66c12db0a72b4d16fe888ff74d: Unable to execute command on glusterfs-storage-gxp7c: 
[heketi] ERROR 2018/06/08 08:45:52 /src/github.com/heketi/heketi/apps/glusterfs/operations.go:429: Error executing delete volume: Unable to delete volume vol_9c85de66c12db0a72b4d16fe888ff74d: Unable to execute command on glusterfs-storage-gxp7c: 
[heketi] ERROR 2018/06/08 08:45:52 /src/github.com/heketi/heketi/apps/glusterfs/operations.go:1185: Delete Volume Failed: Unable to delete volume vol_9c85de66c12db0a72b4d16fe888ff74d: Unable to execute command on glusterfs-storage-gxp7c: 
[heketi] ERROR 2018/06/08 08:48:06 /src/github.com/heketi/heketi/apps/glusterfs/volume_entry.go:510: Unable to determine snapshot information from volume vol_9c85de66c12db0a72b4d16fe888ff74d: EOF
[heketi] ERROR 2018/06/08 08:48:06 /src/github.com/heketi/heketi/apps/glusterfs/operations.go:429: Error executing delete volume: Unable to determine snapshot information from volume vol_9c85de66c12db0a72b4d16fe888ff74d: EOF
[kubeexec] DEBUG 2018/06/08 08:48:06 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:246: Host: dhcp47-76.lab.eng.blr.redhat.com Pod: glusterfs-storage-gxp7c Command: gluster --mode=script snapshot list vol_9c85de66c12db0a72b4d16fe888ff74d --xml
           ├─13541 /usr/bin/python /var/lib/glusterd/hooks/1/delete/post/S57glusterfind-delete-post --volname=vol_9c85de66c12db0a72b4d16fe888ff74d
[heketi] ERROR 2018/06/08 08:54:08 /src/github.com/heketi/heketi/apps/glusterfs/operations.go:1185: Delete Volume Failed: Unable to determine snapshot information from volume vol_9c85de66c12db0a72b4d16fe888ff74d: EOF
[kubeexec] DEBUG 2018/06/08 08:54:52 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:246: Host: dhcp47-76.lab.eng.blr.redhat.com Pod: glusterfs-storage-gxp7c Command: gluster --mode=script snapshot list vol_9c85de66c12db0a72b4d16fe888ff74d --xml
  <opErrstr>Volume (vol_9c85de66c12db0a72b4d16fe888ff74d) does not exist</opErrstr>


cmd_history Log:

[2018-06-08 08:43:52.156891]  : volume stop vol_9c85de66c12db0a72b4d16fe888ff74d force : SUCCESS
[2018-06-08 08:53:58.354836]  : snapshot list vol_9c85de66c12db0a72b4d16fe888ff74d : FAILED : Volume (vol_9c85de66c12db0a72b4d16fe888ff74d) does not exist
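Checking cmd_history on every node, rather than just one, can be scripted. A minimal sketch; `search_cmd_history` is a hypothetical helper, the pod names in the example are the ones from the `oc get pods` output in the description, and `RSH` is made overridable only so the loop can be exercised without a cluster:

```shell
# Assumption: search_cmd_history is a hypothetical helper. RSH defaults
# to `oc rsh` on a live cluster; it is left unquoted on purpose so the
# two words split, and it is overridable for offline testing.
RSH=${RSH:-"oc rsh"}
search_cmd_history() {
    vol=$1
    shift
    for pod in "$@"; do
        echo "== $pod =="
        $RSH "$pod" grep "$vol" /var/log/glusterfs/cmd_history.log || true
    done
}
# Example (live cluster):
# search_cmd_history vol_9c85de66c12db0a72b4d16fe888ff74d \
#     glusterfs-storage-gxp7c glusterfs-storage-pg4xc glusterfs-storage-vsh2m
```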

Comment 5 Raghavendra Talur 2018-06-13 16:46:31 UTC
(In reply to Madhu Rajanna from comment #4)
> Michael, I have done my analysis,
> 
> 
> There are 3 volumes which are present in heketi and not in gluster
> 
> Id:5db55c483f6984d9917cfe4b3c8b3cbc   
> Cluster:0a73c60efdd4673113b668afea101e6d   
> Name:vol_5db55c483f6984d9917cfe4b3c8b3cbc
> Id:9c7d5f0f473cce6914804135f0b8ddcd   
> Cluster:0a73c60efdd4673113b668afea101e6d   
> Name:vol_9c7d5f0f473cce6914804135f0b8ddcd
> Id:9c85de66c12db0a72b4d16fe888ff74d   
> Cluster:0a73c60efdd4673113b668afea101e6d   
> Name:vol_9c85de66c12db0a72b4d16fe888ff74d
> 
> analysis
> 
> 1)Volume vol_5db55c483f6984d9917cfe4b3c8b3cbc ,
> volume deletion was successful and there was a failure in DestroyBricks,
> Hence volume was deleted from gluster, not from heketi db

This analysis looks right.

> 
> Heketi Log:
> [...]
> 
> 2)Volume vol_9c7d5f0f473cce6914804135f0b8ddcd
> 
> From the heketi logs, it looks like there is no log of a successful volume
> delete command from heketi to gluster; even listing the volume did not
> succeed during this deletion attempt.
> 
> From the cmd_history log, it looks like the volume had already been deleted
> from gluster.

Please check if there is a possibility of a volume create failure where the volume is deleted by heketi as part of cleanup but the entry remains.
> 
> Heketi log:
> [...]
> 
> 3)Volume vol_9c85de66c12db0a72b4d16fe888ff74d
> 
> not able to find volume delete command from cmd_history from all nodes. not
> sure how this volume got deleted
> 
> Heketi Log:
> [...]
> 
> cmd_history Log:
> 
> [2018-06-08 08:43:52.156891]  : volume stop
> vol_9c85de66c12db0a72b4d16fe888ff74d force : SUCCESS
> [2018-06-08 08:53:58.354836]  : snapshot list
> vol_9c85de66c12db0a72b4d16fe888ff74d : FAILED : Volume
> (vol_9c85de66c12db0a72b4d16fe888ff74d) does not exist

Have you checked the cmd_history on all the nodes?

Comment 6 Madhu Rajanna 2018-06-14 04:39:02 UTC
(In reply to Raghavendra Talur from comment #5)
> (In reply to Madhu Rajanna from comment #4)
> > Michael, I have done my analysis,
> > [...]
> > 
> > 2)Volume vol_9c7d5f0f473cce6914804135f0b8ddcd
> > 
> > From the heketi logs, it looks like there is no log of a successful volume
> > delete command from heketi to gluster; even listing the volume did not
> > succeed during this deletion attempt.
> > 
> > From the cmd_history log, it looks like the volume had already been deleted
> > from gluster.
> 
> Please check if there is a possibility of a volume create failure where the
> volume is deleted by heketi as part of cleanup but the entry remains.

I don't see any volume create or brick create log for this volume in the heketi.log file.

> > [...]
> > volume vol_9c85de66c12db0a72b4d16fe888ff74d: EOF
> > [kubeexec] DEBUG 2018/06/08 08:48:06
> > /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:246: Host:
> > dhcp47-76.lab.eng.blr.redhat.com Pod: glusterfs-storage-gxp7c Command:
> > gluster --mode=script snapshot list vol_9c85de66c12db0a72b4d16fe888ff74d
> > --xml
> >            ├─13541 /usr/bin/python
> > /var/lib/glusterd/hooks/1/delete/post/S57glusterfind-delete-post
> > --volname=vol_9c85de66c12db0a72b4d16fe888ff74d
> > [heketi] ERROR 2018/06/08 08:54:08
> > /src/github.com/heketi/heketi/apps/glusterfs/operations.go:1185: Delete
> > Volume Failed: Unable to determine snapshot information from volume
> > vol_9c85de66c12db0a72b4d16fe888ff74d: EOF
> > [kubeexec] DEBUG 2018/06/08 08:54:52
> > /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:246: Host:
> > dhcp47-76.lab.eng.blr.redhat.com Pod: glusterfs-storage-gxp7c Command:
> > gluster --mode=script snapshot list vol_9c85de66c12db0a72b4d16fe888ff74d
> > --xml
> >   <opErrstr>Volume (vol_9c85de66c12db0a72b4d16fe888ff74d) does not
> > exist</opErrstr>
> > 
> > 
> > cmd_history Log:
> > 
> > [2018-06-08 08:43:52.156891]  : volume stop
> > vol_9c85de66c12db0a72b4d16fe888ff74d force : SUCCESS
> > [2018-06-08 08:53:58.354836]  : snapshot list
> > vol_9c85de66c12db0a72b4d16fe888ff74d : FAILED : Volume
> > (vol_9c85de66c12db0a72b4d16fe888ff74d) does not exist
> 
> Have you checked the cmd_history on all the nodes?

Yes, I have verified cmd_history on all 3 nodes.
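As a side note, that check lends itself to scripting. The sketch below simulates per-node cmd_history.log files in a temp directory, reusing the volume IDs from the logs above; the file layout and the empty third log are fabricated for illustration. On the live setup one would instead `oc exec` into each glusterfs pod and grep /var/log/glusterfs/cmd_history.log.

```shell
# Hedged sketch: fabricated cmd_history.log fragments for three nodes.
# On a real cluster these live at /var/log/glusterfs/cmd_history.log
# inside each glusterfs pod.
demo=$(mktemp -d)
cat > "$demo/node1.log" <<'EOF'
[2018-06-08 07:25:30.159615]  : volume stop vol_9c7d5f0f473cce6914804135f0b8ddcd force : SUCCESS
[2018-06-08 07:25:30.445577]  : volume delete vol_9c7d5f0f473cce6914804135f0b8ddcd : SUCCESS
EOF
cat > "$demo/node2.log" <<'EOF'
[2018-06-08 08:43:52.156891]  : volume stop vol_9c85de66c12db0a72b4d16fe888ff74d force : SUCCESS
EOF
: > "$demo/node3.log"

# Which node logged the delete for the first volume? (prints node1's log path)
grep -l 'volume delete vol_9c7d5f0f473cce6914804135f0b8ddcd' "$demo"/*.log

# No node logged a delete for the second volume, matching what was seen above.
grep -l 'volume delete vol_9c85de66c12db0a72b4d16fe888ff74d' "$demo"/*.log \
    || echo "no delete found on any node"
```

On a real setup the outer loop would be `for pod in $(oc get pods -l glusterfs=storage-pod -o name); do oc exec ... ; done`, with the label selector adjusted to the deployment.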

Comment 8 Madhu Rajanna 2018-06-27 12:37:08 UTC
Some part of the issue has been fixed by the upstream patch
https://github.com/heketi/heketi/pull/1222.
John can give more info on this.

Comment 9 John Mulligan 2018-06-27 17:14:22 UTC
(In reply to Madhu Rajanna from comment #8)
> Some part of the issue has been fixed by the upstream patch
> https://github.com/heketi/heketi/pull/1222.
> John can give more info on this.

The initial error condition is triggered at the gluster level (a brick multiplexing issue, IIRC), and there is no fix for it in that patch series. This means the "out of sync" state is still possible as long as the bug persists in gluster. However, the patch series makes it possible to re-run the volume delete on the heketi side, and if the brick mount is no longer in use, the delete should succeed and bring the heketi state back into sync with gluster.

FWIW, I quoted "out of sync" because in this case I could argue that heketi is not wrong about the volume not being deleted, since not all of the volume's components are deleted (by design in the current version). It is gluster that doesn't "care" about the underlying storage, so a simple comparison of the two lists is somewhat misleading when there is an underlying bug in gluster.
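To make that concrete, here is a minimal sketch of the naive comparison. The volume lists are hard-coded stand-ins (on a real cluster they would come from `heketi-cli volume list` and `gluster volume list`), and the mismatch it prints is exactly the kind of difference that can be legitimate while a volume's bricks are still pending cleanup.

```shell
# Hypothetical volume IDs standing in for the output of
# `heketi-cli volume list` and `gluster volume list`.
heketi_vols='vol_9c7d5f0f473cce6914804135f0b8ddcd
vol_9c85de66c12db0a72b4d16fe888ff74d
vol_0000000000000000000000000000cafe'
gluster_vols='vol_0000000000000000000000000000cafe'

printf '%s\n' "$heketi_vols" | sort > /tmp/heketi_vols.txt
printf '%s\n' "$gluster_vols" | sort > /tmp/gluster_vols.txt

# Volumes heketi still tracks but gluster no longer lists. As argued above,
# heketi may legitimately keep these entries while brick cleanup is incomplete,
# so a non-empty diff here is not by itself proof of a heketi bug.
comm -23 /tmp/heketi_vols.txt /tmp/gluster_vols.txt
```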

Comment 10 John Mulligan 2018-06-27 18:29:06 UTC
Forgot to mention that the unmount issue probably is BZ#1524336, if I am not mistaken.

Comment 14 RamaKasturi 2018-09-25 11:03:41 UTC
I learnt that there are more steps in automation than just creation and deletion of volumes. It would be good to verify this case with automation, and I have requested Apeksha to do so.

This issue does not appear to be hit in the latest CI runs, but the automation team would like to test it with 3.11 just to confirm.

