Bug 1613073 - After OCP host reboot or outage mount and delete fails for gluster block volumes
Summary: After OCP host reboot or outage mount and delete fails for gluster block volumes
Keywords:
Status: CLOSED DUPLICATE of bug 1598322
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: gluster-block
Version: cns-3.10
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Prasanna Kumar Kalever
QA Contact: Rahul Hinduja
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-08-06 23:21 UTC by Annette Clewett
Modified: 2018-08-08 06:47 UTC
CC: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-08-08 06:47:11 UTC
Embargoed:


Attachments
glusterfs server log from OCP app server (39.24 KB, application/x-gzip)
2018-08-06 23:21 UTC, Annette Clewett
glusterfs server log from OCP app server (29.77 KB, application/x-gzip)
2018-08-06 23:21 UTC, Annette Clewett
glusterfs server log from OCP app server (30.20 KB, application/x-gzip)
2018-08-06 23:22 UTC, Annette Clewett

Description Annette Clewett 2018-08-06 23:21:08 UTC
Created attachment 1473794 [details]
glusterfs server log from OCP app server

Description of problem:
After OCP nodes hosting gluster pods are powered off and then powered back on, gluster-block OCP PVs will not delete correctly. Further investigation shows that attempting to delete via 'heketi-cli blockvolume delete <blockvolume_ID>' also fails.
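
For context, a minimal sketch of the two delete paths that fail here; <pvc-name> and <blockvolume_ID> are placeholders, not values from this report:

$ oc delete pvc <pvc-name>                        # normal path: the provisioner should remove the PV and the blockvolume
$ heketi-cli blockvolume list                     # find the blockvolume ID
$ heketi-cli blockvolume delete <blockvolume_ID>  # direct path: also fails after the reboot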


Version-Release number of selected component (if applicable):

Running from a clone of openshift-ansible, branch release-3.10, commit 3e53753df1e1a592b3f594c9393e805d7a1ee735

With OCS image files:
rhgs-server-rhel7:3.3.1-289
rhgs-gluster-block-prov-rhel7:3.3.1-20
rhgs-volmanager-rhel7:3.3.1-21

How reproducible:
Intermittent

Steps to Reproduce:
1. Create an OCS cluster on at least 3 nodes.
2. Install metrics, logging, and/or prometheus with gluster-block volumes.
3. Turn off the OCP nodes hosting gluster pods and the OCP infra node (in this case, all of these pods run on the infra node). Leave them off for at least 10 mins.
4. Turn on the OCP nodes hosting gluster pods and wait until all gluster pods and heketi are online.
5. Turn on the OCP infra node (it has pods for metrics, logging, and prometheus if they are all deployed).
6. Delete the deployments of logging, metrics, and prometheus, including the associated PVCs in each project: openshift-infra, openshift-logging, openshift-metrics.
7. Run 'oc get pv' and find that all glusterblock PVs are in Released status and not deleted (see the command sketch after this list).
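
For illustration, steps 6-7 amount to roughly the following; the namespaces match the projects above, and 'grep Released' simply filters for the stuck PVs:

$ oc delete pvc --all -n openshift-infra
$ oc delete pvc --all -n openshift-logging
$ oc delete pvc --all -n openshift-metrics
$ oc get pv | grep Released   # glusterblock PVs stay in Released status instead of being deleted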

Actual results:
glusterblock PVs are in Released status and not deleted

Expected results:
glusterblock PVs are deleted when the associated OCP PVC is deleted. Both 'heketi-cli blockvolume list' and 'gluster-block list <block_vol_hosting_volume>' show that the blockvolume has been deleted.
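
As a sanity check, a sketch of those two listings; <glusterfs-pod> and <block_vol_hosting_volume> are placeholders:

$ heketi-cli blockvolume list
$ oc rsh <glusterfs-pod> gluster-block list <block_vol_hosting_volume>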

Additional info:
Example errors in the gluster log for an attempted delete of 'blockvol_43ca76f2672377530be7f1ec245eb191' via heketi-cli:
https://gist.github.com/netzzer/6c4499ff2920630ac36fab69af0285db

Results of 'targetcli ls' from all 3 gluster pods.
$ oc get pods
NAME                                           READY     STATUS    RESTARTS   AGE
glusterblock-registry-provisioner-dc-1-jwbmt   1/1       Running   0          5h
glusterfs-registry-8tk9m                       1/1       Running   1          5d
glusterfs-registry-8xkc8                       1/1       Running   1          5d
glusterfs-registry-l7555                       1/1       Running   1          5d
heketi-registry-1-cgkgc                        1/1       Running   1          5d

glusterfs-registry-8tk9m 'targetcli ls' - https://gist.github.com/netzzer/1b73f4aab9311b5780a881b7fd588aa5

glusterfs-registry-8xkc8 'targetcli ls' - https://gist.github.com/netzzer/641ce4d15b9f6d9b185e50babda2ddce

glusterfs-registry-l7555 'targetcli ls' - https://gist.github.com/netzzer/0f849605d175e97a7f5e046d13531a7c
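
For reference, one way to collect the per-pod output above, assuming targetcli is on the PATH inside the gluster pods:

$ oc exec glusterfs-registry-8tk9m -- targetcli ls
$ oc exec glusterfs-registry-8xkc8 -- targetcli ls
$ oc exec glusterfs-registry-l7555 -- targetcli ls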

All logs from the 3 pods (/var/log/glusterfs/*) are attached as tar files.

Comment 2 Annette Clewett 2018-08-06 23:21:48 UTC
Created attachment 1473795 [details]
glusterfs server log from OCP app server

Comment 3 Annette Clewett 2018-08-06 23:22:44 UTC
Created attachment 1473796 [details]
glusterfs server log from OCP app server

Comment 7 Annette Clewett 2018-08-07 18:32:25 UTC
Questions answered. 

1.
> With OCS image files:
> rhgs-server-rhel7:3.3.1-289

should this be rhgs-server-rhel7:3.3.1-28 ?

Yes, typo!

2. Are you waiting for anything else ?

No, from reading BZ#1598322 this looks to be the root cause of the issue.

