Bug 1395216

Summary: ProvisioningFailed: Failed to provision volume with StorageClass "gold": glusterfs: create volume err: failed to get hostip Id not found
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Prasanth <pprakash>
Component: heketi
Assignee: Humble Chirammal <hchiramm>
Status: CLOSED ERRATA
QA Contact: Prasanth <pprakash>
Severity: high
Priority: unspecified
Version: cns-3.4
CC: akhakhar, annair, hchiramm, jarrpa, madam, mliyazud, mzywusko, nerawat, pprakash, rcyriac, rhs-bugs, rmekala, rreddy, rtalur, storage-qa-internal, vinug
Target Release: CNS 3.4
Hardware: Unspecified
OS: Unspecified
Last Closed: 2017-01-18 21:56:56 UTC
Type: Bug
Bug Depends On: 1346621
Bug Blocks: 1385247

Description Prasanth 2016-11-15 12:40:41 UTC
Description of problem:

ProvisioningFailed: Failed to provision volume with StorageClass "gold": glusterfs: create volume err: failed to get hostip Id not found.

################
# oc describe pvc claim1
Name:           claim1
Namespace:      storage-project
StorageClass:   gold
Status:         Pending
Volume:
Labels:         <none>
Capacity:
Access Modes:
Events:
  FirstSeen     LastSeen        Count   From                            SubobjectPath   Type            Reason                  Message
  ---------     --------        -----   ----                            -------------   --------        ------                  -------
  43m           42m             5       {persistentvolume-controller }                  Warning         ProvisioningFailed      Failed to provision volume with StorageClass "gold": glusterfs: create volume err: error creating volume .
  42m           42m             1       {persistentvolume-controller }                  Warning         ProvisioningFailed      Failed to provision volume with StorageClass "gold": glusterfs: create volume err: error creating volume Unable to execute command on glusterfs-dc-dhcp47-53.lab.eng.blr.redhat.com-1-jmjuf: volume create: vol_004a90e4dae8970b28b1cac2f9de41e1: failed: Host 10.70.47.54 not connected

.
  41m           41m             1       {persistentvolume-controller }                  Warning         ProvisioningFailed      Failed to provision volume with StorageClass "gold": glusterfs: create volume err: error creating volume Unable to execute command on glusterfs-dc-dhcp47-121.lab.eng.blr.redhat.com-1-zv8jq: volume create: vol_c41667c7c2974724718c79d5ab995d22: failed: Host 10.70.47.54 not connected

.
  41m           1s              84      {persistentvolume-controller }                  Warning         ProvisioningFailed      Failed to provision volume with StorageClass "gold": glusterfs: create volume err: failed to get hostip Id not found
#######################


Version-Release number of selected component (if applicable):
openshift v3.4.0.24+52fd77b
kubernetes v1.4.0+776c994
heketi-cli 3.0.0

How reproducible: Seen once; I will try to reproduce it again.


Steps to Reproduce:
1. Create a claim, for example of 100G.
2. Reboot one of the OCP nodes that runs Gluster.
3. Check # oc get pvc
4. Check # oc describe pvc <claim>

Actual results: The claim stays in the "Pending" state, but heketi keeps creating Gluster volumes in the back end.


Expected results: Once the node comes back and the gluster pod is in the "Running" state, the claim should be provisioned successfully and move to the "Bound" state.


Additional info: I'll attach more details soon.

Comment 2 Humble Chirammal 2016-11-15 12:50:20 UTC
The error "glusterfs: create volume err: failed to get hostip Id not found." points to the same issue we are discussing in the bugzillas below: the node info lookup failed and Heketi returned an error, which caused the provisioner to retry the volume creation.

https://bugzilla.redhat.com/show_bug.cgi?id=1392377
https://bugzilla.redhat.com/show_bug.cgi?id=1346621
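To make the failure mode concrete, here is a minimal Go simulation of the loop described above. It assumes that the back end records a new volume before the node info lookup that later fails, and that the caller simply retries on error. `fakeBackend`, `CreateVolume`, `provisionWithRetry` and `errHostNotFound` are hypothetical names for illustration only; they are not Heketi or Kubernetes APIs, and the real Heketi code path may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// errHostNotFound mimics the message seen in the PVC events (hypothetical).
var errHostNotFound = errors.New("failed to get hostip Id not found")

// fakeBackend is a hypothetical in-memory stand-in for the volume service.
type fakeBackend struct {
	volumes       map[string]bool // volumes that exist in the back end
	nextID        int
	nodeReachable bool // false while the rebooted node is down
}

// CreateVolume records the volume first and only then resolves node info.
// If that lookup fails it returns an error but leaves the volume in place.
func (b *fakeBackend) CreateVolume() (string, error) {
	b.nextID++
	id := fmt.Sprintf("vol_%03d", b.nextID)
	b.volumes[id] = true
	if !b.nodeReachable {
		return "", errHostNotFound
	}
	return id, nil
}

// provisionWithRetry keeps calling CreateVolume until it succeeds or
// maxAttempts is reached, roughly like a controller that retries on error.
func provisionWithRetry(b *fakeBackend, maxAttempts int) (string, error) {
	var lastErr error
	for i := 0; i < maxAttempts; i++ {
		id, err := b.CreateVolume()
		if err == nil {
			return id, nil
		}
		lastErr = err
	}
	return "", lastErr
}

func main() {
	b := &fakeBackend{volumes: map[string]bool{}}
	// The node stays unreachable, so every attempt fails.
	_, err := provisionWithRetry(b, 10)
	fmt.Println("last error:", err)
	fmt.Println("volumes left in back end:", len(b.volumes))
}
```

With 10 failed attempts the sketch reports the "Id not found" error and ends with 10 volumes in the back end while nothing is bound: each retry leaks one more volume, which is the pattern reported in this bug.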

Comment 5 Prasanth 2016-11-15 13:53:00 UTC
It has ended up creating around 125 Gluster volumes while the claim is still in the "Pending" state.

---------
# heketi-cli volume list |wc -l
125

# oc get pvc
NAME                 STATUS    VOLUME                                     CAPACITY   ACCESSMODES   AGE
claim1               Pending                                                                       2h
---------

This should never happen: heketi must not create more volumes than were requested at any point in time, so we need to prevent this.
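One common way to uphold the "no more volumes than requested" invariant is to make volume creation all-or-nothing: if a step after the volume has been recorded fails, the service deletes it before returning the error, so caller retries cannot pile up orphans. The Go sketch below illustrates that rollback pattern on a hypothetical in-memory back end (`backend`, `CreateVolume` and `errHostNotFound` are made-up names, not Heketi's API). It is not a description of what the Heketi patch mentioned later in this bug actually changed.

```go
package main

import (
	"errors"
	"fmt"
)

// errHostNotFound mirrors the message in the PVC events (hypothetical).
var errHostNotFound = errors.New("failed to get hostip Id not found")

// backend is a hypothetical in-memory volume service, not the Heketi API.
type backend struct {
	volumes       map[string]bool // volumes that exist in the back end
	nextID        int
	nodeReachable bool // false while the rebooted node is down
}

// CreateVolume is all-or-nothing: it records the volume, then resolves node
// info, and if that step fails it deletes the volume before returning.
func (b *backend) CreateVolume() (string, error) {
	b.nextID++
	id := fmt.Sprintf("vol_%03d", b.nextID)
	b.volumes[id] = true
	if !b.nodeReachable {
		delete(b.volumes, id) // roll back the partial create
		return "", errHostNotFound
	}
	return id, nil
}

func main() {
	b := &backend{volumes: map[string]bool{}}

	// Retries while the node is down: each attempt fails and cleans up.
	for i := 0; i < 10; i++ {
		if _, err := b.CreateVolume(); err == nil {
			panic("unexpected success while node is down")
		}
	}
	fmt.Println("orphaned volumes:", len(b.volumes)) // 0

	// Once the node is back, one attempt creates exactly one volume.
	b.nodeReachable = true
	id, err := b.CreateVolume()
	fmt.Println(id, err, len(b.volumes))
}
```

Rollback alone does not make retries fully idempotent: if a create succeeds but the caller never sees the reply (for example, a timeout), a retry can still produce a second volume. Closing that gap would need something like a request key that the back end deduplicates on, which this sketch does not attempt.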

Comment 6 Humble Chirammal 2016-11-15 14:01:08 UTC
(In reply to Prasanth from comment #5)
> It has ended up creating around 125 gluster volumes while the claim is still
> in "Pending" State.
> 
> ---------
> # heketi-cli volume list |wc -l
> 125
> 
> # oc get pvc
> NAME                 STATUS    VOLUME                                    
> CAPACITY   ACCESSMODES   AGE
> claim1               Pending     
> ---------                                                                  2h
> 
> This should not be the case at any point. It should not create more volumes
> than requested at any point of time. So we need to prevent this from
> happening.

The workflow looks the same as the one described in https://bugzilla.redhat.com/show_bug.cgi?id=1392377#c27, so the fix for Heketi's "ID not found" bug should also solve this issue.

Comment 7 Michael Adam 2016-11-21 15:18:22 UTC
Yet another duplicate of bz #1346621, hit with a different test.

Should be fixed in the next build by the patch from

https://github.com/heketi/heketi/pull/579

Comment 12 Prasanth 2016-12-22 08:04:39 UTC
Verified

Comment 13 errata-xmlrpc 2017-01-18 21:56:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2017-0148.html