Bug 1520291

Summary: ASB pods were in error status after installation
Product: OpenShift Container Platform
Component: Installer
Version: 3.7.1
Target Release: 3.7.z
Status: CLOSED NOTABUG
Severity: medium
Priority: medium
Hardware: Unspecified
OS: Unspecified
Type: Bug
Reporter: Gaoyun Pei <gpei>
Assignee: Fabian von Feilitzsch <fabian>
QA Contact: Gaoyun Pei <gpei>
CC: aos-bugs, gpei, jokerman, jpeeler, mark.vinkx, mmccomas, qixuan.wang, wmeng
Last Closed: 2018-04-24 15:18:43 UTC

Description Gaoyun Pei 2017-12-04 08:17:59 UTC
Description of problem:
After installing a 3.7 cluster, the pods in the openshift-ansible-service-broker project were not running, and the PVC "etcd" was "Pending". This is reproducible when the cluster has no default storage class (no cloud provider enabled).


Version-Release number of the following components:
openshift-ansible-3.7.11-1.git.0.42a781f.el7.noarch.rpm
ansible 2.4.1.0-1.el7

How reproducible:
Always

Steps to Reproduce:
1. Run playbooks/byo/config.yml to set up an OCP 3.7 cluster. Because openshift_service_catalog, ansible_service_broker and template_service_broker are installed by default, check the related pods after installation.


Actual results:
[root@ip-172-18-9-175 ~]# oc get pod -n openshift-ansible-service-broker
NAME                READY     STATUS              RESTARTS   AGE
asb-1-deploy        1/1       Running             0          4m
asb-1-j7tmh         0/1       CrashLoopBackOff    5          4m
asb-etcd-1-8q9lm    0/1       ContainerCreating   0          4m
asb-etcd-1-deploy   1/1       Running             0          4m

[root@ip-172-18-9-175 ~]# oc logs asb-1-j7tmh -n openshift-ansible-service-broker
Using config file mounted to /etc/ansible-service-broker/config.yaml
============================================================
==           Starting Ansible Service Broker...           ==
============================================================
[2017-12-04T08:10:15.594Z] [NOTICE] Initializing clients...
[2017-12-04T08:10:15.594Z] [INFO] == ETCD CX ==
[2017-12-04T08:10:15.594Z] [INFO] EtcdHost: asb-etcd.openshift-ansible-service-broker.svc
[2017-12-04T08:10:15.594Z] [INFO] EtcdPort: 2379
[2017-12-04T08:10:15.594Z] [INFO] Endpoints: [https://asb-etcd.openshift-ansible-service-broker.svc:2379]
[2017-12-04T08:10:16.595Z] [ERROR] client: etcd cluster is unavailable or misconfigured; error #0: client: endpoint https://asb-etcd.openshift-ansible-service-broker.svc:2379 exceeded header timeout

[root@ip-172-18-9-175 ~]# oc describe pod asb-etcd-1-8q9lm -n openshift-ansible-service-broker
Name:		asb-etcd-1-8q9lm
Namespace:	openshift-ansible-service-broker
Node:		ip-172-18-2-70.ec2.internal/172.18.2.70
...
Events:
  ...
  2m		28s		2	kubelet, ip-172-18-2-70.ec2.internal			Warning		FailedMount		Unable to mount volumes for pod "asb-etcd-1-8q9lm_openshift-ansible-service-broker(48c0fc7c-d8ca-11e7-af7f-0edf0affc2bc)": timeout expired waiting for volumes to attach/mount for pod "openshift-ansible-service-broker"/"asb-etcd-1-8q9lm". list of unattached/unmounted volumes=[etcd]

[root@ip-172-18-9-175 ~]# oc get pvc -n openshift-ansible-service-broker
NAME      STATUS    VOLUME    CAPACITY   ACCESSMODES   STORAGECLASS   AGE
etcd      Pending                                                     3m
[root@ip-172-18-9-175 ~]# oc describe pvc -n openshift-ansible-service-broker
Name:		etcd
Namespace:	openshift-ansible-service-broker
StorageClass:	
Status:		Pending
Volume:		
Labels:		<none>
Annotations:	<none>
Capacity:	
Access Modes:	
Events:
  FirstSeen	LastSeen	Count	From				SubObjectPath	Type		Reason		Message
  ---------	--------	-----	----				-------------	--------	------		-------
  3m		13s		15	persistentvolume-controller			Normal		FailedBinding	no persistent volumes available for this claim and no storage class is set

[root@ip-172-18-9-175 ~]# oc get pv 
NAME           CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS    CLAIM                 STORAGECLASS   REASON    AGE
regpv-volume   17G        RWX           Retain          Bound     default/regpv-claim                            15m
[root@ip-172-18-9-175 ~]# oc get storageclass
No resources found.
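
The diagnosis above (a Pending claim that requests no storage class) can also be scripted. The following is a minimal sketch, assuming the claims have been exported with `oc get pvc -n openshift-ansible-service-broker -o json` and parsed into a dictionary. The function name `pending_claims_without_class` is hypothetical, and the sample data is hand-written to resemble the claim in this report rather than copied from real `oc` output.

```python
def pending_claims_without_class(pvc_list):
    """Return the names of Pending claims that request no storage class."""
    names = []
    for item in pvc_list.get("items", []):
        meta = item.get("metadata", {})
        annotations = meta.get("annotations") or {}
        # A claim can name its class in the spec or in the legacy beta annotation.
        storage_class = (
            item.get("spec", {}).get("storageClassName")
            or annotations.get("volume.beta.kubernetes.io/storage-class")
        )
        phase = item.get("status", {}).get("phase")
        if phase == "Pending" and not storage_class:
            names.append(meta.get("name"))
    return names


# Hand-written sample shaped like the "etcd" claim in this report (assumption).
sample_pvcs = {
    "kind": "List",
    "items": [
        {
            "metadata": {
                "name": "etcd",
                "namespace": "openshift-ansible-service-broker",
            },
            "spec": {"accessModes": ["ReadWriteOnce"]},
            "status": {"phase": "Pending"},
        }
    ],
}

print(pending_claims_without_class(sample_pvcs))  # prints ['etcd']
```

A claim returned by this function can only bind to a pre-created PV without a class, which matches the "no persistent volumes available for this claim and no storage class is set" event shown above.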

Comment 3 Fabian von Feilitzsch 2018-01-24 15:31:37 UTC
If no PV is defined, the broker will be unable to start. It looks like you set up NFS backing for the registry PV. You can do the same for the broker PV by adding these options to your inventory (as documented in https://docs.openshift.org/latest/install_config/install/advanced_install.html#configuring-openshift-ansible-broker):

openshift_hosted_etcd_storage_kind=nfs
openshift_hosted_etcd_storage_nfs_options="*(rw,root_squash,sync,no_wdelay)"
openshift_hosted_etcd_storage_nfs_directory=/opt/osev3-etcd 
openshift_hosted_etcd_storage_volume_name=etcd-vol2 
openshift_hosted_etcd_storage_access_modes=["ReadWriteOnce"]
openshift_hosted_etcd_storage_volume_size=1G
openshift_hosted_etcd_storage_labels={'storage': 'etcd'}

Does this fix the issue or was there something additional that I missed?

Comment 4 Gaoyun Pei 2018-01-25 08:54:09 UTC
Thanks Fabian! After adding the openshift_hosted_etcd_storage_* related options, the required pv for ansible-service-broker was created, and asb pods are running well. 

The openshift-ansible version I used is openshift-ansible-3.7.26-1.git.0.f87f1af.el7.noarch.rpm.

Since ASB is now installed by default whether or not the user wants it, maybe we could add a clear prompt when the user hasn't set the correct PV options for it?

In my case, I didn't know I had to set the openshift_hosted_etcd_storage_* options when installing an OCP 3.7 cluster without a cloud provider enabled. I would appreciate it very much if the installer could give a hint in this case. Thanks!
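
The kind of hint requested here could look roughly like the sketch below. It is not openshift-ansible code: the function name `check_asb_storage` and the rule it applies are hypothetical, and it assumes that the inventory variables `ansible_service_broker_install` and `openshift_cloudprovider_kind` (neither appears in this report) indicate whether the broker is installed and whether a cloud provider is configured. It covers only the simple case described in this bug.

```python
def check_asb_storage(inventory_vars):
    """Return a warning string if the broker will likely lack storage, else None."""
    # Assumption: the broker is installed unless it is explicitly disabled.
    install_flag = str(inventory_vars.get("ansible_service_broker_install", "true"))
    broker_enabled = install_flag.strip().lower() != "false"

    # Assumption: a configured cloud provider supplies a default storage class.
    has_cloudprovider = bool(inventory_vars.get("openshift_cloudprovider_kind"))

    # The openshift_hosted_etcd_storage_* options create the broker's PV.
    has_etcd_storage = bool(inventory_vars.get("openshift_hosted_etcd_storage_kind"))

    if broker_enabled and not has_cloudprovider and not has_etcd_storage:
        return (
            "ansible_service_broker is enabled, but no cloud provider and no "
            "openshift_hosted_etcd_storage_* options are set; the broker's "
            "etcd PVC may stay Pending."
        )
    return None


# Inventory similar to the one in this report: defaults only, no cloud provider.
print(check_asb_storage({}))

# Inventory with the NFS options from Comment 3: no warning is returned.
print(check_asb_storage({"openshift_hosted_etcd_storage_kind": "nfs"}))  # prints None
```

A check like this cannot see storage that already exists in the cluster, such as a manually created PV or a non-default storage class, so it would miss or misreport some setups.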

Comment 5 Qixuan Wang 2018-04-16 13:41:08 UTC
Hi there,

I ran into the same problem with OCP 3.9.14 deployed on OpenStack + GlusterFS. A storage class exists in this cluster, so how should I configure the openshift_hosted_etcd_storage_* options? Thanks!


[root@host-172-16-120-146 ~]# oc describe pvc
Name:          etcd
Namespace:     openshift-ansible-service-broker
StorageClass:  
Status:        Pending
Volume:        
Labels:        <none>
Annotations:   <none>
Finalizers:    []
Capacity:      
Access Modes:  
Events:
  Type    Reason         Age               From                         Message
  ----    ------         ----              ----                         -------
  Normal  FailedBinding  2m (x26 over 8m)  persistentvolume-controller  no persistent volumes available for this claim and no storage class is set

[root@host-172-16-120-146 ~]# oc get pv 
No resources found.

[root@host-172-16-120-146 ~]# oc describe storageclass
Name:            glusterfs-storage
IsDefaultClass:  No
Annotations:     <none>
Provisioner:     kubernetes.io/glusterfs
Parameters:      resturl=http://heketi-storage-glusterfs.apps.0416-2iv.qe.rhcloud.com,restuser=admin,secretName=heketi-storage-admin-secret,secretNamespace=glusterfs
ReclaimPolicy:   Delete
Events:          <none>
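
The output above suggests why the claim stays Pending even though a storage class exists: the claim sets no storage class, and glusterfs-storage reports `IsDefaultClass: No`, so no class is applied to the claim. The following is a minimal sketch of that check, assuming the classes have been exported with `oc get storageclass -o json` and parsed into a dictionary. The function name `default_storage_classes` is hypothetical, the sample data is hand-written, and the check recognizes only the `storageclass.kubernetes.io/is-default-class` annotation and its older `storageclass.beta.kubernetes.io/is-default-class` form.

```python
DEFAULT_CLASS_ANNOTATIONS = (
    "storageclass.kubernetes.io/is-default-class",
    "storageclass.beta.kubernetes.io/is-default-class",
)


def default_storage_classes(sc_list):
    """Return the names of storage classes annotated as the cluster default."""
    defaults = []
    for item in sc_list.get("items", []):
        meta = item.get("metadata", {})
        annotations = meta.get("annotations") or {}
        # A class is the default when either annotation is the string "true".
        if any(annotations.get(key) == "true" for key in DEFAULT_CLASS_ANNOTATIONS):
            defaults.append(meta.get("name"))
    return defaults


# Hand-written sample shaped like the class in this comment (assumption).
sample_classes = {
    "kind": "List",
    "items": [
        {
            "metadata": {"name": "glusterfs-storage"},
            "provisioner": "kubernetes.io/glusterfs",
        }
    ],
}

print(default_storage_classes(sample_classes))  # prints []
```

When the list is empty, a claim that names no class can only bind to a pre-created PV without a class.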

Comment 6 Qixuan Wang 2018-04-19 02:38:52 UTC
Hi there,

I enabled the cloud provider to avoid the problem.

Comment 7 Fabian von Feilitzsch 2018-04-20 16:00:28 UTC
Hey, did that resolve your issue or is there still work we need to do here?

Comment 8 Fabian von Feilitzsch 2018-04-20 18:08:46 UTC
I think it would be difficult to automatically detect incorrect PV settings due to the large number of possible setups, particularly in the case of an upgrade or the broker being added to an existing cluster. The dependence on the PV is in the documentation, and is going away in v3.10, so I think we can go ahead and close this unless you have an objection.

Comment 9 Fabian von Feilitzsch 2018-04-24 15:18:43 UTC
Closing for now. If you feel this issue needs additional work, feel free to reopen it.

Comment 10 Gaoyun Pei 2018-05-02 02:42:28 UTC
Hi Fabian, sorry for the late reply. Comment 8 makes sense, let's close it.