Bug 1769645

Summary: StorageOS CR daemonsets are crashing
Product: OpenShift Container Platform
Component: ISV Operators
Version: 4.3.0
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Target Release: 4.3.0
Hardware: Unspecified
OS: Unspecified
Type: Bug
Reporter: Bruno Andrade <bandrade>
Assignee: Bruno Andrade <bandrade>
QA Contact: Bruno Andrade <bandrade>
CC: aos-bugs, jiazha, mdorn, mifiedle, scolange, sd-ecosystem, simon.croome, tbuskey
Last Closed: 2020-01-23 11:11:28 UTC

Description Bruno Andrade 2019-11-07 05:03:38 UTC

Description of problem:

After creating the following CR, the storageos-daemonset pods remain in the CrashLoopBackOff state:

apiVersion: storageos.com/v1
kind: StorageOSCluster
metadata:
  name: storageos
  namespace: openshift-operators
spec:
  secretRefName: "storageos-api"
  secretRefNamespace: "openshift-operators"
  namespace: openshift-operators
  csi:
    enable: true
    deploymentStrategy: deployment
  resources:
    requests:
      memory: "512Mi"
  k8sDistro: "openshift"

oc get pods    
NAME                                    READY   STATUS             RESTARTS   AGE
storageos-csi-helper-688f676c89-knght   3/3     Running            0          9m45s
storageos-daemonset-56htx               3/3     Running            0          9m45s
storageos-daemonset-9b6sh               2/3     CrashLoopBackOff   5          9m45s
storageos-daemonset-bpc7p               3/3     Running            0          9m45s
storageos-daemonset-d4mkq               2/3     CrashLoopBackOff   5          9m45s
storageos-daemonset-rf2lk               2/3     CrashLoopBackOff   5          9m45s
storageos-daemonset-xhnjl               2/3     CrashLoopBackOff   5          9m45s
storageos-operator-7f787d48c4-rs8qp     1/1     Running            0          42m
storageos-scheduler-5d47b6c86d-vbjcp    1/1     Running            0          9m45s

oc logs -f storageos-daemonset-xhnjl -c storageos

level=info msg="not first cluster node, joining first node" action=create address=10.0.140.90 category=etcd host=ip-10-0-140-90.us-east-2.compute.internal module=cp target=10.0.140.90
time="2019-11-07T04:56:17.061159088Z" level=error msg="could not retrieve cluster config from api" endpoint="http://10.0.140.90:5705/v1/members" status_code=404
time="2019-11-07T04:56:17.061230456Z" level=error msg="failed to join existing cluster" action=create category=etcd endpoint="10.0.136.208,10.0.156.24,10.0.169.3,10.0.158.200,10.0.174.204,10.0.140.90" error="404 Not Found" module=cp
time="2019-11-07T04:56:17.061273574Z" level=info msg="retrying cluster join in 5 seconds..." action=create category=etcd module=cp
time="2019-11-07T04:56:22.061521189Z" level=info msg="not first cluster node, joining first node" action=create address=10.0.140.90 category=etcd host=ip-10-0-140-90.us-east-2.compute.internal module=cp target=10.0.136.208
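To see at a glance which endpoints the join attempts fail against, the key=value log lines above can be parsed with a few lines of Python. This is only a minimal sketch: the helper names parse_logfmt and join_failures are hypothetical, and it assumes every line follows the key=value / key="quoted value" layout seen in this log.

```python
import shlex


def parse_logfmt(line: str) -> dict:
    """Split one key=value log line into a dict (hypothetical helper).

    shlex handles the double-quoted values, e.g. msg="failed to join ...".
    Lines that shlex cannot tokenize (unbalanced quotes) yield an empty dict.
    """
    try:
        tokens = shlex.split(line)
    except ValueError:
        return {}
    fields = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens without an '='
            fields[key] = value
    return fields


def join_failures(lines):
    """Yield (endpoint, status) for every error-level line that names an endpoint."""
    for line in lines:
        fields = parse_logfmt(line)
        if fields.get("level") == "error" and "endpoint" in fields:
            # Some lines carry status_code, others only an error string.
            yield fields["endpoint"], fields.get("status_code") or fields.get("error")
```

Feeding it the saved output of `oc logs storageos-daemonset-xhnjl -c storageos`, one line at a time, lists each failing endpoint together with the reported status.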

Version-Release number of selected component (if applicable):
StorageOS Operator: 1.4.0

How reproducible:
Always

Steps to Reproduce:
1. Install the StorageOS Operator.
2. Create the secret as described at https://docs.storageos.com/docs/platforms/openshift/install/4.1
3. Create the StorageOS CR shown in the description above.

Actual results: StorageOS daemonset pods are failing.

Expected results: StorageOS daemonset pods should be running and the StorageOS cluster should be healthy.

Comment 3 Simon Croome 2019-11-25 18:16:11 UTC

We believe this was due to port 5705 being blocked by a firewall on the worker nodes. We've updated the docs in https://github.com/storageos/cluster-operator/pull/197 to make it clearer that this port is required. We'll release 1.5.1 later this week with the changes.
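
A quick way to check whether tcp/5705 is reachable between nodes is a plain TCP connect test. The following is a minimal sketch, not part of the operator: the function names port_is_open and blocked_nodes are hypothetical, and it assumes it is run from a host or pod on the same network as the worker nodes. A failed connection only shows that the port is unreachable; it cannot distinguish a firewall from a node where nothing is listening yet.

```python
import socket


def port_is_open(host: str, port: int = 5705, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def blocked_nodes(hosts, port: int = 5705, timeout: float = 3.0):
    """Return the subset of hosts whose port could not be reached."""
    return [h for h in hosts if not port_is_open(h, port, timeout)]
```

For example, calling blocked_nodes with the node addresses from the "failed to join existing cluster" log line in the description returns the nodes on which port 5705 could not be reached.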

Comment 5 Bruno Andrade 2019-12-12 23:25:11 UTC

Tested with StorageOS 1.5.1; all pods run correctly when tcp/5705 is open.

oc get pods
NAME                                    READY   STATUS             RESTARTS   AGE
storageos-csi-helper-5564768bc5-snnmt   3/3     Running            0          7m58s
storageos-daemonset-2hmf6               3/3     Running            0          7m57s
storageos-daemonset-6rr7c               3/3     Running            0          7m57s
storageos-daemonset-jhdcg               3/3     Running            0          7m57s
storageos-operator-658fb7f587-7kmlb     1/1     Running            0          9m58s
storageos-scheduler-57878b58db-9fnm6    1/1     Running            3          7m58s

Marking as VERIFIED

Comment 7 errata-xmlrpc 2020-01-23 11:11:28 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062