Bug 1769645 - StorageOS CR daemonsets are crashing
Summary: StorageOS CR daemonsets are crashing
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: ISV Operators
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.3.0
Assignee: Bruno Andrade
QA Contact: Bruno Andrade
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-11-07 05:03 UTC by Bruno Andrade
Modified: 2020-01-23 11:11 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-01-23 11:11:28 UTC
Target Upstream Version:
Embargoed:



Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHBA-2020:0062  0        None      None    None     2020-01-23 11:11:49 UTC

Description Bruno Andrade 2019-11-07 05:03:38 UTC
Description of problem:

After installing the following CR, the storageos-daemonset pods stay in the CrashLoopBackOff state:

apiVersion: storageos.com/v1
kind: StorageOSCluster
metadata:
  name: storageos
  namespace: openshift-operators
spec:
  secretRefName: "storageos-api"
  secretRefNamespace: "openshift-operators"
  namespace: openshift-operators
  csi:
    enable: true
    deploymentStrategy: deployment
  resources:
    requests:
      memory: "512Mi"
  k8sDistro: "openshift"

oc get pods    
NAME                                    READY   STATUS             RESTARTS   AGE
storageos-csi-helper-688f676c89-knght   3/3     Running            0          9m45s
storageos-daemonset-56htx               3/3     Running            0          9m45s
storageos-daemonset-9b6sh               2/3     CrashLoopBackOff   5          9m45s
storageos-daemonset-bpc7p               3/3     Running            0          9m45s
storageos-daemonset-d4mkq               2/3     CrashLoopBackOff   5          9m45s
storageos-daemonset-rf2lk               2/3     CrashLoopBackOff   5          9m45s
storageos-daemonset-xhnjl               2/3     CrashLoopBackOff   5          9m45s
storageos-operator-7f787d48c4-rs8qp     1/1     Running            0          42m
storageos-scheduler-5d47b6c86d-vbjcp    1/1     Running            0          9m45s

oc logs -f storageos-daemonset-xhnjl -c storageos

level=info msg="not first cluster node, joining first node" action=create address=10.0.140.90 category=etcd host=ip-10-0-140-90.us-east-2.compute.internal module=cp target=10.0.140.90
time="2019-11-07T04:56:17.061159088Z" level=error msg="could not retrieve cluster config from api" endpoint="http://10.0.140.90:5705/v1/members" status_code=404
time="2019-11-07T04:56:17.061230456Z" level=error msg="failed to join existing cluster" action=create category=etcd endpoint="10.0.136.208,10.0.156.24,10.0.169.3,10.0.158.200,10.0.174.204,10.0.140.90" error="404 Not Found" module=cp
time="2019-11-07T04:56:17.061273574Z" level=info msg="retrying cluster join in 5 seconds..." action=create category=etcd module=cp
time="2019-11-07T04:56:22.061521189Z" level=info msg="not first cluster node, joining first node" action=create address=10.0.140.90 category=etcd host=ip-10-0-140-90.us-east-2.compute.internal module=cp target=10.0.136.208

Version-Release number of selected component (if applicable):
StorageOS Operator: 1.4.0

How reproducible:
Always

Steps to Reproduce:
1. Install the StorageOS Operator
2. Create the secret as described at https://docs.storageos.com/docs/platforms/openshift/install/4.1
3. Create the StorageOS CR shown in the bug description

Actual results: StorageOS daemonset pods are failing


Expected results: StorageOS daemonset pods should be running and the StorageOS cluster should be healthy

Comment 3 Simon Croome 2019-11-25 18:16:11 UTC
We believe this was due to port 5705 being blocked by a firewall on the worker nodes.  We've changed the docs in https://github.com/storageos/cluster-operator/pull/197 to make it more clear this port is required.  We'll release 1.5.1 later this week with the changes.
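
A quick way to check the port from a node or debug pod is a plain TCP connect test. The following is only a minimal sketch, not part of the operator or its docs: the helper names tcp_port_open and unreachable_nodes are hypothetical, and the only value taken from this report is port 5705.

```python
import socket

# Port named in this bug report; everything else here is illustrative.
STORAGEOS_API_PORT = 5705


def tcp_port_open(host, port=STORAGEOS_API_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def unreachable_nodes(hosts, port=STORAGEOS_API_PORT, timeout=3.0):
    """Return the subset of hosts on which the port could not be reached."""
    return [h for h in hosts if not tcp_port_open(h, port, timeout)]
```

For example, unreachable_nodes(["10.0.140.90", "10.0.136.208"]) would list which of those node addresses from the log above do not accept connections on 5705 from wherever the script runs. A successful connect only shows that the port is reachable, not that the StorageOS API is healthy.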

Comment 5 Bruno Andrade 2019-12-12 23:25:11 UTC
Tested on StorageOS 1.5.1; the daemonset pods run correctly when tcp/5705 is open.

oc get pods
NAME                                    READY   STATUS             RESTARTS   AGE
storageos-csi-helper-5564768bc5-snnmt   3/3     Running            0          7m58s
storageos-daemonset-2hmf6               3/3     Running            0          7m57s
storageos-daemonset-6rr7c               3/3     Running            0          7m57s
storageos-daemonset-jhdcg               3/3     Running            0          7m57s
storageos-operator-658fb7f587-7kmlb     1/1     Running            0          9m58s
storageos-scheduler-57878b58db-9fnm6    1/1     Running            3          7m58s

Marking as VERIFIED

Comment 7 errata-xmlrpc 2020-01-23 11:11:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062

