Bug 1882103

Summary: memcached-operator-registry-server launch by operator-sdk run packagemanifests fails
Product: OpenShift Container Platform
Reporter: Tom Buskey <tbuskey>
Component: OLM
Assignee: Evan Cordell <ecordell>
OLM sub component: OLM
QA Contact: Bruno Andrade <bandrade>
Status: CLOSED ERRATA
Docs Contact:
Severity: high
Priority: high
CC: aos-bugs, bandrade, jesusr, jiazha, tbuskey
Version: 4.6
Target Milestone: ---
Target Release: 4.6.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-10-27 16:44:57 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Tom Buskey 2020-09-23 19:12:34 UTC
Description of problem:
Following https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-30834
at the step operator-sdk run packagemanifests --operator-version 0.0.1 --olm-namespace openshift-operator-lifecycle-manager, it times out waiting for the CSV


Version-Release number of selected component (if applicable):
operator-sdk version;oc version
operator-sdk version: "v0.19.4-1-g416d4466", commit: "416d4466d73d5a66e86eebbe4f5c7d48a1a51416", kubernetes version: "v1.18.2", go version: "go1.15 linux/amd64"
Client Version: openshift-clients-4.6.0-202006250705.p0-137-g0a570695f
Server Version: 4.6.0-0.nightly-2020-09-23-022756
Kubernetes Version: v1.19.0+8a39924


How reproducible:
always

Steps to Reproduce:
1. Follow https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-30834
2. ...
3. operator-sdk run packagemanifests  --operator-version 0.0.1 --olm-namespace openshift-operator-lifecycle-manager


Actual results:
INFO[0048] Waiting for ClusterServiceVersion "openshift-operator-lifecycle-manager/memcached-operator.v0.0.1" to reach 'Succeeded' phase
INFO[0048]   Waiting for ClusterServiceVersion "openshift-operator-lifecycle-manager/memcached-operator.v0.0.1" to appear 
FATA[0120] Failed to run operator: error waiting for CSV to install: timed out waiting for the condition 

oc get pod
memcached-operator-registry-server-5cfbc4f8bd-h2tjr   0/1     CrashLoopBackOff   6          7m30s

oc logs memcached-operator-registry-server-5cfbc4f8bd-h2tjr :
time="2020-09-23T18:42:49Z" level=info msg="skipping hidden directory" dir=/registry/manifests file=..2020_09_23_18_41_14.602544883 load=package
time="2020-09-23T18:42:49Z" level=info msg="skipping hidden file" dir=/registry/manifests file=..data load=package
Error: open /etc/nsswitch.conf: permission denied
Usage:
   [flags]

Flags:
  -d, --database string          relative path to sqlite db (default "bundles.db")
  -h, --help                     help for this command
  -p, --port string              port number to serve on (default "50051")
      --skip-migrate             do  not attempt to migrate to the latest db revision when starting
  -t, --termination-log string   path to a container termination log file (default "/dev/termination-log")

time="2020-09-23T18:42:49Z" level=panic msg="open /etc/nsswitch.conf: permission denied"
panic: (*logrus.Entry) (0x1138d00,0xc0001afb20)

Expected results:
INFO[0051] Waiting for ClusterServiceVersion "default/memcached-operator.v0.0.1" to reach 'Succeeded' phase
INFO[0052]   Waiting for ClusterServiceVersion "default/memcached-operator.v0.0.1" to appear
INFO[0056]   Found ClusterServiceVersion "default/memcached-operator.v0.0.1" phase: Pending
INFO[0058]   Found ClusterServiceVersion "default/memcached-operator.v0.0.1" phase: Installing
INFO[0087]   Found ClusterServiceVersion "default/memcached-operator.v0.0.1" phase: Succeeded
INFO[0088] Successfully installed "memcached-operator.v0.0.1" on OLM version "0.14.2"



Additional info:

Comment 1 Jesus M. Rodriguez 2020-09-24 22:47:05 UTC
#### Running existing image with no /etc/nsswitch.conf against OCP 4.6.0

I was able to recreate the problem against OCP 4.6.0 with the existing image.
```
[jesusr@transam community-operators{master}]$ operator-sdk run packagemanifests ./jenkins-operator --olm-namespace openshift-operator-lifecycle-manager --operator-version 0.4.1-rc3 --operator-namespace $TEST_NAMESPACE
INFO[0000] Running operator from directory ./jenkins-operator
INFO[0000] Creating jenkins-operator registry
INFO[0000]   Creating ConfigMap "openshift-operator-lifecycle-manager/jenkins-operator-registry-manifests-package"
INFO[0000]   Creating ConfigMap "openshift-operator-lifecycle-manager/jenkins-operator-registry-manifests-0-2-2"
INFO[0000]   Creating ConfigMap "openshift-operator-lifecycle-manager/jenkins-operator-registry-manifests-0-3-31"
INFO[0001]   Creating ConfigMap "openshift-operator-lifecycle-manager/jenkins-operator-registry-manifests-0-4-0"
INFO[0001]   Creating ConfigMap "openshift-operator-lifecycle-manager/jenkins-operator-registry-manifests-0-4-1-rc1"
INFO[0001]   Creating ConfigMap "openshift-operator-lifecycle-manager/jenkins-operator-registry-manifests-0-4-1-rc2"
INFO[0001]   Creating ConfigMap "openshift-operator-lifecycle-manager/jenkins-operator-registry-manifests-0-4-1-rc3"
INFO[0001]   Creating Deployment "openshift-operator-lifecycle-manager/jenkins-operator-registry-server"
INFO[0001]   Creating Service "openshift-operator-lifecycle-manager/jenkins-operator-registry-server"
INFO[0001] Waiting for Deployment "openshift-operator-lifecycle-manager/jenkins-operator-registry-server" rollout to complete
INFO[0001]   Waiting for Deployment "openshift-operator-lifecycle-manager/jenkins-operator-registry-server" to rollout: 0 out of 1 new replicas have been updated
INFO[0002]   Waiting for Deployment "openshift-operator-lifecycle-manager/jenkins-operator-registry-server" to rollout: 0 of 1 updated replicas are available
FATA[0120] Failed to run operator: error creating registry resources: error registering package: error waiting for Deployment "openshift-operator-lifecycle-manager/jenkins-operator-registry-server" to roll out: timed out waiting for the condition
```

Looking at the registry-server pod we created, you can see the same error reported in this Bugzilla:

```
[jesusr@transam community-operators{master}]$ oc logs jenkins-operator-registry-server-6ff8f55cb7-ms678 -n openshift-operator-lifecycle-manager
time="2020-09-24T22:12:39Z" level=info msg="loading Bundles" dir=/registry/manifests
time="2020-09-24T22:12:39Z" level=info msg=directory dir=/registry/manifests file=manifests load=bundles
time="2020-09-24T22:12:39Z" level=info msg=directory dir=/registry/manifests file=jenkins-operator-registry-manifests-0-2-2 load=bundles
time="2020-09-24T22:12:39Z" level=info msg="skipping hidden directory" dir=/registry/manifests file=..2020_09_24_22_06_50.262414143 load=bundles
time="2020-09-24T22:12:39Z" level=info msg="skipping hidden file" dir=/registry/manifests file=..data load=bundles
[snip]

Error: open /etc/nsswitch.conf: permission denied
Usage:
   [flags]

Flags:
  -d, --database string          relative path to sqlite db (default "bundles.db")
  -h, --help                     help for this command
  -p, --port string              port number to serve on (default "50051")
      --skip-migrate             do  not attempt to migrate to the latest db revision when starting
  -t, --termination-log string   path to a container termination log file (default "/dev/termination-log")

time="2020-09-24T22:12:40Z" level=panic msg="open /etc/nsswitch.conf: permission denied"
panic: (*logrus.Entry) (0x1138d20,0xc0001e1b20)

goroutine 1 [running]:
github.com/sirupsen/logrus.Entry.log(0xc0000cc000, 0xc000227770, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /build/vendor/github.com/sirupsen/logrus/entry.go:239 +0x2db
github.com/sirupsen/logrus.(*Entry).Log(0xc0001e1ab0, 0xc000000000, 0xc000169f40, 0x1, 0x1)
        /build/vendor/github.com/sirupsen/logrus/entry.go:268 +0xeb
github.com/sirupsen/logrus.(*Logger).Log(0xc0000cc000, 0x0, 0xc000169f40, 0x1, 0x1)
        /build/vendor/github.com/sirupsen/logrus/logger.go:192 +0x7d
github.com/sirupsen/logrus.(*Logger).Panic(...)
        /build/vendor/github.com/sirupsen/logrus/logger.go:233
github.com/sirupsen/logrus.Panic(...)
        /build/vendor/github.com/sirupsen/logrus/exported.go:129
```
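The root cause in the log above is the single `Error:` line, which is easy to miss between the info messages and the usage text. A minimal sketch for pulling it out of saved or piped logs, assuming the log has the shape shown above; `first_fatal_error` is a hypothetical helper name, not part of `oc` or operator-sdk:

```shell
#!/bin/sh
# Hypothetical helper: read registry-server logs on stdin and print the
# message of the first line that starts with "Error: ", then stop reading.
first_fatal_error() {
    awk '/^Error: /{ sub(/^Error: /, ""); print; exit }'
}

# Possible usage against a live pod (placeholders, needs cluster access):
#   oc logs <pod-name> -n <namespace> | first_fatal_error
```

Fed the log above, this should print only `open /etc/nsswitch.conf: permission denied`; it prints nothing if no such line exists.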

Comment 2 Jesus M. Rodriguez 2020-09-24 22:47:39 UTC
The fix worked.

#### Running new image with /etc/nsswitch.conf added

Using the new image against OCP 4.6.0-rc.7, it worked as expected.

```
[jesusr@transam community-operators{master}]$ /tmp/operator-sdk/operator-sdk run packagemanifests ./jenkins-operator --olm-namespace openshift-operator-lifecycle-manager --operator-version 0.4.1-rc3 --operator-namespace $TEST_NAMESPACE
INFO[0000] Running operator from directory ./jenkins-operator
INFO[0000] Creating jenkins-operator registry
INFO[0000]   Creating ConfigMap "openshift-operator-lifecycle-manager/jenkins-operator-registry-manifests-0-2-2"
INFO[0000]   Creating ConfigMap "openshift-operator-lifecycle-manager/jenkins-operator-registry-manifests-0-3-31"
INFO[0000]   Creating ConfigMap "openshift-operator-lifecycle-manager/jenkins-operator-registry-manifests-0-4-0"
INFO[0000]   Creating ConfigMap "openshift-operator-lifecycle-manager/jenkins-operator-registry-manifests-0-4-1-rc1"
INFO[0000]   Creating ConfigMap "openshift-operator-lifecycle-manager/jenkins-operator-registry-manifests-0-4-1-rc2"
INFO[0001]   Creating ConfigMap "openshift-operator-lifecycle-manager/jenkins-operator-registry-manifests-0-4-1-rc3"
INFO[0001]   Creating ConfigMap "openshift-operator-lifecycle-manager/jenkins-operator-registry-manifests-package"
INFO[0001]   Creating Deployment "openshift-operator-lifecycle-manager/jenkins-operator-registry-server"
INFO[0001]   Creating Service "openshift-operator-lifecycle-manager/jenkins-operator-registry-server"
INFO[0001] Waiting for Deployment "openshift-operator-lifecycle-manager/jenkins-operator-registry-server" rollout to complete
INFO[0001]   Waiting for Deployment "openshift-operator-lifecycle-manager/jenkins-operator-registry-server" to rollout: 0 out of 1 new replicas have been updated
INFO[0002]   Waiting for Deployment "openshift-operator-lifecycle-manager/jenkins-operator-registry-server" to rollout: 0 of 1 updated replicas are available
INFO[0010]   Deployment "openshift-operator-lifecycle-manager/jenkins-operator-registry-server" successfully rolled out
INFO[0010] Creating resources
INFO[0010]   Creating CatalogSource "test-jenkins-operator/jenkins-operator-ocs"
INFO[0010]   Creating Subscription "test-jenkins-operator/jenkins-operator-v0-4-1-rc3-sub"
INFO[0010]   Creating OperatorGroup "test-jenkins-operator/operator-sdk-og"
INFO[0010] Waiting for ClusterServiceVersion "test-jenkins-operator/jenkins-operator.v0.4.1-rc3" to reach 'Succeeded' phase
INFO[0010]   Waiting for ClusterServiceVersion "test-jenkins-operator/jenkins-operator.v0.4.1-rc3" to appear
INFO[0014]   Found ClusterServiceVersion "test-jenkins-operator/jenkins-operator.v0.4.1-rc3" phase: Pending
INFO[0017]   Found ClusterServiceVersion "test-jenkins-operator/jenkins-operator.v0.4.1-rc3" phase: Installing
INFO[0032]   Found ClusterServiceVersion "test-jenkins-operator/jenkins-operator.v0.4.1-rc3" phase: Succeeded
INFO[0032] Successfully installed "jenkins-operator.v0.4.1-rc3" on OLM version "0.16.1"
NAME                           NAMESPACE                KIND                        STATUS
jenkinsimages.jenkins.io       test-jenkins-operator    CustomResourceDefinition    Installed
jenkins.jenkins.io             test-jenkins-operator    CustomResourceDefinition    Installed
jenkins-operator.v0.4.1-rc3    test-jenkins-operator    ClusterServiceVersion       Installed
```

It still works against a KinD cluster:

```
[jesusr@transam community-operators{master}]$ /tmp/operator-sdk/operator-sdk run packagemanifests ./jenkins-operator  --operator-version 0.4.1-rc3 --operator-namespace $TEST_NAMESPACE
INFO[0000] Running operator from directory ./jenkins-operator 
INFO[0000] Creating jenkins-operator registry           
INFO[0000]   Creating ConfigMap "olm/jenkins-operator-registry-manifests-0-3-31" 
INFO[0000]   Creating ConfigMap "olm/jenkins-operator-registry-manifests-0-4-0" 
INFO[0000]   Creating ConfigMap "olm/jenkins-operator-registry-manifests-0-4-1-rc1" 
INFO[0000]   Creating ConfigMap "olm/jenkins-operator-registry-manifests-0-4-1-rc2" 
INFO[0000]   Creating ConfigMap "olm/jenkins-operator-registry-manifests-0-4-1-rc3" 
INFO[0000]   Creating ConfigMap "olm/jenkins-operator-registry-manifests-package" 
INFO[0000]   Creating ConfigMap "olm/jenkins-operator-registry-manifests-0-2-2" 
INFO[0000]   Creating Deployment "olm/jenkins-operator-registry-server" 
INFO[0000]   Creating Service "olm/jenkins-operator-registry-server" 
INFO[0000] Waiting for Deployment "olm/jenkins-operator-registry-server" rollout to complete 
INFO[0000] Waiting for Deployment "olm/jenkins-operator-registry-server" to rollout: waiting for deployment spec update to be observed 
INFO[0001]   Waiting for Deployment "olm/jenkins-operator-registry-server" to rollout: 0 of 1 updated replicas are available 
INFO[0007]   Deployment "olm/jenkins-operator-registry-server" successfully rolled out 
INFO[0007] Creating resources                           
INFO[0007]   Creating CatalogSource "test-jenkins-operator/jenkins-operator-ocs" 
INFO[0007]   Creating Subscription "test-jenkins-operator/jenkins-operator-v0-4-1-rc3-sub" 
INFO[0007]   Creating OperatorGroup "test-jenkins-operator/operator-sdk-og" 
INFO[0007] Waiting for ClusterServiceVersion "test-jenkins-operator/jenkins-operator.v0.4.1-rc3" to reach 'Succeeded' phase 
INFO[0007]   Waiting for ClusterServiceVersion "test-jenkins-operator/jenkins-operator.v0.4.1-rc3" to appear 
INFO[0013]   Found ClusterServiceVersion "test-jenkins-operator/jenkins-operator.v0.4.1-rc3" phase: Pending 
INFO[0015]   Found ClusterServiceVersion "test-jenkins-operator/jenkins-operator.v0.4.1-rc3" phase: Installing 
INFO[0028]   Found ClusterServiceVersion "test-jenkins-operator/jenkins-operator.v0.4.1-rc3" phase: Succeeded 
INFO[0028] Successfully installed "jenkins-operator.v0.4.1-rc3" on OLM version "0.15.1" 
NAME                           NAMESPACE                KIND                        STATUS
jenkinsimages.jenkins.io       test-jenkins-operator    CustomResourceDefinition    Installed
jenkins.jenkins.io             test-jenkins-operator    CustomResourceDefinition    Installed
jenkins-operator.v0.4.1-rc3    test-jenkins-operator    ClusterServiceVersion       Installed
```
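The successful runs above both walk the CSV through Pending, Installing, and Succeeded. A small sketch for extracting that progression from captured operator-sdk output, assuming the `Found ClusterServiceVersion "..." phase: ...` line format shown in these logs; `csv_phases` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: read operator-sdk output on stdin and print each
# reported CSV phase on its own line, in the order it was logged.
csv_phases() {
    sed -n 's/.*Found ClusterServiceVersion "[^"]*" phase: \([A-Za-z]*\).*/\1/p'
}

# Possible usage with output saved to a file (file name is a placeholder):
#   csv_phases < operator-sdk-run.log
```

A run that never prints a phase, as in the original failure, produces no output here, which separates "CSV never appeared" from "CSV stuck in a phase".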

Comment 3 Jesus M. Rodriguez 2020-09-30 21:43:42 UTC
Moving this bug to OLM, as the problem is in the upstream OLM image builder that the SDK uses. PR 466 addresses the problem: https://github.com/operator-framework/operator-registry/pull/466#issuecomment-701661467
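
The earlier comments mention both a missing /etc/nsswitch.conf and a "permission denied" error when opening it, so it can help to tell the two cases apart. A rough sketch, assuming you have a shell running as the same user inside a container built from the image in question (how to get that shell is outside this sketch); `check_readable` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: classify a path as missing, unreadable, or readable
# for the current user. Note: a file inside a directory the user cannot
# search is also reported as "missing", because the existence test fails.
check_readable() {
    if [ ! -e "$1" ]; then
        echo "missing"
    elif [ ! -r "$1" ]; then
        echo "permission denied"
    else
        echo "readable"
    fi
}

# Possible usage:
#   check_readable /etc/nsswitch.conf
```

The result depends on the user ID the check runs as, so a run as root may report "readable" where an arbitrary non-root user would not.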

Comment 5 Bruno Andrade 2020-10-03 20:48:32 UTC
Looks good now, thanks. Marking as VERIFIED.
                                                                              
operator-sdk version: "v0.19.4", commit: "125d0dfcc71fef4f9d7e2a42b1354cb79ffdee03", kubernetes version: "v1.18.2", go version: "go1.13.15 linux/amd64"
OCP: 4.6.0-0.nightly-2020-10-03-051134


 operator-sdk run --olm --operator-version 0.0.1 --olm-namespace openshift-operator-lifecycle-manager                                                        
Flag --olm has been deprecated, use 'run packagemanifests' instead
Flag --operator-version has been deprecated, use this flag with 'run packagemanifests' instead
Flag --olm-namespace has been deprecated, use this flag with 'run packagemanifests' instead
INFO[0002] Creating memcached-operator registry         
INFO[0002]   Creating ConfigMap "openshift-operator-lifecycle-manager/memcached-operator-registry-manifests-package" 
INFO[0003]   Creating ConfigMap "openshift-operator-lifecycle-manager/memcached-operator-registry-manifests-0-0-1" 
INFO[0003]   Creating Deployment "openshift-operator-lifecycle-manager/memcached-operator-registry-server" 
INFO[0003]   Creating Service "openshift-operator-lifecycle-manager/memcached-operator-registry-server" 
INFO[0003] Waiting for Deployment "openshift-operator-lifecycle-manager/memcached-operator-registry-server" rollout to complete 
INFO[0004]   Waiting for Deployment "openshift-operator-lifecycle-manager/memcached-operator-registry-server" to rollout: 0 of 1 updated replicas are available 
INFO[0008]   Deployment "openshift-operator-lifecycle-manager/memcached-operator-registry-server" successfully rolled out 
INFO[0008] Creating resources                           
INFO[0008]   Creating CatalogSource "default/memcached-operator-ocs" 
INFO[0008]   Creating Subscription "default/memcached-operator-v0-0-1-sub" 
INFO[0008]   Creating OperatorGroup "default/operator-sdk-og" 
INFO[0008] Waiting for ClusterServiceVersion "default/memcached-operator.v0.0.1" to reach 'Succeeded' phase 
INFO[0008]   Waiting for ClusterServiceVersion "default/memcached-operator.v0.0.1" to appear 
INFO[0013]   Found ClusterServiceVersion "default/memcached-operator.v0.0.1" phase: Pending 
INFO[0016]   Found ClusterServiceVersion "default/memcached-operator.v0.0.1" phase: Installing 
INFO[0023]   Found ClusterServiceVersion "default/memcached-operator.v0.0.1" phase: Succeeded 
INFO[0023] Successfully installed "memcached-operator.v0.0.1" on OLM version "0.16.1" 
NAME                            NAMESPACE    KIND                        STATUS
memcached-operator.v0.0.1       default      ClusterServiceVersion       Installed
memcacheds.cache.example.com    default      CustomResourceDefinition    Installed


 oc get pods -n default
NAME                                  READY   STATUS    RESTARTS   AGE
memcached-operator-6db56fdb94-whshr   1/1     Running   0          10

Comment 8 errata-xmlrpc 2020-10-27 16:44:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196