Bug 1882103 - memcached-operator-registry-server launch by operator-sdk run packagemanifests fails
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: Evan Cordell
QA Contact: Bruno Andrade
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-09-23 19:12 UTC by Tom Buskey
Modified: 2020-10-27 16:45 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 16:44:57 UTC
Target Upstream Version:
Embargoed:



Links
| System | ID | Private | Priority | Status | Summary | Last Updated |
|---|---|---|---|---|---|---|
| Github | operator-framework operator-registry pull 466 | 0 | None | closed | Bug 1882103: set group permissions on /etc | 2021-02-17 02:09:11 UTC |
| Red Hat Product Errata | RHBA-2020:4196 | 0 | None | None | None | 2020-10-27 16:45:15 UTC |

Description Tom Buskey 2020-09-23 19:12:34 UTC
Description of problem:
Following https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-30834
at stage operator-sdk run packagemanifests --operator-version 0.0.1 --olm-namespace openshift-operator-lifecycle-manager, it times out waiting for the CSV


Version-Release number of selected component (if applicable):
operator-sdk version;oc version
operator-sdk version: "v0.19.4-1-g416d4466", commit: "416d4466d73d5a66e86eebbe4f5c7d48a1a51416", kubernetes version: "v1.18.2", go version: "go1.15 linux/amd64"
Client Version: openshift-clients-4.6.0-202006250705.p0-137-g0a570695f
Server Version: 4.6.0-0.nightly-2020-09-23-022756
Kubernetes Version: v1.19.0+8a39924


How reproducible:
always

Steps to Reproduce:
1. Follow https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-30834
2. ...
3. operator-sdk run packagemanifests  --operator-version 0.0.1 --olm-namespace openshift-operator-lifecycle-manager


Actual results:
INFO[0048] Waiting for ClusterServiceVersion "openshift-operator-lifecycle-manager/memcached-operator.v0.0.1" to reach 'Succeeded' phase
INFO[0048]   Waiting for ClusterServiceVersion "openshift-operator-lifecycle-manager/memcached-operator.v0.0.1" to appear 
FATA[0120] Failed to run operator: error waiting for CSV to install: timed out waiting for the condition 
#

oc get pod
memcached-operator-registry-server-5cfbc4f8bd-h2tjr   0/1     CrashLoopBackOff   6          7m30s

oc logs memcached-operator-registry-server-5cfbc4f8bd-h2tjr :
time="2020-09-23T18:42:49Z" level=info msg="skipping hidden directory" dir=/registry/manifests file=..2020_09_23_18_41_14.602544883 load=package
time="2020-09-23T18:42:49Z" level=info msg="skipping hidden file" dir=/registry/manifests file=..data load=package
Error: open /etc/nsswitch.conf: permission denied
Usage:
   [flags]

Flags:
  -d, --database string          relative path to sqlite db (default "bundles.db")
  -h, --help                     help for this command
  -p, --port string              port number to serve on (default "50051")
      --skip-migrate             do  not attempt to migrate to the latest db revision when starting
  -t, --termination-log string   path to a container termination log file (default "/dev/termination-log")

time="2020-09-23T18:42:49Z" level=panic msg="open /etc/nsswitch.conf: permission denied"
panic: (*logrus.Entry) (0x1138d00,0xc0001afb20)

Expected results:
INFO[0051] Waiting for ClusterServiceVersion "default/memcached-operator.v0.0.1" to reach 'Succeeded' phase
INFO[0052]   Waiting for ClusterServiceVersion "default/memcached-operator.v0.0.1" to appear
INFO[0056]   Found ClusterServiceVersion "default/memcached-operator.v0.0.1" phase: Pending
INFO[0058]   Found ClusterServiceVersion "default/memcached-operator.v0.0.1" phase: Installing
INFO[0087]   Found ClusterServiceVersion "default/memcached-operator.v0.0.1" phase: Succeeded
INFO[0088] Successfully installed "memcached-operator.v0.0.1" on OLM version "0.14.2"



Additional info:
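The fatal entry can be picked out of the registry-server log mechanically. Below is a minimal sketch, assuming the logrus text format seen in the `oc logs` output above (key=value pairs, values optionally double-quoted). `parse_logrus_line` and `find_panics` are hypothetical helper names for illustration, not part of operator-registry or operator-sdk.

```python
import re

# One key=value pair in logrus text format; the value is either a
# double-quoted string (backslash escapes allowed) or a bare token.
_PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_logrus_line(line):
    """Return the key/value fields of one logrus text-format line.

    Lines without key=value pairs (usage text, "Error: ..." lines)
    yield an empty dict. Surrounding quotes are stripped; escape
    sequences inside quoted values are left as-is.
    """
    fields = {}
    for key, value in _PAIR.findall(line):
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        fields[key] = value
    return fields


def find_panics(log_text):
    """Return the msg of every entry logged at level=panic or level=fatal."""
    messages = []
    for line in log_text.splitlines():
        fields = parse_logrus_line(line)
        if fields.get("level") in ("panic", "fatal"):
            messages.append(fields.get("msg", ""))
    return messages
```

Feeding it the saved output of `oc logs <registry-server pod>` should return only the `open /etc/nsswitch.conf: permission denied` message from this report, skipping the info-level lines and the usage text.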

Comment 1 Jesus M. Rodriguez 2020-09-24 22:47:05 UTC
#### Running existing image with no /etc/nsswitch.conf against OCP 4.6.0

I was able to recreate the problem against OCP 4.6.0 with the existing image.
```
[jesusr@transam community-operators{master}]$ operator-sdk run packagemanifests ./jenkins-operator --olm-namespace openshift-operator-lifecycle-manager --operator-version 0.4.1-rc3 --operator-namespace $TEST_NAMESPACE
INFO[0000] Running operator from directory ./jenkins-operator
INFO[0000] Creating jenkins-operator registry
INFO[0000]   Creating ConfigMap "openshift-operator-lifecycle-manager/jenkins-operator-registry-manifests-package"
INFO[0000]   Creating ConfigMap "openshift-operator-lifecycle-manager/jenkins-operator-registry-manifests-0-2-2"
INFO[0000]   Creating ConfigMap "openshift-operator-lifecycle-manager/jenkins-operator-registry-manifests-0-3-31"
INFO[0001]   Creating ConfigMap "openshift-operator-lifecycle-manager/jenkins-operator-registry-manifests-0-4-0"
INFO[0001]   Creating ConfigMap "openshift-operator-lifecycle-manager/jenkins-operator-registry-manifests-0-4-1-rc1"
INFO[0001]   Creating ConfigMap "openshift-operator-lifecycle-manager/jenkins-operator-registry-manifests-0-4-1-rc2"
INFO[0001]   Creating ConfigMap "openshift-operator-lifecycle-manager/jenkins-operator-registry-manifests-0-4-1-rc3"
INFO[0001]   Creating Deployment "openshift-operator-lifecycle-manager/jenkins-operator-registry-server"
INFO[0001]   Creating Service "openshift-operator-lifecycle-manager/jenkins-operator-registry-server"
INFO[0001] Waiting for Deployment "openshift-operator-lifecycle-manager/jenkins-operator-registry-server" rollout to complete
INFO[0001]   Waiting for Deployment "openshift-operator-lifecycle-manager/jenkins-operator-registry-server" to rollout: 0 out of 1 new replicas have been updated
INFO[0002]   Waiting for Deployment "openshift-operator-lifecycle-manager/jenkins-operator-registry-server" to rollout: 0 of 1 updated replicas are available
FATA[0120] Failed to run operator: error creating registry resources: error registering package: error waiting for Deployment "openshift-operator-lifecycle-manager/jenkins-operator-registry-server" to roll out: timed out waiting for the condition
```

The logs of the registry-server pod we created show the same error reported in this bug:

```
[jesusr@transam community-operators{master}]$ oc logs jenkins-operator-registry-server-6ff8f55cb7-ms678 -n openshift-operator-lifecycle-manager
time="2020-09-24T22:12:39Z" level=info msg="loading Bundles" dir=/registry/manifests
time="2020-09-24T22:12:39Z" level=info msg=directory dir=/registry/manifests file=manifests load=bundles
time="2020-09-24T22:12:39Z" level=info msg=directory dir=/registry/manifests file=jenkins-operator-registry-manifests-0-2-2 load=bundles
time="2020-09-24T22:12:39Z" level=info msg="skipping hidden directory" dir=/registry/manifests file=..2020_09_24_22_06_50.262414143 load=bundles
time="2020-09-24T22:12:39Z" level=info msg="skipping hidden file" dir=/registry/manifests file=..data load=bundles
[snip]

Error: open /etc/nsswitch.conf: permission denied
Usage:
   [flags]

Flags:
  -d, --database string          relative path to sqlite db (default "bundles.db")
  -h, --help                     help for this command
  -p, --port string              port number to serve on (default "50051")
      --skip-migrate             do  not attempt to migrate to the latest db revision when starting
  -t, --termination-log string   path to a container termination log file (default "/dev/termination-log")

time="2020-09-24T22:12:40Z" level=panic msg="open /etc/nsswitch.conf: permission denied"
panic: (*logrus.Entry) (0x1138d20,0xc0001e1b20)

goroutine 1 [running]:
github.com/sirupsen/logrus.Entry.log(0xc0000cc000, 0xc000227770, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /build/vendor/github.com/sirupsen/logrus/entry.go:239 +0x2db
github.com/sirupsen/logrus.(*Entry).Log(0xc0001e1ab0, 0xc000000000, 0xc000169f40, 0x1, 0x1)
        /build/vendor/github.com/sirupsen/logrus/entry.go:268 +0xeb
github.com/sirupsen/logrus.(*Logger).Log(0xc0000cc000, 0x0, 0xc000169f40, 0x1, 0x1)
        /build/vendor/github.com/sirupsen/logrus/logger.go:192 +0x7d
github.com/sirupsen/logrus.(*Logger).Panic(...)
        /build/vendor/github.com/sirupsen/logrus/logger.go:233
github.com/sirupsen/logrus.Panic(...)
        /build/vendor/github.com/sirupsen/logrus/exported.go:129
```
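One plausible reading of the `permission denied` above, given that the linked PR 466 is titled "set group permissions on /etc": OpenShift normally runs the container as an arbitrary non-root UID that is a member of GID 0, so the open only succeeds if the group bits on `/etc` and on the file allow it. The sketch below models the standard POSIX owner/group/other check. The modes and the UID are hypothetical (the actual values in the failing image are not recorded in this bug), and `Node`, `_allowed` and `can_open_for_read` are illustrative names only.

```python
from dataclasses import dataclass

READ = 4    # r bit within one permission class
SEARCH = 1  # x bit; on a directory it allows path traversal


@dataclass(frozen=True)
class Node:
    mode: int  # permission bits only, e.g. 0o750
    uid: int   # owning user
    gid: int   # owning group


def _allowed(node, want, uid, gids):
    """POSIX class selection: owner bits, else group bits, else other bits."""
    if uid == 0:
        # Simplification: root bypasses read/search checks.
        return True
    if uid == node.uid:
        bits = (node.mode >> 6) & 7
    elif node.gid in gids:
        bits = (node.mode >> 3) & 7
    else:
        bits = node.mode & 7
    return (bits & want) == want


def can_open_for_read(dirs, file, uid, gids):
    """True if every parent directory is searchable and the file is readable."""
    return all(_allowed(d, SEARCH, uid, gids) for d in dirs) and _allowed(
        file, READ, uid, gids
    )


# Hypothetical values for illustration only.
arbitrary_uid, groups = 1000620000, {0}
conf = Node(mode=0o644, uid=0, gid=0)
etc_without_group_bits = Node(mode=0o700, uid=0, gid=0)
etc_with_group_bits = Node(mode=0o770, uid=0, gid=0)
```

Under these assumed modes, `can_open_for_read([etc_without_group_bits], conf, arbitrary_uid, groups)` is false while the same call with `etc_with_group_bits` is true. That would also fit the run on the KinD cluster in this report succeeding, if the container there runs as the image's own user.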

Comment 2 Jesus M. Rodriguez 2020-09-24 22:47:39 UTC
FIX worked

#### Running new image with /etc/nsswitch.conf added

Using the new image against OCP 4.6.0-rc.7, it worked as expected.

```
[jesusr@transam community-operators{master}]$ /tmp/operator-sdk/operator-sdk run packagemanifests ./jenkins-operator --olm-namespace openshift-operator-lifecycle-manager --operator-version 0.4.1-rc3 --operator-namespace $TEST_NAMESPACE
INFO[0000] Running operator from directory ./jenkins-operator
INFO[0000] Creating jenkins-operator registry
INFO[0000]   Creating ConfigMap "openshift-operator-lifecycle-manager/jenkins-operator-registry-manifests-0-2-2"
INFO[0000]   Creating ConfigMap "openshift-operator-lifecycle-manager/jenkins-operator-registry-manifests-0-3-31"
INFO[0000]   Creating ConfigMap "openshift-operator-lifecycle-manager/jenkins-operator-registry-manifests-0-4-0"
INFO[0000]   Creating ConfigMap "openshift-operator-lifecycle-manager/jenkins-operator-registry-manifests-0-4-1-rc1"
INFO[0000]   Creating ConfigMap "openshift-operator-lifecycle-manager/jenkins-operator-registry-manifests-0-4-1-rc2"
INFO[0001]   Creating ConfigMap "openshift-operator-lifecycle-manager/jenkins-operator-registry-manifests-0-4-1-rc3"
INFO[0001]   Creating ConfigMap "openshift-operator-lifecycle-manager/jenkins-operator-registry-manifests-package"
INFO[0001]   Creating Deployment "openshift-operator-lifecycle-manager/jenkins-operator-registry-server"
INFO[0001]   Creating Service "openshift-operator-lifecycle-manager/jenkins-operator-registry-server"
INFO[0001] Waiting for Deployment "openshift-operator-lifecycle-manager/jenkins-operator-registry-server" rollout to complete
INFO[0001]   Waiting for Deployment "openshift-operator-lifecycle-manager/jenkins-operator-registry-server" to rollout: 0 out of 1 new replicas have been updated
INFO[0002]   Waiting for Deployment "openshift-operator-lifecycle-manager/jenkins-operator-registry-server" to rollout: 0 of 1 updated replicas are available
INFO[0010]   Deployment "openshift-operator-lifecycle-manager/jenkins-operator-registry-server" successfully rolled out
INFO[0010] Creating resources
INFO[0010]   Creating CatalogSource "test-jenkins-operator/jenkins-operator-ocs"
INFO[0010]   Creating Subscription "test-jenkins-operator/jenkins-operator-v0-4-1-rc3-sub"
INFO[0010]   Creating OperatorGroup "test-jenkins-operator/operator-sdk-og"
INFO[0010] Waiting for ClusterServiceVersion "test-jenkins-operator/jenkins-operator.v0.4.1-rc3" to reach 'Succeeded' phase
INFO[0010]   Waiting for ClusterServiceVersion "test-jenkins-operator/jenkins-operator.v0.4.1-rc3" to appear
INFO[0014]   Found ClusterServiceVersion "test-jenkins-operator/jenkins-operator.v0.4.1-rc3" phase: Pending
INFO[0017]   Found ClusterServiceVersion "test-jenkins-operator/jenkins-operator.v0.4.1-rc3" phase: Installing
INFO[0032]   Found ClusterServiceVersion "test-jenkins-operator/jenkins-operator.v0.4.1-rc3" phase: Succeeded
INFO[0032] Successfully installed "jenkins-operator.v0.4.1-rc3" on OLM version "0.16.1"
NAME                           NAMESPACE                KIND                        STATUS
jenkinsimages.jenkins.io       test-jenkins-operator    CustomResourceDefinition    Installed
jenkins.jenkins.io             test-jenkins-operator    CustomResourceDefinition    Installed
jenkins-operator.v0.4.1-rc3    test-jenkins-operator    ClusterServiceVersion       Installed
```

It still works against a KinD cluster:

```
[jesusr@transam community-operators{master}]$ /tmp/operator-sdk/operator-sdk run packagemanifests ./jenkins-operator  --operator-version 0.4.1-rc3 --operator-namespace $TEST_NAMESPACE
INFO[0000] Running operator from directory ./jenkins-operator 
INFO[0000] Creating jenkins-operator registry           
INFO[0000]   Creating ConfigMap "olm/jenkins-operator-registry-manifests-0-3-31" 
INFO[0000]   Creating ConfigMap "olm/jenkins-operator-registry-manifests-0-4-0" 
INFO[0000]   Creating ConfigMap "olm/jenkins-operator-registry-manifests-0-4-1-rc1" 
INFO[0000]   Creating ConfigMap "olm/jenkins-operator-registry-manifests-0-4-1-rc2" 
INFO[0000]   Creating ConfigMap "olm/jenkins-operator-registry-manifests-0-4-1-rc3" 
INFO[0000]   Creating ConfigMap "olm/jenkins-operator-registry-manifests-package" 
INFO[0000]   Creating ConfigMap "olm/jenkins-operator-registry-manifests-0-2-2" 
INFO[0000]   Creating Deployment "olm/jenkins-operator-registry-server" 
INFO[0000]   Creating Service "olm/jenkins-operator-registry-server" 
INFO[0000] Waiting for Deployment "olm/jenkins-operator-registry-server" rollout to complete 
INFO[0000] Waiting for Deployment "olm/jenkins-operator-registry-server" to rollout: waiting for deployment spec update to be observed 
INFO[0001]   Waiting for Deployment "olm/jenkins-operator-registry-server" to rollout: 0 of 1 updated replicas are available 
INFO[0007]   Deployment "olm/jenkins-operator-registry-server" successfully rolled out 
INFO[0007] Creating resources                           
INFO[0007]   Creating CatalogSource "test-jenkins-operator/jenkins-operator-ocs" 
INFO[0007]   Creating Subscription "test-jenkins-operator/jenkins-operator-v0-4-1-rc3-sub" 
INFO[0007]   Creating OperatorGroup "test-jenkins-operator/operator-sdk-og" 
INFO[0007] Waiting for ClusterServiceVersion "test-jenkins-operator/jenkins-operator.v0.4.1-rc3" to reach 'Succeeded' phase 
INFO[0007]   Waiting for ClusterServiceVersion "test-jenkins-operator/jenkins-operator.v0.4.1-rc3" to appear 
INFO[0013]   Found ClusterServiceVersion "test-jenkins-operator/jenkins-operator.v0.4.1-rc3" phase: Pending 
INFO[0015]   Found ClusterServiceVersion "test-jenkins-operator/jenkins-operator.v0.4.1-rc3" phase: Installing 
INFO[0028]   Found ClusterServiceVersion "test-jenkins-operator/jenkins-operator.v0.4.1-rc3" phase: Succeeded 
INFO[0028] Successfully installed "jenkins-operator.v0.4.1-rc3" on OLM version "0.15.1" 
NAME                           NAMESPACE                KIND                        STATUS
jenkinsimages.jenkins.io       test-jenkins-operator    CustomResourceDefinition    Installed
jenkins.jenkins.io             test-jenkins-operator    CustomResourceDefinition    Installed
jenkins-operator.v0.4.1-rc3    test-jenkins-operator    ClusterServiceVersion       Installed
```

Comment 3 Jesus M. Rodriguez 2020-09-30 21:43:42 UTC
Moving this bug to OLM because the problem is in the upstream OLM image builder that the SDK uses. PR 466 addresses the problem: https://github.com/operator-framework/operator-registry/pull/466#issuecomment-701661467

Comment 5 Bruno Andrade 2020-10-03 20:48:32 UTC
Looks good now, thanks. Marking as VERIFIED.
                                                                              
operator-sdk version: "v0.19.4", commit: "125d0dfcc71fef4f9d7e2a42b1354cb79ffdee03", kubernetes version: "v1.18.2", go version: "go1.13.15 linux/amd64"
OCP: 4.6.0-0.nightly-2020-10-03-051134


 operator-sdk run --olm --operator-version 0.0.1 --olm-namespace openshift-operator-lifecycle-manager                                                        
Flag --olm has been deprecated, use 'run packagemanifests' instead
Flag --operator-version has been deprecated, use this flag with 'run packagemanifests' instead
Flag --olm-namespace has been deprecated, use this flag with 'run packagemanifests' instead
INFO[0002] Creating memcached-operator registry         
INFO[0002]   Creating ConfigMap "openshift-operator-lifecycle-manager/memcached-operator-registry-manifests-package" 
INFO[0003]   Creating ConfigMap "openshift-operator-lifecycle-manager/memcached-operator-registry-manifests-0-0-1" 
INFO[0003]   Creating Deployment "openshift-operator-lifecycle-manager/memcached-operator-registry-server" 
INFO[0003]   Creating Service "openshift-operator-lifecycle-manager/memcached-operator-registry-server" 
INFO[0003] Waiting for Deployment "openshift-operator-lifecycle-manager/memcached-operator-registry-server" rollout to complete 
INFO[0004]   Waiting for Deployment "openshift-operator-lifecycle-manager/memcached-operator-registry-server" to rollout: 0 of 1 updated replicas are available 
INFO[0008]   Deployment "openshift-operator-lifecycle-manager/memcached-operator-registry-server" successfully rolled out 
INFO[0008] Creating resources                           
INFO[0008]   Creating CatalogSource "default/memcached-operator-ocs" 
INFO[0008]   Creating Subscription "default/memcached-operator-v0-0-1-sub" 
INFO[0008]   Creating OperatorGroup "default/operator-sdk-og" 
INFO[0008] Waiting for ClusterServiceVersion "default/memcached-operator.v0.0.1" to reach 'Succeeded' phase 
INFO[0008]   Waiting for ClusterServiceVersion "default/memcached-operator.v0.0.1" to appear 
INFO[0013]   Found ClusterServiceVersion "default/memcached-operator.v0.0.1" phase: Pending 
INFO[0016]   Found ClusterServiceVersion "default/memcached-operator.v0.0.1" phase: Installing 
INFO[0023]   Found ClusterServiceVersion "default/memcached-operator.v0.0.1" phase: Succeeded 
INFO[0023] Successfully installed "memcached-operator.v0.0.1" on OLM version "0.16.1" 
NAME                            NAMESPACE    KIND                        STATUS
memcached-operator.v0.0.1       default      ClusterServiceVersion       Installed
memcacheds.cache.example.com    default      CustomResourceDefinition    Installed


 oc get pods -n default
NAME                                  READY   STATUS    RESTARTS   AGE
memcached-operator-6db56fdb94-whshr   1/1     Running   0          10
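
The verification above comes down to checking that the last phase reported for the CSV is Succeeded. Here is a rough sketch of that check against captured operator-sdk output, assuming the `Found ClusterServiceVersion "<namespace>/<name>" phase: <Phase>` line format seen in this report. `csv_phases` and `install_succeeded` are hypothetical names, not operator-sdk APIs.

```python
import re

# Matches the progress lines operator-sdk prints while waiting for a CSV.
_PHASE = re.compile(r'Found ClusterServiceVersion "([^"]+)" phase: (\w+)')


def csv_phases(output):
    """Map each CSV ("namespace/name") to the phases reported, in order."""
    phases = {}
    for name, phase in _PHASE.findall(output):
        phases.setdefault(name, []).append(phase)
    return phases


def install_succeeded(output, csv_name):
    """True if the last phase reported for csv_name is Succeeded."""
    seen = csv_phases(output).get(csv_name, [])
    return bool(seen) and seen[-1] == "Succeeded"
```

Run against the failing output from the description, which never prints a `Found ClusterServiceVersion` line, `install_succeeded` returns false.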

Comment 8 errata-xmlrpc 2020-10-27 16:44:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

