Bug 1666212 - The OLM packageserver crashes
Summary: The OLM packageserver crashes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.1.0
Assignee: Evan Cordell
QA Contact: Jian Zhang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-01-15 08:44 UTC by Jian Zhang
Modified: 2019-06-04 10:42 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-04 10:41:55 UTC
Target Upstream Version:


Links
System                   ID               Priority   Status   Summary   Last Updated
Red Hat Product Errata   RHBA-2019:0758   None       None     None      2019-06-04 10:42:02 UTC

Description Jian Zhang 2019-01-15 08:44:54 UTC
Description of problem:
[core@ip-10-0-2-236 ~]$ oc get packagemanifest --all-namespaces
No resources found.
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get packagemanifests.packages.apps.redhat.com)
[core@ip-10-0-2-236 ~]$ oc get pods
NAME                                READY     STATUS             RESTARTS   AGE
catalog-operator-5fc76d5986-82zdv   1/1       Running            0          5h
olm-operator-79b6d85749-tlxvw       1/1       Running            0          5h
olm-operators-s9kww                 1/1       Running            0          5h
packageserver-985786f6b-znlbc       0/1       CrashLoopBackOff   19         5h

Version-Release number of selected component (if applicable):
cluster version: 4.0.0-0.nightly-2019-01-15-010905

OLM version: 0.8.1
OLM source commit:
"io.openshift.source-repo-commit": "47482491fb29def1a3df05c3178b07de5761708f"

How reproducible:
often

Steps to Reproduce:
1. Install OCP 4.0 with openshift-installer.
2. After the cluster has been running for a while, run `oc get packagemanifest --all-namespaces`.


Actual results:
[core@ip-10-0-2-236 ~]$ oc get packagemanifest --all-namespaces
No resources found.
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get packagemanifests.packages.apps.redhat.com)


Expected results:
The package server runs stably and does not crash.

Additional info:
The package server logs:
[core@ip-10-0-2-236 ~]$ oc logs packageserver-985786f6b-znlbc
...
I0115 07:01:50.483461       1 wrap.go:42] GET /healthz: (203.718µs) 200 [[kube-probe/1.11+] 10.128.0.1:37558]
E0115 07:01:57.126536       1 webhook.go:192] Failed to make webhook authorizer request: subjectaccessreviews.authorization.k8s.io is forbidden: User "system:serviceaccount:openshift-operator-lifecycle-manager:packageserver" cannot create subjectaccessreviews.authorization.k8s.io at the cluster scope: no RBAC policy matched
E0115 07:01:57.126802       1 errors.go:90] subjectaccessreviews.authorization.k8s.io is forbidden: User "system:serviceaccount:openshift-operator-lifecycle-manager:packageserver" cannot create subjectaccessreviews.authorization.k8s.io at the cluster scope: no RBAC policy matched
I0115 07:01:57.126872       1 wrap.go:42] GET /healthz: (217.367529ms) 500
goroutine 142149 [running]:
github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/apiserver/pkg/server/httplog.(*respLogger).recordStatus(0xc420150fc0, 0x1f4)
	/go/src/github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/apiserver/pkg/server/httplog/httplog.go:207 +0xd2
github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/apiserver/pkg/server/httplog.(*respLogger).WriteHeader(0xc420150fc0, 0x1f4)
	/go/src/github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/apiserver/pkg/server/httplog/httplog.go:186 +0x35
github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/apiserver/pkg/server/filters.(*baseTimeoutWriter).WriteHeader(0xc421568f40, 0x1f4)
	/go/src/github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:197 +0xac
github.com/operator-framework/operator-lifecycle-manager/vendor/k8s.io/apiserver/pkg/endpoints/handlers/responsewriters.InternalError(0x7f2246a30970, 0xc42009c648, 0xc420e4c000, 0x7f22469f04e0, 0xc4215a21e0)
...

I deleted that pod. After it was recreated, the new pod failed with the errors below:
NAME                                READY     STATUS             RESTARTS   AGE
catalog-operator-5fc76d5986-82zdv   1/1       Running            0          6h
olm-operator-79b6d85749-tlxvw       1/1       Running            0          6h
olm-operators-s9kww                 1/1       Running            0          6h
packageserver-985786f6b-qsfjt       0/1       CrashLoopBackOff   3          1m

[core@ip-10-0-2-236 ~]$ oc logs -f packageserver-985786f6b-qsfjt  
W0115 08:42:28.542195       1 authentication.go:245] Unable to get configmap/extension-apiserver-authentication in kube-system.  Usually fixed by 'kubectl create rolebinding -n kube-system ROLE_NAME --role=extension-apiserver-authentication-reader --serviceaccount=YOUR_NS:YOUR_SA'
Error: configmaps "extension-apiserver-authentication" is forbidden: User "system:serviceaccount:openshift-operator-lifecycle-manager:packageserver" cannot get configmaps in the namespace "kube-system": no RBAC policy matched
Usage:
   [flags]

Flags:
      --alsologtostderr                                         log to standard error as well as files
      --authentication-kubeconfig string                        kubeconfig file pointing at the 'core' kubernetes server with enough rights to create tokenaccessreviews.authentication.k8s.io.
      --authentication-skip-lookup                              If false, the authentication-kubeconfig will be used to lookup missing authentication configuration from the cluster.
      --authentication-token-webhook-cache-ttl duration         The duration to cache responses from the webhook token authenticator. (default 10s)
      --authorization-kubeconfig string                         kubeconfig file pointing at the 'core' kubernetes server with enough rights to create  subjectaccessreviews.authorization.k8s.io.
      --authorization-webhook-cache-authorized-ttl duration     The duration to cache 'authorized' responses from the webhook authorizer. (default 10s)
      --authorization-webhook-cache-unauthorized-ttl duration   The duration to cache 'unauthorized' responses from the webhook authorizer. (default 10s)
      --bind-address ip                                         The IP address on which to listen for the --secure-port port. The associated interface(s) must be reachable by the rest of the cluster, and by CLI/web clients. If blank, all interfaces will be used (0.0.0.0 for all IPv4 interfaces and :: for all IPv6 interfaces). (default 0.0.0.0)
      --cert-dir string                                         The directory where the TLS certs are located. If --tls-cert-file and --tls-private-key-file are provided, this flag will be ignored. (default "apiserver.local.config/certificates")
      --client-ca-file string                                   If set, any request presenting a client certificate signed by one of the authorities in the client-ca-file is authenticated with an identity corresponding to the CommonName of the client certificate.
      --contention-profiling                                    Enable lock contention profiling, if profiling is enabled
      --debug                                                   use debug log level
      --enable-swagger-ui                                       Enables swagger ui on the apiserver at /swagger-ui
      --global-namespace string                                 Name of the namespace where the global CatalogSources are located
  -h, --help                                                    help for this command
      --http2-max-streams-per-connection int                    The limit that the server gives to clients for the maximum number of streams in an HTTP/2 connection. Zero means to use golang's default.
      --interval duration                                       Interval at which to re-sync CatalogSources (default 5m0s)
      --kubeconfig string                                       The path to the kubeconfig used to connect to the Kubernetes API server and the Kubelets (defaults to in-cluster config)
      --log-flush-frequency duration                            Maximum number of seconds between log flushes (default 5s)
      --log_backtrace_at traceLocation                          when logging hits line file:N, emit a stack trace (default :0)
      --log_dir string                                          If non-empty, write log files in this directory
      --logtostderr                                             log to standard error instead of files (default true)
      --profiling                                               Enable profiling via web interface host:port/debug/pprof/ (default true)
      --requestheader-allowed-names strings                     List of client certificate common names to allow to provide usernames in headers specified by --requestheader-username-headers. If empty, any client certificate validated by the authorities in --requestheader-client-ca-file is allowed.
      --requestheader-client-ca-file string                     Root certificate bundle to use to verify client certificates on incoming requests before trusting usernames in headers specified by --requestheader-username-headers. WARNING: generally do not depend on authorization being already done for incoming requests.
      --requestheader-extra-headers-prefix strings              List of request header prefixes to inspect. X-Remote-Extra- is suggested. (default [x-remote-extra-])
      --requestheader-group-headers strings                     List of request headers to inspect for groups. X-Remote-Group is suggested. (default [x-remote-group])
      --requestheader-username-headers strings                  List of request headers to inspect for usernames. X-Remote-User is common. (default [x-remote-user])
      --secure-port int                                         The port on which to serve HTTPS with authentication and authorization. If 0, don't serve HTTPS at all. (default 443)
      --stderrthreshold severity                                logs at or above this threshold go to stderr (default 2)
      --tls-cert-file string                                    File containing the default x509 Certificate for HTTPS. (CA cert, if any, concatenated after server cert). If HTTPS serving is enabled, and --tls-cert-file and --tls-private-key-file are not provided, a self-signed certificate and key are generated for the public address and saved to the directory specified by --cert-dir.
      --tls-cipher-suites strings                               Comma-separated list of cipher suites for the server. If omitted, the default Go cipher suites will be use.  Possible values: TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_RC4_128_SHA,TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_RC4_128_SHA,TLS_RSA_WITH_3DES_EDE_CBC_SHA,TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_128_CBC_SHA256,TLS_RSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_RC4_128_SHA
      --tls-min-version string                                  Minimum TLS version supported. Possible values: VersionTLS10, VersionTLS11, VersionTLS12
      --tls-private-key-file string                             File containing the default x509 private key matching --tls-cert-file.
      --tls-sni-cert-key namedCertKey                           A pair of x509 certificate and private key file paths, optionally suffixed with a list of domain patterns which are fully qualified domain names, possibly with prefixed wildcard segments. If no domain patterns are provided, the names of the certificate are extracted. Non-wildcard matches trump over wildcard matches, explicit domain patterns trump over extracted names. For multiple key/certificate pairs, use the --tls-sni-cert-key multiple times. Examples: "example.crt,example.key" or "foo.crt,foo.key:*.foo.com,foo.com". (default [])
  -v, --v Level                                                 log level for V logs (default 0)
      --vmodule moduleSpec                                      comma-separated list of pattern=N settings for file-filtered logging
      --watched-namespaces strings                              List of namespaces the package-server will watch watch for CatalogSources

time="2019-01-15T08:42:28Z" level=fatal msg="configmaps \"extension-apiserver-authentication\" is forbidden: User \"system:serviceaccount:openshift-operator-lifecycle-manager:packageserver\" cannot get configmaps in the namespace \"kube-system\": no RBAC policy matched"
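
Both failures above are RBAC denials for the same service account. As a rough aid for triage, the "forbidden" messages can be parsed to list who was denied what, and the `kubectl create rolebinding` template printed in the warning can be filled in from the result. This is only a sketch: the regular expression is written against the wording in these logs, the binding name `packageserver-auth-reader` is a hypothetical placeholder, and the generated command is the workaround suggested by the warning itself, not necessarily the change that resolved this bug (the report does not record it).

```python
import re
from typing import NamedTuple, Optional


class Denial(NamedTuple):
    namespace: str         # namespace of the denied service account
    service_account: str
    verb: str
    resource: str
    scope: str             # target namespace, or "cluster"


# Written against the "forbidden" wording in the logs above; the optional
# backslashes cover the escaped quotes in the level=fatal line.
_PATTERN = re.compile(
    r'User \\?"system:serviceaccount:(?P<ns>[^:"\\]+):(?P<sa>[^:"\\]+)\\?" '
    r'cannot (?P<verb>\w+) (?P<resource>[\w.]+) '
    r'(?:in the namespace \\?"(?P<target>[^"\\]+)\\?"|at the cluster scope)'
)


def parse_denial(line: str) -> Optional[Denial]:
    """Return the denial described by a log line, or None if there is none."""
    m = _PATTERN.search(line)
    if m is None:
        return None
    return Denial(m["ns"], m["sa"], m["verb"], m["resource"],
                  m["target"] or "cluster")


def reader_binding_command(d: Denial,
                           binding_name: str = "packageserver-auth-reader") -> str:
    """Fill in the template from the packageserver warning.

    binding_name is a hypothetical placeholder for ROLE_NAME.
    """
    return (
        f"kubectl create rolebinding -n kube-system {binding_name} "
        "--role=extension-apiserver-authentication-reader "
        f"--serviceaccount={d.namespace}:{d.service_account}"
    )
```

Feeding the `Error: configmaps ... is forbidden` line to `parse_denial` and passing the result to `reader_binding_command` yields the command from the warning with `YOUR_NS:YOUR_SA` replaced by `openshift-operator-lifecycle-manager:packageserver`. The cluster-scope `subjectaccessreviews` denial is parsed as well, but a namespaced RoleBinding would not cover it.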

Comment 1 Nick Hale 2019-02-15 15:22:39 UTC
I'm not seeing this in fresh 4.0 instances. Can we confirm that this is still an issue?

Comment 2 Jian Zhang 2019-02-19 08:43:14 UTC
Nick,

It works well now, with no crashes. Could you change the status to "ON_QA" so that I can verify it?
[jzhang@dhcp-140-18 payload]$ oc get pods
NAME                                READY     STATUS    RESTARTS   AGE
catalog-operator-7fc5d98dbd-vf5rc   1/1       Running   0          6h18m
olm-operator-75558c6d7-s7mrt        1/1       Running   0          31m
olm-operators-ldt4l                 1/1       Running   0          6h18m
packageserver-54d858d7c6-jwkmf      1/1       Running   0          15m
packageserver-54d858d7c6-kx478      1/1       Running   0          15m

[jzhang@dhcp-140-18 payload]$ oc get clusterversion
NAME      VERSION                        AVAILABLE   PROGRESSING   SINCE     STATUS
version   4.0.0-0.ci-2019-02-18-160525   True        False         6h7m      Cluster version is 4.0.0-0.ci-2019-02-18-160525
OLM build commit id: cce4af21efb662527a8f71d22f7f2c37007ea4bf
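
A check like the one above can also be scripted. The following is a minimal sketch, assuming the whitespace-separated column layout shown in the `oc get pods` output in this report (NAME, READY, STATUS, RESTARTS, AGE); the function names are made up for illustration.

```python
from typing import List, NamedTuple


class PodRow(NamedTuple):
    name: str
    ready: int      # containers ready
    total: int      # containers expected
    status: str
    restarts: int


def parse_pods(table: str) -> List[PodRow]:
    """Parse the text printed by `oc get pods` into rows."""
    rows = []
    for line in table.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or truncated lines
        ready, total = fields[1].split("/")
        rows.append(PodRow(fields[0], int(ready), int(total),
                           fields[2], int(fields[3])))
    return rows


def unhealthy(rows: List[PodRow]) -> List[PodRow]:
    """Pods that are not Running or have containers that are not ready."""
    return [r for r in rows if r.status != "Running" or r.ready < r.total]
```

Run against the output in the description, `unhealthy` would return only the packageserver pod in CrashLoopBackOff; run against the output in this comment, it would return an empty list.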

Comment 5 errata-xmlrpc 2019-06-04 10:41:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

