Bug 1575606 - installation failed on: Verify that the catalog api server is running
Summary: installation failed on: Verify that the catalog api server is running
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Service Catalog
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.10.0
Assignee: Jay Boyd
QA Contact: Hongkai Liu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-05-07 12:19 UTC by Hongkai Liu
Modified: 2018-07-30 19:14 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text: fixed in current release
Clone Of:
Environment:
Last Closed: 2018-07-30 19:14:38 UTC
Target Upstream Version:



Links
System                   ID               Priority   Status   Summary   Last Updated
Red Hat Product Errata   RHBA-2018:1816   None       None     None      2018-07-30 19:14:59 UTC

Description Hongkai Liu 2018-05-07 12:19:15 UTC
Description of problem:

Version-Release number of the following components:
$ ansible --version
ansible 2.4.3.0

$ rpm -q ansible
ansible-2.4.3.0-1.fc27.noarch

$ git log --oneline -1
17155a2eb (HEAD -> master, origin/master, origin/HEAD) Merge pull request #8239 from vrutkovs/containerized-3.9-to-3.10

How reproducible: Always

Steps to Reproduce:
$ ansible-playbook -i /tmp/2.file openshift-ansible/playbooks/deploy_cluster.yml -vvv

Actual results:

# oc get pod
NAME                       READY     STATUS             RESTARTS   AGE
apiserver-lzfdf            0/1       CrashLoopBackOff   8          20m
controller-manager-thkkj   0/1       CrashLoopBackOff   16         1h

# oc describe pod apiserver-lzfdf
Name:           apiserver-lzfdf
Namespace:      kube-service-catalog
Node:           ip-172-31-12-183.us-west-2.compute.internal/172.31.12.183
Start Time:     Mon, 07 May 2018 11:57:46 +0000
Labels:         app=apiserver
                controller-revision-hash=2231596181
                pod-template-generation=3
Annotations:    ca_hash=329167a02b24e855ae23c185434575b7d2884b01
                openshift.io/scc=hostmount-anyuid
Status:         Running
IP:             172.20.0.6
Controlled By:  DaemonSet/apiserver
Containers:
  apiserver:
    Container ID:  cri-o://284dce31652fb2fb4e447f277c8315386a618e250b5b6a6433297e376943eb6a
    Image:         registry.reg-aws.openshift.com:443/openshift3/ose-service-catalog:v3.10
    Image ID:      registry.reg-aws.openshift.com:443/openshift3/ose-service-catalog@sha256:29d17e4d123de20b9e73738bf17ceea1718212276176b1f027049813e745477d
    Port:          6443/TCP
    Host Port:     0/TCP
    Command:
      /usr/bin/service-catalog
    Args:
      apiserver
      --storage-type
      etcd
      --secure-port
      6443
      --etcd-servers
      https://ip-172-31-12-183.us-west-2.compute.internal:2379
      --etcd-cafile
      /etc/origin/master/master.etcd-ca.crt
      --etcd-certfile
      /etc/origin/master/master.etcd-client.crt
      --etcd-keyfile
      /etc/origin/master/master.etcd-client.key
      -v
      3
      --cors-allowed-origins
      localhost
      --enable-admission-plugins
      KubernetesNamespaceLifecycle,DefaultServicePlan,ServiceBindingsLifecycle,ServicePlanChangeValidator,BrokerAuthSarCheck
      --feature-gates
      OriginatingIdentity=true
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 07 May 2018 12:13:42 +0000
      Finished:     Mon, 07 May 2018 12:13:42 +0000
    Ready:          False
    Restart Count:  8
    Environment:    <none>
    Mounts:
      /etc/origin/master from etcd-host-cert (ro)
      /var/run/kubernetes-service-catalog from apiserver-ssl (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from service-catalog-apiserver-token-9qkck (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          False 
  PodScheduled   True 
Volumes:
  apiserver-ssl:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  apiserver-ssl
    Optional:    false
  etcd-host-cert:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/origin/master
    HostPathType:  
  data-dir:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:  
  service-catalog-apiserver-token-9qkck:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  service-catalog-apiserver-token-9qkck
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/master=true
Tolerations:     node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/unreachable:NoExecute
Events:
  Type     Reason                 Age                 From                                                  Message
  ----     ------                 ----                ----                                                  -------
  Normal   SuccessfulMountVolume  20m                 kubelet, ip-172-31-12-183.us-west-2.compute.internal  MountVolume.SetUp succeeded for volume "data-dir"
  Normal   SuccessfulMountVolume  20m                 kubelet, ip-172-31-12-183.us-west-2.compute.internal  MountVolume.SetUp succeeded for volume "etcd-host-cert"
  Normal   SuccessfulMountVolume  20m                 kubelet, ip-172-31-12-183.us-west-2.compute.internal  MountVolume.SetUp succeeded for volume "service-catalog-apiserver-token-9qkck"
  Normal   SuccessfulMountVolume  20m                 kubelet, ip-172-31-12-183.us-west-2.compute.internal  MountVolume.SetUp succeeded for volume "apiserver-ssl"
  Normal   Started                19m (x4 over 20m)   kubelet, ip-172-31-12-183.us-west-2.compute.internal  Started container
  Normal   Pulled                 19m (x5 over 20m)   kubelet, ip-172-31-12-183.us-west-2.compute.internal  Container image "registry.reg-aws.openshift.com:443/openshift3/ose-service-catalog:v3.10" already present on machine
  Normal   Created                19m (x5 over 20m)   kubelet, ip-172-31-12-183.us-west-2.compute.internal  Created container
  Warning  BackOff                43s (x90 over 20m)  kubelet, ip-172-31-12-183.us-west-2.compute.internal  Back-off restarting failed container



Expected results:

Additional info:

Comment 3 Scott Dodson 2018-05-07 12:30:19 UTC
Can you please gather logs from the service catalog pods that are crash looping?
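
A minimal sketch of how the crash-looping pods could be picked out of `oc get pod` output before fetching their logs. The function name `crash_looping` is hypothetical, and the table text is copied from the description above; the script only parses text and does not call `oc` itself.

```python
def crash_looping(table):
    """Return the names of pods whose STATUS column reads CrashLoopBackOff."""
    lines = [ln for ln in table.splitlines() if ln.strip()]
    header = lines[0].split()
    # Locate columns by header name rather than by a fixed position.
    name_i, status_i = header.index("NAME"), header.index("STATUS")
    names = []
    for row in lines[1:]:
        cols = row.split()
        if cols[status_i] == "CrashLoopBackOff":
            names.append(cols[name_i])
    return names


# Output of `oc get pod` as pasted in the description of this bug.
OC_GET_POD = """\
NAME                       READY     STATUS             RESTARTS   AGE
apiserver-lzfdf            0/1       CrashLoopBackOff   8          20m
controller-manager-thkkj   0/1       CrashLoopBackOff   16         1h
"""

for pod in crash_looping(OC_GET_POD):
    # Print the command to run by hand for each failing pod.
    print(f"oc logs {pod}")
```

Each printed line is an `oc logs` command of the kind used later in this thread.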

Comment 4 Hongkai Liu 2018-05-07 12:55:51 UTC
This is from another try:

# oc project
Using project "kube-service-catalog" on server "https://ip-172-31-12-183.us-west-2.compute.internal:8443".

# oc get pod
NAME                       READY     STATUS             RESTARTS   AGE
apiserver-bn2c7            0/1       CrashLoopBackOff   16         25m
controller-manager-thkkj   1/1       Running            26         1h


# oc logs apiserver-bn2c7
Error: unknown flag: --enable-admission-plugins

The main API entrypoint and interface to the storage system. The API server is
also the focal point for all authorization decisions.

Usage:
  apiserver [flags]

Available Flags:
      --admission-control stringSlice                           Admission is divided into two phases. In the first phase, only mutating admission plugins run. In the second phase, only validating admission plugins run. The names in the below list may represent a validating plugin, a mutating plugin, or both. Within each phase, the plugins will run in the order in which they are passed to this flag. Comma-delimited list of: BrokerAuthSarCheck, DefaultServicePlan, Initializers, KubernetesNamespaceLifecycle, MutatingAdmissionWebhook, NamespaceLifecycle, ServiceBindingsLifecycle, ServicePlanChangeValidator, ValidatingAdmissionWebhook.
      --admission-control-config-file string                    File with admission control configuration.
      --advertise-address ip                                    The IP address on which to advertise the apiserver to members of the cluster. This address must be reachable by the rest of the cluster. If blank, the --bind-address will be used. If --bind-address is unspecified, the host's default interface will be used.
      --alsologtostderr                                         log to standard error as well as files
      --audit-log-format string                                 Format of saved audits. "legacy" indicates 1-line text format for each event. "json" indicates structured json format. Requires the 'AdvancedAuditing' feature gate. Known formats are legacy,json. (default "json")
      --audit-log-maxage int                                    The maximum number of days to retain old audit log files based on the timestamp encoded in their filename.
      --audit-log-maxbackup int                                 The maximum number of old audit log files to retain.
      --audit-log-maxsize int                                   The maximum size in megabytes of the audit log file before it gets rotated.
      --audit-log-path string                                   If set, all requests coming to the apiserver will be logged to this file.  '-' means standard out.
      --audit-policy-file string                                Path to the file that defines the audit policy configuration. Requires the 'AdvancedAuditing' feature gate. With AdvancedAuditing, a profile is required to enable auditing.
      --audit-webhook-batch-buffer-size int                     The size of the buffer to store events before batching and sending to the webhook. Only used in batch mode. (default 10000)
      --audit-webhook-batch-initial-backoff duration            The amount of time to wait before retrying the first failed requests. Only used in batch mode. (default 10s)
      --audit-webhook-batch-max-size int                        The maximum size of a batch sent to the webhook. Only used in batch mode. (default 400)
      --audit-webhook-batch-max-wait duration                   The amount of time to wait before force sending the batch that hadn't reached the max size. Only used in batch mode. (default 30s)
      --audit-webhook-batch-throttle-burst int                  Maximum number of requests sent at the same moment if ThrottleQPS was not utilized before. Only used in batch mode. (default 15)
      --audit-webhook-batch-throttle-qps float32                Maximum average number of requests per second. Only used in batch mode. (default 10)
      --audit-webhook-config-file string                        Path to a kubeconfig formatted file that defines the audit webhook configuration. Requires the 'AdvancedAuditing' feature gate.
      --audit-webhook-mode string                               Strategy for sending audit events. Blocking indicates sending events should block server responses. Batch causes the webhook to buffer and send events asynchronously. Known modes are batch,blocking. (default "batch")
      --authentication-kubeconfig string                        kubeconfig file pointing at the 'core' kubernetes server with enough rights to create tokenaccessreviews.authentication.k8s.io.
      --authentication-skip-lookup                              If false, the authentication-kubeconfig will be used to lookup missing authentication configuration from the cluster.
      --authentication-token-webhook-cache-ttl duration         The duration to cache responses from the webhook token authenticator. (default 10s)
      --authorization-kubeconfig string                         kubeconfig file pointing at the 'core' kubernetes server with enough rights to create  subjectaccessreviews.authorization.k8s.io.
      --authorization-webhook-cache-authorized-ttl duration     The duration to cache 'authorized' responses from the webhook authorizer. (default 10s)
      --authorization-webhook-cache-unauthorized-ttl duration   The duration to cache 'unauthorized' responses from the webhook authorizer. (default 10s)
      --bind-address ip                                         The IP address on which to listen for the --secure-port port. The associated interface(s) must be reachable by the rest of the cluster, and by CLI/web clients. If blank, all interfaces will be used (0.0.0.0). (default 0.0.0.0)
      --cert-dir string                                         The directory where the TLS certs are located. If --tls-cert-file and --tls-private-key-file are provided, this flag will be ignored. (default "/var/run/kubernetes-service-catalog")
      --client-ca-file string                                   If set, any request presenting a client certificate signed by one of the authorities in the client-ca-file is authenticated with an identity corresponding to the CommonName of the client certificate.
      --cors-allowed-origins stringSlice                        List of allowed origins for CORS, comma separated.  An allowed origin can be a regular expression to support subdomain matching. If this list is empty CORS will not be enabled.
      --default-watch-cache-size int                            Default watch cache size. If zero, watch cache will be disabled for resources that do not have a default watch size set. (default 100)
      --delete-collection-workers int                           Number of workers spawned for DeleteCollection call. These are used to speed up namespace cleanup. (default 1)
      --deserialization-cache-size int                          Number of deserialized json objects to cache in memory.
      --disable-auth                                            Disable authentication and authorization for testing purposes
      --enable-garbage-collector                                Enables the generic garbage collector. MUST be synced with the corresponding flag of the kube-controller-manager. (default true)
      --etcd-cafile string                                      SSL Certificate Authority file used to secure etcd communication.
      --etcd-certfile string                                    SSL certification file used to secure etcd communication.
      --etcd-compaction-interval duration                       The interval of compaction requests. If 0, the compaction request from apiserver is disabled. (default 5m0s)
      --etcd-keyfile string                                     SSL key file used to secure etcd communication.
      --etcd-prefix string                                      The prefix to prepend to all resource paths in etcd. (default "/registry")
      --etcd-servers stringSlice                                List of etcd servers to connect with (scheme://ip:port), comma separated.
      --etcd-servers-overrides stringSlice                      Per-resource etcd servers overrides, comma separated. The individual override format: group/resource#servers, where servers are http://ip:port, semicolon separated.
      --experimental-encryption-provider-config string          The file containing configuration for encryption providers to be used for storing secrets in etcd
      --external-hostname string                                The hostname to use when generating externalized URLs for this master (e.g. Swagger API Docs).
      --feature-gates mapStringBool                             A set of key=value pairs that describe feature gates for alpha/experimental features. Options are:
APIListChunking=true|false (BETA - default=true)
APIResponseCompression=true|false (ALPHA - default=false)
AdvancedAuditing=true|false (BETA - default=true)
AllAlpha=true|false (ALPHA - default=false)
AsyncBindingOperations=true|false (ALPHA - default=false)
Initializers=true|false (ALPHA - default=false)
NamespacedServiceBroker=true|false (ALPHA - default=false)
OriginatingIdentity=true|false (ALPHA - default=false)
PodPreset=true|false (ALPHA - default=false)
ResponseSchema=true|false (ALPHA - default=false)
StreamingProxyRedirects=true|false (BETA - default=true)
  -h, --help                                                    help for service-catalog
      --log-backtrace-at traceLocation                          when logging hits line file:N, emit a stack trace (default :0)
      --log-dir string                                          If non-empty, write log files in this directory
      --log-flush-frequency duration                            Maximum number of seconds between log flushes (default 5s)
      --logtostderr                                             log to standard error instead of files (default true)
      --master-service-namespace string                         DEPRECATED: the namespace from which the kubernetes master services should be injected into pods. (default "default")
      --max-mutating-requests-inflight int                      The maximum number of mutating requests in flight at a given time. When the server exceeds this, it rejects requests. Zero for no limit. (default 200)
      --max-requests-inflight int                               The maximum number of non-mutating requests in flight at a given time. When the server exceeds this, it rejects requests. Zero for no limit. (default 400)
      --min-request-timeout int                                 An optional field indicating the minimum number of seconds a handler must keep a request open before timing it out. Currently only honored by the watch request handler, which picks a randomized value above this number as the connection timeout, to spread out load. (default 1800)
      --request-timeout duration                                An optional field indicating the duration a handler must keep a request open before timing it out. This is the default request timeout for requests but may be overridden by flags such as --min-request-timeout for specific types of requests. (default 1m0s)
      --requestheader-allowed-names stringSlice                 List of client certificate common names to allow to provide usernames in headers specified by --requestheader-username-headers. If empty, any client certificate validated by the authorities in --requestheader-client-ca-file is allowed.
      --requestheader-client-ca-file string                     Root certificate bundle to use to verify client certificates on incoming requests before trusting usernames in headers specified by --requestheader-username-headers
      --requestheader-extra-headers-prefix stringSlice          List of request header prefixes to inspect. X-Remote-Extra- is suggested. (default [x-remote-extra-])
      --requestheader-group-headers stringSlice                 List of request headers to inspect for groups. X-Remote-Group is suggested. (default [x-remote-group])
      --requestheader-username-headers stringSlice              List of request headers to inspect for usernames. X-Remote-User is common. (default [x-remote-user])
      --secure-port int                                         The port on which to serve HTTPS with authentication and authorization. If 0, don't serve HTTPS at all. (default 443)
      --serve-openapi-spec                                      Whether this API server should serve the OpenAPI spec (problematic with older versions of kubectl)
      --stderrthreshold severity                                logs at or above this threshold go to stderr (default 2)
      --storage-backend string                                  The storage backend for persistence. Options: 'etcd3' (default), 'etcd2'.
      --storage-media-type string                               The media type to use to store objects in storage. Some resources or storage backends may only support a specific media type and will ignore this setting. (default "application/json")
      --storage-type string                                     The type of backing storage this API server should use (default "etcd")
      --target-ram-mb int                                       Memory limit for apiserver in MB (used to configure sizes of caches, etc.)
      --tls-ca-file string                                      If set, this certificate authority will used for secure access from Admission Controllers. This must be a valid PEM-encoded CA bundle. Altneratively, the certificate authority can be appended to the certificate provided by --tls-cert-file.
      --tls-cert-file string                                    File containing the default x509 Certificate for HTTPS. (CA cert, if any, concatenated after server cert). If HTTPS serving is enabled, and --tls-cert-file and --tls-private-key-file are not provided, a self-signed certificate and key are generated for the public address and saved to the directory specified by --cert-dir.
      --tls-private-key-file string                             File containing the default x509 private key matching --tls-cert-file.
      --tls-sni-cert-key namedCertKey                           A pair of x509 certificate and private key file paths, optionally suffixed with a list of domain patterns which are fully qualified domain names, possibly with prefixed wildcard segments. If no domain patterns are provided, the names of the certificate are extracted. Non-wildcard matches trump over wildcard matches, explicit domain patterns trump over extracted names. For multiple key/certificate pairs, use the --tls-sni-cert-key multiple times. Examples: "example.crt,example.key" or "foo.crt,foo.key:*.foo.com,foo.com". (default [])
  -v, --v Level                                                 log level for V logs (default 0)
      --version                                                 Print version information and quit
      --vmodule moduleSpec                                      comma-separated list of pattern=N settings for file-filtered logging
      --watch-cache                                             Enable watch caching in the apiserver (default true)
      --watch-cache-sizes stringSlice                           List of watch cache sizes for every resource (pods, nodes, etc.), comma separated. The individual override format: resource#size, where size is a number. It takes effect when watch-cache is enabled.

Comment 5 Jay Boyd 2018-05-07 13:31:40 UTC
There was some catalog build instability last week as we worked to upgrade to the latest upstream Service Catalog.  Unfortunately, you have a build that uses a newer image of the upstream catalog but is missing the necessary Ansible installer changes.  git log shows this build is one commit too early: it is missing commit 642807537, which merges https://github.com/openshift/openshift-ansible/pull/8205

A newer build should work properly.  Sorry for the confusion.
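
The mismatch between the arguments the pod was started with and the flags the binary accepts can be checked mechanically. The following is a rough sketch, not part of the installer: the helper names `supported_flags` and `unsupported_args` are hypothetical, and the help text and argument list are abbreviated excerpts from the `oc logs` and `oc describe pod` output in this report.

```python
import re


def supported_flags(help_text):
    """Collect every long flag (e.g. "--secure-port") that starts a help line."""
    pattern = r"^\s*(?:-\w, )?(--[a-z0-9-]+)"
    return set(re.findall(pattern, help_text, flags=re.MULTILINE))


def unsupported_args(args, help_text):
    """Return the long flags in args that the help text does not list."""
    known = supported_flags(help_text)
    return [a for a in args if a.startswith("--") and a not in known]


# Abbreviated excerpt of the usage text printed by the crashing apiserver.
HELP_EXCERPT = """\
      --admission-control stringSlice      Admission is divided into two phases.
      --cors-allowed-origins stringSlice   List of allowed origins for CORS, comma separated.
      --feature-gates mapStringBool        A set of key=value pairs that describe feature gates
      --secure-port int                    The port on which to serve HTTPS
  -v, --v Level                            log level for V logs (default 0)
"""

# Abbreviated excerpt of the container Args from `oc describe pod`.
POD_ARGS = [
    "--secure-port", "6443",
    "-v", "3",
    "--cors-allowed-origins", "localhost",
    "--enable-admission-plugins", "KubernetesNamespaceLifecycle",
    "--feature-gates", "OriginatingIdentity=true",
]

print(unsupported_args(POD_ARGS, HELP_EXCERPT))
```

With these excerpts the only flag reported is `--enable-admission-plugins`, matching the "unknown flag" error in the pod log.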

Comment 6 Hongkai Liu 2018-05-07 14:02:16 UTC
@Jay,

which build can we expect to have the fix?

Currently, I am using
3.10.0-0.32.0.git.0.2b17fd0.el7

Comment 7 Jay Boyd 2018-05-07 14:15:53 UTC
From the tags in openshift-ansible, it looks like 3.10.0-0.34.0 or newer.
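
As a sketch, a build string can be compared against that threshold as follows. This assumes the fix first appears in 3.10.0-0.34.0, which is only the estimate given above, and it compares just the leading numeric fields rather than applying full RPM version ordering. The names `build_tuple` and `has_fix` are hypothetical.

```python
import re


def build_tuple(version):
    """Turn "3.10.0-0.32.0.git.0.2b17fd0.el7" into (3, 10, 0, 0, 32, 0)."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)\.(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError(f"unrecognized build string: {version!r}")
    return tuple(int(part) for part in m.groups())


def has_fix(version, first_fixed="3.10.0-0.34.0"):
    """True if version is at or past the assumed first fixed build."""
    return build_tuple(version) >= build_tuple(first_fixed)


# The build that failed in this report.
print(has_fix("3.10.0-0.32.0.git.0.2b17fd0.el7"))
```

For the failing 0.32.0 build this prints False.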

Comment 8 Hongkai Liu 2018-05-07 14:37:04 UTC
Thanks, Jay.
I will verify with 34+ build.

Comment 9 Hongkai Liu 2018-05-09 20:56:27 UTC
Verified with
root@ip-172-31-13-136: ~ # yum list installed | grep openshift
atomic-openshift.x86_64       3.10.0-0.37.0.git.0.a452028.el7


# oc get pod --all-namespaces | grep -E "catalog"
kube-service-catalog                apiserver-8cptt                                                  1/1       Running   0          18m
kube-service-catalog                controller-manager-dtpcx                                         1/1       Running   0          17m
root@ip-172-31-13-136: ~ # oc get pod --all-namespaces | grep -E "catalog|broker"
kube-service-catalog                apiserver-8cptt                                                  1/1       Running   0          18m
kube-service-catalog                controller-manager-dtpcx                                         1/1       Running   0          18m
openshift-ansible-service-broker    asb-1-zhqt2                                                      1/1       Running   0          17m
openshift-template-service-broker   apiserver-vlk5x                                                  1/1       Running   0          17m

Comment 11 errata-xmlrpc 2018-07-30 19:14:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816

