Bug 1579227

Summary: [upgrade]service-catalog upgrade to 3.10 failed
Product: OpenShift Container Platform
Component: Service Catalog
Version: 3.10.0
Status: CLOSED DUPLICATE
Severity: medium
Priority: high
Target Release: 3.10.0
Hardware: Unspecified
OS: Unspecified
Reporter: Zihan Tang <zitang>
Assignee: Jay Boyd <jaboyd>
QA Contact: Zihan Tang <zitang>
CC: chezhang, jaboyd, jiazha, wmeng, zhsun, zitang
Last Closed: 2018-05-22 14:03:41 UTC
Type: Bug

Comment 1 Zihan Tang 2018-05-17 08:36:23 UTC
I'll share the upgrade log after the bug is assigned.

Comment 3 Jay Boyd 2018-05-17 21:19:14 UTC
Is this deployment still available? I'd like to get the output of oc describe pod XXX.

Or, can I ssh into the deployment to look around?

Comment 6 Zihan Tang 2018-05-21 09:26:13 UTC
In another v3.9 environment, service-catalog upgraded to 3.10 successfully, so this failure may be specific to this environment.

Bug 1579261 covers an ASB upgrade failure in the same environment.

Comment 7 Jay Boyd 2018-05-21 14:37:38 UTC
The daemonset was updated to list the image as v3.10: registry.reg-aws.openshift.com:443/openshift3/ose-service-catalog:v3.10
but the pod is still running a v3.9 image.

[root@ip-172-18-12-195 ~]# oc describe daemonset controller-manager -n kube-service-catalog
Name:           controller-manager
Selector:       app=controller-manager
Node-Selector:  node-role.kubernetes.io/master=true
Labels:         app=controller-manager
Annotations:    <none>
Desired Number of Nodes Scheduled: 1
Current Number of Nodes Scheduled: 1
Number of Nodes Scheduled with Up-to-date Pods: 1
Number of Nodes Scheduled with Available Pods: 1
Number of Nodes Misscheduled: 0
Pods Status:  1 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           app=controller-manager
  Service Account:  service-catalog-controller
  Containers:
   controller-manager:
    Image:  registry.reg-aws.openshift.com:443/openshift3/ose-service-catalog:v3.10
    Port:   6443/TCP
    Command:
      /usr/bin/service-catalog
    Args:
      controller-manager
      --secure-port
      6443
      -v
      3
      --leader-election-namespace
      kube-service-catalog
      --leader-elect-resource-lock
      configmaps
      --broker-relist-interval
      5m
      --feature-gates
      OriginatingIdentity=true
      --feature-gates
      AsyncBindingOperations=true
    Environment:
      K8S_NAMESPACE:   (v1:metadata.namespace)
    Mounts:
      /var/run/kubernetes-service-catalog from service-catalog-ssl (ro)
  Volumes:
   service-catalog-ssl:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  controllermanager-ssl
    Optional:    false
Events:          <none>



$ oc get daemonsets controller-manager -n kube-service-catalog -o yaml
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  creationTimestamp: 2018-05-21T07:12:06Z
  generation: 2
  labels:
    app: controller-manager
  name: controller-manager
  namespace: kube-service-catalog
  resourceVersion: "9789"
  selfLink: /apis/extensions/v1beta1/namespaces/kube-service-catalog/daemonsets/controller-manager
  uid: 44e758d3-5cc6-11e8-9299-0e9a287ac26e
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: controller-manager
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: controller-manager
    spec:
      containers:
      - args:
        - controller-manager
        - --secure-port
        - "6443"
        - -v
        - "3"
        - --leader-election-namespace
        - kube-service-catalog
        - --leader-elect-resource-lock
        - configmaps
        - --broker-relist-interval
        - 5m
        - --feature-gates
        - OriginatingIdentity=true
        - --feature-gates
        - AsyncBindingOperations=true
        command:
        - /usr/bin/service-catalog
        env:
        - name: K8S_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        image: registry.reg-aws.openshift.com:443/openshift3/ose-service-catalog:v3.10
        imagePullPolicy: IfNotPresent
        name: controller-manager
        ports:
        - containerPort: 6443
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/run/kubernetes-service-catalog
          name: service-catalog-ssl
          readOnly: true
      dnsPolicy: ClusterFirst
      nodeSelector:
        node-role.kubernetes.io/master: "true"
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: service-catalog-controller
      serviceAccountName: service-catalog-controller
      terminationGracePeriodSeconds: 30
      volumes:
      - name: service-catalog-ssl
        secret:
          defaultMode: 420
          items:
          - key: tls.crt
            path: apiserver.crt
          - key: tls.key
            path: apiserver.key
          secretName: controllermanager-ssl
  templateGeneration: 2
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
status:
  currentNumberScheduled: 1
  desiredNumberScheduled: 1
  numberAvailable: 1
  numberMisscheduled: 0
  numberReady: 1
  observedGeneration: 1
  updatedNumberScheduled: 1
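
One detail in the YAML above is worth calling out: metadata.generation is 2 while status.observedGeneration is 1, which suggests the DaemonSet controller has not yet processed the updated spec. A minimal Python sketch of that comparison follows; rollout_observed is a hypothetical helper name, and the dict mirrors only the two relevant fields rather than the full object.

```python
def rollout_observed(daemonset):
    """Return True if the controller has processed the latest spec.

    `daemonset` is a dict shaped like the parsed output of
    `oc get daemonset ... -o yaml` (only two fields are read).
    """
    generation = daemonset["metadata"]["generation"]
    # A missing status is treated as "nothing observed yet".
    observed = daemonset.get("status", {}).get("observedGeneration", 0)
    return observed >= generation


# Values copied from the DaemonSet YAML above.
ds = {"metadata": {"generation": 2}, "status": {"observedGeneration": 1}}
print(rollout_observed(ds))
```

If this returns False for a long time, the status counters (for example updatedNumberScheduled) may still describe the previous generation and should be read with caution.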





The pod is still showing the old version, though:  image: registry.reg-aws.openshift.com:443/openshift3/ose-service-catalog:v3.9.29


$ oc describe pod -n kube-service-catalog controller-manager-6fj6c
Name:           controller-manager-6fj6c
Namespace:      kube-service-catalog
Node:           ip-172-18-12-195.ec2.internal/172.18.12.195
Start Time:     Mon, 21 May 2018 03:12:06 -0400
Labels:         app=controller-manager
                controller-revision-hash=2156777767
                pod-template-generation=1
Annotations:    openshift.io/scc=restricted
Status:         Running
IP:             10.2.0.11
Controlled By:  DaemonSet/controller-manager
Containers:
  controller-manager:
    Container ID:  docker://7444d9002f61620a2f27cac3621e752a400a40d14a47451b1251e59931e28d6e
    Image:         registry.reg-aws.openshift.com:443/openshift3/ose-service-catalog:v3.9.29
    Image ID:      docker-pullable://registry.reg-aws.openshift.com:443/openshift3/ose-service-catalog@sha256:0b5ad372b0e0e63d94d087311aaabb9af21a090b6487d9e820a4de9df9c85b83
    Port:          8080/TCP
    Command:
      /usr/bin/service-catalog
    Args:
      controller-manager
      --port
      8080
      -v
      3
      --leader-election-namespace
      kube-service-catalog
      --leader-elect-resource-lock
      configmaps
      --broker-relist-interval
      5m
      --feature-gates
      OriginatingIdentity=true
    State:          Running
      Started:      Mon, 21 May 2018 03:55:31 -0400
    Last State:     Terminated
      Reason:       Error
      Exit Code:    255
      Started:      Mon, 21 May 2018 03:53:47 -0400
      Finished:     Mon, 21 May 2018 03:54:37 -0400
    Ready:          True
    Restart Count:  5
    Environment:
      K8S_NAMESPACE:  kube-service-catalog (v1:metadata.namespace)
    Mounts:
      /var/run/kubernetes-service-catalog from service-catalog-ssl (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from service-catalog-controller-token-bwkp6 (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          True 
  PodScheduled   True 
Volumes:
  service-catalog-ssl:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  apiserver-ssl
    Optional:    false
  service-catalog-controller-token-bwkp6:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  service-catalog-controller-token-bwkp6
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/master=true
Tolerations:     node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/unreachable:NoExecute
Events:          <none>





$ oc get pod -n kube-service-catalog controller-manager-6fj6c -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    openshift.io/scc: restricted
  creationTimestamp: 2018-05-21T07:12:06Z
  generateName: controller-manager-
  labels:
    app: controller-manager
    controller-revision-hash: "2156777767"
    pod-template-generation: "1"
  name: controller-manager-6fj6c
  namespace: kube-service-catalog
  ownerReferences:
  - apiVersion: extensions/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: DaemonSet
    name: controller-manager
    uid: 44e758d3-5cc6-11e8-9299-0e9a287ac26e
  resourceVersion: "9447"
  selfLink: /api/v1/namespaces/kube-service-catalog/pods/controller-manager-6fj6c
  uid: 44ed3929-5cc6-11e8-9299-0e9a287ac26e
spec:
  containers:
  - args:
    - controller-manager
    - --port
    - "8080"
    - -v
    - "3"
    - --leader-election-namespace
    - kube-service-catalog
    - --leader-elect-resource-lock
    - configmaps
    - --broker-relist-interval
    - 5m
    - --feature-gates
    - OriginatingIdentity=true
    command:
    - /usr/bin/service-catalog
    env:
    - name: K8S_NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    image: registry.reg-aws.openshift.com:443/openshift3/ose-service-catalog:v3.9.29
    imagePullPolicy: IfNotPresent
    name: controller-manager
    ports:
    - containerPort: 8080
      protocol: TCP
    resources: {}
    securityContext:
      capabilities:
        drop:
        - KILL
        - MKNOD
        - SETGID
        - SETUID
      runAsUser: 1000090000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/kubernetes-service-catalog
      name: service-catalog-ssl
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: service-catalog-controller-token-bwkp6
      readOnly: true
  dnsPolicy: ClusterFirst
  imagePullSecrets:
  - name: service-catalog-controller-dockercfg-tfxjp
  nodeName: ip-172-18-12-195.ec2.internal
  nodeSelector:
    node-role.kubernetes.io/master: "true"
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1000090000
    seLinuxOptions:
      level: s0:c10,c0
  serviceAccount: service-catalog-controller
  serviceAccountName: service-catalog-controller
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/disk-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  volumes:
  - name: service-catalog-ssl
    secret:
      defaultMode: 420
      items:
      - key: tls.crt
        path: apiserver.crt
      secretName: apiserver-ssl
  - name: service-catalog-controller-token-bwkp6
    secret:
      defaultMode: 420
      secretName: service-catalog-controller-token-bwkp6
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2018-05-21T07:12:06Z
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: 2018-05-21T07:55:32Z
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: 2018-05-21T07:12:14Z
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://7444d9002f61620a2f27cac3621e752a400a40d14a47451b1251e59931e28d6e
    image: registry.reg-aws.openshift.com:443/openshift3/ose-service-catalog:v3.9.29
    imageID: docker-pullable://registry.reg-aws.openshift.com:443/openshift3/ose-service-catalog@sha256:0b5ad372b0e0e63d94d087311aaabb9af21a090b6487d9e820a4de9df9c85b83
    lastState:
      terminated:
        containerID: docker://18ca99cb95325c39938ca22e8325e24f6f8b7e548153d372774fef87cc31ac37
        exitCode: 255
        finishedAt: 2018-05-21T07:54:37Z
        reason: Error
        startedAt: 2018-05-21T07:53:47Z
    name: controller-manager
    ready: true
    restartCount: 5
    state:
      running:
        startedAt: 2018-05-21T07:55:31Z
  hostIP: 172.18.12.195
  phase: Running
  podIP: 10.2.0.11
  qosClass: BestEffort
  startTime: 2018-05-21T07:12:06Z
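
The pod above carries the label pod-template-generation "1", while the DaemonSet reports templateGeneration 2. As a rough illustration of how that mismatch could be checked programmatically, here is a small Python sketch; stale_pods is a hypothetical helper, and the pod dict contains only the fields it reads.

```python
def stale_pods(template_generation, pods):
    """Return names of pods created from an older DaemonSet template.

    `template_generation` is the DaemonSet's spec.templateGeneration;
    each pod is a dict shaped like `oc get pod ... -o yaml` output.
    """
    stale = []
    for pod in pods:
        labels = pod["metadata"].get("labels", {})
        # Label values are strings, so compare against str(...).
        if labels.get("pod-template-generation") != str(template_generation):
            stale.append(pod["metadata"]["name"])
    return stale


# Values copied from the pod and DaemonSet output above.
pod = {
    "metadata": {
        "name": "controller-manager-6fj6c",
        "labels": {"pod-template-generation": "1"},
    }
}
print(stale_pods(2, [pod]))
```

A non-empty result means at least one pod has not been replaced since the template changed, which matches the v3.9.29 image seen here.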





$ docker images
REPOSITORY                                                                  TAG                 IMAGE ID            CREATED             SIZE
registry.reg-aws.openshift.com:443/openshift3/registry-console              v3.9                34d5fa3a2e44        4 days ago          244 MB
registry.reg-aws.openshift.com:443/openshift3/ose-node                      v3.10               2476ee92df86        5 days ago          1.2 GB
registry.reg-aws.openshift.com:443/openshift3/ose-control-plane             v3.10               a752326cdcd8        5 days ago          633 MB
registry.reg-aws.openshift.com:443/openshift3/ose-pod                       v3.10.0-0.47.0      0ae5c896b92c        5 days ago          214 MB
registry.reg-aws.openshift.com:443/openshift3/openvswitch                   v3.9.29             7e1947e66b50        12 days ago         1.48 GB
registry.reg-aws.openshift.com:443/openshift3/node                          v3.9.29             5b0cba6e43a1        12 days ago         1.46 GB
registry.reg-aws.openshift.com:443/openshift3/ose-deployer                  v3.9.29             17067913d670        12 days ago         1.23 GB
registry.reg-aws.openshift.com:443/openshift3/ose                           v3.9                a5cc46c3354e        12 days ago         1.23 GB
registry.reg-aws.openshift.com:443/openshift3/ose                           v3.9.29             a5cc46c3354e        12 days ago         1.23 GB
registry.reg-aws.openshift.com:443/openshift3/ose-web-console               v3.9.29             ae9986489fc7        12 days ago         466 MB
registry.reg-aws.openshift.com:443/openshift3/ose-pod                       v3.9.29             490f82693c58        12 days ago         214 MB
registry.reg-aws.openshift.com:443/openshift3/ose-template-service-broker   v3.9.29             e017c1d36a84        12 days ago         299 MB
registry.reg-aws.openshift.com:443/openshift3/ose-service-catalog           v3.9.29             6800c342ce7e        12 days ago         288 MB
registry.access.redhat.com/rhel7/etcd                                       latest              924412659272        12 days ago         256 MB
registry.access.redhat.com/rhel7/etcd                                       3.2.15              4f35b6516d22        6 weeks ago         256 MB
registry.reg-aws.openshift.com:443/openshift3/ose                           v3.10               a5bb4dd94b69        7 weeks ago         1.59 GB

Comment 8 Jay Boyd 2018-05-21 15:12:19 UTC
I added a new label to the daemonset for the service-catalog controller. We should see the catalog controller pod restarted, but that is not happening.

It looks like the kube controller pod is in trouble; it's in CrashLoopBackOff:

kube-system   master-controllers-ip-172-18-12-195.ec2.internal   0/1   CrashLoopBackOff   81    7h


Tail of the master-controllers pod log:


I0521 14:49:15.867483       1 aws.go:1026] Building AWS cloudprovider
I0521 14:49:15.867540       1 regions.go:74] found AWS region "us-east-1"
I0521 14:49:15.867557       1 aws_credentials.go:103] registering credentials provider for AWS region "us-east-1"
I0521 14:49:15.867575       1 plugins.go:41] Registered credential provider "aws-ecr-us-east-1"
I0521 14:49:15.870583       1 log_handler.go:32] AWS API Send: ec2metadata GetMetadata &{GetMetadata GET /meta-data/instance-id <nil> <nil>} <nil>
I0521 14:49:15.870615       1 log_handler.go:37] AWS API ValidateResponse: ec2metadata GetMetadata &{GetMetadata GET /meta-data/instance-id <nil> <nil>} <nil> 200 OK
I0521 14:49:15.870782       1 log_handler.go:27] AWS request: ec2 DescribeInstances
I0521 14:49:15.966435       1 log_handler.go:32] AWS API Send: ec2 DescribeInstances &{DescribeInstances POST / 0xc4211ff040 <nil>} {
  InstanceIds: ["i-07bf0c9c6b2f7f248"]
}
I0521 14:49:15.966504       1 log_handler.go:37] AWS API ValidateResponse: ec2 DescribeInstances &{DescribeInstances POST / 0xc4211ff040 <nil>} {
  InstanceIds: ["i-07bf0c9c6b2f7f248"]
} 401 Unauthorized
F0521 14:49:15.966744       1 controllermanager.go:166] error building controller context: cloud provider could not be initialized: could not init cloud provider "aws": error finding instance i-07bf0c9c6b2f7f248: "error listing AWS instances: \"AuthFailure: AWS was not able to validate the provided access credentials\\n\\tstatus code: 401, request id: 1b9749b1-0985-4297-985b-0a59125f3678\""
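
The log above uses the glog/klog line header (severity letter, MMDD date, time, thread ID, file:line, then the message), and the line that explains the crash is the one starting with F. As an aid for scanning longer tails, here is a minimal Python sketch that pulls out only the fatal entries; KLOG_HEADER and fatal_entries are illustrative names, and the sample message is abbreviated from the real log.

```python
import re

# Header layout: Lmmdd hh:mm:ss.uuuuuu threadid file:line] message
KLOG_HEADER = re.compile(
    r"^(?P<severity>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<thread>\d+) "
    r"(?P<file>[^:\s]+):(?P<line>\d+)\] (?P<message>.*)$"
)


def fatal_entries(log_text):
    """Return (file, line, message) for every F-severity log line."""
    entries = []
    for raw in log_text.splitlines():
        match = KLOG_HEADER.match(raw)
        # Continuation lines (e.g. "} 401 Unauthorized") do not match.
        if match and match.group("severity") == "F":
            entries.append(
                (match.group("file"), int(match.group("line")), match.group("message"))
            )
    return entries


# Abbreviated excerpt of the log above.
sample = (
    "I0521 14:49:15.867483       1 aws.go:1026] Building AWS cloudprovider\n"
    "} 401 Unauthorized\n"
    "F0521 14:49:15.966744       1 controllermanager.go:166] "
    "error building controller context: cloud provider could not be initialized\n"
)
for entry in fatal_entries(sample):
    print(entry)
```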

Comment 9 Jay Boyd 2018-05-21 18:13:49 UTC
sdodson reviewed and found that the password in the 3.10 configuration file /etc/origin/master/master.env was different from the one used in 3.9 (located in /etc/sysconfig/atomic-openshift-master-controllers).  He updated master.env, restarted the controllers, and got past the AuthFailure issue.
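
For anyone checking another environment for the same mismatch, the comparison between the two files can be sketched in Python as below. This assumes both files are simple KEY=VALUE environment files; parse_env and differing_keys are hypothetical helpers, and the variable names and values in the example are placeholders, not taken from this deployment. Only key names are reported so that secrets are not printed.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def differing_keys(old_text, new_text):
    """Return sorted key names whose values differ or exist in one file only."""
    old, new = parse_env(old_text), parse_env(new_text)
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))


# Placeholder contents standing in for the 3.9 sysconfig file and master.env.
old_file = 'AWS_ACCESS_KEY_ID=EXAMPLEKEY\nAWS_SECRET_ACCESS_KEY="old-secret"\n'
new_file = "AWS_ACCESS_KEY_ID=EXAMPLEKEY\nAWS_SECRET_ACCESS_KEY=new-secret\n"
print(differing_keys(old_file, new_file))
```

In practice the two strings would be read from /etc/sysconfig/atomic-openshift-master-controllers and /etc/origin/master/master.env on the master host.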

The next blocking problem is this error in the controller-manager pod:

I0521 16:48:22.319671       1 request.go:1099] body was not decodable (unable to check for Status): Object 'Kind' is missing in 'Error: 'x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "service-catalog-signer")'
Trying to reach: 'https://172.31.138.133:443/apis/servicecatalog.k8s.io/v1beta1?timeout=32s''
F0521 16:48:22.319888       1 controller_manager.go:194] Error starting "openshift.io/cluster-quota-reconciliation" (failed to discover resources: unable to retrieve the complete list of server APIs: servicecatalog.k8s.io/v1beta1: the server is currently unable to handle the request)

Similar errors are being logged in the Service Catalog API Server pod, indicating it doesn't recognize the service-catalog-signer cert.  This is a new cert created during the upgrade. The catalog pods should be restarted by the master controller, but the master controller itself fails to start while the catalog API is unreachable, so we appear to be in a bit of a deadlock/race condition.

I deleted the catalog API server pod, and the master controller started successfully and then restarted the Catalog's API server and controller manager pods.

Both the Catalog API server and Catalog controller manager pods now report Service Catalog version v3.10.0-0.47.0 (built 2018-05-16T01:36:14Z).  Catalog has been upgraded successfully.

I don't believe this is normally a problem; these pods are usually restarted and pick up the new certs properly.

Comment 10 Jay Boyd 2018-05-22 14:03:41 UTC
I discussed with sdodson why this deployment ended up with the wrong credentials in master.env.  His reply:  I'm not sure, I was very surprised to see different credentials between the two files. If anything, I was expecting to see them present in /etc/sysconfig/atomic-openshift-master-controllers but absent in /etc/origin/master/master.env. We've got a bug (https://bugzilla.redhat.com/show_bug.cgi?id=1571608) where we're going to copy the old file to the new file during upgrade, which may make this go away.

*** This bug has been marked as a duplicate of bug 1571608 ***