Description of problem:
The current version of the Ansible playbook used to redeploy certificates (redeploy-certificates.yml) is incomplete: it needs additional steps for the router and registry certificates. These steps are described in this solution: https://access.redhat.com/solutions/2796981 (Point 5)

Version-Release number of selected component (if applicable):
3.2
Tested this bug with openshift-ansible-3.2.55-1.git.0.5feab7c.el7.noarch

The cert redeploy playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/redeploy-certificates.yml now also includes the registry cert and router cert redeploy playbooks.

1. The registry certificate redeployment playbook works well against an ocp-3.2 cluster that has a "registry-certificates" secret for docker-registry. After cert redeployment, a new registry.crt/key pair is generated under /etc/origin/master and the "registry-certificates" secret is updated with the new cert files. docker-registry was redeployed successfully and the sti-build test passed.

2. The router certificate redeployment playbook did not regenerate the "router-certs" secret, so the router pod stayed in ContainerCreating because secrets "router-certs" was not found.

The ansible log shows no problem when running the router cert redeployment:

TASK [Update router environment variables] *************************************
skipping: [ec2-54-146-165-55.compute-1.amazonaws.com] => {"changed": false, "skip_reason": "Conditional check failed", "skipped": true}

TASK [Delete existing router certificate secret] *******************************
changed: [ec2-54-146-165-55.compute-1.amazonaws.com] => {"changed": true, "cmd": ["oc", "delete", "secret/router-certs", "--config=/tmp/openshift-ansible-dqqYTg/admin.kubeconfig", "-n", "default"], "delta": "0:00:00.503799", "end": "2017-04-14 01:49:38.548672", "rc": 0, "start": "2017-04-14 01:49:38.044873", "stderr": "", "stdout": "secret \"router-certs\" deleted", "stdout_lines": ["secret \"router-certs\" deleted"], "warnings": []}

TASK [Remove router service annotations] ***************************************
changed: [ec2-54-146-165-55.compute-1.amazonaws.com] => {"changed": true, "cmd": ["oc", "annotate", "service/router", "service.alpha.openshift.io/serving-cert-secret-name-", "service.alpha.openshift.io/serving-cert-signed-by-", "--config=/tmp/openshift-ansible-dqqYTg/admin.kubeconfig", "-n", "default"], "delta": "0:00:00.497830", "end": "2017-04-14 01:49:40.743203", "rc": 0, "start": "2017-04-14 01:49:40.245373", "stderr": "", "stdout": "service \"router\" annotated", "stdout_lines": ["service \"router\" annotated"], "warnings": []}

TASK [Add serving-cert-secret annotation to router service] ********************
changed: [ec2-54-146-165-55.compute-1.amazonaws.com] => {"changed": true, "cmd": ["oc", "annotate", "service/router", "service.alpha.openshift.io/serving-cert-secret-name=router-certs", "--config=/tmp/openshift-ansible-dqqYTg/admin.kubeconfig", "-n", "default"], "delta": "0:00:00.517662", "end": "2017-04-14 01:49:42.896729", "rc": 0, "start": "2017-04-14 01:49:42.379067", "stderr": "", "stdout": "service \"router\" annotated", "stdout_lines": ["service \"router\" annotated"], "warnings": []}

TASK [Redeploy router] *********************************************************
changed: [ec2-54-146-165-55.compute-1.amazonaws.com] => {"changed": true, "cmd": ["oc", "deploy", "dc/router", "--latest", "--config=/tmp/openshift-ansible-dqqYTg/admin.kubeconfig", "-n", "default"], "delta": "0:00:00.512418", "end": "2017-04-14 01:49:45.097579", "rc": 0, "start": "2017-04-14 01:49:44.585161", "stderr": "", "stdout": "Started deployment #3", "stdout_lines": ["Started deployment #3"], "warnings": []}
...

However, the step "Add serving-cert-secret annotation to router service" did not actually regenerate the "router-certs" secret. Here is a manual attempt:

1). After installation, check router pod status and router-certs secret
[root@ip-172-18-9-176 ~]# oc get pod |grep router
router-1-0hknf   1/1   Running   0   1m
[root@ip-172-18-9-176 ~]# oc get secret|grep router-certs
router-certs   kubernetes.io/tls   2   1m

2). Delete existing router certificate secret
[root@ip-172-18-9-176 ~]# oc delete secret router-certs
secret "router-certs" deleted
[root@ip-172-18-9-176 ~]# oc get secret|grep router-certs
[root@ip-172-18-9-176 ~]#

3). Remove router service annotations
[root@ip-172-18-9-176 ~]# oc annotate service router \
> service.alpha.openshift.io/serving-cert-secret-name- \
> service.alpha.openshift.io/serving-cert-signed-by-
service "router" annotated

4). Add serving-cert-secret annotation to router service
[root@ip-172-18-9-176 ~]# oc annotate service router \
> service.alpha.openshift.io/serving-cert-secret-name=router-certs
service "router" annotated
[root@ip-172-18-9-176 ~]# oc get secret|grep router-certs
[root@ip-172-18-9-176 ~]#

5). Redeploy router
[root@ip-172-18-9-176 ~]# oc deploy dc/router --latest
Started deployment #2
[root@ip-172-18-9-176 ~]# oc get pod
NAME             READY   STATUS              RESTARTS   AGE
router-2-qbopc   0/1     ContainerCreating   0          1m
[root@ip-172-18-9-176 ~]# oc describe pod router-2-qbopc
Name:         router-2-qbopc
Namespace:    default
Node:         ip-172-18-3-88.ec2.internal/172.18.3.88
Start Time:   Fri, 14 Apr 2017 02:27:30 -0400
Labels:       deployment=router-2,deploymentconfig=router,router=router
Status:       Pending
IP:           172.18.3.88
Controllers:  ReplicationController/router-2
Containers:
  router:
    Container ID:
    Image:          x.com/openshift3/ose-haproxy-router:v3.2.1.31
    Image ID:
    Ports:          80/TCP, 443/TCP, 1936/TCP
    QoS Tier:
      memory:       BestEffort
      cpu:          BestEffort
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Liveness:       http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment Variables:
      DEFAULT_CERTIFICATE_PATH:             /etc/pki/tls/private/tls.crt
      ROUTER_EXTERNAL_HOST_HOSTNAME:
      ROUTER_EXTERNAL_HOST_HTTPS_VSERVER:
      ROUTER_EXTERNAL_HOST_HTTP_VSERVER:
      ROUTER_EXTERNAL_HOST_INSECURE:        false
      ROUTER_EXTERNAL_HOST_PARTITION_PATH:
      ROUTER_EXTERNAL_HOST_PASSWORD:
      ROUTER_EXTERNAL_HOST_PRIVKEY:         /etc/secret-volume/router.pem
      ROUTER_EXTERNAL_HOST_USERNAME:
      ROUTER_SERVICE_HTTPS_PORT:            443
      ROUTER_SERVICE_HTTP_PORT:             80
      ROUTER_SERVICE_NAME:                  router
      ROUTER_SERVICE_NAMESPACE:             default
      ROUTER_SUBDOMAIN:
      STATS_PASSWORD:                       paYRXO8NPM
      STATS_PORT:                           1936
      STATS_USERNAME:                       admin
Conditions:
  Type    Status
  Ready   False
Volumes:
  server-certificate:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  router-certs
  router-token-bjbv6:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  router-token-bjbv6
Events:
  FirstSeen  LastSeen  Count  From  SubobjectPath  Type  Reason  Message
  ---------  --------  -----  ----  -------------  --------  ------  -------
  1m  1m  1  {default-scheduler }  Normal  Scheduled  Successfully assigned router-2-qbopc to ip-172-18-3-88.ec2.internal
  1m  5s  7  {kubelet ip-172-18-3-88.ec2.internal}  Warning  FailedMount  Unable to mount volumes for pod "router-2-qbopc_default(6f960196-20db-11e7-b29a-0e2a308162cc)": secrets "router-certs" not found
  1m  5s  7  {kubelet ip-172-18-3-88.ec2.internal}  Warning  FailedSync  Error syncing pod, skipping: secrets "router-certs" not found
[root@ip-172-18-9-176 ~]# oc get secret|grep router-certs
[root@ip-172-18-9-176 ~]#
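The failure in the manual steps comes from starting a new deployment while the mounted secret is absent. A minimal sketch of a guard for step 5 follows, assuming bash and the secret name and namespace from the report. The function names secret_exists and redeploy_router_if_secret_present are hypothetical, and this is not what the playbook or its eventual fix does.

```shell
#!/usr/bin/env bash
# Sketch only: refuse to start a new router deployment while the secret it
# mounts is missing. The function names are hypothetical; the oc commands
# are the ones used in the manual steps above.

secret_exists() {
  # "oc get secret NAME" exits non-zero when the secret does not exist.
  oc get secret "$1" -n "$2" >/dev/null 2>&1
}

redeploy_router_if_secret_present() {
  local secret="${1:-router-certs}" namespace="${2:-default}"
  if secret_exists "$secret" "$namespace"; then
    # Same command as step 5 of the manual reproduction.
    oc deploy dc/router --latest -n "$namespace"
  else
    echo "secret \"$secret\" not found in namespace \"$namespace\"; not redeploying" >&2
    return 1
  fi
}
```

Run as redeploy_router_if_secret_present router-certs default, this would have stopped with a non-zero exit status after step 4 instead of leaving router-2 in ContainerCreating.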
How was this cluster prepared? I don't see a router-certs secret specified in the router deploymentConfig when installing a 3.2 cluster with openshift-ansible.

# oc get dc/router -o jsonpath='{.spec.template.spec.volumes}'
[]

The steps that remove and add the service serving certificate secret annotation only run when the secret is specified in the router deploymentConfig. If there are no secrets or environment variables, the router is just redeployed.

TASK [Update router environment variables] *************************************
skipping: [master1.abutcher.com]

TASK [Delete existing router certificate secret] *******************************
skipping: [master1.abutcher.com]

TASK [Remove router service annotations] ***************************************
skipping: [master1.abutcher.com]

TASK [Add serving-cert-secret annotation to router service] ********************
skipping: [master1.abutcher.com]

TASK [Redeploy router] *********************************************************
changed: [master1.abutcher.com]
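The condition described above, whether the router deploymentConfig references the secret, can be checked by hand before running the playbook. The following is a rough sketch assuming bash. The function name router_mounts_secret is hypothetical, and the jsonpath expression is an approximation of the check, not the playbook's actual conditional.

```shell
#!/usr/bin/env bash
# Sketch only: report whether dc/router has a volume backed by the given
# secret. router_mounts_secret is a hypothetical helper name.

router_mounts_secret() {
  local secret="$1" namespace="${2:-default}" mounted name
  # Prints the secretName of every secret-backed volume, space separated;
  # prints nothing when the volumes list is empty.
  mounted="$(oc get dc/router -n "$namespace" \
    -o jsonpath='{.spec.template.spec.volumes[*].secret.secretName}')"
  for name in $mounted; do
    if [ "$name" = "$secret" ]; then
      return 0
    fi
  done
  return 1
}
```

On the cluster shown above, where the volumes list is [], router_mounts_secret router-certs would return non-zero, which matches the annotation tasks being skipped.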
@Andrew, I checked the previous installation log. It shows that openshift_hosted_router_certificate was specified in the Ansible inventory:

openshift_hosted_router_certificate={"certfile": "/files/router_1.crt", "keyfile": "/files/router_1.key","cafile": "/files/router_1_rootca.crt"}
@Gaoyun, the redeploy playbooks were not taking custom router certificates into account, and this problem exists in all versions of the installer. I've created https://bugzilla.redhat.com/show_bug.cgi?id=1446737 for 3.5 and cloned it for other versions.

3.4 https://bugzilla.redhat.com/show_bug.cgi?id=1446745
3.3 https://bugzilla.redhat.com/show_bug.cgi?id=1446745

Proposed fix for 3.2: https://github.com/openshift/openshift-ansible/pull/4043
Verified this bug with openshift-ansible-3.2.56-1.git.0.b844ab7.el7.noarch

When a custom router certificate is provided during install via openshift_hosted_router_certificate and the redeploy cert playbook is run against the cluster, the custom router cert is retained and the router pod runs well.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:1244