Bug 1418032 - [3.2] Update router and registry certificates in the redeploy-certificates.yml
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.2.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Andrew Butcher
QA Contact: Gaoyun Pei
Depends On:
Reported: 2017-01-31 16:27 UTC by Francesco Marchioni
Modified: 2017-05-17 17:38 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Last Closed: 2017-05-17 17:38:33 UTC
Target Upstream Version:


Links:
System: Red Hat Product Errata
ID: RHSA-2017:1244
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: ansible and openshift-ansible security and bug fix update
Last Updated: 2017-05-25 21:43:49 UTC

Description Francesco Marchioni 2017-01-31 16:27:03 UTC
Description of problem:
The current Ansible playbook for redeploying certificates (redeploy-certificates.yml) is incomplete: it does not redeploy the router and registry certificates, which require additional steps. These steps are described in point 5 of this solution: https://access.redhat.com/solutions/2796981

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

Actual results:

Expected results:

Additional info:

Comment 3 Gaoyun Pei 2017-04-14 07:00:21 UTC
Tested this bug with openshift-ansible-3.2.55-1.git.0.5feab7c.el7.noarch

The certificate redeploy playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/redeploy-certificates.yml now also includes the registry and router certificate redeploy playbooks.

1. The registry certificate redeployment playbook works well against an ocp-3.2 cluster that has a "registry-certificates" secret for docker-registry.
After certificate redeployment, a new registry.crt/key pair was generated under /etc/origin/master and the "registry-certificates" secret was updated with the new certificate files. docker-registry was redeployed successfully and the sti-build test passed.

2. The router certificate redeployment playbook did not regenerate the "router-certs" secret, so the router pod stayed in ContainerCreating because the secret "router-certs" was not found.

The Ansible log shows no errors when running the router certificate redeployment:

TASK [Update router environment variables] *************************************
skipping: [ec2-54-146-165-55.compute-1.amazonaws.com] => {"changed": false, "skip_reason": "Conditional check failed", "skipped": true}

TASK [Delete existing router certificate secret] *******************************
changed: [ec2-54-146-165-55.compute-1.amazonaws.com] => {"changed": true, "cmd": ["oc", "delete", "secret/router-certs", "--config=/tmp/openshift-ansible-dqqYTg/admin.kubeconfig", "-n", "default"], "delta": "0:00:00.503799", "end": "2017-04-14 01:49:38.548672", "rc": 0, "start": "2017-04-14 01:49:38.044873", "stderr": "", "stdout": "secret \"router-certs\" deleted", "stdout_lines": ["secret \"router-certs\" deleted"], "warnings": []}

TASK [Remove router service annotations] ***************************************
changed: [ec2-54-146-165-55.compute-1.amazonaws.com] => {"changed": true, "cmd": ["oc", "annotate", "service/router", "service.alpha.openshift.io/serving-cert-secret-name-", "service.alpha.openshift.io/serving-cert-signed-by-", "--config=/tmp/openshift-ansible-dqqYTg/admin.kubeconfig", "-n", "default"], "delta": "0:00:00.497830", "end": "2017-04-14 01:49:40.743203", "rc": 0, "start": "2017-04-14 01:49:40.245373", "stderr": "", "stdout": "service \"router\" annotated", "stdout_lines": ["service \"router\" annotated"], "warnings": []}

TASK [Add serving-cert-secret annotation to router service] ********************
changed: [ec2-54-146-165-55.compute-1.amazonaws.com] => {"changed": true, "cmd": ["oc", "annotate", "service/router", "service.alpha.openshift.io/serving-cert-secret-name=router-certs", "--config=/tmp/openshift-ansible-dqqYTg/admin.kubeconfig", "-n", "default"], "delta": "0:00:00.517662", "end": "2017-04-14 01:49:42.896729", "rc": 0, "start": "2017-04-14 01:49:42.379067", "stderr": "", "stdout": "service \"router\" annotated", "stdout_lines": ["service \"router\" annotated"], "warnings": []}

TASK [Redeploy router] *********************************************************
changed: [ec2-54-146-165-55.compute-1.amazonaws.com] => {"changed": true, "cmd": ["oc", "deploy", "dc/router", "--latest", "--config=/tmp/openshift-ansible-dqqYTg/admin.kubeconfig", "-n", "default"], "delta": "0:00:00.512418", "end": "2017-04-14 01:49:45.097579", "rc": 0, "start": "2017-04-14 01:49:44.585161", "stderr": "", "stdout": "Started deployment #3", "stdout_lines": ["Started deployment #3"], "warnings": []}

However, the step "Add serving-cert-secret annotation to router service" did not actually regenerate the "router-certs" secret. Here is a manual reproduction:

1). After installation, check router pod status and router-certs secret
[root@ip-172-18-9-176 ~]# oc get pod |grep router
router-1-0hknf   1/1       Running   0          1m
[root@ip-172-18-9-176 ~]# oc get secret|grep router-certs
router-certs               kubernetes.io/tls                     2         1m

2). Delete existing router certificate secret
[root@ip-172-18-9-176 ~]# oc delete secret router-certs
secret "router-certs" deleted
[root@ip-172-18-9-176 ~]# oc get secret|grep router-certs
[root@ip-172-18-9-176 ~]# 

3). Remove router service annotations
[root@ip-172-18-9-176 ~]# oc annotate service router \
>     service.alpha.openshift.io/serving-cert-secret-name- \
>     service.alpha.openshift.io/serving-cert-signed-by-
service "router" annotated

4). Add serving-cert-secret annotation to router service
[root@ip-172-18-9-176 ~]# oc annotate service router \
>     service.alpha.openshift.io/serving-cert-secret-name=router-certs
service "router" annotated
[root@ip-172-18-9-176 ~]# oc get secret|grep router-certs
[root@ip-172-18-9-176 ~]# 

5). Redeploy router
[root@ip-172-18-9-176 ~]# oc deploy dc/router --latest
Started deployment #2
[root@ip-172-18-9-176 ~]# oc get pod
NAME             READY     STATUS              RESTARTS   AGE
router-2-qbopc   0/1       ContainerCreating   0          1m
[root@ip-172-18-9-176 ~]# oc describe pod router-2-qbopc
Name:		router-2-qbopc
Namespace:	default
Node:		ip-172-18-3-88.ec2.internal/
Start Time:	Fri, 14 Apr 2017 02:27:30 -0400
Labels:		deployment=router-2,deploymentconfig=router,router=router
Status:		Pending
Controllers:	ReplicationController/router-2
    Container ID:	
    Image:		x.com/openshift3/ose-haproxy-router:v3.2.1.31
    Image ID:		
    Ports:		80/TCP, 443/TCP, 1936/TCP
    QoS Tier:
      memory:		BestEffort
      cpu:		BestEffort
    State:		Waiting
      Reason:		ContainerCreating
    Ready:		False
    Restart Count:	0
    Liveness:		http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Readiness:		http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment Variables:
      DEFAULT_CERTIFICATE_PATH:			/etc/pki/tls/private/tls.crt
      ROUTER_EXTERNAL_HOST_PRIVKEY:		/etc/secret-volume/router.pem
      ROUTER_SERVICE_NAME:			router
      STATS_PORT:				1936
      STATS_USERNAME:				admin
  Type		Status
  Ready 	False 
    Type:	Secret (a volume populated by a Secret)
    SecretName:	router-certs
    Type:	Secret (a volume populated by a Secret)
    SecretName:	router-token-bjbv6
  FirstSeen	LastSeen	Count	From					SubobjectPath	Type		Reason		Message
  ---------	--------	-----	----					-------------	--------	------		-------
  1m		1m		1	{default-scheduler }					Normal		Scheduled	Successfully assigned router-2-qbopc to ip-172-18-3-88.ec2.internal
  1m		5s		7	{kubelet ip-172-18-3-88.ec2.internal}			Warning		FailedMount	Unable to mount volumes for pod "router-2-qbopc_default(6f960196-20db-11e7-b29a-0e2a308162cc)": secrets "router-certs" not found
  1m		5s		7	{kubelet ip-172-18-3-88.ec2.internal}			Warning		FailedSync	Error syncing pod, skipping: secrets "router-certs" not found
[root@ip-172-18-9-176 ~]# oc get secret|grep router-certs
[root@ip-172-18-9-176 ~]#

Comment 4 Andrew Butcher 2017-04-17 17:08:23 UTC
How was this cluster prepared? I don't see a router-certs secret specified in the router deploymentConfig when installing a 3.2 cluster with openshift-ansible.

# oc get dc/router -o jsonpath='{.spec.template.spec.volumes}'

The steps that remove and add the service serving certificate secret annotation only run when the secret is specified in the router deploymentConfig. If there are no secrets or environment variables, the router is simply redeployed.

TASK [Update router environment variables] *************************************
skipping: [master1.abutcher.com]

TASK [Delete existing router certificate secret] *******************************
skipping: [master1.abutcher.com]

TASK [Remove router service annotations] ***************************************
skipping: [master1.abutcher.com]

TASK [Add serving-cert-secret annotation to router service] ********************
skipping: [master1.abutcher.com]

TASK [Redeploy router] *********************************************************
changed: [master1.abutcher.com]
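
The conditional behavior described in this comment can be sketched roughly as follows. This is not the playbook's actual condition, only an illustration under the assumption that the decision depends on whether the router deploymentConfig (as returned by `oc get dc/router -o json`) has a volume backed by the "router-certs" secret. The function names are hypothetical and the environment-variable task is left out; the task names are taken from the logs above.

```python
from typing import Any, Dict, List


def references_secret(dc: Dict[str, Any], secret_name: str = "router-certs") -> bool:
    """True if any volume in the deploymentConfig is backed by secret_name."""
    pod_spec = dc.get("spec", {}).get("template", {}).get("spec", {})
    volumes = pod_spec.get("volumes") or []
    return any((v.get("secret") or {}).get("secretName") == secret_name for v in volumes)


def planned_router_tasks(dc: Dict[str, Any]) -> List[str]:
    """List the router tasks expected to run (not be skipped) for this dc."""
    tasks: List[str] = []
    if references_secret(dc):
        # Secret-related steps only apply when the dc mounts the secret.
        tasks += [
            "Delete existing router certificate secret",
            "Remove router service annotations",
            "Add serving-cert-secret annotation to router service",
        ]
    # The redeploy itself runs in both logs shown in this report.
    tasks.append("Redeploy router")
    return tasks


if __name__ == "__main__":
    # A dc without secret volumes, as in the log directly above.
    print(planned_router_tasks({"spec": {"template": {"spec": {"volumes": []}}}}))
```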

Comment 5 Gaoyun Pei 2017-04-18 02:27:42 UTC
@Andrew, I checked the previous installation log, and I did have openshift_hosted_router_certificate specified in the Ansible inventory:

openshift_hosted_router_certificate={"certfile": "/files/router_1.crt", "keyfile": "/files/router_1.key","cafile": "/files/router_1_rootca.crt"}
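
As a small aid for spotting this setting when preparing a test cluster, the inventory line above can be parsed as sketched below. This assumes the value happens to be valid JSON, as it is here; Ansible's own inventory parsing is more lenient. The helper names are hypothetical, and treating all three keys as expected is an assumption based only on the line quoted in this comment.

```python
import json
from typing import Dict, List, Sequence


def parse_router_certificate(inventory_line: str) -> Dict[str, str]:
    """Parse an openshift_hosted_router_certificate=... inventory line."""
    name, _, value = inventory_line.partition("=")
    if name.strip() != "openshift_hosted_router_certificate":
        raise ValueError("not an openshift_hosted_router_certificate line")
    return json.loads(value)


def missing_keys(cert: Dict[str, str],
                 expected: Sequence[str] = ("certfile", "keyfile", "cafile")) -> List[str]:
    """Return the expected keys that are absent from the parsed value."""
    return [key for key in expected if key not in cert]


line = ('openshift_hosted_router_certificate={"certfile": "/files/router_1.crt", '
        '"keyfile": "/files/router_1.key","cafile": "/files/router_1_rootca.crt"}')

if __name__ == "__main__":
    cert = parse_router_certificate(line)
    print(cert["certfile"], missing_keys(cert))
```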

Comment 7 Andrew Butcher 2017-04-28 18:06:49 UTC
@Gaoyun, the redeploy playbooks were not taking custom router certificates into account, and this problem exists in all versions of the installer.

I've created https://bugzilla.redhat.com/show_bug.cgi?id=1446737 for 3.5 and cloned it for other versions:

3.4 https://bugzilla.redhat.com/show_bug.cgi?id=1446745
3.3 https://bugzilla.redhat.com/show_bug.cgi?id=1446745

Proposed fix for 3.2: https://github.com/openshift/openshift-ansible/pull/4043

Comment 9 Gaoyun Pei 2017-05-04 06:29:39 UTC
Verified this bug with openshift-ansible-3.2.56-1.git.0.b844ab7.el7.noarch

When a custom router certificate is provided during install via openshift_hosted_router_certificate and the redeploy certificates playbook is run against the cluster, the custom router certificate is retained and the router pod runs normally.

Comment 11 errata-xmlrpc 2017-05-17 17:38:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

