Bug 1635613 - oc_adm_router doesn't create router-metrics-tls secret
Summary: oc_adm_router doesn't create router-metrics-tls secret
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 3.10.0
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 3.11.z
Assignee: Miciah Dashiel Butler Masters
QA Contact: Hongan Li
URL:
Whiteboard:
Duplicates: 1671626
Depends On:
Blocks: 1672454
Reported: 2018-10-03 11:42 UTC by Juan Luis de Sousa-Valadas
Modified: 2019-05-27 14:38 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: If a router had previously been deployed with an older version of openshift-ansible, its service could be missing the service.alpha.openshift.io/serving-cert-secret-name annotation. openshift-ansible did not add the missing annotation.
Consequence: The service serving cert controller was not creating the router-metrics-tls secret, and as a result, the newly deployed router would fail to start.
Fix: openshift-ansible was changed to update any existing router service to have the needed annotation so that the service serving cert controller will create the router-metrics-tls secret.
Result: openshift-ansible can now deploy a functioning router even if an old router service that is missing the annotation exists.
Clone Of:
Clones: 1672454
Environment:
Last Closed: 2019-02-20 14:11:01 UTC


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:0326 None None None 2019-02-20 14:11:07 UTC
Github openshift openshift-ansible pull 11100 None None None 2019-01-30 19:49:49 UTC
Red Hat Bugzilla 1672011 None CLOSED "redeploy-router-certificates.yml" makes changes to wrong "service serving certificate secrets" annotation 2019-09-05 05:37:37 UTC

Description Juan Luis de Sousa-Valadas 2018-10-03 11:42:30 UTC
Description of problem:

When a customer defines openshift_hosted_routers and deploys the routers using:
ansible-playbook -i <ansible inventory> \
/usr/share/ansible/openshift-ansible/playbooks/openshift-hosted/deploy_router.yml

The routers won't work because the router-metrics-tls secret is missing. If openshift_hosted_routers is not defined, oc_adm_router works as expected and creates the secret.

I have tried to reproduce this using the exact same configuration for openshift_hosted_routers, without success.

Version-Release number of selected component (if applicable):
openshift-ansible-3.10.47-1.git.0.95bc2d2.el7_5.noarch

How reproducible:
The customer always hits this issue; I am unable to reproduce it with the same values.

Steps to Reproduce:
1. ansible-playbook -i <ansible inventory> /usr/share/ansible/openshift-ansible/playbooks/openshift-hosted/deploy_router.yml

Actual results:
The router-metrics-tls secret is absent, so the router fails to start.

Expected results:
OpenShift router works as expected

Additional info:
Workaround: 
1. Deploy the routers without custom openshift_hosted_routers
2. oc delete dc router -n default
3. Deploy the routers with custom openshift_hosted_routers
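
The three workaround steps above can be sketched as a shell function. This is only a sketch under stated assumptions: the function name `redeploy_router_workaround` and the two inventory file arguments are hypothetical, and it assumes the only difference between the inventories is whether openshift_hosted_routers is defined. The playbook path and the `oc delete` command are taken from this report.

```shell
#!/bin/sh
# Sketch of the workaround; inventory paths are placeholders supplied by the caller.
PLAYBOOK=/usr/share/ansible/openshift-ansible/playbooks/openshift-hosted/deploy_router.yml

redeploy_router_workaround() {
  # $1: inventory WITHOUT a custom openshift_hosted_routers (hypothetical file)
  # $2: inventory WITH the custom openshift_hosted_routers (hypothetical file)
  ansible-playbook -i "$1" "$PLAYBOOK" || return 1   # step 1: default routers
  oc delete dc router -n default || return 1          # step 2: drop the default dc
  ansible-playbook -i "$2" "$PLAYBOOK"                # step 3: custom routers
}

# Example: redeploy_router_workaround inventory.default inventory.custom
```

Stopping at the first failing step keeps the existing router deployment config in place if the initial deploy does not succeed.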

Comment 7 Scott Dodson 2018-10-18 13:49:23 UTC
Juan,

Can you confirm whether this happens during a clean install or in an upgraded environment?

Comment 8 Juan Luis de Sousa-Valadas 2018-10-22 07:21:10 UTC
Scott,
It was in an upgraded environment, but I had manually deleted all the router components.

Comment 13 Rune Henriksen 2018-11-09 15:53:46 UTC
I had this exact issue when deleting my routers and redeploying from the 3.10.45 playbook.

Editing the router deployment config, using oc, to not mount "router-metrics-tls" and then removing both environment variables concerning metrics TLS will "fix" this issue. 
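
A rough sketch of that edit, done with `oc set` instead of an interactive edit. The volume name and the two environment variable names are not given in this comment, so the function takes them as arguments rather than guessing; they can be looked up with `oc set volume dc/router -n default` and `oc set env dc/router -n default --list`. The function name `strip_router_metrics_tls` is hypothetical. Note that this turns off TLS for the metrics endpoint rather than restoring the missing secret.

```shell
#!/bin/sh
strip_router_metrics_tls() {
  # $1: name of the dc/router volume backed by the router-metrics-tls secret
  # $2, $3: names of the two metrics TLS environment variables
  # (all three are assumptions to be read from the live deployment config)
  oc set volume dc/router -n default --remove --name="$1" || return 1
  # A trailing "-" after a variable name tells `oc set env` to remove it.
  oc set env dc/router -n default "$2-" "$3-"
}
```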

I hope this gets some attention soon because it's concerning that Red Hat ships playbooks that literally do not work in their enterprise products.

Comment 14 Dan Mace 2018-11-15 17:32:50 UTC
The `router-metrics-tls` secret is provided by the serving cert signer component, which generates certificate secrets based on annotated services. To help us narrow down the issue, please reproduce the problem and then provide the output of the following command:

  $ oc get -n default services -o yaml

What we expect is a service with the following annotation:

  service.alpha.openshift.io/serving-cert-secret-name: router-metrics-tls

Normally, the annotated service is created by the `oc adm router` command. The presence of that annotated service is what causes the service cert signer component to generate the `router-metrics-tls` secret for use by the router deployment.

We can continue diagnosing once we have the output of the `oc` command I listed.

Thanks!
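
For a quicker check than reading the full YAML, the same diagnosis can be sketched as a shell function. This is a minimal sketch, not an official tool: the function name `check_router_metrics_tls` is made up for illustration, and it assumes `oc` is logged in with read access to the `default` namespace.

```shell
#!/bin/sh
check_router_metrics_tls() {
  # Dots inside the annotation key must be escaped in a jsonpath expression.
  value=$(oc get service router -n default \
    -o 'jsonpath={.metadata.annotations.service\.alpha\.openshift\.io/serving-cert-secret-name}')
  if [ "$value" != "router-metrics-tls" ]; then
    echo "router service lacks the expected annotation (got: '$value')"
    return 1
  fi
  if ! oc get secret router-metrics-tls -n default >/dev/null 2>&1; then
    echo "annotation present, but the router-metrics-tls secret does not exist"
    return 2
  fi
  echo "annotation and secret are both present"
}
```

A return code of 1 points at the service annotation, while 2 suggests the annotation is in place but the secret has not been generated.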

Comment 16 steffen.seckler 2018-12-20 09:28:57 UTC
Hi,
I can confirm this issue on an OpenShift Origin installation updated from v3.9 to v3.11.
The mentioned annotation is not shown on the router service. Possibly it should have been added by the upgrade.yaml playbook, but was forgotten?


> oc get -n default services -o yaml
apiVersion: v1
items:
- apiVersion: v1
  kind: Service
  metadata:
    creationTimestamp: 2018-07-11T12:19:48Z
    labels:
      docker-registry: default
    name: docker-registry
    namespace: default
    resourceVersion: "30194106"
    selfLink: /api/v1/namespaces/default/services/docker-registry
    uid: b47a8907-8504-11e8-b082-5cf3fce5f1c8
  spec:
    clusterIP: 172.30.76.0
    ports:
    - name: 5000-tcp
      port: 5000
      protocol: TCP
      targetPort: 5000
    selector:
      docker-registry: default
    sessionAffinity: ClientIP
    sessionAffinityConfig:
      clientIP:
        timeoutSeconds: 10800
    type: ClusterIP
  status:
    loadBalancer: {}
- apiVersion: v1
  kind: Service
  metadata:
    creationTimestamp: 2018-07-11T12:10:31Z
    labels:
      component: apiserver
      provider: kubernetes
    name: kubernetes
    namespace: default
    resourceVersion: "40893"
    selfLink: /api/v1/namespaces/default/services/kubernetes
    uid: 684f7838-8503-11e8-aa0c-5cf3fce5f1c8
  spec:
    clusterIP: 172.30.0.1
    ports:
    - name: https
      port: 443
      protocol: TCP
      targetPort: 8443
    - name: dns
      port: 53
      protocol: UDP
      targetPort: 8053
    - name: dns-tcp
      port: 53
      protocol: TCP
      targetPort: 8053
    sessionAffinity: ClientIP
    sessionAffinityConfig:
      clientIP:
        timeoutSeconds: 10800
    type: ClusterIP
  status:
    loadBalancer: {}
- apiVersion: v1
  kind: Service
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"createdBy":"registry-console-template","name":"registry-console"},"name":"registry-console","namespace":"default"},"spec":{"ports":[{"name":"registry-console","port":9000,"protocol":"TCP","targetPort":9090}],"selector":{"name":"registry-console"},"type":"ClusterIP"}}
      openshift.io/generated-by: OpenShiftNewApp
    creationTimestamp: 2018-07-11T12:52:31Z
    labels:
      app: registry-console
      createdBy: registry-console-template
      name: registry-console
    name: registry-console
    namespace: default
    resourceVersion: "30054148"
    selfLink: /api/v1/namespaces/default/services/registry-console
    uid: 464c8afa-8509-11e8-b082-5cf3fce5f1c8
  spec:
    clusterIP: 172.30.165.61
    ports:
    - name: registry-console
      port: 9000
      protocol: TCP
      targetPort: 9090
    selector:
      name: registry-console
    sessionAffinity: None
    type: ClusterIP
  status:
    loadBalancer: {}
- apiVersion: v1
  kind: Service
  metadata:
    annotations:
      prometheus.io/port: "1936"
      prometheus.io/scrape: "true"
      prometheus.openshift.io/password: <deleted>
      prometheus.openshift.io/username: <deleted>
    creationTimestamp: 2018-07-11T12:19:39Z
    labels:
      router: router
    name: router
    namespace: default
    resourceVersion: "42089"
    selfLink: /api/v1/namespaces/default/services/router
    uid: aea585f0-8504-11e8-b082-5cf3fce5f1c8
  spec:
    clusterIP: 172.30.77.211
    ports:
    - name: 80-tcp
      port: 80
      protocol: TCP
      targetPort: 80
    - name: 443-tcp
      port: 443
      protocol: TCP
      targetPort: 443
    - name: 1936-tcp
      port: 1936
      protocol: TCP
      targetPort: 1936
    selector:
      router: router
    sessionAffinity: None
    type: ClusterIP
  status:
    loadBalancer: {}
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""


Comment 17 Dan Mace 2019-01-14 18:14:56 UTC
Does manually adding the annotation `service.alpha.openshift.io/serving-cert-secret-name: router-metrics-tls` to the `router` service in the `default` namespace work around the issue?
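
One way to try this is sketched below. It is an untested illustration, not a confirmed fix: the function name `annotate_router_service` and the `POLL_INTERVAL` variable are hypothetical, and the ten-attempt polling loop is an arbitrary choice on the assumption that the secret is generated shortly after the annotation appears.

```shell
#!/bin/sh
annotate_router_service() {
  # --overwrite replaces the annotation if it exists with a different value.
  oc annotate service router -n default --overwrite \
    service.alpha.openshift.io/serving-cert-secret-name=router-metrics-tls || return 1
  # Poll for the secret; give up after 10 attempts.
  tries=0
  while [ "$tries" -lt 10 ]; do
    oc get secret router-metrics-tls -n default >/dev/null 2>&1 && return 0
    tries=$((tries + 1))
    sleep "${POLL_INTERVAL:-3}"   # POLL_INTERVAL is a hypothetical knob, in seconds
  done
  return 1
}
```

A non-zero return means either the annotate call failed or the secret did not show up within the polling window.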

Comment 18 Miciah Dashiel Butler Masters 2019-01-30 19:49:50 UTC
PR: https://github.com/openshift/openshift-ansible/pull/11100

Comment 19 Dan Mace 2019-02-01 13:36:36 UTC
*** Bug 1671626 has been marked as a duplicate of this bug. ***

Comment 24 Miciah Dashiel Butler Masters 2019-02-05 00:56:57 UTC
3.10 backport: https://bugzilla.redhat.com/show_bug.cgi?id=1672454

Comment 28 Hongan Li 2019-02-12 07:22:23 UTC
Verified with openshift-ansible-3.11.82-1.git.0.f29227a.el7: the issue has been fixed, and the router can be deployed by the Ansible playbook.

Comment 31 errata-xmlrpc 2019-02-20 14:11:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0326

