Bug 1397346 - [3.3] Deploy of custom router fails on "list of unattached/unmounted volumes=[server-certificate]"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 3.3.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.3.1
Assignee: Andrew Butcher
QA Contact: liujia
URL:
Whiteboard:
Duplicates: 1410757
Depends On:
Blocks:
Reported: 2016-11-22 10:57 UTC by Vladislav Walek
Modified: 2020-08-13 08:42 UTC (History)
CC List: 17 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
It was a typo in the master-config.yaml. No doc update needed.
Clone Of:
Environment:
Last Closed: 2017-06-12 15:40:33 UTC
Target Upstream Version:
Embargoed:


Links
System ID: Red Hat Product Errata RHBA-2017:1429
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: OpenShift Container Platform atomic-openshift-utils bug fix and enhancement
Last Updated: 2017-06-12 19:40:05 UTC

Description Vladislav Walek 2016-11-22 10:57:27 UTC
Description of problem:

Hi,
when deploying a custom router with the following command:

oc adm router int-router --stats-port=1940 --ports=12080:12080,12443:12443 --service-account=router --host-network=true --host-ports=true --replicas=3 --labels='router=internal' -n default

The command creates the following entry in the dc:

        volumeMounts:
        - mountPath: /etc/pki/tls/private
          name: server-certificate
          readOnly: true

Unfortunately, the volume is never mounted and the deployment fails with:

Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "int-router-1-9pr9l"/"default". list of unattached/unmounted volumes=[server-certificate]

In the normal router, this volume is not present.
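
The error names a volume, not the thing the volume depends on. A minimal Python sketch of how one might go from the event message to the secret to look for, assuming the volume is secret-backed and that the dc was dumped with `oc get dc int-router -o json`. The function names are hypothetical, and the whitespace-separated format of a multi-volume list is an assumption.

```python
import json
import re
from typing import Dict, List


def unmounted_volumes(event_message: str) -> List[str]:
    """Return the volume names listed in a mount-timeout event message."""
    match = re.search(r"unattached/unmounted volumes=\[([^\]]*)\]", event_message)
    # Assumption: several names, if present, are separated by whitespace.
    return match.group(1).split() if match else []


def secret_backed_volumes(dc_json: str) -> Dict[str, str]:
    """Map volume name -> secret name for every secret volume in a dc."""
    dc = json.loads(dc_json)
    pod_spec = dc.get("spec", {}).get("template", {}).get("spec", {})
    return {
        volume["name"]: volume["secret"]["secretName"]
        for volume in pod_spec.get("volumes") or []
        if "secret" in volume
    }


def secrets_to_check(event_message: str, dc_json: str) -> Dict[str, str]:
    """For each unmounted, secret-backed volume, name the secret it needs."""
    backed = secret_backed_volumes(dc_json)
    return {
        name: backed[name]
        for name in unmounted_volumes(event_message)
        if name in backed
    }
```

If the returned secret does not show up in `oc get secrets`, the mount cannot succeed, which points away from the router itself and toward whatever is supposed to create that secret.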

Version-Release number of selected component (if applicable):

OpenShift Container Platform 3.3.1

How reproducible:

Start a new router deployment in the default project and check the dc (compare it to the normal router):
oc adm router int-router --stats-port=1940 --ports=12080:12080,12443:12443 --service-account=router --host-network=true --host-ports=true --replicas=3 --labels='router=internal' -n default

Actual results:

the deployment fails

Expected results:

What exactly is the volume for, and is it needed? The workaround is to remove the volume entry from the dc; the deployment then works.

Additional info:

Comment 1 Ben Bennett 2016-11-22 13:56:31 UTC
The mount should be there.  In 3.3 we changed the router so that it gets a unique default certificate (if one is not provided) from a service.

Please look at the service definition associated with that router and make sure that it has annotations like:

apiVersion: v1
kind: Service
metadata:
  annotations:
    service.alpha.openshift.io/serving-cert-secret-name: router-certs
    service.alpha.openshift.io/serving-cert-signed-by: openshift-service-serving-signer@1478546568


Then do 'oc get secrets' and check that one with the name referenced by the service.alpha.openshift.io/serving-cert-secret-name annotation exists.  In my case it is named 'router-certs' and is there.
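
The two checks above can be scripted. This is a rough sketch only, assuming the inputs are the JSON produced by `oc get svc <name> -o json` and `oc get secrets -o json` (the latter being a list object with an `items` array). The helper names are hypothetical.

```python
import json
from typing import Optional

SECRET_NAME_ANNOTATION = "service.alpha.openshift.io/serving-cert-secret-name"


def serving_cert_secret_name(service_json: str) -> Optional[str]:
    """Return the secret name requested by the service annotation, if any."""
    service = json.loads(service_json)
    annotations = service.get("metadata", {}).get("annotations") or {}
    return annotations.get(SECRET_NAME_ANNOTATION)


def secret_exists(secret_name: str, secrets_json: str) -> bool:
    """Return True if a secret with that name appears in the secret list."""
    items = json.loads(secrets_json).get("items") or []
    return any(
        item.get("metadata", {}).get("name") == secret_name for item in items
    )
```

A service that carries the annotation while `secret_exists` returns False for the named secret is the condition this comment is probing for.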

If it was not created, please check that the master-config.yaml has:

controllerConfig:
  serviceServingCert:
    signer:
      certFile: service-signer.crt
      keyFile: service-signer.key

If it doesn't, then you can add that and restart the master.  How did you perform the upgrade?

Comment 2 Vladislav Walek 2016-11-24 07:37:05 UTC
Hello, this is the comment from the customer:

1. We performed an automated in-place upgrade as documented here: https://docs.openshift.com/container-platform/3.3/install_config/upgrading/automated_upgrades.html
2. The following is present in master-config.yaml:
controllerConfig:
  serviceServingCert:
    signer:
      certFile: service-signer.crt
      keyFile: service-signer.key
3. The service looks as follows:
{
    "kind": "Service",
    "apiVersion": "v1",
    "metadata": {
        "name": "tools-prod-internal",
        "namespace": "openpaas-router-ingress-internal",
        ...
        "labels": {
            "router": "tools-prod-internal"
        },
        "annotations": {
            "service.alpha.openshift.io/serving-cert-secret-name": "tools-prod-internal-certs"
        }
    },
    "spec": {
        "ports": [
-> The second annotation service.alpha.openshift.io/serving-cert-signed-by: openshift-service-serving-signer@1478546568 is missing
4. -> A secret with the name tools-prod-internal-certs is missing


What is this new feature used for? Any documentation?

Comment 3 Phil Cameron 2016-11-28 15:17:56 UTC
vwalek: The router must always have a default cert. When one is not provided by the user with "oadm router --default-cert=...", one is automatically provided through an annotation on the router's service.

Comment 4 Ben Bennett 2016-11-28 15:53:32 UTC
Vladislav: Do the service-signer.crt and service-signer.key files exist in the same directory as master-config.yaml?

Is there anything in the openshift logs about those files?

Comment 7 Ben Bennett 2016-12-01 15:24:33 UTC
Vladislav: Since those files are there, can you look at the log messages from the master to see if there's anything funny:
  journalctl -lu atomic-openshift-master | grep -i sign

Comment 8 Vladislav Walek 2016-12-06 09:17:51 UTC
(In reply to Ben Bennett from comment #7)
> Vladislav: Since those files are there, can you look at the log messages
> from the master to see if there's anything funny:
>   journalctl -lu atomic-openshift-master | grep -i sign

Hello Ben, customer checked, nothing in the logs.

Comment 12 Ben Bennett 2016-12-06 18:57:23 UTC
This seems to be a problem with the secret generator... passing it off to the Kubernetes team.

Comment 13 Takayoshi Kimura 2016-12-13 06:57:08 UTC
I also hit this issue when trying out router sharding. A named router pod was stuck in the ContainerCreating state because the cert secret was missing.

Comment 19 Jaspreet Kaur 2017-01-11 06:33:58 UTC
*** Bug 1410757 has been marked as a duplicate of this bug. ***

Comment 26 Michal Fojtik 2017-02-08 12:01:05 UTC
Just FYI, we just fixed a bug where, if you deleted the secret that was automatically created when you annotated the svc manually, it would not be generated again.

I can confirm that the secret is automatically created after annotation.

Comment 54 Vladislav Walek 2017-03-13 08:24:51 UTC
Hello Maciej,
here is the reply from the customer:
----------
Hello Vladislav

I just tried to create a router again in our 3.4.1.7 maint environment, then checked the logs on the 3 masters:

journalctl -ru atomic-openshift-master-api | grep 'service serving cert controller failed'
journalctl -ru atomic-openshift-master-controllers | grep 'service serving cert controller failed'

-> no result
----------

So if the controller is not failing, where else could the error occur?
Thank you

Comment 86 Vladislav Walek 2017-05-15 09:21:29 UTC
Hello,

as the customer reported, the issue lies in the upgrade from 3.2 to 3.3:

https://github.com/openshift/openshift-ansible/search?utf8=%E2%9C%93&q=servicesServingCert&type=

Thx

Comment 87 Maciej Szulik 2017-05-15 09:46:09 UTC
Based on the information from the customer I'm moving this to on-qa. The error was a misspelled option in master-config.yaml: servicesServingCert vs serviceServingCert.
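
A one-letter difference like this is easy to miss by eye. The following is a minimal sketch of a check for it, assuming master-config.yaml is indented with spaces and has `controllerConfig:` at the top level. It is a plain line scan rather than a YAML parse, and the function names are hypothetical.

```python
import re
from typing import List


def controller_config_keys(config_text: str) -> List[str]:
    """Return the keys found directly under a top-level controllerConfig."""
    keys: List[str] = []
    inside = False
    child_indent = None
    for line in config_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        indent = len(line) - len(line.lstrip(" "))
        if indent == 0:
            # A new top-level key starts; track whether it is controllerConfig.
            inside = line.split("#")[0].rstrip() == "controllerConfig:"
            child_indent = None
            continue
        if not inside:
            continue
        if child_indent is None:
            child_indent = indent  # first child sets the nesting level
        if indent == child_indent:
            match = re.match(r"\s*([A-Za-z0-9_]+)\s*:", line)
            if match:
                keys.append(match.group(1))
    return keys


def check_serving_cert_key(config_text: str) -> str:
    """Classify the config as 'ok', 'misspelled', or 'missing'."""
    keys = controller_config_keys(config_text)
    if "serviceServingCert" in keys:
        return "ok"
    if "servicesServingCert" in keys:
        return "misspelled"
    return "missing"
```

Reading the file with `open("/etc/origin/master/master-config.yaml").read()` and passing the text to `check_serving_cert_key` would distinguish the misspelled key from a section that is absent altogether.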

Comment 88 Anping Li 2017-05-16 01:40:39 UTC
@scott, this was pulled in by the certificate changes. I think we should update our upgrade playbooks.

Comment 89 Andrew Butcher 2017-05-16 17:27:29 UTC
Proposed: https://github.com/openshift/openshift-ansible/pull/4201

Comment 94 liujia 2017-06-05 10:02:19 UTC
Reproduced successfully.

Version:
atomic-openshift-utils-3.3.68-1.git.0.3792453.el7.noarch
openshift v3.3.1.17

Steps:
1. Upgrade ocp3.2 to ocp3.3

2. After the upgrade succeeds, create a new router
# oc adm router int-router --stats-port=1940 --ports=12080:12080,12443:12443 --service-account=router --host-network=true --host-ports=true

3. Deployment of router int-router failed waiting for volumes to attach/mount for pod "int-router-1-75l11"/"default". list of unattached/unmounted volumes=[server-certificate]

4. Check that no secret was created for the new router's service
# oc get svc/int-router -o json |grep -A 5 annotation
        "annotations": {
            "service.alpha.openshift.io/serving-cert-secret-name": "int-router-certs"
        }
# oc get secrets|grep int-router-certs
#
5. Check master-config.yaml
# cat /etc/origin/master/master-config.yaml | grep -A 5 "controllerConfig"
controllerConfig:
  servicesServingCert:
    signer:
      certFile: service-signer.crt
      keyFile: service-signer.key

The bug can be reproduced.
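
For a cluster already left in the state shown in step 5, one possible manual repair is to rename the key and then restart the master (Comment 1 notes that a restart is needed after changing this section). The sketch below is only an illustration under that assumption, not the change proposed for the upgrade playbooks. The helper name is hypothetical.

```python
import re


def fix_serving_cert_key(config_text: str) -> str:
    """Rename an indented servicesServingCert key to serviceServingCert."""
    # Only a key at the start of an indented line is touched, so values or
    # comments that merely mention the misspelled name are left unchanged.
    return re.sub(
        r"(?m)^([ \t]+)servicesServingCert([ \t]*:)",
        r"\1serviceServingCert\2",
        config_text,
    )
```

The function is idempotent, so running it against an already correct file returns the text unchanged. Keeping a backup copy of master-config.yaml before writing the result back would be prudent.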

Comment 95 liujia 2017-06-05 10:07:29 UTC
Version:
atomic-openshift-utils-3.3.84-1.git.0.4104d2d.el7.noarch
openshift v3.3.1.34

Steps:
1. Upgrade ocp3.2 to ocp3.3

2. After the upgrade succeeds, create a new router
# oc adm router int-router --stats-port=1940 --ports=12080:12080,12443:12443 --service-account=router --host-network=true --host-ports=true

3. Check that the new router was created successfully.
# oc get po|grep int-router
int-router-1-tb227        1/1       Running   0          11m

4. Verify that the secret was created correctly.
# oc get svc/int-router -o json |grep -A 5 annotation
        "annotations": {
            "service.alpha.openshift.io/serving-cert-secret-name": "int-router-certs",
            "service.alpha.openshift.io/serving-cert-signed-by": "/tmp/openshift-ansible-O8iNQTY/openshift-service-serving-signer"
        }

# oc get secrets|grep int-router-certs
int-router-certs           kubernetes.io/tls                     2         13m

5. Verify that master-config.yaml is configured correctly.
# cat /etc/origin/master/master-config.yaml | grep -A 5 "controllerConfig"
controllerConfig:
  serviceServingCert:
    signer:
      certFile: service-signer.crt
      keyFile: service-signer.key

Changing bug status to VERIFIED.

Comment 97 errata-xmlrpc 2017-06-12 15:40:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1429

