Bug 1500981
| Summary: | Metrics upgrade failing to create heapster-certs - 3.5 to 3.6.1 | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Matthew Robson <mrobson> |
| Component: | Hawkular | Assignee: | Juraci Paixão Kröhling <jcosta> |
| Status: | CLOSED NOTABUG | QA Contact: | Junqi Zhao <juzhao> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 3.6.1 | CC: | aos-bugs, deads, dlbewley, erjones, jcosta, jmalde, juzhao, mrobson, mwringe |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | All | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-11-27 18:47:56 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Matthew Robson
2017-10-11 21:59:27 UTC
What OpenShift Ansible version are they using? If they are installing metrics onto OCP 3.6.1, they need to be using a 3.6.x version of OpenShift Ansible. I suspect that is the problem here.
Looks like this was a documentation issue (or rather, a lack of documentation). I'm closing this issue as NOTABUG, but if you think there's something we could do here, feel free to reopen.
*** Bug 1500946 has been marked as a duplicate of this bug. ***
Same problem in v3.6.173.0.21. Nothing in the docs covers this particular issue. The heapster rc has a volume requesting a mount of the secret `heapster-certs`, but the playbook does not create such a secret in `openshift-infra`.
If you're running into this, here is the solution:
Check the master controllers log for a message saying that the service-serving-cert controller was skipped. You may need to restart the controllers if the logs have rolled over:
Oct 12 10:23:27 atomic-openshift-master-controllers: I1012 10:23:27.804711 49015 start_master.go:773] Starting "openshift.io/service-serving-cert"
Oct 12 10:23:27 atomic-openshift-master-controllers: W1012 10:23:27.804716 49015 start_master.go:780] Skipping "openshift.io/service-serving-cert"
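For anyone who wants to script this check, here is a minimal sketch that scans controller log lines for the "Skipping" warning. It assumes the message format matches the two lines quoted above; the function name `skipped_controllers` is made up for illustration and is not part of any OpenShift tooling.

```python
import re

# Pattern based only on the log lines quoted in this bug:
#   ... start_master.go:780] Skipping "openshift.io/service-serving-cert"
SKIP_RE = re.compile(r'\] Skipping "([^"]+)"')


def skipped_controllers(log_lines):
    """Return the names of controllers that the log reports as skipped."""
    names = []
    for line in log_lines:
        match = SKIP_RE.search(line)
        if match:
            names.append(match.group(1))
    return names


sample = [
    'Oct 12 10:23:27 atomic-openshift-master-controllers: I1012 10:23:27.804711 '
    '49015 start_master.go:773] Starting "openshift.io/service-serving-cert"',
    'Oct 12 10:23:27 atomic-openshift-master-controllers: W1012 10:23:27.804716 '
    '49015 start_master.go:780] Skipping "openshift.io/service-serving-cert"',
]
print(skipped_controllers(sample))
```

The lines can come from wherever the controller logs end up on your masters, for example a saved copy of the journal output for the controllers service.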
The root cause is a missing serviceServingCert certificate configuration in master-config.yaml.
Check whether the following section exists in the config on each master:
controllerConfig:
  serviceServingCert:
    signer:
      certFile: service-signer.crt
      keyFile: service-signer.key
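If you need to check several masters, the lookup can be sketched in Python. This is only an illustration under some assumptions: it uses a naive indentation-based scan instead of a real YAML parser (so it only handles simple block-style mappings), and the helper name `has_key_path` is hypothetical.

```python
def has_key_path(yaml_text, path):
    """Return True if the nested mapping keys in `path` appear in the text.

    Naive sketch: tracks parent keys by indentation only, which is enough
    for plain block-style mappings like the snippet above.
    """
    stack = []  # (indent, key) pairs for the current chain of parent keys
    for raw in yaml_text.splitlines():
        stripped = raw.strip()
        if not stripped or stripped.startswith("#") or ":" not in stripped:
            continue
        indent = len(raw) - len(raw.lstrip(" "))
        key = stripped.split(":", 1)[0].strip()
        # Drop keys that are not parents of the current line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        stack.append((indent, key))
        if [k for _, k in stack] == list(path):
            return True
    return False


SIGNER_PATH = ("controllerConfig", "serviceServingCert", "signer", "certFile")

good = """\
controllerConfig:
  serviceServingCert:
    signer:
      certFile: service-signer.crt
      keyFile: service-signer.key
"""

# Hypothetical config fragment without the signer section.
missing = """\
controllerConfig:
  otherSetting: value
"""

print(has_key_path(good, SIGNER_PATH))
print(has_key_path(missing, SIGNER_PATH))
```

In practice you would pass in the contents of master-config.yaml read from each master. A proper YAML library would be more robust if one is available on the host.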
You can generate the certificate and key with:
oc adm ca create-signer-cert --cert=service-signer.crt --key=service-signer.key --name=openshift-service-serving-signer --serial=service-signer.serial.txt
Copy the crt and key into /etc/origin/master/, add the above config to master-config.yaml, and restart the masters. Service certs should start working once you recreate the service.
Typically this would have been created on an OpenShift upgrade via:
https://github.com/openshift/openshift-ansible/blob/release-3.6/playbooks/common/openshift-cluster/upgrades/create_service_signer_cert.yml
But if you do manual upgrades, this would not have been done.
Raised a docs issue here: https://bugzilla.redhat.com/show_bug.cgi?id=1501994
And oc adm diagnostics / master startup will be enhanced to better WARN about this:
https://github.com/openshift/origin/pull/16863
The missing secret is actually requested by the heapster service through the special annotation `service.alpha.openshift.io/serving-cert-secret-name`:
[root@osemaster1 etcd]# oc describe svc heapster
Name: heapster
Namespace: openshift-infra
Labels: metrics-infra=heapster
        name=heapster
Annotations: kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"Service","metadata":{"annotations":{"service.alpha.openshift.io/serving-cert-secret-name":"heapster-certs"},"labels":{"metri...
             service.alpha.openshift.io/serving-cert-secret-name=heapster-certs
             service.alpha.openshift.io/serving-cert-signed-by=openshift-service-serving-signer@1498853387
Selector: name=heapster
Thank you for those details. I am in fact missing `serviceServingCert`.
I do use the playbook for upgrades and have upgraded this to each point release since 3.0, so perhaps there is a corner case caused by cruft. It appears that those tasks were skipped by conditional evaluation in the playbook. I have case 01948010 open and will update the details there when I investigate further.
From upgrade log:
```
[root@ose-prod-master-01 3.6]# grep -B1 -A3 create_service_signer_cert.yml 20171008-1736-ansible-upgrade.log
TASK [Create local temp directory for syncing certs] ******************************************************************************************************************
task path: /usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/upgrades/create_service_signer_cert.yml:8
skipping: [localhost] => {
"changed": false,
"skip_reason": "Conditional result was False",
--
TASK [Create remote temp directory for creating certs] ****************************************************************************************************************
task path: /usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/upgrades/create_service_signer_cert.yml:17
skipping: [ose-prod-master-01.example.com] => {
"changed": false,
"skip_reason": "Conditional result was False",
--
TASK [Create service signer certificate] ******************************************************************************************************************************
task path: /usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/upgrades/create_service_signer_cert.yml:23
skipping: [ose-prod-master-01.example.com] => {
"changed": false,
"skip_reason": "Conditional result was False",
--
TASK [Retrieve service signer certificate] ****************************************************************************************************************************
task path: /usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/upgrades/create_service_signer_cert.yml:34
skipping: [ose-prod-master-01.example.com] => (item=service-signer.crt) => {
"changed": false,
"item": "service-signer.crt",
--
TASK [Delete remote temp directory] ***********************************************************************************************************************************
task path: /usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/upgrades/create_service_signer_cert.yml:46
skipping: [ose-prod-master-01.example.com] => {
"changed": false,
"skip_reason": "Conditional result was False",
--
TASK [Deploy service signer certificate] ******************************************************************************************************************************
task path: /usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/upgrades/create_service_signer_cert.yml:56
skipping: [ose-prod-master-01.example.com] => (item=service-signer.crt) => {
"changed": false,
"item": "service-signer.crt",
--
TASK [Delete local temp directory] ************************************************************************************************************************************
task path: /usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/upgrades/create_service_signer_cert.yml:71
skipping: [localhost] => {
"changed": false,
"skip_reason": "Conditional result was False",
```
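To pull the same information out of a full upgrade log without grep, something like the following sketch could be used. It assumes the verbose log layout shown above (a `TASK [...]` line, then a `task path:` line, then `skipping: [host]`); the function name `skipped_tasks` is hypothetical.

```python
import re

TASK_RE = re.compile(r"^TASK \[(.*)\]")
PATH_RE = re.compile(r"^task path: (\S+):(\d+)$")
SKIP_RE = re.compile(r"^skipping: \[([^\]]+)\]")


def skipped_tasks(log_text, playbook_name):
    """Return (task name, host) pairs skipped in the given playbook file."""
    results = []
    task = path = None
    for line in log_text.splitlines():
        line = line.strip()
        match = TASK_RE.match(line)
        if match:
            # A new task starts; its path is reported on a following line.
            task, path = match.group(1), None
            continue
        match = PATH_RE.match(line)
        if match:
            path = match.group(1)
            continue
        match = SKIP_RE.match(line)
        if match and task and path and path.endswith(playbook_name):
            results.append((task, match.group(1)))
    return results


sample_log = """\
TASK [Create service signer certificate] ********************
task path: /usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/upgrades/create_service_signer_cert.yml:23
skipping: [ose-prod-master-01.example.com] => {
    "changed": false,
    "skip_reason": "Conditional result was False",
"""

for task, host in skipped_tasks(sample_log, "create_service_signer_cert.yml"):
    print(f"{host}: skipped '{task}'")
```

If every task from create_service_signer_cert.yml shows up as skipped, the signer cert was never created by the upgrade, which matches what the log above shows.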
Reopening this bug because the customer in the newly attached case is seeing the `start_master.go:780] Skipping "openshift.io/service-serving-cert"` message in the master's logs, even though they do have the controllerConfig section in the master config. We even tried recreating the certs, but the message still appears. We might have identified a typo or possibly another bug. I will update this bug again after getting confirmation from the customer.
The problem was caused by a typo. The customer had [0], but it should be [1].
[0]
controllerConfig:
  servicesServingCert:
    signer:
      certFile: service-signer.crt
      keyFile: service-signer.key
[1]
controllerConfig:
  serviceServingCert:
    signer:
      certFile: service-signer.crt
      keyFile: service-signer.key
Correcting this fixed the customer's issue. Reclosing this BZ.
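Since a one-letter typo like this is easy to miss by eye, here is a rough sketch of how a near-miss key could be flagged automatically. It uses Python's standard `difflib` for fuzzy matching; the function name `near_miss_keys` and the 0.8 cutoff are arbitrary choices for illustration, and the key extraction is a naive line scan and not a real YAML parse.

```python
import difflib

EXPECTED_KEY = "serviceServingCert"


def near_miss_keys(yaml_text, expected=EXPECTED_KEY, cutoff=0.8):
    """Return keys that look like misspellings of `expected`.

    Returns an empty list when the expected key itself is present.
    """
    keys = set()
    for raw in yaml_text.splitlines():
        stripped = raw.strip()
        if stripped.startswith("#") or ":" not in stripped:
            continue
        keys.add(stripped.split(":", 1)[0].strip())
    if expected in keys:
        return []
    return difflib.get_close_matches(expected, sorted(keys), n=3, cutoff=cutoff)


typo_config = """\
controllerConfig:
  servicesServingCert:
    signer:
      certFile: service-signer.crt
      keyFile: service-signer.key
"""

correct_config = typo_config.replace("servicesServingCert", "serviceServingCert")

print(near_miss_keys(typo_config))
print(near_miss_keys(correct_config))
```

A non-empty result suggests the section is present but misspelled, in which case the controller would still report the service-serving-cert controller as skipped.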