1469358 – node_certificates failed when the master sequence changed in inventory file

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Installer
Sub Component:
Version:	3.6.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	3.10.0
Assignee:	Andrew Butcher
QA Contact:	Johnny Liu
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2017-07-11 05:47 UTC by Anping Li
Modified:	2019-04-17 17:21 UTC
CC List:	13 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-04-10 14:13:24 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	1529532	0	low	CLOSED	[RFE] etcd ca should be available on all nodes after installation	2022-03-13 14:36:40 UTC

Description Anping Li 2017-07-11 05:47:13 UTC

Description of problem:
/etc/origin/master/ca.serial.txt exists only on the original first master. When the first master is broken, or the order of the masters is changed in the inventory file, node_certificates may fail because /etc/origin/master/ca.serial.txt does not exist on the other masters.

Version-Release number of the following components:
openshift-ansible-3.6.140

How reproducible:
always 

Steps to Reproduce:
1. Install an HA OCP 3.6 cluster.
2. Change the order of the masters in the inventory file.
3. Redeploy the node certificates or scale up a node:
   ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/redeploy-node-certificates.yml
   ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-node/scaleup.yml


Actual results:
TASK [openshift_node_certificates : Generate the node client config] ***********
failed: [openshift-210.lab.eng.nay.redhat.com -> openshift-182.lab.eng.nay.redhat.com] (item=openshift-210.lab.eng.nay.redhat.com) => {
    "changed": true, 
    "cmd": [
        "/usr/local/bin/oc", 
        "adm", 
        "create-api-client-config", 
        "--certificate-authority=/etc/origin/master/ca.crt", 
        "--client-dir=/etc/origin/generated-configs/node-openshift-210.lab.eng.nay.redhat.com", 
        "--groups=system:nodes", 
        "--master=https://openshift-220.lab.eng.nay.redhat.com:8443", 
        "--signer-cert=/etc/origin/master/ca.crt", 
        "--signer-key=/etc/origin/master/ca.key", 
        "--signer-serial=/etc/origin/master/ca.serial.txt", 
        "--user=system:node:openshift-210.lab.eng.nay.redhat.com", 
        "--expire-days=730"
    ], 
    "delta": "0:00:00.244486", 
    "end": "2017-07-10 22:59:01.502011", 
    "failed": true, 
    "item": "openshift-210.lab.eng.nay.redhat.com", 
    "rc": 1, 
    "start": "2017-07-10 22:59:01.257525", 
    "warnings": []
}

STDERR:

error: --signer-serial, "/etc/origin/master/ca.serial.txt" must be a valid file
See 'oc adm create-api-client-config -h' for help and examples.

Expected results:

Additional info:
/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/redeploy-certificates.yml works well.
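
The failure above comes down to a signer file missing on the master that the task is delegated to. As a rough illustration only, a pre-flight check along these lines could be run on a master before the playbook. The file list is taken from the failing command in this report; the function name missing_signer_files is hypothetical and is not part of openshift-ansible.

```python
from pathlib import Path

# Signer files passed to `oc adm create-api-client-config` in the failing task.
SIGNER_FILES = ("ca.crt", "ca.key", "ca.serial.txt")


def missing_signer_files(config_dir):
    """Return the signer file names that are not regular files in config_dir.

    Hypothetical helper for illustration; not part of openshift-ansible.
    """
    base = Path(config_dir)
    return [name for name in SIGNER_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    # /etc/origin/master is the directory named in the error message.
    missing = missing_signer_files("/etc/origin/master")
    if missing:
        print("missing signer files:", ", ".join(missing))
    else:
        print("all signer files present")
```

On a master other than the original first one, this would be expected to report ca.serial.txt as missing, which matches the --signer-serial error.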

Comment 1 Scott Dodson 2017-07-11 13:08:47 UTC

Andrew, Tim,

Do you think we should replicate the CA data to the other masters for disaster recovery?

Comment 2 Andrew Butcher 2017-07-13 18:34:42 UTC

@Scott, yep we should absolutely be syncing the serial file after we sign any certificates within openshift_master_certificates, openshift_node_certificates and likely openshift_hosted roles.
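
The syncing idea in this comment can be sketched as a plain file copy. This is only an illustration under stated assumptions: local directories stand in for the master config directories on separate hosts, the function name sync_serial is hypothetical, and it does not show how openshift-ansible implements the sync.

```python
import shutil
from pathlib import Path

# Serial file name as it appears in this report.
SERIAL_NAME = "ca.serial.txt"


def sync_serial(source_dir, target_dirs):
    """Copy the serial file from source_dir into each target directory.

    Hypothetical helper: directories stand in for the config directories
    of the signing master and the other masters. Returns the paths written.
    """
    source = Path(source_dir) / SERIAL_NAME
    if not source.is_file():
        raise FileNotFoundError(f"{source} does not exist")
    written = []
    for target_dir in target_dirs:
        dest = Path(target_dir) / SERIAL_NAME
        # Create the target directory if it is not there yet.
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, dest)
        written.append(dest)
    return written
```

Running the copy after every signing operation would keep the serial file identical on every master, so a change in inventory order would not leave the new first master without it.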

Comment 3 Tim Bielawa 2017-08-14 19:45:54 UTC

Beginning work on this here: https://github.com/openshift/openshift-ansible/pull/5085
