Bug 1469358

Summary: node_certificates failed when the master sequence changed in inventory file
Product: OpenShift Container Platform
Reporter: Anping Li <anli>
Component: Installer
Assignee: Andrew Butcher <abutcher>
Status: CLOSED WONTFIX
QA Contact: Johnny Liu <jialiu>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.6.0
CC: abutcher, aos-bugs, aprajapa, bleanhar, dmoessne, gmarcote, jkaur, jokerman, jolee, jrosenta, mmccomas, nrevo, sdodson
Target Milestone: ---
Target Release: 3.10.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-04-10 14:13:24 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Anping Li 2017-07-11 05:47:13 UTC
Description of problem:
/etc/origin/master/ca.serial.txt exists only on the original first master. When the first master is broken, or the order of the masters in the inventory file is changed, the openshift_node_certificates role can fail because /etc/origin/master/ca.serial.txt does not exist on the other masters.
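The failure mode can be modeled in a few lines. This is a minimal sketch, assuming that the signing task is delegated to whichever master is listed first in the inventory (consistent with the failing task in this report, which runs on openshift-182 rather than on the node itself) and that ca.serial.txt was only ever written on the master that was first at install time. The function names and host names are hypothetical, and the second master is assumed to have ca.crt and ca.key, since the error in this report only complains about the serial file.

```python
"""Model of why reordering masters breaks node certificate signing.

Hypothetical sketch: assumes the signing task is delegated to the first
master listed in the inventory, and that ca.serial.txt only exists on the
master that was first at install time.
"""

from typing import Dict, List, Set

CA_FILES = (
    "/etc/origin/master/ca.crt",
    "/etc/origin/master/ca.key",
    "/etc/origin/master/ca.serial.txt",
)


def signing_host(masters: List[str]) -> str:
    """Return the master the signing task would be delegated to (first in order)."""
    if not masters:
        raise ValueError("inventory lists no masters")
    return masters[0]


def missing_ca_files(masters: List[str], files_on_host: Dict[str, Set[str]]) -> List[str]:
    """List the CA files that are absent on the host that would do the signing."""
    present = files_on_host.get(signing_host(masters), set())
    return [path for path in CA_FILES if path not in present]


if __name__ == "__main__":
    # Hypothetical install: master-a was first, so only it has the serial file.
    files = {
        "master-a": set(CA_FILES),
        "master-b": set(CA_FILES[:2]),  # ca.crt and ca.key, but no serial file
    }
    print(missing_ca_files(["master-a", "master-b"], files))  # []
    print(missing_ca_files(["master-b", "master-a"], files))  # ['/etc/origin/master/ca.serial.txt']
```

Swapping the two masters is enough to move the signing task to a host without the serial file, which matches the reproduction steps.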

Version-Release number of the following components:
openshift-ansible-3.6.140

How reproducible:
always 

Steps to Reproduce:
1. Install an HA OCP 3.6 cluster.
2. Change the order of the masters in the inventory file.
3. Redeploy the node certificates or scale up a node:
   ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/redeploy-node-certificates.yml
   ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-node/scaleup.yml


Actual results:
TASK [openshift_node_certificates : Generate the node client config] ***********
failed: [openshift-210.lab.eng.nay.redhat.com -> openshift-182.lab.eng.nay.redhat.com] (item=openshift-210.lab.eng.nay.redhat.com) => {
    "changed": true, 
    "cmd": [
        "/usr/local/bin/oc", 
        "adm", 
        "create-api-client-config", 
        "--certificate-authority=/etc/origin/master/ca.crt", 
        "--client-dir=/etc/origin/generated-configs/node-openshift-210.lab.eng.nay.redhat.com", 
        "--groups=system:nodes", 
        "--master=https://openshift-220.lab.eng.nay.redhat.com:8443", 
        "--signer-cert=/etc/origin/master/ca.crt", 
        "--signer-key=/etc/origin/master/ca.key", 
        "--signer-serial=/etc/origin/master/ca.serial.txt", 
        "--user=system:node:openshift-210.lab.eng.nay.redhat.com", 
        "--expire-days=730"
    ], 
    "delta": "0:00:00.244486", 
    "end": "2017-07-10 22:59:01.502011", 
    "failed": true, 
    "item": "openshift-210.lab.eng.nay.redhat.com", 
    "rc": 1, 
    "start": "2017-07-10 22:59:01.257525", 
    "warnings": []
}

STDERR:

error: --signer-serial, "/etc/origin/master/ca.serial.txt" must be a valid file
See 'oc adm create-api-client-config -h' for help and examples.
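Before rerunning the playbooks, the same "must be a valid file" condition can be checked up front on the master that will do the signing. A minimal sketch: `missing_signer_files` is a hypothetical helper, not part of openshift-ansible or oc; the three paths are the ones passed to `--signer-cert`, `--signer-key`, and `--signer-serial` in the failing command above.

```python
"""Pre-flight check for the signer files passed to oc adm create-api-client-config.

Hypothetical helper: run it on the master the signing task will be delegated
to before running redeploy-node-certificates.yml or scaleup.yml.
"""

import os
from typing import Iterable, List

# Paths taken from the failing command in this report.
SIGNER_FILES = (
    "/etc/origin/master/ca.crt",
    "/etc/origin/master/ca.key",
    "/etc/origin/master/ca.serial.txt",
)


def missing_signer_files(paths: Iterable[str] = SIGNER_FILES) -> List[str]:
    """Return every path that is not an existing regular file."""
    return [p for p in paths if not os.path.isfile(p)]


if __name__ == "__main__":
    missing = missing_signer_files()
    if missing:
        for path in missing:
            print(f"missing: {path}")
    else:
        print("all signer files present")
```

On the new first master in this report, this would be expected to report /etc/origin/master/ca.serial.txt as missing.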

Expected results:

Additional info:
By contrast, /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/redeploy-certificates.yml completes successfully.

Comment 1 Scott Dodson 2017-07-11 13:08:47 UTC
Andrew, Tim,

Do you think we should replicate the CA data to the other masters for disaster recovery?

Comment 2 Andrew Butcher 2017-07-13 18:34:42 UTC
@Scott, yep, we should absolutely be syncing the serial file after we sign any certificates within the openshift_master_certificates, openshift_node_certificates, and likely openshift_hosted roles.
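One subtlety when syncing: if certificates were ever signed on more than one master, the copies of the serial file may have diverged, and copying an older copy over a newer one could lead to serial numbers being reused. A minimal sketch, assuming each copy of ca.serial.txt holds a single hexadecimal serial (as OpenSSL-style serial files do): take the highest value across masters and write that back to all of them. The function names are hypothetical and not part of openshift-ansible, and the even-digit padding is an assumed OpenSSL-style convention.

```python
"""Pick the serial value to replicate to every master after signing.

Hypothetical sketch: assumes each copy of ca.serial.txt contains one
hexadecimal serial number. Taking the maximum means no master is left
with a value lower than one another master has already reached.
"""

from typing import Dict


def parse_serial(text: str) -> int:
    """Parse a hex serial such as '0A'; surrounding whitespace is ignored."""
    return int(text.strip(), 16)


def format_serial(value: int) -> str:
    """Uppercase hex, padded to an even number of digits, plus a newline."""
    digits = format(value, "X")
    if len(digits) % 2:
        digits = "0" + digits
    return digits + "\n"


def merged_serial(copies: Dict[str, str]) -> str:
    """Return the serial file content to write to all masters.

    `copies` maps master hostname -> contents of its ca.serial.txt.
    Masters without a copy are simply left out of the mapping.
    """
    if not copies:
        raise ValueError("no master has a serial file to sync from")
    highest = max(parse_serial(text) for text in copies.values())
    return format_serial(highest)


if __name__ == "__main__":
    copies = {"master-a": "0B\n", "master-b": "09\n"}
    print(repr(merged_serial(copies)))  # '0B\n'
```

Taking the maximum is safe whether the file stores the last serial used or the next one to issue, as long as every master interprets it the same way.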

Comment 3 Tim Bielawa 2017-08-14 19:45:54 UTC
beginning work on this here https://github.com/openshift/openshift-ansible/pull/5085