Description of problem:
/etc/origin/master/ca.serial.txt exists only on the original first master. When the first master is broken, or the order of masters is changed in the inventory file, the openshift_node_certificates role may fail because /etc/origin/master/ca.serial.txt does not exist on the other masters.

Version-Release number of the following components:
openshift-ansible-3.6.140

How reproducible:
Always

Steps to Reproduce:
1. Install HA OCP 3.6.
2. Change the order of masters in the inventory file.
3. Redeploy node certificates or scale up a node:
   ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/redeploy-node-certificates.yml
   ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-node/scaleup.yml

Actual results:
TASK [openshift_node_certificates : Generate the node client config] ***********
failed: [openshift-210.lab.eng.nay.redhat.com -> openshift-182.lab.eng.nay.redhat.com] (item=openshift-210.lab.eng.nay.redhat.com) => {
    "changed": true,
    "cmd": [
        "/usr/local/bin/oc",
        "adm",
        "create-api-client-config",
        "--certificate-authority=/etc/origin/master/ca.crt",
        "--client-dir=/etc/origin/generated-configs/node-openshift-210.lab.eng.nay.redhat.com",
        "--groups=system:nodes",
        "--master=https://openshift-220.lab.eng.nay.redhat.com:8443",
        "--signer-cert=/etc/origin/master/ca.crt",
        "--signer-key=/etc/origin/master/ca.key",
        "--signer-serial=/etc/origin/master/ca.serial.txt",
        "--user=system:node:openshift-210.lab.eng.nay.redhat.com",
        "--expire-days=730"
    ],
    "delta": "0:00:00.244486",
    "end": "2017-07-10 22:59:01.502011",
    "failed": true,
    "item": "openshift-210.lab.eng.nay.redhat.com",
    "rc": 1,
    "start": "2017-07-10 22:59:01.257525",
    "warnings": []
}

STDERR:

error: --signer-serial, "/etc/origin/master/ca.serial.txt" must be a valid file
See 'oc adm create-api-client-config -h' for help and examples.
Expected results:

Additional info:
/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/redeploy-certificates.yml works well.
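
The failing command reads three signer files from whichever master the task is delegated to. As a rough illustration, a pre-flight check for those files could look like the Python sketch below. It assumes local filesystem access to the master's config directory; the helper name missing_signer_files is hypothetical and is not part of openshift-ansible.

```python
import os

# File names taken from the --signer-* flags in the failing command above.
SIGNER_FILES = ("ca.crt", "ca.key", "ca.serial.txt")


def missing_signer_files(master_dir="/etc/origin/master"):
    """Return the signer files absent from master_dir (hypothetical helper)."""
    return [
        name
        for name in SIGNER_FILES
        if not os.path.isfile(os.path.join(master_dir, name))
    ]


if __name__ == "__main__":
    missing = missing_signer_files()
    if missing:
        print("missing signer files: " + ", ".join(missing))
    else:
        print("all signer files present")
```

Run on a master other than the original first one, a check like this would be expected to report ca.serial.txt as missing in the situation described above.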
Andrew, Tim, do you think we should replicate the CA data to the other masters for disaster recovery?
@Scott, yep, we should absolutely be syncing the serial file after we sign any certificates within the openshift_master_certificates, openshift_node_certificates, and likely openshift_hosted roles.
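
To make the idea of syncing the serial file concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: local directories stand in for the masters' /etc/origin/master directories, the function name sync_serial_file is hypothetical, and the real fix would presumably live in the Ansible roles and copy between hosts rather than between local paths.

```python
import filecmp
import os
import shutil


def sync_serial_file(source_dir, target_dirs, name="ca.serial.txt"):
    """Copy the serial file from source_dir into every target directory
    that lacks it or holds different content (hypothetical helper).

    Returns the list of target directories that were updated.
    """
    source = os.path.join(source_dir, name)
    if not os.path.isfile(source):
        raise FileNotFoundError(source)

    updated = []
    for target_dir in target_dirs:
        target = os.path.join(target_dir, name)
        # Skip targets whose file already matches the source byte for byte.
        if os.path.isfile(target) and filecmp.cmp(source, target, shallow=False):
            continue
        shutil.copy2(source, target)
        updated.append(target_dir)
    return updated
```

Calling this after each signing step, with the signing master's directory as source_dir, would keep the other copies identical, so a later change in master order would still find a valid --signer-serial file.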
Beginning work on this here: https://github.com/openshift/openshift-ansible/pull/5085