Description of problem:
I ran a playbook with the openshift_certificate_expiry role to check the certificates in my environment (one master with embedded etcd and one node). The check results show no etcd certificates. However, embedded etcd does have a certificate, configured in master-config.yaml, and the check misses it:

etcdConfig:
  address: 192.168.2.53:4001
  peerAddress: 192.168.2.53:7001
  peerServingInfo:
    bindAddress: 0.0.0.0:7001
    certFile: etcd.server.crt
    clientCA: ca.crt
    keyFile: etcd.server.key
  servingInfo:
    bindAddress: 0.0.0.0:4001
    certFile: etcd.server.crt
    clientCA: ca.crt
    keyFile: etcd.server.key
  storageDirectory: /var/lib/origin/openshift.local.etcd

Version-Release number of selected component (if applicable):
openshift-ansible-roles-3.4.12-1.git.0.0b5efd2.el7.noarch

How reproducible:
always

Steps to Reproduce:
1. Install OCP 3.4 against an environment with one master (embedded etcd) and one node.
2. Write a playbook that runs the openshift_certificate_expiry role:

# cat playbook.yml
---
- name: Check cert expirys
  hosts: all
  become: yes
  gather_facts: no
  roles:
  - role: openshift_certificate_expiry

# ansible-playbook -v -i /root/.config/openshift/hosts playbook.yml

Actual results:
No etcd certificates were checked.

TASK [openshift_certificate_expiry : Check cert expirys on host] ***************
ok: [openshift-116.x.x.x] => {
    ...
    "summary": {
        "etcd_certificates": 0,
        "expired": 0,
        "kubeconfig_certificates": 1,
        "ok": 3,
        "registry_certs": 0,
        "router_certs": 0,
        "system_certificates": 2,
        "total": 3,
        "warning": 0
    }
}

MSG:
Checked 3 total certificates. Expired/Warning/OK: 0/0/3. Warning window: 30 days

ok: [openshift-136.x.x.x] => {
    ...
    "summary": {
        "etcd_certificates": 0,
        "expired": 0,
        "kubeconfig_certificates": 5,
        "ok": 11,
        "registry_certs": 1,
        "router_certs": 1,
        "system_certificates": 4,
        "total": 11,
        "warning": 0
    }
}

MSG:
Checked 11 total certificates. Expired/Warning/OK: 0/0/11. Warning window: 30 days

Expected results:
For embedded etcd, the certificates need to be checked at the paths configured in master-config.yaml instead of being looked up through '/etc/etcd/etcd.conf'.

Additional info:
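The MSG line in the task output above summarizes each certificate as expired, warning, or ok relative to the warning window. The following is a minimal Python sketch of that bookkeeping, not the code from openshift_cert_expiry.py: the function names classify_cert and summary_message are hypothetical, and the thresholds (expired when days_remaining is negative, warning when it is within the window) are assumptions. They are consistent with the later result in this bug where 730 days remaining under a 1500-day window is reported as "warning".

```python
from collections import Counter


def classify_cert(days_remaining, warning_days=30):
    """Hypothetical health classification (thresholds are an assumption)."""
    if days_remaining < 0:
        return "expired"
    if days_remaining <= warning_days:
        return "warning"
    return "ok"


def summary_message(days_remaining_list, warning_days=30):
    """Build a line in the same shape as the MSG output above."""
    counts = Counter(classify_cert(d, warning_days) for d in days_remaining_list)
    # Counter returns 0 for categories that never occurred.
    return ("Checked {} total certificates. Expired/Warning/OK: {}/{}/{}. "
            "Warning window: {} days".format(
                len(days_remaining_list), counts["expired"],
                counts["warning"], counts["ok"], warning_days))


# Made-up days_remaining values for three healthy certificates.
print(summary_message([730, 365, 1825]))
```

With the default 30-day window this prints the same shape as the first host's message: "Checked 3 total certificates. Expired/Warning/OK: 0/0/3. Warning window: 30 days".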
Nice catch. Pushing this back while we focus on release-critical bugs. Added this bug to the Cert Expiration trello card for some upcoming feature updates (more certificate types will be checked in the next update). https://trello.com/c/F92ZDSXy/300-3-warn-admins-that-their-certificates-will-soon-expire
A fix for this (and other things) is in the works here: https://github.com/openshift/openshift-ansible/pull/2829
Patch has been merged into master.
Version: openshift-ansible-roles-3.4.43-1.git.0.a9dbe87.el7.noarch

Steps:
1. Install OCP 3.4 against an environment with one master (embedded etcd).
2. Write a playbook that runs the openshift_certificate_expiry role:

# cat test.yml
---
- name: Check cert expirys
  hosts: all
  become: yes
  gather_facts: no
  roles:
  - role: openshift_certificate_expiry

# ansible-playbook -v -i hosts test.yml

Result:
Embedded etcd certificates are still not checked. Following the PR, I then looked for the expected changes in the $/roles/openshift_certificate_expiry directory and found no new changes in these files in the latest OCP 3.4 version (3.4.43). So I'm changing the status back.
liujia, that's strange. *ALL* of the code for checking etcd (embedded and external) is under this block:
https://github.com/openshift/openshift-ansible/blame/master/roles/openshift_certificate_expiry/library/openshift_cert_expiry.py#L469

This selection checks the external etcd; it has not been modified since it was originally written:
https://github.com/openshift/openshift-ansible/blame/master/roles/openshift_certificate_expiry/library/openshift_cert_expiry.py#L473-L510

This selected block checks for embedded etcd. It reads the etcd configuration values from the /etc/origin/master/master-config.yaml file, and it is in the master branch:
https://github.com/openshift/openshift-ansible/blame/master/roles/openshift_certificate_expiry/library/openshift_cert_expiry.py#L512-L547

You can see in the blame view that it was added on November 18th.

Maybe there is some confusion here? Presently *all* checked etcd certificates are categorized under the topic "etcd_certs"; as far as result reporting is concerned, there is no differentiation between embedded and external.

I'm going to reinstall a cluster and test this out again. I need to make a note of how to ensure a cluster is installed with *embedded* etcd, since it's not intuitive.
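The embedded-etcd lookup described above (reading the etcd settings from master-config.yaml) can be sketched roughly as follows. This is not the code in openshift_cert_expiry.py: the function name embedded_etcd_cert_paths is hypothetical, it assumes master-config.yaml has already been parsed into a dict (for example with PyYAML's yaml.safe_load), and it assumes relative certFile values resolve against /etc/origin/master, which is consistent with the /etc/origin/master/etcd.server.crt path reported in the check results in this bug.

```python
import os


def embedded_etcd_cert_paths(master_config, config_dir="/etc/origin/master"):
    """Return the unique certificate files referenced by etcdConfig.

    Sketch only: `master_config` is assumed to be the parsed contents of
    master-config.yaml; `config_dir` is assumed to be its directory.
    """
    etcd_config = master_config.get("etcdConfig")
    if not etcd_config:
        # No etcdConfig section: etcd is not embedded in this master.
        return []

    paths = []
    for section in ("servingInfo", "peerServingInfo"):
        cert_file = (etcd_config.get(section) or {}).get("certFile")
        if not cert_file:
            continue
        # os.path.join keeps absolute paths as-is and anchors relative
        # ones to the config directory.
        full_path = os.path.normpath(os.path.join(config_dir, cert_file))
        if full_path not in paths:  # both sections often share one cert
            paths.append(full_path)
    return paths


# Certificate-related keys from the etcdConfig in the bug description.
example = {
    "etcdConfig": {
        "peerServingInfo": {"certFile": "etcd.server.crt", "keyFile": "etcd.server.key"},
        "servingInfo": {"certFile": "etcd.server.crt", "keyFile": "etcd.server.key"},
    }
}
print(embedded_etcd_cert_paths(example))
```

Because servingInfo and peerServingInfo point at the same file in this configuration, the sketch reports a single certificate, /etc/origin/master/etcd.server.crt.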
EMBEDDED ETCD CONFIGURATION NOTE:

To ENSURE you install etcd in an embedded state you must:

* Remove 'etcd' from the [OSEv3:children] inventory section
* Remove the [etcd] inventory section

Then run your install. Etcd will not be installed externally; it will run inside of OpenShift.

You can verify this by checking for the existence of '/etc/etcd/'. It will not exist if the install is properly embedded. Furthermore, if you run

> lsof -i -P | grep -E '(4001|7001)'

on the master with embedded etcd, you will find that only the 'openshift' process is listening on those ports.
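The inventory rule and the two verification steps in the note above can be turned into a small Python sketch. All function names here are hypothetical; the inventory parser is a simplified line-based reading of an INI-style Ansible inventory (it only tracks section headers and the entries under [OSEv3:children]), and the lsof parsing assumes the default `lsof -i -P` layout where the first column is the command name.

```python
import os
import re

# ":4001" or ":7001" as a complete port number, e.g. "*:4001 (LISTEN)".
ETCD_PORT_RE = re.compile(r":(?:4001|7001)\b")


def inventory_is_embedded(inventory_text):
    """True if there is no [etcd] section and no 'etcd' under [OSEv3:children]."""
    section = None
    sections, children = set(), set()
    for raw in inventory_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
            sections.add(section)
        elif section == "OSEv3:children":
            children.add(line.split()[0])
    return "etcd" not in sections and "etcd" not in children


def etcd_port_owners(lsof_output):
    """Command names on lsof lines that mention port 4001 or 7001."""
    return {line.split()[0] for line in lsof_output.splitlines()
            if ETCD_PORT_RE.search(line)}


def looks_embedded(lsof_output, etc_etcd="/etc/etcd"):
    """No /etc/etcd directory, and only 'openshift' owns the etcd ports."""
    return (not os.path.exists(etc_etcd)
            and etcd_port_owners(lsof_output) == {"openshift"})
```

On the master, the lsof text could be captured with `subprocess.run(["lsof", "-i", "-P"], capture_output=True, text=True).stdout` (Python 3.7+) and passed to looks_embedded. If nothing is listening on either port, the owner set is empty and the check reports False, since embedded etcd should be running inside the openshift process.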
Created attachment 1238831: embedded etcd inventory
@liujia, I've attached my inventory, which installs OCP using embedded etcd. I have run the verification steps I described in my previous comment to ensure etcd is embedded and not external.

Using this playbook and the attached inventory file, I ran the certificate expiry checker:

> ---
> - name: Check cert expirys
>   hosts: nodes:masters:etcd
>   become: yes
>   gather_facts: no
>   vars:
>     openshift_certificate_expiry_show_all: yes
>     openshift_certificate_expiry_generate_html_report: yes
>     openshift_certificate_expiry_warning_days: 1500
>     openshift_certificate_expiry_save_json_results: yes
>   roles:
>   - role: openshift_certificate_expiry

This GitHub gist has the generated JSON results for viewing:
https://gist.github.com/tbielawa/05e3c4c24295b8180cecd00d31680ee0#file-check-results-json-L3

You can see under the path `data.[m01.example.com].etcd` that a certificate was checked:

>         "etcd": [
>             {
>                 "cert_cn": "CN:172.30.0.1, DNS:kubernetes,...",
>                 "days_remaining": 730,
>                 "expiry": "2019-01-09 17:00:03",
>                 "health": "warning",
>                 "path": "/etc/origin/master/etcd.server.crt"
>             }

The GitHub gist also includes the playbook I used, for easier copy-paste.

Are you expecting to see something else, liujia? Such as a specific section for embedded etcd certs?

> $ rpm -q openshift-ansible-roles
> openshift-ansible-roles-3.4.17-1.git.315.96fe76d.fc23.noarch
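To confirm from the saved JSON results that etcd certificates were checked, the `data.[host].etcd` lists described above can be pulled out per host. A minimal sketch, assuming the layout shown in the gist (results["data"][host]["etcd"] is a list of dicts with a "path" key); the function names are hypothetical, and the location of the saved JSON file is not stated in this bug, so load_results takes the path as an argument.

```python
import json


def etcd_certs_by_host(results):
    """Map each host to the etcd certificate paths in saved check results.

    Assumes results["data"][host]["etcd"] is a list of dicts that each
    carry a "path" key, as in the gist excerpt.
    """
    found = {}
    for host, host_data in results.get("data", {}).items():
        found[host] = [cert["path"] for cert in host_data.get("etcd", [])]
    return found


def load_results(path):
    """Read the JSON file written when save_json_results is enabled."""
    with open(path) as handle:
        return json.load(handle)


# The m01.example.com entry from the gist excerpt.
sample = {
    "data": {
        "m01.example.com": {
            "etcd": [{
                "cert_cn": "CN:172.30.0.1, DNS:kubernetes,...",
                "days_remaining": 730,
                "expiry": "2019-01-09 17:00:03",
                "health": "warning",
                "path": "/etc/origin/master/etcd.server.crt",
            }]
        }
    }
}
print(etcd_certs_by_host(sample))
```

A master that reports an empty list here would show the symptom from the original report; node-only hosts are expected to have no etcd certificates.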
Hi Tim,

I did review all the changes for the embedded etcd certificate check in the PR from comment 2. I think it is a perfect fix for this bug, just as you describe in comment 6.

As for the embedded-state install, my inventory file is the same as yours in comments 7/8:
* no 'etcd' in the [OSEv3:children] inventory section
* no [etcd] inventory section

You can check it at /root/work/inventory/test. (I will give you the environment info in the next comment.)

All your verification steps in comment 9 look good to me. I checked the files in the /usr/share/ansible/openshift-ansible/roles/openshift_certificate_expiry directory again and found no new changes in them for the latest version I have:

# rpm -q openshift-ansible-roles
openshift-ansible-roles-3.4.43-1.git.0.a9dbe87.el7.noarch

So I think the only problem is that your fix is not included in the latest 3.4 puddle.
liujia, from the latest puddles I downloaded and extracted:

* openshift-ansible-roles-3.4.43-1.git.0.a9dbe87.el7.noarch.rpm
* openshift-ansible-roles-3.4.44-1.git.0.efa61c6.el7.noarch.rpm

You are correct that the updated embedded etcd checks are not present in those packages. I'm going to take action to correct that now.
liujia, I've synced with someone who knows more about puddles than I do and the conclusion is: the PR with the fixes you are missing will be included in the next 3.4 and 3.5 puddle. Scott Dodson is going to build the new puddle today. The puddle will ship after 3.4 goes GA.
Thx, Tim. I got it now. Then I will verify it when the new puddle comes out later.
> Thx, Tim. I got it now. Then I will verify it when the new puddle comes out later.

New puddle just dropped!
http://download-node-02.eng.bos.redhat.com/rcm-guest/puddles/RHAOS/AtomicOpenShift/3.5/2017-01-13.2/x86_64/os/Packages/

I examined
./openshift-ansible-roles-3.5.0-1.git.0.847bfb9.el7.noarch/usr/share/ansible/openshift-ansible/roles/openshift_certificate_expiry/library/openshift_cert_expiry.py
and verified that the embedded etcd checking is present.
Version: openshift-ansible-roles-3.4.55-1.git.0.9cb1f40.el7.noarch

Steps:
1. Install OCP 3.4 against an environment with one master (embedded etcd).
2. Write a playbook that runs the openshift_certificate_expiry role:

# cat test.yml
---
- name: Check cert expirys
  hosts: all
  become: yes
  gather_facts: no
  roles:
  - role: openshift_certificate_expiry

# ansible-playbook -v -i hosts test.yml

Result:
"summary": {
    "etcd_certificates": 1,
    "expired": 0,
    "kubeconfig_certificates": 5,
    "ok": 12,
    "registry_certs": 1,
    "router_certs": 1,
    "system_certificates": 4,
    "total": 12,
    "warning": 0
}

Changing the bug status to VERIFIED.
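Every summary in this bug is internally consistent: the per-category counts add up to "total" (1 + 5 + 1 + 1 + 4 = 12 above), and so do expired/warning/ok. A hedged sketch of a QA check built on that observation, assuming the summary keys shown above; check_summary and expect_etcd are hypothetical names, and the premise that the categories always partition "total" is inferred only from the outputs in this bug, not from the role's documentation.

```python
CATEGORY_KEYS = ("etcd_certificates", "kubeconfig_certificates",
                 "registry_certs", "router_certs", "system_certificates")
HEALTH_KEYS = ("expired", "warning", "ok")


def check_summary(summary, expect_etcd=True):
    """Return a list of problems found in one host's summary dict."""
    problems = []
    total = summary["total"]
    if sum(summary[key] for key in CATEGORY_KEYS) != total:
        problems.append("category counts do not add up to total")
    if sum(summary[key] for key in HEALTH_KEYS) != total:
        problems.append("expired/warning/ok do not add up to total")
    # On a master with embedded etcd, zero etcd certificates is the bug.
    if expect_etcd and summary["etcd_certificates"] == 0:
        problems.append("no etcd certificates checked")
    return problems


verified = {"etcd_certificates": 1, "expired": 0, "kubeconfig_certificates": 5,
            "ok": 12, "registry_certs": 1, "router_certs": 1,
            "system_certificates": 4, "total": 12, "warning": 0}
print(check_summary(verified))  # []
```

Run against the openshift-136.x.x.x summary from the original report, the same check returns ["no etcd certificates checked"].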
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:0224