Bug 1389264 - [3.4] openshift_certificate_expiry missed embedded etcd's cert check
Summary: [3.4] openshift_certificate_expiry missed embedded etcd's cert check
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.4.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.4.z
Assignee: Tim Bielawa
QA Contact: liujia
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-10-27 10:04 UTC by liujia
Modified: 2017-01-31 21:10 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The expiry role was not checking for embedded etcd environments. Consequence: The health of embedded etcd certificates was not evaluated. Fix: The openshift_cert_expiry module correctly identifies embedded etcd environments. Result: The health of embedded etcd certificates is now evaluated.
Clone Of:
Environment:
Last Closed: 2017-01-31 21:10:45 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
embedded etcd inventory (954 bytes, text/plain)
2017-01-09 17:13 UTC, Tim Bielawa


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:0224 0 normal SHIPPED_LIVE OpenShift Container Platform atomic-openshift-utils bug fix update 2017-02-01 02:10:09 UTC

Description liujia 2016-10-27 10:04:39 UTC
Description of problem:
I ran a playbook with the openshift_certificate_expiry role to check certs in my env (one master with embedded etcd and one node). The check result shows no etcd certs. But embedded etcd does have certs, which are configured in master-config.yaml, and the role missed checking them.

etcdConfig:
  address: 192.168.2.53:4001
  peerAddress: 192.168.2.53:7001
  peerServingInfo:
    bindAddress: 0.0.0.0:7001
    certFile: etcd.server.crt
    clientCA: ca.crt
    keyFile: etcd.server.key
  servingInfo:
    bindAddress: 0.0.0.0:4001
    certFile: etcd.server.crt
    clientCA: ca.crt
    keyFile: etcd.server.key
  storageDirectory: /var/lib/origin/openshift.local.etcd
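
For illustration, here is a minimal sketch of how the certificate paths named in the etcdConfig stanza above could be collected for checking. It assumes the master config has already been parsed into a Python dict (for example by a YAML library) and that relative certFile values resolve against the directory holding master-config.yaml. The helper name embedded_etcd_cert_paths and the config_dir default are assumptions made for this example; this is not the role's actual code.

```python
import os


def embedded_etcd_cert_paths(master_config, config_dir="/etc/origin/master"):
    """Return the unique certificate paths named in an etcdConfig stanza.

    master_config is the already-parsed master-config.yaml (a dict).
    Relative certFile values are resolved against config_dir, which is
    an assumption of this sketch.
    """
    etcd_config = master_config.get("etcdConfig") or {}
    paths = []
    for section in ("servingInfo", "peerServingInfo"):
        cert_file = (etcd_config.get(section) or {}).get("certFile")
        if not cert_file:
            continue
        full_path = os.path.normpath(os.path.join(config_dir, cert_file))
        # Serving and peer sections may share one certificate; list it once.
        if full_path not in paths:
            paths.append(full_path)
    return paths


# Values copied from the etcdConfig stanza in this report.
sample_config = {
    "etcdConfig": {
        "peerServingInfo": {"certFile": "etcd.server.crt", "clientCA": "ca.crt"},
        "servingInfo": {"certFile": "etcd.server.crt", "clientCA": "ca.crt"},
    }
}
print(embedded_etcd_cert_paths(sample_config))
```

A config without an etcdConfig stanza yields an empty list, so a caller could treat that as "no embedded etcd certificates to check on this host".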


Version-Release number of selected component (if applicable):
openshift-ansible-roles-3.4.12-1.git.0.0b5efd2.el7.noarch

How reproducible:
always

Steps to Reproduce:
1. Install OCP 3.4 against an env (one master with embedded etcd and one node).
2. Write a playbook to run the openshift_certificate_expiry role.
cat playbook.yml
---
- name: Check cert expirys
  hosts: all
  become: yes
  gather_facts: no
  roles:
    - role: openshift_certificate_expiry

#ansible-playbook -v -i /root/.config/openshift/hosts playbook.yml


Actual results:
No etcd certs have been checked.

TASK [openshift_certificate_expiry : Check cert expirys on host] ***************
ok: [openshift-116.x.x.x] => {
    ...
    "summary": {
        "etcd_certificates": 0, 
        "expired": 0, 
        "kubeconfig_certificates": 1, 
        "ok": 3, 
        "registry_certs": 0, 
        "router_certs": 0, 
        "system_certificates": 2, 
        "total": 3, 
        "warning": 0
    }
}

MSG:

Checked 3 total certificates. Expired/Warning/OK: 0/0/3. Warning window: 30 days

ok: [openshift-136.x.x.x] => {
    ...
    "summary": {
        "etcd_certificates": 0, 
        "expired": 0, 
        "kubeconfig_certificates": 5, 
        "ok": 11, 
        "registry_certs": 1, 
        "router_certs": 1, 
        "system_certificates": 4, 
        "total": 11, 
        "warning": 0
    }
}

MSG:

Checked 11 total certificates. Expired/Warning/OK: 0/0/11. Warning window: 30 days
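
As a rough illustration of how a summary line like the ones above can be derived, the sketch below buckets certificates by days remaining and formats the same message. The function names and the exact boundary rules (negative days means expired, fewer days than the window means warning) are assumptions for this example; the role's real thresholds may differ.

```python
def classify(days_remaining, warning_days=30):
    """Bucket one certificate by how many days it has left (assumed rules)."""
    if days_remaining < 0:
        return "expired"
    if days_remaining < warning_days:
        return "warning"
    return "ok"


def summarize(days_remaining_list, warning_days=30):
    """Build a summary line in the same shape as the role's MSG output."""
    counts = {"expired": 0, "warning": 0, "ok": 0}
    for days in days_remaining_list:
        counts[classify(days, warning_days)] += 1
    return (
        "Checked {total} total certificates. Expired/Warning/OK: "
        "{expired}/{warning}/{ok}. Warning window: {window} days"
    ).format(total=len(days_remaining_list), window=warning_days, **counts)


# Three healthy certificates with the default 30-day window.
print(summarize([700, 365, 100]))
```

The point for this bug is that a certificate that is never collected never reaches this step, so the totals look healthy even though the embedded etcd certificate was skipped.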


Expected results:
For embedded etcd, we need to check certs in another path instead of '/etc/etcd/etcd.conf'.

Additional info:

Comment 1 Tim Bielawa 2016-10-27 17:42:17 UTC
Nice catch.

Pushing this back while we focus on release critical bugs.

Added this bug to the Cert Expiration trello card for some upcoming feature updates (more certificate types will be checked in the next update).

https://trello.com/c/F92ZDSXy/300-3-warn-admins-that-their-certificates-will-soon-expire

Comment 2 Tim Bielawa 2016-11-22 17:07:49 UTC
A fix for this (and other things) is in the works in here:

https://github.com/openshift/openshift-ansible/pull/2829

Comment 3 Tim Bielawa 2016-12-19 18:01:37 UTC
Patch has been merged into master.

Comment 5 liujia 2017-01-09 08:20:23 UTC
Version:
openshift-ansible-roles-3.4.43-1.git.0.a9dbe87.el7.noarch

Steps:
1. Install OCP 3.4 against an env (one master with embedded etcd).
2. Write a playbook to run the openshift_certificate_expiry role.
cat test.yml
---
- name: Check cert expirys
  hosts: all
  become: yes
  gather_facts: no
  roles:
    - role: openshift_certificate_expiry

#ansible-playbook -v -i hosts test.yml

Result:
Embedded etcd certificates are still not checked.
Then, following the PR, I looked for the expected changes in the $/roles/openshift_certificate_expiry directory and found no new changes in these files for the latest version of OCP 3.4 (3.4.43). So I am changing the status back.

Comment 6 Tim Bielawa 2017-01-09 16:57:09 UTC
liujia, that's strange.

*ALL* of the code for checking etcd (embedded and external) is under this block:

https://github.com/openshift/openshift-ansible/blame/master/roles/openshift_certificate_expiry/library/openshift_cert_expiry.py#L469

This selection checks the external etcd; it has not been modified since it was originally written:

https://github.com/openshift/openshift-ansible/blame/master/roles/openshift_certificate_expiry/library/openshift_cert_expiry.py#L473-L510

This selected block checks for embedded etcd; it reads etcd configuration values from the /etc/origin/master/master-config.yaml file. It is in the master branch:

https://github.com/openshift/openshift-ansible/blame/master/roles/openshift_certificate_expiry/library/openshift_cert_expiry.py#L512-L547

You can see in the blame view that it was added November 18th. 


Maybe there is confusion here? Presently *all* checked etcd certificates are categorized under the topic "etcd_certs"; there is no differentiation between embedded and external as far as result reporting is concerned.

I'm going to reinstall a cluster and test this out again. I need to make a note of how to ensure a cluster is installed with *embedded* etcd, since it's not intuitive.

Comment 7 Tim Bielawa 2017-01-09 17:13:04 UTC
EMBEDDED ETCD CONFIGURATION NOTE:

To ENSURE you install etcd in an embedded state you must:

* Remove 'etcd' from the [OSEv3:children] inventory section
* Remove the [etcd] inventory section

Then run your install. Etcd will not be installed externally. It will run inside of openshift. You can verify this by checking for the existence of '/etc/etcd/'. It will not exist if the install is properly embedded.

Furthermore, if you run

> lsof -i -P | grep -E '(4001|7001)'

on the master with embedded etcd you will find that only the 'openshift' process is listening on those ports.
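
The inventory and directory checks described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the sample inventories are made up (they are not the attached file), and the parsing covers only plain INI-style group headers.

```python
import os


def inventory_uses_external_etcd(inventory_text):
    """Return True if the inventory still declares an external etcd group."""
    current_section = None
    for raw_line in inventory_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            current_section = line[1:-1].strip()
            if current_section == "etcd":
                return True  # an [etcd] section exists
            continue
        if current_section == "OSEv3:children" and line.split()[0] == "etcd":
            return True  # 'etcd' is listed as a child group
    return False


def looks_embedded_on_host(etcd_dir="/etc/etcd"):
    """Per the note above, /etc/etcd/ should not exist on an embedded install."""
    return not os.path.isdir(etcd_dir)


# Made-up sample inventories for illustration only.
EMBEDDED = """\
[OSEv3:children]
masters
nodes

[masters]
m01.example.com

[nodes]
m01.example.com
"""

EXTERNAL = """\
[OSEv3:children]
masters
nodes
etcd

[masters]
m01.example.com

[etcd]
m01.example.com
"""

print(inventory_uses_external_etcd(EMBEDDED))
print(inventory_uses_external_etcd(EXTERNAL))
```

The directory check only makes sense when run on the master itself, after the install has finished.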

Comment 8 Tim Bielawa 2017-01-09 17:13:55 UTC
Created attachment 1238831 [details]
embedded etcd inventory

Comment 9 Tim Bielawa 2017-01-09 17:23:52 UTC
@liujia, I've attached my inventory, which installs OCP using embedded etcd. I have run the verification steps I described in my previous comment to ensure etcd is embedded and not external.

Using this playbook and the inventory file attachment I ran the certificate expiry checker

> ---
> - name: Check cert expirys
>   hosts: nodes:masters:etcd
>   become: yes
>   gather_facts: no
>   vars:
>     openshift_certificate_expiry_show_all: yes
>     openshift_certificate_expiry_generate_html_report: yes
>     openshift_certificate_expiry_warning_days: 1500
>     openshift_certificate_expiry_save_json_results: yes
>   roles:
>     - role: openshift_certificate_expiry

This GitHub gist has the generated JSON results for viewing:

https://gist.github.com/tbielawa/05e3c4c24295b8180cecd00d31680ee0#file-check-results-json-L3

You can see under the path `data.[m01.example.com].etcd` that a certificate was checked.

>      "etcd": [
>        {
>          "cert_cn": "CN:172.30.0.1, DNS:kubernetes,...", 
>          "days_remaining": 730, 
>          "expiry": "2019-01-09 17:00:03", 
>          "health": "warning", 
>          "path": "/etc/origin/master/etcd.server.crt"
>        }

The github gist also has the playbook I used included in it for easier copy-paste.
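
For anyone who wants to confirm this from the saved JSON results rather than by eye, a small sketch follows. It assumes the layout quoted above (a top-level "data" object keyed by hostname, each with an "etcd" list); the sample string reuses only the fields shown in this comment, and the function name is made up for the example.

```python
import json

# Trimmed sample built from the fields quoted above; the rest of the
# real results file is not reproduced here.
SAMPLE = """
{
  "data": {
    "m01.example.com": {
      "etcd": [
        {
          "cert_cn": "CN:172.30.0.1, DNS:kubernetes,...",
          "days_remaining": 730,
          "expiry": "2019-01-09 17:00:03",
          "health": "warning",
          "path": "/etc/origin/master/etcd.server.crt"
        }
      ]
    }
  }
}
"""


def etcd_certs_by_host(results):
    """Map each host to (path, health, days_remaining) for its etcd certs."""
    found = {}
    for host, categories in results.get("data", {}).items():
        certs = categories.get("etcd", [])
        if certs:
            found[host] = [
                (cert["path"], cert["health"], cert["days_remaining"])
                for cert in certs
            ]
    return found


for host, certs in etcd_certs_by_host(json.loads(SAMPLE)).items():
    for path, health, days in certs:
        print(host, path, health, days)
```

A host that is missing from the returned dict had no etcd certificates checked, which is the symptom originally reported in this bug.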

Are you expecting to see something else, liujia? Such as a specific section for embedded etcd certs?

> $ rpm -q openshift-ansible-roles                 
> openshift-ansible-roles-3.4.17-1.git.315.96fe76d.fc23.noarch

Comment 10 liujia 2017-01-10 02:49:45 UTC
Hi Tim,

I did review all the changes for the embedded etcd certs check in the PR from comment 2. I think that is a perfect fix for this bug, just as you describe in comment 6.

As for embedded state for install, my inventory file is the same as yours in comment 7/8.
* no 'etcd' from the [OSEv3:children] inventory section
* no [etcd] inventory section

You can check it at /root/work/inventory/test. (I will give you env info in the next comment.)

All your verification steps in comment 9 look good to me. I checked the files in the /usr/share/ansible/openshift-ansible/roles/openshift_certificate_expiry directory again and found no new changes in them for the latest version I got.

# rpm -q openshift-ansible-roles
openshift-ansible-roles-3.4.43-1.git.0.a9dbe87.el7.noarch

So I think the only problem is that your fix is not included in the latest 3.4 puddle.

Comment 12 Tim Bielawa 2017-01-10 17:12:42 UTC
liujia, from the latest puddles I downloaded and extracted

* openshift-ansible-roles-3.4.43-1.git.0.a9dbe87.el7.noarch.rpm
* openshift-ansible-roles-3.4.44-1.git.0.efa61c6.el7.noarch.rpm

You are correct in that the updated embedded etcd checks are not present in those packages. I'm going to take actions to correct that now.

Comment 13 Tim Bielawa 2017-01-11 16:31:35 UTC
liujia, I've synced with someone who knows more about puddles than I do and the conclusion is: the PR with the fixes you are missing will be included in the next 3.4 and 3.5 puddle. 

Scott Dodson is going to build the new puddle today. The puddle will ship after 3.4 goes GA.

Comment 14 liujia 2017-01-12 06:02:53 UTC
Thx, Tim. I got it now. Then I will verify it when the new puddle comes out later.

Comment 15 Tim Bielawa 2017-01-13 20:01:33 UTC
> Thx, Tim. I got it now. Then I will verify it when the new puddle comes out later.

New puddle just dropped!

http://download-node-02.eng.bos.redhat.com/rcm-guest/puddles/RHAOS/AtomicOpenShift/3.5/2017-01-13.2/x86_64/os/Packages/

I examined ./openshift-ansible-roles-3.5.0-1.git.0.847bfb9.el7.noarch/usr/share/ansible/openshift-ansible/roles/openshift_certificate_expiry/library/openshift_cert_expiry.py and verified that the embedded etcd checking is present.

Comment 17 liujia 2017-01-19 06:03:49 UTC
Version:
openshift-ansible-roles-3.4.55-1.git.0.9cb1f40.el7.noarch

Steps:
1. Install OCP 3.4 against an env (one master with embedded etcd).
2. Write a playbook to run the openshift_certificate_expiry role.
cat test.yml
---
- name: Check cert expirys
  hosts: all
  become: yes
  gather_facts: no
  roles:
    - role: openshift_certificate_expiry

#ansible-playbook -v -i hosts test.yml

Result:
    "summary": {
        "etcd_certificates": 1, 
        "expired": 0, 
        "kubeconfig_certificates": 5, 
        "ok": 12, 
        "registry_certs": 1, 
        "router_certs": 1, 
        "system_certificates": 4, 
        "total": 12, 
        "warning": 0
    }

Changing bug status to VERIFIED.

Comment 19 errata-xmlrpc 2017-01-31 21:10:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0224

