Bug 1426677
| Summary: | scaleup playbook doesn't consider ca certificate specified in openshift_master_overwrite_certificates | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Takayoshi Tanaka <tatanaka> | |
| Component: | Installer | Assignee: | Andrew Butcher <abutcher> | |
| Status: | CLOSED ERRATA | QA Contact: | Gan Huang <ghuang> | |
| Severity: | medium | Docs Contact: | ||
| Priority: | medium | |||
| Version: | 3.4.1 | CC: | abutcher, aos-bugs, jialiu, jkaur, jokerman, mmccomas | |
| Target Milestone: | --- | Keywords: | NeedsTestCase | |
| Target Release: | --- | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | Bug Fix | ||
| Doc Text: |
Previously the specified openshift_master_ca_certificate file was not deployed when performing a master scaleup. The scaleup playbooks have been updated to ensure that this certificate is deployed.
|
Story Points: | --- | |
| Clone Of: | ||||
| : | 1469230 (view as bug list) | Environment: | ||
| Last Closed: | 2017-04-12 19:02:19 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1469230 | |||
I found that the error occurs in this task:
/usr/share/ansible/openshift-ansible/playbooks/byo/roles/openshift_master/tasks/main.yml
```
# A separate wait is required here for native HA since notifies will
# be resolved after all tasks in the role.
- name: Wait for API to become available
  # Using curl here since the uri module requires python-httplib2 and
  # wait_for port doesn't provide health information.
  command: >
    curl --silent
    {% if openshift.common.version_gte_3_2_or_1_2 | bool %}
    --cacert {{ openshift.common.config_base }}/master/ca-bundle.crt
    {% else %}
    --cacert {{ openshift.common.config_base }}/master/ca.crt
    {% endif %}
    {{ openshift.master.api_url }}/healthz/ready
  register: api_available_output
  until: api_available_output.stdout == 'ok'
  retries: 120
  delay: 1
  run_once: true
  changed_when: false
  when: openshift_master_ha | bool and openshift.master.cluster_method == 'native' and master_api_service_status_changed | bool
```
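The polling behaviour of this task can be sketched outside Ansible as follows. This is only an illustration, not the playbook's implementation: `run_curl`, `wait_for_api` and the `runner` parameter are hypothetical names, and only the curl options (`--silent`, `--cacert`) and the retries/delay values are taken from the task above.

```python
import subprocess
import time


def run_curl(url, cacert):
    """Run the same curl command as the task and return (rc, stdout)."""
    proc = subprocess.run(
        ["curl", "--silent", "--cacert", cacert, url],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout.strip()


def wait_for_api(url, cacert, retries=120, delay=1, runner=run_curl):
    """Poll url until the response body is 'ok'.

    Returns the number of attempts used. Raises RuntimeError carrying the
    last curl exit code if every attempt fails.
    """
    rc = None
    for attempt in range(1, retries + 1):
        rc, body = runner(url, cacert)
        if body == "ok":
            return attempt
        if attempt < retries:
            time.sleep(delay)
    raise RuntimeError(
        "API not ready after %d attempts (last curl rc=%s)" % (retries, rc)
    )
```

If the file passed as `cacert` does not contain the CA that signed the API certificate, every attempt fails in the same way, so the loop exhausts all 120 retries instead of stopping early.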
Only this task appears to ignore the custom CA certificate, and it makes no changes itself (it only checks the API status), so I skipped it.
```
#when: openshift_master_ha | bool and openshift.master.cluster_method == 'native' and master_api_service_status_changed | bool
when: false
```
After editing main.yml and running the scaleup playbook again, the playbook completed successfully (tatanaka-ose3-cert-master4.usersys.redhat.com is the new master).
Would this be an acceptable workaround for the customer?
1) Test against 3.5
Cannot reproduce this issue with either openshift-ansible-3.5.18-1.git.0.01f8d4a.el7.noarch (patches not applied) or openshift-ansible-3.5.20-1.git.0.5a5fcd5.el7.noarch (patches applied).
2) Test against 3.4
Tested with openshift-ansible-3.4.63-1 (not sure which version the customer actually used).
The task failed at "verify api server":
fatal: [openshift-125.lab.sjc.redhat.com]: FAILED! => {
    "attempts": 120,
    "changed": false,
    "cmd": [
        "curl",
        "--silent",
        "--tlsv1.2",
        "--cacert",
        "/etc/origin/master/ca-bundle.crt",
        "https://openshift-105.lab.sjc.redhat.com/healthz/ready"
    ],
    "delta": "0:00:00.156160",
    "end": "2017-03-03 03:10:14.481095",
    "failed": true,
    "rc": 60,
    "start": "2017-03-03 03:10:14.324935",
    "warnings": []
}
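When triaging a failure like this, the interesting fields are `rc` and the `--cacert` argument. A small sketch of pulling them out of the task result follows. `explain_curl_failure` and `CURL_EXIT_CODES` are hypothetical helper names, and the table lists only a few well-known curl exit codes. Exit code 60 is the certificate verification failure reported in this bug.

```python
import json

# A few common curl exit codes; 60 is the one seen in this report.
CURL_EXIT_CODES = {
    6: "could not resolve host",
    7: "failed to connect to host",
    28: "operation timed out",
    35: "SSL/TLS handshake failed",
    60: "server certificate could not be verified against the given CA file",
}


def explain_curl_failure(result):
    """Summarize a failed curl task result (a dict parsed from Ansible output)."""
    rc = result.get("rc")
    cmd = result.get("cmd", [])
    # The CA file is the argument that follows --cacert, if present.
    cacert = cmd[cmd.index("--cacert") + 1] if "--cacert" in cmd else None
    return {
        "rc": rc,
        "reason": CURL_EXIT_CODES.get(rc, "unrecognized curl exit code"),
        "cacert": cacert,
        "url": cmd[-1] if cmd else None,
    }


# Trimmed copy of the result shown above.
SAMPLE = json.loads("""
{
  "attempts": 120,
  "cmd": ["curl", "--silent", "--tlsv1.2", "--cacert",
          "/etc/origin/master/ca-bundle.crt",
          "https://openshift-105.lab.sjc.redhat.com/healthz/ready"],
  "failed": true,
  "rc": 60
}
""")

if __name__ == "__main__":
    print(explain_curl_failure(SAMPLE))
```

For this result the summary points at `/etc/origin/master/ca-bundle.crt` as the file that does not contain the issuer of the API server certificate.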
Also hit the issue after trying to apply the fix manually.
--- a/filter_plugins/openshift_master.py
+++ b/filter_plugins/openshift_master.py
@@ -526,6 +526,7 @@ class FilterModule(object):
             raise errors.AnsibleFilterError("|failed expects hostvars is a dict")
         certs = ['ca.crt',
                  'ca.key',
+                 'ca-bundle.crt',
                  'admin.crt',
                  'admin.key',
                  'admin.kubeconfig',
Moving to VERIFIED, as the fix is targeted at 3.5.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2017:0903
Description of problem:
When adding new master(s) to an existing cluster with the scaleup playbook, the playbook doesn't consider the custom CA certificate specified in openshift_master_overwrite_certificates. This causes the scaleup playbook to fail, which means we can't add new master(s) when a custom CA certificate is in use.

Version-Release number of selected component (if applicable):
# openshift version
openshift v3.4.1.7
# ansible version --version
ansible 2.2.0.0
config file = /etc/ansible/ansible.cfg
configured module search path = Default w/o overrides

How reproducible:
I reproduced it in my environment.

Steps to Reproduce:
1. Create a custom CA certificate, plus a web server certificate and key, for the custom certificates.
2. Set up a multi-master OpenShift cluster, specifying openshift_master_overwrite_certificates as below.
```
openshift_master_named_certificates=[{"certfile": "/root/www.crt", "keyfile": "/root/www.key", "cafile": "/root/cacert.pem", "names": ["lb.example.com"]}]
openshift_master_overwrite_named_certificates=true
```
3. At this point, the OpenShift master API and web console are secured with the specified certificate and custom CA.
4. Add the new node with the scaleup playbook. I added new_masters and new_nodes to the Ansible hosts file as below. "openshift_master_named_certificates" and "openshift_master_overwrite_named_certificates" remain the same as above.
```
[OSEv3:children]
masters
nodes
etcd
lb
new_masters
new_nodes

[new_nodes]
new-master.example.com openshift_node_labels="{'region': 'infra', 'zone': 'default'}" openshift_schedulable=false

[new_masters]
new-master.example.com
```

Actual results:
The playbook failed with the error below.
```
TASK [openshift_master : Wait for API to become available] *********************
FAILED - RETRYING: TASK: openshift_master : Wait for API to become available (120 retries left).
FAILED - RETRYING: TASK: openshift_master : Wait for API to become available (119 retries left).
** snipping **
fatal: [new-master.example.com]: FAILED! => {
    "attempts": 120,
    "changed": false,
    "cmd": [
        "curl",
        "--silent",
        "--tlsv1.2",
        "--cacert",
        "/etc/origin/master/ca-bundle.crt",
        "https://lb.example.com:8443/healthz/ready"
    ],
    "delta": "0:00:00.149118",
    "end": "2017-02-24 02:55:38.146070",
    "failed": true,
    "rc": 60,
    "start": "2017-02-24 02:55:37.996952",
    "warnings": []
}
```
On the new master:
```
# curl --tlsv1.2 --cacert /etc/origin/master/ca-bundle.crt https://lb.example.com:8443/healthz/readycat /etc/origin/master/ca-bundle.crt
curl: (60) Peer's Certificate issuer is not recognized.
More details here: http://curl.haxx.se/docs/sslcerts.html

curl performs SSL certificate verification by default, using a "bundle"
 of Certificate Authority (CA) public keys (CA certs). If the default
 bundle file isn't adequate, you can specify an alternate file
 using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
 the bundle, the certificate verification probably failed due to a
 problem with the certificate (it might be expired, or the name might
 not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
 the -k (or --insecure) option.
curl: (3) <url> malformed
```

Expected results:
The playbook completes and the new master is added successfully.

Additional info:
The customer is facing this issue and can't add a new master at present. I attached the case info in the next private comment.
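One way to confirm the diagnosis on the new master is to check whether the custom CA is actually present in the bundle that curl is given. The sketch below compares PEM blocks textually. `pem_bodies` and `bundle_contains` are hypothetical helper names, and the comparison assumes both files hold standard PEM-encoded certificates.

```python
import re

# Matches the base64 body between the PEM certificate markers.
PEM_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----(.+?)-----END CERTIFICATE-----", re.DOTALL
)


def pem_bodies(text):
    """Return the base64 body of every certificate in text, whitespace removed."""
    return ["".join(body.split()) for body in PEM_RE.findall(text)]


def bundle_contains(bundle_text, ca_text):
    """True if every certificate in ca_text also appears in bundle_text."""
    bundle = set(pem_bodies(bundle_text))
    wanted = pem_bodies(ca_text)
    # An empty or unparsable CA file is treated as "not contained".
    return bool(wanted) and all(body in bundle for body in wanted)
```

With the paths from this report, the check would read `/etc/origin/master/ca-bundle.crt` on the new master and `/root/cacert.pem` from the installer host, then pass their contents to `bundle_contains`. A `False` result is consistent with the curl exit code 60 shown above.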