Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1426677

Summary: scaleup playbook doesn't consider ca certificate specified in openshift_master_overwrite_certificates
Product: OpenShift Container Platform
Reporter: Takayoshi Tanaka <tatanaka>
Component: Installer
Assignee: Andrew Butcher <abutcher>
Status: CLOSED ERRATA
QA Contact: Gan Huang <ghuang>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.4.1
CC: abutcher, aos-bugs, jialiu, jkaur, jokerman, mmccomas
Target Milestone: ---
Keywords: NeedsTestCase
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text: Previously, the specified openshift_master_ca_certificate file was not deployed when performing a master scaleup. The scaleup playbooks have been updated to ensure that this certificate is deployed.
Story Points: ---
Clone Of:
Clones: 1469230
Environment:
Last Closed: 2017-04-12 19:02:19 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1469230

Description Takayoshi Tanaka 2017-02-24 15:17:54 UTC
Description of problem:
When adding new master(s) to an existing cluster with the scaleup playbook, it doesn't consider the custom CA certificate specified in openshift_master_overwrite_certificates. This causes the scaleup playbook to fail, which means we can't add new master(s) when using a custom CA cert.

Version-Release number of selected component (if applicable):
# openshift version
openshift v3.4.1.7

# ansible --version
ansible 2.2.0.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = Default w/o overrides

How reproducible:
I reproduced it in my environment.

Steps to Reproduce:
1. Create custom ca cert and web server cert and key for custom certificates.
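For reference, a custom CA and a server certificate for the load balancer hostname can be produced roughly like this (a hypothetical openssl sketch, not the customer's actual procedure; the file names match the inventory example in step 2):

```shell
# Hypothetical sketch: create a custom CA, then sign a server cert
# for the load balancer hostname with it.

# 1. Self-signed CA certificate and key.
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -keyout cacert.key -out cacert.pem \
  -subj "/CN=Example Custom CA"

# 2. Server key and certificate signing request for the LB hostname.
openssl req -newkey rsa:2048 -nodes \
  -keyout www.key -out www.csr \
  -subj "/CN=lb.example.com"

# 3. Sign the CSR with the custom CA.
openssl x509 -req -days 365 -in www.csr \
  -CA cacert.pem -CAkey cacert.key -CAcreateserial \
  -out www.crt

# The chain should verify against the custom CA.
openssl verify -CAfile cacert.pem www.crt
```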

2. Set up a multi-master OpenShift cluster, specifying the custom certificates like below.
```
openshift_master_named_certificates=[{"certfile": "/root/www.crt", "keyfile": "/root/www.key", "cafile": "/root/cacert.pem", "names": ["lb.example.com"]}]
openshift_master_overwrite_named_certificates=true
```

3. At this point, the OpenShift master API and web console are secured with the specified certificate and custom CA.

4. Add a new master with the scaleup playbook. I added new_masters and new_nodes to the Ansible host file like below. "openshift_master_named_certificates" and "openshift_master_overwrite_named_certificates" remain the same as above.

```
[OSEv3:children]
masters
nodes
etcd
lb
new_masters
new_nodes

[new_nodes]
new-master.example.com openshift_node_labels="{'region': 'infra', 'zone': 'default'}" openshift_schedulable=false

[new_masters]
new-master.example.com
```

Actual results:
The playbook failed with the error below.
```
TASK [openshift_master : Wait for API to become available] *********************
FAILED - RETRYING: TASK: openshift_master : Wait for API to become available (120 retries left).
FAILED - RETRYING: TASK: openshift_master : Wait for API to become available (119 retries left).
** snipping **
fatal: [new-master.example.com]: FAILED! => {
    "attempts": 120, 
    "changed": false, 
    "cmd": [
        "curl", 
        "--silent", 
        "--tlsv1.2", 
        "--cacert", 
        "/etc/origin/master/ca-bundle.crt", 
        "https://lb.example.com:8443/healthz/ready"
    ], 
    "delta": "0:00:00.149118", 
    "end": "2017-02-24 02:55:38.146070", 
    "failed": true, 
    "rc": 60, 
    "start": "2017-02-24 02:55:37.996952", 
    "warnings": []
}
```

On the new master:
```
# curl --tlsv1.2 --cacert /etc/origin/master/ca-bundle.crt https://lb.example.com:8443/healthz/ready
curl: (60) Peer's Certificate issuer is not recognized.
More details here: http://curl.haxx.se/docs/sslcerts.html

curl performs SSL certificate verification by default, using a "bundle"
 of Certificate Authority (CA) public keys (CA certs). If the default
 bundle file isn't adequate, you can specify an alternate file
 using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
 the bundle, the certificate verification probably failed due to a
 problem with the certificate (it might be expired, or the name might
 not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
 the -k (or --insecure) option.
```
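Exit code 60 means curl could not verify the server certificate against the given bundle. To confirm that the missing custom CA is the cause, the certificates actually present in the deployed bundle can be listed and checked (a diagnostic sketch, not part of the original report; the path is the one from the failure above):

```shell
# Diagnostic sketch (hypothetical): print the subject/issuer of every
# certificate in the CA bundle curl was given. If the custom CA's
# subject does not appear, the bundle on the new master is incomplete.
BUNDLE="${BUNDLE:-/etc/origin/master/ca-bundle.crt}"
if [ -r "$BUNDLE" ]; then
  openssl crl2pkcs7 -nocrl -certfile "$BUNDLE" | openssl pkcs7 -print_certs -noout
fi
```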

Expected results:
The playbook completes and the new master is successfully added.

Additional info:
The customer is facing this issue and can't add a new master at present. The case info is attached in the next private comment.

Comment 2 Takayoshi Tanaka 2017-02-25 08:56:41 UTC
I found that the error occurs in this task.

/usr/share/ansible/openshift-ansible/playbooks/byo/roles/openshift_master/tasks/main.yml
```
# A separate wait is required here for native HA since notifies will
# be resolved after all tasks in the role.
- name: Wait for API to become available
  # Using curl here since the uri module requires python-httplib2 and
  # wait_for port doesn't provide health information.
  command: >
    curl --silent
    {% if openshift.common.version_gte_3_2_or_1_2 | bool %}
    --cacert {{ openshift.common.config_base }}/master/ca-bundle.crt
    {% else %}
    --cacert {{ openshift.common.config_base }}/master/ca.crt
    {% endif %}
    {{ openshift.master.api_url }}/healthz/ready
  register: api_available_output
  until: api_available_output.stdout == 'ok'
  retries: 120
  delay: 1
  run_once: true
  changed_when: false
  when: openshift_master_ha | bool and openshift.master.cluster_method == 'native' and master_api_service_status_changed | bool
```

It appears that only this task fails to consider the custom CA cert, and the task doesn't actually change anything (it only checks the status). So I skipped this task.

```
  #when: openshift_master_ha | bool and openshift.master.cluster_method == 'native' and master_api_service_status_changed | bool
  when: false
```

After editing main.yml and running the scaleup playbook again, the playbook completed successfully. (tatanaka-ose3-cert-master4.usersys.redhat.com is the new master)

Do you think this is a viable workaround for the customer?

Comment 5 Gan Huang 2017-03-03 04:22:30 UTC
1) Test against 3.5
Cannot reproduce this issue with openshift-ansible-3.5.18-1.git.0.01f8d4a.el7.noarch (patches not applied) or openshift-ansible-3.5.20-1.git.0.5a5fcd5.el7.noarch (patches applied).

2) Test against 3.4
Tested with openshift-ansible-3.4.63-1 (not sure what actual version the customer used).

Task failed at "verify api server"

fatal: [openshift-125.lab.sjc.redhat.com]: FAILED! => {
    "attempts": 120, 
    "changed": false, 
    "cmd": [
        "curl", 
        "--silent", 
        "--tlsv1.2", 
        "--cacert", 
        "/etc/origin/master/ca-bundle.crt", 
        "https://openshift-105.lab.sjc.redhat.com/healthz/ready"
    ], 
    "delta": "0:00:00.156160", 
    "end": "2017-03-03 03:10:14.481095", 
    "failed": true, 
    "rc": 60, 
    "start": "2017-03-03 03:10:14.324935", 
    "warnings": []
}

Also hit the issue after trying to apply the fix manually.
--- a/filter_plugins/openshift_master.py
+++ b/filter_plugins/openshift_master.py
@@ -526,6 +526,7 @@ class FilterModule(object):
             raise errors.AnsibleFilterError("|failed expects hostvars is a dict")
         certs = ['ca.crt',
                  'ca.key',
+                 'ca-bundle.crt',
                  'admin.crt',
                  'admin.key',
                  'admin.kubeconfig',


Moving to VERIFIED as the fix is targeted at 3.5.

Comment 7 errata-xmlrpc 2017-04-12 19:02:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0903