Bug 1381335
| Summary: | Scale up playbook does not rerun master-facts. | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Ryan Howe <rhowe> |
| Component: | Installer | Assignee: | Andrew Butcher <abutcher> |
| Status: | CLOSED ERRATA | QA Contact: | liujia <jiajliu> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.3.0 | CC: | abutcher, aos-bugs, jiajliu, jokerman, mmccomas, rhowe |
| Target Milestone: | --- | | |
| Target Release: | 3.3.1 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | Previously, the node scale-up playbook did not regenerate master facts before adding new nodes, which meant that any master configuration changes made to the advanced-installation hosts file were not used when configuring the additional nodes. Now master facts are regenerated, ensuring configuration changes are applied when adding additional nodes. (A quick way to check the regenerated facts is sketched after this table.) | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-10-27 16:13:36 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
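As an illustration of what "master facts are regenerated" means in practice: openshift-ansible caches per-host facts as local Ansible facts, and the QA verification steps below read that cache directly. A minimal sketch of the check, assuming the fact file path quoted in the verification steps; `python -m json.tool` is used purely for pretty-printing and is not part of this report:

```bash
# Pretty-print the cached OpenShift facts on a master and extract the
# cluster hostname. Running this before and after the scale-up playbook
# shows whether the inventory change (openshift_master_cluster_hostname)
# was re-applied to the cached master facts.
sudo cat /etc/ansible/facts.d/openshift.fact | python -m json.tool \
    | grep cluster_hostname
```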
Description
Ryan Howe
2016-10-03 18:12:50 UTC
Commit pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/c7d9c63088f58a3aa338981083a9fb21a8c5c7f5
Merge pull request #2555 from abutcher/node-scaleup-facts

Bug 1381335 - Scale up playbook does not rerun master-facts.

Still blocked by bug 1382887.

@Ryan Could you help check my reproduction steps? I am not sure whether my step for "Change master_url" is right, because the original cluster cannot work correctly with the changed lb hostname. I am still confused by step 2 of your description, "Change master_url": how do you "Change master_url" in your env?

@Andrew, Ryan As the last comment mentioned, I am not sure about my reproduction steps, so I have attached my verification steps on the latest 3.3 puddle in this comment. Could you help check my verification?

Version:

```
atomic-openshift-utils-3.3.38-1.git.0.2637ed5.el7.noarch
openshift-ansible-3.3.38-1.git.0.2637ed5.el7.noarch
```

Steps:

1. Install OCP in an HA env. `cat /etc/ansible/facts.d/openshift.fact` on one master host shows:

   ```
   "cluster_hostname": "openshift-139.lab.eng.nay.redhat.com"
   ```

2. Change the lb hostname to openshift-149.lab.eng.nay.redhat.com in my env.

3. Edit the original hosts file:

   1) Change cluster_hostname:

      ```
      openshift_master_cluster_hostname=openshift-149.lab.eng.nay.redhat.com
      ```

   2) Add the new node:

      ```
      [OSEv3:children]
      nodes
      nfs
      masters
      lb
      etcd
      new_nodes
      ...
      [new_nodes]
      openshift-180.lab.eng.nay.redhat.com openshift_public_ip=10.66.147.180 openshift_ip=192.168.2.4 openshift_public_hostname=10.66.147.180 openshift_hostname=192.168.2.4
      ```

4. Run the scaleup playbook with the new hosts file:

   ```
   # ansible-playbook -i .config/openshift/hosts /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-node/scaleup.yml
   ```

Result: It still failed, with a new error about certificates:

```
Oct 21 06:05:38 openshift-180.lab.eng.nay.redhat.com systemd[1]: atomic-openshift-node.service holdoff time over, scheduling restart.
Oct 21 06:05:38 openshift-180.lab.eng.nay.redhat.com systemd[1]: Starting Atomic OpenShift Node...
-- Subject: Unit atomic-openshift-node.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit atomic-openshift-node.service has begun starting up.
Oct 21 06:05:38 openshift-180.lab.eng.nay.redhat.com atomic-openshift-node[26301]: F1021 06:05:38.365403 26301 start_node.go:126] cannot fetch "default" cluster network: Get https://openshift-149.lab.eng.nay.redhat.com:8443/oapi/v1/clusternetworks/default: x509: certificate is valid for kubernetes, kubernetes.default, kubernetes.default.svc, kubernetes.default.svc.cluster.local, openshift, openshift-139.lab.eng.nay.redhat.com, openshift.default, openshift.default.svc, openshift.default.svc.cluster.local, 10.66.147.128, 172.30.0.1, 192.168.2.183, not openshift-149.lab.eng.nay.redhat.com
Oct 21 06:05:38 openshift-180.lab.eng.nay.redhat.com systemd[1]: atomic-openshift-node.service: main process exited, code=exited, status=255/n/a
Oct 21 06:05:38 openshift-180.lab.eng.nay.redhat.com systemd[1]: Failed to start Atomic OpenShift Node.
-- Subject: Unit atomic-openshift-node.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit atomic-openshift-node.service has failed.
--
-- The result is failed.
Oct 21 06:05:38 openshift-180.lab.eng.nay.redhat.com systemd[1]: Unit atomic-openshift-node.service entered failed state.
Oct 21 06:05:38 openshift-180.lab.eng.nay.redhat.com systemd[1]: atomic-openshift-node.service failed.
```
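The x509 failure above shows that the master's serving certificate still lists only the old load-balancer hostname among its subject alternative names. One generic way to confirm which names the served certificate actually covers, using standard openssl tooling rather than anything from this report; the hostname and port are the ones in the log above:

```bash
# Fetch the serving certificate from the master API endpoint and print
# its Subject Alternative Names; the new lb hostname must appear here
# before a new node can connect. Hostname/port are from the error log.
echo | openssl s_client -connect openshift-149.lab.eng.nay.redhat.com:8443 2>/dev/null \
    | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'
```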
I checked that the old cached file on the first master has been updated to the new hostname, openshift-149.lab.eng.nay.redhat.com:

```
<--snip-->
"cluster_hostname": "openshift-149.lab.eng.nay.redhat.com"
<--snip-->
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:2122
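For completeness on the verification failure above: once openshift_master_cluster_hostname changes, the master certificates have to be regenerated to include the new name before the scale-up can succeed. A hedged sketch of how that was typically done in the openshift-ansible 3.x era; the redeploy playbook path is an assumption based on contemporary versions, not something stated in this report:

```bash
# Regenerate cluster certificates so they cover the new cluster
# hostname, then re-run the node scale-up. The redeploy playbook path is
# assumed from openshift-ansible of this era; verify it exists under
# /usr/share/ansible/openshift-ansible in your installed version.
ansible-playbook -i .config/openshift/hosts \
    /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/redeploy-certificates.yml

ansible-playbook -i .config/openshift/hosts \
    /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-node/scaleup.yml
```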