Description of problem:

Master controller service failed to start during installation:

Oct 04 20:18:31 ip-172-31-17-67.us-west-2.compute.internal systemd[1]: atomic-openshift-master-controllers.service: main process exited, code=exited, status=255/n/a
Oct 04 20:18:31 ip-172-31-17-67.us-west-2.compute.internal systemd[1]: Scope libcontainer-40619-systemd-test-default-dependencies.scope has no PIDs. Refusing.
Oct 04 20:18:31 ip-172-31-17-67.us-west-2.compute.internal systemd[1]: Scope libcontainer-40619-systemd-test-default-dependencies.scope has no PIDs. Refusing.
Oct 04 20:18:31 ip-172-31-17-67.us-west-2.compute.internal systemd[1]: Created slice libcontainer_40619_systemd_test_default.slice.
Oct 04 20:18:31 ip-172-31-17-67.us-west-2.compute.internal systemd[1]: Starting libcontainer_40619_systemd_test_default.slice.
Oct 04 20:18:31 ip-172-31-17-67.us-west-2.compute.internal systemd[1]: Removed slice libcontainer_40619_systemd_test_default.slice.
Oct 04 20:18:31 ip-172-31-17-67.us-west-2.compute.internal systemd[1]: Stopping libcontainer_40619_systemd_test_default.slice.
Oct 04 20:18:31 ip-172-31-17-67.us-west-2.compute.internal atomic-openshift-master-controllers[40619]: container "atomic-openshift-master-controllers" does not exist
Oct 04 20:18:31 ip-172-31-17-67.us-west-2.compute.internal systemd[1]: atomic-openshift-master-controllers.service: control process exited, code=exited status=1
Oct 04 20:18:31 ip-172-31-17-67.us-west-2.compute.internal systemd[1]: Unit atomic-openshift-master-controllers.service entered failed state.
Oct 04 20:18:31 ip-172-31-17-67.us-west-2.compute.internal systemd[1]: atomic-openshift-master-controllers.service failed.
Oct 04 20:18:36 ip-172-31-17-67.us-west-2.compute.internal systemd[1]: atomic-openshift-master-controllers.service holdoff time over, scheduling restart.
Oct 04 20:18:36 ip-172-31-17-67.us-west-2.compute.internal systemd[1]: Starting atomic-openshift-master-controllers.service...
Oct 04 20:18:36 ip-172-31-17-67.us-west-2.compute.internal systemd[1]: Started atomic-openshift-master-controllers.service.
Oct 04 20:18:36 ip-172-31-17-67.us-west-2.compute.internal systemd[1]: Scope libcontainer-40627-systemd-test-default-dependencies.scope has no PIDs. Refusing.
Oct 04 20:18:36 ip-172-31-17-67.us-west-2.compute.internal systemd[1]: Scope libcontainer-40627-systemd-test-default-dependencies.scope has no PIDs. Refusing.
Oct 04 20:18:36 ip-172-31-17-67.us-west-2.compute.internal systemd[1]: Created slice libcontainer_40627_systemd_test_default.slice.
Oct 04 20:18:36 ip-172-31-17-67.us-west-2.compute.internal systemd[1]: Starting libcontainer_40627_systemd_test_default.slice.
Oct 04 20:18:36 ip-172-31-17-67.us-west-2.compute.internal systemd[1]: Removed slice libcontainer_40627_systemd_test_default.slice.
Oct 04 20:18:36 ip-172-31-17-67.us-west-2.compute.internal systemd[1]: Stopping libcontainer_40627_systemd_test_default.slice.
Oct 04 20:18:36 ip-172-31-17-67.us-west-2.compute.internal systemd[1]: Started libcontainer container atomic-openshift-master-controllers.
Oct 04 20:18:36 ip-172-31-17-67.us-west-2.compute.internal systemd[1]: Starting libcontainer container atomic-openshift-master-controllers.
Version-Release number of the following components:

openshift-ansible version 84f27a8d66b8638c32e9dca5eec05df684d20773

rpm -q ansible
ansible-2.4.0.0-3.el7.noarch

ansible --version
ansible 2.4.0.0
  config file = /root/openshift-ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, May 3 2017, 07:55:04) [GCC 4.8.5 20150623 (Red Hat 4.8.5-14)]

How reproducible:
Always

Steps to Reproduce:
1. Install openshift on atomic host by running byo playbook

Actual results:
TASK [openshift_master : Wait for master controller service to start on first master] ****************************************
task path: /root/openshift-ansible/roles/openshift_master/tasks/main.yml:353
Pausing for 15 seconds
(ctrl+C then 'C' = continue early, ctrl+C then 'A' = abort)

Expected results:
playbook should complete

Additional info:
attached inventory and logs
See https://bugzilla.redhat.com/show_bug.cgi?id=1498934
We don't manage the instances, and they need to be labeled prior to running the installer.
Rob, just to clarify: if the instance is not tagged with the appropriate label, is there no configuration file for either the master or the node that would label the instance correctly?

The stance we took is that we'd rely on the master looking up the correct metadata, since that's the most important bit; adding this to a configuration file on the instance is just another config value that needs to be kept in sync.

-- Scott
Correct. The master will not label nodes under any circumstances, and it doesn't store that label information anywhere. On startup the master looks on its own instance for one of two specific labels, and if it finds one it uses that label's value. If it doesn't find the label (there's an old method and a new one), the controllers process will either exit or print a warning in its log file, depending on command line options.
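For anyone hitting this, a quick way to see what is actually tagged on a master is to list the instance's EC2 tags. Here is a minimal Ansible sketch (not part of openshift-ansible); it assumes boto and AWS credentials are available on the control host, that "masters" is the inventory group of master hosts, and it uses the stock ec2_facts and ec2_tag modules:

- hosts: masters
  tasks:
    # Read this instance's id and region from the EC2 metadata API
    # (only reachable from the instance itself).
    - ec2_facts:

    # List the tags attached to the instance; runs on the control host,
    # which is assumed to have boto and AWS credentials configured.
    - ec2_tag:
        region: "{{ ansible_ec2_placement_region }}"
        resource: "{{ ansible_ec2_instance_id }}"
        state: list
      delegate_to: localhost
      register: instance_tags

    # Show the tags so the presence or absence of the cluster label is obvious.
    - debug:
        var: instance_tags.tags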
Ok, I think ansible can gather the tags from the metadata API. I think what we'll do is query that API for all node and master hosts. If a node or master host has the AWS cloud provider configured and doesn't have a tag named "kubernetes.io/cluster/xxxx", we'll block the install and upgrade on 3.7 with a message that links to documentation. We need to get that documentation ready, explaining both how to properly label new installations and how to retroactively label existing installations. We'll do that in https://bugzilla.redhat.com/show_bug.cgi?id=1498643

*** This bug has been marked as a duplicate of bug 1491399 ***
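For reference, a rough sketch of what such a pre-flight gate could look like (hypothetical, not the actual check that landed in openshift-ansible). It reuses the instance_tags result registered in the sketch above, and the "kubernetes.io/cluster/<clusterid>" prefix and the wording of the message are assumptions based on this comment:

    # Fail early when no kubernetes.io/cluster/<clusterid> tag is present,
    # instead of letting the controllers crash-loop after installation.
    - fail:
        msg: >
          {{ inventory_hostname }} has no kubernetes.io/cluster/<clusterid>
          tag; label the instance before installing or upgrading to 3.7
          (see the linked documentation).
      when: >
        instance_tags.tags.keys()
        | select('match', '^kubernetes\.io/cluster/')
        | list | length == 0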