Created attachment 1419756
install_log

Description of problem:
Install failed due to "Node start failed"

Version-Release number of the following components:
openshift-ansible-3.10.0-0.16.0.git.0.8925606.el7.noarch.rpm

How reproducible:
Always

Steps to Reproduce:
1. Install OCP 3.10
$ ansible-playbook playbooks/deploy_cluster.yml

Actual results:
TASK [openshift_node : debug] **************************************************
Tuesday 10 April 2018 02:58:17 -0400 (0:00:00.461) 0:15:38.320 *********
skipping: [shared-wmeng3107nsc-master-etcd-1.0410-2li.qe.rhcloud.com] => {"skip_reason": "Conditional result was False"}
ok: [shared-wmeng3107nsc-nrr-1.0410-2li.qe.rhcloud.com] => {
    "msg": [
        "-- Logs begin at Tue 2018-04-10 02:37:58 EDT, end at Tue 2018-04-10 02:59:01 EDT. --",
        "Apr 10 02:54:02 shared-wmeng3107nsc-nrr-2 atomic-openshift-node[18333]: I0410 02:54:02.204507 18333 bootstrap.go:58] Using bootstrap kubeconfig to generate TLS client cert, key and kubeconfig file",
        "Apr 10 02:59:01 shared-wmeng3107nsc-nrr-2 systemd[1]: atomic-openshift-node.service start operation timed out. Terminating.",
        "Apr 10 02:59:01 shared-wmeng3107nsc-nrr-2 systemd[1]: Failed to start OpenShift Node.",
        "Apr 10 02:59:01 shared-wmeng3107nsc-nrr-2 systemd[1]: Unit atomic-openshift-node.service entered failed state.",
        "Apr 10 02:59:01 shared-wmeng3107nsc-nrr-2 systemd[1]: atomic-openshift-node.service failed."
    ]
}

TASK [openshift_node : fail] ***************************************************
Tuesday 10 April 2018 02:58:17 -0400 (0:00:00.078) 0:15:38.398 *********
skipping: [shared-wmeng3107nsc-master-etcd-1.0410-2li.qe.rhcloud.com] => {"changed": false, "skip_reason": "Conditional result was False", "skipped": true}
fatal: [shared-wmeng3107nsc-nrr-1.0410-2li.qe.rhcloud.com]: FAILED! => {"changed": false, "failed": true, "msg": "Node start failed."}
fatal: [shared-wmeng3107nsc-nrr-2.0410-2li.qe.rhcloud.com]: FAILED! => {"changed": false, "failed": true, "msg": "Node start failed."}

More logs will be attached.

Expected results:
Install succeeds

Additional info:
The log shows a failed start, but when I check on the host, the service is running.

[root@shared-wmeng3107nsc-nrr-2 ~]# systemctl status atomic-openshift-node.service
● atomic-openshift-node.service - OpenShift Node
   Loaded: loaded (/etc/systemd/system/atomic-openshift-node.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/atomic-openshift-node.service.d
           └─override.conf
   Active: active (running) since 二 2018-04-10 02:59:07 EDT; 26min ago
     Docs: https://github.com/openshift/origin
 Main PID: 18377 (hyperkube)
   Memory: 50.8M
   CGroup: /system.slice/atomic-openshift-node.service
           └─18377 /usr/bin/hyperkube kubelet --v=5 --address=0.0.0.0 --allow-privileged=true --anonymous-auth=true --authentication-token-webhook=true --authentication-token-webhook-cache-ttl=5m --authorizat...

4月 10 03:25:23 shared-wmeng3107nsc-nrr-2 atomic-openshift-node[18377]: I0410 03:25:23.350262 18377 config.go:99] Looking for [api file], have seen map[file:{} api:{}]
4月 10 03:25:23 shared-wmeng3107nsc-nrr-2 atomic-openshift-node[18377]: I0410 03:25:23.350326 18377 kubelet.go:1924] SyncLoop (housekeeping)
4月 10 03:25:24 shared-wmeng3107nsc-nrr-2 atomic-openshift-node[18377]: I0410 03:25:24.284880 18377 generic.go:183] GenericPLEG: Relisting
4月 10 03:25:25 shared-wmeng3107nsc-nrr-2 atomic-openshift-node[18377]: I0410 03:25:25.291076 18377 generic.go:183] GenericPLEG: Relisting
4月 10 03:25:25 shared-wmeng3107nsc-nrr-2 atomic-openshift-node[18377]: I0410 03:25:25.350239 18377 config.go:99] Looking for [api file], have seen map[file:{} api:{}]
4月 10 03:25:25 shared-wmeng3107nsc-nrr-2 atomic-openshift-node[18377]: I0410 03:25:25.350296 18377 kubelet.go:1924] SyncLoop (housekeeping)
4月 10 03:25:25 shared-wmeng3107nsc-nrr-2 atomic-openshift-node[18377]: W0410 03:25:25.394184 18377 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
4月 10 03:25:25 shared-wmeng3107nsc-nrr-2 atomic-openshift-node[18377]: I0410 03:25:25.394313 18377 kubelet.go:2103] Container runtime status: Runtime Conditions: RuntimeReady=true reason: messa...initialized
4月 10 03:25:25 shared-wmeng3107nsc-nrr-2 atomic-openshift-node[18377]: E0410 03:25:25.394334 18377 kubelet.go:2106] Container runtime network not ready: NetworkReady=false reason:NetworkPluginN...initialized
4月 10 03:25:26 shared-wmeng3107nsc-nrr-2 atomic-openshift-node[18377]: I0410 03:25:26.297177 18377 generic.go:183] GenericPLEG: Relisting
Hint: Some lines were ellipsized, use -l to show in full.
This happens because systemd kills the node process once startup exceeds TimeoutStartSec=300, which is defined in /etc/systemd/system/atomic-openshift-node.service.
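As a quick sanity check on the timeout explanation, the sketch below compares the two journal timestamps quoted in the report (the first node log line and the systemd "start operation timed out" line) against the 300-second limit. The variable and function names are made up for illustration, and the first node log line is only assumed to be close to the moment systemd began starting the unit.

```python
from datetime import datetime

# Timestamps copied from the journal excerpt in this report (EDT, 2018-04-10).
FIRST_NODE_LOG = "2018-04-10 02:54:02"   # bootstrap.go:58 line from PID 18333
START_TIMED_OUT = "2018-04-10 02:59:01"  # systemd: "start operation timed out"
TIMEOUT_START_SEC = 300                  # value cited from the unit file


def seconds_between(start: str, end: str) -> int:
    """Return the whole seconds elapsed between two 'YYYY-MM-DD HH:MM:SS' stamps."""
    fmt = "%Y-%m-%d %H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


elapsed = seconds_between(FIRST_NODE_LOG, START_TIMED_OUT)
# Assumption: the unit start began at or slightly before the first node log line,
# so an elapsed time just under the limit is consistent with a start timeout.
print(f"elapsed={elapsed}s, limit={TIMEOUT_START_SEC}s")
```

The gap comes out just under the limit, which fits a start timeout rather than a crash in the node process itself.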
This should now be resolved on master; let's re-test.
Fixed in openshift-ansible-3.10.0-0.22.0.git.0.b6ec617.el7.noarch.rpm.
I am seeing this in 3.11. Can I get more information about what is causing it?