Description of problem:
With system containers enabled for OpenShift during an OCP 3.9 setup on AH-7.4.5, the node service fails to start.

RUNNING HANDLER [openshift_node : restart node] ********************************
Sunday 11 February 2018 01:43:55 +0000 (0:00:00.043) 0:11:48.425 *******
FAILED - RETRYING: restart node (3 retries left).
FAILED - RETRYING: restart node (2 retries left).
FAILED - RETRYING: restart node (1 retries left).
fatal: [qe-gpei-trymaster-etcd-1.0211-x0e.qe.rhcloud.com]: FAILED! => {"attempts": 3, "changed": false, "msg": "Unable to restart service atomic-openshift-node: Job for atomic-openshift-node.service failed because the control process exited with error code. See \"systemctl status atomic-openshift-node.service\" and \"journalctl -xe\" for details.\n"}

Version-Release number of the following components:
openshift-ansible-3.9.0-0.42.0.git.0.1a9a61b.el7.noarch.rpm
ansible 2.4.2.0-2.el7

[root@qe-gpei-trymaster-etcd-1 ~]# atomic host status
State: idle
Deployments:
● rhel-atomic-host-ostree:rhel-atomic-host/7/x86_64/standard
  Version: 7.4.5 (2018-02-06 20:55:50)
  Commit: 04dc7e9e44994fe9d6c512128d417dc823dd28a8dac166af0990875a1c8a5e22
  GPGSignature: Valid signature by 567E347AD0044ADE55BA8A5F199E2F91FD431D51

[root@qe-gpei-trymaster-etcd-1 ~]# atomic images list |grep node
> registry.reg-aws.openshift.com:443/openshift3/node v3.9.0 24b61772d34c 2018-02-11 01:42 370.3 MB ostree

How reproducible:
Always

Steps to Reproduce:
1. Set the following options to enable system containers:
openshift_use_system_containers=true
system_images_registry=x.x.x.x:443

Actual results:
[root@qe-gpei-trymaster-etcd-1 ~]# journalctl -f -u atomic-openshift-node.service
-- Logs begin at Sun 2018-02-11 01:28:47 UTC. --
Feb 11 01:49:33 qe-gpei-trymaster-etcd-1 atomic-openshift-node[24704]: I0211 01:49:33.861400 24714 iptables.go:101] Syncing openshift iptables rules
Feb 11 01:49:33 qe-gpei-trymaster-etcd-1 atomic-openshift-node[24704]: I0211 01:49:33.861433 24714 iptables.go:419] running iptables -N [OPENSHIFT-FIREWALL-FORWARD -t filter]
Feb 11 01:49:33 qe-gpei-trymaster-etcd-1 atomic-openshift-node[24704]: I0211 01:49:33.862102 24714 iptables.go:99] syncIPTableRules took 699.298µs
Feb 11 01:49:33 qe-gpei-trymaster-etcd-1 atomic-openshift-node[24704]: F0211 01:49:33.862122 24714 network.go:46] SDN node startup failed: failed to set up iptables: failed to ensure chain OPENSHIFT-FIREWALL-FORWARD exists: error creating chain "OPENSHIFT-FIREWALL-FORWARD": exit status 127: iptables: error while loading shared libraries: libip4tc.so.0: cannot open shared object file: Permission denied
Feb 11 01:49:33 qe-gpei-trymaster-etcd-1 systemd[1]: atomic-openshift-node.service: main process exited, code=exited, status=255/n/a
Feb 11 01:49:33 qe-gpei-trymaster-etcd-1 atomic-openshift-node[24741]: container "atomic-openshift-node" does not exist
Feb 11 01:49:33 qe-gpei-trymaster-etcd-1 systemd[1]: atomic-openshift-node.service: control process exited, code=exited status=1
Feb 11 01:49:33 qe-gpei-trymaster-etcd-1 systemd[1]: Failed to start atomic-openshift-node.service.
Feb 11 01:49:33 qe-gpei-trymaster-etcd-1 systemd[1]: Unit atomic-openshift-node.service entered failed state.
Feb 11 01:49:33 qe-gpei-trymaster-etcd-1 systemd[1]: atomic-openshift-node.service failed.

After setting SELinux to Permissive, the node service can be started.
[root@qe-gpei-trymaster-etcd-1 ~]# getenforce
Enforcing
[root@qe-gpei-trymaster-etcd-1 ~]# setenforce 0
[root@qe-gpei-trymaster-etcd-1 ~]# systemctl status atomic-openshift-node.service
● atomic-openshift-node.service
   Loaded: loaded (/etc/systemd/system/atomic-openshift-node.service; enabled; vendor preset: disabled)
   Active: active (running) since Sun 2018-02-11 02:45:01 UTC; 54s ago

Expected results:

Additional info:
Please attach logs from ansible-playbook with the -vvv flag
I am already on it. The cause is not obvious, so I am still investigating.
Proposed fix: https://github.com/projectatomic/atomic/pull/1185
Are we waiting on a new atomic build? Do we know when we'll see it?
The change has been merged and the issue is fixed upstream. I will check when we can get it into a build.
Should be fixed in atomic-1.22.1
Thanks! I tried the latest AH-745 image [1], which still ships atomic-1.21.1-1.git1170769.el7.x86_64, and hit the same error as before. I will wait for a new AH-745 build that includes atomic-1.22.1 to verify this bug.

[1]
# atomic host status
State: idle
Deployments:
● rhel-atomic-host-ostree:rhel-atomic-host/7/x86_64/standard
  Version: 7.4.5 (2018-02-22 18:40:44)
  Commit: e5bc41cb8a4c990382efc992e7dc96a609635edbad178e5a04589491eed97fee
  GPGSignature: Valid signature by 567E347AD0044ADE55BA8A5F199E2F91FD431D51
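For anyone scripting this check before re-testing, here is a minimal sketch that decides whether an installed atomic package is at or above the 1.22.1 version named earlier in this thread. The function names are hypothetical, and the comparison only looks at the dotted numeric version; it is not a full RPM version comparison and ignores the release field.

```python
import re

# Version named in this thread as carrying the fix.
FIXED_VERSION = (1, 22, 1)


def atomic_version(package):
    """Return the numeric version tuple from an `rpm -q atomic` string,
    e.g. 'atomic-1.21.1-1.git1170769.el7.x86_64' -> (1, 21, 1)."""
    match = re.match(r"atomic-(\d+(?:\.\d+)*)-", package)
    if match is None:
        raise ValueError(f"unrecognized package string: {package!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def has_fix(package):
    """True if the package version is at least FIXED_VERSION."""
    return atomic_version(package) >= FIXED_VERSION
```

Feeding it the output of `rpm -q atomic` from the host under test is enough to tell whether the image is worth re-running the installer on.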
Verified this bug with the following AH-745 image.

[root@qe-gpei-testbug2master-etcd-1 ~]# atomic host status
State: idle
Deployments:
● rhel-atomic-host-ostree:rhel-atomic-host/7/x86_64/standard
  Version: 7.4.5 (2018-03-01 19:18:33)
  Commit: 6cb4d618030f69aa4a5732aa0795cb7fe2c167725273cffa11d0357d80e5eef0
  GPGSignature: Valid signature by 567E347AD0044ADE55BA8A5F199E2F91FD431D51

[root@qe-gpei-testbug2master-etcd-1 ~]# rpm -q atomic
atomic-1.22.1-1.gitd36c015.el7.x86_64

[root@qe-gpei-testbug2master-etcd-1 ~]# rpm -qa |grep selinux
container-selinux-2.42-1.gitad8f0f7.el7.noarch
libselinux-utils-2.5-11.el7.x86_64
libselinux-2.5-11.el7.x86_64
selinux-policy-3.13.1-166.el7_4.9.noarch
selinux-policy-targeted-3.13.1-166.el7_4.9.noarch
libselinux-python-2.5-11.el7.x86_64

With system containers enabled for both OpenShift and docker, the installation succeeds.

[root@qe-gpei-testbug2master-etcd-1 ~]# runc list
ID                                   PID    STATUS   BUNDLE                                                            CREATED                         OWNER
atomic-openshift-master-api          21247  running  /var/lib/containers/atomic/atomic-openshift-master-api.0          2018-03-05T03:33:27.222317307Z  root
atomic-openshift-master-controllers  21253  running  /var/lib/containers/atomic/atomic-openshift-master-controllers.0  2018-03-05T03:33:27.23030748Z   root
atomic-openshift-node                21750  running  /var/lib/containers/atomic/atomic-openshift-node.0                2018-03-05T03:34:23.899001298Z  root
container-engine                     20155  running  /var/lib/containers/atomic/container-engine.0                     2018-03-05T03:32:50.913752138Z  root
etcd                                 21139  running  /var/lib/containers/atomic/etcd.0                                 2018-03-05T03:33:22.249690673Z  root
openvswitch                          21558  running  /var/lib/containers/atomic/openvswitch.0                          2018-03-05T03:34:17.080858625Z  root
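The `runc list` check above can be automated. The following is a rough sketch, assuming the whitespace-separated column layout shown in this comment (ID first, STATUS third) and an expected set of container IDs taken from that same output. The helper names are hypothetical.

```python
# Container IDs seen running in the verification output in this comment.
EXPECTED = {
    "atomic-openshift-master-api",
    "atomic-openshift-master-controllers",
    "atomic-openshift-node",
    "container-engine",
    "etcd",
    "openvswitch",
}


def running_containers(runc_list_output):
    """Return the IDs whose STATUS column reads 'running'.

    The header row is skipped naturally: its third field is 'STATUS'.
    """
    running = set()
    for line in runc_list_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "running":
            running.add(fields[0])
    return running


def missing_containers(runc_list_output, expected=EXPECTED):
    """Return expected container IDs that are not reported as running."""
    return set(expected) - running_containers(runc_list_output)
```

An empty result from `missing_containers` on the captured `runc list` text corresponds to the successful state shown above; on the failing hosts, `atomic-openshift-node` would be expected to show up as missing.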