Bug 1544175 - Node service couldn't start when running as system container on AH-7.4.5
Summary: Node service couldn't start when running as system container on AH-7.4.5
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.9.0
Assignee: Giuseppe Scrivano
QA Contact: Gaoyun Pei
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-02-11 02:59 UTC by Gaoyun Pei
Modified: 2018-06-18 18:27 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-18 17:38:12 UTC
Target Upstream Version:
Embargoed:



Description Gaoyun Pei 2018-02-11 02:59:32 UTC
Description of problem:
Enabled OpenShift to run as system containers during an OCP 3.9 installation on AH-7.4.5, but the node service failed to start.

RUNNING HANDLER [openshift_node : restart node] ********************************
Sunday 11 February 2018  01:43:55 +0000 (0:00:00.043)       0:11:48.425 ******* 
FAILED - RETRYING: restart node (3 retries left).
FAILED - RETRYING: restart node (2 retries left).
FAILED - RETRYING: restart node (1 retries left).
fatal: [qe-gpei-trymaster-etcd-1.0211-x0e.qe.rhcloud.com]: FAILED! => {"attempts": 3, "changed": false, "msg": "Unable to restart service atomic-openshift-node: Job for atomic-openshift-node.service failed because the control process exited with error code. See \"systemctl status atomic-openshift-node.service\" and \"journalctl -xe\" for details.\n"}



Version-Release number of the following components:
openshift-ansible-3.9.0-0.42.0.git.0.1a9a61b.el7.noarch.rpm
ansible 2.4.2.0-2.el7

[root@qe-gpei-trymaster-etcd-1 ~]# atomic host status
State: idle
Deployments:
● rhel-atomic-host-ostree:rhel-atomic-host/7/x86_64/standard
                   Version: 7.4.5 (2018-02-06 20:55:50)
                    Commit: 04dc7e9e44994fe9d6c512128d417dc823dd28a8dac166af0990875a1c8a5e22
              GPGSignature: Valid signature by 567E347AD0044ADE55BA8A5F199E2F91FD431D51

[root@qe-gpei-trymaster-etcd-1 ~]# atomic images list |grep node
>  registry.reg-aws.openshift.com:443/openshift3/node   v3.9.0   24b61772d34c   2018-02-11 01:42   370.3 MB       ostree


How reproducible:
Always

Steps to Reproduce:
1. Set the following options to enable system containers:
openshift_use_system_containers=true
system_images_registry=x.x.x.x:443


Actual results:
[root@qe-gpei-trymaster-etcd-1 ~]# journalctl  -f -u  atomic-openshift-node.service
-- Logs begin at Sun 2018-02-11 01:28:47 UTC. --
Feb 11 01:49:33 qe-gpei-trymaster-etcd-1 atomic-openshift-node[24704]: I0211 01:49:33.861400   24714 iptables.go:101] Syncing openshift iptables rules
Feb 11 01:49:33 qe-gpei-trymaster-etcd-1 atomic-openshift-node[24704]: I0211 01:49:33.861433   24714 iptables.go:419] running iptables -N [OPENSHIFT-FIREWALL-FORWARD -t filter]
Feb 11 01:49:33 qe-gpei-trymaster-etcd-1 atomic-openshift-node[24704]: I0211 01:49:33.862102   24714 iptables.go:99] syncIPTableRules took 699.298µs
Feb 11 01:49:33 qe-gpei-trymaster-etcd-1 atomic-openshift-node[24704]: F0211 01:49:33.862122   24714 network.go:46] SDN node startup failed: failed to set up iptables: failed to ensure chain OPENSHIFT-FIREWALL-FORWARD exists: error creating chain "OPENSHIFT-FIREWALL-FORWARD": exit status 127: iptables: error while loading shared libraries: libip4tc.so.0: cannot open shared object file: Permission denied
Feb 11 01:49:33 qe-gpei-trymaster-etcd-1 systemd[1]: atomic-openshift-node.service: main process exited, code=exited, status=255/n/a
Feb 11 01:49:33 qe-gpei-trymaster-etcd-1 atomic-openshift-node[24741]: container "atomic-openshift-node" does not exist
Feb 11 01:49:33 qe-gpei-trymaster-etcd-1 systemd[1]: atomic-openshift-node.service: control process exited, code=exited status=1
Feb 11 01:49:33 qe-gpei-trymaster-etcd-1 systemd[1]: Failed to start atomic-openshift-node.service.
Feb 11 01:49:33 qe-gpei-trymaster-etcd-1 systemd[1]: Unit atomic-openshift-node.service entered failed state.
Feb 11 01:49:33 qe-gpei-trymaster-etcd-1 systemd[1]: atomic-openshift-node.service failed.

After setting SELinux to permissive mode, the node service could be started:
[root@qe-gpei-trymaster-etcd-1 ~]# getenforce
Enforcing
[root@qe-gpei-trymaster-etcd-1 ~]# setenforce 0
[root@qe-gpei-trymaster-etcd-1 ~]# systemctl status atomic-openshift-node.service
● atomic-openshift-node.service
   Loaded: loaded (/etc/systemd/system/atomic-openshift-node.service; enabled; vendor preset: disabled)
   Active: active (running) since Sun 2018-02-11 02:45:01 UTC; 54s ago
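The failure signature in the journal above (iptables inside the node container failing to load libip4tc.so.0 with "Permission denied") can be matched mechanically when triaging other hosts. This is a minimal sketch, not an official diagnostic: the function name has_iptables_lib_denial is hypothetical, and the regular expression is an assumption derived only from the log line in this report.

```shell
#!/usr/bin/env bash
# Hypothetical triage helper (not part of openshift-ansible or atomic).
# Reads journal text on stdin and returns 0 if it contains the failure
# signature seen in this bug: iptables cannot load a shared library
# because access to the file is denied.
# The pattern is an assumption based on the single log line in this report.
has_iptables_lib_denial() {
    grep -Eq "iptables: error while loading shared libraries: [^:]+: cannot open shared object file: Permission denied"
}
```

For example, `journalctl -u atomic-openshift-node.service --no-pager | has_iptables_lib_denial && getenforce` reports the SELinux mode only when the signature is present. A match while SELinux is Enforcing corresponds to the conditions in this report. Note that `setenforce 0` was used above only to show that SELinux is involved, not as the fix.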


Expected results:

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 5 Giuseppe Scrivano 2018-02-13 08:03:39 UTC
I am already on it. It is not an obvious issue, so I am still investigating the cause.

Comment 6 Giuseppe Scrivano 2018-02-13 10:07:20 UTC
Proposed fix:

https://github.com/projectatomic/atomic/pull/1185

Comment 7 Scott Dodson 2018-02-20 02:30:26 UTC
Waiting on new atomic build? Do we know when we'll see that?

Comment 8 Giuseppe Scrivano 2018-02-20 08:07:13 UTC
The change has been merged and the issue is fixed upstream. I will check when we can get this into a build.

Comment 9 Scott Dodson 2018-02-23 21:06:37 UTC
Should be fixed in atomic-1.22.1

Comment 10 Gaoyun Pei 2018-02-26 06:06:40 UTC
Thanks! I tried the latest AH-745 image [1], which still ships atomic-1.21.1-1.git1170769.el7.x86_64, and hit the same error as before. I will wait for a new AH-745 build that includes atomic-1.22.1 to verify this bug.


[1] # atomic host status
State: idle
Deployments:
● rhel-atomic-host-ostree:rhel-atomic-host/7/x86_64/standard
                   Version: 7.4.5 (2018-02-22 18:40:44)
                    Commit: e5bc41cb8a4c990382efc992e7dc96a609635edbad178e5a04589491eed97fee
              GPGSignature: Valid signature by 567E347AD0044ADE55BA8A5F199E2F91FD431D51

Comment 11 Gaoyun Pei 2018-03-05 06:37:14 UTC
Verified this bug with the following AH-745 image:

[root@qe-gpei-testbug2master-etcd-1 ~]# atomic host status
State: idle
Deployments:
● rhel-atomic-host-ostree:rhel-atomic-host/7/x86_64/standard
                   Version: 7.4.5 (2018-03-01 19:18:33)
                    Commit: 6cb4d618030f69aa4a5732aa0795cb7fe2c167725273cffa11d0357d80e5eef0
              GPGSignature: Valid signature by 567E347AD0044ADE55BA8A5F199E2F91FD431D51

[root@qe-gpei-testbug2master-etcd-1 ~]# rpm -q atomic
atomic-1.22.1-1.gitd36c015.el7.x86_64

[root@qe-gpei-testbug2master-etcd-1 ~]# rpm -qa |grep selinux
container-selinux-2.42-1.gitad8f0f7.el7.noarch
libselinux-utils-2.5-11.el7.x86_64
libselinux-2.5-11.el7.x86_64
selinux-policy-3.13.1-166.el7_4.9.noarch
selinux-policy-targeted-3.13.1-166.el7_4.9.noarch
libselinux-python-2.5-11.el7.x86_64


Enabled OpenShift and Docker to run as system containers; the installation succeeded.
[root@qe-gpei-testbug2master-etcd-1 ~]# runc list
ID                                    PID         STATUS      BUNDLE                                                             CREATED                          OWNER
atomic-openshift-master-api           21247       running     /var/lib/containers/atomic/atomic-openshift-master-api.0           2018-03-05T03:33:27.222317307Z   root
atomic-openshift-master-controllers   21253       running     /var/lib/containers/atomic/atomic-openshift-master-controllers.0   2018-03-05T03:33:27.23030748Z    root
atomic-openshift-node                 21750       running     /var/lib/containers/atomic/atomic-openshift-node.0                 2018-03-05T03:34:23.899001298Z   root
container-engine                      20155       running     /var/lib/containers/atomic/container-engine.0                      2018-03-05T03:32:50.913752138Z   root
etcd                                  21139       running     /var/lib/containers/atomic/etcd.0                                  2018-03-05T03:33:22.249690673Z   root
openvswitch                           21558       running     /var/lib/containers/atomic/openvswitch.0                           2018-03-05T03:34:17.080858625Z   root
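Comment 9 expects the fix in atomic-1.22.1, Comment 10 still saw the failure with atomic-1.21.1, and the verification above passed with atomic-1.22.1. The following is a small sketch for checking whether a host's `rpm -q atomic` output is at least 1.22.1. The helper names atomic_version and atomic_has_fix are hypothetical; the sketch assumes the NAME-VERSION-RELEASE.ARCH string format shown above and GNU `sort -V` for version ordering.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: check whether an installed atomic package is at
# least the release that the bug comments name as carrying the fix.
# Assumes input like `rpm -q atomic` output and GNU coreutils `sort -V`.

MIN_FIXED_VERSION="1.22.1"

# Print the VERSION field of e.g. "atomic-1.22.1-1.gitd36c015.el7.x86_64".
atomic_version() {
    local nvra="$1"
    local rest="${nvra#atomic-}"    # drop the "atomic-" name prefix
    printf '%s\n' "${rest%%-*}"     # keep everything before the first "-"
}

# Return 0 if the package version is >= MIN_FIXED_VERSION, 1 otherwise.
atomic_has_fix() {
    local version
    version="$(atomic_version "$1")"
    [ -n "$version" ] || return 1
    # sort -V orders version strings; if the minimum sorts first (or the
    # two are equal), the installed version is new enough.
    [ "$(printf '%s\n%s\n' "$MIN_FIXED_VERSION" "$version" | sort -V | head -n1)" = "$MIN_FIXED_VERSION" ]
}
```

On a host, `atomic_has_fix "$(rpm -q atomic)" && echo "atomic is 1.22.1 or newer"` would print the message for the package shown in the verification comment, and print nothing for atomic-1.21.1.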

