Description of problem: Node service could be started when set net.ipv4.ip_forward = 0 Version-Release number of selected component (if applicable): oc v3.7.0-0.126.4 kubernetes v1.7.0+80709908fd How reproducible: Always Steps to Reproduce: 1. Set net.ipv4.ip_forward flag to 0 # sysctl -w net.ipv4.ip_forward=0 net.ipv4.ip_forward = 0 2. Restart node service # ovs-ofctl mod-flows br0 "table=253, actions=note:01.ff" -O openflow13 # systemctl restart atomic-openshift-node # systemctl status atomic-openshift-node 3. check the flag again 4. Check the node log Actual results: step2: Could get warning for failing to start node service but actually the node could start normally # systemctl restart atomic-openshift-node Job for atomic-openshift-node.service failed because the control process exited with error code. See "systemctl status atomic-openshift-node.service" and "journalctl -xe" for details. # systemctl status atomic-openshift-node ● atomic-openshift-node.service - OpenShift Node Loaded: loaded (/etc/systemd/system/atomic-openshift-node.service; enabled; vendor preset: disabled) Drop-In: /usr/lib/systemd/system/atomic-openshift-node.service.d └─openshift-sdn-ovs.conf Active: active (running) since Wed 2017-09-13 04:53:03 EDT; 1min 41s ago Docs: https://github.com/openshift/origin Process: 10791 ExecStopPost=/usr/bin/dbus-send --system --dest=uk.org.thekelleys.dnsmasq /uk/org/thekelleys/dnsmasq uk.org.thekelleys.SetDomainServers array:string: (code=exited, status=0/SUCCESS) Process: 10789 ExecStopPost=/usr/bin/rm /etc/dnsmasq.d/node-dnsmasq.conf (code=exited, status=0/SUCCESS) Process: 10797 ExecStartPre=/usr/bin/dbus-send --system --dest=uk.org.thekelleys.dnsmasq /uk/org/thekelleys/dnsmasq uk.org.thekelleys.SetDomainServers array:string:/in-addr.arpa/127.0.0.1,/cluster.local/127.0.0.1 (code=exited, status=0/SUCCESS) Process: 10795 ExecStartPre=/usr/bin/cp /etc/origin/node/node-dnsmasq.conf /etc/dnsmasq.d/ (code=exited, status=0/SUCCESS) Main PID: 10800 (openshift) Memory: 56.5M CGroup: /system.slice/atomic-openshift-node.service ├─10800 /usr/bin/openshift start node --config=/etc/origin/node/node-config.yaml --loglevel=2 └─10971 journalctl -k -f Sep 13 04:54:40 host-8-241-12.host.centralci.eng.rdu2.redhat.com atomic-openshift-node[10800]: I0913 04:54:40.023059 10800 helpers.go:125] Unable to get udp stats from pid 12051:...rectory Sep 13 04:54:40 host-8-241-12.host.centralci.eng.rdu2.redhat.com atomic-openshift-node[10800]: I0913 04:54:40.023066 10800 helpers.go:132] Unable to get udp6 stats from pid 12051...rectory Sep 13 04:54:41 host-8-241-12.host.centralci.eng.rdu2.redhat.com atomic-openshift-node[10800]: I0913 04:54:41.381690 10800 prober.go:106] Readiness probe for "nodejs-mongodb-example-1-j... Sep 13 04:54:41 host-8-241-12.host.centralci.eng.rdu2.redhat.com atomic-openshift-node[10800]: I0913 04:54:41.385667 10800 prober.go:106] Readiness probe for "registry-console-1-...to host Sep 13 04:54:41 host-8-241-12.host.centralci.eng.rdu2.redhat.com atomic-openshift-node[10800]: I0913 04:54:41.385719 10800 prober.go:106] Readiness probe for "docker-registry-1-l...to host Sep 13 04:54:41 host-8-241-12.host.centralci.eng.rdu2.redhat.com atomic-openshift-node[10800]: I0913 04:54:41.886235 10800 proxy.go:199] hybrid proxy: (always) update ep kube-sch...g proxy Sep 13 04:54:41 host-8-241-12.host.centralci.eng.rdu2.redhat.com atomic-openshift-node[10800]: I0913 04:54:41.886270 10800 proxy.go:221] hybrid proxy: update ep kube-scheduler in...n proxy Sep 13 04:54:43 host-8-241-12.host.centralci.eng.rdu2.redhat.com atomic-openshift-node[10800]: I0913 04:54:43.891721 10800 proxy.go:199] hybrid proxy: (always) update ep kube-sch...g proxy Sep 13 04:54:43 host-8-241-12.host.centralci.eng.rdu2.redhat.com atomic-openshift-node[10800]: I0913 04:54:43.891746 10800 proxy.go:221] hybrid proxy: update ep kube-scheduler in...n proxy Sep 13 04:54:43 host-8-241-12.host.centralci.eng.rdu2.redhat.com atomic-openshift-node[10800]: W0913 04:54:43.974482 10800 helpers.go:771] eviction manager: no observation found ...ailable Hint: Some lines were ellipsized, use -l to show in full. step3: # sysctl net.ipv4.ip_forward net.ipv4.ip_forward = 0 Expected results: node service could not start when set net.ipv4.ip_forward to 0 Additional info: Node service start failed at first, we could get the error from cli warning and node log: 12288 node.go:310] error: SDN node startup failed: net/ipv4/ip_forward=0, it must be set to 1 But after that, it still could start successfully finally, and check the flag, et.ipv4.ip_forward is 0 after node started Full node log have attached.
Created attachment 1330442 [details] node log
Easily reproduced in oc v3.7.0-0.143.1: [root@qe-weliang-3-node-registry-router-1 ~]# systemctl status atomic-openshift-node ● atomic-openshift-node.service - OpenShift Node Loaded: loaded (/etc/systemd/system/atomic-openshift-node.service; enabled; vendor preset: disabled) Drop-In: /usr/lib/systemd/system/atomic-openshift-node.service.d └─openshift-sdn-ovs.conf Active: active (running) since Thu 2017-10-05 14:11:17 EDT; 1h 25min ago Docs: https://github.com/openshift/origin Main PID: 17952 (openshift) Memory: 156.0M CGroup: /system.slice/atomic-openshift-node.service ├─17952 /usr/bin/openshift start node --config=/etc/origin/node/node-config.yaml --loglevel=2 └─18049 journalctl -k -f Oct 05 15:35:17 qe-weliang-3-node-registry-router-1 atomic-openshift-node[17952]: E1005 15:35:17.482335 17952 metrics.go:172] failed to parse max ARP entries "51... syntax Oct 05 15:35:27 qe-weliang-3-node-registry-router-1 atomic-openshift-node[17952]: W1005 15:35:27.269045 17952 helpers.go:771] eviction manager: no observation fo...ailable Oct 05 15:35:37 qe-weliang-3-node-registry-router-1 atomic-openshift-node[17952]: W1005 15:35:37.286035 17952 helpers.go:771] eviction manager: no observation fo...ailable Oct 05 15:35:47 qe-weliang-3-node-registry-router-1 atomic-openshift-node[17952]: W1005 15:35:47.302938 17952 helpers.go:771] eviction manager: no observation fo...ailable Oct 05 15:35:57 qe-weliang-3-node-registry-router-1 atomic-openshift-node[17952]: W1005 15:35:57.319182 17952 helpers.go:771] eviction manager: no observation fo...ailable Oct 05 15:36:07 qe-weliang-3-node-registry-router-1 atomic-openshift-node[17952]: W1005 15:36:07.335624 17952 helpers.go:771] eviction manager: no observation fo...ailable Oct 05 15:36:16 qe-weliang-3-node-registry-router-1 atomic-openshift-node[17952]: I1005 15:36:16.781789 17952 container_manager_linux.go:415] [ContainerManager]:...service Oct 05 15:36:16 qe-weliang-3-node-registry-router-1 atomic-openshift-node[17952]: I1005 15:36:16.878004 17952 qos_container_manager_linux.go:286] [ContainerManag...uration Oct 05 15:36:17 qe-weliang-3-node-registry-router-1 atomic-openshift-node[17952]: W1005 15:36:17.365614 17952 helpers.go:771] eviction manager: no observation fo...ailable Oct 05 15:36:27 qe-weliang-3-node-registry-router-1 atomic-openshift-node[17952]: W1005 15:36:27.382848 17952 helpers.go:771] eviction manager: no observation fo...ailable Hint: Some lines were ellipsized, use -l to show in full. [root@qe-weliang-3-node-registry-router-1 ~]# sysctl -w net.ipv4.ip_forward=0 net.ipv4.ip_forward = 0 [root@qe-weliang-3-node-registry-router-1 ~]# systemctl restart atomic-openshift-node [root@qe-weliang-3-node-registry-router-1 ~]# systemctl status atomic-openshift-node ● atomic-openshift-node.service - OpenShift Node Loaded: loaded (/etc/systemd/system/atomic-openshift-node.service; enabled; vendor preset: disabled) Drop-In: /usr/lib/systemd/system/atomic-openshift-node.service.d └─openshift-sdn-ovs.conf Active: active (running) since Thu 2017-10-05 15:37:19 EDT; 7s ago Docs: https://github.com/openshift/origin Process: 19328 ExecStopPost=/usr/bin/dbus-send --system --dest=uk.org.thekelleys.dnsmasq /uk/org/thekelleys/dnsmasq uk.org.thekelleys.SetDomainServers array:string: (code=exited, status=0/SUCCESS) Process: 19326 ExecStopPost=/usr/bin/rm /etc/dnsmasq.d/node-dnsmasq.conf (code=exited, status=0/SUCCESS) Process: 19334 ExecStartPre=/usr/bin/dbus-send --system --dest=uk.org.thekelleys.dnsmasq /uk/org/thekelleys/dnsmasq uk.org.thekelleys.SetDomainServers array:string:/in-addr.arpa/127.0.0.1,/cluster.local/127.0.0.1 (code=exited, status=0/SUCCESS) Process: 19332 ExecStartPre=/usr/bin/cp /etc/origin/node/node-dnsmasq.conf /etc/dnsmasq.d/ (code=exited, status=0/SUCCESS) Main PID: 19337 (openshift) Memory: 48.5M CGroup: /system.slice/atomic-openshift-node.service ├─19337 /usr/bin/openshift start node --config=/etc/origin/node/node-config.yaml --loglevel=2 └─19425 journalctl -k -f Oct 05 15:37:24 qe-weliang-3-node-registry-router-1 atomic-openshift-node[19337]: I1005 15:37:24.433076 19337 proxy.go:221] hybrid proxy: update ep mongodb in main proxy Oct 05 15:37:24 qe-weliang-3-node-registry-router-1 atomic-openshift-node[19337]: I1005 15:37:24.628573 19337 operation_generator.go:481] MountVolume.SetUp succe...c8e14") Oct 05 15:37:24 qe-weliang-3-node-registry-router-1 atomic-openshift-node[19337]: I1005 15:37:24.828387 19337 operation_generator.go:481] MountVolume.SetUp succe...c8e14") Oct 05 15:37:25 qe-weliang-3-node-registry-router-1 atomic-openshift-node[19337]: I1005 15:37:25.028211 19337 operation_generator.go:481] MountVolume.SetUp succe...c8e14") Oct 05 15:37:25 qe-weliang-3-node-registry-router-1 atomic-openshift-node[19337]: I1005 15:37:25.028543 19337 operation_generator.go:481] MountVolume.SetUp succe...c8e14") Oct 05 15:37:25 qe-weliang-3-node-registry-router-1 atomic-openshift-node[19337]: I1005 15:37:25.228339 19337 operation_generator.go:481] MountVolume.SetUp succe...c8e14") Oct 05 15:37:25 qe-weliang-3-node-registry-router-1 atomic-openshift-node[19337]: I1005 15:37:25.428610 19337 operation_generator.go:481] MountVolume.SetUp succe...c8e14") Oct 05 15:37:25 qe-weliang-3-node-registry-router-1 atomic-openshift-node[19337]: I1005 15:37:25.628186 19337 operation_generator.go:481] MountVolume.SetUp succe...c8e14") Oct 05 15:37:25 qe-weliang-3-node-registry-router-1 atomic-openshift-node[19337]: I1005 15:37:25.628216 19337 operation_generator.go:481] MountVolume.SetUp succe...c8e14") Oct 05 15:37:25 qe-weliang-3-node-registry-router-1 atomic-openshift-node[19337]: I1005 15:37:25.828470 19337 operation_generator.go:481] MountVolume.SetUp succe...c8e14") Hint: Some lines were ellipsized, use -l to show in full. [root@qe-weliang-3-node-registry-router-1 ~]# cat /proc/sys/net/ipv4/ip_forward 0 [root@qe-weliang-3-node-registry-router-1 ~]#
PR 16939 https://github.com/openshift/origin/pull/16939
Commits pushed to master at https://github.com/openshift/origin https://github.com/openshift/origin/commit/5b5776bd19b13907903d2fc4088019d8536778e6 Node service could be started when net.ipv4.ip_forward=0 The test for net.ipv4.ip_forward=0 was only run the first time SetupSDN was run. This change runs the test every time SetupSDN is run Bug: 1495105 https://bugzilla.redhat.com/show_bug.cgi?id=1495105 https://github.com/openshift/origin/commit/d870a9e7cc39e74c84a60ef84553ef7f052e3f42 Merge pull request #16939 from pecameron/bz1495105 Automatic merge from submit-queue (batch tested with PRs 16912, 16931, 16939, 16967, 16978). Node service could be started when net.ipv4.ip_forward=0 The test for net.ipv4.ip_forward=0 was only run the first time SetupSDN was run. This change runs the test every time SetupSDN is run Bug: 1495105 https://bugzilla.redhat.com/show_bug.cgi?id=1495105
Test on latest OCP env openshift v3.7.0-0.174.0 kubernetes v1.7.6+a08f5eeb62 Fail to start node when setting ip_forward=0: Oct 23 05:18:08 qe-gpei-testnode-registry-router-1 atomic-openshift-node[52942]: F1023 05:18:08.953905 52942 network.go:45] SDN node startup failed: node SDN setup failed: net/ipv4/ip_forward=0, it must be set to 1 Oct 23 05:18:08 qe-gpei-testnode-registry-router-1 systemd[1]: atomic-openshift-node.service: main process exited, code=exited, status=255/n/a Oct 23 05:18:08 qe-gpei-testnode-registry-router-1 systemd[1]: Failed to start OpenShift Node. Oct 23 05:18:08 qe-gpei-testnode-registry-router-1 systemd[1]: Unit atomic-openshift-node.service entered failed state. Oct 23 05:18:08 qe-gpei-testnode-registry-router-1 systemd[1]: atomic-openshift-node.service failed.
Verify the bug according #Comments 5
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:3188