Bug 1495105 - [trello:yzMWezC1] Node service can be started even when net.ipv4.ip_forward is set to 0
Summary: [trello:yzMWezC1] Node service can be started even when net.ipv4.ip_forward is set to 0
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.7.0
Assignee: Phil Cameron
QA Contact: Meng Bo
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-09-25 08:29 UTC by Yan Du
Modified: 2017-11-28 22:12 UTC (History)
3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: No doc change needed
Consequence:
Fix:
Result:
Clone Of:
Environment:
Last Closed: 2017-11-28 22:12:28 UTC
Target Upstream Version:
Embargoed:


Attachments
node log (118.60 KB, text/x-vhdl)
2017-09-25 08:29 UTC, Yan Du
no flags


Links
System ID Private Priority Status Summary Last Updated
Origin (Github) 16939 0 None None None 2017-10-18 21:12:29 UTC
Red Hat Product Errata RHSA-2017:3188 0 normal SHIPPED_LIVE Moderate: Red Hat OpenShift Container Platform 3.7 security, bug, and enhancement update 2017-11-29 02:34:54 UTC

Description Yan Du 2017-09-25 08:29:12 UTC
Description of problem:
Node service can still be started when net.ipv4.ip_forward is set to 0.

Version-Release number of selected component (if applicable):
oc v3.7.0-0.126.4
kubernetes v1.7.0+80709908fd

How reproducible:
Always

Steps to Reproduce:
1. Set net.ipv4.ip_forward flag to 0
# sysctl -w net.ipv4.ip_forward=0
net.ipv4.ip_forward = 0

2. Overwrite the version note that openshift-sdn keeps in OVS table 253 (this forces SDN setup to re-run), then restart the node service
# ovs-ofctl mod-flows br0 "table=253, actions=note:01.ff" -O openflow13
# systemctl restart atomic-openshift-node
# systemctl status atomic-openshift-node

3. Check the flag again
4. Check the node log



Actual results:
Step 2: A warning reports that the node service failed to start, but the node actually starts normally.

# systemctl restart atomic-openshift-node
Job for atomic-openshift-node.service failed because the control process exited with error code. See "systemctl status atomic-openshift-node.service" and "journalctl -xe" for details.

# systemctl status atomic-openshift-node
● atomic-openshift-node.service - OpenShift Node
   Loaded: loaded (/etc/systemd/system/atomic-openshift-node.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/atomic-openshift-node.service.d
           └─openshift-sdn-ovs.conf
   Active: active (running) since Wed 2017-09-13 04:53:03 EDT; 1min 41s ago
     Docs: https://github.com/openshift/origin
  Process: 10791 ExecStopPost=/usr/bin/dbus-send --system --dest=uk.org.thekelleys.dnsmasq /uk/org/thekelleys/dnsmasq uk.org.thekelleys.SetDomainServers array:string: (code=exited, status=0/SUCCESS)
  Process: 10789 ExecStopPost=/usr/bin/rm /etc/dnsmasq.d/node-dnsmasq.conf (code=exited, status=0/SUCCESS)
  Process: 10797 ExecStartPre=/usr/bin/dbus-send --system --dest=uk.org.thekelleys.dnsmasq /uk/org/thekelleys/dnsmasq uk.org.thekelleys.SetDomainServers array:string:/in-addr.arpa/127.0.0.1,/cluster.local/127.0.0.1 (code=exited, status=0/SUCCESS)
  Process: 10795 ExecStartPre=/usr/bin/cp /etc/origin/node/node-dnsmasq.conf /etc/dnsmasq.d/ (code=exited, status=0/SUCCESS)
 Main PID: 10800 (openshift)
   Memory: 56.5M
   CGroup: /system.slice/atomic-openshift-node.service
           ├─10800 /usr/bin/openshift start node --config=/etc/origin/node/node-config.yaml --loglevel=2
           └─10971 journalctl -k -f
 
Sep 13 04:54:40 host-8-241-12.host.centralci.eng.rdu2.redhat.com atomic-openshift-node[10800]: I0913 04:54:40.023059   10800 helpers.go:125] Unable to get udp stats from pid 12051:...rectory
Sep 13 04:54:40 host-8-241-12.host.centralci.eng.rdu2.redhat.com atomic-openshift-node[10800]: I0913 04:54:40.023066   10800 helpers.go:132] Unable to get udp6 stats from pid 12051...rectory
Sep 13 04:54:41 host-8-241-12.host.centralci.eng.rdu2.redhat.com atomic-openshift-node[10800]: I0913 04:54:41.381690   10800 prober.go:106] Readiness probe for "nodejs-mongodb-example-1-j...
Sep 13 04:54:41 host-8-241-12.host.centralci.eng.rdu2.redhat.com atomic-openshift-node[10800]: I0913 04:54:41.385667   10800 prober.go:106] Readiness probe for "registry-console-1-...to host
Sep 13 04:54:41 host-8-241-12.host.centralci.eng.rdu2.redhat.com atomic-openshift-node[10800]: I0913 04:54:41.385719   10800 prober.go:106] Readiness probe for "docker-registry-1-l...to host
Sep 13 04:54:41 host-8-241-12.host.centralci.eng.rdu2.redhat.com atomic-openshift-node[10800]: I0913 04:54:41.886235   10800 proxy.go:199] hybrid proxy: (always) update ep kube-sch...g proxy
Sep 13 04:54:41 host-8-241-12.host.centralci.eng.rdu2.redhat.com atomic-openshift-node[10800]: I0913 04:54:41.886270   10800 proxy.go:221] hybrid proxy: update ep kube-scheduler in...n proxy
Sep 13 04:54:43 host-8-241-12.host.centralci.eng.rdu2.redhat.com atomic-openshift-node[10800]: I0913 04:54:43.891721   10800 proxy.go:199] hybrid proxy: (always) update ep kube-sch...g proxy
Sep 13 04:54:43 host-8-241-12.host.centralci.eng.rdu2.redhat.com atomic-openshift-node[10800]: I0913 04:54:43.891746   10800 proxy.go:221] hybrid proxy: update ep kube-scheduler in...n proxy
Sep 13 04:54:43 host-8-241-12.host.centralci.eng.rdu2.redhat.com atomic-openshift-node[10800]: W0913 04:54:43.974482   10800 helpers.go:771] eviction manager: no observation found ...ailable
Hint: Some lines were ellipsized, use -l to show in full.


Step 3: the flag is unchanged:
# sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 0
 

Expected results:
The node service should fail to start when net.ipv4.ip_forward is set to 0.


Additional info:
The node service fails to start at first; the error shows up in the CLI warning and in the node log:
12288 node.go:310] error: SDN node startup failed: net/ipv4/ip_forward=0, it must be set to 1

But after that it still eventually starts successfully, and checking the flag afterwards shows net.ipv4.ip_forward is still 0.


The full node log is attached.
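
For reference, here is a minimal standalone sketch of the kind of validation behind that error (hypothetical names; the actual Origin code differs):

package main

import (
	"fmt"
	"os"
	"strings"
)

// checkIPForward is a hypothetical stand-in for the validation behind the
// "net/ipv4/ip_forward=0, it must be set to 1" error above: read the
// sysctl via /proc and fail unless forwarding is enabled.
func checkIPForward() error {
	data, err := os.ReadFile("/proc/sys/net/ipv4/ip_forward")
	if err != nil {
		return fmt.Errorf("could not read ip_forward sysctl: %v", err)
	}
	if v := strings.TrimSpace(string(data)); v != "1" {
		return fmt.Errorf("net/ipv4/ip_forward=%s, it must be set to 1", v)
	}
	return nil
}

func main() {
	if err := checkIPForward(); err != nil {
		fmt.Fprintf(os.Stderr, "SDN node startup failed: %v\n", err)
		os.Exit(1) // exit non-zero so systemd marks the unit failed
	}
	fmt.Println("ip_forward check passed")
}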

Comment 1 Yan Du 2017-09-25 08:29:58 UTC
Created attachment 1330442 [details]
node log

Comment 2 Weibin Liang 2017-10-05 19:43:44 UTC
Easily reproduced in oc v3.7.0-0.143.1:

[root@qe-weliang-3-node-registry-router-1 ~]# systemctl status atomic-openshift-node
● atomic-openshift-node.service - OpenShift Node
   Loaded: loaded (/etc/systemd/system/atomic-openshift-node.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/atomic-openshift-node.service.d
           └─openshift-sdn-ovs.conf
   Active: active (running) since Thu 2017-10-05 14:11:17 EDT; 1h 25min ago
     Docs: https://github.com/openshift/origin
 Main PID: 17952 (openshift)
   Memory: 156.0M
   CGroup: /system.slice/atomic-openshift-node.service
           ├─17952 /usr/bin/openshift start node --config=/etc/origin/node/node-config.yaml --loglevel=2
           └─18049 journalctl -k -f

Oct 05 15:35:17 qe-weliang-3-node-registry-router-1 atomic-openshift-node[17952]: E1005 15:35:17.482335   17952 metrics.go:172] failed to parse max ARP entries "51... syntax
Oct 05 15:35:27 qe-weliang-3-node-registry-router-1 atomic-openshift-node[17952]: W1005 15:35:27.269045   17952 helpers.go:771] eviction manager: no observation fo...ailable
Oct 05 15:35:37 qe-weliang-3-node-registry-router-1 atomic-openshift-node[17952]: W1005 15:35:37.286035   17952 helpers.go:771] eviction manager: no observation fo...ailable
Oct 05 15:35:47 qe-weliang-3-node-registry-router-1 atomic-openshift-node[17952]: W1005 15:35:47.302938   17952 helpers.go:771] eviction manager: no observation fo...ailable
Oct 05 15:35:57 qe-weliang-3-node-registry-router-1 atomic-openshift-node[17952]: W1005 15:35:57.319182   17952 helpers.go:771] eviction manager: no observation fo...ailable
Oct 05 15:36:07 qe-weliang-3-node-registry-router-1 atomic-openshift-node[17952]: W1005 15:36:07.335624   17952 helpers.go:771] eviction manager: no observation fo...ailable
Oct 05 15:36:16 qe-weliang-3-node-registry-router-1 atomic-openshift-node[17952]: I1005 15:36:16.781789   17952 container_manager_linux.go:415] [ContainerManager]:...service
Oct 05 15:36:16 qe-weliang-3-node-registry-router-1 atomic-openshift-node[17952]: I1005 15:36:16.878004   17952 qos_container_manager_linux.go:286] [ContainerManag...uration
Oct 05 15:36:17 qe-weliang-3-node-registry-router-1 atomic-openshift-node[17952]: W1005 15:36:17.365614   17952 helpers.go:771] eviction manager: no observation fo...ailable
Oct 05 15:36:27 qe-weliang-3-node-registry-router-1 atomic-openshift-node[17952]: W1005 15:36:27.382848   17952 helpers.go:771] eviction manager: no observation fo...ailable
Hint: Some lines were ellipsized, use -l to show in full.
[root@qe-weliang-3-node-registry-router-1 ~]# sysctl -w net.ipv4.ip_forward=0
net.ipv4.ip_forward = 0
[root@qe-weliang-3-node-registry-router-1 ~]# systemctl restart atomic-openshift-node
[root@qe-weliang-3-node-registry-router-1 ~]# systemctl status atomic-openshift-node
● atomic-openshift-node.service - OpenShift Node
   Loaded: loaded (/etc/systemd/system/atomic-openshift-node.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/atomic-openshift-node.service.d
           └─openshift-sdn-ovs.conf
   Active: active (running) since Thu 2017-10-05 15:37:19 EDT; 7s ago
     Docs: https://github.com/openshift/origin
  Process: 19328 ExecStopPost=/usr/bin/dbus-send --system --dest=uk.org.thekelleys.dnsmasq /uk/org/thekelleys/dnsmasq uk.org.thekelleys.SetDomainServers array:string: (code=exited, status=0/SUCCESS)
  Process: 19326 ExecStopPost=/usr/bin/rm /etc/dnsmasq.d/node-dnsmasq.conf (code=exited, status=0/SUCCESS)
  Process: 19334 ExecStartPre=/usr/bin/dbus-send --system --dest=uk.org.thekelleys.dnsmasq /uk/org/thekelleys/dnsmasq uk.org.thekelleys.SetDomainServers array:string:/in-addr.arpa/127.0.0.1,/cluster.local/127.0.0.1 (code=exited, status=0/SUCCESS)
  Process: 19332 ExecStartPre=/usr/bin/cp /etc/origin/node/node-dnsmasq.conf /etc/dnsmasq.d/ (code=exited, status=0/SUCCESS)
 Main PID: 19337 (openshift)
   Memory: 48.5M
   CGroup: /system.slice/atomic-openshift-node.service
           ├─19337 /usr/bin/openshift start node --config=/etc/origin/node/node-config.yaml --loglevel=2
           └─19425 journalctl -k -f

Oct 05 15:37:24 qe-weliang-3-node-registry-router-1 atomic-openshift-node[19337]: I1005 15:37:24.433076   19337 proxy.go:221] hybrid proxy: update ep mongodb in main proxy
Oct 05 15:37:24 qe-weliang-3-node-registry-router-1 atomic-openshift-node[19337]: I1005 15:37:24.628573   19337 operation_generator.go:481] MountVolume.SetUp succe...c8e14")
Oct 05 15:37:24 qe-weliang-3-node-registry-router-1 atomic-openshift-node[19337]: I1005 15:37:24.828387   19337 operation_generator.go:481] MountVolume.SetUp succe...c8e14")
Oct 05 15:37:25 qe-weliang-3-node-registry-router-1 atomic-openshift-node[19337]: I1005 15:37:25.028211   19337 operation_generator.go:481] MountVolume.SetUp succe...c8e14")
Oct 05 15:37:25 qe-weliang-3-node-registry-router-1 atomic-openshift-node[19337]: I1005 15:37:25.028543   19337 operation_generator.go:481] MountVolume.SetUp succe...c8e14")
Oct 05 15:37:25 qe-weliang-3-node-registry-router-1 atomic-openshift-node[19337]: I1005 15:37:25.228339   19337 operation_generator.go:481] MountVolume.SetUp succe...c8e14")
Oct 05 15:37:25 qe-weliang-3-node-registry-router-1 atomic-openshift-node[19337]: I1005 15:37:25.428610   19337 operation_generator.go:481] MountVolume.SetUp succe...c8e14")
Oct 05 15:37:25 qe-weliang-3-node-registry-router-1 atomic-openshift-node[19337]: I1005 15:37:25.628186   19337 operation_generator.go:481] MountVolume.SetUp succe...c8e14")
Oct 05 15:37:25 qe-weliang-3-node-registry-router-1 atomic-openshift-node[19337]: I1005 15:37:25.628216   19337 operation_generator.go:481] MountVolume.SetUp succe...c8e14")
Oct 05 15:37:25 qe-weliang-3-node-registry-router-1 atomic-openshift-node[19337]: I1005 15:37:25.828470   19337 operation_generator.go:481] MountVolume.SetUp succe...c8e14")
Hint: Some lines were ellipsized, use -l to show in full.
[root@qe-weliang-3-node-registry-router-1 ~]# cat /proc/sys/net/ipv4/ip_forward
0
[root@qe-weliang-3-node-registry-router-1 ~]#

Comment 3 Phil Cameron 2017-10-18 21:13:07 UTC
PR 16939
https://github.com/openshift/origin/pull/16939

Comment 4 openshift-github-bot 2017-10-20 21:36:06 UTC
Commits pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/5b5776bd19b13907903d2fc4088019d8536778e6
Node service could be started when net.ipv4.ip_forward=0

The test for net.ipv4.ip_forward=0 was only run the first time
SetupSDN was run. This change runs the test every time SetupSDN
is run

Bug: 1495105
https://bugzilla.redhat.com/show_bug.cgi?id=1495105

https://github.com/openshift/origin/commit/d870a9e7cc39e74c84a60ef84553ef7f052e3f42
Merge pull request #16939 from pecameron/bz1495105

Automatic merge from submit-queue (batch tested with PRs 16912, 16931, 16939, 16967, 16978).

Node service could be started when net.ipv4.ip_forward=0

The test for net.ipv4.ip_forward=0 was only run the first time
SetupSDN was run. This change runs the test every time SetupSDN
is run

Bug: 1495105
https://bugzilla.redhat.com/show_bug.cgi?id=1495105
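
For context, a minimal sketch of the shape of the fix (assumed type and function names; the real SetupSDN in Origin differs). Before the fix, the sysctl check lived behind the "already set up" fast path, which a restart with intact OVS flows takes, so the check never ran; the fix hoists the check so it runs on every invocation:

package main

import (
	"fmt"
	"os"
	"strings"
)

// nodePlugin is a hypothetical stand-in for the SDN node controller.
type nodePlugin struct {
	flowsCurrent bool // stands in for "the OVS table 253 version note matches"
}

func (p *nodePlugin) alreadySetUp() bool { return p.flowsCurrent }
func (p *nodePlugin) fullSetup() error   { p.flowsCurrent = true; return nil }

// checkIPForward: same sketch as in the bug description above.
func checkIPForward() error {
	data, err := os.ReadFile("/proc/sys/net/ipv4/ip_forward")
	if err != nil {
		return err
	}
	if v := strings.TrimSpace(string(data)); v != "1" {
		return fmt.Errorf("net/ipv4/ip_forward=%s, it must be set to 1", v)
	}
	return nil
}

// SetupSDN runs the sysctl check unconditionally, before the fast path.
func (p *nodePlugin) SetupSDN() error {
	if err := checkIPForward(); err != nil {
		return fmt.Errorf("node SDN setup failed: %v", err)
	}
	if p.alreadySetUp() {
		return nil // pre-fix, returning here skipped the check on restart
	}
	return p.fullSetup()
}

func main() {
	p := &nodePlugin{flowsCurrent: true} // simulate a restart with flows intact
	if err := p.SetupSDN(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(255)
	}
	fmt.Println("SDN setup OK")
}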

Comment 5 Yan Du 2017-10-23 09:19:29 UTC
Tested on the latest OCP env:
openshift v3.7.0-0.174.0
kubernetes v1.7.6+a08f5eeb62

The node now fails to start when ip_forward=0 is set:

Oct 23 05:18:08 qe-gpei-testnode-registry-router-1 atomic-openshift-node[52942]: F1023 05:18:08.953905   52942 network.go:45] SDN node startup failed: node SDN setup failed: net/ipv4/ip_forward=0, it must be set to 1
Oct 23 05:18:08 qe-gpei-testnode-registry-router-1 systemd[1]: atomic-openshift-node.service: main process exited, code=exited, status=255/n/a
Oct 23 05:18:08 qe-gpei-testnode-registry-router-1 systemd[1]: Failed to start OpenShift Node.
Oct 23 05:18:08 qe-gpei-testnode-registry-router-1 systemd[1]: Unit atomic-openshift-node.service entered failed state.
Oct 23 05:18:08 qe-gpei-testnode-registry-router-1 systemd[1]: atomic-openshift-node.service failed.

Comment 7 Yan Du 2017-10-27 06:54:15 UTC
Verified the bug according to Comment 5.

Comment 11 errata-xmlrpc 2017-11-28 22:12:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188

