Bug 1495105
| Summary: | [trello:yzMWezC1] Node service could be started when set net.ipv4.ip_forward = 0 | ||||||
|---|---|---|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Yan Du <yadu> | ||||
| Component: | Networking | Assignee: | Phil Cameron <pcameron> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Meng Bo <bmeng> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | medium | ||||||
| Version: | 3.7.0 | CC: | aos-bugs, bbennett, weliang | ||||
| Target Milestone: | --- | ||||||
| Target Release: | 3.7.0 | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: |
Cause: No doc change needed
Consequence:
Fix:
Result:
|
Story Points: | --- | ||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2017-11-28 22:12:28 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
|
||||||
Created attachment 1330442 [details]
node log
Easily reproduced in oc v3.7.0-0.143.1:
[root@qe-weliang-3-node-registry-router-1 ~]# systemctl status atomic-openshift-node
● atomic-openshift-node.service - OpenShift Node
Loaded: loaded (/etc/systemd/system/atomic-openshift-node.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/atomic-openshift-node.service.d
└─openshift-sdn-ovs.conf
Active: active (running) since Thu 2017-10-05 14:11:17 EDT; 1h 25min ago
Docs: https://github.com/openshift/origin
Main PID: 17952 (openshift)
Memory: 156.0M
CGroup: /system.slice/atomic-openshift-node.service
├─17952 /usr/bin/openshift start node --config=/etc/origin/node/node-config.yaml --loglevel=2
└─18049 journalctl -k -f
Oct 05 15:35:17 qe-weliang-3-node-registry-router-1 atomic-openshift-node[17952]: E1005 15:35:17.482335 17952 metrics.go:172] failed to parse max ARP entries "51... syntax
Oct 05 15:35:27 qe-weliang-3-node-registry-router-1 atomic-openshift-node[17952]: W1005 15:35:27.269045 17952 helpers.go:771] eviction manager: no observation fo...ailable
Oct 05 15:35:37 qe-weliang-3-node-registry-router-1 atomic-openshift-node[17952]: W1005 15:35:37.286035 17952 helpers.go:771] eviction manager: no observation fo...ailable
Oct 05 15:35:47 qe-weliang-3-node-registry-router-1 atomic-openshift-node[17952]: W1005 15:35:47.302938 17952 helpers.go:771] eviction manager: no observation fo...ailable
Oct 05 15:35:57 qe-weliang-3-node-registry-router-1 atomic-openshift-node[17952]: W1005 15:35:57.319182 17952 helpers.go:771] eviction manager: no observation fo...ailable
Oct 05 15:36:07 qe-weliang-3-node-registry-router-1 atomic-openshift-node[17952]: W1005 15:36:07.335624 17952 helpers.go:771] eviction manager: no observation fo...ailable
Oct 05 15:36:16 qe-weliang-3-node-registry-router-1 atomic-openshift-node[17952]: I1005 15:36:16.781789 17952 container_manager_linux.go:415] [ContainerManager]:...service
Oct 05 15:36:16 qe-weliang-3-node-registry-router-1 atomic-openshift-node[17952]: I1005 15:36:16.878004 17952 qos_container_manager_linux.go:286] [ContainerManag...uration
Oct 05 15:36:17 qe-weliang-3-node-registry-router-1 atomic-openshift-node[17952]: W1005 15:36:17.365614 17952 helpers.go:771] eviction manager: no observation fo...ailable
Oct 05 15:36:27 qe-weliang-3-node-registry-router-1 atomic-openshift-node[17952]: W1005 15:36:27.382848 17952 helpers.go:771] eviction manager: no observation fo...ailable
Hint: Some lines were ellipsized, use -l to show in full.
[root@qe-weliang-3-node-registry-router-1 ~]# sysctl -w net.ipv4.ip_forward=0
net.ipv4.ip_forward = 0
[root@qe-weliang-3-node-registry-router-1 ~]# systemctl restart atomic-openshift-node
[root@qe-weliang-3-node-registry-router-1 ~]# systemctl status atomic-openshift-node
● atomic-openshift-node.service - OpenShift Node
Loaded: loaded (/etc/systemd/system/atomic-openshift-node.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/atomic-openshift-node.service.d
└─openshift-sdn-ovs.conf
Active: active (running) since Thu 2017-10-05 15:37:19 EDT; 7s ago
Docs: https://github.com/openshift/origin
Process: 19328 ExecStopPost=/usr/bin/dbus-send --system --dest=uk.org.thekelleys.dnsmasq /uk/org/thekelleys/dnsmasq uk.org.thekelleys.SetDomainServers array:string: (code=exited, status=0/SUCCESS)
Process: 19326 ExecStopPost=/usr/bin/rm /etc/dnsmasq.d/node-dnsmasq.conf (code=exited, status=0/SUCCESS)
Process: 19334 ExecStartPre=/usr/bin/dbus-send --system --dest=uk.org.thekelleys.dnsmasq /uk/org/thekelleys/dnsmasq uk.org.thekelleys.SetDomainServers array:string:/in-addr.arpa/127.0.0.1,/cluster.local/127.0.0.1 (code=exited, status=0/SUCCESS)
Process: 19332 ExecStartPre=/usr/bin/cp /etc/origin/node/node-dnsmasq.conf /etc/dnsmasq.d/ (code=exited, status=0/SUCCESS)
Main PID: 19337 (openshift)
Memory: 48.5M
CGroup: /system.slice/atomic-openshift-node.service
├─19337 /usr/bin/openshift start node --config=/etc/origin/node/node-config.yaml --loglevel=2
└─19425 journalctl -k -f
Oct 05 15:37:24 qe-weliang-3-node-registry-router-1 atomic-openshift-node[19337]: I1005 15:37:24.433076 19337 proxy.go:221] hybrid proxy: update ep mongodb in main proxy
Oct 05 15:37:24 qe-weliang-3-node-registry-router-1 atomic-openshift-node[19337]: I1005 15:37:24.628573 19337 operation_generator.go:481] MountVolume.SetUp succe...c8e14")
Oct 05 15:37:24 qe-weliang-3-node-registry-router-1 atomic-openshift-node[19337]: I1005 15:37:24.828387 19337 operation_generator.go:481] MountVolume.SetUp succe...c8e14")
Oct 05 15:37:25 qe-weliang-3-node-registry-router-1 atomic-openshift-node[19337]: I1005 15:37:25.028211 19337 operation_generator.go:481] MountVolume.SetUp succe...c8e14")
Oct 05 15:37:25 qe-weliang-3-node-registry-router-1 atomic-openshift-node[19337]: I1005 15:37:25.028543 19337 operation_generator.go:481] MountVolume.SetUp succe...c8e14")
Oct 05 15:37:25 qe-weliang-3-node-registry-router-1 atomic-openshift-node[19337]: I1005 15:37:25.228339 19337 operation_generator.go:481] MountVolume.SetUp succe...c8e14")
Oct 05 15:37:25 qe-weliang-3-node-registry-router-1 atomic-openshift-node[19337]: I1005 15:37:25.428610 19337 operation_generator.go:481] MountVolume.SetUp succe...c8e14")
Oct 05 15:37:25 qe-weliang-3-node-registry-router-1 atomic-openshift-node[19337]: I1005 15:37:25.628186 19337 operation_generator.go:481] MountVolume.SetUp succe...c8e14")
Oct 05 15:37:25 qe-weliang-3-node-registry-router-1 atomic-openshift-node[19337]: I1005 15:37:25.628216 19337 operation_generator.go:481] MountVolume.SetUp succe...c8e14")
Oct 05 15:37:25 qe-weliang-3-node-registry-router-1 atomic-openshift-node[19337]: I1005 15:37:25.828470 19337 operation_generator.go:481] MountVolume.SetUp succe...c8e14")
Hint: Some lines were ellipsized, use -l to show in full.
[root@qe-weliang-3-node-registry-router-1 ~]# cat /proc/sys/net/ipv4/ip_forward
0
[root@qe-weliang-3-node-registry-router-1 ~]#
Commits pushed to master at https://github.com/openshift/origin https://github.com/openshift/origin/commit/5b5776bd19b13907903d2fc4088019d8536778e6 Node service could be started when net.ipv4.ip_forward=0 The test for net.ipv4.ip_forward=0 was only run the first time SetupSDN was run. This change runs the test every time SetupSDN is run Bug: 1495105 https://bugzilla.redhat.com/show_bug.cgi?id=1495105 https://github.com/openshift/origin/commit/d870a9e7cc39e74c84a60ef84553ef7f052e3f42 Merge pull request #16939 from pecameron/bz1495105 Automatic merge from submit-queue (batch tested with PRs 16912, 16931, 16939, 16967, 16978). Node service could be started when net.ipv4.ip_forward=0 The test for net.ipv4.ip_forward=0 was only run the first time SetupSDN was run. This change runs the test every time SetupSDN is run Bug: 1495105 https://bugzilla.redhat.com/show_bug.cgi?id=1495105 Test on latest OCP env openshift v3.7.0-0.174.0 kubernetes v1.7.6+a08f5eeb62 Fail to start node when setting ip_forward=0: Oct 23 05:18:08 qe-gpei-testnode-registry-router-1 atomic-openshift-node[52942]: F1023 05:18:08.953905 52942 network.go:45] SDN node startup failed: node SDN setup failed: net/ipv4/ip_forward=0, it must be set to 1 Oct 23 05:18:08 qe-gpei-testnode-registry-router-1 systemd[1]: atomic-openshift-node.service: main process exited, code=exited, status=255/n/a Oct 23 05:18:08 qe-gpei-testnode-registry-router-1 systemd[1]: Failed to start OpenShift Node. Oct 23 05:18:08 qe-gpei-testnode-registry-router-1 systemd[1]: Unit atomic-openshift-node.service entered failed state. Oct 23 05:18:08 qe-gpei-testnode-registry-router-1 systemd[1]: atomic-openshift-node.service failed. Verify the bug according #Comments 5 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:3188 |
Description of problem: Node service could be started when set net.ipv4.ip_forward = 0 Version-Release number of selected component (if applicable): oc v3.7.0-0.126.4 kubernetes v1.7.0+80709908fd How reproducible: Always Steps to Reproduce: 1. Set net.ipv4.ip_forward flag to 0 # sysctl -w net.ipv4.ip_forward=0 net.ipv4.ip_forward = 0 2. Restart node service # ovs-ofctl mod-flows br0 "table=253, actions=note:01.ff" -O openflow13 # systemctl restart atomic-openshift-node # systemctl status atomic-openshift-node 3. check the flag again 4. Check the node log Actual results: step2: Could get warning for failing to start node service but actually the node could start normally # systemctl restart atomic-openshift-node Job for atomic-openshift-node.service failed because the control process exited with error code. See "systemctl status atomic-openshift-node.service" and "journalctl -xe" for details. # systemctl status atomic-openshift-node ● atomic-openshift-node.service - OpenShift Node Loaded: loaded (/etc/systemd/system/atomic-openshift-node.service; enabled; vendor preset: disabled) Drop-In: /usr/lib/systemd/system/atomic-openshift-node.service.d └─openshift-sdn-ovs.conf Active: active (running) since Wed 2017-09-13 04:53:03 EDT; 1min 41s ago Docs: https://github.com/openshift/origin Process: 10791 ExecStopPost=/usr/bin/dbus-send --system --dest=uk.org.thekelleys.dnsmasq /uk/org/thekelleys/dnsmasq uk.org.thekelleys.SetDomainServers array:string: (code=exited, status=0/SUCCESS) Process: 10789 ExecStopPost=/usr/bin/rm /etc/dnsmasq.d/node-dnsmasq.conf (code=exited, status=0/SUCCESS) Process: 10797 ExecStartPre=/usr/bin/dbus-send --system --dest=uk.org.thekelleys.dnsmasq /uk/org/thekelleys/dnsmasq uk.org.thekelleys.SetDomainServers array:string:/in-addr.arpa/127.0.0.1,/cluster.local/127.0.0.1 (code=exited, status=0/SUCCESS) Process: 10795 ExecStartPre=/usr/bin/cp /etc/origin/node/node-dnsmasq.conf /etc/dnsmasq.d/ (code=exited, status=0/SUCCESS) Main PID: 10800 (openshift) Memory: 56.5M CGroup: /system.slice/atomic-openshift-node.service ├─10800 /usr/bin/openshift start node --config=/etc/origin/node/node-config.yaml --loglevel=2 └─10971 journalctl -k -f Sep 13 04:54:40 host-8-241-12.host.centralci.eng.rdu2.redhat.com atomic-openshift-node[10800]: I0913 04:54:40.023059 10800 helpers.go:125] Unable to get udp stats from pid 12051:...rectory Sep 13 04:54:40 host-8-241-12.host.centralci.eng.rdu2.redhat.com atomic-openshift-node[10800]: I0913 04:54:40.023066 10800 helpers.go:132] Unable to get udp6 stats from pid 12051...rectory Sep 13 04:54:41 host-8-241-12.host.centralci.eng.rdu2.redhat.com atomic-openshift-node[10800]: I0913 04:54:41.381690 10800 prober.go:106] Readiness probe for "nodejs-mongodb-example-1-j... Sep 13 04:54:41 host-8-241-12.host.centralci.eng.rdu2.redhat.com atomic-openshift-node[10800]: I0913 04:54:41.385667 10800 prober.go:106] Readiness probe for "registry-console-1-...to host Sep 13 04:54:41 host-8-241-12.host.centralci.eng.rdu2.redhat.com atomic-openshift-node[10800]: I0913 04:54:41.385719 10800 prober.go:106] Readiness probe for "docker-registry-1-l...to host Sep 13 04:54:41 host-8-241-12.host.centralci.eng.rdu2.redhat.com atomic-openshift-node[10800]: I0913 04:54:41.886235 10800 proxy.go:199] hybrid proxy: (always) update ep kube-sch...g proxy Sep 13 04:54:41 host-8-241-12.host.centralci.eng.rdu2.redhat.com atomic-openshift-node[10800]: I0913 04:54:41.886270 10800 proxy.go:221] hybrid proxy: update ep kube-scheduler in...n proxy Sep 13 04:54:43 host-8-241-12.host.centralci.eng.rdu2.redhat.com atomic-openshift-node[10800]: I0913 04:54:43.891721 10800 proxy.go:199] hybrid proxy: (always) update ep kube-sch...g proxy Sep 13 04:54:43 host-8-241-12.host.centralci.eng.rdu2.redhat.com atomic-openshift-node[10800]: I0913 04:54:43.891746 10800 proxy.go:221] hybrid proxy: update ep kube-scheduler in...n proxy Sep 13 04:54:43 host-8-241-12.host.centralci.eng.rdu2.redhat.com atomic-openshift-node[10800]: W0913 04:54:43.974482 10800 helpers.go:771] eviction manager: no observation found ...ailable Hint: Some lines were ellipsized, use -l to show in full. step3: # sysctl net.ipv4.ip_forward net.ipv4.ip_forward = 0 Expected results: node service could not start when set net.ipv4.ip_forward to 0 Additional info: Node service start failed at first, we could get the error from cli warning and node log: 12288 node.go:310] error: SDN node startup failed: net/ipv4/ip_forward=0, it must be set to 1 But after that, it still could start successfully finally, and check the flag, et.ipv4.ip_forward is 0 after node started Full node log have attached.