Bug 1484272 - Openshift node service doesn't start when iptables-based proxy is disabled
Summary: Openshift node service doesn't start when iptables-based proxy is disabled
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.5.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.5.z
Assignee: Andrew McDermott
QA Contact: Meng Bo
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-08-23 07:32 UTC by Nicolas Nosenzo
Modified: 2017-10-25 13:06 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Disabling the proxy via "--disable=proxy" triggers a panic because the "service stores" have nil values. Consequence: When the proxy is disabled, the node never starts, leaving the system in an indeterminate state. Fix: The logic has been reworked to ensure that the "service stores" are populated with non-nil values when the proxy is disabled. Result: Using "--disable=proxy" no longer causes a panic or an overall node start failure.
Clone Of:
Environment:
Last Closed: 2017-10-25 13:06:40 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2017:3049 | 0 | normal | SHIPPED_LIVE | OpenShift Container Platform 3.6, 3.5, and 3.4 bug fix and enhancement update | 2017-10-25 15:57:15 UTC

Description Nicolas Nosenzo 2017-08-23 07:32:50 UTC
Description of problem:
The node service stays in the "activating" state when --disable=proxy is added to the /etc/sysconfig/atomic-openshift-node file.

Disabling the proxy is a requirement for third-party network plugin integration, as explained in https://github.com/openshift/origin/blob/master/docs/openshift_networking_requirements.md#advanced-requirements

Error:
 atomic-openshift-node[26524]: E0807 04:45:54.161756   26524 runtime.go:66] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)



Version-Release number of selected component (if applicable):

# oc version
oc v3.5.5.31
kubernetes v1.5.2+43a9be4
features: Basic-Auth GSSAPI Kerberos SPNEGO


How reproducible:


Steps to Reproduce:
1. Add "--disable proxy" to /etc/sysconfig/atomic-openshift-node file
2. systemctl restart atomic-openshift-node.service


Actual results:

Node service can't start

Expected results:

Iptables-based proxy is disabled and node service starts normally 

Additional info:

Similar issue: https://github.com/openshift/origin/issues/14244

Comment 1 Seth Jennings 2017-08-25 04:37:04 UTC
There is a stack trace in the referenced issue:
https://github.com/openshift/origin/issues/14244#issuecomment-302656375

Unfortunately it is just a lot of reflector calls.  The only hint is that it seems to happen every second.

I just brought up a 3.5 cluster with nothing in it and couldn't recreate the problem, likely because I didn't have any resources populating it.

Comment 2 Seth Jennings 2017-08-25 19:18:37 UTC
Andrew, I would begin by starting an openshift cluster with the node having "--disable proxy" and trying to recreate on 3.5.5.31.

Comment 3 Andrew McDermott 2017-09-13 12:42:29 UTC
I am able to reproduce and I am testing the following fix:

https://github.com/frobware/origin/tree/fix-node-panic-with-disable-proxy-bugzilla-1484272
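
For readers following along, the Doc Text attributes the panic to "service stores" that are nil when the proxy is disabled, and the fix to populating them with non-nil values. The Go sketch below illustrates that class of failure and fix only; it is not the origin code. Every name in it (serviceStore, node, newNodeBuggy, newNodeFixed, countServices) is hypothetical.

```go
package main

import "fmt"

// serviceStore is a hypothetical stand-in for the node's "service stores".
type serviceStore struct {
	services map[string]string
}

// List returns the service names held in the store. Calling it on a nil
// *serviceStore reads a field through a nil pointer and panics.
func (s *serviceStore) List() []string {
	names := make([]string, 0, len(s.services))
	for name := range s.services {
		names = append(names, name)
	}
	return names
}

// node is a hypothetical node that consults the store on a periodic sync.
type node struct {
	store *serviceStore
}

// newNodeBuggy only creates the store when the proxy is enabled, so the
// store stays nil when the proxy is disabled (assumed shape of the bug).
func newNodeBuggy(proxyEnabled bool) *node {
	n := &node{}
	if proxyEnabled {
		n.store = &serviceStore{services: map[string]string{}}
	}
	return n
}

// newNodeFixed always creates an empty, non-nil store, whether or not the
// proxy is enabled (assumed shape of the fix).
func newNodeFixed(proxyEnabled bool) *node {
	return &node{store: &serviceStore{services: map[string]string{}}}
}

// countServices lists the store contents and converts a panic into an
// error so both variants can be compared without crashing the program.
func countServices(n *node) (count int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("observed a panic: %v", r)
		}
	}()
	return len(n.store.List()), nil
}

func main() {
	_, err := countServices(newNodeBuggy(false))
	fmt.Println("buggy, proxy disabled:", err)

	count, err := countServices(newNodeFixed(false))
	fmt.Println("fixed, proxy disabled:", count, err)
}
```

In this sketch the fixed constructor ignores the proxy setting when building the store, so callers that list services never need their own nil check.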

Comment 5 Andrew McDermott 2017-09-19 19:57:51 UTC
https://github.com/openshift/ose/pull/866

Comment 7 Meng Bo 2017-10-12 08:37:05 UTC
Tested with ocp build v3.5.5.31.34

The openshift node service runs correctly with the proxy component disabled.

[root@ip-172-18-8-60 ~]# ps -ef | grep node-config
root      52911      1  3 04:33 ?        00:00:04 /usr/bin/openshift start node --config=/etc/origin/node/node-config.yaml --loglevel=2 --disable proxy
root      53648  52182  0 04:35 pts/0    00:00:00 grep --color=auto node-config
[root@ip-172-18-8-60 ~]# 

[root@ip-172-18-8-60 ~]# systemctl status atomic-openshift-node 
● atomic-openshift-node.service - Atomic OpenShift Node
   Loaded: loaded (/usr/lib/systemd/system/atomic-openshift-node.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/atomic-openshift-node.service.d
           └─openshift-sdn-ovs.conf
   Active: active (running) since Thu 2017-10-12 04:33:11 EDT; 2min 40s ago
     Docs: https://github.com/openshift/origin
 Main PID: 52911 (openshift)
   Memory: 46.0M
   CGroup: /system.slice/atomic-openshift-node.service
           ├─52911 /usr/bin/openshift start node --config=/etc/origin/node/node-config.yaml --loglev...
           └─52974 journalctl -k -f

Comment 9 errata-xmlrpc 2017-10-25 13:06:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3049

