Bug 1484272

Summary: OpenShift node service doesn't start when iptables-based proxy is disabled
Product: OpenShift Container Platform Reporter: Nicolas Nosenzo <nnosenzo>
Component: Node Assignee: Andrew McDermott <amcdermo>
Status: CLOSED ERRATA QA Contact: Meng Bo <bmeng>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 3.5.1 CC: aivaraslaimikis, aos-bugs, bmeng, erich, jokerman, mmccomas, sjenning
Target Milestone: ---   
Target Release: 3.5.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: Disabling the proxy via "--disable=proxy" triggers a panic because the "service stores" have nil values. Consequence: When the proxy is disabled, the node never starts, leaving the system in an indeterminate state. Fix: The logic has been reworked to ensure that the "service stores" are populated with non-nil values when the proxy is disabled. Result: Using "--disable=proxy" no longer causes a panic or an overall node start failure.
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-10-25 13:06:40 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Nicolas Nosenzo 2017-08-23 07:32:50 UTC
Description of problem:
The node service stays in the "activating" state when --disable=proxy is added to the /etc/sysconfig/atomic-openshift-node file.

Disabling the proxy is a requirement for third-party network plugin integration, as explained in https://github.com/openshift/origin/blob/master/docs/openshift_networking_requirements.md#advanced-requirements

Error:
 atomic-openshift-node[26524]: E0807 04:45:54.161756   26524 runtime.go:66] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)



Version-Release number of selected component (if applicable):

# oc version
oc v3.5.5.31
kubernetes v1.5.2+43a9be4
features: Basic-Auth GSSAPI Kerberos SPNEGO


How reproducible:


Steps to Reproduce:
1. Add "--disable proxy" to the /etc/sysconfig/atomic-openshift-node file
2. systemctl restart atomic-openshift-node.service


Actual results:

Node service can't start

Expected results:

Iptables-based proxy is disabled and node service starts normally 

Additional info:

Similar issue: https://github.com/openshift/origin/issues/14244

Comment 1 Seth Jennings 2017-08-25 04:37:04 UTC
There is a stack trace in the referenced issue:
https://github.com/openshift/origin/issues/14244#issuecomment-302656375

Unfortunately it is just a lot of reflector calls.  The only hint is that it seems to happen every second.

I just brought up a 3.5 cluster with nothing in it and couldn't recreate the problem, likely because I didn't have any resources populating it.

Comment 2 Seth Jennings 2017-08-25 19:18:37 UTC
Andrew, I would begin by starting an openshift cluster with the node having "--disable proxy" and trying to recreate on 3.5.5.31.

Comment 3 Andrew McDermott 2017-09-13 12:42:29 UTC
I am able to reproduce and I am testing the following fix:

https://github.com/frobware/origin/tree/fix-node-panic-with-disable-proxy-bugzilla-1484272
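
For readers unfamiliar with this failure mode, the following is a minimal Go sketch of the pattern the Doc Text describes: a store that is left nil panics on first use, while a non-nil empty store does not. It is not the code from the linked branch. The names ServiceStore, emptyStore, memoryStore, nodeConfig, newNodeConfig and countServices are all hypothetical and exist only for illustration.

```go
package main

import "fmt"

// ServiceStore is a hypothetical stand-in for the "service stores"
// mentioned in the Doc Text; it is not the real OpenShift type.
type ServiceStore interface {
	List() []string
}

// emptyStore is a non-nil store with no entries. Calling List on it is
// always safe, which is the property the fix relies on in this sketch.
type emptyStore struct{}

func (emptyStore) List() []string { return nil }

// memoryStore is a trivial populated store used when the proxy is enabled.
type memoryStore struct{ names []string }

func (m *memoryStore) List() []string { return m.names }

// nodeConfig is a hypothetical, heavily simplified node configuration.
type nodeConfig struct {
	proxyEnabled bool
	services     ServiceStore
}

// newNodeConfig always assigns a non-nil store. When the proxy is
// disabled it falls back to emptyStore instead of leaving the field nil.
func newNodeConfig(proxyEnabled bool) *nodeConfig {
	cfg := &nodeConfig{proxyEnabled: proxyEnabled, services: emptyStore{}}
	if proxyEnabled {
		cfg.services = &memoryStore{names: []string{"kubernetes"}}
	}
	return cfg
}

// countServices calls List on the store and converts a panic into an
// error so that the nil case can be observed without crashing.
func countServices(s ServiceStore) (n int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("observed a panic: %v", r)
		}
	}()
	return len(s.List()), nil
}

func main() {
	// Assumed pre-fix situation: the store was never assigned.
	var unset ServiceStore
	_, err := countServices(unset)
	fmt.Println("nil store:", err)

	// Post-fix situation in this sketch: proxy disabled, store non-nil.
	cfg := newNodeConfig(false)
	n, err := countServices(cfg.services)
	fmt.Println("empty store:", n, err)
}
```

Calling a method on a nil interface value is what produces the "invalid memory address or nil pointer dereference" runtime error in this sketch. Substituting an empty implementation avoids a nil check at every call site.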

Comment 5 Andrew McDermott 2017-09-19 19:57:51 UTC
https://github.com/openshift/ose/pull/866

Comment 7 Meng Bo 2017-10-12 08:37:05 UTC
Tested with ocp build v3.5.5.31.34

The OpenShift node service runs correctly with the proxy component disabled.

[root@ip-172-18-8-60 ~]# ps -ef | grep node-config
root      52911      1  3 04:33 ?        00:00:04 /usr/bin/openshift start node --config=/etc/origin/node/node-config.yaml --loglevel=2 --disable proxy
root      53648  52182  0 04:35 pts/0    00:00:00 grep --color=auto node-config
[root@ip-172-18-8-60 ~]# 

[root@ip-172-18-8-60 ~]# systemctl status atomic-openshift-node 
● atomic-openshift-node.service - Atomic OpenShift Node
   Loaded: loaded (/usr/lib/systemd/system/atomic-openshift-node.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/atomic-openshift-node.service.d
           └─openshift-sdn-ovs.conf
   Active: active (running) since Thu 2017-10-12 04:33:11 EDT; 2min 40s ago
     Docs: https://github.com/openshift/origin
 Main PID: 52911 (openshift)
   Memory: 46.0M
   CGroup: /system.slice/atomic-openshift-node.service
           ├─52911 /usr/bin/openshift start node --config=/etc/origin/node/node-config.yaml --loglev...
           └─52974 journalctl -k -f

Comment 9 errata-xmlrpc 2017-10-25 13:06:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3049