Bug 1467387

Summary: error: SDN node startup failed: Allocated ofport (-1) did not match request (1)
Product: OpenShift Container Platform
Reporter: Steven Walter <stwalter>
Component: Networking
Assignee: Dan Williams <dcbw>
Status: CLOSED DUPLICATE
QA Contact: Meng Bo <bmeng>
Severity: high
Docs Contact:
Priority: high
Version: 3.5.1
CC: aloughla, aos-bugs, bbennett, danw, eparis, rhowe, stwalter
Target Milestone: ---
Keywords: UpcomingRelease
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2017-07-05 18:56:48 UTC
Type: Bug

Description Steven Walter 2017-07-03 15:29:36 UTC
Description of problem:
Starting the atomic-openshift-node service fails with "SDN node startup failed: Allocated ofport (-1) did not match request (1)".

Version-Release number of selected component (if applicable):
3.5.1

How reproducible:
Unconfirmed


Actual results:
Jun 30 19:08:48 node1.example.com systemd[1]: Starting Atomic OpenShift Node...
Jun 30 19:08:49 node1.example.com atomic-openshift-node[78019]: I0630 19:08:49.123330   78019 start_node.go:250] Reading node configuration from /etc/origin/node/node-config.yaml
Jun 30 19:08:49 node1.example.com atomic-openshift-node[78019]: I0630 19:08:49.220043   78019 node.go:88] Initializing SDN node of type "redhat/openshift-ovs-multitenant" with configured hostname "node1.example.com" (IP ""), iptables sync period "30s"
Jun 30 19:08:49 node1.example.com atomic-openshift-node[78019]: I0630 19:08:49.283900   78019 docker.go:356] Connecting to docker on unix:///var/run/docker.sock
Jun 30 19:08:49 node1.example.com atomic-openshift-node[78019]: I0630 19:08:49.283933   78019 docker.go:376] Start docker client with request timeout=2m0s
Jun 30 19:08:49 node1.example.com atomic-openshift-node[78019]: I0630 19:08:49.323197   78019 start_node.go:343] Starting node node1.example.com (v3.5.5.26)
Jun 30 19:08:49 node1.example.com atomic-openshift-node[78019]: I0630 19:08:49.324682   78019 start_node.go:352] Connecting to API server https://master.example.com:8443
Jun 30 19:08:49 node1.example.com atomic-openshift-node[78019]: I0630 19:08:49.326131   78019 docker.go:356] Connecting to docker on unix:///var/run/docker.sock
Jun 30 19:08:49 node1.example.com atomic-openshift-node[78019]: I0630 19:08:49.326149   78019 docker.go:376] Start docker client with request timeout=0s
Jun 30 19:08:49 node1.example.com atomic-openshift-node[78019]: I0630 19:08:49.357337   78019 node.go:142] Connecting to Docker at unix:///var/run/docker.sock
Jun 30 19:08:49 node1.example.com atomic-openshift-node[78019]: I0630 19:08:49.393828   78019 feature_gate.go:181] feature gates: map[]
Jun 30 19:08:49 node1.example.com atomic-openshift-node[78019]: I0630 19:08:49.393935   78019 manager.go:143] cAdvisor running in container: "/system.slice/atomic-openshift-node.service"
Jun 30 19:08:49 node1.example.com ovs-vsctl[78061]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --if-exists del-br br0 -- add-br br0 -- set Bridge br0 fail-mode=secure protocols=OpenFlow13
Jun 30 19:08:49 node1.example.com ovs-vsctl[78062]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --if-exists del-port br0 vxlan0
Jun 30 19:08:49 node1.example.com ovs-vsctl[78063]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --may-exist add-port br0 vxlan0 -- set Interface vxlan0 ofport_request=1 type=vxlan "options:remote_ip=\"flow\"" "options:key=\"flow\""
Jun 30 19:08:49 node1.example.com atomic-openshift-node[78019]: F0630 19:08:49.563667   78019 node.go:351] error: SDN node startup failed: Allocated ofport (-1) did not match request (1)
Jun 30 19:08:49 node1.example.com systemd[1]: atomic-openshift-node.service: main process exited, code=exited, status=255/n/a
Jun 30 19:08:49 node1.example.com systemd[1]: Failed to start Atomic OpenShift Node.
Jun 30 19:08:49 node1.example.com systemd[1]: Unit atomic-openshift-node.service entered failed state.
Jun 30 19:08:49 node1.example.com systemd[1]: atomic-openshift-node.service failed.
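
To check whether other nodes hit the same failure, the journal can be filtered for the fatal line shown above. This is a minimal sketch: the function name ofport_mismatches is hypothetical, and it assumes the message wording matches the log in this report exactly.

```shell
# Hypothetical helper (not part of OpenShift): read log lines on stdin and
# print the allocated and requested ofport for every SDN startup failure
# whose message matches the wording seen in this report.
ofport_mismatches() {
    sed -n 's/.*Allocated ofport (\(-\{0,1\}[0-9][0-9]*\)) did not match request (\(-\{0,1\}[0-9][0-9]*\)).*/allocated=\1 requested=\2/p'
}

# Usage on a node (assumes the unit name from the log above):
#   journalctl -u atomic-openshift-node --no-pager | ofport_mismatches
```

Lines that do not contain the message are dropped, so empty output means the failure was not logged on that node.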


Expected results:
Node started

Additional info:

I see a potentially related GitHub issue: https://github.com/openshift/origin/issues/14708

I also found this message in a separate bz, https://bugzilla.redhat.com/show_bug.cgi?id=1444279, but the other symptoms from that bz are absent here (no networking overlap, etc.).

Comment 1 Steven Walter 2017-07-03 16:26:07 UTC
This is a new installation, using the original yaml and build files. It appears all nodes are impacted. I will attach the log from ansible.

Comment 3 Dan Winship 2017-07-03 17:48:29 UTC
FWIW, in the past we have seen this happen on hosts that have IPv6 disabled, though we don't know why it results in that particular error message. (We don't intend to require IPv6 anyway; something is just accidentally depending on it somewhere.)

Comment 4 Steven Walter 2017-07-03 18:03:51 UTC
# ovs-vsctl get Interface vxlan0 ofport
All nodes return a value of -1.
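
For reference, a small sketch of how that value can be interpreted. Open vSwitch sets ofport to -1 when the interface could not be added, and ovs-vsctl prints [] while no port number has been assigned yet. The function name classify_ofport is hypothetical.

```shell
# Hypothetical helper: classify the output of
#   ovs-vsctl get Interface vxlan0 ofport
classify_ofport() {
    case "$1" in
        -1)       echo "failed" ;;      # OVS could not add the interface
        ''|'[]')  echo "unassigned" ;;  # no OpenFlow port number assigned yet
        *[!0-9]*) echo "unknown" ;;     # anything that is not a plain number
        *)        echo "ok" ;;          # a real OpenFlow port number
    esac
}

# Usage on a node:
#   classify_ofport "$(ovs-vsctl get Interface vxlan0 ofport)"
# Assumption: on OVS versions that have the Interface "error" column,
#   ovs-vsctl get Interface vxlan0 error
# may give a human-readable reason when the result is "failed".
```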

Comment 7 Ryan Howe 2017-07-05 13:48:57 UTC
This looks like a duplicate of kernel bug 1445054

Comment 8 Dan Williams 2017-07-05 14:44:27 UTC
(In reply to Ryan Howe from comment #7)
> This looks like a duplicate of kernel bug 1445054

Yeah, it sure does.

Steven, can we confirm/deny that the customer has disabled IPv6 on those nodes?  Also, what RHEL kernel version are they running?
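
Both questions can be answered on each node with something like the sketch below. It assumes IPv6 was disabled either through the net.ipv6.conf.all.disable_ipv6 sysctl or through ipv6.disable=1 on the kernel command line; per-interface settings and other mechanisms are not covered. The function name ipv6_state and its optional path arguments are hypothetical and exist only so the check can be tried against sample files.

```shell
# Hypothetical helper: report whether IPv6 looks disabled on this host.
#   $1: path to the disable_ipv6 sysctl file (defaults to the real one)
#   $2: path to the kernel command line (defaults to /proc/cmdline)
ipv6_state() {
    sysctl_file=${1:-/proc/sys/net/ipv6/conf/all/disable_ipv6}
    cmdline_file=${2:-/proc/cmdline}
    if grep -qwF 'ipv6.disable=1' "$cmdline_file" 2>/dev/null; then
        echo "disabled (kernel command line)"
    elif [ ! -r "$sysctl_file" ]; then
        # The sysctl file is missing or unreadable, so nothing can be concluded.
        echo "unavailable"
    elif [ "$(cat "$sysctl_file")" = "1" ]; then
        echo "disabled (sysctl)"
    else
        echo "enabled"
    fi
}

# Usage on a node:
#   ipv6_state
#   uname -r    # running kernel version
```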

Comment 9 Steven Walter 2017-07-05 18:56:48 UTC
Yep, confirmed as a duplicate.

*** This bug has been marked as a duplicate of bug 1445054 ***