Bug 1523153 - Cannot start atomic-openshift-node when using networkpolicy plugin
Summary: Cannot start atomic-openshift-node when using networkpolicy plugin
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.8.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.8.0
Assignee: Casey Callendrello
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-12-07 10:20 UTC by Meng Bo
Modified: 2019-12-05 21:50 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-12-05 21:50:30 UTC
Target Upstream Version:
Embargoed:


Description Meng Bo 2017-12-07 10:20:58 UTC
Description of problem:
The node hits the fatal error "cannot list networkpolicies.networking.k8s.io at the cluster scope", which prevents it from starting when the networkpolicy plugin is in use.

Version-Release number of selected component (if applicable):
v3.8.7

How reproducible:
always

Steps to Reproduce:
1. Set up a multi-node environment with the networkpolicy plugin
2. Try to start the node after the master is running

Actual results:
The node fails to start with a fatal error.

Expected results:
Node should be able to run.

Additional info:
Related errors in the node log:
Dec 07 17:39:25 ose-node1.bmeng.local atomic-openshift-node[3472]: E1207 17:39:25.239301    3472 networkpolicy.go:130] Unable to query NetworkPolicies (networkpolicies.networking.k8s.io is forbidden: User "system:node:ose-node1.bmeng.local" cannot list networkpolicies.networking.k8s.io at the cluster scope: User "system:node:ose-node1.bmeng.local" cannot list all networkpolicies.networking.k8s.io in the cluster) - please ensure your nodes have access to view NetworkPolicy (eg, 'oc adm policy reconcile-cluster-roles')
Dec 07 17:39:25 ose-node1.bmeng.local atomic-openshift-node[3472]: F1207 17:39:25.239335    3472 network.go:44] SDN node startup failed: networkpolicies.networking.k8s.io is forbidden: User "system:node:ose-node1.bmeng.local" cannot list networkpolicies.networking.k8s.io at the cluster scope: User "system:node:ose-node1.bmeng.local" cannot list all networkpolicies.networking.k8s.io in the cluster
Dec 07 17:39:25 ose-node1.bmeng.local systemd[1]: atomic-openshift-node.service: main process exited, code=exited, status=255/n/a
Dec 07 17:39:25 ose-node1.bmeng.local systemd[1]: Failed to start Atomic OpenShift Node.
Dec 07 17:39:25 ose-node1.bmeng.local systemd[1]: Unit atomic-openshift-node.service entered failed state.
Dec 07 17:39:25 ose-node1.bmeng.local systemd[1]: atomic-openshift-node.service failed.
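
For triage, the denied user, verb, and resource can be pulled out of the "forbidden" message in the log above. The following is only a minimal sketch: the regular expression and the helper name parse_forbidden are illustrative assumptions, not part of any OpenShift tooling, and the sample string is shortened from the log line.

```python
import re

# Assumed pattern for the RBAC denial text seen in the node log above.
FORBIDDEN = re.compile(
    r'User "(?P<user>[^"]+)" cannot (?P<verb>\w+) (?P<resource>\S+) at the cluster scope'
)


def parse_forbidden(line):
    """Return (user, verb, resource) from a 'forbidden' log line, or None."""
    match = FORBIDDEN.search(line)
    if match is None:
        return None
    return match.group("user"), match.group("verb"), match.group("resource")


# Shortened copy of the fatal log message from this report.
sample = (
    'SDN node startup failed: networkpolicies.networking.k8s.io is forbidden: '
    'User "system:node:ose-node1.bmeng.local" cannot list '
    'networkpolicies.networking.k8s.io at the cluster scope'
)

print(parse_forbidden(sample))
```

In this case the extracted triple shows that the node user is denied "list" on networkpolicies in the networking.k8s.io API group.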

Comment 1 Ben Bennett 2017-12-07 13:55:00 UTC
Weibin: Can you please reproduce this, thanks!

Comment 4 Meng Bo 2017-12-08 02:28:26 UTC
@Ben @weibin

The bug is easy to reproduce: the node fails to start as soon as the environment is set up.
I suspect the recent API changes caused this, since there have been many API changes in the 3.8 branch.

Comment 5 weiwei jiang 2017-12-08 02:30:58 UTC
FYI

# oc policy who-can list networkpolicies
Namespace: default
Verb:      list
Resource:  networkpolicies.extensions

Users:  system:admin
        system:kube-controller-manager
        system:serviceaccount:default:router
        system:serviceaccount:kube-service-catalog:default
        system:serviceaccount:kube-system:generic-garbage-collector
        system:serviceaccount:kube-system:namespace-controller
        system:serviceaccount:kube-system:resourcequota-controller
        system:serviceaccount:management-infra:management-admin
        system:serviceaccount:openshift-ansible-service-broker:asb
        system:serviceaccount:openshift-infra:template-instance-controller

Groups: system:cluster-admins
        system:cluster-readers
        system:masters
        system:nodes

# oc policy who-can list networkpolicies.networking.k8s.io
Namespace: default
Verb:      list
Resource:  networkpolicies.networking.k8s.io

Users:  system:admin
        system:kube-controller-manager
        system:serviceaccount:default:router
        system:serviceaccount:kube-system:generic-garbage-collector
        system:serviceaccount:kube-system:namespace-controller
        system:serviceaccount:kube-system:resourcequota-controller
        system:serviceaccount:management-infra:management-admin

Groups: system:cluster-admins
        system:cluster-readers
        system:masters


# openssl x509 -in /etc/origin/node/system\:node\:ip-172-18-3-251.ec2.internal.crt -noout -subject
subject= /O=system:nodes/CN=system:node:ip-172-18-3-251.ec2.internal

# openshift version 
openshift v3.8.11
kubernetes v1.8.1+0d5291c
etcd 3.2.8
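
The two who-can listings above differ in one group, and the certificate subject shows which group the node belongs to (with Kubernetes x509 client certificates, O is read as the group and CN as the user name). The sketch below makes that comparison explicit. The helper names parse_who_can and subject_fields are hypothetical, and the sample texts are abbreviated copies of the output in this comment.

```python
def parse_who_can(text):
    """Parse 'oc policy who-can' output into {'Users': set, 'Groups': set}."""
    sections = {"Users": set(), "Groups": set()}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        head, _, rest = line.partition(":")
        if head in sections:
            # "Users:" or "Groups:" starts a section; first entry is on the same line.
            current = head
            line = rest.strip()
        elif head in ("Namespace", "Verb", "Resource"):
            current = None
            continue
        if current and line:
            sections[current].add(line)
    return sections


def subject_fields(subject):
    """Split an 'openssl x509 -subject' line into its /KEY=value fields."""
    _, _, dn = subject.partition("=")  # drop the leading "subject"
    fields = {}
    for part in dn.strip().split("/"):
        if part:
            key, _, value = part.partition("=")
            fields[key] = value
    return fields


# Abbreviated from the output above (user lists trimmed).
EXTENSIONS_OUTPUT = """Resource:  networkpolicies.extensions
Users:  system:admin
Groups: system:cluster-admins
        system:cluster-readers
        system:masters
        system:nodes
"""
NETWORKING_OUTPUT = """Resource:  networkpolicies.networking.k8s.io
Users:  system:admin
Groups: system:cluster-admins
        system:cluster-readers
        system:masters
"""

missing_groups = (
    parse_who_can(EXTENSIONS_OUTPUT)["Groups"]
    - parse_who_can(NETWORKING_OUTPUT)["Groups"]
)
cert = subject_fields(
    "subject= /O=system:nodes/CN=system:node:ip-172-18-3-251.ec2.internal"
)

print(sorted(missing_groups))
print(cert["O"] in missing_groups)
```

The group that can list networkpolicies.extensions but not networkpolicies.networking.k8s.io is the same group named in the node certificate, which is consistent with the "forbidden" error the node reports.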

Comment 7 Meng Bo 2017-12-08 02:38:27 UTC
cc @danw

Comment 8 Meng Bo 2017-12-08 03:18:27 UTC
https://github.com/openshift/origin/commit/364615da6cf024eeb3190e531c3314667d9d8caa

It seems the commit above causes the issue.

Comment 9 Dan Winship 2017-12-08 14:55:17 UTC
Fixed by https://github.com/openshift/origin/pull/17549 which should merge soon

Comment 11 Meng Bo 2018-01-03 06:29:46 UTC
Checked on v3.9.0-0.11.0.0; the node starts normally when using the networkpolicy plugin.

