Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1523153

Summary:	Cannot start atomic-openshift-node when using networkpolicy plugin
Product:	OpenShift Container Platform	Reporter:	Meng Bo <bmeng>
Component:	Networking	Assignee:	Casey Callendrello <cdc>
Networking sub component:	openshift-sdn	QA Contact:	zhaozhanqi <zzhao>
Status:	CLOSED CURRENTRELEASE	Docs Contact:
Severity:	high
Priority:	high	CC:	aos-bugs, danw, erich, wjiang, wmeng, xtian
Version:	3.8.0
Target Milestone:	---
Target Release:	3.8.0
Hardware:	Unspecified
OS:	Unspecified
Last Closed:	2019-12-05 21:50:30 UTC	Type:	Bug

Description Meng Bo 2017-12-07 10:20:58 UTC

Description of problem:
The node hits a fatal error, "cannot list networkpolicies.networking.k8s.io at the cluster scope", which prevents atomic-openshift-node from starting when the networkpolicy plugin is used.

Version-Release number of selected component (if applicable):
v3.8.7

How reproducible:
always

Steps to Reproduce:
1. Set up a multi-node environment with the networkpolicy plugin.
2. Start the node after the master is running.

Actual results:
The node fails to start with a fatal error.

Expected results:
The node starts and runs normally.

Additional info:
Related errors in the node log:
Dec 07 17:39:25 ose-node1.bmeng.local atomic-openshift-node[3472]: E1207 17:39:25.239301    3472 networkpolicy.go:130] Unable to query NetworkPolicies (networkpolicies.networking.k8s.io is forbidden: User "system:node:ose-node1.bmeng.local" cannot list networkpolicies.networking.k8s.io at the cluster scope: User "system:node:ose-node1.bmeng.local" cannot list all networkpolicies.networking.k8s.io in the cluster) - please ensure your nodes have access to view NetworkPolicy (eg, 'oc adm policy reconcile-cluster-roles')
Dec 07 17:39:25 ose-node1.bmeng.local atomic-openshift-node[3472]: F1207 17:39:25.239335    3472 network.go:44] SDN node startup failed: networkpolicies.networking.k8s.io is forbidden: User "system:node:ose-node1.bmeng.local" cannot list networkpolicies.networking.k8s.io at the cluster scope: User "system:node:ose-node1.bmeng.local" cannot list all networkpolicies.networking.k8s.io in the cluster
Dec 07 17:39:25 ose-node1.bmeng.local systemd[1]: atomic-openshift-node.service: main process exited, code=exited, status=255/n/a
Dec 07 17:39:25 ose-node1.bmeng.local systemd[1]: Failed to start Atomic OpenShift Node.
Dec 07 17:39:25 ose-node1.bmeng.local systemd[1]: Unit atomic-openshift-node.service entered failed state.
Dec 07 17:39:25 ose-node1.bmeng.local systemd[1]: atomic-openshift-node.service failed.

Comment 1 Ben Bennett 2017-12-07 13:55:00 UTC

Weibin: Can you please reproduce this, thanks!

Comment 4 Meng Bo 2017-12-08 02:28:26 UTC

@Ben @weibin

The bug is easy to reproduce: the node fails to start as soon as the environment is set up.
I suspect the recent API changes are the cause, since there have been many API changes in the 3.8 branch.

Comment 5 weiwei jiang 2017-12-08 02:30:58 UTC

FYI

# oc policy who-can list networkpolicies
Namespace: default
Verb:      list
Resource:  networkpolicies.extensions

Users:  system:admin
        system:kube-controller-manager
        system:serviceaccount:default:router
        system:serviceaccount:kube-service-catalog:default
        system:serviceaccount:kube-system:generic-garbage-collector
        system:serviceaccount:kube-system:namespace-controller
        system:serviceaccount:kube-system:resourcequota-controller
        system:serviceaccount:management-infra:management-admin
        system:serviceaccount:openshift-ansible-service-broker:asb
        system:serviceaccount:openshift-infra:template-instance-controller

Groups: system:cluster-admins
        system:cluster-readers
        system:masters
        system:nodes

# oc policy who-can list networkpolicies.networking.k8s.io
Namespace: default
Verb:      list
Resource:  networkpolicies.networking.k8s.io

Users:  system:admin
        system:kube-controller-manager
        system:serviceaccount:default:router
        system:serviceaccount:kube-system:generic-garbage-collector
        system:serviceaccount:kube-system:namespace-controller
        system:serviceaccount:kube-system:resourcequota-controller
        system:serviceaccount:management-infra:management-admin

Groups: system:cluster-admins
        system:cluster-readers
        system:masters


# openssl x509 -in /etc/origin/node/system\:node\:ip-172-18-3-251.ec2.internal.crt -noout -subject
subject= /O=system:nodes/CN=system:node:ip-172-18-3-251.ec2.internal

# openshift version 
openshift v3.8.11
kubernetes v1.8.1+0d5291c
etcd 3.2.8
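The who-can output above points at the cause. The system:nodes group may list networkpolicies in the "extensions" API group but not in "networking.k8s.io". The node log shows that networking.k8s.io is the group the SDN queries. RBAC-style rules match on API group, resource, and verb together, so a grant under one group does not carry over to the other.

The following Python sketch models that matching. The PolicyRule class, the can() helper, and both rule sets are hypothetical illustrations inferred from the who-can output. They are not OpenShift's actual bootstrap policy or code.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class PolicyRule:
    """Hypothetical RBAC-style rule: allowed verbs on resources within API groups."""
    api_groups: FrozenSet[str]
    resources: FrozenSet[str]
    verbs: FrozenSet[str]

    def allows(self, api_group: str, resource: str, verb: str) -> bool:
        # A request matches only if group, resource AND verb all match;
        # "*" acts as a wildcard in each field.
        return (
            (api_group in self.api_groups or "*" in self.api_groups)
            and (resource in self.resources or "*" in self.resources)
            and (verb in self.verbs or "*" in self.verbs)
        )


def can(rules: List[PolicyRule], api_group: str, resource: str, verb: str) -> bool:
    # Access is granted if any single rule matches the request.
    return any(r.allows(api_group, resource, verb) for r in rules)


# Hypothetical rules for system:nodes: NetworkPolicy read access only
# under the old "extensions" group, as the who-can output suggests.
NODE_RULES = [
    PolicyRule(frozenset({"extensions"}),
               frozenset({"networkpolicies"}),
               frozenset({"get", "list", "watch"})),
]

# Hypothetical updated rules that also cover the "networking.k8s.io" group.
RECONCILED_RULES = [
    PolicyRule(frozenset({"extensions", "networking.k8s.io"}),
               frozenset({"networkpolicies"}),
               frozenset({"get", "list", "watch"})),
]

print(can(NODE_RULES, "extensions", "networkpolicies", "list"))              # True
print(can(NODE_RULES, "networking.k8s.io", "networkpolicies", "list"))       # False
print(can(RECONCILED_RULES, "networking.k8s.io", "networkpolicies", "list")) # True
```

The second check corresponds to the "cannot list networkpolicies.networking.k8s.io" failure in the node log. The wildcard handling follows how Kubernetes RBAC treats "*" in rule fields.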

Comment 6 weiwei jiang 2017-12-08 02:36:50 UTC

See https://github.com/kubernetes/kubernetes/pull/39164, referenced in https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.8.md

Comment 7 Meng Bo 2017-12-08 02:38:27 UTC

cc @danw

Comment 8 Meng Bo 2017-12-08 03:18:27 UTC

https://github.com/openshift/origin/commit/364615da6cf024eeb3190e531c3314667d9d8caa

The change above seems to have caused the issue.

Comment 9 Dan Winship 2017-12-08 14:55:17 UTC

Fixed by https://github.com/openshift/origin/pull/17549, which should merge soon.

Comment 11 Meng Bo 2018-01-03 06:29:46 UTC

Checked on v3.9.0-0.11.0.0: the node starts normally when using the networkpolicy plugin.