Bug 1523153

Summary: Cannot start atomic-openshift-node when using networkpolicy plugin
Product: OpenShift Container Platform
Reporter: Meng Bo <bmeng>
Component: Networking
Assignee: Casey Callendrello <cdc>
Networking sub component: openshift-sdn
QA Contact: zhaozhanqi <zzhao>
Status: CLOSED CURRENTRELEASE
Docs Contact:
Severity: high
Priority: high
CC: aos-bugs, danw, erich, wjiang, wmeng, xtian
Version: 3.8.0
Target Milestone: ---
Target Release: 3.8.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-12-05 21:50:30 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Meng Bo 2017-12-07 10:20:58 UTC
Description of problem:
When using the networkpolicy plugin, the node fails to start with the fatal error "cannot list networkpolicies.networking.k8s.io at the cluster scope".

Version-Release number of selected component (if applicable):
v3.8.7

How reproducible:
always

Steps to Reproduce:
1. Set up a multinode environment with the networkpolicy plugin
2. Try to start the node after the master is running

Actual results:
The node fails to start with a fatal error.

Expected results:
Node should be able to run.

Additional info:
Related error in node:
Dec 07 17:39:25 ose-node1.bmeng.local atomic-openshift-node[3472]: E1207 17:39:25.239301    3472 networkpolicy.go:130] Unable to query NetworkPolicies (networkpolicies.networking.k8s.io is forbidden: User "system:node:ose-node1.bmeng.local" cannot list networkpolicies.networking.k8s.io at the cluster scope: User "system:node:ose-node1.bmeng.local" cannot list all networkpolicies.networking.k8s.io in the cluster) - please ensure your nodes have access to view NetworkPolicy (eg, 'oc adm policy reconcile-cluster-roles')
Dec 07 17:39:25 ose-node1.bmeng.local atomic-openshift-node[3472]: F1207 17:39:25.239335    3472 network.go:44] SDN node startup failed: networkpolicies.networking.k8s.io is forbidden: User "system:node:ose-node1.bmeng.local" cannot list networkpolicies.networking.k8s.io at the cluster scope: User "system:node:ose-node1.bmeng.local" cannot list all networkpolicies.networking.k8s.io in the cluster
Dec 07 17:39:25 ose-node1.bmeng.local systemd[1]: atomic-openshift-node.service: main process exited, code=exited, status=255/n/a
Dec 07 17:39:25 ose-node1.bmeng.local systemd[1]: Failed to start Atomic OpenShift Node.
Dec 07 17:39:25 ose-node1.bmeng.local systemd[1]: Unit atomic-openshift-node.service entered failed state.
Dec 07 17:39:25 ose-node1.bmeng.local systemd[1]: atomic-openshift-node.service failed.
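
For triage, the identity, verb, and resource can be pulled out of the "forbidden" message in the log above. The following is a minimal sketch in Python; the function name `parse_forbidden` and the regular expression are illustrative assumptions based only on the message text quoted in this report, not part of OpenShift.

```python
import re

# Assumed pattern, derived from the "forbidden" message quoted in the log above.
FORBIDDEN_RE = re.compile(
    r'(?P<resource>[\w.]+) is forbidden: '
    r'User "(?P<user>[^"]+)" cannot (?P<verb>\w+) \S+ at the cluster scope'
)


def parse_forbidden(line):
    """Return (user, verb, resource) from a forbidden log line, or None."""
    match = FORBIDDEN_RE.search(line)
    if match is None:
        return None
    return match.group("user"), match.group("verb"), match.group("resource")


# The fatal line from the node log, shortened to its first "forbidden" clause.
sample = (
    'F1207 17:39:25.239335    3472 network.go:44] SDN node startup failed: '
    'networkpolicies.networking.k8s.io is forbidden: User '
    '"system:node:ose-node1.bmeng.local" cannot list '
    'networkpolicies.networking.k8s.io at the cluster scope'
)

print(parse_forbidden(sample))
```

Here the node identity `system:node:ose-node1.bmeng.local` is denied the `list` verb on `networkpolicies.networking.k8s.io`.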

Comment 1 Ben Bennett 2017-12-07 13:55:00 UTC
Weibin: Can you please reproduce this? Thanks!

Comment 4 Meng Bo 2017-12-08 02:28:26 UTC
@Ben @weibin

The bug is easy to reproduce: the node fails to start as soon as the environment is set up.
I suspect the recent API changes caused this, since there are many API changes in the 3.8 branch.

Comment 5 weiwei jiang 2017-12-08 02:30:58 UTC
FYI

# oc policy who-can list networkpolicies
Namespace: default
Verb:      list
Resource:  networkpolicies.extensions

Users:  system:admin
        system:kube-controller-manager
        system:serviceaccount:default:router
        system:serviceaccount:kube-service-catalog:default
        system:serviceaccount:kube-system:generic-garbage-collector
        system:serviceaccount:kube-system:namespace-controller
        system:serviceaccount:kube-system:resourcequota-controller
        system:serviceaccount:management-infra:management-admin
        system:serviceaccount:openshift-ansible-service-broker:asb
        system:serviceaccount:openshift-infra:template-instance-controller

Groups: system:cluster-admins
        system:cluster-readers
        system:masters
        system:nodes

# oc policy who-can list networkpolicies.networking.k8s.io
Namespace: default
Verb:      list
Resource:  networkpolicies.networking.k8s.io

Users:  system:admin
        system:kube-controller-manager
        system:serviceaccount:default:router
        system:serviceaccount:kube-system:generic-garbage-collector
        system:serviceaccount:kube-system:namespace-controller
        system:serviceaccount:kube-system:resourcequota-controller
        system:serviceaccount:management-infra:management-admin

Groups: system:cluster-admins
        system:cluster-readers
        system:masters


# openssl x509 -in /etc/origin/node/system\:node\:ip-172-18-3-251.ec2.internal.crt -noout -subject
subject= /O=system:nodes/CN=system:node:ip-172-18-3-251.ec2.internal

# openshift version 
openshift v3.8.11
kubernetes v1.8.1+0d5291c
etcd 3.2.8
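
The outputs above differ in one relevant way: the `system:nodes` group (the organization in the node certificate subject) can list `networkpolicies.extensions` but not `networkpolicies.networking.k8s.io`. The sketch below shows that comparison in Python. It assumes the `oc policy who-can` layout shown in this comment; the helper names `parse_who_can` and `cert_organization` are hypothetical, and the user lists are trimmed for brevity.

```python
import re


def parse_who_can(text):
    """Parse `oc policy who-can` output into {'Users': set, 'Groups': set}."""
    result = {"Users": set(), "Groups": set()}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            current = None  # a blank line ends the current section
            continue
        head, sep, rest = line.partition(":")
        if sep and head in result:
            current = head  # "Users:" or "Groups:" starts a section
            line = rest.strip()
        elif current is None:
            continue  # header lines such as "Namespace: default"
        if line:
            result[current].add(line)
    return result


def cert_organization(subject):
    """Extract the O= field from an `openssl x509 -subject` line."""
    match = re.search(r"/O=([^/]+)", subject)
    return match.group(1) if match else None


# Trimmed copies of the outputs above (user lists shortened).
EXTENSIONS_OUTPUT = """\
Namespace: default
Verb:      list
Resource:  networkpolicies.extensions

Users:  system:admin
        system:kube-controller-manager

Groups: system:cluster-admins
        system:cluster-readers
        system:masters
        system:nodes
"""

NETWORKING_OUTPUT = """\
Namespace: default
Verb:      list
Resource:  networkpolicies.networking.k8s.io

Users:  system:admin
        system:kube-controller-manager

Groups: system:cluster-admins
        system:cluster-readers
        system:masters
"""

SUBJECT = "subject= /O=system:nodes/CN=system:node:ip-172-18-3-251.ec2.internal"

missing = (
    parse_who_can(EXTENSIONS_OUTPUT)["Groups"]
    - parse_who_can(NETWORKING_OUTPUT)["Groups"]
)
print(sorted(missing))                       # ['system:nodes']
print(cert_organization(SUBJECT) in missing)  # True
```

The node certificate places the node in `system:nodes`, which is exactly the group missing from the `networking.k8s.io` listing, consistent with the "forbidden" error in the description.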

Comment 7 Meng Bo 2017-12-08 02:38:27 UTC
cc @danw

Comment 8 Meng Bo 2017-12-08 03:18:27 UTC
https://github.com/openshift/origin/commit/364615da6cf024eeb3190e531c3314667d9d8caa

It seems the change above causes the issue.

Comment 9 Dan Winship 2017-12-08 14:55:17 UTC
Fixed by https://github.com/openshift/origin/pull/17549, which should merge soon.

Comment 11 Meng Bo 2018-01-03 06:29:46 UTC
Checked on v3.9.0-0.11.0.0: the node starts normally when using the networkpolicy plugin.