Description of problem:
There is a fatal error, "cannot list networkpolicies.networking.k8s.io at the cluster scope", which prevents the node from starting when the networkpolicy plugin is used.

Version-Release number of selected component (if applicable):
v3.8.7

How reproducible:
Always

Steps to Reproduce:
1. Set up a multi-node environment with the network policy plugin.
2. Try to start the node after the master is running.

Actual results:
The node fails to start with a fatal error.

Expected results:
The node should start and run.

Additional info:
Related errors in the node log:
Dec 07 17:39:25 ose-node1.bmeng.local atomic-openshift-node[3472]: E1207 17:39:25.239301 3472 networkpolicy.go:130] Unable to query NetworkPolicies (networkpolicies.networking.k8s.io is forbidden: User "system:node:ose-node1.bmeng.local" cannot list networkpolicies.networking.k8s.io at the cluster scope: User "system:node:ose-node1.bmeng.local" cannot list all networkpolicies.networking.k8s.io in the cluster) - please ensure your nodes have access to view NetworkPolicy (eg, 'oc adm policy reconcile-cluster-roles')
Dec 07 17:39:25 ose-node1.bmeng.local atomic-openshift-node[3472]: F1207 17:39:25.239335 3472 network.go:44] SDN node startup failed: networkpolicies.networking.k8s.io is forbidden: User "system:node:ose-node1.bmeng.local" cannot list networkpolicies.networking.k8s.io at the cluster scope: User "system:node:ose-node1.bmeng.local" cannot list all networkpolicies.networking.k8s.io in the cluster
Dec 07 17:39:25 ose-node1.bmeng.local systemd[1]: atomic-openshift-node.service: main process exited, code=exited, status=255/n/a
Dec 07 17:39:25 ose-node1.bmeng.local systemd[1]: Failed to start Atomic OpenShift Node.
Dec 07 17:39:25 ose-node1.bmeng.local systemd[1]: Unit atomic-openshift-node.service entered failed state.
Dec 07 17:39:25 ose-node1.bmeng.local systemd[1]: atomic-openshift-node.service failed.
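The log shows an authorization denial for the verb `list` on the resource `networkpolicies` in the `networking.k8s.io` API group, requested across the whole cluster. The following is a minimal sketch of RBAC-style rule matching. It is a simplified, hypothetical model, not the real Kubernetes authorizer or its `rbac/v1` types. Because the match includes the API group as well as the resource name, it shows how a user could be allowed to list `networkpolicies` through one group and still be forbidden through another:

```go
package main

import "fmt"

// PolicyRule is a simplified, hypothetical model of an RBAC rule: it grants
// the listed verbs on the listed resources in the listed API groups.
type PolicyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// contains reports whether want is in list, treating "*" as a wildcard.
func contains(list []string, want string) bool {
	for _, v := range list {
		if v == "*" || v == want {
			return true
		}
	}
	return false
}

// allows reports whether any rule grants verb on resource in apiGroup.
// The API group is part of the match, so a rule for "extensions" does not
// cover the same resource name in "networking.k8s.io".
func allows(rules []PolicyRule, verb, apiGroup, resource string) bool {
	for _, r := range rules {
		if contains(r.APIGroups, apiGroup) &&
			contains(r.Resources, resource) &&
			contains(r.Verbs, verb) {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical node rules: networkpolicies are readable only through
	// the legacy "extensions" API group.
	nodeRules := []PolicyRule{{
		APIGroups: []string{"extensions"},
		Resources: []string{"networkpolicies"},
		Verbs:     []string{"get", "list", "watch"},
	}}

	fmt.Println(allows(nodeRules, "list", "extensions", "networkpolicies"))        // true
	fmt.Println(allows(nodeRules, "list", "networking.k8s.io", "networkpolicies")) // false
}
```

The node rules in `main` are invented only to illustrate the group-scoped match. They are not taken from the actual OpenShift node role.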
Weibin: Can you please reproduce this? Thanks!
@Ben @weibin The bug is easy to reproduce: the node fails to start as soon as the environment is set up. I suspect the recent API changes are the cause, since there have been many API changes in the 3.8 branch.
FYI:

# oc policy who-can list networkpolicies
Namespace: default
Verb:      list
Resource:  networkpolicies.extensions

Users:  system:admin
        system:kube-controller-manager
        system:serviceaccount:default:router
        system:serviceaccount:kube-service-catalog:default
        system:serviceaccount:kube-system:generic-garbage-collector
        system:serviceaccount:kube-system:namespace-controller
        system:serviceaccount:kube-system:resourcequota-controller
        system:serviceaccount:management-infra:management-admin
        system:serviceaccount:openshift-ansible-service-broker:asb
        system:serviceaccount:openshift-infra:template-instance-controller

Groups: system:cluster-admins
        system:cluster-readers
        system:masters
        system:nodes

# oc policy who-can list networkpolicies.networking.k8s.io
Namespace: default
Verb:      list
Resource:  networkpolicies.networking.k8s.io

Users:  system:admin
        system:kube-controller-manager
        system:serviceaccount:default:router
        system:serviceaccount:kube-system:generic-garbage-collector
        system:serviceaccount:kube-system:namespace-controller
        system:serviceaccount:kube-system:resourcequota-controller
        system:serviceaccount:management-infra:management-admin

Groups: system:cluster-admins
        system:cluster-readers
        system:masters

# openssl x509 -in /etc/origin/node/system\:node\:ip-172-18-3-251.ec2.internal.crt -noout -subject
subject= /O=system:nodes/CN=system:node:ip-172-18-3-251.ec2.internal

# openshift version
openshift v3.8.11
kubernetes v1.8.1+0d5291c
etcd 3.2.8
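The output above suggests why the node is denied. The `system:nodes` group is listed for `networkpolicies.extensions` but not for `networkpolicies.networking.k8s.io`, and neither user list contains a `system:node:*` user. Kubernetes x509 client-certificate authentication takes the username from the certificate's CN and the groups from its O fields. This node therefore authenticates as `system:node:ip-172-18-3-251.ec2.internal` in the group `system:nodes`.

The following Go sketch derives that identity from the `openssl` subject line and checks it against the two group lists copied from the `who-can` output. The function names are hypothetical, and the subject parsing ignores OpenSSL's escaping rules:

```go
package main

import (
	"fmt"
	"strings"
)

// identityFromSubject splits a one-line OpenSSL subject such as
// "subject= /O=system:nodes/CN=system:node:host" into a username (from CN)
// and groups (from each O), mirroring how Kubernetes maps x509 client
// certificates to users. Escaped characters are not handled.
func identityFromSubject(subject string) (string, []string) {
	var user string
	var groups []string
	subject = strings.TrimSpace(strings.TrimPrefix(subject, "subject="))
	for _, part := range strings.Split(subject, "/") {
		key, value, ok := strings.Cut(part, "=")
		if !ok {
			continue // skips fields without "=", e.g. the empty one before the leading "/"
		}
		switch key {
		case "CN":
			user = value
		case "O":
			groups = append(groups, value)
		}
	}
	return user, groups
}

// inAnyGroup reports whether any of the caller's groups appears in allowed.
func inAnyGroup(groups, allowed []string) bool {
	for _, g := range groups {
		for _, a := range allowed {
			if g == a {
				return true
			}
		}
	}
	return false
}

func main() {
	user, groups := identityFromSubject("subject= /O=system:nodes/CN=system:node:ip-172-18-3-251.ec2.internal")

	// Groups copied from the two `oc policy who-can list` outputs.
	extensionsGroups := []string{"system:cluster-admins", "system:cluster-readers", "system:masters", "system:nodes"}
	networkingGroups := []string{"system:cluster-admins", "system:cluster-readers", "system:masters"}

	fmt.Println("user:", user, "groups:", groups)
	fmt.Println("networkpolicies.extensions:", inAnyGroup(groups, extensionsGroups))
	fmt.Println("networkpolicies.networking.k8s.io:", inAnyGroup(groups, networkingGroups))
}
```

Only group membership is checked here, because the node's own username does not appear in either `Users` list. `strings.Cut` requires Go 1.18 or later.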
Related upstream change: https://github.com/kubernetes/kubernetes/pull/39164 (according to https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.8.md).
cc @danw
It seems the change in https://github.com/openshift/origin/commit/364615da6cf024eeb3190e531c3314667d9d8caa causes the issue.
Fixed by https://github.com/openshift/origin/pull/17549, which should merge soon.
Verified on v3.9.0-0.11.0.0: the node starts normally when using the networkpolicy plugin.