Cause:
Multiple API servers starting simultaneously with an empty etcd datastore would race to populate the default system policy.
Consequence:
A partially created policy could result, leaving a new cluster with a policy that would forbid system components from making some API calls.
Fix:
The policy APIs were updated to perform the same resourceVersion checking as other APIs, and fault-tolerant logic was added to the initial policy population step.
Result:
New clusters populate default policy as expected.
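The fix described above amounts to making the initial policy population idempotent: each racing API server attempts the create, and "already exists" is treated as success rather than a fault. Below is a minimal, hypothetical Python simulation of that pattern (`FakeStore` and `populate_default_policy` are illustrative names, not the actual OpenShift code), showing how concurrent servers starting against an empty datastore all converge on a single default policy.

```python
import threading

class FakeStore:
    """Minimal in-memory stand-in for an etcd-backed API store."""
    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}   # name -> (resource_version, value)
        self._rv = 0

    def create(self, name, value):
        """Atomic create: fails if the object already exists."""
        with self._lock:
            if name in self._data:
                raise KeyError("AlreadyExists")
            self._rv += 1
            self._data[name] = (self._rv, value)
            return self._rv

def populate_default_policy(store, policy):
    """Fault-tolerant initial population: an AlreadyExists error means
    another API server won the race, which is fine -- not a failure."""
    try:
        store.create("default-policy", policy)
        return "created"
    except KeyError:
        return "already-present"
```

With this shape, if three API servers race, exactly one create succeeds and the other two observe the winner's policy instead of leaving a partially created one behind.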
I've been able to reproduce this with a node whose inventory name is set to an IP address, i.e. the second node below fails while the first works:
[nodes]
ose3-master.example.com openshift_node_labels="{'region':'infra','zone':'default'}" openshift_schedulable=true
192.168.122.102 openshift_node_labels="{'region':'primary','zone':'east'}"
`oadm policy reconcile-cluster-role-bindings` fixed the issue; the existing nodes immediately registered themselves. As to why that was necessary, we're still not sure.
This seems to be the result of three API servers starting for the first time at the same time. We can work around this in the installer, but it would be nice if the product itself prevented it from being a problem via some sort of locking mechanism. I'll attach logs.
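The "locking mechanism" suggested above could take the shape of a startup lock won by atomic key creation, so that only one of the simultaneously starting servers performs first-time initialization while the others wait for it to finish. This is a hypothetical sketch only (the names `acquire_startup_lock` and `start_api_server` are illustrative, and the atomic create stands in for an etcd conditional write), not how the product actually resolved the bug:

```python
import threading

store = {}                       # stand-in for the shared datastore
store_lock = threading.Lock()
init_done = threading.Event()    # signals that initialization finished
init_runs = []                   # records which server initialized

def acquire_startup_lock(key):
    """Atomically create a lock key; only the first caller gets True.
    Simulates an etcd conditional create (create-if-absent)."""
    with store_lock:
        if key in store:
            return False
        store[key] = True
        return True

def start_api_server(name):
    """Each starting server races for the lock; exactly one initializes."""
    if acquire_startup_lock("policy-init-lock"):
        init_runs.append(name)   # only the lock holder populates policy
        init_done.set()
    else:
        init_done.wait()         # others wait until initialization is done
```

Run three of these concurrently and only one performs the initialization step, which is exactly the property that would have prevented the partially created policy.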
Ansible work-around https://github.com/openshift/openshift-ansible/pull/2233
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2016:1933