Cause:
Multiple API servers starting simultaneously with an empty etcd datastore would race to populate the default system policy.
Consequence:
A partially created policy could result, leaving a new cluster with a policy that would forbid system components from making some API calls.
Fix:
The policy APIs were updated to perform the same resourceVersion checking as other APIs, and fault-tolerant logic was added to the initial policy population step.
Result:
New clusters populate default policy as expected.
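The fix described above amounts to making the initial policy population idempotent: each racing API server attempts the create, and "already exists" is treated as success rather than a fault. Below is a minimal, hypothetical Python simulation of that pattern (`FakeStore` and `populate_default_policy` are illustrative names, not the actual OpenShift code), showing how concurrent servers starting against an empty datastore all converge on a single default policy.

```python
import threading

class FakeStore:
    """Minimal in-memory stand-in for an etcd-backed API store."""
    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}   # name -> (resource_version, value)
        self._rv = 0

    def create(self, name, value):
        """Atomic create: fails if the object already exists."""
        with self._lock:
            if name in self._data:
                raise KeyError("AlreadyExists")
            self._rv += 1
            self._data[name] = (self._rv, value)
            return self._rv

def populate_default_policy(store, policy):
    """Fault-tolerant initial population: an AlreadyExists error means
    another API server won the race, which is fine -- not a failure."""
    try:
        store.create("default-policy", policy)
        return "created"
    except KeyError:
        return "already-present"
```

With this shape, if three API servers race, exactly one create succeeds and the other two observe the winner's policy instead of leaving a partially created one behind.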
I've been able to reproduce this with a node whose inventory name is set to an IP address, i.e. the second node below fails while the first works:
[nodes]
ose3-master.example.com openshift_node_labels="{'region':'infra','zone':'default'}" openshift_schedulable=true
192.168.122.102 openshift_node_labels="{'region':'primary','zone':'east'}"
`oadm policy reconcile-cluster-role-bindings` fixed the issue; the existing nodes immediately registered themselves. As to why that was necessary, we're still not sure.
This seems to be the result of three API servers starting for the first time at the same time. We can work around this in the installer, but it would be nice if the product itself prevented it from being a problem via some sort of locking mechanism. I'll attach logs.
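The "locking mechanism" suggested above could take the shape of a startup lock won by atomic key creation, so that only one of the simultaneously starting servers performs first-time initialization while the others wait for it to finish. This is a hypothetical sketch only (the names `acquire_startup_lock` and `start_api_server` are illustrative, and the atomic create stands in for an etcd conditional write), not how the product actually resolved the bug:

```python
import threading

store = {}                       # stand-in for the shared datastore
store_lock = threading.Lock()
init_done = threading.Event()    # signals that initialization finished
init_runs = []                   # records which server initialized

def acquire_startup_lock(key):
    """Atomically create a lock key; only the first caller gets True.
    Simulates an etcd conditional create (create-if-absent)."""
    with store_lock:
        if key in store:
            return False
        store[key] = True
        return True

def start_api_server(name):
    """Each starting server races for the lock; exactly one initializes."""
    if acquire_startup_lock("policy-init-lock"):
        init_runs.append(name)   # only the lock holder populates policy
        init_done.set()
    else:
        init_done.wait()         # others wait until initialization is done
```

Run three of these concurrently and only one performs the initialization step, which is exactly the property that would have prevented the partially created policy.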
Ansible work-around https://github.com/openshift/openshift-ansible/pull/2233
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2016:1933