Bug 1464569

Summary: custom role bindings did not survive a cluster reboot
Product: OpenShift Container Platform
Reporter: Sten Turpin <sten>
Component: apiserver-auth
Assignee: Jordan Liggitt <jliggitt>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Chuan Yu <chuyu>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.4.1
CC: aos-bugs, deads, eparis, jialiu, jliggitt, jokerman, mmccomas, mwhittin, sten, stwalter
Target Milestone: ---
Keywords: OpsBlocker
Target Release: 3.4.z
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-16 22:02:42 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Sten Turpin 2017-06-23 19:22:57 UTC
Description of problem: We patched our OpenShift Dedicated clusters for StackGuard. Since this was a kernel + glibc update, all nodes and masters were rebooted in series. All services should have remained available during the reboots; however, after the reboots, custom rolebindings appear to be missing.


Version-Release number of selected component (if applicable): 3.4.1.18-1.git.0.0f9d380.el7


How reproducible: two clusters so far


Steps to Reproduce:
1. Apply RHSA-2017:1481, RHSA-2017:1484; reboot all hosts in series
2. Create application from template

Actual results:
esb-template-nodejs   Service               Warning   CreatingLoadBalancerFailed   {service-controller }   Error creating load balancer (will retry): Failed to create load balancer for service esb-templates-sit/esb-template-nodejs: error describing subnets: error listing AWS subnets: UnauthorizedOperation: You are not authorized to perform this operation.

Expected results:
Operations that worked prior to reboot should continue to work. 

Additional info:

Comment 1 David Eads 2017-06-23 20:04:31 UTC
The openshift apiserver does not modify rolebindings as part of startup if any such resources already exist.  If the cluster is large enough, you may be waiting for the authorization cache to re-prime.

Could you check to see if the rolebinding in question (and its role) still exists?
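
One way to make that check is to compare the bindings you expect against what the cluster currently reports. The following is a minimal sketch, not a supported tool: the function name `missing_bindings` and the file names are hypothetical, and it assumes each file holds one resource name per line, for example the output of `oc get rolebindings -n <binding-namespace> -o name` for the current state and a list built from a backup or from the manifests originally used for the expected state.

```shell
#!/bin/sh
# Sketch: print bindings listed in an "expected" file that are absent from a
# "current" file. Both files are assumed to contain one resource name per line.
# Hypothetical way to produce the "current" file (placeholder namespace):
#   oc get rolebindings -n <binding-namespace> -o name > current.txt

missing_bindings() {
    expected=$1
    current=$2
    # -F fixed strings, -x whole-line match, -v invert, -f read patterns from file.
    # grep exits 1 when it selects no lines (nothing is missing); treat that as
    # success and let any other non-zero status (a real error) propagate.
    grep -Fxv -f "$current" "$expected" || [ $? -eq 1 ]
}
```

Calling `missing_bindings expected.txt current.txt` prints nothing when every expected binding is still present; any name it prints is worth inspecting with `oc describe` together with the role it refers to.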

Comment 2 Jordan Liggitt 2017-06-24 15:04:53 UTC
Can you provide the commands used to create the custom bindings, and the following info about them:

1. were they rolebindings or clusterrolebindings? if rolebindings, in what namespace and to what role?
2. what command/manifest was used to create the binding?
3. can you provide the output of the following from the current cluster:
    oc get clusterpolicy/default -o yaml
    oc get clusterpolicybinding/:default -o yaml

  and if the missing binding was a rolebinding:
    oc get policy/default -o yaml -n <binding-namespace>
    oc get policybinding/:default -o yaml -n <binding-namespace>

Also, are there any role-related commands run as part of a shutdown or bring-up script?

Comment 10 Eric Paris 2017-08-16 22:02:42 UTC
At this point we have been unable to reproduce the problem and do not have adequate data to make further progress. I apologize, but I am closing this bug as INSUFFICIENT_DATA since we are unable to resolve the issue.