Bug 1564128

Summary: Starter-us-east-2 3.7 masters failing to come up with 3.9 predicate in config
Product: OpenShift Online Reporter: Paul Bergene <pbergene>
Component: PodAssignee: Avesh Agarwal <avagarwa>
Status: CLOSED NOTABUG QA Contact: DeShuai Ma <dma>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 3.xCC: aos-bugs, avagarwa, jchevret, jokerman, mmccomas, rhopp, sthaha
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-04-09 07:22:35 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Paul Bergene 2018-04-05 12:33:59 UTC
Description of problem:
Masters failing on starter-us-east-2.  Filing a bz after #libra-ops discussion.

Version-Release number of selected component (if applicable):
OpenShift Master: v3.7.23 (online version 3.7.2.1)


The masters on starter-us-east-2 seem to be failing to come up or are in a bad state. Looking at atomic-openshift-master-controllers logs there is the following error: 

Apr 05 11:52:30 ip-172-31-67-105.us-east-2.compute.internal atomic-openshift-master-controllers[25092]: F0405 11:52:30.145529   25092 plugins.go:150] Invalid configuration: Predicate type not found for CheckVolumeBinding

Apr 05 11:52:30 ip-172-31-67-105.us-east-2.compute.internal systemd[1]: atomic-openshift-master-controllers.service: main process exited, code=exited, status=255/n/a
Apr 05 11:52:30 ip-172-31-67-105.us-east-2.compute.internal systemd[1]: Unit atomic-openshift-master-controllers.service entered failed state.

Predicate CheckVolumeBinding seems only to be referenced in 3.9 documentation, suggesting that a 3.9 config had been applied to this cluster.  There was maintenance peformed yesterday evening preceding this issue.

Comment 1 Avesh Agarwal 2018-04-05 13:30:16 UTC
Right now I dont know the root cause, but my suggestion is to remove CheckVolumeBinding from scheduler's policy file, and it wont effect anything because CheckVolumeBinding is a default predicate and is registered by default anyway, so there is not need to put it again in the policy file where it is failing.

Also the feature VolumeScheduling is disabled in 3.9 (alpha) which is used with CheckVolumeBinding, so the predicate CheckVolumeBinding does not do anything anyway.

Comment 2 Avesh Agarwal 2018-04-05 13:31:23 UTC
I will keep looking into it meanwhile why it is happening but i know that it does not happen always.

Comment 3 Avesh Agarwal 2018-04-05 13:39:46 UTC
I did not realize that the version was 3.7, i thought it was 3.9. The predicate CheckVolumeBinding was added in 3.9 so should not affect 3.7.

Please check why the scheduler policy file has CheckVolumeBinding in 3.7?

Comment 4 Paul Bergene 2018-04-06 08:00:45 UTC
This should now be resolved.

Comment 5 Avesh Agarwal 2018-04-06 15:47:43 UTC
can we close this bz if its resolved?