Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1288974

Summary: Out of date etcd event index causes openshift-master startup failure
Product: OpenShift Container Platform Reporter: Andrew Block <andrew.block>
Component: NodeAssignee: Paul Weil <pweil>
Status: CLOSED DUPLICATE QA Contact: Jianwei Hou <jhou>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 3.0.0CC: aos-bugs, jokerman, mmccomas
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-01-04 21:17:37 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Andrew Block 2015-12-07 07:01:42 UTC
Description of problem:

Running OpenShift Enterprise 3.0.2 as a single master/node/etc combo. After a series of reboots, the openshift-master service fails to load. The reboots were due to an unrelated hardware issue, but did occur both while OpenShift was running and at machine boot.


How reproducible:

Depends on reboot timing

Steps to Reproduce:
1. Reboot machine several times.
2. Attempt to start up OpenShift master

Actual results:

The following output is produced when attempting to start the openshift-master service after several unplanned server reboots

Dec 03 11:20:18 c1-n8.cloud1.nmca.demo systemd[1]: Starting OpenShift Master... 
Dec 03 11:20:19 c1-n8.cloud1.nmca.demo openshift-master[6498]: I1203 11:20:19.321725    6498 plugins.go:70] No cloud provider specified. 
Dec 03 11:20:19 c1-n8.cloud1.nmca.demo openshift-master[6498]: I1203 11:20:19.321872    6498 start_master.go:385] Starting master on 0.0.0.0:8443 (v3.0.2.0-20-g656dc3e) 
Dec 03 11:20:19 c1-n8.cloud1.nmca.demo openshift-master[6498]: I1203 11:20:19.321879    6498 start_master.go:386] Public master address is https://c1-n8.cloud1.nmca.demo:8443 
Dec 03 11:20:19 c1-n8.cloud1.nmca.demo openshift-master[6498]: I1203 11:20:19.321891    6498 start_master.go:390] Using images from "openshift3/ose-<component>:v3.0.2.0" 
Dec 03 11:20:19 c1-n8.cloud1.nmca.demo openshift-master[6498]: E1203 11:20:19.322367    6498 reflector.go:180] pkg/admission/resourcequota/admission.go:59: Failed to list *api.ResourceQuota: Get https://c1-n8.cloud1.nmca.demo:8443/api/v1/resourcequotas: dial tcp 127.0.0.1:8443: connection refused 
Dec 03 11:20:19 c1-n8.cloud1.nmca.demo openshift-master[6498]: E1203 11:20:19.322400    6498 reflector.go:180] pkg/admission/serviceaccount/admission.go:101: Failed to list *api.ServiceAccount: Get https://c1-n8.cloud1.nmca.demo:8443/api/v1/serviceaccounts: dial tcp 127.0.0.1:8443: connection refused 
Dec 03 11:20:19 c1-n8.cloud1.nmca.demo openshift-master[6498]: E1203 11:20:19.322401    6498 reflector.go:180] pkg/admission/namespace/exists/admission.go:101: Failed to list *api.Namespace: Get https://c1-n8.cloud1.nmca.demo:8443/api/v1/namespaces: dial tcp 127.0.0.1:8443: connection refused 
Dec 03 11:20:19 c1-n8.cloud1.nmca.demo openshift-master[6498]: E1203 11:20:19.322448    6498 reflector.go:180] pkg/admission/limitranger/admission.go:102: Failed to list *api.LimitRange: Get https://c1-n8.cloud1.nmca.demo:8443/api/v1/limitranges: dial tcp 127.0.0.1:8443: connection refused 
Dec 03 11:20:19 c1-n8.cloud1.nmca.demo openshift-master[6498]: E1203 11:20:19.322463    6498 reflector.go:180] pkg/security/admission/admission.go:59: Failed to list *api.SecurityContextConstraints: Get https://c1-n8.cloud1.nmca.demo:8443/api/v1/securitycontextconstraints: dial tcp 127.0.0.1:8443: connection refused 
Dec 03 11:20:19 c1-n8.cloud1.nmca.demo openshift-master[6498]: E1203 11:20:19.322478    6498 reflector.go:180] pkg/admission/namespace/lifecycle/admission.go:107: Failed to list *api.Namespace: Get https://c1-n8.cloud1.nmca.demo:8443/api/v1/namespaces: dial tcp 127.0.0.1:8443: connection refused 
Dec 03 11:20:19 c1-n8.cloud1.nmca.demo openshift-master[6498]: E1203 11:20:19.322485    6498 reflector.go:180] pkg/admission/serviceaccount/admission.go:115: Failed to list *api.Secret: Get https://c1-n8.cloud1.nmca.demo:8443/api/v1/secrets?fieldSelector=type%3Dkubernetes.io%2Fservice-account-token: dial tcp 127.0.0.1:8443: connection refused 
Dec 03 11:20:19 c1-n8.cloud1.nmca.demo openshift[6498]: resolving c1-n8.cloud1.nmca.demo:7001 to 127.0.0.1:7001 
Dec 03 11:20:19 c1-n8.cloud1.nmca.demo openshift[6498]: resolving c1-n8.cloud1.nmca.demo:4001 to 127.0.0.1:4001 
Dec 03 11:20:19 c1-n8.cloud1.nmca.demo openshift-master[6498]: I1203 11:20:19.424039    6498 server.go:65] etcd: peerTLS: cert = /etc/openshift/master/etcd.server.crt, key = /etc/openshift/master/etcd.server.key, ca = /etc/openshift/master/ca.crt 
Dec 03 11:20:19 c1-n8.cloud1.nmca.demo openshift-master[6498]: I1203 11:20:19.522721    6498 server.go:76] etcd: listening for peers on https://0.0.0.0:7001 
Dec 03 11:20:19 c1-n8.cloud1.nmca.demo openshift-master[6498]: I1203 11:20:19.522736    6498 server.go:87] etcd: clientTLS: cert = /etc/openshift/master/etcd.server.crt, key = /etc/openshift/master/etcd.server.key, ca = /etc/openshift/master/ca.crt 
Dec 03 11:20:19 c1-n8.cloud1.nmca.demo openshift-master[6498]: I1203 11:20:19.621711    6498 server.go:98] etcd: listening for client requests on https://0.0.0.0:4001 
Dec 03 11:20:19 c1-n8.cloud1.nmca.demo openshift[6498]: skipped unexpected non snapshot file 0000000000000008-0000000000471cf2.snap.broken 
Dec 03 11:20:19 c1-n8.cloud1.nmca.demo openshift[6498]: recovered store from snapshot at index 4930700 
Dec 03 11:20:19 c1-n8.cloud1.nmca.demo openshift[6498]: name = openshift.local 
Dec 03 11:20:19 c1-n8.cloud1.nmca.demo openshift[6498]: data dir = /var/lib/openshift/openshift.local.etcd 
Dec 03 11:20:19 c1-n8.cloud1.nmca.demo openshift[6498]: member dir = /var/lib/openshift/openshift.local.etcd/member 
Dec 03 11:20:19 c1-n8.cloud1.nmca.demo openshift[6498]: heartbeat = 100ms 
Dec 03 11:20:19 c1-n8.cloud1.nmca.demo openshift[6498]: election = 1000ms 
Dec 03 11:20:19 c1-n8.cloud1.nmca.demo openshift[6498]: snapshot count = 0 
Dec 03 11:20:19 c1-n8.cloud1.nmca.demo openshift[6498]: advertise client URLs = https://127.0.0.1:4001 
Dec 03 11:20:19 c1-n8.cloud1.nmca.demo openshift[6498]: loaded cluster information from store: <nil> 
Dec 03 11:20:19 c1-n8.cloud1.nmca.demo openshift[6498]: restarting member e14792d3b3fae623 in cluster b8af4046e6363d01 at commit index 4940297 
Dec 03 11:20:19 c1-n8.cloud1.nmca.demo openshift[6498]: e14792d3b3fae623 became follower at term 51 
Dec 03 11:20:19 c1-n8.cloud1.nmca.demo openshift[6498]: newRaft e14792d3b3fae623 [peers: [e14792d3b3fae623], term: 51, commit: 4940297, applied: 4930700, lastindex: 4940297, lastterm: 51] 
Dec 03 11:20:19 c1-n8.cloud1.nmca.demo openshift[6498]: set snapshot count to default 10000 
Dec 03 11:20:19 c1-n8.cloud1.nmca.demo openshift[6498]: starting server... [version: 2.1.2, cluster version: 2.1.0] 
Dec 03 11:20:19 c1-n8.cloud1.nmca.demo openshift-master[6498]: I1203 11:20:19.996478    6498 etcd.go:68] Started etcd at c1-n8.cloud1.nmca.demo:4001 
Dec 03 11:20:20 c1-n8.cloud1.nmca.demo openshift-master[6498]: I1203 11:20:20.014241    6498 run_components.go:182] Using default project node label selector: 
Dec 03 11:20:20 c1-n8.cloud1.nmca.demo openshift-master[6498]: E1203 11:20:20.014603    6498 reflector.go:180] pkg/project/cache/cache.go:102: Failed to list *api.Namespace: Get https://c1-n8.cloud1.nmca.demo:8443/api/v1/namespaces: dial tcp 127.0.0.1:8443: connection refused 
Dec 03 11:20:20 c1-n8.cloud1.nmca.demo openshift-master[6498]: I1203 11:20:20.014743    6498 master.go:268] Setting master service IP to "172.30.0.1" (read-write). 
Dec 03 11:20:20 c1-n8.cloud1.nmca.demo openshift-master[6498]: W1203 11:20:20.035673    6498 reflector.go:227] pkg/authorization/cache/policy.go:44: watch of *api.Policy ended with: 401: The event in requested index is outdated and cleared (the requested history has been cleared [1626739/1626175]) [1627738] 
Dec 03 11:20:20 c1-n8.cloud1.nmca.demo openshift-master[6498]: W1203 11:20:20.035892    6498 reflector.go:227] pkg/authorization/cache/clusterpolicybinding.go:44: watch of *api.ClusterPolicyBinding ended with: 401: The event in requested index is outdated and cleared (the requested history has been cleared [1626758/1626199]) [1627757] 
Dec 03 11:20:20 c1-n8.cloud1.nmca.demo openshift-master[6498]: W1203 11:20:20.036092    6498 reflector.go:227] pkg/authorization/cache/clusterpolicy.go:44: watch of *api.ClusterPolicy ended with: 401: The event in requested index is outdated and cleared (the requested history has been cleared [1626770/1626233]) [1627769] 
Dec 03 11:20:20 c1-n8.cloud1.nmca.demo openshift-master[6498]: W1203 11:20:20.036328    6498 reflector.go:227] pkg/authorization/cache/policybinding.go:44: watch of *api.PolicyBinding ended with: 401: The event in requested index is outdated and cleared (the requested history has been cleared [1626782/1626291]) [1627781] 
Dec 03 11:20:20 c1-n8.cloud1.nmca.demo openshift-master[6498]: F1203 11:20:20.052590    6498 controller.go:80] Unable to perform initial IP allocation check: unable to persist the updated service IP allocations: serviceipallocation "" cannot be updated: the provided resource version does not match 
Dec 03 11:20:20 c1-n8.cloud1.nmca.demo systemd[1]: openshift-master.service: main process exited, code=exited, status=255/n/a 
Dec 03 11:20:20 c1-n8.cloud1.nmca.demo systemd[1]: Failed to start OpenShift Master. 
Dec 03 11:20:20 c1-n8.cloud1.nmca.demo systemd[1]: Unit openshift-master.service entered failed state. 

Expected results:

openshift-master service starts successfully

Additional info:

Comment 1 Paul Weil 2016-01-04 21:17:37 UTC
This appears to be the same bug as https://bugzilla.redhat.com/show_bug.cgi?id=1294864.  The out of sync index is just a warning and shouldn't be causing failures but I notice that you have this failure:

F1203 11:20:20.052590    6498 controller.go:80] Unable to perform initial IP allocation check: unable to persist the updated service IP allocations: serviceipallocation "" cannot be updated: the provided resource version does not match

which is the same as the other bug.  Duping for now.  Please comment if you disagree.

*** This bug has been marked as a duplicate of bug 1294864 ***