Bug 1294864

Summary: atomic-openshift-master-api fails when all are restarted together
Product: OpenShift Container Platform
Reporter: Ryan Howe <rhowe>
Component: Node
Assignee: Andy Goldstein <agoldste>
Status: CLOSED ERRATA
QA Contact: DeShuai Ma <dma>
Severity: medium
Priority: unspecified
Version: 3.1.0
CC: andrew.block, aos-bugs, bleanhar, dma, erich, jokerman, knakayam, misalunk, mmccomas, pep, yoqi
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Doc Type: Bug Fix
Last Closed: 2016-02-23 20:31:48 UTC
Type: Bug
Bug Depends On: 1301217    
Bug Blocks: 1267746    

Description Ryan Howe 2015-12-30 18:44:47 UTC
Description of problem:

When the atomic-openshift-master-api service is restarted on all masters at the same time, one or more of them fail with the following error:


Dec 30 13:06:19 master3.bender.com atomic-openshift-master-api[57882]: F1230 13:06:19.611303   57882 controller.go:83] Unable to perform initial IP allocation check: unable to persist the updated service IP allocations: serviceipallocation "" cannot be updated: the provided resource version does not match


Version-Release number of selected component (if applicable):
3.1

How reproducible: 80%


Steps to Reproduce:
1. Reboot all masters at the same time, or restart the atomic-openshift-master-api service on all masters together:

# ansible masters -a "systemctl restart atomic-openshift-master-api"

Actual results:
Most of the time, one of the masters fails with the error shown below.

Expected results:

All masters start up without error.


Additional info:


master3.bender.com | FAILED | rc=3 >>
* atomic-openshift-master-api.service - Atomic OpenShift Master API
   Loaded: loaded (/usr/lib/systemd/system/atomic-openshift-master-api.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2015-12-30 13:16:42 EST; 2min 3s ago
     Docs: https://github.com/openshift/origin
  Process: 59189 ExecStart=/usr/bin/openshift start master api --config=${CONFIG_FILE} $OPTIONS (code=exited, status=255)
 Main PID: 59189 (code=exited, status=255)


Dec 30 13:16:41 master3.bender.com systemd[1]: Starting Atomic OpenShift Master API...
Dec 30 13:16:41 master3.bender.com atomic-openshift-master-api[59189]: I1230 13:16:41.729040   59189 start_api.go:96] Using a masterIP override "192.168.110.12"
Dec 30 13:16:41 master3.bender.com atomic-openshift-master-api[59189]: I1230 13:16:41.729179   59189 start_api.go:102] Using a listen address override "0.0.0.0:8443"
Dec 30 13:16:42 master3.bender.com atomic-openshift-master-api[59189]: I1230 13:16:42.498804   59189 plugins.go:71] No cloud provider specified.
Dec 30 13:16:42 master3.bender.com atomic-openshift-master-api[59189]: I1230 13:16:42.626631   59189 start_master.go:388] Starting master on 0.0.0.0:8443 (v3.1.0.4-16-g112fcc4)
Dec 30 13:16:42 master3.bender.com atomic-openshift-master-api[59189]: I1230 13:16:42.626673   59189 start_master.go:389] Public master address is https://openshift-cluster.bender.com:8443
Dec 30 13:16:42 master3.bender.com atomic-openshift-master-api[59189]: I1230 13:16:42.626690   59189 start_master.go:393] Using images from "openshift3/ose-<component>:v3.1.0.4"
Dec 30 13:16:42 master3.bender.com atomic-openshift-master-api[59189]: I1230 13:16:42.650845   59189 run_components.go:183] Using default project node label selector:
Dec 30 13:16:42 master3.bender.com atomic-openshift-master-api[59189]: I1230 13:16:42.814940   59189 master.go:369] Setting master service IP to "172.30.0.1" (read-write).
Dec 30 13:16:42 master3.bender.com atomic-openshift-master-api[59189]: F1230 13:16:42.932154   59189 controller.go:83] Unable to perform initial IP allocation check: unable to persist the updated service IP allocations: serviceipallocation "" cannot be updated: the provid
Dec 30 13:16:42 master3.bender.com systemd[1]: atomic-openshift-master-api.service: main process exited, code=exited, status=255/n/a
Dec 30 13:16:42 master3.bender.com systemd[1]: Failed to start Atomic OpenShift Master API.
Dec 30 13:16:42 master3.bender.com systemd[1]: Unit atomic-openshift-master-api.service entered failed state.
Dec 30 13:16:42 master3.bender.com systemd[1]: atomic-openshift-master-api.service failed.

Comment 1 Paul Weil 2016-01-04 21:17:37 UTC
*** Bug 1288974 has been marked as a duplicate of this bug. ***

Comment 3 Andy Goldstein 2016-01-12 02:16:29 UTC
Not a 3.1.1 blocker

Comment 5 Andy Goldstein 2016-01-20 14:34:40 UTC
*** Bug 1300164 has been marked as a duplicate of this bug. ***

Comment 6 Andy Goldstein 2016-01-20 14:36:01 UTC
Origin PR: https://github.com/openshift/origin/pull/6737

Comment 11 Josep 'Pep' Turro Mauri 2016-02-01 19:32:32 UTC
*** Bug 1301217 has been marked as a duplicate of this bug. ***

Comment 14 DeShuai Ma 2016-02-15 10:03:24 UTC
Tested on the latest v3.1 errata puddle, atomic-openshift-3.1.1.6-3.git.16.5327e56.el7aos.
Cannot reproduce this issue.
Verifying this bug.

Steps:
1. Create an HA environment.
2. Restart the atomic-openshift-master-api service on all masters at the same time.
3. Check that the master API service is running on every master.

Comment 16 errata-xmlrpc 2016-02-23 20:31:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:0293