Bug 871660

Summary: Scaled application has "dud" gears when overlapping scale-up requests occur
Product: OKD Reporter: Ram Ranganathan <ramr>
Component: Pod Assignee: Dan McPherson <dmcphers>
Status: CLOSED DUPLICATE QA Contact: libra bugs <libra-bugs>
Severity: medium Docs Contact:
Priority: medium    
Version: 2.x CC: mfisher
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-10-31 13:57:00 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Double configure error logs (flags: none)

Description Ram Ranganathan 2012-10-31 01:29:02 UTC
Created attachment 635887 [details]
Double configure error logs

Description of problem:
I was running Apache benchmark against a scaled application and noticed that there was a "dud" gear in the haproxy configuration. On investigating, I found that configure had been called twice on one gear, and that gear's DNS entry was never removed.

Version-Release number of selected component (if applicable):
Current release (2.0.19)

How reproducible:
Sometimes

Steps to Reproduce:
1.   Create a scaled application
2.   Add mysql to the app 
3.   Use the rails quickstart  - https://github.com/openshift/rails-example
4.   Add a junk folder with 2048 files (201 MB worth) -- this is to simulate a customer
      issue on prod where there were a lot of files and gear sync was taking a while.
      mkdir junk && cd junk
      for i in `seq 2048`; do dd if=/dev/zero of=junk-$i count=100 bs=1024; done
5.  Add, commit and push to the app.
6.  On the devenv, run apache benchmark 
        ab -n 100000 -c 23 http://$app-$namespace.dev.rhcloud.com/
7.  Optionally, trigger a manual scale-up while an automatic scale-up event is
     in progress; the bug seems to occur when scale-ups happen in parallel.

  
Actual results:
Mismatch between the haproxy configs (running config vs. the gear registry) -- the configuration contains a "dud" gear.

Expected results:
A valid configuration, with haproxy running correctly.

Additional info:

This also leaves a lot of defunct shell processes spawned from mcollective. The reason
is that the gear sync takes a while because of the number and size of the files, and because ab is hammering the haproxy gear with requests.
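To illustrate why the defunct processes pile up: an exited child stays a zombie until its parent wait()s for it, and if the parent (here, the mcollective agent) is tied up in the long-running sync, nothing reaps the exited shells. A minimal Ruby sketch of the reaping pattern -- this is not the actual mcollective code, just the general mechanism:

```ruby
# Illustrative only. A spawned child that exits stays "defunct" (a zombie)
# until its parent collects the exit status; Process.detach starts a
# background reaper thread that does the collecting for us.
pid = Process.spawn("true")   # child exits almost immediately
waiter = Process.detach(pid)  # reaper thread wait()s on the pid
status = waiter.value         # Process::Status once the child is reaped
puts status.success?
```

If the parent never calls wait (directly or via a detached reaper), each exited shell lingers as a `<defunct>` entry in ps, which matches what we see on the node.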

At the very least we should clean up the gear's DNS entry -- we do delete the gear itself.
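Another way to make the overlapping scale-ups harmless would be to make configure idempotent per gear UUID, so a duplicate dispatch becomes a no-op. A hypothetical Ruby sketch -- the class and method names are invented for illustration and are not the OpenShift broker code:

```ruby
require 'set'

# Hypothetical sketch: run configure at most once per gear UUID, even when
# parallel scale-up events dispatch the action twice. Names are illustrative.
class GearConfigurator
  def initialize
    @configured = Set.new
    @lock = Mutex.new
  end

  # Returns true if configure actually ran, false for a duplicate request.
  def configure(gear_uuid)
    # Set#add? returns nil if the UUID was already present, so the second
    # caller bails out without re-running configure.
    return false unless @lock.synchronize { @configured.add?(gear_uuid) }
    # ... dispatch the real "configure" mcollective action here ...
    true
  end
end
```

With a guard like this, the second configure call on gear b1db43744f6a4fd7934c165ba53373c0 would have been rejected instead of producing a dud entry.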

In the attached logs, look for the gear b1db43744f6a4fd7934c165ba53373c0 and you'll see configure called twice, at lines 10391 and 10802 of the mcollective logs:
10391   {:cartridge=>"ruby-1.9",
10392    :process_results=>true,
10393    :args=>"'b1db43744f' 'rr50' 'b1db43744f6a4fd7934c165ba53373c0'",
10394    :action=>"configure"},

10802   {:cartridge=>"ruby-1.9",
10803    :process_results=>true,
10804    :args=>"'b1db43744f' 'rr50' 'b1db43744f6a4fd7934c165ba53373c0'",
10805    :action=>"configure"},

on the same gear.

Comment 1 Dan McPherson 2012-10-31 13:57:00 UTC
Should be fixed by the model refactor.

*** This bug has been marked as a duplicate of bug 855307 ***