Bug 871660

Summary: Scaled application has "dud" gears when overlapping scale-up requests occur
Product: OKD Reporter: Ram Ranganathan <ramr>
Component: Pod Assignee: Dan McPherson <dmcphers>
Status: CLOSED DUPLICATE QA Contact: libra bugs <libra-bugs>
Severity: medium Docs Contact:
Priority: medium    
Version: 2.x CC: mfisher
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-10-31 13:57:00 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Double configure error logs (flags: none)

Description Ram Ranganathan 2012-10-31 01:29:02 UTC
Created attachment 635887 [details]
Double configure error logs

Description of problem:
I was running Apache benchmark against a scaled application and noticed that there was a "dud" gear in the haproxy configuration. On investigating, I found that configure had been called twice on one gear, and that gear's DNS entry was never removed.

Version-Release number of selected component (if applicable):
Current release (2.0.19)

How reproducible:
Sometimes

Steps to Reproduce:
1.   Create a scaled application
2.   Add mysql to the app 
3.   Use the rails quickstart  - https://github.com/openshift/rails-example
4.   Add a junk folder with 2048 files (201 MB worth) -- this is to simulate a customer
      issue on prod where there were a lot of files and gear sync was taking a while.
      mkdir junk && cd junk
      for i in `seq 2048`; do dd if=/dev/zero of=junk-$i count=100 bs=1024; done
5.  Add, commit and push to the app.
6.  On the devenv, run apache benchmark 
        ab -n 100000 -c 23 http://$app-$namespace.dev.rhcloud.com/
7.  Optionally, trigger a manual scale-up while an automatic scale-up event is
     in progress; the bug seems to occur when scale-ups happen in parallel.

  
Actual results:
Mismatch between the haproxy configs (running config vs. the gear registry) -- the configuration contains a "dud" gear.

Expected results:
A valid configuration, with haproxy running correctly.

Additional info:

This also leaves a lot of defunct shell processes spawned from mcollective. The reason
is that the gear sync takes a while because of the number and size of the files, and because ab is hammering the haproxy gear with requests.
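To illustrate why the defunct processes pile up: an exited child stays a zombie until its parent wait()s for it, and if the parent (here, the mcollective agent) is tied up in the long-running sync, nothing reaps the exited shells. A minimal Ruby sketch of the reaping pattern -- this is not the actual mcollective code, just the general mechanism:

```ruby
# Illustrative only. A spawned child that exits stays "defunct" (a zombie)
# until its parent collects the exit status; Process.detach starts a
# background reaper thread that does the collecting for us.
pid = Process.spawn("true")   # child exits almost immediately
waiter = Process.detach(pid)  # reaper thread wait()s on the pid
status = waiter.value         # Process::Status once the child is reaped
puts status.success?
```

If the parent never calls wait (directly or via a detached reaper), each exited shell lingers as a `<defunct>` entry in ps, which matches what we see on the node.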

At the very least we should clean up the gear's DNS entry -- we do delete the gear itself.
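Another way to make the overlapping scale-ups harmless would be to make configure idempotent per gear UUID, so a duplicate dispatch becomes a no-op. A hypothetical Ruby sketch -- the class and method names are invented for illustration and are not the OpenShift broker code:

```ruby
require 'set'

# Hypothetical sketch: run configure at most once per gear UUID, even when
# parallel scale-up events dispatch the action twice. Names are illustrative.
class GearConfigurator
  def initialize
    @configured = Set.new
    @lock = Mutex.new
  end

  # Returns true if configure actually ran, false for a duplicate request.
  def configure(gear_uuid)
    # Set#add? returns nil if the UUID was already present, so the second
    # caller bails out without re-running configure.
    return false unless @lock.synchronize { @configured.add?(gear_uuid) }
    # ... dispatch the real "configure" mcollective action here ...
    true
  end
end
```

With a guard like this, the second configure call on gear b1db43744f6a4fd7934c165ba53373c0 would have been rejected instead of producing a dud entry.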

In the attached logs, look for the gear b1db43744f6a4fd7934c165ba53373c0 and you'll see configure called twice, at lines 10391 and 10802 of the mcollective logs:
10391   {:cartridge=>"ruby-1.9",
10392    :process_results=>true,
10393    :args=>"'b1db43744f' 'rr50' 'b1db43744f6a4fd7934c165ba53373c0'",
10394    :action=>"configure"},

10802   {:cartridge=>"ruby-1.9",
10803    :process_results=>true,
10804    :args=>"'b1db43744f' 'rr50' 'b1db43744f6a4fd7934c165ba53373c0'",
10805    :action=>"configure"},

on the same gear.

Comment 1 Dan McPherson 2012-10-31 13:57:00 UTC
Should be fixed by the model refactor.

*** This bug has been marked as a duplicate of bug 855307 ***