Bug 820827 - App gear of a scalable app cannot be woken up from idle status.
Status: CLOSED CURRENTRELEASE
Product: OpenShift Origin
Classification: Red Hat
Component: Containers
Version: 2.x
Priority: high, Severity: medium
Assigned To: Ram Ranganathan
libra bugs
: Triaged
Reported: 2012-05-11 01:31 EDT by Johnny Liu
Modified: 2015-05-14 18:54 EDT (History)
3 users

Fixed In Version: > cartridge-haproxy-0.10.3-1
Doc Type: Bug Fix
Last Closed: 2012-06-08 13:59:20 EDT
Type: Bug


Description Johnny Liu 2012-05-11 01:31:48 EDT
Description of problem:
I have a scalable app that was created in sprint 10.
Today, while testing stage for sprint 11, accessing this scalable app only returns the proxy-status page; I cannot get the app's index page.


The following is some investigation of this issue.

Check gear's status:
$ curl -k -X GET -H 'Accept: application/xml' --user bmeng+1@redhat.com:xxxxx https://stg.openshift.redhat.com/broker/rest/domains/bmeng1stg/applications/perl1s/gear_groups
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <data>
    <gear-group>
      <gears>
        <gear>
          <state>idle</state>
          <id>085204a040b64223ad28a76ab87514d0</id>
        </gear>
      </gears>
      <name>@@app/comp-web/cart-perl-5.10</name>
      <cartridge>
        <cartridge>
          <name>perl-5.10</name>
        </cartridge>
      </cartridge>
    </gear-group>
    <gear-group>
      <gears>
        <gear>
          <state>started</state>
          <id>64f18266886b430183a1bf632dbb6537</id>
        </gear>
      </gears>
      <name>@@app/comp-proxy/cart-haproxy-1.4</name>
      <cartridge>
        <cartridge>
          <name>perl-5.10</name>
        </cartridge>
        <cartridge>
          <name>haproxy-1.4</name>
        </cartridge>
      </cartridge>
    </gear-group>
    <gear-group>
      <gears>
        <gear>
          <state>idle</state>
          <id>bb0d0835f91248389de853b6f3bc72e6</id>
        </gear>
      </gears>
      <name>@@app/comp-proxy/cart-mysql-5.1/group-mysql</name>
      <cartridge>
        <cartridge>
          <name>mysql-5.1</name>
        </cartridge>
      </cartridge>
    </gear-group>
  </data>
  <supported-api-versions>
    <supported-api-version>1.0</supported-api-version>
  </supported-api-versions>
  <messages/>
  <version>1.0</version>
  <type>gear_groups</type>
  <status>ok</status>
</response>

The gear 085204a040b64223ad28a76ab87514d0 is still idled, even after I tried to access the app several times.
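The per-gear state can be read straight out of a gear_groups response without scanning the whole XML. A minimal sketch with sed, run against a trimmed copy of the response above (the file path is illustrative, and it relies on the broker emitting one tag per line as shown):

```shell
# Trimmed copy of the gear_groups payload shown above.
cat > /tmp/gear_groups.xml <<'EOF'
<gears>
  <gear>
    <state>idle</state>
    <id>085204a040b64223ad28a76ab87514d0</id>
  </gear>
  <gear>
    <state>started</state>
    <id>64f18266886b430183a1bf632dbb6537</id>
  </gear>
</gears>
EOF

# Print the <state> of every gear, one per line.
sed -n 's:.*<state>\(.*\)</state>.*:\1:p' /tmp/gear_groups.xml
# prints: idle
#         started
```

In practice the curl output would be piped straight into sed instead of being saved to a file first.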

Then I accessed this gear's own DNS name directly, 085204a040-bmeng1stg.stg.rhcloud.com, and checked the gear status again.
$ curl -k -X GET -H 'Accept: application/xml' --user bmeng+1@redhat.com:xxxxx https://stg.openshift.redhat.com/broker/rest/domains/bmeng1stg/applications/perl1s/gear_groups
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <data>
    <gear-group>
      <gears>
        <gear>
          <state>started</state>
          <id>085204a040b64223ad28a76ab87514d0</id>
        </gear>
      </gears>
      <name>@@app/comp-web/cart-perl-5.10</name>
      <cartridge>
        <cartridge>
          <name>perl-5.10</name>
        </cartridge>
      </cartridge>
    </gear-group>
    <gear-group>
      <gears>
        <gear>
          <state>started</state>
          <id>64f18266886b430183a1bf632dbb6537</id>
        </gear>
      </gears>
      <name>@@app/comp-proxy/cart-haproxy-1.4</name>
      <cartridge>
        <cartridge>
          <name>perl-5.10</name>
        </cartridge>
        <cartridge>
          <name>haproxy-1.4</name>
        </cartridge>
      </cartridge>
    </gear-group>
    <gear-group>
      <gears>
        <gear>
          <state>idle</state>
          <id>bb0d0835f91248389de853b6f3bc72e6</id>
        </gear>
      </gears>
      <name>@@app/comp-proxy/cart-mysql-5.1/group-mysql</name>
      <cartridge>
        <cartridge>
          <name>mysql-5.1</name>
        </cartridge>
      </cartridge>
    </gear-group>
  </data>
  <supported-api-versions>
    <supported-api-version>1.0</supported-api-version>
  </supported-api-versions>
  <messages/>
  <version>1.0</version>
  <type>gear_groups</type>
  <status>ok</status>
</response>

Now the 085204a040b64223ad28a76ab87514d0 gear is in "started" status.
Accessing the scalable app's URL (perl1s-bmeng1stg.stg.rhcloud.com) again now returns the app's index page successfully.


Version-Release number of selected component (if applicable):
stage sprint 11

How reproducible:
Always

Comment 1 Ram Ranganathan 2012-05-14 19:12:10 EDT
Need to check why the idler is not passing requests down / starting the serving gears. Per the haproxy config, it sends an HTTP request (GET /) and only marks a gear down if the check returns a non-2xx/3xx status code, so some return code is marking that server as down.
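For reference, haproxy's HTTP health check is driven by an `option httpchk` stanza; a plausible sketch of the relevant fragment of the cartridge's haproxy.cfg (the listen section name, server address, port, and check timings here are assumptions, not the shipped config):

```
listen express
    # Send GET / to each gear; a server is marked down only when the
    # check returns something other than a 2xx or 3xx response.
    option httpchk GET /
    server gear 127.0.250.129:8080 check inter 2000 rise 2 fall 3
```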
Comment 2 Mike McGrath 2012-05-15 13:14:19 EDT
I think the correct fix right now is to disable idling for gears that are part of a scaled environment. We might be able to detect that if the gear name looks like a UUID? That's kind of hackish.

Raising severity on this to try to get it into this sprint. Ram, if you won't be able to get it in this sprint, let me know.
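The UUID-style-name heuristic floated above could look something like the following hypothetical sketch (scaled-app gear names in this report are 32 hex characters; the function name and the exact rule are assumptions, and the comment itself calls the idea hackish):

```shell
# Return success if the gear name looks like a scaled-app gear name
# (32 lowercase hex characters, e.g. 085204a040b64223ad28a76ab87514d0),
# in which case the idler could skip it. Hypothetical helper, not
# shipped code.
is_scaled_gear_name() {
    case "$1" in
        *[!0-9a-f]*) return 1 ;;  # contains a non-hex character
    esac
    [ "${#1}" -eq 32 ]
}

# is_scaled_gear_name 085204a040b64223ad28a76ab87514d0  # succeeds
# is_scaled_gear_name perl1s                            # fails
```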
Comment 3 Ram Ranganathan 2012-05-15 14:55:46 EDT
Another approach: we could send HEAD requests to the slave gears on startup, wait for them to come up, and then make haproxy reload the config.
I'll take a look at this later this week / early next.
Comment 4 Ram Ranganathan 2012-05-22 19:11:07 EDT
Aah, the issue here is that haproxy.cfg now contains the internal IP address and exposed port number of the serving gears, so requests go directly to that exposed port and bypass the idler wakeup mechanism (restorer.php). This means the idled serving gears are never started up.

Added a ping-serving-gears task to haproxy_ctld that runs every hour once haproxy_ctld is started [the interval is configurable inside the haproxy_ctld code for now]. This ensures that on startup (on restore back from idling) we restart the serving gears as well.
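The ping task's core idea, waking each serving gear with a throwaway HTTP request so the idler's restorer kicks in before haproxy routes real traffic, can be sketched as a shell function (the function name, curl flags, and timeout are assumptions; the real task lives inside haproxy_ctld):

```shell
# Wake idled serving gears by issuing a HEAD request to each one.
# Failures are tolerated: the request only needs to reach the node
# so the idler's wakeup mechanism (restorer.php) can start the gear.
ping_serving_gears() {
    for gear_host in "$@"; do
        curl --silent --head --max-time 10 "http://${gear_host}/" \
            > /dev/null 2>&1 || true
    done
}

# Example, using the gear DNS pattern from this report:
# ping_serving_gears 085204a040-bmeng1stg.stg.rhcloud.com
```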
Comment 5 Ram Ranganathan 2012-05-22 22:31:34 EDT
On second thought, the idling is intentional, so this was fixed by just issuing a ping/wakeup to the serving gears at startup time.

This also means the entire scalable app should get idled on inactivity, not just the serving gears; the wakeup will now be done by the haproxy startup.

Fixed with git commit: 2a6c5d0a06be86320ac25607f702408f873fd039
Comment 6 Johnny Liu 2012-05-28 03:06:25 EDT
Verified this bug with devenv_1806, and it passes.
