Bug 987876

Summary: app's repo in the scaled-up gear is empty and the app becomes uncontrollable after scaling up apps with REST API
Product: OpenShift Online
Reporter: Zhe Wang <zhewang>
Component: Pod
Assignee: Rajat Chopra <rchopra>
Status: CLOSED CURRENTRELEASE
QA Contact: libra bugs <libra-bugs>
Severity: high
Priority: high
Version: 2.x
CC: xtian
Target Milestone: ---
Keywords: TestBlocker
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2013-08-07 22:56:29 UTC
Type: Bug
Attachments:
Description: Corresponding logs of failing to control the scaled app
Flags: none

Description Zhe Wang 2013-07-24 10:58:33 UTC
Description of problem:
Given a scaling app with auto scaling disabled, the app's repo in its scaled-up gear(s) is empty after the app is scaled up. As a consequence, visiting the app's scaled-up gear redirects to the OpenShift homepage.

Version-Release number of selected component (if applicable):
devenv_3546

How reproducible:
always

Steps to Reproduce:
1. create a scaling app
2. disable its auto scaling
3. scale the app up via REST API
4. SSH into the scaled-up gear, and check the app's repo
5. visit the scaled-up gear in a browser
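
Step 3 can be sketched as a small request builder. This is a minimal sketch, not the reporter's actual command: the endpoint path (`/broker/rest/domains/<domain>/applications/<app>/events`), the `event=scale-up` form field, and the host name `broker.example.com` are assumptions about the broker REST API, and authentication is left out entirely. The request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_scale_up_request(broker_host, domain, app_name):
    """Build (but do not send) a POST request asking the broker to scale up an app.

    The URL layout and the 'event=scale-up' body are assumptions, not taken
    from this bug report.
    """
    url = (
        f"https://{broker_host}/broker/rest/domains/{domain}"
        f"/applications/{app_name}/events"
    )
    body = urlencode({"event": "scale-up"}).encode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Accept": "application/json"},
    )


# Hypothetical host; domain and app name are the ones shown later in this report.
req = build_scale_up_request("broker.example.com", "zwint", "sphp")
print(req.get_method(), req.full_url)
```

Sending it would additionally require credentials and `urllib.request.urlopen(req)`.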

Actual results:
The app's repo in scaled-up gear is empty. Moreover, in step 5, it redirects to the homepage of OpenShift.

Expected results:
The repo in a scaling app's local gear should be synchronized to its scaled-up gears.

Additional info:
When checking scale_events.log, I found that no add-gear entries had been written to it:

E, [2013-07-24T06:28:15.503423 #19905] ERROR -- : Failed to get information from haproxy
I, [2013-07-24T06:28:23.466808 #20166]  INFO -- : Starting haproxy_ctld
D, [2013-07-24T06:28:23.471029 #20166] DEBUG -- : GEAR_INFO - capacity: 0.0% gear_count: 1 sessions: 0 up/remove_thresh: 90.0%/1.0% sec_left_til_remove: 120 gear_remove_thresh: 0/20
I, [2013-07-24T06:28:29.969353 #20266]  INFO -- : Starting haproxy_ctld
D, [2013-07-24T06:28:29.971779 #20266] DEBUG -- : GEAR_INFO - capacity: 0.0% gear_count: 1 sessions: 0 up/remove_thresh: 90.0%/1.0% sec_left_til_remove: 120 gear_remove_thresh: 0/20
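
The observation above (gear_count never moves past 1) can be checked mechanically. The following is a minimal sketch that extracts `gear_count` from the `GEAR_INFO` lines of scale_events.log; the regular expression is inferred from the two sample lines in this report only, so other log line layouts are not covered.

```python
import re

# Pattern inferred from the GEAR_INFO lines quoted in this report (an assumption
# about the log format in general).
GEAR_INFO = re.compile(
    r"GEAR_INFO - capacity: (?P<capacity>[\d.]+)% gear_count: (?P<gear_count>\d+)"
)


def gear_counts(log_text):
    """Return the gear_count value of every GEAR_INFO entry, in log order."""
    return [int(m.group("gear_count")) for m in GEAR_INFO.finditer(log_text)]


sample = r"""
E, [2013-07-24T06:28:15.503423 #19905] ERROR -- : Failed to get information from haproxy
I, [2013-07-24T06:28:23.466808 #20166]  INFO -- : Starting haproxy_ctld
D, [2013-07-24T06:28:23.471029 #20166] DEBUG -- : GEAR_INFO - capacity: 0.0% gear_count: 1 sessions: 0 up/remove_thresh: 90.0%/1.0% sec_left_til_remove: 120 gear_remove_thresh: 0/20
I, [2013-07-24T06:28:29.969353 #20266]  INFO -- : Starting haproxy_ctld
D, [2013-07-24T06:28:29.971779 #20266] DEBUG -- : GEAR_INFO - capacity: 0.0% gear_count: 1 sessions: 0 up/remove_thresh: 90.0%/1.0% sec_left_til_remove: 120 gear_remove_thresh: 0/20
"""

counts = gear_counts(sample)
print(counts)
```

If the scale-up had been registered by haproxy_ctld, a later entry would be expected to report a gear_count above 1.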

Comment 1 Mrunal Patel 2013-07-24 23:08:15 UTC
We weren't able to reproduce the issue. Could you please retry?

Comment 2 Zhe Wang 2013-07-26 03:03:20 UTC
Assigned this bug back since it is reproducible in INT (devenv_3561). Take a scaling php-5.3 app as an example: after scaling it up and showing its gear groups, both its local gear and its scaled-up gear were displayed:

"gears": [
                {
                    "id": "51f1e37a6cec0e43bf000001", 
                    "ssh_url": "ssh://51f1e37a6cec0e43bf000001.rhcloud.com", 
                    "state": "started"
                }, 
                {
                    "id": "51f1e4c803ef647e3e000002", 
                    "ssh_url": "ssh://51f1e4c803ef647e3e000002.rhcloud.com", 
                    "state": "new"
                }
            ], 
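
A gear list like the one above can be screened for gears that never left the "new" state. This is a minimal sketch operating on the JSON fragment from this comment; the helper name `gears_not_started` is hypothetical, and the fragment is wrapped as a bare JSON array for parsing.

```python
import json


def gears_not_started(gears):
    """Return the IDs of gears whose state is anything other than 'started'."""
    return [g["id"] for g in gears if g.get("state") != "started"]


# Gear entries copied from the gear-group output in this comment.
gears = json.loads("""
[
    {
        "id": "51f1e37a6cec0e43bf000001",
        "ssh_url": "ssh://51f1e37a6cec0e43bf000001.rhcloud.com",
        "state": "started"
    },
    {
        "id": "51f1e4c803ef647e3e000002",
        "ssh_url": "ssh://51f1e4c803ef647e3e000002.rhcloud.com",
        "state": "new"
    }
]
""")

stuck = gears_not_started(gears)
print(stuck)
```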

SSH into both gears and check the repos in both gears.

Results:
In the app's local gear (i.e., 51f1e37a6cec0e43bf000001.rhcloud.com), the dirs are complete:

[sphp-zwint.int.rhcloud.com 51f1e37a6cec0e43bf000001]\> ls -l
total 16
drwxr-xr-x.  4 root                     51f1e37a6cec0e43bf000001 4096 Jul 25 22:48 app-root
drwxr-xr-x.  3 root                     root                     4096 Jul 25 22:48 git
drwxr-xr-x. 12 51f1e37a6cec0e43bf000001 51f1e37a6cec0e43bf000001 4096 Jul 25 22:49 haproxy
drwxr-xr-x. 14 51f1e37a6cec0e43bf000001 51f1e37a6cec0e43bf000001 4096 Jul 25 22:48 php

However, in its scaled-up gear (51f1e4c803ef647e3e000002.rhcloud.com), only the app-root dir was displayed.

[51f1e4c803ef647e3e000002-zwint.int.rhcloud.com 51f1e4c803ef647e3e000002]\> ls -l
total 4
drwxr-xr-x. 4 root 51f1e4c803ef647e3e000002 4096 Jul 25 22:54 app-root

Moreover, when showing the repo dir under the app-root dir, the content was empty:

[51f1e4c803ef647e3e000002-zwint.int.rhcloud.com 51f1e4c803ef647e3e000002]\> ls app-root/repo/
[51f1e4c803ef647e3e000002-zwint.int.rhcloud.com 51f1e4c803ef647e3e000002]\>
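
The empty-repo check can also be scripted. The sketch below assumes only the `app-root/repo` layout shown in the listings above; the helper name `repo_is_empty` is hypothetical, and the demonstration runs against a temporary directory rather than a real gear.

```python
import os
import tempfile


def repo_is_empty(gear_home):
    """True if <gear_home>/app-root/repo exists and contains no entries."""
    repo = os.path.join(gear_home, "app-root", "repo")
    return os.path.isdir(repo) and not os.listdir(repo)


# Demonstration on a throwaway directory that mimics the gear layout.
with tempfile.TemporaryDirectory() as home:
    repo_dir = os.path.join(home, "app-root", "repo")
    os.makedirs(repo_dir)
    empty_before = repo_is_empty(home)  # repo exists but has no content

    # Simulate a synced repo by adding one file.
    with open(os.path.join(repo_dir, "index.php"), "w") as fh:
        fh.write("<?php echo 'ok'; ?>\n")
    empty_after = repo_is_empty(home)

print(empty_before, empty_after)
```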

Comment 3 Zhe Wang 2013-07-26 03:05:02 UTC
I have kept the scaling php-5.3 app in INT (devenv_3561) for your further investigation. The details of the app are below:

        "aliases": [], 
        "app_url": "http://sphp-zwint.int.rhcloud.com/", 
        "build_job_url": null, 
        "building_app": null, 
        "building_with": null, 
        "creation_time": "2013-07-26T02:48:26Z", 
        "domain_id": "zwint", 
        "embedded": {
            "haproxy-1.4": {}
        }, 
        "framework": "php-5.3", 
        "gear_count": 2, 
        "gear_profile": "small", 
        "git_url": "ssh://51f1e37a6cec0e43bf000001.rhcloud.com/~/git/sphp.git/", 
        "health_check_path": "health_check.php", 
        "id": "51f1e37a6cec0e43bf000001", 
        "initial_git_url": null,

Comment 4 Zhe Wang 2013-07-26 05:53:14 UTC
I confirm that this bug only happens when apps are scaled up with the REST API; there is no such problem when scaling up apps with "rhc cartridge scale". Therefore, I have modified the summary of this bug accordingly.

Comment 5 Zhe Wang 2013-07-26 06:02:21 UTC
More info on this bug that may be helpful:

After scaling up the app with REST API, we can confirm the app has two gears:

[zhe@fedora sphp1]$ rhc app-show --gears -a sphp
ID                       State   Cartridges          Size  SSH URL
------------------------ ------- ------------------- ----- -----------------------------------------------------------------------
51f1e37a6cec0e43bf000001 started php-5.3 haproxy-1.4 small 51f1e37a6cec0e43bf000001.rhcloud.com
51f1e4c803ef647e3e000002 new     php-5.3 haproxy-1.4 small 51f1e4c803ef647e3e000002.rhcloud.com

However, the status of the scaled-up gear is always "new"[1], and the scaled-up gear is not accessible via its URL (you could try with the URL of the scaled-up gear in Comment #2).

[1] https://bugzilla.redhat.com/show_bug.cgi?id=987777

Comment 6 Zhe Wang 2013-07-26 07:31:32 UTC
On both INT (devenv_3561) and devenv_3564, scaling up an app via the REST API causes another problem: the app can no longer be controlled afterwards, and control commands fail with the following error (please refer to the attachment for more info):

[zhe@fedora run106]$ rhc app stop sphp1
Unable to complete the requested operation due to: Failed to correctly execute all parallel operations.
Reference ID: 5479cc0db2eab26bea94d827d94d43cd

Steps to reproduce:
1) create a scaling app
2) scale it up with REST API
3) try to control this app, for example, stop it via RHC

It looks like scaling up an app with the REST API does not complete, leaving the app in a kind of "locked" state. If this is true, it could explain this bug, Bug 98777, and the problem described in this comment.

Comment 7 Zhe Wang 2013-07-26 07:32:33 UTC
Created attachment 778692
Corresponding logs of failing to control the scaled app

Comment 8 Zhe Wang 2013-07-26 07:34:17 UTC
Sorry, I wanted to refer to Bug 987777 in Comment #6.

Comment 9 Zhe Wang 2013-07-26 07:37:54 UTC
I confirm that before I tried to control the app in Comment #6, the broker had returned a message reporting that the app was scaled up successfully.

    "messages": [
        {
            "exit_code": 0, 
            "field": null, 
            "severity": "info", 
            "text": "Application sphp1 has scaled up"
        }
    ],

Comment 10 Zhe Wang 2013-07-29 08:56:52 UTC
The bug has been fixed in devenv_3574.

Steps:
1) create a scaling app
2) scale it up with REST API
3) show the app's gear groups
4) SSH into the scaled-up gear and check the repo in the scaled-up gear
5) control this app, for example, stop, start, restart, reload

Results:
The scaling-up in Step 2 succeeded, and when showing the app's gear groups, the status of all gears is "started".

            "gears": [
                {
                    "id": "51f62a10fe91e30a98000001", 
                    "ssh_url": "ssh://51f62a10fe91e30a98000001.rhcloud.com", 
                    "state": "started"
                }, 
                {
                    "id": "51f62b4afe91e30a98000004", 
                    "ssh_url": "ssh://51f62b4afe91e30a98000004.rhcloud.com", 
                    "state": "started"
                }
            ], 

Moreover, the copy of the app's repo was synced to its scaled-up gear as well:

[51f62b4afe91e30a98000004-dev3574tst.dev.rhcloud.com 51f62b4afe91e30a98000004]\> ls -l
total 12
drwxr-xr-x.  4 root                     51f62b4afe91e30a98000004 4096 Jul 29 04:43 app-root
drwxr-xr-x.  3 root                     root                     4096 Jul 29 04:44 git
drwxr-xr-x. 16 51f62b4afe91e30a98000004 51f62b4afe91e30a98000004 4096 Jul 29 04:44 python

In addition, controlling the app worked without problems.

Could you please move the bug to ON_QA? I will then verify it formally.

Thanks,
Zhe Wang

Comment 11 Rajat Chopra 2013-07-29 19:06:55 UTC
Fixed with rev#0bc92a9d4b75514bf667eacdf356581b055d6b6c

Comment 12 Zhe Wang 2013-07-30 02:01:41 UTC
Moving the bug to VERIFIED based on the results in Comment #10.