Bug 1153975 - Got "Could not connect to WildFly management interface, skipping deployment verification" when starting a stopped aerogear app
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Image
Version: 2.x
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Michal Fojtik
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks: 1154651
Reported: 2014-10-17 08:31 UTC by Yan Du
Modified: 2015-09-08 20:14 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1154651
Environment:
Last Closed: 2015-09-08 20:14:21 UTC
Target Upstream Version:
Embargoed:



Description Yan Du 2014-10-17 08:31:42 UTC
Description of problem:

Starting a stopped aerogear app, or running git push on it, prints the error "Could not connect to WildFly management interface, skipping deployment verification":

[root@Daphne test]# rhc app stop push1s
RESULT:
push1s stopped

[root@Daphne test]# rhc app start push1s
Could not connect to WildFly management interface, skipping deployment verification
RESULT:
push1s started



remote: Stopping aerogear-push cart
remote: Sending SIGTERM to wildfly:32435 ...
remote: Stopping MySQL 5.5 cartridge
remote: /usr/bin/oo-exec-ruby: line 8: /bin/rpm: Permission denied
remote: Building git ref 'master', commit d66a4b6
remote: Preparing build for deployment
remote: Deployment id is e3425bda
remote: Activating deployment
remote: Starting MySQL 5.5 cartridge
remote: /usr/bin/oo-exec-ruby: line 8: /bin/rpm: Permission denied
remote: Deploying WildFly
remote: ls: cannot access /var/lib/openshift/5440be1dd20b7de1e2000003/app-root/runtime/repo//deployments: No such file or directory
remote: Starting aerogear-push cart
remote: Found 127.1.249.1:8080 listening port
remote: Found 127.1.249.1:9990 listening port
remote: CLIENT_MESSAGE: Could not connect to WildFly management interface, skipping deployment verification
remote: -------------------------
remote: Git Post-Receive Result: success
remote: Activation status: success
remote: Deployment completed with status: success
To ssh://5440be1dd20b7de1e2000003.rhcloud.com/~/git/push1.git/
   22f9be2..d66a4b6  master -> master



Version-Release number of selected component (if applicable):
devenv_5242



How reproducible:
always



Steps to Reproduce:
1. Create an aerogear app from the website
2. rhc app stop $app
3. rhc app start $app
4. Make a change and git push
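Steps 2 and 3 can be scripted for repeated runs. This is a minimal sketch, assuming the rhc client is installed and already logged in; the function name repro_stop_start is hypothetical, it only scans the output of `rhc app start` for the message quoted above, and step 4 (the git push) is left manual.

```shell
# Hypothetical helper for steps 2-3: stop and start an app with rhc and
# report whether the WildFly management message was printed on start.
MSG='Could not connect to WildFly management interface'

# Usage: repro_stop_start APP
# Exit status: 0 = message seen (reproduced), 1 = not seen, 2 = rhc failed.
repro_stop_start() {
  local app="$1" out
  rhc app stop "$app" >/dev/null 2>&1 || return 2
  # Capture stdout and stderr, since the message may go to either.
  out="$(rhc app start "$app" 2>&1)" || return 2
  if printf '%s\n' "$out" | grep -qF "$MSG"; then
    echo "reproduced on $app"
    return 0
  fi
  echo "not reproduced on $app"
  return 1
}
```

Running it in a loop over a few attempts shows whether the message appears "always", as reported below.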



Actual results:
Same as in the description.


Expected results:
The app should start normally after being stopped.


Additional info:

Comment 1 Michal Fojtik 2014-10-17 10:22:07 UTC
remote: /usr/bin/oo-exec-ruby: line 8: /bin/rpm: Permission denied

I think this is caused by:

https://github.com/openshift/origin-server/blob/master/util-scl/oo-exec-ruby#L8

We also call oo-exec-ruby when running oo-erb, which is used to process many gear ERB files.

Adam, can we use something other than RPM to get the system Ruby version (e.g. /usr/bin/ruby -v)?
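A minimal sketch of the suggested alternative, assuming the system Ruby lives at /usr/bin/ruby and only the version number is needed: parse the `ruby -v` banner instead of running /bin/rpm. The helper names ruby_version_from_banner and system_ruby_version and the RUBY_BIN override are hypothetical; this is not the code in oo-exec-ruby.

```shell
# Hypothetical helpers: get the system Ruby version without calling /bin/rpm.

# Print the version from a `ruby -v` banner, e.g.
#   "ruby 1.8.7 (2013-06-27 patchlevel 374) [x86_64-linux]"     -> 1.8.7
#   "ruby 1.9.3p484 (2013-11-22 revision 43786) [x86_64-linux]" -> 1.9.3
ruby_version_from_banner() {
  # Second field is the version; strip a trailing "pNNN" patchlevel suffix.
  printf '%s\n' "$1" | awk '{ print $2 }' | sed 's/p[0-9]*$//'
}

# Run the interpreter (default /usr/bin/ruby, an assumption) and parse its banner.
# Returns 1 if the interpreter is not executable.
system_ruby_version() {
  local ruby_bin="${RUBY_BIN:-/usr/bin/ruby}"
  [ -x "$ruby_bin" ] || return 1
  ruby_version_from_banner "$("$ruby_bin" -v 2>/dev/null)"
}
```

This avoids executing /bin/rpm, which the gear was not permitted to run; the trade-off is a dependency on the banner format instead of on RPM.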

Comment 2 Michal Fojtik 2014-10-17 13:20:03 UTC
This might be related to the bug above. QA, can you please re-test?

Comment 3 Yan Du 2014-10-20 04:48:02 UTC
Retested on devenv_5248: the issue can still be reproduced after the "remote: /usr/bin/oo-exec-ruby: line 8: /bin/rpm: Permission denied" bug was fixed (https://bugzilla.redhat.com/show_bug.cgi?id=1153889).

remote: Stopping aerogear-push cart
remote: Sending SIGTERM to wildfly:701 ...
remote: Stopping MySQL 5.5 cartridge
remote: Building git ref 'master', commit fd897c9
remote: Preparing build for deployment
remote: Deployment id is 3a212416
remote: Activating deployment
remote: Starting MySQL 5.5 cartridge
remote: Deploying WildFly
remote: ls: cannot access /var/lib/openshift/5444b91f6f9958d456000003/app-root/runtime/repo//deployments: No such file or directory
remote: Starting aerogear-push cart
remote: Found 127.1.245.1:8080 listening port
remote: Found 127.1.245.1:9990 listening port
remote: CLIENT_MESSAGE: Could not connect to WildFly management interface, skipping deployment verification
remote: -------------------------
remote: Git Post-Receive Result: success
remote: Activation status: success
remote: Deployment completed with status: success
To ssh://5444b91f6f9958d456000003.rhcloud.com/~/git/push1.git/

Comment 4 Michal Fojtik 2014-10-20 08:44:10 UTC
It seems the commit was not there yet. Can you please re-test?

This seems to be merged now:
https://github.com/openshift/origin-server/pull/5883

Comment 5 Michal Fojtik 2014-10-20 09:12:11 UTC
Sorry, wrong BZ ;-)

Comment 6 Michal Fojtik 2014-10-20 11:21:35 UTC
PR (to fix the 'ls' error message).

https://github.com/aerogear/openshift-origin-cartridge-aerogear-push/pull/9

I'm not sure about the WildFly error. Farah?
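A guess at the shape of the 'ls' fix, not the contents of the PR: check that the repository's deployments directory exists before touching it, and print a warning otherwise. The warning text is copied from the log in Comment 11; the function name sync_deployments and its two directory arguments are hypothetical.

```shell
# Hypothetical sketch: skip the deployments sync when the repository has no
# deployments/ directory, instead of letting `ls` report a missing path.
# Usage: sync_deployments REPO_DIR TARGET_DIR
sync_deployments() {
  local repo_dir="$1" target_dir="$2" f
  if [ ! -d "$repo_dir/deployments" ]; then
    echo "WARNING: The ./deployments directory not found, skipping sync."
    return 0
  fi
  for f in "$repo_dir"/deployments/*; do
    # An unmatched glob stays literal, so skip it if nothing exists.
    [ -e "$f" ] || continue
    cp -R "$f" "$target_dir"/
  done
}
```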

Comment 7 Farah Juma 2014-10-20 14:26:34 UTC
The error message [1] just indicates that the deployment scanner hasn't finished running yet. Notice that the getscanconfig method [2] only attempts to get the deployment scanner configuration a fixed number of times; if the scanner hasn't finished by then, the deployment verification step is skipped. Note, though, that ag-push.war and auth-server.war still get deployed successfully. It looks like increasing the number of attempts made in the getscanconfig method should improve things, but it might take some testing to determine a good number of attempts.

[1] https://github.com/aerogear/openshift-origin-cartridge-aerogear-push/blob/master/bin/control#L40
[2] https://github.com/aerogear/openshift-origin-cartridge-aerogear-push/blob/master/bin/control#L18
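The behavior described above, a fixed number of attempts after which verification is skipped, is a bounded retry loop. A minimal sketch, assuming a probe command that exits 0 once the scanner configuration can be read; retry_probe, MAX_ATTEMPTS and RETRY_DELAY are hypothetical names, and the defaults are illustrative, not the values in bin/control.

```shell
# Hypothetical sketch of a bounded retry: run a probe until it succeeds or
# MAX_ATTEMPTS is reached. Defaults are illustrative, not bin/control's values.
MAX_ATTEMPTS="${MAX_ATTEMPTS:-10}"   # number of probe attempts
RETRY_DELAY="${RETRY_DELAY:-2}"      # seconds to wait between attempts

# Usage: retry_probe COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, 1 if every attempt fails.
retry_probe() {
  local attempt=1
  while [ "$attempt" -le "$MAX_ATTEMPTS" ]; do
    if "$@"; then
      return 0
    fi
    # Only sleep if another attempt is coming.
    if [ "$attempt" -lt "$MAX_ATTEMPTS" ]; then
      sleep "$RETRY_DELAY"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

In a control script, a failed retry_probe call is the point where the "skipping deployment verification" message would be printed; raising MAX_ATTEMPTS (or RETRY_DELAY) is the knob this comment proposes turning.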

Comment 8 Michal Fojtik 2014-10-21 10:56:01 UTC
I'm fine with increasing the number of scans for now (to fix this issue). Do you want me to do a PR for this?

We could also make it configurable at the top of the control file:

DEPLOYMENT_SCAN_TIMEOUT=N

Also, if this is not an error and the WAR files get deployed anyway, perhaps we should not show an error to users, but just warn them that the scanner was not able to deploy the WARs in time.
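Both suggestions can be sketched together, assuming the timeout is expressed in seconds and polled at a fixed interval. DEPLOYMENT_SCAN_TIMEOUT is the variable proposed above, while SCAN_INTERVAL, scan_attempts and warn_scan_skipped are hypothetical; the 20-second default follows the value mentioned in Comment 12, and the warning wording is illustrative.

```shell
# Hypothetical sketch: a configurable scan budget at the top of the control
# file (overridable from the environment) and a warning for skipped checks.
DEPLOYMENT_SCAN_TIMEOUT="${DEPLOYMENT_SCAN_TIMEOUT:-20}"   # seconds in total
SCAN_INTERVAL="${SCAN_INTERVAL:-2}"                         # seconds per attempt (positive integer)

# Print how many scan attempts fit into the timeout (always at least 1).
scan_attempts() {
  local n=$((DEPLOYMENT_SCAN_TIMEOUT / SCAN_INTERVAL))
  [ "$n" -ge 1 ] || n=1
  echo "$n"
}

# Report a skipped verification as a warning on stderr, not as an error.
warn_scan_skipped() {
  echo "WARNING: Deployment scanner did not finish within ${DEPLOYMENT_SCAN_TIMEOUT}s; skipping deployment verification." >&2
}
```

The `${VAR:-default}` form lets an operator override the timeout from the environment without editing the control file.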

Comment 9 Farah Juma 2014-10-21 13:43:43 UTC
Thanks, Michal - a PR would be great. A warning message instead of an error message is a good idea as well.

Comment 10 Farah Juma 2014-10-21 14:04:00 UTC
Merged Michal's PR:

https://github.com/aerogear/openshift-origin-cartridge-aerogear-push/pull/9

Comment 11 Yan Du 2014-10-22 06:08:24 UTC
Tested on devenv_5256

The PR seems to have fixed the issue only for git push:

remote: Stopping aerogear-push cart
remote: Sending SIGTERM to wildfly:20854 ...
remote: Syncing git content to other proxy gears
remote: Building git ref 'master', commit 21a6dc8
remote: Preparing build for deployment
remote: Deployment id is 87dda726
remote: Activating deployment
remote: HAProxy already running
remote: HAProxy instance is started
remote: Deploying WildFly
remote: WARNING: The ./deployments directory not found, skipping sync.
remote: Starting aerogear-push cart
remote: Found 127.1.245.129:8080 listening port
remote: Found 127.1.245.129:9990 listening port
remote: /var/lib/openshift/54475ea051153f0b85000058/aerogear-push/standalone/deployments /var/lib/openshift/54475ea051153f0b85000058/aerogear-push
remote: /var/lib/openshift/54475ea051153f0b85000058/aerogear-push
remote: CLIENT_MESSAGE: Artifacts deployed: ./auth-server.war ./ag-push.war
remote: -------------------------
remote: Git Post-Receive Result: success
remote: Activation status: success
remote: Deployment completed with status: success
To ssh://54475ea051153f0b85000058.rhcloud.com/~/git/push1s.git/


When starting a stopped app, the error is still shown:

[root@openshift test]# rhc app stop push1a
RESULT:
push1a stopped
[root@openshift test]# rhc app start push1a
Could not connect to WildFly management interface, skipping deployment verification

RESULT:
push1a started

Comment 12 Michal Fojtik 2014-10-22 08:54:00 UTC
Farah, I think we can consider the above as not an error, right? I don't understand why the message is printed; it seems to use the same logic as git push.

Maybe increase the timeout to more than 20 seconds?

Comment 13 Farah Juma 2014-10-22 14:42:00 UTC
Yes, I agree that we can consider the above as not an error. The logic used for git push does seem to be the same as for stopping and starting an app. However, if the deployment verification step consistently gets skipped when stopping and starting an app, increasing the timeout seems reasonable.

Comment 14 Michal Fojtik 2015-07-02 11:57:03 UTC
Yan Du: Given the above, I think we can move this bug to VERIFIED, as the stop/start message is not a bug (and might be fixed by increasing the timeout).

Comment 15 Yan Du 2015-07-03 06:34:24 UTC
Moving the bug to VERIFIED according to the above comments.

