Description of problem:
If you try to create a non-scalable jbosseap-6.0 app, an error pops up: "Could not find any OpenShift resource at "http://eab3-mytest1.cdn.com/health""

Version-Release number of selected component (if applicable):
http://download.lab.bos.redhat.com/rel-eng/OpenShiftEnterprise/1.1/2013-01-14.3/
2.4.0.Final-v20130114-2102-B98 with eclipse 4.2.0

How reproducible:
always

Steps to Reproduce:
1. Launch JBT and create a non-scalable jbosseap-6.0 app

Actual results:
Creating a non-scalable jbosseap app fails. Please refer to the detailed screenshots and log as attached.

Expected results:
The jbosseap app should be created successfully.

Additional info:
1. A scalable jbosseap-6.0 app can be created.
2. It works well on devenv_2673 with the same JBT version.
Created attachment 678737 [details] jbosseap
Created attachment 678738 [details] jbosseap log
I investigated this issue for a while this afternoon. What appears to be going on is that JBoss EAP 6.0 is not queuing the incoming requests. Here's what happens:
1. Gear is created
2. JBoss starts
3. The healthcheck request comes in
4. JBoss returns a 404 for the healthcheck
5. Wait 10-15 seconds
6. If you manually GET the healthcheck it returns 1 as expected
EWS behaves as expected. In the case of a scalable application, haproxy returns the healthcheck; that also behaves as expected. I believe rhc has retry logic for the healthcheck, so it works as well. Bill, do you know of a way to have JBoss block on startup and wait until all applications are deployed? I understand why that might not be the default for JBoss, but I think it would be helpful in this case for OpenShift.
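The retry-on-404 polling that rhc is believed to do could be sketched roughly like this. This is an illustrative assumption, not rhc's actual implementation; the URL, attempt count, and delay are hypothetical:

```python
import time
import urllib.request
import urllib.error

def wait_for_health(url, attempts=10, delay=3):
    """Poll the /health endpoint, retrying on 404 until the app
    server has deployed ROOT.war (which serves the health JSP)."""
    for _ in range(attempts):
        try:
            body = urllib.request.urlopen(url, timeout=5).read()
            if body.decode().strip() == "1":  # health JSP answers "1" when up
                return True
        except urllib.error.HTTPError as e:
            if e.code != 404:  # 404 just means ROOT.war isn't deployed yet
                raise
        except urllib.error.URLError:
            pass  # server not accepting connections yet
        time.sleep(delay)
    return False
```

With a client that retries like this, the 10-15 second window where JBoss answers 404 is absorbed instead of surfacing as a creation error.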
What do you mean by "block on startup until all applications are deployed"? Do you mean not return that it's been created/started until all apps have been deployed? That could easily be several minutes if not more. I'll look into this today. Interesting that scaled works and non-scaled doesn't when the former takes a lot longer to start.
The default app (i.e. ROOT.war) contains the /health jsp. If this is removed then the app will never have a valid health check. But I don't see any difference between scaled and non-scaled. Either one just needs ROOT.war deployed for /health to be valid. Other deployments can slow down the deployment of ROOT.war however. IMO depending on the health check after the initial app creation is fragile as the user could easily remove it from the application. The HAProxy health/status is available at /haproxy-status/
Yes, that's what I was referring to by block. A long time ago I used JBoss AS 4.3 and that was the behavior. It definitely makes sense for a traditional environment running dozens of applications to start as fast as possible and deploy everything in the background; however, in the case of OpenShift it seems more consistent with all the other cartridges if we block until the application is deployed. If that isn't possible with later versions of JBoss, the clients will have to be modified to handle the 404s for the healthcheck. Personally I'm not a big fan of retry logic.
The older AS4/5 and AS7 behavior is essentially the same. Core services are loaded and then the user deployments are loaded in a specific order varying on dependencies. The concept of when AS is "started" (e.g. the healthcheck) is purely an OpenShift thing. We should not be using the healthcheck for anything past the initial creation/start of AS/EAP, as that application/url may not exist past that point, and the only application that exists at the initial create/start is the trivial default ROOT.war, which takes negligible time to deploy. A safer test to see if the app server is up is to see if you can hit http://whatever, not http://whatever/health, but even that could take several minutes depending on the number and complexity of the apps on the app server. Probably the safest thing to do is see if you can make a socket connection to AS/EAP and not rely on there being a deployed webapp. Can we discuss on IRC/phone when you have a chance?
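The socket-level check suggested above could look something like this minimal sketch; the function name, port, and timeout are assumptions for illustration:

```python
import socket

def port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds,
    i.e. the app server is at least accepting connections,
    without depending on any webapp being deployed."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

The trade-off is exactly the one discussed: this confirms the server process is up and listening, but says nothing about whether any user deployment has finished.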
Sure, I'm in Beijing right now so I'm not sure how much time overlap we'll have. I'll be back in the States in a week. We must have done something interesting with our AS setup in IT. I know for a fact that JBoss would not accept requests until all applications were loaded in our environment. Our loadbalancers depended on that fact. It also had its problems, because one application that took a long time to deploy would effectively block all the applications from receiving requests. I agree that testing for the socket connection is probably the better approach for writers of clients. If that's the simplest approach then that might be what I suggest to the JBT team.
Development Management has reviewed and declined this request. You may appeal this decision by reopening this request.
I think the above comment from 'Development Management' was sent in error. We plan to fix this soon.
We should add https://issues.jboss.org/browse/JBIDE-13569 as an external issue tracker to this bugzilla so that the JBT issue and this bugzilla get synced. Unfortunately I don't have the permissions to do that.
If I create a DIY application I get the required health-check response. If I look into its content I can't spot anything that would produce this. Isn't the health-response produced outside the DIY cartridge? Isn't that strategy also possible for EAP/AS7 to avoid long bootup times?
For a DIY app, /health is configured in Apache and points to an html file in the cartridge. If we put the health check outside of the cartridge then it becomes completely useless - it doesn't indicate health of the app at all. We need to get rid of this health check logic.
Maybe we are viewing the health check differently here. For me there were, historically, three "parts" that could often fail:
A) DNS available (the whole infrastructure bit)
B) The cartridge running (i.e. php, ruby, eap, as7, etc.) and ready to serve content
C) The user application deployed/running
For me, /health was done to check A+B. C is never possible to reliably check IMO, since the user can have deployed anything. So if C is what was meant for /health, then yes - I agree we should remove it; it has zero reliability.
To be clear - the health check was IMO introduced for us to have a way to check that the OpenShift mechanics had completed and the app was ready *without* triggering any logic in the user's app.
+1 Each cartridge is deploying a template app on creation that supports /health. This is bogus for what we are trying to do and really gets ugly when we start deploying non-web cartridges. B is guaranteed by the app creation. I believe A is provided by the client (rhc). We could have the java client wait on app create until DNS resolves before returning, or add another call to confirm that DNS has resolved. The /health check has to go IMO.
The java client already does a DNS check - when it didn't, all kinds of problems occurred ;) The java client actually does the health check waiting for a 200, but for some reason it was changed from retrying on 404 to failing on 404. But as you say, health is dependent on deployment content, which is broken in the world of using other github repos. So yeah, the current health check should go from the client. Any chance that we could make the AS7/EAP cartridge fake a response to /health so that older clients don't fail? Or is that a stretch? If not, we'll have to consider JBDS 5 and 6 and Forge currently broken for app creation (at least at the times OpenShift is "slow").
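The DNS wait mentioned here - confirming the app's hostname resolves without sending any request to the user's application - could be sketched like this. The function name, attempt count, and delay are hypothetical, not the actual openshift-java-client code:

```python
import socket
import time

def wait_for_dns(hostname, attempts=30, delay=2):
    """Retry DNS resolution until the app's hostname is published.
    This never touches the user's deployed application, so it has
    none of the fragility of the /health check."""
    for _ in range(attempts):
        try:
            socket.gethostbyname(hostname)
            return True
        except socket.gaierror:
            time.sleep(delay)
    return False
```

This covers only part A of the earlier breakdown (DNS available); it intentionally says nothing about the cartridge or the user application.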
The AS/EAP template app does provide /health. Looks like this is a timing issue. There is some evidence that the recent prod push slowed down app creation and deployment of the template app.
Back in the very early stages, before I had the health-check implemented, a lot of weird effects happened without the additional wait. Thus, back then, the wait brought sanity to the table. I now tried using JBT without the health-check (just kept the DNS-wait) and things looked pretty sane and stable: I could embed jenkins-client into a freshly created eap (and jenkins). I also tried the very same with some integration tests and the big picture looks pretty good there, too. So I don't have any objections to dropping the health-check in the upcoming versions of the openshift-java-client. It remains to be seen how to deal with our existing JBDS/JBT installations...
Tried on the latest puddle and openshift plugin; it works well now. Details below:
Build: OpenShift Enterprise Puddle 1.1.z/2013-02-18.3/
Eclipse Juno with openshift plugin 2.4.0.Final-v20130221-0317-B118
Steps:
1. Create a jbosseap app via the OpenShift explorer in Eclipse.
Actual results:
The jbosseap-6.0 cartridge app was created successfully this time.
Andre Dietisheim <adietish> made a comment on jira JBIDE-13569 removed the wait for "health" from openshift-java-client (IApplication#waitForAccessible). The lib will only wait for successful DNS resolution as the rhc cmd line client does.
I removed the wait for health from the openshift-java-client, so upcoming versions of JBT are safe as long as they use the new library. But the problem is not solved for existing users with JBT versions that still wait for health; those will eventually error when AWS/OpenShift performance is poor. Couldn't we simply offer them a fake health-response served by a proxy, as we already do in DIY apps, so that they don't run into a needless error?
Have moved /health from ROOT.war to the Node Apache for the 4 JBoss carts. Do you need to move it for all carts? Have we seen /health problems with say Ruby? https://github.com/openshift/origin-server/pull/1454
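Serving /health from the node Apache rather than from ROOT.war could look roughly like the fragment below. This is a hedged sketch only - the path and directives are assumptions for illustration, not the actual cartridge configuration in the linked pull request:

```apache
# Hypothetical sketch: answer /health from a static file owned by the
# cartridge, so the check no longer depends on ROOT.war being deployed
# (or on the user keeping the health JSP in their application).
Alias /health /var/lib/openshift/cart/health.txt
<Location /health>
    Require all granted
</Location>
```

This decouples the health check from user deployment content, which is exactly the fragility discussed earlier in this thread.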
Commit pushed to master at https://github.com/openshift/origin-server https://github.com/openshift/origin-server/commit/2233674c7dc306f3de1447206ef736551dcb37d5 Bug 895507
Bill, I guess that doing it for all apps is a good idea, since this would ensure existing JBDS users would be able to work with every app type, regardless of the template being used. I guess using the same pattern for all apps would also be beneficial for your code?
There's no code reuse since the /health logic is per independent cartridge. The ability to control Apache is going away with the new cartridge design so as we roll out the new cartridges the /health check is going away too unless it's deployed in the carts themselves which has been the whole problem.
Commit pushed to master at https://github.com/openshift/origin-server https://github.com/openshift/origin-server/commit/17526d3866bcdfaadb17ab70253f30df621df366 Merge pull request #1474 from bdecoste/master Bug 913217 895507 [merge]
Andre Dietisheim <adietish> updated the status of jira JBIDE-13569 to Reopened
Andre Dietisheim <adietish> made a comment on jira JBIDE-13569 reopen to add pull-request
Andre Dietisheim <adietish> made a comment on jira JBIDE-13569 pushed to master
Andre Dietisheim <adietish> made a comment on jira JBIDE-13569 related commits in openshift-java-client: * https://github.com/adietish/openshift-java-client/commit/a23f557c0e5e23c8cf996090db97f8276d4d01ad * https://github.com/adietish/openshift-java-client/commit/ce60800517fcd4887a5f64789541abaf1038a137
Andre are you still seeing the base cartridge being returned in version 1.0? Looks like we are - the tests expecting 0 carts are still failing.
@Bill: yes, it's happening again: https://bugzilla.redhat.com/show_bug.cgi?id=911322#c12 STG and PROD are fine IMHO; it's INT (which has the latest code deployed) that has the bug again.
Verified this bug on build OpenShiftEnterprise/1.2/2013-05-14.1 with JBDS 7.0.0 Alpha2; everything works when creating jbosseap apps or apps from other cartridges.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2013-1030.html