Bug 826503
Summary: | Log in screen available but user is not able to log in for several minutes after installation | |||
---|---|---|---|---|
Product: | [Other] RHQ Project | Reporter: | Filip Brychta <fbrychta> | |
Component: | Installer | Assignee: | Jay Shaughnessy <jshaughn> | |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Mike Foley <mfoley> | |
Severity: | urgent | Docs Contact: | ||
Priority: | urgent | |||
Version: | 4.4 | CC: | hrupp, jshaughn, lzoubek | |
Target Milestone: | --- | |||
Target Release: | RHQ 4.5.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | Bug Fix | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 828140 (view as bug list) | Environment: | ||
Last Closed: | 2013-08-31 09:55:19 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 828140 |
Description
Filip Brychta
2012-05-30 11:25:03 UTC
Also in this time, when you attempt to download agent, you get 200 from server, but response-length is 0. This is due to the problem described in bug 684459. The problem is that when we switch from the installer to portal war and change the root context we hit that problem. That causes a 200 to be returned (why why why?) even though the server hits an exception. Which makes the installer think that startupServlet is responding. abridged IRC Chat re: the approach to solve: (4:41:55 PM) ips: jshaughn: i think part of the problem is StartupServlet is in portal-war. at this point it really should be in coregui instead (4:42:29 PM) jshaughn: yeah (4:43:05 PM) jshaughn: there are actually two problems here, though (4:43:17 PM) ips: the other way to do it might be for the installer to ping the startupservlet w/ GET requests in a loop (4:43:41 PM) jshaughn: the first is that the jsp is saying we're done. the second is that coregui actually throws up the login box before we can actually log in (4:43:41 PM) ips: and only display the Get Started! link after it gets a 200 response (4:43:53 PM) jshaughn: yeah (4:44:02 PM) jshaughn: the fix is to delay installer (4:44:44 PM) jshaughn: I'm looking at header.jsp (4:46:09 PM) ips: interestingly, here's the mapping for the startupservlet in portal-war's web.xml: (4:46:11 PM) ips: <!-- provides startup progress information (mainly for the installer) --> (4:46:12 PM) ips: <servlet-mapping> (4:46:12 PM) ips: <servlet-name>initializer</servlet-name> (4:46:12 PM) ips: <url-pattern>/startupstatus</url-pattern> (4:46:12 PM) ips: </servlet-mapping> (4:46:39 PM) ips: is the installer actually already pinging /startupstatus as i suggested above? (4:46:43 PM) jshaughn: yeah, it's doing it (4:46:46 PM) jshaughn: see header.jsp (4:46:53 PM) ips: if so, it's not working right :) (4:47:05 PM) jshaughn: it's because of that coyote NPE (4:47:17 PM) jshaughn: thats messing things up (4:47:37 PM) jshaughn: As soon as we hit that the installer says we're done (4:47:54 PM) jshaughn: it must result from our pinging (4:49:43 PM) ips: hmm (4:51:04 PM) mfoley: ah ... the coyote NPE is from the startup servelt ... i did not know that (4:52:18 PM) jshaughn: I'm not sure how it actually manages to pass if an exception is thrown (4:52:30 PM) jshaughn: I'd have more expectd it to stay in the not ready state (4:53:18 PM) ips: yeah, me too. you'd probably have to attach w/ a javascript debugger and step through header.jsp to figure it out (4:54:04 PM) ips: did you google that coyote exception? i vaguely recall we may have traced that to being a jbossweb bug (4:54:16 PM) ips: did you ever find the original bz for this? (4:56:43 PM) ips: yeah, it is ugly in the server log an potentially alarming to users (4:56:55 PM) jshaughn: yeah, it's bad looking (4:57:04 PM) ips: would be nice to clear it up if there's a reasonably easy way (4:57:05 PM) jshaughn: I think we have a release not about it (4:57:36 PM) jshaughn: ips: https://bugzilla.redhat.com/show_bug.cgi?id=684459 (4:58:24 PM) jshaughn: it's still NEW (5:00:09 PM) jshaughn: don't see much help in https://issues.jboss.org/browse/JBWEB-188 (5:00:49 PM) jshaughn: This will likely not go away any time soon (5:10:14 PM) ips: yep. JBWEB-188 is almost 2 years old, so i'm not going to hold my breath for a fix (5:10:34 PM) jshaughn: we'd likely have to go to AS7 (5:10:49 PM) ips: and unfortunately, NPE's in the jon server log make us look bad (5:14:30 PM) jshaughn: it seems to come bak with a 200 (5:14:39 PM) jshaughn: which is bogus (5:14:51 PM) ips: if anything it would just send a 404 or some other error response to the client (5:15:04 PM) ips: yeah if it's sending a 200 that's really wacky (5:15:27 PM) jshaughn: I'm just trying to understand how it gets the code to think we're done (5:15:36 PM) ips: but it does seem that only a 200 or a 401 would get it to show the link (5:20:59 PM) jshaughn: ips, looking at that jbossweb bug, it actually states that a 200 is returned :( (5:25:00 PM) ips: jshaughn: argh (5:27:48 PM) ips: but coyote exception aside, i think we could fix this by returning some other status code other than 200 in startupservlet and then checking for that code in header.jsp (5:28:44 PM) ips: e.g. 204 (5:29:18 PM) jshaughn: hmm, that's a good idea (5:29:27 PM) ips: or even a funky non standard code like 299 :) (5:30:01 PM) jshaughn: How about 188 in honor of jbossweb-188 (5:30:11 PM) jshaughn: that may not be valid (5:30:20 PM) jshaughn: I don't know my response codes (5:31:39 PM) ips: hehee, i like it (5:31:59 PM) jshaughn: ips, so would that be a change it StartupServlet:109? (5:32:23 PM) ips: yeah and in header.jsp (5:32:31 PM) jshaughn: instead of SC_OK (5:32:37 PM) jshaughn: right, and on the receiving end (5:34:50 PM) ips: jshaughn: we also may want to increase the delay in header.jsp a bit to eg 7 or 10s (5:35:03 PM) ips: that may at least reduce the # of Coyote NPE's (5:35:47 PM) ips: then maybe post 3.1 we can look into patching jbossweb (5:37:39 PM) ips: we know it's going to take like 10 mins to fully deploy thanks to the as7 plugin, so we could probably even use a 30s or 60s delay (5:37:57 PM) jshaughn: ips, true (5:38:07 PM) jshaughn: no real reason to rush (5:38:13 PM) jshaughn: although upgrades may go faster (5:38:33 PM) jshaughn: although, not everyone uses AS7 plugin (5:38:38 PM) ips: we can also change it to not do the initial ping until after the delay has transpired (5:39:00 PM) ips: it looks like right now it will send the initial ping as soon as the page has loaded master commit 8de7734a94adcec32de43ca436a12c7edecc48d9 This is due to the fact that the problem in bug 684450 is still unresolved, and will likely remain that way until we move to a new version of AS as the underlying RHQ server platform. When we switch contexts between the installer and portal war we start having the problem described in https://issues.jboss.org/browse/JBWEB-188. The fact that it returns a 200 response code even though the request fails is what fools the installer into thinking that the server is ready. So: - Change Startup servlet to pass back an unusual response code (288)on success - Change the installer's header.jsp to look for 288 as opposed to 200. - Change StartupServlet the polling time from 5 to 15 seconds to reduce the number of coyote exceptions. - Update the installer message to let users know that it could be several minutes in this state. As it turns out this should only affect JON and not RHQ (at least not at this time: RHQ 4.4). The reason is that JON uses a later version of jboss-web, which has the problem described above. RHQ does not. This is due to bug 773739 and 773737. So, QA for Master/RHQ should simply confirm that the installer still works as expected. Verified on Version: 4.9.0-SNAPSHOT Build Number: d428376 |