project_key: JBPAPP6

As a follow-up on
* [JBPAPP-9195] mod_cluster: HTTP 503 on node shutdown with pure IPv6 setup

I have tried this mod_cluster + httpd bundle featuring *Apache/2.2.21* (Unix) *mod_cluster/1.2.1.Final* (unlike in [JBPAPP-9195], where we used Apache/2.2.17 (Unix) DAV/2 mod_cluster/1.2.1.Beta2):
* [mod_cluster-1.2.1.Final-linux2-x64.tar.gz|http://hudson.qa.jboss.com/hudson/view/Mod_cluster/job/mod_cluster-linux-x86_64-rhel6/47/artifact/jbossnative/build/unix/output/mod_cluster-1.2.1.Final-linux2-x64.tar.gz]

The result is surprising: very frequent HTTP 404 errors on node shutdown.

h3. Http client
I have a curl client issuing requests to [2620:52:0:105f::ffff:c]:8888/SessionTest/hostname periodically, with a delay of 1 s. Note that there is always a new session for each request (no JSESSIONID anywhere). There are two nodes, which I switch off and on randomly, always allowing enough time for the starting one to come up safely.
{noformat}
Wed May 30 17:00:13 EDT 2012 [2620:52:0:105f::ffff:c]:8888 0
+++ No errors in meanwhile +++
Wed May 30 17:05:24 EDT 2012 [2620:52:0:105f::ffff:c]:8888 0
Wed May 30 17:05:25 EDT 2012 404 Not Found The requested URL /SessionTest/hostname was not found on this server.
+++ HTTP 404 errors keep showing up every second +++
Wed May 30 17:05:58 EDT 2012 404 Not Found The requested URL /SessionTest/hostname was not found on this server.
Wed May 30 17:05:59 EDT 2012 [2620:52:0:105f::ffff:c]:8888 0
+++ No errors in meanwhile +++
Wed May 30 17:06:03 EDT 2012 [2620:52:0:105f::ffff:c]:8888 0
Wed May 30 17:06:04 EDT 2012 404 Not Found The requested URL /SessionTest/hostname was not found on this server.
+++ HTTP 404 errors keep showing up every second +++
Wed May 30 17:06:08 EDT 2012 404 Not Found The requested URL /SessionTest/hostname was not found on this server.
Wed May 30 17:06:09 EDT 2012 [2620:52:0:105f::ffff:c]:8888 0
+++ No errors in meanwhile +++
Wed May 30 17:06:25 EDT 2012 [2620:52:0:105f::ffff:c]:8888 0
{noformat}
Please note the timestamps marking HTTP 404 errors; we will match them against the attached debug logs.

h4. IO error
(i) *Note:* At *17:05:24* node vmg36 was switched off and vmg35 (up and running by that time) was supposed to take over. What actually happened on *vmg35* was the *IO error sending command CONFIG to proxy* exception listed below, at *17:05:29*, which is 5 seconds after vmg36's shutdown. Hmm, was httpd somehow too busy to accept the command?

(i) *Note:* Does the fact that the nodes talk via proxy-01.mw.lab.eng.bos.redhat.com (squid/3.1.10) have anything to do with the problem at hand?

h3. Worker nodes
The configuration is exactly the same as in [JBPAPP-9195]; I just swapped the balancer. If you take a look at the attached
* node-vmg35-Ctrl+C-log.zip
* node-vmg36-Ctrl+C-log.zip

you may observe the shutdown timestamps ( *^C* ) as well as several exceptions:

*vmg35, IP:2620:52:0:105f:0:0:ffff:c, JvmRoute:f49689d6-cdbb-3015-a642-f8200ea456ff*
* 17:04:26,550 WARN [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] Problems unmarshalling remote command from byte buffer: java.lang.NullPointerException
* 17:05:29,133 INFO [org.jboss.modcluster.mcmp.impl.DefaultMCMPHandler] (ContainerBackgroundProcessor[StandardEngine[jboss.web]]) IO error sending command CONFIG to proxy 2620:52:0:105f:0:0:ffff:c/2620:52:0:105f:0:0:ffff:c:8888: java.net.SocketTimeoutException: Read timed out

*vmg36, IP:2620:52:0:105f::ffff:0, JvmRoute:dc7bd552-a020-3d08-acee-4ae3e0f178a8*
* 17:03:36,275 WARN [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] Problems unmarshalling remote command from byte buffer: java.lang.NullPointerException

h3. Httpd
There is the attached *error_log_report.zip*, which I am about to investigate. I have not yet managed to find out what went wrong.
The most promising part of the log probably lies between *17:05:24* and *17:05:29* and runs through to the glitch at *17:05:58* (last 404) and *17:05:59* (first successful response).

(i) *Note:* I have not carried out the IPv4/IPv6 comparison yet; that this issue is IPv6/network related is just a suspicion. To be continued...
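To match the client-side 404 timestamps against the node and httpd logs, it can help to reduce the client output to outage windows first. The following is a minimal sketch, not the tooling used for this report. It assumes each client line starts with a timestamp in the format shown in the client output above (weekday, month, day, time, zone, year), followed either by the normal response or by a {{404 Not Found}} line; annotation lines such as {{+++ ... +++}} are skipped. The function names {{parse_line}} and {{outage_windows}} are hypothetical.

```python
from datetime import datetime
from typing import List, Optional, Tuple

# A window is (first 404, last 404, first successful response or None).
Window = Tuple[datetime, datetime, Optional[datetime]]


def parse_line(line: str) -> Optional[Tuple[datetime, bool]]:
    """Parse one client output line into (timestamp, is_404).

    Assumes lines like 'Wed May 30 17:05:25 EDT 2012 404 Not Found ...'.
    The zone token is skipped, not interpreted, since all lines share it.
    Returns None for annotation lines or anything that does not parse.
    """
    tokens = line.split()
    if len(tokens) < 7 or tokens[0].startswith("+++"):
        return None
    # tokens: [weekday, month, day, HH:MM:SS, zone, year, response...]
    try:
        stamp = datetime.strptime(
            f"{tokens[1]} {tokens[2]} {tokens[5]} {tokens[3]}",
            "%b %d %Y %H:%M:%S",
        )
    except ValueError:
        return None
    return stamp, tokens[6] == "404"


def outage_windows(lines: List[str]) -> List[Window]:
    """Group consecutive 404 responses into outage windows."""
    windows: List[Window] = []
    start: Optional[datetime] = None
    last: Optional[datetime] = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        stamp, is_404 = parsed
        if is_404:
            if start is None:
                start = stamp
            last = stamp
        elif start is not None and last is not None:
            # The first successful response after a run of 404s closes it.
            windows.append((start, last, stamp))
            start = last = None
    if start is not None and last is not None:
        # Still failing when the log ends: no recovery timestamp.
        windows.append((start, last, None))
    return windows


if __name__ == "__main__":
    sample = [
        "Wed May 30 17:05:24 EDT 2012 [2620:52:0:105f::ffff:c]:8888 0",
        "Wed May 30 17:05:25 EDT 2012 404 Not Found The requested URL /SessionTest/hostname was not found on this server.",
        "+++ HTTP 404 errors keep showing up every second +++",
        "Wed May 30 17:05:58 EDT 2012 404 Not Found The requested URL /SessionTest/hostname was not found on this server.",
        "Wed May 30 17:05:59 EDT 2012 [2620:52:0:105f::ffff:c]:8888 0",
    ]
    for first, last, recovered in outage_windows(sample):
        # prints: 17:05:25 17:05:58 17:05:59
        print(first.time(), last.time(), recovered.time() if recovered else None)
```

Each window can then be compared with the node-side entries, e.g. the vmg35 *IO error sending command CONFIG to proxy* at 17:05:29 falls inside the 17:05:25 to 17:05:58 window.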
Link: Added: This issue Cloned from MODCLUSTER-314
Link: Added: This issue is related to JBPAPP-8466
Link: Added: This issue is related to JBPAPP-9195
Security: Added: Public
Docs QE Status: Added: NEW
I have cloned the issue so as to have it in the JBPAPP space as well.
As it is JBPAPP-8466.
Changing the JIRA title as per Jean-Frederic's comment on the upstream JIRA: "BTW: I don't think it is related to IPv6 I managed to have it on IPv4 but it is really seldom, it is probably related to some timing issues."
Link: Added: This issue is a dependency of JBPAPP-9188
According to the trace, it seems there was a network issue during the test; that is why we see the 404s. From 17:04:44 to ~17:05:59 AS7 is not able to send the CONFIG+ENABLE-APP commands, which explains the 404s. I tried something similar on IPv4 yesterday and can't reproduce it.
I might have misunderstood Jean-Frederic's comment when I changed the JIRA title to remove IPv6. The bug was filed for IPv6, but was it verified as not reproducible on IPv4? Michal, can you comment on whether this issue is specific to IPv6 or applies to IPv4 as well? Could you also look at the potential network issue in the above comment?
I can't reproduce the issue here using 3 directly connected boxes. I tried both IPv6 and IPv4.
Docs QE Status: Removed: NEW
It's been impossible to reproduce it since EAP 6.3.