Bug 963723 - mod_cluster: proxy DNS lookup failure with IPv6 on Solaris
Status: VERIFIED
Product: JBoss Enterprise Web Server 2
Classification: JBoss
Component: mod_cluster
Version: 2.0.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ER02
Target Release: 2.1.0
Assigned To: Jean-frederic Clere
QA Contact: Michal Karm Babacek
Depends On:
Blocks:
Reported: 2013-05-16 09:19 EDT by Michal Karm Babacek
Modified: 2015-09-30 14:03 EDT (History)
7 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously in JBoss Enterprise Web Server, Java returned IPv6 addresses with a zone value, similar to <literal>2001:db8:0:f101::1%2</literal>. Subsequently, when returning a node address, the <parameter>modcluster</parameter> subsystem sent the IPv6 information as it existed in Java. On Solaris, <methodname>apr_sockaddr_info_get()</methodname> did not support the returned format and failed to resolve the IP as a host name. As a result, httpd mod_cluster did not work as expected with IPv6 node addresses. This issue is fixed in JBoss Enterprise Web Server 2.1.
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments


External Trackers
Tracker ID Priority Status Summary Last Updated
JBoss Issue Tracker MODCLUSTER-339 Critical Closed "proxy: DNS lookup failure" with IPv6 on Solaris 2016-07-31 15:12 EDT

Description Michal Karm Babacek 2013-05-16 09:19:04 EDT
https://issues.jboss.org/browse/MODCLUSTER-339
Comment 1 JBoss JIRA Server 2013-05-16 10:39:55 EDT
Michal Babacek <mbabacek@redhat.com> made a comment on jira MODCLUSTER-339

As a comparison, here is a healthy debug log from a mod_cluster IPv6 test on RHEL [^error_log-mod_cluster-RHEL].
Comment 2 JBoss JIRA Server 2013-05-16 11:22:33 EDT
Jean-Frederic Clere <jfclere@jboss.org> made a comment on jira MODCLUSTER-339

%5B2620%3A52%3A0%3A105f%3A0%3A0%3Affff%3A50%252%5D
that is [ ... %2], which is not a valid address.
What is configured on the AS7 side?
Comment 3 JBoss JIRA Server 2013-05-16 12:10:15 EDT
Michal Babacek <mbabacek@redhat.com> made a comment on jira MODCLUSTER-339

{code}
<interfaces>
    <interface name="management">
        <inet-address value="2620:52:0:105f::ffff:50"/>
    </interface>
    <interface name="public">
        <inet-address value="2620:52:0:105f::ffff:50"/>
    </interface>
    <interface name="unsecure">
        <inet-address value="${jboss.bind.address.unsecure:127.0.0.1}"/>
    </interface>
</interfaces>
<socket-binding-group name="standard-sockets" default-interface="public" port-offset="${jboss.socket.binding.port-offset:0}">
    <socket-binding name="management-native" interface="management" port="${jboss.management.native.port:9999}"/>
    <socket-binding name="management-http" interface="management" port="${jboss.management.http.port:9990}"/>
    <socket-binding name="management-https" interface="management" port="${jboss.management.https.port:9443}"/>
    <socket-binding name="ajp" port="8009"/>
    <socket-binding name="http" port="8080"/>
    <socket-binding name="https" port="8443"/>
    <socket-binding name="jgroups-mping" port="0" multicast-address="ff01::3" multicast-port="45700"/>
    <socket-binding name="jgroups-tcp" port="7600"/>
    <socket-binding name="jgroups-tcp-fd" port="57600"/>
    <socket-binding name="jgroups-udp" port="55200" multicast-address="ff01::3" multicast-port="45688"/>
    <socket-binding name="jgroups-udp-fd" port="54200"/>
    <socket-binding name="modcluster" port="0" multicast-address="ff01::7" multicast-port="23964"/>
    <socket-binding name="remoting" port="4447"/>
    <socket-binding name="txn-recovery-environment" port="4712"/>
    <socket-binding name="txn-status-manager" port="4713"/>
    <outbound-socket-binding name="mail-smtp">
    <remote-destination host="localhost" port="25"/>
    </outbound-socket-binding>
</socket-binding-group>
{code}
Comment 4 JBoss JIRA Server 2013-05-16 12:46:18 EDT
Jean-Frederic Clere <jfclere@jboss.org> made a comment on jira MODCLUSTER-339

OK, it seems EAP/AS adds the %2 in the URL, which causes a problem on Solaris. That needs to be fixed.
Comment 5 JBoss JIRA Server 2013-05-17 07:31:16 EDT
Jean-Frederic Clere <jfclere@jboss.org> made a comment on jira MODCLUSTER-339

It looks like apr behaves differently on Solaris and Linux:
    rv = apr_sockaddr_info_get(&sa, "2001:db8:0:f101::1%2", APR_UNSPEC, 80, 0, p);
works on Linux but not on Solaris. It seems Solaris doesn't like the %.
Comment 6 JBoss JIRA Server 2013-05-24 08:46:48 EDT
Michal Babacek <mbabacek@redhat.com> made a comment on jira MODCLUSTER-339

h3. Thinking aloud
I do not understand why we should put the zone there at all. What should httpd, as a server, do with it?
I tried to look up some httpd tests with IPv6, and I found only this one, which does not use a zone id:
[httpd-2.2.23/srclib/apr/test/testsock.c:314|https://gist.github.com/Karm/5642351#file-testsock-c-L314]
 
Furthermore, I examined the functions in {{httpd-2.2.23/srclib/apr/network_io/unix/sockaddr.c}} leading to {{getaddrinfo(hostname, servname, &hints, &ai_list);}}

Solaris POSIX mumbo-jumbo reveals a nice doc for [getaddrinfo()|http://docs.oracle.com/cd/E23823_01/html/816-5170/getaddrinfo-3socket.html#scrolltoc]

{quote}
The {{nodename}} can also be an IPv6 zone-id in the form:
{code}
<address>%<zone-id>
{code}
The address is the literal IPv6 link-local address or host name of the destination. The zone-id is the interface ID of the IPv6 link used to send the packet. The zone-id can either be a numeric value, indicating a literal zone value, or an interface name such as hme0.
{quote}

OK, we should be able to put %num there; still, why should httpd be interested in the worker's interface zone id? It is not going to bind to it...
I guess there is even room for a nasty error where, given that the zone id has priority over the actual address, httpd would try to use a specific interface just because it was given an unnecessary zone id... Dunno :-(

h3. Toss % out
How about stripping the %num from the CONFIG message on the native side? As I stated above, it's IMHO useless there anyhow.

{code:title=RHEL with zone %666|borderStyle=solid|borderColor=#ccc| titleBGColor=#F7D6C1}
[Fri May 24 06:44:25 2013] [debug] mod_proxy_cluster.c(655): add_balancer_node: Create balancer balancer://qacluster
[Fri May 24 06:44:25 2013] [debug] mod_proxy_cluster.c(426): Created: worker for ajp://[2620:52:0:102f:221:5eff:fe96:8180%666]:8009
[Fri May 24 06:44:25 2013] [debug] mod_proxy_cluster.c(549): proxy: initialized single connection worker 1 in child 10070 for (2620:52:0:102f:221:5eff:fe96:8180%666)
[Fri May 24 06:44:25 2013] [debug] mod_proxy_cluster.c(601): Created: worker for ajp://[2620:52:0:102f:221:5eff:fe96:8180%666]:8009 1 (status): 129
[Fri May 24 06:44:25 2013] [debug] mod_proxy_cluster.c(1025): update_workers_node done
[Fri May 24 06:44:25 2013] [debug] mod_proxy_cluster.c(1010): update_workers_node starting
[Fri May 24 06:44:25 2013] [debug] mod_proxy_cluster.c(1025): update_workers_node done
{code}

OK, RHEL can handle it, SOLARIS can't. On the other hand:

{code:title=RHEL without any zone in the message|borderStyle=solid|borderColor=#ccc| titleBGColor=#F7D6C1}
[Fri May 24 06:37:47 2013] [debug] mod_proxy_cluster.c(426): Created: worker for ajp://[2620:52:0:102f:221:5eff:fe96:8180]:8009
[Fri May 24 06:37:47 2013] [debug] mod_proxy_cluster.c(549): proxy: initialized single connection worker 1 in child 9967 for (2620:52:0:102f:221:5eff:fe96:8180)
[Fri May 24 06:37:47 2013] [debug] mod_proxy_cluster.c(601): Created: worker for ajp://[2620:52:0:102f:221:5eff:fe96:8180]:8009 1 (status): 129
{code}
Omitting the zone from the CONFIG message seems to be doing no harm.

Solaris up and running :-)
{code:title=SOLARIS without any zone in the message|borderStyle=solid|borderColor=#ccc| titleBGColor=#F7D6C1}
[Fri May 24 08:25:15 2013] [debug] mod_manager.c(1923): manager_trans CONFIG (/)
[Fri May 24 08:25:15 2013] [debug] mod_manager.c(2598): manager_handler CONFIG (/) processing: "JVMRoute=FakeNode&Host=%5B2620%3A52%3A0%3A105f%3A%3Affff%3A60%5D&Maxattempts=1&Port=8009&Type=ajp&ping=100\r\n"
[Fri May 24 08:25:15 2013] [debug] mod_manager.c(2647): manager_handler CONFIG  OK
[Fri May 24 08:25:15 2013] [debug] mod_proxy_cluster.c(1010): update_workers_node starting
[Fri May 24 08:25:15 2013] [debug] mod_proxy_cluster.c(655): add_balancer_node: Create balancer balancer://qacluster
[Fri May 24 08:25:15 2013] [debug] mod_proxy_cluster.c(1010): update_workers_node starting
[Fri May 24 08:25:15 2013] [debug] mod_proxy_cluster.c(655): add_balancer_node: Create balancer balancer://qacluster
[Fri May 24 08:25:15 2013] [debug] mod_proxy_cluster.c(426): Created: worker for ajp://[2620:52:0:105f::ffff:60]:8009
[Fri May 24 08:25:15 2013] [debug] mod_proxy_cluster.c(532): proxy: initialized worker 1 in child 19207 for (2620:52:0:105f::ffff:60) min=0 max=25 smax=25
[Fri May 24 08:25:15 2013] [debug] mod_proxy_cluster.c(601): Created: worker for ajp://[2620:52:0:105f::ffff:60]:8009 1 (status): 1
[Fri May 24 08:25:15 2013] [debug] mod_proxy_cluster.c(1025): update_workers_node done
[Fri May 24 08:25:15 2013] [debug] proxy_util.c(2011): proxy: ajp: has acquired connection for (2620:52:0:105f::ffff:60)
[Fri May 24 08:25:15 2013] [debug] proxy_util.c(2067): proxy: connecting ajp://[2620:52:0:105f::ffff:60]:8009/ to 2620:52:0:105f::ffff:60:8009
[Fri May 24 08:25:15 2013] [debug] mod_proxy_cluster.c(426): Created: worker for ajp://[2620:52:0:105f::ffff:60]:8009
[Fri May 24 08:25:15 2013] [debug] mod_proxy_cluster.c(532): proxy: initialized worker 1 in child 19208 for (2620:52:0:105f::ffff:60) min=0 max=25 smax=25
[Fri May 24 08:25:15 2013] [debug] mod_proxy_cluster.c(601): Created: worker for ajp://[2620:52:0:105f::ffff:60]:8009 1 (status): 1
[Fri May 24 08:25:15 2013] [debug] mod_proxy_cluster.c(1025): update_workers_node done
[Fri May 24 08:25:15 2013] [debug] mod_proxy_cluster.c(1010): update_workers_node starting
[Fri May 24 08:25:15 2013] [debug] mod_proxy_cluster.c(1025): update_workers_node done
[Fri May 24 08:25:15 2013] [debug] proxy_util.c(2193): proxy: connected / to 2620:52:0:105f::ffff:60:8009
[Fri May 24 08:25:15 2013] [debug] proxy_util.c(2444): proxy: ajp: fam 26 socket created to connect to 2620:52:0:105f::ffff:60
{code}

Without *%something* in the Host attribute of the CONFIG message, there is no nasty *DNS lookup failure* and everything seems to be cool (not yet thoroughly tested though).

The aforementioned log was produced with this fake message:

{code}
{ echo "CONFIG / HTTP/1.0"; echo "Content-length: 108"; echo ""; echo "JVMRoute=FakeNode&Host=%5B2620%3A52%3A0%3A105f%3A%3Affff%3A60%5D&Maxattempts=1&Port=8009&Type=ajp&ping=100"; sleep 1; } | telnet 2620:52:0:105f::ffff:60 6666
{code}
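For reference, the hard-coded Content-length in that one-liner can be derived from the body itself rather than counted by hand; a small shell sketch (assuming a POSIX shell, and that the body is terminated by a CRLF on the wire, which telnet normally produces by translating the trailing newline):

```shell
# Rebuild the fake CONFIG message, computing Content-length instead of
# hard-coding 108 (106 body bytes + 2 for the terminating \r\n).
BODY='JVMRoute=FakeNode&Host=%5B2620%3A52%3A0%3A105f%3A%3Affff%3A60%5D&Maxattempts=1&Port=8009&Type=ajp&ping=100'
LEN=$(( ${#BODY} + 2 ))
printf 'CONFIG / HTTP/1.0\r\nContent-length: %d\r\n\r\n%s\r\n' "$LEN" "$BODY"
```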

What do you think about it?
Comment 7 JBoss JIRA Server 2013-05-24 15:22:52 EDT
Michal Babacek <mbabacek@redhat.com> made a comment on jira MODCLUSTER-339

Regarding the idea of removing the zone id, how about this: [https://github.com/modcluster/mod_cluster/pull/20/] ?
Comment 8 Jean-frederic Clere 2013-05-28 11:41:30 EDT
Not in EAP6.1 = not EWS 2.0.1 = unknown bug.
Comment 9 Michal Karm Babacek 2013-05-28 12:01:33 EDT
(In reply to Jean-frederic Clere from comment #8)
> Not in EAP6.1 = not EWS 2.0.1 = unknown bug.

??? Um...the bug is present both in EAP6.1 and EWS 2.0.1 :-)
Comment 10 Jean-frederic Clere 2013-05-29 02:38:02 EDT
Yes, the bug is in both EWS 2.0.1 and EAP 6.1, as we share the same sources.
Comment 11 Michal Karm Babacek 2013-05-29 04:45:26 EDT
(In reply to Jean-frederic Clere from comment #10)
> Yes the bug is both in EWS 2.0.1 and EAP 6.1 has we share the same sources.

I am lost. Why did you send this comment then?

Jean-frederic Clere 2013-05-28 11:41:30 EDT
Not in EAP6.1 = not EWS 2.0.1 = unknown bug.
Flags: devel_ack-

?
Comment 12 Jean-frederic Clere 2013-05-29 05:50:02 EDT
Not fixed in EAP6.1 = not fixed in EWS 2.0.1 = known bug.
Comment 16 Libor Fuka 2013-05-30 09:40:50 EDT
I set requires_doc_text to ?
Comment 17 Libor Fuka 2013-05-30 09:42:48 EDT
It needs to be in the release notes.
Comment 18 Jean-frederic Clere 2013-05-30 09:55:15 EDT
Cause:

Java returns IPv6 addresses with a zone, like "2001:db8:0:f101::1%2". When returning a node address, the modcluster subsystem sends the IPv6 address as it sees it in Java. Solaris apr_sockaddr_info_get() doesn't support that format and tries (and fails) to resolve the IP as a hostname.


Consequence:

httpd mod_cluster won't work with nodes that have IPv6 addresses.

Fix: Use an IPv4 address for nodes when httpd runs on Solaris.

Result:
Comment 19 Jean-frederic Clere 2013-05-30 10:04:56 EDT
The work-around is to use address="hostname" in the connector in the web subsystem.
Comment 20 JBoss JIRA Server 2013-05-30 11:28:07 EDT
Michal Babacek <mbabacek@redhat.com> made a comment on jira MODCLUSTER-339

[~jfclere] I have been investigating further and you might find these notes useful:
h4. IPv6 works if we remove % and zone id
The "fix", or rather a workaround, in [/pull/20/|https://github.com/modcluster/mod_cluster/pull/20/] really made IPv6 work on Solaris 11 SPARC64. I tested with the attached [^mod_manager.so] (built from [/pull/20/|https://github.com/modcluster/mod_cluster/pull/20/] sources for sparc64, *apxs* from httpd-2.2.23). Here is the debug log from the successful test: [^error_log_pull20].

h4. Actual apr_sockaddr_info_get source code
I wondered what the actual difference is between Solaris's and Fedora's {{apr_sockaddr_info_get}}, but I am bewildered by all these macros. So I ran the preprocessor, so that I could compare the actual C code that is to be compiled on Fedora and Solaris.
{noformat}
/tmp/native/httpd/httpd-2.2.23/srclib/apr
gcc -E -P -g -Wall -Wmissing-prototypes -Wstrict-prototypes -Wmissing-declarations -m64 -DSSL_EXPERIMENTAL -DSSL_ENGINE -DHAVE_CONFIG_H -DSOLARIS2=11 -D_POSIX_PTHREAD_SEMANTICS -D_REENTRANT -I./include -I/tmp/native/httpd/httpd-2.2.23/srclib/apr/include/arch/unix -I./include/arch/unix -I/tmp/native/httpd/httpd-2.2.23/srclib/apr/include/arch/unix -I/tmp/native/httpd/httpd-2.2.23/srclib/apr/include -o network_io/unix/sockaddr.lo -c network_io/unix/sockaddr.c
{noformat}

One may find the resulting files attached as [^sockaddr.lo_fedora18_x86_64], [^sockaddr.lo_solaris11_sparc64].
I took a look at differences in
 * {{static apr_status_t find_addresses(apr_sockaddr_t **sa, const char *hostname, apr_int32_t family, apr_port_t port, apr_int32_t flags, apr_pool_t *p)}}
 * {{call_resolver(apr_sockaddr_t **sa, const char *hostname, apr_int32_t family, apr_port_t port, apr_int32_t flags, apr_pool_t *p)}}

but it all boils down to the system's:
 {{getaddrinfo(hostname, servname, &hints, &ai_list);}}
that, as far as I was able to look up, [supports %zoneid syntax|http://docs.oracle.com/cd/E23823_01/html/816-5170/getaddrinfo-3socket.html#scrolltoc]...
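To make the platform question concrete, here is a minimal probe of that underlying call (not APR itself, just the {{getaddrinfo()}} it boils down to; the helper name is invented for this sketch). On Linux the plain numeric literal parses; the zoned form is exactly the platform-dependent case under discussion, so no particular outcome is claimed for it:

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <string.h>
#include <stdio.h>

/* Returns 1 if getaddrinfo() accepts the string as a numeric host,
 * 0 otherwise. AI_NUMERICHOST forbids any DNS lookup, so this tests
 * only the parser -- the part that differs between platforms here. */
static int resolves(const char *host)
{
    struct addrinfo hints, *ai = NULL;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_flags  = AI_NUMERICHOST;
    int rc = getaddrinfo(host, NULL, &hints, &ai);
    if (ai)
        freeaddrinfo(ai);
    return rc == 0;
}

int main(void)
{
    printf("plain literal: %d\n", resolves("2001:db8:0:f101::1"));
    /* Accepted on Linux, reportedly rejected on Solaris: */
    printf("zoned literal: %d\n", resolves("2001:db8:0:f101::1%2"));
    return 0;
}
```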

So, I can't really see how {{apr_sockaddr_info_get}} could fail us. There is not much code in it:
Solaris 11 SPARC64:
{code}
apr_status_t apr_sockaddr_info_get(apr_sockaddr_t **sa,
                                                const char *hostname,
                                                apr_int32_t family, apr_port_t port,
                                                apr_int32_t flags, apr_pool_t *p)
{
    apr_int32_t masked;
    *sa = 0L;
    if ((masked = flags & (0x01 | 0x02))) {
        if (!hostname ||
            family != 0 ||
            masked == (0x01 | 0x02)) {
            return 22;
        }
    }
    return find_addresses(sa, hostname, family, port, flags, p);
}
{code}
the only difference from Fedora build being on line 7, {{*sa = ((void *)0);}}.

uh...
Comment 21 Misha H. Ali 2013-06-02 21:37:02 EDT
Thank you, Jean-Frederic. Documenting this as a known issue for 2.0.1.
Comment 22 JBoss JIRA Server 2013-08-29 04:29:52 EDT
Jean-Frederic Clere <jfclere@jboss.org> updated the status of jira MODCLUSTER-339 to Resolved
Comment 23 JBoss JIRA Server 2013-09-19 11:50:46 EDT
Michal Babacek <mbabacek@redhat.com> updated the status of jira MODCLUSTER-339 to Closed
Comment 24 JBoss JIRA Server 2013-09-19 11:50:46 EDT
Michal Babacek <mbabacek@redhat.com> made a comment on jira MODCLUSTER-339

Verified with mod_cluster 1.2.6 :-)
Comment 25 Michal Karm Babacek 2014-06-11 11:39:36 EDT
Let's switch it to ON_QA...
Comment 26 Michal Karm Babacek 2014-06-11 11:42:40 EDT
...and let's flip it to VERIFIED :-)

Solaris 10 sparc
[Wed Jun 11 06:14:34 2014] [debug] mod_manager.c(2623): manager_handler CONFIG (/) processing: "JVMRoute=jboss-eap-6.3&Host=%5B2620%3A52%3A0%3A105f%3A0%3A0%3Affff%3Af8%252%5D&Maxattempts=1&Port=8009&StickySessionForce=No&Type=ajp&ping=10"
[Wed Jun 11 06:14:34 2014] [debug] mod_manager.c(2672): manager_handler CONFIG  OK

EWS 2.1.0.ER2
Comment 27 Mandar Joshi 2014-08-08 07:55:26 EDT
Changed Doc Type to Bug Fix.

Modified Doc Text:
Updated EWS version to 2.1.0.
Added sentence: This issue is fixed in EWS 2.1.0.
