Bug 963720 - mod_cluster: proxy DNS lookup failure with IPv6 on Solaris
Status: CLOSED CURRENTRELEASE
Product: JBoss Enterprise Application Platform 6
Classification: JBoss
Component: mod_cluster
Version: 6.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ER2
Target Release: EAP 6.2.0
Assigned To: Jean-frederic Clere
QA Contact: Michal Karm Babacek
Reported: 2013-05-16 09:16 EDT by Michal Karm Babacek
Modified: 2013-12-15 11:21 EST

Doc Type: Bug Fix
Doc Text:
In previous versions of EAP 6, attempting to use IPv6 addresses on a Solaris system resulted in a DNS lookup failure. The source of this issue was traced to the zone ID string (for example, %2) appended to the IPv6 addresses of the nodes. Since this information is of no use to httpd, mod_cluster now removes it from the received message and operates as expected on Solaris systems.
Last Closed: 2013-12-15 11:21:27 EST
Type: Bug


External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
JBoss Issue Tracker | MODCLUSTER-339 | Critical | Closed | "proxy: DNS lookup failure" with IPv6 on Solaris | 2015-08-14 11:29:57 EDT

Description Michal Karm Babacek 2013-05-16 09:16:46 EDT
https://issues.jboss.org/browse/MODCLUSTER-339
Comment 1 JBoss JIRA Server 2013-05-16 10:39:54 EDT
Michal Babacek <mbabacek@redhat.com> made a comment on jira MODCLUSTER-339

As a comparison, here is a healthy debug log from a mod_cluster IPv6 test on RHEL [^error_log-mod_cluster-RHEL].
Comment 2 JBoss JIRA Server 2013-05-16 11:22:33 EDT
Jean-Frederic Clere <jfclere@jboss.org> made a comment on jira MODCLUSTER-339

%5B2620%3A52%3A0%3A105f%3A0%3A0%3Affff%3A50%252%5D
decodes to [2620:52:0:105f:0:0:ffff:50%2], that is [ ... %2], which is not a valid address.
What is configured on the AS7 side?
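To see where the stray %2 comes from, the Host value can be percent-decoded: %5B, %3A and %5D stand for [, : and ], and %25 is a literal %, so %252 decodes to %2. The C sketch below is a minimal illustration of that decoding; percent_decode and hex_value are hypothetical helper names, not functions from mod_cluster or APR.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: value of a single hex digit, or -1 if c is not one. */
static int hex_value(int c)
{
    static const char digits[] = "0123456789abcdef";
    const char *p;

    if (c == '\0')
        return -1;                      /* strchr would match the terminator */
    p = strchr(digits, tolower(c));
    return p != NULL ? (int)(p - digits) : -1;
}

/* Hypothetical helper: decodes %XX escapes from src into dst (NUL-terminated).
 * Malformed escapes are copied through unchanged.
 * Returns 0 on success, -1 if dst_size is too small. */
static int percent_decode(const char *src, char *dst, size_t dst_size)
{
    size_t out = 0;

    while (*src != '\0') {
        char c = *src++;
        if (c == '%') {
            int hi = hex_value((unsigned char)src[0]);
            int lo = hi >= 0 ? hex_value((unsigned char)src[1]) : -1;
            if (hi >= 0 && lo >= 0) {
                c = (char)(hi * 16 + lo);   /* "%25" -> '%', "%3A" -> ':' */
                src += 2;
            }
        }
        if (out + 1 >= dst_size)
            return -1;                      /* keep room for the terminator */
        dst[out++] = c;
    }
    if (dst_size == 0)
        return -1;
    dst[out] = '\0';
    return 0;
}
```

Decoding the value quoted above this way yields [2620:52:0:105f:0:0:ffff:50%2], i.e. an IPv6 literal with a numeric zone ID attached.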
Comment 3 JBoss JIRA Server 2013-05-16 12:10:14 EDT
Michal Babacek <mbabacek@redhat.com> made a comment on jira MODCLUSTER-339

{code}
<interfaces>
    <interface name="management">
        <inet-address value="2620:52:0:105f::ffff:50"/>
    </interface>
    <interface name="public">
        <inet-address value="2620:52:0:105f::ffff:50"/>
    </interface>
    <interface name="unsecure">
        <inet-address value="${jboss.bind.address.unsecure:127.0.0.1}"/>
    </interface>
</interfaces>
<socket-binding-group name="standard-sockets" default-interface="public" port-offset="${jboss.socket.binding.port-offset:0}">
    <socket-binding name="management-native" interface="management" port="${jboss.management.native.port:9999}"/>
    <socket-binding name="management-http" interface="management" port="${jboss.management.http.port:9990}"/>
    <socket-binding name="management-https" interface="management" port="${jboss.management.https.port:9443}"/>
    <socket-binding name="ajp" port="8009"/>
    <socket-binding name="http" port="8080"/>
    <socket-binding name="https" port="8443"/>
    <socket-binding name="jgroups-mping" port="0" multicast-address="ff01::3" multicast-port="45700"/>
    <socket-binding name="jgroups-tcp" port="7600"/>
    <socket-binding name="jgroups-tcp-fd" port="57600"/>
    <socket-binding name="jgroups-udp" port="55200" multicast-address="ff01::3" multicast-port="45688"/>
    <socket-binding name="jgroups-udp-fd" port="54200"/>
    <socket-binding name="modcluster" port="0" multicast-address="ff01::7" multicast-port="23964"/>
    <socket-binding name="remoting" port="4447"/>
    <socket-binding name="txn-recovery-environment" port="4712"/>
    <socket-binding name="txn-status-manager" port="4713"/>
    <outbound-socket-binding name="mail-smtp">
        <remote-destination host="localhost" port="25"/>
    </outbound-socket-binding>
</socket-binding-group>
{code}
Comment 4 JBoss JIRA Server 2013-05-16 12:46:17 EDT
Jean-Frederic Clere <jfclere@jboss.org> made a comment on jira MODCLUSTER-339

OK, it seems EAP/AS adds the %2, which causes a problem in the URL on Solaris. That needs to be fixed.
Comment 5 JBoss JIRA Server 2013-05-17 07:31:15 EDT
Jean-Frederic Clere <jfclere@jboss.org> made a comment on jira MODCLUSTER-339

It looks like APR behaves differently on Solaris and Linux. The following call works on Linux but not on Solaris:
    rv = apr_sockaddr_info_get(&sa, "2001:db8:0:f101::1%2", APR_UNSPEC, 80, 0, p);
It seems Solaris doesn't like the %.
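One way to check whether the difference lies in the platform resolver rather than in APR itself is to call getaddrinfo() directly, with and without the zone suffix. The sketch below is a hypothetical probe, not APR code: the probe_numeric_host name and the AI_NUMERICHOST and SOCK_STREAM hints are choices made here for illustration, and APR's internal hints may differ.

```c
#define _POSIX_C_SOURCE 200112L
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <string.h>

/* Hypothetical probe: parses host as a numeric address literal with
 * getaddrinfo(), never falling back to DNS (AI_NUMERICHOST).
 * Returns 0 if the literal is accepted, otherwise an EAI_* error code
 * that can be turned into text with gai_strerror(). */
static int probe_numeric_host(const char *host)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    int rc;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* like APR_UNSPEC in the call above */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST;

    rc = getaddrinfo(host, "80", &hints, &res);
    if (rc == 0)
        freeaddrinfo(res);
    return rc;
}
```

Running probe_numeric_host("2001:db8:0:f101::1%2") on each platform and printing gai_strerror() of any non-zero result would show whether the system resolver itself rejects the zone.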
Comment 6 Jean-frederic Clere 2013-05-22 04:13:12 EDT
Cause: 

Solaris doesn't support the IPv6 zone ID (%n) in apr_sockaddr_info_get().


Consequence: 

mod_cluster can't work with nodes that have IPv6 addresses on Solaris.

Workaround (if any): 

None

Result: 
.
Comment 7 JBoss JIRA Server 2013-05-24 08:46:46 EDT
Michal Babacek <mbabacek@redhat.com> made a comment on jira MODCLUSTER-339

h3. Thinking aloud
I do not understand why we should put the zone there at all. What should httpd, as a server, do with it?
I tried to look up some httpd tests with IPv6 and found only this one, which does not use a zone ID:
[httpd-2.2.23/srclib/apr/test/testsock.c:314|https://gist.github.com/Karm/5642351#file-testsock-c-L314]

Furthermore, I examined the functions in {{httpd-2.2.23/srclib/apr/network_io/unix/sockaddr.c}} leading to {{getaddrinfo(hostname, servname, &hints, &ai_list);}}.

The Solaris POSIX mumbo-jumbo reveals a nice doc for [getaddrinfo()|http://docs.oracle.com/cd/E23823_01/html/816-5170/getaddrinfo-3socket.html#scrolltoc]:

{quote}
The {{nodename}} can also be an IPv6 zone-id in the form:
{code}
<address>%<zone-id>
{code}
The address is the literal IPv6 link-local address or host name of the destination. The zone-id is the interface ID of the IPv6 link used to send the packet. The zone-id can either be a numeric value, indicating a literal zone value, or an interface name such as hme0.
{quote}

OK, so we should be able to put %num there. Still, why should httpd be interested in the worker's interface zone ID? It is not going to bind to it...
I guess there is even room for a nasty error where, given that the zone ID takes priority over the actual address, httpd tries to use a specific interface just because it was given an unnecessary zone ID... Dunno :-(
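The <address>%<zone-id> form quoted above can be split into its two parts as sketched below. This is a hedged illustration: split_zone is a hypothetical name, and it assumes a numeric zone-id is used as-is while a name such as hme0 is mapped to an interface index with if_nametoindex(), mirroring the two cases the Solaris documentation describes.

```c
#include <net/if.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: splits "<address>%<zone-id>".
 * addr receives the address part; *zone_index receives the zone as an
 * interface index. Numeric zone-ids are used as-is, names such as "hme0"
 * are looked up with if_nametoindex(). *zone_index is 0 when there is no
 * zone or the interface name is unknown.
 * Returns 0 on success, -1 if addr is too small. */
static int split_zone(const char *node, char *addr, size_t addr_size,
                      unsigned int *zone_index)
{
    const char *pct = strchr(node, '%');
    size_t len = pct != NULL ? (size_t)(pct - node) : strlen(node);

    if (len + 1 > addr_size)
        return -1;
    memcpy(addr, node, len);
    addr[len] = '\0';

    *zone_index = 0;
    if (pct != NULL && pct[1] != '\0') {
        char *end;
        unsigned long n = strtoul(pct + 1, &end, 10);
        if (*end == '\0')
            *zone_index = (unsigned int)n;            /* numeric zone-id */
        else
            *zone_index = if_nametoindex(pct + 1);    /* interface name */
    }
    return 0;
}
```

Note that the address part alone is all httpd needs to reach the worker, which is the point raised above.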

h3. Toss % out
How about stripping the %num from the CONFIG message on the native side? As I stated above, it's IMHO useless there anyhow.
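A minimal sketch of the "toss % out" idea, assuming the Host value has already been percent-decoded (for example [2620:52:0:102f:221:5eff:fe96:8180%666]). strip_zone_id is a hypothetical helper written for illustration; it is not the actual mod_cluster change.

```c
#include <string.h>

/* Hypothetical helper: removes an IPv6 zone-id ("%<zone>") from a host
 * string in place. Handles bracketed literals ("[fe80::1%2]" -> "[fe80::1]")
 * and bare ones ("fe80::1%eth0" -> "fe80::1"); strings without '%' are
 * left untouched. */
static void strip_zone_id(char *host)
{
    char *pct = strchr(host, '%');
    char *close;

    if (pct == NULL)
        return;
    close = strchr(pct, ']');
    if (close != NULL)
        memmove(pct, close, strlen(close) + 1);  /* keep "]" and what follows */
    else
        *pct = '\0';                             /* bare literal: cut at '%' */
}
```

Doing this on the httpd side means the nodes can keep sending whatever their JVM reports, and only the address part reaches the worker URL.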

{code:title=RHEL with zone %666|borderStyle=solid|borderColor=#ccc| titleBGColor=#F7D6C1}
[Fri May 24 06:44:25 2013] [debug] mod_proxy_cluster.c(655): add_balancer_node: Create balancer balancer://qacluster
[Fri May 24 06:44:25 2013] [debug] mod_proxy_cluster.c(426): Created: worker for ajp://[2620:52:0:102f:221:5eff:fe96:8180%666]:8009
[Fri May 24 06:44:25 2013] [debug] mod_proxy_cluster.c(549): proxy: initialized single connection worker 1 in child 10070 for (2620:52:0:102f:221:5eff:fe96:8180%666)
[Fri May 24 06:44:25 2013] [debug] mod_proxy_cluster.c(601): Created: worker for ajp://[2620:52:0:102f:221:5eff:fe96:8180%666]:8009 1 (status): 129
[Fri May 24 06:44:25 2013] [debug] mod_proxy_cluster.c(1025): update_workers_node done
[Fri May 24 06:44:25 2013] [debug] mod_proxy_cluster.c(1010): update_workers_node starting
[Fri May 24 06:44:25 2013] [debug] mod_proxy_cluster.c(1025): update_workers_node done
{code}

OK, RHEL can handle it; Solaris can't. On the other hand:

{code:title=RHEL without any zone in the message|borderStyle=solid|borderColor=#ccc| titleBGColor=#F7D6C1}
[Fri May 24 06:37:47 2013] [debug] mod_proxy_cluster.c(426): Created: worker for ajp://[2620:52:0:102f:221:5eff:fe96:8180]:8009
[Fri May 24 06:37:47 2013] [debug] mod_proxy_cluster.c(549): proxy: initialized single connection worker 1 in child 9967 for (2620:52:0:102f:221:5eff:fe96:8180)
[Fri May 24 06:37:47 2013] [debug] mod_proxy_cluster.c(601): Created: worker for ajp://[2620:52:0:102f:221:5eff:fe96:8180]:8009 1 (status): 129
{code}
Omitting the zone from the CONFIG message seems to be doing no harm.

Solaris up and running: :-)
{code:title=SOLARIS without any zone in the message|borderStyle=solid|borderColor=#ccc| titleBGColor=#F7D6C1}
[Fri May 24 08:25:15 2013] [debug] mod_manager.c(1923): manager_trans CONFIG (/)
[Fri May 24 08:25:15 2013] [debug] mod_manager.c(2598): manager_handler CONFIG (/) processing: "JVMRoute=FakeNode&Host=%5B2620%3A52%3A0%3A105f%3A%3Affff%3A60%5D&Maxattempts=1&Port=8009&Type=ajp&ping=100\r\n"
[Fri May 24 08:25:15 2013] [debug] mod_manager.c(2647): manager_handler CONFIG  OK
[Fri May 24 08:25:15 2013] [debug] mod_proxy_cluster.c(1010): update_workers_node starting
[Fri May 24 08:25:15 2013] [debug] mod_proxy_cluster.c(655): add_balancer_node: Create balancer balancer://qacluster
[Fri May 24 08:25:15 2013] [debug] mod_proxy_cluster.c(1010): update_workers_node starting
[Fri May 24 08:25:15 2013] [debug] mod_proxy_cluster.c(655): add_balancer_node: Create balancer balancer://qacluster
[Fri May 24 08:25:15 2013] [debug] mod_proxy_cluster.c(426): Created: worker for ajp://[2620:52:0:105f::ffff:60]:8009
[Fri May 24 08:25:15 2013] [debug] mod_proxy_cluster.c(532): proxy: initialized worker 1 in child 19207 for (2620:52:0:105f::ffff:60) min=0 max=25 smax=25
[Fri May 24 08:25:15 2013] [debug] mod_proxy_cluster.c(601): Created: worker for ajp://[2620:52:0:105f::ffff:60]:8009 1 (status): 1
[Fri May 24 08:25:15 2013] [debug] mod_proxy_cluster.c(1025): update_workers_node done
[Fri May 24 08:25:15 2013] [debug] proxy_util.c(2011): proxy: ajp: has acquired connection for (2620:52:0:105f::ffff:60)
[Fri May 24 08:25:15 2013] [debug] proxy_util.c(2067): proxy: connecting ajp://[2620:52:0:105f::ffff:60]:8009/ to 2620:52:0:105f::ffff:60:8009
[Fri May 24 08:25:15 2013] [debug] mod_proxy_cluster.c(426): Created: worker for ajp://[2620:52:0:105f::ffff:60]:8009
[Fri May 24 08:25:15 2013] [debug] mod_proxy_cluster.c(532): proxy: initialized worker 1 in child 19208 for (2620:52:0:105f::ffff:60) min=0 max=25 smax=25
[Fri May 24 08:25:15 2013] [debug] mod_proxy_cluster.c(601): Created: worker for ajp://[2620:52:0:105f::ffff:60]:8009 1 (status): 1
[Fri May 24 08:25:15 2013] [debug] mod_proxy_cluster.c(1025): update_workers_node done
[Fri May 24 08:25:15 2013] [debug] mod_proxy_cluster.c(1010): update_workers_node starting
[Fri May 24 08:25:15 2013] [debug] mod_proxy_cluster.c(1025): update_workers_node done
[Fri May 24 08:25:15 2013] [debug] proxy_util.c(2193): proxy: connected / to 2620:52:0:105f::ffff:60:8009
[Fri May 24 08:25:15 2013] [debug] proxy_util.c(2444): proxy: ajp: fam 26 socket created to connect to 2620:52:0:105f::ffff:60
{code}

Without *%something* in the Host attribute of the CONFIG message, there is no nasty *DNS lookup failure* and everything seems to be cool (not yet thoroughly tested though).

The aforementioned log was produced with this fake message:

{code}
{ echo "CONFIG / HTTP/1.0"; echo "Content-length: 108"; echo ""; echo "JVMRoute=FakeNode&Host=%5B2620%3A52%3A0%3A105f%3A%3Affff%3A60%5D&Maxattempts=1&Port=8009&Type=ajp&ping=100"; sleep 1; } | telnet 2620:52:0:105f::ffff:60 6666
{code}
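The fake message above can also be built programmatically. The sketch below percent-encodes the Host value, leaving only RFC 3986 unreserved characters as-is (so a % in a zone ID would become %25, which is where the %252 in the original report comes from), and derives Content-length from the body it actually sends. percent_encode and build_config_request are hypothetical names, and the remaining fields are copied from the fake message rather than taken from any mod_cluster API.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: percent-encodes src into dst, leaving only RFC 3986
 * unreserved characters (ALPHA / DIGIT / "-" / "." / "_" / "~") as-is.
 * Returns 0 on success, -1 if dst is too small. */
static int percent_encode(const char *src, char *dst, size_t dst_size)
{
    static const char hex[] = "0123456789ABCDEF";
    static const char unreserved[] = "-._~";
    size_t out = 0;

    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        int keep = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                   (c >= '0' && c <= '9') || strchr(unreserved, c) != NULL;
        if (out + (keep ? 1 : 3) >= dst_size)
            return -1;                          /* keep room for the NUL */
        if (keep) {
            dst[out++] = (char)c;
        } else {
            dst[out++] = '%';
            dst[out++] = hex[c >> 4];
            dst[out++] = hex[c & 0x0F];
        }
    }
    if (out >= dst_size)
        return -1;
    dst[out] = '\0';
    return 0;
}

/* Hypothetical builder for a CONFIG request like the hand-made one above.
 * jvm_route is assumed to need no encoding. Returns the request length,
 * or -1 if a buffer would overflow. */
static int build_config_request(const char *jvm_route, const char *host,
                                int port, char *out, size_t out_size)
{
    char enc_host[256];
    char body[512];
    int body_len, req_len;

    if (percent_encode(host, enc_host, sizeof enc_host) != 0)
        return -1;
    body_len = snprintf(body, sizeof body,
                        "JVMRoute=%s&Host=%s&Maxattempts=1&Port=%d&Type=ajp&ping=100",
                        jvm_route, enc_host, port);
    if (body_len < 0 || (size_t)body_len >= sizeof body)
        return -1;
    req_len = snprintf(out, out_size,
                       "CONFIG / HTTP/1.0\r\nContent-length: %d\r\n\r\n%s",
                       body_len, body);
    if (req_len < 0 || (size_t)req_len >= out_size)
        return -1;
    return req_len;
}
```

For FakeNode, host [2620:52:0:105f::ffff:60] and port 8009, the body is 106 bytes long. The hand-typed message declares 108 because the line it sends ends in \r\n, which the manager_handler log line above shows as part of the processed body.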

What do you think about it?
Comment 8 JBoss JIRA Server 2013-05-24 15:22:51 EDT
Michal Babacek <mbabacek@redhat.com> made a comment on jira MODCLUSTER-339

Regarding the idea of removing the zone id, how about this: [https://github.com/modcluster/mod_cluster/pull/20/] ?
Comment 9 JBoss JIRA Server 2013-05-30 11:28:05 EDT
Michal Babacek <mbabacek@redhat.com> made a comment on jira MODCLUSTER-339

[~jfclere] I have been investigating further and you might find these notes useful:
h4. IPv6 works if we remove % and zone id
The "fix", or rather a workaround, in [/pull/20/|https://github.com/modcluster/mod_cluster/pull/20/] really made IPv6 work on Solaris 11 SPARC64. I tested with the attached [^mod_manager.so] (built from [/pull/20/|https://github.com/modcluster/mod_cluster/pull/20/] sources for sparc64 with *apxs* from httpd-2.2.23). Here is the debug log from the successful test: [^error_log_pull20].

h4. Actual apr_sockaddr_info_get source code
I wondered what the actual difference between Solaris's and Fedora's {{apr_sockaddr_info_get}} is, but all these macros bewildered me. So I ran the preprocessor in order to compare the actual C code that gets compiled on Fedora and on Solaris.
{noformat}
/tmp/native/httpd/httpd-2.2.23/srclib/apr
gcc -E -P -g -Wall -Wmissing-prototypes -Wstrict-prototypes -Wmissing-declarations -m64 -DSSL_EXPERIMENTAL -DSSL_ENGINE -DHAVE_CONFIG_H -DSOLARIS2=11 -D_POSIX_PTHREAD_SEMANTICS -D_REENTRANT -I./include -I/tmp/native/httpd/httpd-2.2.23/srclib/apr/include/arch/unix -I./include/arch/unix -I/tmp/native/httpd/httpd-2.2.23/srclib/apr/include/arch/unix -I/tmp/native/httpd/httpd-2.2.23/srclib/apr/include -o network_io/unix/sockaddr.lo -c network_io/unix/sockaddr.c
{noformat}

The resulting files are attached as [^sockaddr.lo_fedora18_x86_64] and [^sockaddr.lo_solaris11_sparc64].
I looked at the differences in
 * {{static apr_status_t find_addresses(apr_sockaddr_t **sa, const char *hostname, apr_int32_t family, apr_port_t port, apr_int32_t flags, apr_pool_t *p)}}
 * {{call_resolver(apr_sockaddr_t **sa, const char *hostname, apr_int32_t family, apr_port_t port, apr_int32_t flags, apr_pool_t *p)}}

but it all boils down to the system's:
 {{getaddrinfo(hostname, servname, &hints, &ai_list);}}
that, as far as I was able to look up, [supports %zoneid syntax|http://docs.oracle.com/cd/E23823_01/html/816-5170/getaddrinfo-3socket.html#scrolltoc]...

So I can't really see how {{apr_sockaddr_info_get}} could fail us. There is not much code in it:
Solaris 11 SPARC64:
{code}
apr_status_t apr_sockaddr_info_get(apr_sockaddr_t **sa,
                                                const char *hostname,
                                                apr_int32_t family, apr_port_t port,
                                                apr_int32_t flags, apr_pool_t *p)
{
    apr_int32_t masked;
    *sa = 0L;
    if ((masked = flags & (0x01 | 0x02))) {
        if (!hostname ||
            family != 0 ||
            masked == (0x01 | 0x02)) {
            return 22;
        }
    }
    return find_addresses(sa, hostname, family, port, flags, p);
}
{code}
the only difference from Fedora build being on line 7, {{*sa = ((void *)0);}}.

uh...
Comment 10 JBoss JIRA Server 2013-08-29 04:29:48 EDT
Jean-Frederic Clere <jfclere@jboss.org> updated the status of jira MODCLUSTER-339 to Resolved
Comment 11 JBoss JIRA Server 2013-09-19 11:50:43 EDT
Michal Babacek <mbabacek@redhat.com> updated the status of jira MODCLUSTER-339 to Closed
Comment 12 JBoss JIRA Server 2013-09-19 11:50:43 EDT
Michal Babacek <mbabacek@redhat.com> made a comment on jira MODCLUSTER-339

Verified with mod_cluster 1.2.6 :-)
Comment 13 Michal Karm Babacek 2013-09-19 12:25:04 EDT
Splendid :-) mod_cluster 1.2.6 works on Solaris with IPv6 like a charm. Only httpd does not: [BZ 1009987]
Comment 14 Scott Mumford 2013-11-20 16:35:43 EST
So what did we wind up doing to resolve this? None of the linked tickets explicitly state what the fix was (that I could see, anyway).

Did we remove the problematic zone? Or find a way to make Solaris play nice with it?

Need to know how we fixed this for the Release Note.
Comment 15 Michal Karm Babacek 2013-11-20 18:09:10 EST
Resolution was this: https://github.com/modcluster/mod_cluster/pull/20/
In my own words: An unnecessary zone string is removed from the received message.
Comment 16 Scott Mumford 2013-11-20 19:11:14 EST
Thanks for that Michal.

I still can't see where that's stated in the link. I guess I need to get better at reading pull requests.

Have added Doc Text and marked for inclusion in EAP 6.2 Release Notes.
