Bug 963720
Summary: | mod_cluster: proxy DNS lookup failure with IPv6 on Solaris | ||
---|---|---|---|
Product: | [JBoss] JBoss Enterprise Application Platform 6 | Reporter: | Michal Karm Babacek <mbabacek> |
Component: | mod_cluster | Assignee: | Jean-frederic Clere <jclere> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Michal Karm Babacek <mbabacek> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 6.1.0 | CC: | jclere, rdickens, smumford |
Target Milestone: | ER2 | ||
Target Release: | EAP 6.2.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
In previous versions of EAP 6, attempting to use IPv6 addresses on a Solaris system resulted in a DNS lookup failure.
The source of this issue was traced to the zone-id string of IPv6 addresses.
Since this information is of no use to httpd, the zone-id string is no longer used and mod_cluster now operates as expected on Solaris systems.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2013-12-15 16:21:27 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Michal Karm Babacek
2013-05-16 13:16:46 UTC
Michal Babacek <mbabacek> made a comment on jira MODCLUSTER-339

As a comparison, here is a healthy debug log from a mod_cluster IPv6 test on RHEL [^error_log-mod_cluster-RHEL].

Jean-Frederic Clere <jfclere> made a comment on jira MODCLUSTER-339

%5B2620%3A52%3A0%3A105f%3A0%3A0%3Affff%3A50%252%5D, that is [ ... %2], which is not a valid address. What is configured on the AS7 side?

Michal Babacek <mbabacek> made a comment on jira MODCLUSTER-339

{code}
<interfaces>
    <interface name="management">
        <inet-address value="2620:52:0:105f::ffff:50"/>
    </interface>
    <interface name="public">
        <inet-address value="2620:52:0:105f::ffff:50"/>
    </interface>
    <interface name="unsecure">
        <inet-address value="${jboss.bind.address.unsecure:127.0.0.1}"/>
    </interface>
</interfaces>
<socket-binding-group name="standard-sockets" default-interface="public" port-offset="${jboss.socket.binding.port-offset:0}">
    <socket-binding name="management-native" interface="management" port="${jboss.management.native.port:9999}"/>
    <socket-binding name="management-http" interface="management" port="${jboss.management.http.port:9990}"/>
    <socket-binding name="management-https" interface="management" port="${jboss.management.https.port:9443}"/>
    <socket-binding name="ajp" port="8009"/>
    <socket-binding name="http" port="8080"/>
    <socket-binding name="https" port="8443"/>
    <socket-binding name="jgroups-mping" port="0" multicast-address="ff01::3" multicast-port="45700"/>
    <socket-binding name="jgroups-tcp" port="7600"/>
    <socket-binding name="jgroups-tcp-fd" port="57600"/>
    <socket-binding name="jgroups-udp" port="55200" multicast-address="ff01::3" multicast-port="45688"/>
    <socket-binding name="jgroups-udp-fd" port="54200"/>
    <socket-binding name="modcluster" port="0" multicast-address="ff01::7" multicast-port="23964"/>
    <socket-binding name="remoting" port="4447"/>
    <socket-binding name="txn-recovery-environment" port="4712"/>
    <socket-binding name="txn-status-manager" port="4713"/>
    <outbound-socket-binding name="mail-smtp">
        <remote-destination host="localhost" port="25"/>
    </outbound-socket-binding>
</socket-binding-group>
{code}

Jean-Frederic Clere <jfclere> made a comment on jira MODCLUSTER-339

OK, it seems EAP/AS adds the %2 to the URL, which causes a problem on Solaris. That needs to be fixed.

Jean-Frederic Clere <jfclere> made a comment on jira MODCLUSTER-339

It looks like APR behaves differently on Solaris and Linux: {{rv = apr_sockaddr_info_get(&sa, "2001:db8:0:f101::1%2", APR_UNSPEC, 80, 0, p);}} works on Linux but not on Solaris. It seems Solaris doesn't like the %.

Cause: Solaris doesn't support the IPv6 zone id (%n) in apr_sockaddr_info_get()
Consequence: mod_cluster can't work with nodes that have IPv6 addresses on Solaris
Workaround (if any): None
Result: .

Michal Babacek <mbabacek> made a comment on jira MODCLUSTER-339

h3. Thinking aloud

I do not understand why we should put the zone there at all. What should httpd, as a server, do with it? I tried to look up some httpd tests with IPv6, and I found only this one, which does not use a zone id: [httpd-2.2.23/srclib/apr/test/testsock.c:314|https://gist.github.com/Karm/5642351#file-testsock-c-L314]

Furthermore, I examined the functions in {{httpd-2.2.23/srclib/apr/network_io/unix/sockaddr.c}} leading to {{getaddrinfo(hostname, servname, &hints, &ai_list);}}. The Solaris POSIX mumbo-jumbo reveals a nice doc for [getaddrinfo()|http://docs.oracle.com/cd/E23823_01/html/816-5170/getaddrinfo-3socket.html#scrolltoc]:

{quote}
The {{nodename}} can also be an IPv6 zone-id in the form:
{code}
<address>%<zone-id>
{code}
The address is the literal IPv6 link-local address or host name of the destination. The zone-id is the interface ID of the IPv6 link used to send the packet. The zone-id can either be a numeric value, indicating a literal zone value, or an interface name such as hme0.
{quote}

OK, so we should be able to put %num there; still, why should httpd be interested in the worker's interface zone id? It is not going to be binding to it...
I guess there is even room for a nasty error where, given that the zone id has priority over the actual address, httpd will try to use a specific interface just because it was given an unnecessary zone id... Dunno :-(

h3. Toss % out

How about stripping the %num from the CONFIG message on the native side? As I stated above, it's IMHO useless there anyhow.

{code:title=RHEL with zone %666|borderStyle=solid|borderColor=#ccc|titleBGColor=#F7D6C1}
[Fri May 24 06:44:25 2013] [debug] mod_proxy_cluster.c(655): add_balancer_node: Create balancer balancer://qacluster
[Fri May 24 06:44:25 2013] [debug] mod_proxy_cluster.c(426): Created: worker for ajp://[2620:52:0:102f:221:5eff:fe96:8180%666]:8009
[Fri May 24 06:44:25 2013] [debug] mod_proxy_cluster.c(549): proxy: initialized single connection worker 1 in child 10070 for (2620:52:0:102f:221:5eff:fe96:8180%666)
[Fri May 24 06:44:25 2013] [debug] mod_proxy_cluster.c(601): Created: worker for ajp://[2620:52:0:102f:221:5eff:fe96:8180%666]:8009 1 (status): 129
[Fri May 24 06:44:25 2013] [debug] mod_proxy_cluster.c(1025): update_workers_node done
[Fri May 24 06:44:25 2013] [debug] mod_proxy_cluster.c(1010): update_workers_node starting
[Fri May 24 06:44:25 2013] [debug] mod_proxy_cluster.c(1025): update_workers_node done
{code}

OK, RHEL can handle it; Solaris can't. On the other hand:

{code:title=RHEL without any zone in the message|borderStyle=solid|borderColor=#ccc|titleBGColor=#F7D6C1}
[Fri May 24 06:37:47 2013] [debug] mod_proxy_cluster.c(426): Created: worker for ajp://[2620:52:0:102f:221:5eff:fe96:8180]:8009
[Fri May 24 06:37:47 2013] [debug] mod_proxy_cluster.c(549): proxy: initialized single connection worker 1 in child 9967 for (2620:52:0:102f:221:5eff:fe96:8180)
[Fri May 24 06:37:47 2013] [debug] mod_proxy_cluster.c(601): Created: worker for ajp://[2620:52:0:102f:221:5eff:fe96:8180]:8009 1 (status): 129
{code}

Omitting the zone from the CONFIG message seems to do no harm.
Solaris up and running :-)

{code:title=SOLARIS without any zone in the message|borderStyle=solid|borderColor=#ccc|titleBGColor=#F7D6C1}
[Fri May 24 08:25:15 2013] [debug] mod_manager.c(1923): manager_trans CONFIG (/)
[Fri May 24 08:25:15 2013] [debug] mod_manager.c(2598): manager_handler CONFIG (/) processing: "JVMRoute=FakeNode&Host=%5B2620%3A52%3A0%3A105f%3A%3Affff%3A60%5D&Maxattempts=1&Port=8009&Type=ajp&ping=100\r\n"
[Fri May 24 08:25:15 2013] [debug] mod_manager.c(2647): manager_handler CONFIG OK
[Fri May 24 08:25:15 2013] [debug] mod_proxy_cluster.c(1010): update_workers_node starting
[Fri May 24 08:25:15 2013] [debug] mod_proxy_cluster.c(655): add_balancer_node: Create balancer balancer://qacluster
[Fri May 24 08:25:15 2013] [debug] mod_proxy_cluster.c(1010): update_workers_node starting
[Fri May 24 08:25:15 2013] [debug] mod_proxy_cluster.c(655): add_balancer_node: Create balancer balancer://qacluster
[Fri May 24 08:25:15 2013] [debug] mod_proxy_cluster.c(426): Created: worker for ajp://[2620:52:0:105f::ffff:60]:8009
[Fri May 24 08:25:15 2013] [debug] mod_proxy_cluster.c(532): proxy: initialized worker 1 in child 19207 for (2620:52:0:105f::ffff:60) min=0 max=25 smax=25
[Fri May 24 08:25:15 2013] [debug] mod_proxy_cluster.c(601): Created: worker for ajp://[2620:52:0:105f::ffff:60]:8009 1 (status): 1
[Fri May 24 08:25:15 2013] [debug] mod_proxy_cluster.c(1025): update_workers_node done
[Fri May 24 08:25:15 2013] [debug] proxy_util.c(2011): proxy: ajp: has acquired connection for (2620:52:0:105f::ffff:60)
[Fri May 24 08:25:15 2013] [debug] proxy_util.c(2067): proxy: connecting ajp://[2620:52:0:105f::ffff:60]:8009/ to 2620:52:0:105f::ffff:60:8009
[Fri May 24 08:25:15 2013] [debug] mod_proxy_cluster.c(426): Created: worker for ajp://[2620:52:0:105f::ffff:60]:8009
[Fri May 24 08:25:15 2013] [debug] mod_proxy_cluster.c(532): proxy: initialized worker 1 in child 19208 for (2620:52:0:105f::ffff:60) min=0 max=25 smax=25
[Fri May 24 08:25:15 2013] [debug] mod_proxy_cluster.c(601): Created: worker for ajp://[2620:52:0:105f::ffff:60]:8009 1 (status): 1
[Fri May 24 08:25:15 2013] [debug] mod_proxy_cluster.c(1025): update_workers_node done
[Fri May 24 08:25:15 2013] [debug] mod_proxy_cluster.c(1010): update_workers_node starting
[Fri May 24 08:25:15 2013] [debug] mod_proxy_cluster.c(1025): update_workers_node done
[Fri May 24 08:25:15 2013] [debug] proxy_util.c(2193): proxy: connected / to 2620:52:0:105f::ffff:60:8009
[Fri May 24 08:25:15 2013] [debug] proxy_util.c(2444): proxy: ajp: fam 26 socket created to connect to 2620:52:0:105f::ffff:60
{code}

Without *%something* in the Host attribute of the CONFIG message, there is no nasty *DNS lookup failure* and everything seems to be cool (not yet thoroughly tested, though). The aforementioned log was produced with this fake message:

{code}
{ echo "CONFIG / HTTP/1.0"; echo "Content-length: 108"; echo ""; echo "JVMRoute=FakeNode&Host=%5B2620%3A52%3A0%3A105f%3A%3Affff%3A60%5D&Maxattempts=1&Port=8009&Type=ajp&ping=100"; sleep 1; } | telnet 2620:52:0:105f::ffff:60 6666
{code}

What do you think about it?

Michal Babacek <mbabacek> made a comment on jira MODCLUSTER-339

Regarding the idea of removing the zone id, how about this: [https://github.com/modcluster/mod_cluster/pull/20/] ?

Michal Babacek <mbabacek> made a comment on jira MODCLUSTER-339

[~jfclere] I have been investigating further and you might find these notes useful:

h4. IPv6 works if we remove % and zone id

The "fix", or rather a workaround, in [/pull/20/|https://github.com/modcluster/mod_cluster/pull/20/] really made IPv6 work on Solaris 11 SPARC64. I tested with the attached [^mod_manager.so] (built from [/pull/20/|https://github.com/modcluster/mod_cluster/pull/20/] sources for sparc64, *apxs* from httpd-2.2.23). Here is the debug log from the successful test: [^error_log_pull20].

h4. Actual apr_sockaddr_info_get source code

I wondered what the actual difference is between Solaris's and Fedora's {{apr_sockaddr_info_get}}, but I am bewildered by all these macros. What I did was run the preprocessor, so that I could compare the actual C code that is to be compiled on Fedora and on Solaris:

{noformat}
/tmp/native/httpd/httpd-2.2.23/srclib/apr gcc -E -P -g -Wall -Wmissing-prototypes -Wstrict-prototypes -Wmissing-declarations -m64 -DSSL_EXPERIMENTAL -DSSL_ENGINE -DHAVE_CONFIG_H -DSOLARIS2=11 -D_POSIX_PTHREAD_SEMANTICS -D_REENTRANT -I./include -I/tmp/native/httpd/httpd-2.2.23/srclib/apr/include/arch/unix -I./include/arch/unix -I/tmp/native/httpd/httpd-2.2.23/srclib/apr/include/arch/unix -I/tmp/native/httpd/httpd-2.2.23/srclib/apr/include -o network_io/unix/sockaddr.lo -c network_io/unix/sockaddr.c
{noformat}

One may find the resulting files attached as [^sockaddr.lo_fedora18_x86_64] and [^sockaddr.lo_solaris11_sparc64]. I took a look at the differences in
* {{static apr_status_t find_addresses(apr_sockaddr_t **sa, const char *hostname, apr_int32_t family, apr_port_t port, apr_int32_t flags, apr_pool_t *p)}}
* {{call_resolver(apr_sockaddr_t **sa, const char *hostname, apr_int32_t family, apr_port_t port, apr_int32_t flags, apr_pool_t *p)}}

but it all boils down to the system's {{getaddrinfo(hostname, servname, &hints, &ai_list);}} which, as far as I was able to look up, [supports the %zoneid syntax|http://docs.oracle.com/cd/E23823_01/html/816-5170/getaddrinfo-3socket.html#scrolltoc]... So I can't really see how {{apr_sockaddr_info_get}} could fail us.
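The platform difference described above can be probed directly against the system resolver, without going through APR at all. A small standalone sketch (not from the ticket) that feeds {{getaddrinfo()}} a literal with and without a zone id; the zoned result is deliberately printed rather than asserted, since it is exactly the part that varies between Linux and Solaris:

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Return 1 if getaddrinfo() accepts 'host' as a numeric IPv6 literal. */
static int resolves(const char *host)
{
    struct addrinfo hints, *res = NULL;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family   = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags    = AI_NUMERICHOST;  /* no DNS, parse only */

    int rc = getaddrinfo(host, "8009", &hints, &res);
    if (rc == 0)
        freeaddrinfo(res);
    return rc == 0;
}

int main(void)
{
    /* A plain literal must parse everywhere. */
    printf("plain: %d\n", resolves("2001:db8:0:f101::1"));
    /* Zone-id acceptance is platform-dependent: the ticket reports it
     * works on Linux but fails on Solaris. */
    printf("zoned: %d\n", resolves("2001:db8:0:f101::1%2"));
    return 0;
}
```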
There is not much code in it. Solaris 11 SPARC64:

{code}
apr_status_t apr_sockaddr_info_get(apr_sockaddr_t **sa,
                                   const char *hostname,
                                   apr_int32_t family, apr_port_t port,
                                   apr_int32_t flags, apr_pool_t *p)
{
    apr_int32_t masked;
    *sa = 0L;

    if ((masked = flags & (0x01 | 0x02))) {
        if (!hostname || family != 0 || masked == (0x01 | 0x02)) {
            return 22;
        }
    }

    return find_addresses(sa, hostname, family, port, flags, p);
}
{code}

the only difference from the Fedora build being that the {{*sa = 0L;}} line reads {{*sa = ((void *)0);}} there. Uh...

Jean-Frederic Clere <jfclere> updated the status of jira MODCLUSTER-339 to Resolved

Michal Babacek <mbabacek> updated the status of jira MODCLUSTER-339 to Closed

Michal Babacek <mbabacek> made a comment on jira MODCLUSTER-339

Verified with mod_cluster 1.2.6 :-) Splendid :-) mod_cluster 1.2.6 works on Solaris with IPv6 like a charm. Only httpd does not: [BZ 1009987]

So what did we wind up doing to resolve this? None of the linked tickets explicitly states what the fix was (that I could see, anyway). Did we remove the problematic zone? Or find a way to make Solaris play nice with it? I need to know how we fixed this for the Release Note.

Resolution was this: https://github.com/modcluster/mod_cluster/pull/20/ In my own words: the unnecessary zone string is removed from the received message.

Thanks for that, Michal. I still can't see where that's stated in the link. I guess I need to get better at reading pull requests. I have added the Doc Text and marked this for inclusion in the EAP 6.2 Release Notes.