Bug 1130328 - subversion won't fall back to IPv4 on IPv6 failure
Summary: subversion won't fall back to IPv4 on IPv6 failure
Keywords:
Status: MODIFIED
Alias: None
Product: Fedora
Classification: Fedora
Component: libserf
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Igor Gnatenko
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 1238745
Depends On:
Blocks: dualstack
 
Reported: 2014-08-14 21:33 UTC by Pavel Šimerda (pavlix)
Modified: 2019-04-24 07:55 UTC
CC List: 8 users

Fixed In Version: libserf-1.3.9-12.fc31
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-06-21 15:03:30 UTC


Links
System       ID        Priority  Status  Summary  Last Updated
Apache JIRA  SERF-190  None      None    None     2019-04-16 09:19:01 UTC

Description Pavel Šimerda (pavlix) 2014-08-14 21:33:39 UTC
Software connecting to internet services that are available over both IPv6 and IPv4 must fall back to the IPv4 address after failing with IPv6. When working on a network with defunct IPv6 connectivity, I found out that subversion fails in this respect.


How to test:

1) You probably must have an IPv6 address to fool the getaddrinfo function (didn't try without).

If not, you can run the following:

DEVICE=...
ip address add 2001:1:2:3:4:5:6:7/64 dev $DEVICE

2) Your IPv6 connectivity must be broken.

If not, you can run the following:

ip6tables -A OUTPUT -o $DEVICE -j REJECT

3) Use subversion to check out a repository available over both protocols.

For example:

svn co http://www.nlnetlabs.nl/svn/dnssec-trigger/trunk


Expected result:

The subversion client should first try to connect via IPv6 and, after that fails, fall back to IPv4.
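
The expected behaviour can be sketched as a loop over candidate addresses that moves on to the next one whenever a connect attempt fails. The Python below is only an illustration under assumptions, not subversion's or serf's code: the helper names `connect_first_working` and `connect_dualstack` are hypothetical, and it uses blocking connects with a timeout for brevity.

```python
import socket


def connect_first_working(candidates, timeout=5.0):
    """Try each (family, sockaddr) pair in order and return the first
    socket that connects. Raise the last error if every attempt fails."""
    last_error = None
    for family, sockaddr in candidates:
        try:
            sock = socket.socket(family, socket.SOCK_STREAM)
        except OSError as exc:
            # Address family not usable on this host; try the next one.
            last_error = exc
            continue
        try:
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock
        except OSError as exc:
            # Refused, unreachable, timed out...: fall through to the
            # next candidate instead of giving up.
            last_error = exc
            sock.close()
    raise last_error or OSError("no addresses to try")


def connect_dualstack(host, port, timeout=5.0):
    """Resolve host and try the results in the order getaddrinfo() returns
    them (hypothetical wrapper around connect_first_working)."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    candidates = [(family, sockaddr) for family, _, _, _, sockaddr in infos]
    return connect_first_working(candidates, timeout)
```

With a host that has both AAAA and A records and broken IPv6 connectivity, `connect_dualstack(host, 80)` would be expected to fail on the IPv6 candidate and then return a socket connected over IPv4.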


Actual result:

svn: E000111: Unable to connect to a repository at URL 'http://www.nlnetlabs.nl/svn/dnssec-trigger/trunk'
svn: E000111: Error running context: Connection refused


More details:

I debugged the issue with strace using an upstream package, so the issue is not specific to Fedora, but we are trying to have good IPv4 and IPv6 support. I also looked in the source code, and it is clear that subversion is prepared to work in dual-stack environments; it just fails at some point.

My best bet is that it treats 0 status from connect() as success even though it's non-blocking but it's just a wild guess. I'll be happy to continue researching the issue and work on the solution if needed.

Comment 1 Michele Baldessari 2014-08-14 21:41:10 UTC
I saw the same issue on my system:
apr-1.5.1-2.fc21.x86_64
subversion-1.8.9-2.fc21.x86_64

hth,
Michele

Comment 2 Pavel Šimerda (pavlix) 2014-08-14 22:07:27 UTC
From the strace (empty lines are just for easier reading, this is a continuous log):

19766 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 8
19766 connect(8, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("185.49.140.10")}, 16) = 0
19766 getsockname(8, {sa_family=AF_INET, sin_port=htons(58343), sin_addr=inet_addr("84.246.161.86")}, [16]) = 0
19766 close(8)                          = 0

This is an attempt to connect to the IPv4 address, but note the IPPROTO_IP and the getsockname call. Did it just check which source address would be used and then close the socket?

19766 socket(PF_INET6, SOCK_DGRAM, IPPROTO_IP) = 8
19766 connect(8, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "2a04:b900::1:0:0:10", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
19766 getsockname(8, {sa_family=AF_INET6, sin6_port=htons(45861), inet_pton(AF_INET6, "2a00:1268:1ff:f001:21f:3cff:fe1b:9e5e", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
19766 close(8)                          = 0

Same for IPv6.
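
The two datagram sockets above look like a source-address probe of the kind a resolver can use when sorting destination addresses: connecting a UDP socket sends no packets, it only makes the kernel choose a route and a local address, which getsockname then reports. A minimal Python sketch of the same trick follows. The helper name `probe_source_address` is hypothetical, and the port is arbitrary because nothing is transmitted (the strace shows port 0; the sketch defaults to 9 as an assumption for portability).

```python
import socket


def probe_source_address(dest_ip, family=socket.AF_INET, port=9):
    """Return the local address the kernel would use to reach dest_ip.

    connect() on a datagram socket sends nothing; it only fixes the route
    and the source address, which getsockname() then reports.
    """
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        sock.connect((dest_ip, port))
        return sock.getsockname()[0]
```

For example, `probe_source_address("127.0.0.1")` reports the loopback address, and passing `family=socket.AF_INET6` with an IPv6 destination performs the equivalent IPv6 probe.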

19766 socket(PF_INET6, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_TCP) = 8
19766 fcntl(8, F_GETFL)                 = 0x2 (flags O_RDWR)
19766 fcntl(8, F_SETFL, O_RDWR|O_NONBLOCK) = 0
19766 setsockopt(8, SOL_TCP, TCP_NODELAY, [1], 4) = 0
19766 connect(8, {sa_family=AF_INET6, sin6_port=htons(80), inet_pton(AF_INET6, "2a04:b900::1:0:0:10", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 EINPROGRESS (Operation now in progress)

Now the real connect but with a non-blocking socket, so no final result.

19766 epoll_ctl(7, EPOLL_CTL_DEL, 8, {0, {u32=0, u64=0}}) = -1 ENOENT (No such file or directory)
19766 epoll_ctl(7, EPOLL_CTL_ADD, 8, {EPOLLIN|EPOLLOUT, {u32=23311088, u64=23311088}}) = 0
19766 epoll_wait(7, {{EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=23311088, u64=23311088}}}, 16, 500) = 1
19766 read(8, 0x16460c4, 8000)          = -1 ECONNREFUSED (Connection refused)

And here it is, ECONNREFUSED upon read but no attempt to retry with IPv4.

19766 epoll_ctl(7, EPOLL_CTL_DEL, 8, {0, {u32=0, u64=0}}) = 0
19766 brk(0x166e000)                    = 0x166e000
19766 close(8)                          = 0
19766 close(-1)                         = -1 EBADF (Bad file descriptor)
19766 close(7)                          = 0
19766 write(2, "Connection refused: Unable to co"..., 216) = 216

The error message.

Comment 3 Pavel Šimerda (pavlix) 2014-08-14 23:06:21 UTC
The calls found in the above strace seem to be done by libserf.

Comment 4 Joe Orton 2014-08-15 06:43:16 UTC
There is code in serf which tries to iterate through the address list on failure.  There should be a getsockopt() call in there to retrieve the "real" connect() error if the code is working correctly - looks like that is not showing up in your strace?

Might be worth checking with upstream.

Comment 5 Pavel Šimerda (pavlix) 2014-08-15 07:55:24 UTC
(In reply to Joe Orton from comment #4)
> There is code in serf which tries to iterate through the address list on
> failure.

+1 for switching the component; I still wasn't sure whether the problem is in serf or in apr.

> There should be a getsockopt() call in there to retrieve the
> "real" connect() error if the code is working correctly

I'm curious. What getsockopt() are you talking about, and how can you retrieve the state of a non-blocking socket before trying to use it?

> looks like that is not showing up in your strace?

Nope.

> Might be worth checking with upstream.

Definitely.

Comment 6 Joe Orton 2014-08-15 10:25:17 UTC
On getsockopt() I meant this stuff:

https://code.google.com/p/serf/source/browse/trunk/outgoing.c#1381

When the connect fails the epoll_wait should return the error then serf should use getsockopt/SO_ERROR to retrieve the error for the failure of the non-blocking connect, rather than attempting I/O on the connection and *then* seeing the error.

Do you have time to chase this upstream?
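
The getsockopt/SO_ERROR approach described above can be sketched in Python as follows. This is an illustration under assumptions and not serf's implementation: the helper name `finish_connect` is hypothetical, and select() stands in for the epoll loop seen in the strace.

```python
import errno
import select
import socket


def finish_connect(family, sockaddr, timeout=5.0):
    """Run a non-blocking connect and return (sock, err).

    err is 0 on success; otherwise it is the errno of the failed connect
    and sock is None. The error is read with SO_ERROR, so no read() or
    write() is attempted on a socket that never connected.
    """
    sock = socket.socket(family, socket.SOCK_STREAM)
    sock.setblocking(False)
    err = sock.connect_ex(sockaddr)  # usually EINPROGRESS when non-blocking
    if err == errno.EINPROGRESS:
        # The socket is reported writable once the connect has finished,
        # whether it succeeded or failed.
        _, writable, _ = select.select([], [sock], [], timeout)
        if writable:
            err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        else:
            err = errno.ETIMEDOUT
    if err != 0:
        sock.close()
        return None, err
    return sock, 0
```

A caller iterating over the address list would call this once per address and move on to the next address whenever the returned error is non-zero.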

Comment 7 Joe Orton 2014-08-15 10:27:36 UTC
One other thing: there was a rebase of libserf just this week in Fedora so make sure you have 1.3.7.

Comment 8 Pavel Šimerda (pavlix) 2014-08-19 10:23:17 UTC
(In reply to Joe Orton from comment #7)
> One other thing: there was a rebase of libserf just this week in Fedora so
> make sure you have 1.3.7.

I originally found the bug with Gentoo and libserf 1.3.7.

Comment 9 Pavel Šimerda (pavlix) 2014-08-19 10:25:32 UTC
(In reply to Joe Orton from comment #6)
> On getsockopt() I meant this stuff:
> 
> https://code.google.com/p/serf/source/browse/trunk/outgoing.c#1381
> 
> When the connect fails the epoll_wait should return the error then serf
> should use getsockopt/SO_ERROR to retrieve the error for the failure of the
> non-blocking connect, rather than attempting I/O on the connection and
> *then* seeing the error.

So the expected behavior is to call getsockopt instead of read/write.

> Do you have time to chase this upstream?

Yep, I will find some. Should I assign the bug to myself for now?

Comment 10 Pavel Šimerda (pavlix) 2015-10-10 23:46:02 UTC
*** Bug 1238745 has been marked as a duplicate of this bug. ***

Comment 11 Pavel Šimerda (pavlix) 2015-10-10 23:49:14 UTC
It doesn't make sense to keep it with F21, moving to rawhide for now but we can change it later if needed.

Comment 12 Jan Kurik 2016-02-24 13:15:53 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 24 development cycle.
Changing version to '24'.

More information and the reason for this action are here:
https://fedoraproject.org/wiki/Fedora_Program_Management/HouseKeeping/Fedora24#Rawhide_Rebase

Comment 13 Fedora Admin XMLRPC Client 2016-02-26 17:40:53 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 14 Fedora Admin XMLRPC Client 2016-04-09 12:55:00 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 15 Igor Gnatenko 2017-06-21 15:03:30 UTC
Please, can you report this upstream? RHBZ is not the appropriate place for upstream bugs.

Comment 16 Pavel (pavlix) Šimerda 2017-06-21 15:38:00 UTC
I have no plan to report upstream myself right now. This bug was created as part of a project to improve Fedora IPv6 support. Just for your information, I'm not taking any steps right now.

Comment 17 Tomáš Hozza 🤓 2017-06-22 12:18:15 UTC
Igor, I would like to ask you, as the package maintainer, to report this upstream, as Fedora users usually do not interact with upstream directly. This is an issue in the Fedora version. It is not up to the reporter but up to the maintainer to work with upstream and forward them the bug report. Also, CLOSED UPSTREAM is used for closing bugs which were reported upstream and are tracked there. Please provide a pointer to the upstream bug; until then, I'm reopening this bug.

Thanks.

Comment 18 Jan Kurik 2017-08-15 09:03:38 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 27 development cycle.
Changing version to '27'.

Comment 19 Martin Sehnoutka 2018-02-26 13:18:23 UTC
Any update on this bug? I did not find any upstream issue.

Comment 20 Ben Cotton 2018-11-27 14:54:21 UTC
This message is a reminder that Fedora 27 is nearing its end of life.
On 2018-Nov-30 Fedora will stop maintaining and issuing updates for
Fedora 27. It is Fedora's policy to close all bug reports from releases
that are no longer maintained. At that time this bug will be closed as
EOL if it remains open with a Fedora 'version' of '27'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 27 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 21 Petr Menšík 2019-04-16 08:46:34 UTC
Any update on this bug? It is annoying when the upstream server has IPv6 addresses but the network does not allow IPv6. This cannot be overridden even by svn parameters. nlnetlabs still uses svn for the unbound and ldns projects, but we have to use netresolve to work around it.

Comment 22 Petr Menšík 2019-04-16 09:19:01 UTC
Related upstream issue might be [1], but that should be already fixed in 1.3.3.

1. https://issues.apache.org/jira/browse/SERF-129?jql=text%20~%20%22ipv6%20serf%22

Unbound error:
svn co https://nlnetlabs.nl/svn/unbound/trunk
svn: E170013: Unable to connect to a repository at URL 'https://nlnetlabs.nl/svn/unbound/trunk'
svn: E000113: Error running context: No route to host

But it does not try the IPv4 address at all.

Filed a new issue on the upstream tracker: https://issues.apache.org/jira/browse/SERF-190

Comment 23 Joe Orton 2019-04-16 10:48:07 UTC
Package: libserf-1.3.9-12.fc31

Comment 24 Joe Orton 2019-04-16 10:50:26 UTC
Petr, any chance you can try -31 with SVN?  https://koji.fedoraproject.org/koji/buildinfo?buildID=1250807

Comment 25 Petr Menšík 2019-04-17 14:14:53 UTC
Unfortunately, I do not have any virtual with working IPv6 connection to reproduce this issue. My virtual rawhide works even without this upgrade, but it has different network configuration. Sorry, not able to test it yet.

Comment 26 Petr Menšík 2019-04-17 17:30:07 UTC
I have created a copr build [1] of the fixed version, without a subversion rebuild. Unfortunately, the issue is still the same.

$ rpm -q subversion libserf
subversion-1.11.1-1.fc29.x86_64
libserf-1.3.9-12.fc29.x86_64

$ svn co https://nlnetlabs.nl/svn/unbound/trunk
svn: E170013: Unable to connect to a repository at URL 'https://nlnetlabs.nl/svn/unbound/trunk'
svn: E000113: Error running context: No route to host

1. https://copr.fedorainfracloud.org/coprs/pemensik/subversion/

Comment 27 Joe Orton 2019-04-18 07:33:29 UTC
Can you capture strace for that?

Comment 29 Joe Orton 2019-04-24 07:55:48 UTC
Thanks Petr, I can see the problem: it is catching POLLIN as well, and the code that distinguishes this case on trunk (where my patch works) is significantly different from 1.3.9.  I'm going to have to wait for upstream to chime in; it is not trivial to backport the trunk code to 1.3.9.

epoll_ctl(3, EPOLL_CTL_ADD, 4, {EPOLLIN|EPOLLOUT, {u32=2678110760, u64=93877778299432}}) = 0
epoll_wait(3, [{EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=2678110760, u64=93877778299432}}], 16, 500) = 1
read(4, 0x55619fa16814, 8000)           = -1 EHOSTUNREACH (No route to host)

I'll revert my patch since it doesn't help and might have other regressions.
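
The epoll_wait result above carries EPOLLIN together with EPOLLERR and EPOLLHUP, so an event handler that reacts to readability first ends up in read() and only then sees the error. One way to avoid that, sketched here in Python with the equivalent poll() flags, is to look at the error flags before the readable flag while the connect is still pending. This is an illustration only and not the serf trunk code; `next_action` and its return strings are hypothetical.

```python
import select


def next_action(events, connected):
    """Decide what to do with the poll() events reported for one socket.

    While the connect is still pending, error and hang-up flags are
    checked first, because a failed connect is also reported as readable.
    """
    if not connected:
        if events & (select.POLLERR | select.POLLHUP):
            # Fetch the real error with SO_ERROR and try the next address.
            return "connect-failed"
        if events & select.POLLOUT:
            return "connect-done"
        return "wait"
    if events & select.POLLIN:
        return "read"
    if events & select.POLLOUT:
        return "write"
    return "wait"
```

Applied to the flags in the strace, a pending connection reporting readable, writable, error and hang-up at once would be treated as a failed connect rather than as data to read.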

