Bug 781708 - Concurrency problem between network and sshd service
Status: CLOSED WONTFIX
Product: Fedora
Classification: Fedora
Component: openssh
Version: 16
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: Petr Lautrbach
QA Contact: Fedora Extras Quality Assurance
Reported: 2012-01-14 09:55 EST by Corinna Vinschen
Modified: 2013-02-13 16:23 EST

Doc Type: Bug Fix
Cloned as: bug 782042
Last Closed: 2013-02-13 16:23:34 EST

Description Corinna Vinschen 2012-01-14 09:55:42 EST
Description of problem:

This problem started to show up today.  After a reboot, sshd
failed to listen on my local IPv6 address.

I have a static network configuration using the network init script.
I have a sshd configuration which defines various ListenAddresses
in /etc/ssh/sshd_config, like this:

  # localhost
  ListenAddress 127.0.0.1:22
  ListenAddress [::1]:22
  # local only addresses
  ListenAddress 192.168.1.1:22
  ListenAddress [fc00::1]:22
  # external address
  ListenAddress a.b.c.d:12345

Before today, I last rebooted the machine 2 days ago, right after
updating to the new 3.1.7 kernel via yum.  Everything worked fine.

Today I installed the last set of updates via yum and rebooted again.
This time I was not able to connect to that machine via IPv6.  So
I tried netstat:

  $ netstat -tnl | grep :22
  tcp   0  0 127.0.0.1:22      0.0.0.0:*   LISTEN
  tcp   0  0 192.168.1.1:22    0.0.0.0:*   LISTEN
  tcp   0  0 ::1:22            :::*        LISTEN

Where is fc00::1?  ifconfig showed clearly that the fc00::1 address
was configured and ready.  `systemctl restart sshd.service' worked fine
and afterwards sshd was listening on fc00::1:22 as well.

I had a look into /var/log/secure:

  12:32:24 sshd[1360]: Server listening on a.b.c.d port 12345.
  12:32:24 sshd[1360]: error: Bind to port 22 on fc00::1 failed:
                       Cannot assign requested address.
  12:32:24 sshd[1360]: Server listening on 192.168.1.1 port 22.
  12:32:24 sshd[1360]: Server listening on ::1 port 22.
  12:32:24 sshd[1360]: Server listening on 127.0.0.1 port 22.

Ok, so setting up the listening sockets worked for every other
requested IP address; only listening on fc00::1 failed with
"Cannot assign requested address".  So what about setting up the network?

  12:32:22 avahi-daemon[1092]: Registering new address record for
                               fe80::6250:40ff:fe30:2010 on br0.*.
  12:32:22 avahi-daemon[1092]: Joining mDNS multicast group on interface
                               br0.IPv4 with address 192.168.1.1.
  12:32:22 avahi-daemon[1092]: New relevant interface br0.IPv4 for mDNS.
  12:32:22 avahi-daemon[1092]: Registering new address record for
                               192.168.1.1 on br0.IPv4.
  12:32:22 avahi-daemon[1092]: Registering new address record for
                               a.b.c.d on br0.IPv4.
  12:32:22 avahi-daemon[1092]: Withdrawing address record for a.b.c.d on br0.
  12:32:22 avahi-daemon[1092]: Registering new address record for
                               a.b.c.d on br0.IPv4.
  12:32:22 network[962]: Bringing up interface br0:  [  OK  ]
  12:32:24 avahi-daemon[1092]: Registering new address record for
                               fc00::1 on br0.*.
  12:32:24 avahi-daemon[1092]: Withdrawing address record for
                               fe80::6250:40ff:fe30:2010 on br0.

So the network is supposed to be up 2 seconds before sshd tries to
create a listener on these addresses.  There's no good reason that
it should fail for the IPv6 address, except that avahi-daemon
is apparently doing "something" with the IPv6 address at this time.
Could that be the problem?  And if so, why?  And does anybody know
how to work around this problem?


Thanks in advance,
Corinna


Comment 1 Corinna Vinschen 2012-01-14 10:04:02 EST
Grr, I pressed the return key in the wrong spot, so the report was
submitted before I had finished editing it.

Here's what's missing in the above report:

- The sshd.service file is untouched.  The dependencies are

   After=syslog.target network.target auditd.service

- I could work around the issue by adding another dependency:

   After=syslog.target network.target auditd.service vncserver@:1.service

  This is kind of surprising, because vncserver@:1.service depends only
  on a subset of the sshd.service dependencies:

    After=syslog.target network.target


Version-Release number of selected component (if applicable):

  systemd-37-3.fc16.x86_64
  openssh-server-5.8p2-23.fc16.x86_64
  avahi-0.6.30-4.fc16.x86_64

Expected results:

  The dependency on network.target should be enough to avoid this issue.
  avahi-daemon (*if* it's the culprit here) should not prevent binding
  to an address while it's registering it.


Corinna
Comment 2 Michal Schmidt 2012-01-14 16:42:07 EST
(In reply to comment #0)
>   12:32:22 network[962]: Bringing up interface br0:  [  OK  ]
>   12:32:24 avahi-daemon[1092]: Registering new address record for
>                                fc00::1 on br0.*.
>   12:32:24 avahi-daemon[1092]: Withdrawing address record for
>                                fe80::6250:40ff:fe30:2010 on br0.
> 
> So the network is supposed to be up 2 seconds before sshd tries to
> create a listener on these addresses.  There's no good reason that
> it should fail for the IPv6 address, except that avahi-daemon
> is apparently doing "something" with the IPv6 address at this time.

No, avahi is not the problem. In fact, avahi just correctly reported the moment when the IPv6 address really became available to applications.

When an IPv6 address is assigned to an interface, it does not become usable immediately. It stays in a "tentative" state for a short time while the mandatory Duplicate Address Detection (DAD) runs.

You can see it with an experiment:
 ifdown eth0
 ifup eth0; ip addr show dev eth0
You'll see something like:
 ...
 inet6 fc00::1/64 scope global tentative
 ...
When you ask ip the same question again a little later, the "tentative" flag will be gone.
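The same check can be done from a program instead of by re-running `ip addr show`. Below is a minimal C sketch for Linux, assuming the `/proc/net/if_inet6` format (address as 32 hex digits, interface index, prefix length, scope, flags, device name) and the `IFA_F_TENTATIVE` value 0x40 from `<linux/if_addr.h>`. The function names are hypothetical; this is not code from initscripts or sshd.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* IFA_F_TENTATIVE from <linux/if_addr.h>: set while DAD is running. */
#define ADDR_FLAG_TENTATIVE 0x40u

/* Parse one line of /proc/net/if_inet6, for example
 *   "fc000000000000000000000000000001 03 40 00 40 br0"
 * (address as 32 hex digits, ifindex, prefix length, scope, flags,
 * device).  Returns 0 and fills *addr and *tentative, or -1 if the
 * line is malformed. */
int parse_if_inet6_line(const char *line, struct in6_addr *addr, int *tentative)
{
    char hex[33], dev[64];
    unsigned int ifindex, plen, scope, flags;

    assert(line != NULL && addr != NULL && tentative != NULL);
    if (sscanf(line, "%32s %x %x %x %x %63s", hex, &ifindex, &plen, &scope,
               &flags, dev) != 6 ||
        strspn(hex, "0123456789abcdefABCDEF") != 32)
        return -1;
    for (int i = 0; i < 16; i++) {          /* two hex digits per byte */
        unsigned int byte;
        if (sscanf(hex + 2 * i, "%2x", &byte) != 1)
            return -1;
        addr->s6_addr[i] = (unsigned char)byte;
    }
    *tentative = (flags & ADDR_FLAG_TENTATIVE) != 0;
    return 0;
}

/* Returns 1 if addr_text is configured and still tentative, 0 if it is
 * configured and usable, -1 if it is not configured (or on error). */
int ipv6_addr_is_tentative(const char *addr_text)
{
    struct in6_addr want, got;
    char line[256];
    int tentative, result = -1;
    FILE *f;

    if (inet_pton(AF_INET6, addr_text, &want) != 1)
        return -1;
    if ((f = fopen("/proc/net/if_inet6", "r")) == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        /* Compare in binary form, so "fc00::1" and "fc00:0::1" match. */
        if (parse_if_inet6_line(line, &got, &tentative) == 0 &&
            memcmp(&got, &want, sizeof want) == 0) {
            result = tentative;
            break;
        }
    }
    fclose(f);
    return result;
}

/* Poll every 100 ms until addr_text is configured and usable.
 * Returns 0 once it is, or -1 if it is still missing or tentative
 * after timeout_ms. */
int wait_until_not_tentative(const char *addr_text, int timeout_ms)
{
    const struct timespec step = { 0, 100 * 1000 * 1000 };

    for (int waited = 0; ; waited += 100) {
        if (ipv6_addr_is_tentative(addr_text) == 0)
            return 0;
        if (waited >= timeout_ms)
            return -1;
        nanosleep(&step, NULL);
    }
}
```

A script run before sshd could, for example, call `wait_until_not_tentative("fc00::1", 5000)`; the 100 ms poll interval and the timeout are arbitrary choices for this sketch.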

Ideally sshd would use IP_FREEBIND to avoid the need for ordering.
Reassigning to openssh.
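A minimal sketch of what using IP_FREEBIND means, assuming Linux: the option is set before bind(), so the kernel accepts an address that is not (yet) usable, such as fc00::1 while it is still tentative. The function name `open_freebind_listener` is hypothetical and this is not OpenSSH's code. It also assumes, as on Linux kernels of this era, that `IP_FREEBIND` set at the `IPPROTO_IP` level applies to `AF_INET6` sockets too.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IP_FREEBIND
#define IP_FREEBIND 15 /* value from <linux/in.h> */
#endif

/* Open a TCP listener on a numeric host/port, e.g. ("fc00::1", "22").
 * IP_FREEBIND is set before bind(), so the kernel accepts an address
 * that is not (yet) usable on any interface.  Returns the listening
 * descriptor, or -1 on any error. */
int open_freebind_listener(const char *host, const char *port)
{
    struct addrinfo hints, *res;
    int fd, one = 1;

    assert(host != NULL && port != NULL);
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;            /* IPv4 or IPv6 literal */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST | AI_NUMERICSERV;
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (fd >= 0 &&
        /* On Linux, IP_FREEBIND at the IPPROTO_IP level is honoured
         * for AF_INET6 sockets as well. */
        (setsockopt(fd, IPPROTO_IP, IP_FREEBIND, &one, sizeof one) < 0 ||
         setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one) < 0 ||
         bind(fd, res->ai_addr, res->ai_addrlen) < 0 ||
         listen(fd, SOMAXCONN) < 0)) {
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}
```

With the option set, the listener exists from the start and begins receiving connections once DAD finishes, so sshd would no longer depend on service ordering. The trade-off is that a mistyped ListenAddress no longer produces a bind error at startup.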

These discussion threads are relevant:
http://lists.fedoraproject.org/pipermail/devel/2011-May/151272.html
http://lists.debian.org/debian-devel/2011/05/msg00801.html
Comment 3 Corinna Vinschen 2012-01-15 05:56:04 EST
Ah, I see.  But I don't think openssh is the culprit in the first place,
even if it might make sense to patch IP_FREEBIND into it.

The underlying problem is that the dependency mechanism is broken by the
fact that the initscripts network script reports success before all
addresses are actually available.  From my POV network should only
report success when all network addresses are available *and* have left the
tentative state.  This would do the right thing in non-NetworkManager
scenarios which don't have to cope with dynamic addresses.


Corinna
Comment 4 Michal Schmidt 2012-01-16 06:58:24 EST
Cloned for initscripts as bug 782042.
Comment 5 Fedora End Of Life 2013-01-16 12:02:08 EST
This message is a reminder that Fedora 16 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 16. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '16'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 16's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 16 is end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to click on 
"Clone This Bug" and open it against that version of Fedora.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 6 Fedora End Of Life 2013-02-13 16:23:39 EST
Fedora 16 changed to end-of-life (EOL) status on 2013-02-12. Fedora 16 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.
