Bug 1313979 - openssh-server 7.1p2 4.fc23: No login possible
Summary: openssh-server 7.1p2 4.fc23: No login possible
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 23
Hardware: x86_64
OS: Linux
unspecified
urgent
Target Milestone: ---
Assignee: Hannes Frederic Sowa
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2016-03-02 18:16 UTC by Felix Homann
Modified: 2016-04-25 09:52 UTC (History)
14 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-04-04 13:04:42 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
sshd_config (4.24 KB, text/plain)
2016-03-03 10:15 UTC, Felix Homann
no flags Details
Log of sshd -Dddd -p 2222 (2.22 KB, text/plain)
2016-03-03 10:17 UTC, Felix Homann
no flags Details
perf.data as requeseted (1.44 MB, application/octet-stream)
2016-03-10 14:04 UTC, Felix Homann
no flags Details
perf script output (29.98 KB, text/plain)
2016-03-10 14:43 UTC, Felix Homann
no flags Details
perf script output as requested in comment #24 (24.87 KB, text/plain)
2016-03-11 16:25 UTC, Felix Homann
no flags Details
tcpdump output as requested in comment #24 (4.15 KB, text/plain)
2016-03-11 16:26 UTC, Felix Homann
no flags Details

Description Felix Homann 2016-03-02 18:16:10 UTC
Description of problem:

I can't login to any of my machines running openssh-server 7.1p2 4.fc23. When I try from another machine all I get is:

[fex@yerbouti openssh]$ ssh bootsy
ssh_exchange_identification: read: Connection reset by peer

Version-Release number of selected component (if applicable):
7.1p2 4.fc23

How reproducible:

Always.



Steps to Reproduce:
1. Try to ssh into a machine running openssh-server 7.1p2 4.fc23

Actual results:

See above.


Expected results:

Login.


Additional info:

Comment 1 Jakub Jelen 2016-03-03 08:08:03 UTC
Please, provide more information, at least logs from server (/var/log/secure). Also verbose logs (-vvv) from client might help to see what is going on there.

Are you able to connect if you downgrade to some older version?

Comment 2 Felix Homann 2016-03-03 08:14:15 UTC
Here's some more info:

1. Downgrading the openssh pacakges to their respective 7.1p1-3.fc23 fixed my issues. So it's indeed openssh 7.1p2-4.fc23 related and not something else.
2. Here's a more verbose output of a failed attempt to log in to a remote machine running 7.1p2-4.fc23:

[fex@yerbouti packages]$ ssh -vvv bootsy
OpenSSH_7.1p1, OpenSSL 1.0.2f-fips  28 Jan 2016
debug1: Reading configuration data /home/fex/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 56: Applying options for *
debug2: ssh_connect: needpriv 0
debug1: Connecting to bootsy [fd00::d27e:35ff:feb5:e6c2] port 22.
debug1: Connection established.
debug1: key_load_public: No such file or directory
debug1: identity file /home/fex/.ssh/id_rsa.fexnetz type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/fex/.ssh/id_rsa.fexnetz-cert type -1
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_7.1
ssh_exchange_identification: read: Connection reset by peer

Comment 3 Jakub Jelen 2016-03-03 08:33:31 UTC
Can you provide the server logs? Journal? Or logs with increased verbosity?

I expected that the client log will look exactly like this, but I wanted to make sure. But unfortunately there is nothing helpful.

I checked what changes in the code were made about pre-authentication, but I don't see any which could cause this behaviour.

Comment 4 Felix Homann 2016-03-03 08:44:35 UTC
tail -f /var/log/secure shows absolutely no activity when I try to login. What else might be helpful?

Comment 5 Felix Homann 2016-03-03 09:28:10 UTC
For testing purposes I fired up a second sshd in debug mode. Sadly, it didn't show anything either.

Comment 6 Jakub Jelen 2016-03-03 09:33:40 UTC
The easiest thing is   LogLevel DEBUG3   in   sshd_config, restart server and check what will fire in the /var/log/secure (or in   journalct -b).

With running server on debug mode, you need to make sure you start it such as

    /usr/sbin/sshd -Dddd

(and preferably on different port  -p 2222  )

Do you have any non-default configuration on your server? I still can't reproduce it.

Comment 7 Felix Homann 2016-03-03 10:15:08 UTC
Created attachment 1132714 [details]
sshd_config

Comment 8 Felix Homann 2016-03-03 10:17:15 UTC
Created attachment 1132719 [details]
Log of sshd -Dddd -p 2222

Comment 9 Felix Homann 2016-03-03 10:20:03 UTC
(In reply to Jakub Jelen from comment #6)

> With running server on debug mode, you need to make sure you start it such as
> 
>     /usr/sbin/sshd -Dddd
> 
> (and preferably on different port  -p 2222  )

See the log I've sent as an attachment. There's absolutely nothing when I try to log in.


> 
> Do you have any non-default configuration on your server?

I don't think so. I've attached the sshd_config


> I still can't reproduce it.

That's strange. I have 4 machines affected which I can't log in  to from any client (I have a couple of Debian systems here, too, which could not connect to the affected machines either.)

Comment 10 Jakub Jelen 2016-03-03 10:20:04 UTC
Comment on attachment 1132719 [details]
Log of sshd -Dddd -p 2222

I meant of course with an failed attempt to log in to the server on such port. In the configuration I don't see anything non-standard so far.

Comment 11 Felix Homann 2016-03-03 12:19:25 UTC
(In reply to Jakub Jelen from comment #10)
>
> I meant of course with an failed attempt to log in to the server on such
> port. In the configuration I don't see anything non-standard so far.


That one *is* with a failed attempt to log in. As I said: Absolutely nothing showing up.

Comment 12 Jakub Jelen 2016-03-03 13:09:06 UTC
That looks super-weird. Bumping log level of your regular running service also does not help to reveal some information in journal/log?

Are you able to connect locally from that machine to eliminate network issue, such as

    ssh -vvv localhost

from that machine?

Did you restart recently? If none of the above helped, then the last possibility is to run server under "strace -f". The connection is established so some process has to pick it up before the connection is reset.

Comment 13 Felix Homann 2016-03-03 18:12:20 UTC
(In reply to Jakub Jelen from comment #12)
> That looks super-weird. Bumping log level of your regular running service
> also does not help to reveal some information in journal/log?

I can't try at the moment. I'll let you know when I'm back at my machines.

> 
> Are you able to connect locally from that machine to eliminate network
> issue, such as
> 
>     ssh -vvv localhost
> 
> from that machine?

I can connect locally without any issues. Do you need the output anyway?

> 
> Did you restart recently?

Yes. More than once while trying to deal with this issue.


> If none of the above helped, then the last
> possibility is to run server under "strace -f".

I'll try.


Kind regards,
Felix

Comment 14 Felix Homann 2016-03-05 17:27:35 UTC
OK, it is *not* an openssh issue! 

In fact it is a kernel issue as this only happens on 4.4.x kernels no matter which version of openssh I have installed. On the other hand there's no issue on 4.3.5 kernels with either version of openssh.

I must have accidently booted a 4.3.5 kernel after downgrading the openssh packages so it seemed that this was an openssh issue. Sorry, for the noise!
What's the right way to deal with it now? Close this and file another issue report on the kernel or can it be moved somehow?

Kind regards,
Felix

Comment 15 Felix Homann 2016-03-05 17:33:12 UTC
Probably related to this:

https://bugzilla.kernel.org/show_bug.cgi?id=111041

Comment 16 Jakub Jelen 2016-03-06 14:37:26 UTC
The easiest is to change component from openssh to kernel. The kernel maintainers should take care of it (or close it if they have this issue already tracked somewhere).

I appreciate the hint not to reboot to new kernel :)

Comment 17 Felix Homann 2016-03-06 21:59:27 UTC
Thanks for taking care of the component change!

BTW, I have tested this with Fedora kernels 4.4.2 and 4.4.3 on x86_64. Both break ssh connections on all of my machines.

Comment 18 Josh Boyer 2016-03-07 14:48:00 UTC
Hannes, have you followed up this this recently at all?

Comment 19 Hannes Frederic Sowa 2016-03-10 10:46:36 UTC
Hi Josh,

no, not really so far.

Hi Felix,

It would be interesting where the packet drop happens. Would someone who could reproduce this be able to trace the connection initiation on the server?

perf record -e skb:kfree_skb -a -R -g sleep 100

It would be best if the system is isolated as much as possible during the test and then simply try some connection attempts towards the system which is running perf.

I hope I could get some more infos out of the perf script output.

Thanks!

Comment 20 Felix Homann 2016-03-10 14:04:12 UTC
Created attachment 1134896 [details]
perf.data as requeseted

Comment 21 Felix Homann 2016-03-10 14:05:42 UTC
(In reply to Hannes Frederic Sowa from comment #19)
> 
> It would be interesting where the packet drop happens. Would someone who
> could reproduce this be able to trace the connection initiation on the
> server?
> 
> perf record -e skb:kfree_skb -a -R -g sleep 100
> 

I've attached perf.data as requested. Please, let me know if you need more info.

Comment 22 Hannes Frederic Sowa 2016-03-10 14:27:10 UTC
Hello Felix,

please attach the perf report --stdio or perf script output as text file, as I don't have your kernel debug symbols handy thus can't decode them.

Thanks,
Hannes

Comment 23 Felix Homann 2016-03-10 14:43:11 UTC
Created attachment 1134909 [details]
perf script output

OK, sorry, I have absolutely no knowledge about perf. This is the output of 'perf script'.

Comment 24 Hannes Frederic Sowa 2016-03-10 15:26:59 UTC
Can you send me a tcpdump while the session breaks?

Also could you additionally do:

perf probe -a tcp_v4_send_reset
perf probe -a tcp_v6_send_reset
perf record -e skb:kfree_skb -e probe:* -a -R -g sleep 100

The last recordings didn't show that much, I need to narrow it down.

Thanks,
Hannes

Comment 25 Felix Homann 2016-03-11 11:16:06 UTC
OK, I've tried something I should have tried before: Using the ip address instead of the host name in the ssh command works.

So the issue is/seems to be be that my routers DHCP/DNS server gets confused when I switch from using a kernel <= 4.3.x to a kernel 4.4.x.
I don't know what exactly happens. In one case, though, I had a double entry for the respective machine's host name. Deleting the wrong one fixed it. But after booting into the 4.3.x kernel and then into the 4.4.x resulted in the same issue again. The other direction 4.4.x -> 4.3.x works without issues.

I've never had a similar issue with a new kernel. What has changed in the 4.4. in this respect?

Comment 26 Felix Homann 2016-03-11 16:18:36 UTC
(In reply to Felix Homann from comment #25)
> Deleting the wrong one fixed it.

This "fix" didn't survive a couple of reboots. I'm back to the original issues without double entries in the router's list of known devices.

I'll send the requested logs in a couple of minutes.

Comment 27 Felix Homann 2016-03-11 16:25:39 UTC
Created attachment 1135299 [details]
perf script output as requested in comment #24

Comment 28 Felix Homann 2016-03-11 16:26:32 UTC
Created attachment 1135300 [details]
tcpdump output as requested in comment #24

Comment 29 Hannes Frederic Sowa 2016-03-11 16:49:01 UTC
Thanks for the debugging output, I am looking over it right now. Can you also check if this same behavior happens with IPv4 instead of IPv6?

Thanks,
Hannes

Comment 30 Felix Homann 2016-03-11 17:17:06 UTC
(In reply to Hannes Frederic Sowa from comment #29)
> Thanks for the debugging output, I am looking over it right now. Can you
> also check if this same behavior happens with IPv4 instead of IPv6?

'ssh -6 [hostname]' shows the exact same behaviour.

Just to reemphasize: If I use the ip address instead of the hostname everything's running fine. It's only when using the hostname that the issue occurs. IP lookup seems to be OK though, as 'host [hostname]' shows the right ip address.

One more note: As mentioned in the kernel.org bug report I saw a very few occasional succesfull connections even running the 4.4.x kernel. I could not get "two in a row" though.

Regards,
Felix

Comment 31 Hannes Frederic Sowa 2016-03-11 20:48:45 UTC
(In reply to Felix Homann from comment #30)
> (In reply to Hannes Frederic Sowa from comment #29)
> > Thanks for the debugging output, I am looking over it right now. Can you
> > also check if this same behavior happens with IPv4 instead of IPv6?
> 
> 'ssh -6 [hostname]' shows the exact same behaviour.
> 
> Just to reemphasize: If I use the ip address instead of the hostname
> everything's running fine. It's only when using the hostname that the issue
> occurs. IP lookup seems to be OK though, as 'host [hostname]' shows the
> right ip address.

This is highly strange.

Can you strace -e connect ssh .... with IP address and hostname and check for differences.

> One more note: As mentioned in the kernel.org bug report I saw a very few
> occasional succesfull connections even running the 4.4.x kernel. I could not
> get "two in a row" though.

Okay, thanks.

The traces show that most probably your socket wasn't found in the kernel and thus a reset was generated. This is highly strange.

Comment 32 Felix Homann 2016-03-16 13:40:08 UTC
(In reply to Hannes Frederic Sowa from comment #31)
> (In reply to Felix Homann from comment #30)
> 
> Can you strace -e connect ssh .... with IP address and hostname and check
> for differences.

Here's a sample session:

[fex@yerbouti ~]$ host bootsy
bootsy.fritz.box has address 192.168.178.103
bootsy.fritz.box has IPv6 address fd00::d27e:35ff:feb5:e6c2
[fex@yerbouti ~]$ strace -e connect ssh bootsy
connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.178.1")}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("192.168.178.103")}, 16) = 0
connect(3, {sa_family=AF_INET6, sin6_port=htons(22), inet_pton(AF_INET6, "fd00::d27e:35ff:feb5:e6c2", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
connect(3, {sa_family=AF_INET6, sin6_port=htons(22), inet_pton(AF_INET6, "fd00::d27e:35ff:feb5:e6c2", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
ssh_exchange_identification: read: Connection reset by peer
+++ exited with 255 +++
[fex@yerbouti ~]$ strace -e connect ssh 192.168.178.103
connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("192.168.178.103")}, 16) = 0
connect(4, {sa_family=AF_LOCAL, sun_path="/tmp/ssh-Ko149SqCI4cV/agent.5175"}, 110) = 0
Last login: Wed Mar 16 14:37:11 2016 from 192.168.178.71
[fex@bootsy ~]$ 


Does this help?

Comment 33 Hannes Frederic Sowa 2016-03-16 14:02:25 UTC
(In reply to Felix Homann from comment #32)
> [fex@yerbouti ~]$ host bootsy
> bootsy.fritz.box has address 192.168.178.103
> bootsy.fritz.box has IPv6 address fd00::d27e:35ff:feb5:e6c2
> [fex@yerbouti ~]$ strace -e connect ssh bootsy
> connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1
> ENOENT (No such file or directory)
> connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1
> ENOENT (No such file or directory)
> connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1
> ENOENT (No such file or directory)
> connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1
> ENOENT (No such file or directory)
> connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1
> ENOENT (No such file or directory)
> connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1
> ENOENT (No such file or directory)
> connect(3, {sa_family=AF_INET, sin_port=htons(53),
> sin_addr=inet_addr("192.168.178.1")}, 16) = 0
> connect(3, {sa_family=AF_INET, sin_port=htons(22),
> sin_addr=inet_addr("192.168.178.103")}, 16) = 0
> connect(3, {sa_family=AF_INET6, sin6_port=htons(22), inet_pton(AF_INET6,
> "fd00::d27e:35ff:feb5:e6c2", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0},
> 28) = 0
> connect(3, {sa_family=AF_INET6, sin6_port=htons(22), inet_pton(AF_INET6,
> "fd00::d27e:35ff:feb5:e6c2", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0},
> 28) = 0
> ssh_exchange_identification: read: Connection reset by peer
> +++ exited with 255 +++
> [fex@yerbouti ~]$ strace -e connect ssh 192.168.178.103
> connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1
> ENOENT (No such file or directory)
> connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1
> ENOENT (No such file or directory)
> connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1
> ENOENT (No such file or directory)
> connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1
> ENOENT (No such file or directory)
> connect(3, {sa_family=AF_INET, sin_port=htons(22),
> sin_addr=inet_addr("192.168.178.103")}, 16) = 0
> connect(4, {sa_family=AF_LOCAL,
> sun_path="/tmp/ssh-Ko149SqCI4cV/agent.5175"}, 110) = 0
> Last login: Wed Mar 16 14:37:11 2016 from 192.168.178.71
> [fex@bootsy ~]$ 
> 
> 
> Does this help?

Okay, this basically shows that IPv6 connectivity is broken but IPv4 isn't. The first connects use an IPv6 address, your second test an IPv4 address.

Is basic IPv6 connectivity given between both hosts? E.g. with ping6?

ping6 fd00::d27e:35ff:feb5:e6c2

Thanks,
Hannes

Comment 34 Felix Homann 2016-03-16 14:26:27 UTC
(In reply to Hannes Frederic Sowa from comment #33)
> 
> Is basic IPv6 connectivity given between both hosts? E.g. with ping6?
> 
> ping6 fd00::d27e:35ff:feb5:e6c2

There's no problem with ping6.

Kind regards,
Felix

Comment 35 Hannes Frederic Sowa 2016-03-24 13:28:24 UTC
Hello Felix,

sorry for the delay.

can you please update your kernel. I think the problem had to do with redirects and this should be fixed in 4.4.6.

Thanks,
Hannes

Comment 36 Felix Homann 2016-03-24 14:48:53 UTC
(In reply to Hannes Frederic Sowa from comment #35)
>
> can you please update your kernel. I think the problem had to do with
> redirects and this should be fixed in 4.4.6.

Yes, seems to be fixed! I could only check on one machine, though, so maybe wait another day before closing this issue.

Kind regards,
Felix

Comment 37 Hannes Frederic Sowa 2016-03-24 16:43:09 UTC
Thanks for checking. Let's wait until after the weekend. ;)

For reference:
commit 9cf7490360bf2c46a16b7525f899e4970c5fc144
Author: Eric Dumazet <edumazet>
Date:   Tue Feb 2 19:31:12 2016 -0800

    tcp: do not drop syn_recv on all icmp reports

Comment 38 Felix Homann 2016-04-04 12:15:58 UTC
OK, it's fixed on all of my machines with 4.4.6 kernel.

Comment 39 Hannes Frederic Sowa 2016-04-04 13:04:42 UTC
Thanks for your feedback!

Comment 40 Andrej Podzimek 2016-04-25 09:04:12 UTC
Is there a known workaround for a chicken-and-egg situation, i.e., a distant machine that can't be upgraded without SSH access?

Comment 41 Hannes Frederic Sowa 2016-04-25 09:52:05 UTC
Sorry, no, not as I know of.


Note You need to log in before you can comment on or make changes to this bug.