Description of problem: I can't login to any of my machines running openssh-server 7.1p2 4.fc23. When I try from another machine all I get is: [fex@yerbouti openssh]$ ssh bootsy ssh_exchange_identification: read: Connection reset by peer Version-Release number of selected component (if applicable): 7.1p2 4.fc23 How reproducible: Always. Steps to Reproduce: 1. Try to ssh into a machine running openssh-server 7.1p2 4.fc23 Actual results: See above. Expected results: Login. Additional info:
Please, provide more information, at least logs from server (/var/log/secure). Also verbose logs (-vvv) from client might help to see what is going on there. Are you able to connect if you downgrade to some older version?
Here's some more info: 1. Downgrading the openssh pacakges to their respective 7.1p1-3.fc23 fixed my issues. So it's indeed openssh 7.1p2-4.fc23 related and not something else. 2. Here's a more verbose output of a failed attempt to log in to a remote machine running 7.1p2-4.fc23: [fex@yerbouti packages]$ ssh -vvv bootsy OpenSSH_7.1p1, OpenSSL 1.0.2f-fips 28 Jan 2016 debug1: Reading configuration data /home/fex/.ssh/config debug1: Reading configuration data /etc/ssh/ssh_config debug1: /etc/ssh/ssh_config line 56: Applying options for * debug2: ssh_connect: needpriv 0 debug1: Connecting to bootsy [fd00::d27e:35ff:feb5:e6c2] port 22. debug1: Connection established. debug1: key_load_public: No such file or directory debug1: identity file /home/fex/.ssh/id_rsa.fexnetz type -1 debug1: key_load_public: No such file or directory debug1: identity file /home/fex/.ssh/id_rsa.fexnetz-cert type -1 debug1: Enabling compatibility mode for protocol 2.0 debug1: Local version string SSH-2.0-OpenSSH_7.1 ssh_exchange_identification: read: Connection reset by peer
Can you provide the server logs? Journal? Or logs with increased verbosity? I expected that the client log will look exactly like this, but I wanted to make sure. But unfortunately there is nothing helpful. I checked what changes in the code were made about pre-authentication, but I don't see any which could cause this behaviour.
tail -f /var/log/secure shows absolutely no activity when I try to login. What else might be helpful?
For testing purposes I fired up a second sshd in debug mode. Sadly, it didn't show anything either.
The easiest thing is LogLevel DEBUG3 in sshd_config, restart server and check what will fire in the /var/log/secure (or in journalct -b). With running server on debug mode, you need to make sure you start it such as /usr/sbin/sshd -Dddd (and preferably on different port -p 2222 ) Do you have any non-default configuration on your server? I still can't reproduce it.
Created attachment 1132714 [details] sshd_config
Created attachment 1132719 [details] Log of sshd -Dddd -p 2222
(In reply to Jakub Jelen from comment #6) > With running server on debug mode, you need to make sure you start it such as > > /usr/sbin/sshd -Dddd > > (and preferably on different port -p 2222 ) See the log I've sent as an attachment. There's absolutely nothing when I try to log in. > > Do you have any non-default configuration on your server? I don't think so. I've attached the sshd_config > I still can't reproduce it. That's strange. I have 4 machines affected which I can't log in to from any client (I have a couple of Debian systems here, too, which could not connect to the affected machines either.)
Comment on attachment 1132719 [details] Log of sshd -Dddd -p 2222 I meant of course with an failed attempt to log in to the server on such port. In the configuration I don't see anything non-standard so far.
(In reply to Jakub Jelen from comment #10) > > I meant of course with an failed attempt to log in to the server on such > port. In the configuration I don't see anything non-standard so far. That one *is* with a failed attempt to log in. As I said: Absolutely nothing showing up.
That looks super-weird. Bumping log level of your regular running service also does not help to reveal some information in journal/log? Are you able to connect locally from that machine to eliminate network issue, such as ssh -vvv localhost from that machine? Did you restart recently? If none of the above helped, then the last possibility is to run server under "strace -f". The connection is established so some process has to pick it up before the connection is reset.
(In reply to Jakub Jelen from comment #12) > That looks super-weird. Bumping log level of your regular running service > also does not help to reveal some information in journal/log? I can't try at the moment. I'll let you know when I'm back at my machines. > > Are you able to connect locally from that machine to eliminate network > issue, such as > > ssh -vvv localhost > > from that machine? I can connect locally without any issues. Do you need the output anyway? > > Did you restart recently? Yes. More than once while trying to deal with this issue. > If none of the above helped, then the last > possibility is to run server under "strace -f". I'll try. Kind regards, Felix
OK, it is *not* an openssh issue! In fact it is a kernel issue as this only happens on 4.4.x kernels no matter which version of openssh I have installed. On the other hand there's no issue on 4.3.5 kernels with either version of openssh. I must have accidently booted a 4.3.5 kernel after downgrading the openssh packages so it seemed that this was an openssh issue. Sorry, for the noise! What's the right way to deal with it now? Close this and file another issue report on the kernel or can it be moved somehow? Kind regards, Felix
Probably related to this: https://bugzilla.kernel.org/show_bug.cgi?id=111041
The easiest is to change component from openssh to kernel. The kernel maintainers should take care of it (or close it if they have this issue already tracked somewhere). I appreciate the hint not to reboot to new kernel :)
Thanks for taking care of the component change! BTW, I have tested this with Fedora kernels 4.4.2 and 4.4.3 on x86_64. Both break ssh connections on all of my machines.
Hannes, have you followed up this this recently at all?
Hi Josh, no, not really so far. Hi Felix, It would be interesting where the packet drop happens. Would someone who could reproduce this be able to trace the connection initiation on the server? perf record -e skb:kfree_skb -a -R -g sleep 100 It would be best if the system is isolated as much as possible during the test and then simply try some connection attempts towards the system which is running perf. I hope I could get some more infos out of the perf script output. Thanks!
Created attachment 1134896 [details] perf.data as requeseted
(In reply to Hannes Frederic Sowa from comment #19) > > It would be interesting where the packet drop happens. Would someone who > could reproduce this be able to trace the connection initiation on the > server? > > perf record -e skb:kfree_skb -a -R -g sleep 100 > I've attached perf.data as requested. Please, let me know if you need more info.
Hello Felix, please attach the perf report --stdio or perf script output as text file, as I don't have your kernel debug symbols handy thus can't decode them. Thanks, Hannes
Created attachment 1134909 [details] perf script output OK, sorry, I have absolutely no knowledge about perf. This is the output of 'perf script'.
Can you send me a tcpdump while the session breaks? Also could you additionally do: perf probe -a tcp_v4_send_reset perf probe -a tcp_v6_send_reset perf record -e skb:kfree_skb -e probe:* -a -R -g sleep 100 The last recordings didn't show that much, I need to narrow it down. Thanks, Hannes
OK, I've tried something I should have tried before: Using the ip address instead of the host name in the ssh command works. So the issue is/seems to be be that my routers DHCP/DNS server gets confused when I switch from using a kernel <= 4.3.x to a kernel 4.4.x. I don't know what exactly happens. In one case, though, I had a double entry for the respective machine's host name. Deleting the wrong one fixed it. But after booting into the 4.3.x kernel and then into the 4.4.x resulted in the same issue again. The other direction 4.4.x -> 4.3.x works without issues. I've never had a similar issue with a new kernel. What has changed in the 4.4. in this respect?
(In reply to Felix Homann from comment #25) > Deleting the wrong one fixed it. This "fix" didn't survive a couple of reboots. I'm back to the original issues without double entries in the router's list of known devices. I'll send the requested logs in a couple of minutes.
Created attachment 1135299 [details] perf script output as requested in comment #24
Created attachment 1135300 [details] tcpdump output as requested in comment #24
Thanks for the debugging output, I am looking over it right now. Can you also check if this same behavior happens with IPv4 instead of IPv6? Thanks, Hannes
(In reply to Hannes Frederic Sowa from comment #29) > Thanks for the debugging output, I am looking over it right now. Can you > also check if this same behavior happens with IPv4 instead of IPv6? 'ssh -6 [hostname]' shows the exact same behaviour. Just to reemphasize: If I use the ip address instead of the hostname everything's running fine. It's only when using the hostname that the issue occurs. IP lookup seems to be OK though, as 'host [hostname]' shows the right ip address. One more note: As mentioned in the kernel.org bug report I saw a very few occasional succesfull connections even running the 4.4.x kernel. I could not get "two in a row" though. Regards, Felix
(In reply to Felix Homann from comment #30) > (In reply to Hannes Frederic Sowa from comment #29) > > Thanks for the debugging output, I am looking over it right now. Can you > > also check if this same behavior happens with IPv4 instead of IPv6? > > 'ssh -6 [hostname]' shows the exact same behaviour. > > Just to reemphasize: If I use the ip address instead of the hostname > everything's running fine. It's only when using the hostname that the issue > occurs. IP lookup seems to be OK though, as 'host [hostname]' shows the > right ip address. This is highly strange. Can you strace -e connect ssh .... with IP address and hostname and check for differences. > One more note: As mentioned in the kernel.org bug report I saw a very few > occasional succesfull connections even running the 4.4.x kernel. I could not > get "two in a row" though. Okay, thanks. The traces show that most probably your socket wasn't found in the kernel and thus a reset was generated. This is highly strange.
(In reply to Hannes Frederic Sowa from comment #31) > (In reply to Felix Homann from comment #30) > > Can you strace -e connect ssh .... with IP address and hostname and check > for differences. Here's a sample session: [fex@yerbouti ~]$ host bootsy bootsy.fritz.box has address 192.168.178.103 bootsy.fritz.box has IPv6 address fd00::d27e:35ff:feb5:e6c2 [fex@yerbouti ~]$ strace -e connect ssh bootsy connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory) connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory) connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory) connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory) connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory) connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory) connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.178.1")}, 16) = 0 connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("192.168.178.103")}, 16) = 0 connect(3, {sa_family=AF_INET6, sin6_port=htons(22), inet_pton(AF_INET6, "fd00::d27e:35ff:feb5:e6c2", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0 connect(3, {sa_family=AF_INET6, sin6_port=htons(22), inet_pton(AF_INET6, "fd00::d27e:35ff:feb5:e6c2", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0 ssh_exchange_identification: read: Connection reset by peer +++ exited with 255 +++ [fex@yerbouti ~]$ strace -e connect ssh 192.168.178.103 connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory) connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory) connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory) connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory) connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("192.168.178.103")}, 16) = 0 connect(4, {sa_family=AF_LOCAL, sun_path="/tmp/ssh-Ko149SqCI4cV/agent.5175"}, 110) = 0 Last login: Wed Mar 16 14:37:11 2016 from 192.168.178.71 [fex@bootsy ~]$ Does this help?
(In reply to Felix Homann from comment #32) > [fex@yerbouti ~]$ host bootsy > bootsy.fritz.box has address 192.168.178.103 > bootsy.fritz.box has IPv6 address fd00::d27e:35ff:feb5:e6c2 > [fex@yerbouti ~]$ strace -e connect ssh bootsy > connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 > ENOENT (No such file or directory) > connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 > ENOENT (No such file or directory) > connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 > ENOENT (No such file or directory) > connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 > ENOENT (No such file or directory) > connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 > ENOENT (No such file or directory) > connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 > ENOENT (No such file or directory) > connect(3, {sa_family=AF_INET, sin_port=htons(53), > sin_addr=inet_addr("192.168.178.1")}, 16) = 0 > connect(3, {sa_family=AF_INET, sin_port=htons(22), > sin_addr=inet_addr("192.168.178.103")}, 16) = 0 > connect(3, {sa_family=AF_INET6, sin6_port=htons(22), inet_pton(AF_INET6, > "fd00::d27e:35ff:feb5:e6c2", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, > 28) = 0 > connect(3, {sa_family=AF_INET6, sin6_port=htons(22), inet_pton(AF_INET6, > "fd00::d27e:35ff:feb5:e6c2", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, > 28) = 0 > ssh_exchange_identification: read: Connection reset by peer > +++ exited with 255 +++ > [fex@yerbouti ~]$ strace -e connect ssh 192.168.178.103 > connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 > ENOENT (No such file or directory) > connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 > ENOENT (No such file or directory) > connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 > ENOENT (No such file or directory) > connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 > ENOENT (No such file or directory) > connect(3, {sa_family=AF_INET, sin_port=htons(22), > sin_addr=inet_addr("192.168.178.103")}, 16) = 0 > connect(4, {sa_family=AF_LOCAL, > sun_path="/tmp/ssh-Ko149SqCI4cV/agent.5175"}, 110) = 0 > Last login: Wed Mar 16 14:37:11 2016 from 192.168.178.71 > [fex@bootsy ~]$ > > > Does this help? Okay, this basically shows that IPv6 connectivity is broken but IPv4 isn't. The first connects use an IPv6 address, your second test an IPv4 address. Is basic IPv6 connectivity given between both hosts? E.g. with ping6? ping6 fd00::d27e:35ff:feb5:e6c2 Thanks, Hannes
(In reply to Hannes Frederic Sowa from comment #33) > > Is basic IPv6 connectivity given between both hosts? E.g. with ping6? > > ping6 fd00::d27e:35ff:feb5:e6c2 There's no problem with ping6. Kind regards, Felix
Hello Felix, sorry for the delay. can you please update your kernel. I think the problem had to do with redirects and this should be fixed in 4.4.6. Thanks, Hannes
(In reply to Hannes Frederic Sowa from comment #35) > > can you please update your kernel. I think the problem had to do with > redirects and this should be fixed in 4.4.6. Yes, seems to be fixed! I could only check on one machine, though, so maybe wait another day before closing this issue. Kind regards, Felix
Thanks for checking. Let's wait until after the weekend. ;) For reference: commit 9cf7490360bf2c46a16b7525f899e4970c5fc144 Author: Eric Dumazet <edumazet> Date: Tue Feb 2 19:31:12 2016 -0800 tcp: do not drop syn_recv on all icmp reports
OK, it's fixed on all of my machines with 4.4.6 kernel.
Thanks for your feedback!
Is there a known workaround for a chicken-and-egg situation, i.e., a distant machine that can't be upgraded without SSH access?
Sorry, no, not as I know of.