Bug 2140380

Summary: ssh times out connecting to github.com, due to QoS
Product: [Fedora] Fedora
Reporter: Yedidyah Bar David <didi>
Component: openssh
Assignee: Dmitry Belyavskiy <dbelyavs>
Status: CLOSED EOL
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 36
CC: crypto-team, dbelyavs, dwalsh, jjelen, lkundrak, mattias.ellert, tm
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-05-25 15:51:38 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Yedidyah Bar David 2022-11-06 08:33:28 UTC
Description of problem:

$ ssh -v github.com
OpenSSH_8.8p1, OpenSSL 3.0.5 5 Jul 2022
debug1: Reading configuration data /home/bardavid/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Reading configuration data /etc/ssh/ssh_config.d/50-redhat.conf
debug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config
debug1: configuration requests final Match pass
debug1: re-parsing configuration
debug1: Reading configuration data /home/bardavid/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Reading configuration data /etc/ssh/ssh_config.d/50-redhat.conf
debug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config
debug1: Connecting to github.com [140.82.121.4] port 22.
debug1: connect to address 140.82.121.4 port 22: Connection timed out
ssh: connect to host github.com port 22: Connection timed out

Version-Release number of selected component (if applicable):
openssh-8.8p1-1.fc36.1.x86_64

How reproducible:
Not sure, currently always - tried from two machines

Steps to Reproduce:
1. ssh -v github.com

Actual results:
Times out

Expected results:
Connects

Additional info:

I ran into this while adding an ssh remote to an existing git repo and then failing to use it. I searched the net and went through a ton of suggested solutions that did not work, including trying port 443. Then I tried 'telnet github.com 22', and it did connect. This made me suspect that the problem is actually with the ssh binary itself. Then I tried the following, which does work for me:

Workaround:

$ cat ~/.ssh/config 
Host github.com
        Hostname ssh.github.com
        ProxyCommand nc %h %p

Meaning, if the actual connection is done using nc, and not the ssh binary itself, it succeeds.

I didn't get new entries in audit.log.

strace didn't help me much - I didn't go through all of its output, but it did try to connect to the correct IP address (well, the same one that nc/telnet do manage to connect to - it has a short TTL and changes constantly):

15357 10:23:10.668414 connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("140.82.121.4")}, 16) = -1 ETIMEDOUT (Connection timed out)

ssh to other machines (tried only ones on the local net) does work.

I did find one similar report [1] - where ssh fails and telnet succeeds - but he later wrote that it was due to invalid routing by his ISP (and I personally can't understand how invalid routing can affect only ssh and not telnet).

I should add, though, that I switched ISPs some months ago, and that this (git remote to github/ssh) did work previously on the same machine. So in theory it might be related (but I never ran into any other similar issue with the new ISP). Perhaps this is due to some low-level difference between how ssh and nc/telnet behave? I didn't try to sniff and compare (yet).

[1] https://stackoverflow.com/questions/71190590/ssh-connect-to-host-github-com-port-22-and-also-443-connection-timed-out

Comment 1 Yedidyah Bar David 2022-11-06 08:38:28 UTC
To clarify: The workaround also works without pointing at ssh.github.com. This works:

Host github.com
        ProxyCommand nc %h %p

Sorry for the noise.

Comment 2 Yedidyah Bar David 2022-11-06 09:40:56 UTC
Also: A RHEL 8 machine with openssh-8.0p1-13.el8.x86_64, on the same LAN, does work.

Comment 3 Dmitry Belyavskiy 2022-11-08 12:23:31 UTC
Could you please check that it wasn't a temporary issue? If not, could you please attach a more verbose log?

Comment 4 Yedidyah Bar David 2022-11-08 15:30:41 UTC
(In reply to Dmitry Belyavskiy from comment #3)
> Could you please check that it wasn't a temporary issue?

Sadly, it wasn't.

> If not, could you
> please attach a more verbose log?

$ ssh -vvv github.com
OpenSSH_8.8p1, OpenSSL 3.0.5 5 Jul 2022
debug1: Reading configuration data /home/bardavid/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug3: /etc/ssh/ssh_config line 55: Including file /etc/ssh/ssh_config.d/50-redhat.conf depth 0
debug1: Reading configuration data /etc/ssh/ssh_config.d/50-redhat.conf
debug2: checking match for 'final all' host github.com originally github.com
debug3: /etc/ssh/ssh_config.d/50-redhat.conf line 3: not matched 'final'
debug2: match not found
debug3: /etc/ssh/ssh_config.d/50-redhat.conf line 5: Including file /etc/crypto-policies/back-ends/openssh.config depth 1 (parse only)
debug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config
debug3: gss kex names ok: [gss-curve25519-sha256-,gss-nistp256-sha256-,gss-group14-sha256-,gss-group16-sha512-]
debug3: kex names ok: [curve25519-sha256,curve25519-sha256,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512]
debug1: configuration requests final Match pass
debug1: re-parsing configuration
debug1: Reading configuration data /home/bardavid/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug3: /etc/ssh/ssh_config line 55: Including file /etc/ssh/ssh_config.d/50-redhat.conf depth 0
debug1: Reading configuration data /etc/ssh/ssh_config.d/50-redhat.conf
debug2: checking match for 'final all' host github.com originally github.com
debug3: /etc/ssh/ssh_config.d/50-redhat.conf line 3: matched 'final'
debug2: match found
debug3: /etc/ssh/ssh_config.d/50-redhat.conf line 5: Including file /etc/crypto-policies/back-ends/openssh.config depth 1
debug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config
debug3: gss kex names ok: [gss-curve25519-sha256-,gss-nistp256-sha256-,gss-group14-sha256-,gss-group16-sha512-]
debug3: kex names ok: [curve25519-sha256,curve25519-sha256,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512]
debug3: expanded UserKnownHostsFile '~/.ssh/known_hosts' -> '/home/bardavid/.ssh/known_hosts'
debug3: expanded UserKnownHostsFile '~/.ssh/known_hosts2' -> '/home/bardavid/.ssh/known_hosts2'
debug2: resolving "github.com" port 22
debug3: resolve_host: lookup github.com:22
debug3: ssh_connect_direct: entering
debug1: Connecting to github.com [140.82.121.4] port 22.
debug3: set_sock_tos: set socket 3 IP_TOS 0x48

Comment 5 Jakub Jelen 2022-11-08 16:26:59 UTC
Did you try changing IPQoS? There might be some not-so-clever router on the way that rejects some of the packets. A similar issue is described, for example, here:

https://github.com/tangowhisky37/LetsHack/blob/master/howto/SSH-Hangs-On-Launch.md

Comment 6 Yedidyah Bar David 2022-11-09 06:47:59 UTC
(In reply to Jakub Jelen from comment #5)
> Did you try changing IPQoS? There might be some not-so-clever router on
> the way that rejects some of the packets. A similar issue is described,
> for example, here:
> 
> https://github.com/tangowhisky37/LetsHack/blob/master/howto/SSH-Hangs-On-Launch.md

Bingo :-). Thanks!

Keeping the bug open, because I think we might want to consider changing the default, until the Internet-at-Large works better with custom QoS.

Comment 7 Yedidyah Bar David 2022-11-09 06:50:11 UTC
I now tried this from a CentOS Stream 9 machine inside the Red Hat network, and it's the same, so not only my own or my ISP's routers - plain ssh to github.com gets stuck, with '-o IPQoS=0x00' it connects. I can provide access details to this machine in private, if needed.

Comment 8 Yedidyah Bar David 2022-11-09 06:55:21 UTC
Sorry, please ignore comment 7. It does connect, only slightly slowly, in some attempts. I might try later to create a CS9 machine at home, but I suspect it will be the same - at least the manpages of fc36 and cs9 say the defaults are the same.

Comment 9 Jakub Jelen 2022-11-09 08:30:18 UTC
I am not sure what the defaults are now or what they were in previous versions, but I remember there were some changes and some people were hitting this previously. I do not think we will want to revert, as it really works for the majority of well-configured networks, but I will leave this up to Dima to decide as he is in charge of OpenSSH now.

Comment 10 Yedidyah Bar David 2022-11-09 10:16:13 UTC
(In reply to Jakub Jelen from comment #9)
> I am not sure what the defaults are now

According to strace, in both fc36 and cs9, the default is 0x48 (72 decimal, as strace shows it).

IIUC that's IPTOS_DSCP_AF21, indeed corresponding to what the man page says ("The default is af21 (Low-Latency Data) for interactive sessions").
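
As a rough cross-check of the 0x48 reading, here is a minimal Python sketch that splits a TOS byte into its DSCP and ECN parts and names the codepoint. The function name decode_tos is made up for illustration and is not part of OpenSSH; the bit layout assumed is the standard one (DSCP in the upper six bits, ECN in the lower two, AFxy encoded as 8x + 2y).

```python
def decode_tos(tos):
    """Split an IP TOS byte into (dscp, ecn, name). Illustrative helper only."""
    dscp = tos >> 2            # upper six bits: DSCP
    ecn = tos & 0x03           # lower two bits: ECN
    af_class = dscp >> 3       # AFxy: x is the top three DSCP bits
    drop = (dscp >> 1) & 0x03  # AFxy: y is the next two bits

    if dscp == 0:
        name = "default (best effort)"
    elif dscp == 46:
        name = "ef"
    elif dscp % 8 == 0:
        name = f"cs{dscp >> 3}"
    elif dscp % 2 == 0 and 1 <= af_class <= 4 and 1 <= drop <= 3:
        name = f"af{af_class}{drop}"
    else:
        name = f"dscp {dscp}"
    return dscp, ecn, name


print(decode_tos(0x48))  # (18, 0, 'af21')
```

So 0x48 is DSCP 18 with no ECN bits set, which matches the af21 default from the man page.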

> or what they were in previous
> versions,

According to strace, on RHEL8, setsockopt was not called on the socket with SOL_IP, thus not setting the TOS field at all - so no default.
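
For reference, the socket option in question can be reproduced outside ssh. The following is a small Python sketch, not OpenSSH code, that sets the TOS byte on a fresh IPv4 TCP socket and reads it back; the helper name make_socket_with_tos is hypothetical. Under strace, the setsockopt call it makes should look similar to the one ssh makes on fc36 and cs9.

```python
import socket


def make_socket_with_tos(tos):
    """Create an unconnected IPv4 TCP socket with the given TOS byte set."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # IPPROTO_IP has the same numeric value as SOL_IP on Linux.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
    return sock


sock = make_socket_with_tos(0x48)
# Read the option back to confirm what the kernel stored for this socket.
applied_tos = sock.getsockopt(socket.IPPROTO_IP, socket.IP_TOS)
sock.close()
print(hex(applied_tos))
```

No connection is attempted here, so this only shows how the option is set, not how routers on the path treat the marked packets.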

That said, the config item IPQoS does appear in the man pages, and gets parsed - from both the config and the command-line - if I pass a bad value I do get an error, e.g.:

/etc/ssh/ssh_config.d/99-test.conf line 1: Bad IPQoS value: xyz

Checking further, I found:

https://github.com/openssh/openssh-portable/commit/33313ebc1c7135085676db62189e3520341d6b73

Apparently, the code actually setting TOS was not called before that (search for IP_TOS_IS_BROKEN). This patch is included only as of openssh 8.5, so not in RHEL 8 (which has 8.0). 8.5 was released upstream, it seems, only in March 2021, not that long ago.

> but I remember there were some changes and some people were
> hitting this previously. I do not think we will want to revert as it really
> works for the majority of well-configured networks,

The question is whether we want the risk of stumbling on this bug on a broken network, if the main advantage is potentially slightly better interactivity on congested networks (no idea if anyone actually checked this on the live Internet over enough different links/conditions to get meaningful statistics).

> but I will leave this up
> to Dima to decide as he is in charge of OpenSSH now.

OK. Thanks for commenting!

Comment 11 Yedidyah Bar David 2022-11-09 10:46:42 UTC
(In reply to Yedidyah Bar David from comment #10)
> https://github.com/openssh/openssh-portable/commit/33313ebc1c7135085676db62189e3520341d6b73
> 
> Apparently, the code actually setting TOS was not called before that (search
> for IP_TOS_IS_BROKEN).

Most likely I got it wrong - this logic still exists in current master, in misc.c:set_sock_tos. Still can't figure out why I can't make ssh on RHEL8 set the socket option.

Comment 12 Dmitry Belyavskiy 2023-03-16 15:49:27 UTC
Is this issue still relevant or can it be closed?

Comment 13 Ben Cotton 2023-04-25 18:09:51 UTC
This message is a reminder that Fedora Linux 36 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 36 on 2023-05-16.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '36'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version. Note that the version field may be hidden.
Click the "Show advanced fields" button if you do not see it.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 36 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 14 Yedidyah Bar David 2023-04-27 06:40:18 UTC
I didn't yet try to upgrade to 38, but the bug still happens on 37.

The fact that there are no reports here from other people means it's likely a rather local issue with my ISP, or something like that.

It also probably means that few people use ssh directly over the Internet these days, as opposed to over a VPN, with github being a significant exception.

I spent a few minutes looking for information about problems with QoS over the Internet and failed to find a comprehensive report.

So if you prefer to close this bug, I'll understand.

I am also changing its subject for clarity.

Thanks!

Comment 15 Yedidyah Bar David 2023-04-27 06:44:44 UTC
That said, I suppose a better solution, even if quite a lot more complex, would be to patch ssh to retry with IPQoS reset to 0 if it times out trying to connect with a non-zero value, and to log this at a higher logging level, so it's more visible.
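
To make the idea concrete, here is a minimal sketch of such a fallback in Python rather than as an actual ssh patch. All names (open_with_tos, connect_with_fallback, the opener parameter) are hypothetical, it handles IPv4 only, and it is not how OpenSSH is structured; it only illustrates "try with the configured TOS, and on a connect timeout retry once with TOS 0 and log a warning".

```python
import logging
import socket

log = logging.getLogger("tos_fallback")


def open_with_tos(host, port, tos, timeout):
    """Open an IPv4 TCP connection, setting IP_TOS first if tos is non-zero."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.settimeout(timeout)
        if tos:
            sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
        sock.connect((host, port))
    except OSError:
        sock.close()
        raise
    return sock


def connect_with_fallback(host, port, tos=0x48, timeout=10.0, opener=open_with_tos):
    """Try to connect with the given TOS; on timeout, retry once with TOS 0."""
    try:
        return opener(host, port, tos, timeout)
    except (socket.timeout, TimeoutError):
        if tos == 0:
            raise  # nothing left to fall back to
        # Log at warning level so the fallback is visible without -v.
        log.warning(
            "connect to %s:%d timed out with IP_TOS 0x%02x, retrying with 0x00",
            host, port, tos,
        )
        return opener(host, port, 0, timeout)
```

Something like connect_with_fallback("github.com", 22) would then behave like plain ssh on a working network and like '-o IPQoS=0x00' on a broken one, at the cost of waiting for one full timeout first.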

Comment 16 Ludek Smid 2023-05-25 15:51:38 UTC
Fedora Linux 36 entered end-of-life (EOL) status on 2023-05-16.

Fedora Linux 36 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux
please feel free to reopen this bug against that version. Note that the version
field may be hidden. Click the "Show advanced fields" button if you do not see
the version field.

If you are unable to reopen this bug, please file a new report against an
active release.

Thank you for reporting this bug and we are sorry it could not be fixed.