Bug 2140380
| Summary: | ssh times out connecting to github.com, due to QoS | ||
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Yedidyah Bar David <didi> |
| Component: | openssh | Assignee: | Dmitry Belyavskiy <dbelyavs> |
| Status: | CLOSED EOL | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 36 | CC: | crypto-team, dbelyavs, dwalsh, jjelen, lkundrak, mattias.ellert, tm |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2023-05-25 15:51:38 UTC | Type: | Bug |
To clarify: The workaround does work also without pointing at ssh.github.com. This works:
Host github.com
ProxyCommand nc %h %p
Sorry for the noise.
Also: A RHEL 8 machine with openssh-8.0p1-13.el8.x86_64, on the same LAN, does work.

Could you please check that it wasn't a temporary issue? If not, could you please attach a more verbose log?

(In reply to Dmitry Belyavskiy from comment #3)
> Could you please check that it wasn't a temporary issue?

Sadly, it wasn't.

> If not, could you
> please attach a more verbose log?

$ ssh -vvv github.com
OpenSSH_8.8p1, OpenSSL 3.0.5 5 Jul 2022
debug1: Reading configuration data /home/bardavid/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug3: /etc/ssh/ssh_config line 55: Including file /etc/ssh/ssh_config.d/50-redhat.conf depth 0
debug1: Reading configuration data /etc/ssh/ssh_config.d/50-redhat.conf
debug2: checking match for 'final all' host github.com originally github.com
debug3: /etc/ssh/ssh_config.d/50-redhat.conf line 3: not matched 'final'
debug2: match not found
debug3: /etc/ssh/ssh_config.d/50-redhat.conf line 5: Including file /etc/crypto-policies/back-ends/openssh.config depth 1 (parse only)
debug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config
debug3: gss kex names ok: [gss-curve25519-sha256-,gss-nistp256-sha256-,gss-group14-sha256-,gss-group16-sha512-]
debug3: kex names ok: [curve25519-sha256,curve25519-sha256,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512]
debug1: configuration requests final Match pass
debug1: re-parsing configuration
debug1: Reading configuration data /home/bardavid/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug3: /etc/ssh/ssh_config line 55: Including file /etc/ssh/ssh_config.d/50-redhat.conf depth 0
debug1: Reading configuration data /etc/ssh/ssh_config.d/50-redhat.conf
debug2: checking match for 'final all' host github.com originally github.com
debug3: /etc/ssh/ssh_config.d/50-redhat.conf line 3: matched 'final'
debug2: match found
debug3: /etc/ssh/ssh_config.d/50-redhat.conf line 5: Including file /etc/crypto-policies/back-ends/openssh.config depth 1
debug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config
debug3: gss kex names ok: [gss-curve25519-sha256-,gss-nistp256-sha256-,gss-group14-sha256-,gss-group16-sha512-]
debug3: kex names ok: [curve25519-sha256,curve25519-sha256,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512]
debug3: expanded UserKnownHostsFile '~/.ssh/known_hosts' -> '/home/bardavid/.ssh/known_hosts'
debug3: expanded UserKnownHostsFile '~/.ssh/known_hosts2' -> '/home/bardavid/.ssh/known_hosts2'
debug2: resolving "github.com" port 22
debug3: resolve_host: lookup github.com:22
debug3: ssh_connect_direct: entering
debug1: Connecting to github.com [140.82.121.4] port 22.
debug3: set_sock_tos: set socket 3 IP_TOS 0x48

Did you try to change the IPQoS? If there is some not-much-clever router on the way, which might be rejecting some of the packets. Similar issue is described for example here:

https://github.com/tangowhisky37/LetsHack/blob/master/howto/SSH-Hangs-On-Launch.md

(In reply to Jakub Jelen from comment #5)
> Did you try to change the IPQoS? If there is some not-much-clever router on
> the way, which might be rejecting some of the packets. Similar issue is
> described for example here:
>
> https://github.com/tangowhisky37/LetsHack/blob/master/howto/SSH-Hangs-On-Launch.md

Bingo :-). Thanks!

Keeping the bug open, because I think we might want to consider changing the default, until the Internet-at-Large works better with custom QoS.

I now tried this from a CentOS Stream 9 machine inside the Red Hat network, and it's the same, so not only my own or my ISP's routers - plain ssh to github.com gets stuck, with '-o IPQoS=0x00' it connects. I can provide access details to this machine in private, if needed.

Sorry, please ignore comment 7. It does connect, only slightly slowly, in some attempts. I might try later to create a CS9 machine at home, but I suspect it will be the same - at least the manpages of fc36 and cs9 say the defaults are the same.

I am not sure what are defaults now and what defaults were in the previous versions, but I remember there were some changes and some people were hitting this previously. I do not think we will want to revert as it really works for the majority of well-configured networks, but I will leave this up to Dima to decide as he is in charge of OpenSSH now.

(In reply to Jakub Jelen from comment #9)
> I am not sure what are defaults now

According to strace, in both fc36 and cs9, the default is 0x48 (72 decimal, as strace shows it). IIUC that's IPTOS_DSCP_AF21, indeed corresponding to what the man page says ("The default is af21 (Low-Latency Data) for interactive sessions").

> and what defaults were in the previous
> versions,

According to strace, on RHEL8, setsockopt was not called on the socket with SOL_IP, thus not setting the TOS field at all - so no default. That said, the config item IPQoS does appear in the man pages, and gets parsed - from both the config and the command-line - if I pass a bad value I do get an error, e.g.:

/etc/ssh/ssh_config.d/99-test.conf line 1: Bad IPQoS value: xyz

Checking further, I found:

https://github.com/openssh/openssh-portable/commit/33313ebc1c7135085676db62189e3520341d6b73

Apparently, the code actually setting TOS was not called before that (search for IP_TOS_IS_BROKEN). This patch is only included in openssh 8.5, so not in RHEL 8 (which has 8.0). 8.5 was released (there), it seems, only on March 2021, not that long ago.

> but I remember there were some changes and some people were
> hitting this previously. I do not think we will want to revert as it really
> works for the majority of well-configured networks,

The question is whether we want the risk of stumbling on this bug on a broken network, if the main advantage is potentially slightly better interactivity on congested networks (no idea if anyone actually checked this on the live Internet over enough different links/conditions to get meaningful statistics).

> but I will leave this up
> to Dima to decide as he is in charge of OpenSSH now.

OK. Thanks for commenting!

(In reply to Yedidyah Bar David from comment #10)
> https://github.com/openssh/openssh-portable/commit/33313ebc1c7135085676db62189e3520341d6b73
>
> Apparently, the code actually setting TOS was not called before that (search
> for IP_TOS_IS_BROKEN).

Most likely I got it wrong - this logic still exists in current master, in misc.c:set_sock_tos. Still can't figure out why I can't make ssh on RHEL8 set the socket option.

Is this issue still relevant or can it be closed?

This message is a reminder that Fedora Linux 36 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora Linux 36 on 2023-05-16. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a 'version' of '36'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, change the 'version' to a later Fedora Linux version. Note that the version field may be hidden. Click the "Show advanced fields" button if you do not see it.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora Linux 36 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora Linux, you are encouraged to change the 'version' to a later version prior to this bug being closed.
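
For readers unfamiliar with the numbers quoted in this thread, the relation between "af21", DSCP and the TOS byte value 0x48 seen in strace can be checked with a little arithmetic. The sketch below is only illustrative; the function names `af_to_dscp` and `dscp_to_tos` are made up for this example and are not part of OpenSSH. It assumes the standard Assured Forwarding encoding (DSCP = 8 * class + 2 * drop precedence) and that the DSCP occupies the upper six bits of the TOS byte.

```python
def af_to_dscp(af_class: int, drop_precedence: int) -> int:
    """Return the DSCP code point for Assured Forwarding class AFxy.

    af_class is x (1..4), drop_precedence is y (1..3).
    """
    if not (1 <= af_class <= 4 and 1 <= drop_precedence <= 3):
        raise ValueError("AF class must be 1..4 and drop precedence 1..3")
    return 8 * af_class + 2 * drop_precedence


def dscp_to_tos(dscp: int) -> int:
    """Shift a 6-bit DSCP into the upper bits of the 8-bit TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits")
    return dscp << 2


af21_dscp = af_to_dscp(2, 1)
af21_tos = dscp_to_tos(af21_dscp)
# prints "AF21: DSCP 18, TOS byte 72 (0x48)"
print(f"AF21: DSCP {af21_dscp}, TOS byte {af21_tos} (0x{af21_tos:02x})")
```

This matches the value reported above: 72 decimal in strace, 0x48 in the ssh debug output.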

I didn't yet try to upgrade to 38, but the bug still happens on 37.

The fact that there are no reports here from other people means it's likely a rather local issue with my ISP, or something like that. Also, that probably few people use ssh directly over the Internet these days - as opposed to over a VPN - with github being a significant exception.

I tried for a few minutes finding information about problems with QoS over the Internet and failed to find a comprehensive report.

So if you prefer to close this bug, I'll understand. I am also changing its subject for clarity. Thanks!

That said, I suppose a better solution, even if quite a lot more complex, would be to patch ssh to try resetting IPQoS to 0, if it times out trying to connect with != 0. And also make it log this at a higher logging level, so it's more visible.

Fedora Linux 36 entered end-of-life (EOL) status on 2023-05-16.

Fedora Linux 36 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux please feel free to reopen this bug against that version. Note that the version field may be hidden. Click the "Show advanced fields" button if you do not see the version field.

If you are unable to reopen this bug, please file a new report against an active release.

Thank you for reporting this bug and we are sorry it could not be fixed.
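
The fallback idea suggested above (retry with IPQoS 0 if the connection times out with a non-zero value, and log it visibly) might look roughly like the following. This is a Python sketch of the logic only, not OpenSSH code and not a proposed patch; the names `connect_with_tos_fallback` and `_connect_once` and the logger name are hypothetical, and it assumes an IPv4 TCP connection.

```python
import logging
import socket

log = logging.getLogger("qos_fallback")  # hypothetical logger name


def _connect_once(address, tos, timeout):
    """Open an IPv4 TCP connection with the given IP_TOS value set."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
        sock.settimeout(timeout)
        sock.connect(address)
    except OSError:
        sock.close()
        raise
    return sock


def connect_with_tos_fallback(address, tos, timeout=10.0, connect=_connect_once):
    """Try to connect with `tos`; on timeout, retry once with TOS 0.

    Returns (socket, tos_actually_used). `connect` is injectable so the
    fallback logic can be exercised without a real network.
    """
    try:
        return connect(address, tos, timeout), tos
    except (socket.timeout, TimeoutError):
        if tos == 0:
            raise  # nothing left to fall back to
        # Logged as a warning so the fallback is visible by default.
        log.warning(
            "connect to %s timed out with IP_TOS 0x%02x; retrying with 0x00",
            address, tos,
        )
        return connect(address, 0, timeout), 0
```

A caller would use it as `sock, used = connect_with_tos_fallback(("github.com", 22), 0x48)`. Note that the retry doubles the worst-case wait, which is part of why this is more complex than changing a default.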

Description of problem:

$ ssh -v github.com
OpenSSH_8.8p1, OpenSSL 3.0.5 5 Jul 2022
debug1: Reading configuration data /home/bardavid/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Reading configuration data /etc/ssh/ssh_config.d/50-redhat.conf
debug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config
debug1: configuration requests final Match pass
debug1: re-parsing configuration
debug1: Reading configuration data /home/bardavid/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Reading configuration data /etc/ssh/ssh_config.d/50-redhat.conf
debug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config
debug1: Connecting to github.com [140.82.121.4] port 22.
debug1: connect to address 140.82.121.4 port 22: Connection timed out
ssh: connect to host github.com port 22: Connection timed out

Version-Release number of selected component (if applicable):
openssh-8.8p1-1.fc36.1.x86_64

How reproducible:
Not sure, currently always - tried from two machines

Steps to Reproduce:
1. ssh -v github.com

Actual results:
Times out

Expected results:
Connects

Additional info:

I ran into this while trying to add an ssh remote to an existing git repo and failed using it. Searched the net through a ton of suggested solutions that did not work, including trying port 443. Then tried 'telnet github.com 22' and it did connect. This made me suspect that the problem is actually with the ssh binary itself. Then tried the following that does work for me:

Workaround:

$ cat ~/.ssh/config
Host github.com
Hostname ssh.github.com
ProxyCommand nc %h %p

Meaning, if the actual connection is done using nc, and not the ssh binary itself, it succeeds.

I didn't get new entries in audit.log.

strace didn't help me much - I didn't go through all of its output, but it did try to connect to the correct IP address (well, the same one that nc/telnet do manage to connect to - it has a short TTL and changes constantly):

15357 10:23:10.668414 connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("140.82.121.4")}, 16) = -1 ETIMEDOUT (Connection timed out)

ssh to other machines (tried only ones on the local net) does work.

I did find one similar report [1] - where ssh fails and telnet succeeds - but he later wrote that it was due to invalid routing by his ISP (and I personally can't understand how invalid routing can affect only ssh and not telnet).

I should add, though, that I switched ISPs some months ago, and that this (git remote to github/ssh) did work previously on the same machine. So in theory might be related (but I never ran into any other similar issue with the new ISP).

Perhaps this is due to some low-level difference between how ssh and nc/telnet behave? I didn't try to sniff and compare (yet).

[1] https://stackoverflow.com/questions/71190590/ssh-connect-to-host-github-com-port-22-and-also-443-connection-timed-out
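
One way to test whether the difference between ssh and nc/telnet is the TOS byte, without sniffing, is to open plain TCP connections that differ only in their IP_TOS socket option. The following is a minimal sketch under that assumption; `probe_connect` and `compare_tos` are hypothetical helper names, the script only handles IPv4, and a timeout by itself does not prove that a router is dropping marked packets.

```python
import socket


def probe_connect(host, port, tos, timeout=10.0):
    """Return True if an IPv4 TCP connect with the given IP_TOS succeeds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Mark outgoing packets of this socket, as ssh does with IPQoS.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
        sock.settimeout(timeout)
        sock.connect((host, port))
        return True
    except OSError:
        # Covers timeouts, refused connections and name resolution errors.
        return False
    finally:
        sock.close()


def compare_tos(host, port, tos_values=(0x00, 0x48), timeout=10.0):
    """Probe the same endpoint once per TOS value; map TOS -> success."""
    return {tos: probe_connect(host, port, tos, timeout) for tos in tos_values}
```

Calling `compare_tos("github.com", 22)` on an affected machine would be expected to succeed for 0x00 (what nc and telnet send by default) and fail for 0x48 if the TOS marking is the cause.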