Bug 1624437 - 7.8p1-1.fc28 breaks ssh connections to other systems when client is running on VMware Player/Workstation
Summary: 7.8p1-1.fc28 breaks ssh connections to other systems when client is running o...
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: openssh
Version: 28
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Jakub Jelen
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 1651763
 
Reported: 2018-08-31 14:53 UTC by Florian Bezdeka
Modified: 2019-05-28 23:47 UTC (History)
CC List: 17 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1651763 (view as bug list)
Environment:
Last Closed: 2019-05-28 23:47:19 UTC
Type: Bug
Embargoed:
yanjin: needinfo-


Attachments
tcpdump with connection reset (deleted)
2018-08-31 14:53 UTC, Florian Bezdeka
no flags
tcpdump of old version (working) (14.36 KB, application/octet-stream)
2018-08-31 14:54 UTC, Florian Bezdeka
no flags
tcpdump of new version (bug) (9.20 KB, application/octet-stream)
2018-08-31 20:00 UTC, Florian Bezdeka
no flags

Description Florian Bezdeka 2018-08-31 14:53:01 UTC
Description of problem:

I updated from openssh-clients-7.7p1 to 7.8p1-1.fc28 yesterday.
Since then I am no longer able to open ssh connections to some other systems.
In particular, connections to Debian and Fedora systems fail.
Connections to CentOS systems still work.

I use public key authentication.

Version-Release number of selected component (if applicable):
Working: openssh-clients-7.7p1
Broken: 7.8p1-1.fc28

How reproducible:
console: ssh <debian-system> => broken
console: ssh <centos-system> => OK


Actual results:
packet_write_wait: Connection to 192.168.58.10 port 22: Broken pipe

Expected results:
SSH session.

Additional info:
The TCP session is reset by the destination system.
I attached two tcpdumps:
bug.trace: A tcpdump of the new package
working.trace: A tcpdump of the old package

Both dumps were created using "tcpdump -i any -w /output/path.trace"

Comment 1 Florian Bezdeka 2018-08-31 14:54:03 UTC
Created attachment 1480102 [details]
tcpdump of old version (working)

Comment 2 Florian Bezdeka 2018-08-31 14:55:25 UTC
Workaround:

sudo yum downgrade openssh-clients-7.7p1

This will downgrade openssh packages to the old version.

Comment 3 Florian Bezdeka 2018-08-31 20:00:05 UTC
Created attachment 1480157 [details]
tcpdump of new version (bug)

Comment 4 Ron Lovell 2018-09-01 00:26:27 UTC
I get the same symptom on Rawhide, but for different conditions.

I'm running my Linux nodes as VMware Workstation Player VMs hosted on Win7. On Win7, I have the Cygwin64 sshd listening on Win's interface to the restricted (no forwarding) VMNET1 virtual LAN. Normally all hosts can talk SSH to each other.

From my Rawhide host I get immediate broken pipe connecting to the Cygwin64 OpenSSH 7.8p1 server running on Win7. I have no problem connecting to Debian Sid (Unstable) nor to openSUSE Tumbleweed, nor through loopback to the same Rawhide host.

Using verbose options to ssh and comparing to working clients, it appears I'm getting past authentication and it should be about at the point where a PTY would be set up on the server end. We're not getting a type 99 packet that I see on working clients.

For what it's worth, I get exactly the same symptom with the Debian Sid OpenSSH 7.8p1 client, which was just upgraded yesterday. In the Debian case, I know ssh client 7.7p1 was working just the day before the upgrade, so it appears to be a 7.8p1 client issue.

On the other hand, my Cygwin64 OpenSSH has also recently upgraded to 7.8p1, and I have no problems connecting from Win7 to any of the Linux VMs.

Same broken pipe using scp(1) and sftp(1) as well as ssh(1) in both login mode and remote command execution mode.

Comment 5 Florian Bezdeka 2018-09-01 06:37:56 UTC
I downgraded the openssh package and it worked again, so I stopped investigating further. But after reading Ron's comment, which mentioned VMware Workstation Player, I ran some tests on a native Fedora installation.

The problem does not occur (with same destination systems) when trying to connect from a native Fedora host.

On the system where I can reproduce the problem, I'm running Win7 as the host system with VMware Workstation. The ssh client system is connected to the destination network using NAT. It looks like both of the setups mentioned here are nearly identical.

Unsure how to proceed. Close this bug report and file a new one with VMware?

Btw: The destination system logs something like "pubkey for user xy accepted". So the problem occurs in the post-authentication phase.

Comment 6 Florian Bezdeka 2018-09-01 07:00:50 UTC
Workaround 2:

Add the following to the ssh client configuration (e.g. ~/.ssh/config)

Host *
    IPQoS lowdelay throughput 

It really does seem to be a bug on the VMware side, specifically in the NAT implementation.

References:
https://groups.google.com/forum/#!msg/opensshunixdev/5FK67SCpPg8/Y_QsxNkQBAAJ
https://groups.google.com/forum/#!msg/opensshunixdev/uNd48nGOe7A/EgZPg2CvDgAJ
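
For anyone who wants to experiment outside of ssh itself: the IPQoS setting ends up as the TOS byte on the client's TCP socket. The following is only a minimal sketch of how a TOS value can be applied to a socket in Python, assuming a platform that exposes the IP_TOS socket option (Linux does). The helper name tcp_socket_with_tos is made up for illustration, and the sketch does not reproduce the ssh handshake or the failure itself.

```python
import socket


def tcp_socket_with_tos(tos):
    """Create an unconnected IPv4 TCP socket with the given TOS byte set.

    Hypothetical helper for experiments; not part of OpenSSH.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # IP_TOS sets the 8-bit TOS / traffic-class field of outgoing packets.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
    return sock


if __name__ == "__main__":
    # 0x10 is the legacy "lowdelay" flag; 0x48 is DSCP AF21 shifted into
    # the upper six bits of the TOS byte.
    for tos in (0x10, 0x48):
        with tcp_socket_with_tos(tos) as sock:
            current = sock.getsockopt(socket.IPPROTO_IP, socket.IP_TOS)
            print("requested", hex(tos), "socket reports", hex(current))
```

Connecting such a socket through the VMware NAT and sending data with each value might help narrow down which TOS values the NAT rejects, but I have not verified that this reproduces the reset.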

Comment 7 Ron Lovell 2018-09-02 00:52:14 UTC
Florian,

Thanks for posting the workaround to ~/.ssh/config. Restoring the pre-7.8p1 defaults does work around the issue on my 7.8p1 clients. And thanks for the references to ongoing discussions, which I read. It does look like a VMware-specific issue triggered by the DSCP defaults change in OpenSSH 7.8p1.

Best regards,
Ron
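
To make the defaults change concrete: OpenSSH 7.8 switched the default IPQoS from the legacy TOS flags "lowdelay" (interactive) and "throughput" (bulk) to the DSCP values AF21 (interactive) and CS1 (bulk). The sketch below only computes the resulting TOS byte for each keyword; the function names are illustrative, and the table covers just the keywords discussed in this report.

```python
# Legacy IPv4 TOS flags that IPQoS accepts as keywords.
LEGACY_TOS = {"lowdelay": 0x10, "throughput": 0x08}

# DSCP code points (6-bit values); only the two relevant to this report.
DSCP = {"af21": 18, "cs1": 8}


def tos_byte(keyword):
    """Return the 8-bit TOS / traffic-class value for an IPQoS keyword."""
    if keyword in LEGACY_TOS:
        return LEGACY_TOS[keyword]
    # DSCP occupies the upper six bits; the lower two bits are ECN.
    return DSCP[keyword] << 2


def describe(interactive, bulk):
    """Format the TOS bytes used for interactive and bulk sessions."""
    return "interactive=0x%02x bulk=0x%02x" % (
        tos_byte(interactive),
        tos_byte(bulk),
    )


if __name__ == "__main__":
    print("7.7p1 and earlier:", describe("lowdelay", "throughput"))
    print("7.8p1 and later:  ", describe("af21", "cs1"))
```

So the workaround makes the client send TOS bytes 0x10 and 0x08 again instead of 0x48 and 0x20.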

Comment 8 Jakub Jelen 2018-09-03 07:44:31 UTC
Sorry about that. This is indeed a change in OpenSSH, but the bug is in VMware, which does not handle the new (valid) IPQoS configuration. I will leave this bug open as a landing page for others who will probably hit the same issue, but I don't think it makes sense to revert this upstream change since it works in the rest of the world.

Comment 9 Ron Lovell 2018-09-03 23:01:04 UTC
Agreed. Presumably the upstream change does some good. It will be good to have an active bug which supplies the workaround for those afflicted. Thanks.

Comment 10 Kurt Bechstein 2018-09-21 19:35:24 UTC
Thank you indeed for leaving this bug in place.  I definitely ran into this issue recently while running F28 as a VM in VMware Workstation 12.  The workaround indeed worked for me as well.  Thank you again!

Comment 11 Tomas 2018-09-24 10:02:24 UTC
Thanks a lot for posting the workaround. I ran into this on F28 running in VMware Workstation 14.

Comment 12 Tomas 2018-09-24 10:31:28 UTC
Posted a bug report about this to the VMware community:
https://communities.vmware.com/message/2803219#2803219

Comment 13 Seesan 2018-11-08 16:02:14 UTC
Disabling IPQoS server side also seems to be a valid workaround. 

Having all the clients who connect to our servers change their local config was not ideal. So server side workaround it is.

1) Add "IPQoS 0x00" to /etc/ssh/sshd_config
2) Restart sshd
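
If you manage many servers, it can help to check whether a given sshd_config already carries an explicit IPQoS line before editing it. The sketch below is a simplified check, not a full sshd_config parser: it assumes the first occurrence of a keyword wins, only looks at the global section (it stops at the first Match line), and ignores Include directives. The function name first_ipqos is made up for illustration.

```python
def first_ipqos(config_text):
    """Return the arguments of the first global IPQoS line, or None."""
    for raw in config_text.splitlines():
        line = raw.strip()
        # Empty lines and lines starting with '#' are comments.
        if not line or line.startswith("#"):
            continue
        # Keyword and arguments are separated by whitespace or one '='.
        parts = line.replace("=", " ", 1).split()
        if not parts:
            continue
        keyword = parts[0].lower()  # keywords are case-insensitive
        if keyword == "match":
            break  # simplification: only inspect the global section
        if keyword == "ipqos":
            return parts[1:]
    return None


if __name__ == "__main__":
    sample = "# example config\nPort 22\nIPQoS 0x00\n"
    print(first_ipqos(sample))
```

To check a real server, pass in the contents of /etc/ssh/sshd_config; a result of None means no explicit global setting was found, so the built-in default applies.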

Comment 14 ldu 2018-11-27 08:41:51 UTC
Hi John,
Could you help create an internal VMware bug to track this?
This is a VMware-side issue.
Thanks

Comment 15 Christopher Meng 2019-03-10 20:31:46 UTC
The issue still persists in Fedora 29 + openssh-7.9p1-4.fc29 + VMware 15.0.2.

Should someone bump the version here?

Comment 16 Ron Lovell 2019-03-11 04:08:45 UTC
I can confirm it's still an issue for Fedora Rawhide, Debian Sid (Unstable), and
openSUSE Tumbleweed (all using OpenSSH 7.1p1), when connecting from a VMware guest
to the VMware host system's interface (in my case the Win7 interface on vmnet1).
That's as far as my Linux guests can see in my config, but I believe several others
have reported a more general issue connecting from VMware guests to nodes outside
the LAN provided by VMware. I'm currently working around it with the client-side
IPQoS workaround. That's not an issue for me because I control all the affected
clients. For other users/owners it could be quite a burden.

Thank you Fedora team for pursuing what appears to be an area where VMware has fallen
behind on support of current DSCP configurations.

Comment 17 Yan Jin 2019-03-15 02:50:49 UTC
I have filed a VMware internal bug (bug#2307798).

Comment 18 Ron Lovell 2019-03-15 10:11:59 UTC
Since this report seems to be useful for those seeking a workaround, it should be mentioned in this report that the referenced discussions indicate that the issue is with NAT configurations of the vmnet subnet, and bridged mode works. See the https://communities.vmware.com/message/2803219#2803219 discussion. I haven't actually tried that myself.

Comment 19 Ron Lovell 2019-03-15 10:49:12 UTC
(In reply to Ron Lovell from comment #16)
> I can confirm it's still an issue for Fedora Rawhide, Debian Sid (Unstable), and
> openSUSE Tumbleweed (all using OpenSSH 7.1p1), when connecting from a VMware guest
> to the VMware host system's interface (in my case the Win7 interface on vmnet1).
> That's as far as my Linux guests can see in my config, but I believe several others
> have reported a more general issue connecting from VMware guests to nodes outside
> the LAN provided by VMware. I'm currently working around it with the client-side
> IPQoS workaround. That's not an issue for me because I control all the affected
> clients. For other users/owners it could be quite a burden.
>
> Thank you Fedora team for pursuing what appears to be an area where VMware has fallen
> behind on support of current DSCP configurations.

Correction to typo: I'm using the OpenSSH 7.9p1 client on all hosts, not 7.1p1.

Comment 20 Ben Cotton 2019-05-02 19:47:08 UTC
This message is a reminder that Fedora 28 is nearing its end of life.
On 2019-May-28 Fedora will stop maintaining and issuing updates for
Fedora 28. It is Fedora's policy to close all bug reports from releases
that are no longer maintained. At that time this bug will be closed as
EOL if it remains open with a Fedora 'version' of '28'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 28 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 21 Ben Cotton 2019-05-28 23:47:19 UTC
Fedora 28 changed to end-of-life (EOL) status on 2019-05-28. Fedora 28 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

