Description of problem:
After upgrading my workstation from F26 to F27 I'm no longer able to download files using scp, rsync, ... due to the SSH connection breaking with the following error:

ssh_dispatch_run_fatal: Connection to <server-ip> port 22: message authentication code incorrect

Version-Release number of selected component (if applicable):
openssh-clients-7.6p1-3.fc27.x86_64

How reproducible:
Always.

Steps to Reproduce:
- scp <remote-server>:<path-to-a-larger-file> ./
or:
- rsync --partial --progress <remote-server>:<path-to-a-larger-file> ./

Actual results:
Download breaks at random points, for example:

[tadej@toronto production-dbs]$ scp <remote-server>:/home/genialis/genialis_base_dump-20180123-043002.gz ./
genialis_base_dump-20180123-043002.gz    0%    0     0.0KB/s   --:-- ETA
ssh_dispatch_run_fatal: Connection to <server-ip> port 22: message authentication code incorrect
lost connection

[tadej@toronto production-dbs]$ scp <remote-server>:/home/genialis/genialis_base_dump-20180123-043002.gz ./
genialis_base_dump-20180123-043002.gz    4% 4736KB   1.3MB/s   01:18 ETA
ssh_dispatch_run_fatal: Connection to <server-ip> port 22: message authentication code incorrect
lost connection

Expected results:
Download would complete normally.

Additional info:
I've tried downloading from a number of different remote servers, all of them running the latest versions of CentOS or RHEL 7.4 with openssh-server-7.4p1-13.el7_4.x86_64. The same errors occurred with all of them.

I've also tried downloading from an old Debian server. There, I get a "Corrupted MAC on input." error before the ssh_dispatch_run_fatal error:

[tadej@toronto production-dbs]$ scp <remote-server>:/home/genialis/genialis_base_dump-20180123-043002.gz ./
genialis_base_dump-20180123-043002.gz    2% 8144KB   8.0MB/s   00:35 ETA
Corrupted MAC on input.
ssh_dispatch_run_fatal: Connection to <server-ip> port 22: message authentication code incorrect
lost connection

If you need more assistance in debugging the issue, I'm happy to help.
I did not notice this with Fedora 27 in everyday use, but then I am not using it to transfer very large amounts of data.

First of all, seeing the debug log (with -vvv arguments to scp, for example) should give us some idea of what is going on. Second, I would check whether the old version of OpenSSH still works (either by downgrading to the older F27 packages or to the F26 version).

This error message looks like something on the network is inspecting the packets and modifying them. Do you see these problems even if you try to transfer files to "localhost"?
Thanks for such a quick response!

(In reply to Jakub Jelen from comment #1)
> First of all, seeing the debug log (with -vvv arguments to scp, for example)
> should give us some idea of what is going on.

No problem, I'll attach scp's debug log.

> Second, I would check whether the old version of OpenSSH still works
> (either by downgrading to the older F27 packages or to the F26 version).

In terms of bisection, I went straight to F26's latest version:

openssh-clients.x86_64 7.5p1-4.fc26

I was able to reproduce the problem there as well. I'll attach the output of two runs with F26's openssh, one for a successful download and one for an unsuccessful download.

> This error message looks like something on the network is inspecting the
> packets and modifying them. Do you see these problems even if you try to
> transfer files to "localhost"?

I couldn't reproduce the issue when transferring a 2GB file through the SSH server on localhost 5 times.

I have a secondary machine that still runs F26, and I could connect it to the network in the same way as the main F27 machine. Would it be useful if I tried the transfers there?
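The repeated-transfer check discussed in this comment could be scripted roughly as follows. This is only a sketch, not something that was run for this bug: the helper names (build_scp_command, is_mac_failure, run_transfers) are hypothetical, and the only facts taken from the report are the -vvv flag and the "message authentication code incorrect" error text.

```python
import subprocess

# Error text taken from the failures reported in this bug.
MAC_ERROR = "message authentication code incorrect"


def build_scp_command(source, dest="./", verbose=True):
    """Build an scp argument list; -vvv turns on scp's debug log."""
    cmd = ["scp"]
    if verbose:
        cmd.append("-vvv")
    cmd += [source, dest]
    return cmd


def is_mac_failure(stderr_text):
    """Return True if scp's stderr contains the MAC error from this report."""
    return MAC_ERROR in stderr_text


def run_transfers(source, dest="./", attempts=5):
    """Run the same transfer several times and count MAC failures."""
    failures = 0
    for _ in range(attempts):
        result = subprocess.run(
            build_scp_command(source, dest),
            stderr=subprocess.PIPE,
            text=True,
            errors="replace",
        )
        if result.returncode != 0 and is_mac_failure(result.stderr):
            failures += 1
    return failures
```

For the localhost test, something like run_transfers("localhost:<path-to-a-larger-file>", attempts=5) would be the equivalent call. It requires a working scp and non-interactive authentication, which this sketch simply assumes.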
Created attachment 1384931
Debug log of failed scp transfer with F27 openssh
Created attachment 1384933
Debug log of failed scp transfer with F26 openssh
Created attachment 1384934
Debug log of successful scp transfer with F26 openssh
(In reply to Tadej Janež from comment #2)
> I was able to reproduce the problem there also. I'll attach the output of
> two runs with F26's openssh, one for a successful download and one for an
> unsuccessful download.

FWIW, I was also able to successfully download the file with F27's openssh.
Do I understand correctly that the downgraded Fedora 26 package on Fedora 27 fails the same way as the new one, but the Fedora 26 box on the same network works?

In that case, it sounds like a bug in the kernel, the network, or some hardware issue. Is it a normal LAN, Wi-Fi, or something special? If so, there is no way to fix it in openssh. The debug logs do not show anything wrong.

I have seen similar issues that turned out to be hardware errors [1], but that is hard to verify unless you replace the network card or try a different one.

[1] https://unix.stackexchange.com/a/288550/121504
(In reply to Jakub Jelen from comment #7)
> Do I understand correctly that the downgraded Fedora 26 package on Fedora 27
> fails the same way as the new one, but the Fedora 26 box on the same network
> works?

Yes, that is the case.

> In that case, it sounds like a bug in the kernel, the network, or some
> hardware issue. Is it a normal LAN, Wi-Fi, or something special?

I was using the ordinary wired LAN port of the Dell Thunderbolt TB16 docking station connected to a Dell XPS 15 9560 laptop.

> If so, there is no way to fix it in openssh. The debug logs do not show
> anything wrong.
>
> I have seen similar issues that turned out to be hardware errors [1], but
> that is hard to verify unless you replace the network card or try a
> different one.
>
> [1] https://unix.stackexchange.com/a/288550/121504

You are right. If I download the files through the laptop's Wi-Fi or another Ethernet device connected through USB, things work OK.

So, I have to debug further to see whether this is a bug in the kernel or some hardware issue.

Thanks for your help!
I confirm I am facing the same issue with a Precision 5520 when docked on a TB16 docking station.

dnf list installed | grep openssh
openssh.x86_64            7.6p1-5.fc27    @updates
openssh-askpass.x86_64    7.6p1-5.fc27    @updates
openssh-clients.x86_64    7.6p1-5.fc27    @updates
openssh-server.x86_64     7.6p1-5.fc27    @updates

The problem goes away when I choose a different NIC (e.g. the on-board Wi-Fi).

Occasionally, I see this in dmesg from (what I think is) the TB16 docking station:

Feb 15 11:52:09 slartibartfast3 kernel: pcieport 0000:00:1d.6: [12] Replay Timer Timeout
Feb 15 11:52:09 slartibartfast3 kernel: pcieport 0000:00:1d.6: device [8086:a11e] error status/mask=00001000/00002000
Feb 15 11:52:09 slartibartfast3 kernel: pcieport 0000:00:1d.6: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00ee(Transmitter ID)

It might be relevant, or it might not. The issue has also become worse since I upgraded to kernel 4.14.18-300.fc27.x86_64, which is not good.

My Thunderbolt configuration, from lspci -v:

06:00.0 PCI bridge: Intel Corporation DSL6340 Thunderbolt 3 Bridge [Alpine Ridge 2C 2015] (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0, IRQ 126
    Bus: primary=06, secondary=07, subordinate=3e, sec-latency=0
    I/O behind bridge: None
    Memory behind bridge: d4000000-ea0fffff [size=353M]
    Prefetchable memory behind bridge: 0000000090000000-00000000b1ffffff [size=544M]
    Capabilities: <access denied>
    Kernel driver in use: pcieport
    Kernel modules: shpchp

07:00.0 PCI bridge: Intel Corporation DSL6340 Thunderbolt 3 Bridge [Alpine Ridge 2C 2015] (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0, IRQ 127
    Bus: primary=07, secondary=08, subordinate=08, sec-latency=0
    I/O behind bridge: None
    Memory behind bridge: ea000000-ea0fffff [size=1M]
    Prefetchable memory behind bridge: None
    Capabilities: <access denied>
    Kernel driver in use: pcieport
    Kernel modules: shpchp

07:01.0 PCI bridge: Intel Corporation DSL6340 Thunderbolt 3 Bridge [Alpine Ridge 2C 2015] (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0, IRQ 128
    Bus: primary=07, secondary=09, subordinate=3d, sec-latency=0
    I/O behind bridge: None
    Memory behind bridge: d4000000-e9efffff [size=351M]
    Prefetchable memory behind bridge: 0000000090000000-00000000b1ffffff [size=544M]
    Capabilities: <access denied>
    Kernel driver in use: pcieport
    Kernel modules: shpchp

07:02.0 PCI bridge: Intel Corporation DSL6340 Thunderbolt 3 Bridge [Alpine Ridge 2C 2015] (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0, IRQ 129
    Bus: primary=07, secondary=3e, subordinate=3e, sec-latency=0
    I/O behind bridge: None
    Memory behind bridge: e9f00000-e9ffffff [size=1M]
    Prefetchable memory behind bridge: None
    Capabilities: <access denied>
    Kernel driver in use: pcieport
    Kernel modules: shpchp

08:00.0 System peripheral: Intel Corporation DSL6340 Thunderbolt 3 NHI [Alpine Ridge 2C 2015]
    Subsystem: Device 2222:1111
    Flags: bus master, fast devsel, latency 0, IRQ 18
    Memory at ea000000 (32-bit, non-prefetchable) [size=256K]
    Memory at ea040000 (32-bit, non-prefetchable) [size=4K]
    Capabilities: <access denied>
    Kernel driver in use: thunderbolt
    Kernel modules: thunderbolt

09:00.0 PCI bridge: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015] (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0, IRQ 130
    Bus: primary=09, secondary=0a, subordinate=3d, sec-latency=0
    I/O behind bridge: 00002000-00002fff [size=4K]
    Memory behind bridge: d4000000-e9efffff [size=351M]
    Prefetchable memory behind bridge: 0000000090000000-00000000b1ffffff [size=544M]
    Capabilities: <access denied>
    Kernel driver in use: pcieport
    Kernel modules: shpchp

0a:01.0 PCI bridge: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015] (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0, IRQ 131
    Bus: primary=0a, secondary=0b, subordinate=0b, sec-latency=0
    I/O behind bridge: None
    Memory behind bridge: None
    Prefetchable memory behind bridge: None
    Capabilities: <access denied>
    Kernel driver in use: pcieport
    Kernel modules: shpchp

0a:04.0 PCI bridge: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015] (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0, IRQ 132
    Bus: primary=0a, secondary=0c, subordinate=3d, sec-latency=0
    I/O behind bridge: 00002000-00002fff [size=4K]
    Memory behind bridge: d4000000-e9efffff [size=351M]
    Prefetchable memory behind bridge: 0000000090000000-00000000b1ffffff [size=544M]
    Capabilities: <access denied>
    Kernel driver in use: pcieport
    Kernel modules: shpchp

0c:00.0 PCI bridge: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015] (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0, IRQ 133
    Bus: primary=0c, secondary=0d, subordinate=3d, sec-latency=0
    I/O behind bridge: 00002000-00002fff [size=4K]
    Memory behind bridge: d4000000-e9efffff [size=351M]
    Prefetchable memory behind bridge: 0000000090000000-00000000b1ffffff [size=544M]
    Capabilities: <access denied>
    Kernel driver in use: pcieport
    Kernel modules: shpchp

0d:01.0 PCI bridge: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015] (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0, IRQ 134
    Bus: primary=0d, secondary=0e, subordinate=0e, sec-latency=0
    I/O behind bridge: None
    Memory behind bridge: d4000000-d40fffff [size=1M]
    Prefetchable memory behind bridge: None
    Capabilities: <access denied>
    Kernel driver in use: pcieport
    Kernel modules: shpchp

0d:04.0 PCI bridge: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015] (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0, IRQ 135
    Bus: primary=0d, secondary=0f, subordinate=3d, sec-latency=0
    I/O behind bridge: 00002000-00002fff [size=4K]
    Memory behind bridge: d4100000-e9efffff [size=350M]
    Prefetchable memory behind bridge: 0000000090000000-00000000b1ffffff [size=544M]
    Capabilities: <access denied>
    Kernel driver in use: pcieport
    Kernel modules: shpchp
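To see how often the corrected PCIe errors quoted in this comment occur, the kernel log could be filtered with something like the sketch below. The regular expression is written only against the three dmesg lines shown here, so it is an assumption that other kernels format these messages the same way; find_pcie_errors is a hypothetical helper name.

```python
import re

# Matches the "PCIe Bus Error" summary line as it appears in this report, e.g.
#   pcieport 0000:00:1d.6: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00ee(Transmitter ID)
PCIE_ERROR = re.compile(
    r"pcieport (?P<device>[0-9a-f:.]+): PCIe Bus Error: "
    r"severity=(?P<severity>[^,]+), type=(?P<type>[^,]+)"
)


def find_pcie_errors(log_lines):
    """Return (device, severity, type) tuples for each PCIe Bus Error line."""
    errors = []
    for line in log_lines:
        match = PCIE_ERROR.search(line)
        if match:
            errors.append(
                (match.group("device"), match.group("severity"), match.group("type"))
            )
    return errors
```

Feeding it the lines of a saved dmesg or journal dump gives one tuple per reported bus error, which makes it easier to compare error counts with and without the dock's wired interface in use.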
That is certainly not an OpenSSH bug. I am moving it to the kernel, which is probably responsible for the hardware support. Hopefully, they will be able to figure out more.
This sounds like a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1460789. Can you check whether this issue is still present with the 4.15 kernel? It should be in updates-testing at the moment. Thanks!
I cannot personally move to the 4.15 kernel at the moment, as I am running bumblebee on that system and would like to do more testing before I do so (it is one thing to mess up Thunderbolt wired networking and another to do this *and* mess up my cuda/nvidia setup :-) ).

What I can confirm is that the workaround from 1460789 does work, so chances are it's the same bug. Running:

ethtool --offload $DEVNAME rx off

does indeed work, and I am able to use the wired interface with the 4.14.18-300.fc27.x86_64 kernel.

It would be good for others to verify that this works for them. If it does, please patch the 4.14.x kernels before you push 4.15 into production. People who run complex setups (such as bumblebee) will feel safer and thank you for this, IMHO.

Cheers,
GM
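For anyone who wants to check that the workaround actually took effect, a rough sketch follows. It assumes that "ethtool --show-offload DEV" prints a line of the form "rx-checksumming: on" or "rx-checksumming: off", and that the commands are run with sufficient privileges; the helper names are hypothetical and not part of ethtool.

```python
import subprocess


def offload_command(device, feature="rx", state="off"):
    """Build the workaround command from this comment for a given interface."""
    return ["ethtool", "--offload", device, feature, state]


def rx_checksumming_enabled(show_offload_output):
    """Parse `ethtool --show-offload DEV` output.

    Returns True/False for the rx-checksumming line, or None if it is absent.
    """
    for line in show_offload_output.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == "rx-checksumming":
            words = value.split()
            if words:
                return words[0] == "on"
    return None


def apply_workaround(device):
    """Disable RX checksum offload, then report whether it is still enabled."""
    subprocess.run(offload_command(device), check=True)
    shown = subprocess.run(
        ["ethtool", "--show-offload", device],
        stdout=subprocess.PIPE,
        text=True,
        check=True,
    )
    return rx_checksumming_enabled(shown.stdout)
```

A return value of False from apply_workaround(DEVNAME) would indicate that RX checksum offload is now off. The setting is not persistent, so it has to be reapplied after a reboot or after the dock is replugged.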
*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There are a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 27 kernel bugs.

Fedora 27 has now been rebased to 4.17.7-100.fc27. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 28, and are still experiencing this issue, please change the version to Fedora 28.

If you experience different issues, please open a new bug report for those.
*********** MASS BUG UPDATE **************

This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 5 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.