Red Hat Bugzilla – Full Text Bug Listing
|Summary:||Bad packet length error in SSH|
|Product:||[Fedora] Fedora||Reporter:||Phil Oester <bugzilla>|
|Component:||openssh||Assignee:||Tomas Mraz <tmraz>|
|Status:||CLOSED DUPLICATE||QA Contact:||Brian Brock <bbrock>|
|Fixed In Version:||Doc Type:||Bug Fix|
|Doc Text:||Story Points:||---|
|Last Closed:||2005-08-17 04:03:51 EDT||Type:||---|
|oVirt Team:||---||RHEL 7.3 requirements from Atomic Host:|
Description Phil Oester 2005-08-02 18:18:44 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.10) Gecko/20050716 Firefox/1.0.6

Description of problem:
SSH sessions (both interactive and rsync) regularly disconnect with error 'Bad packet length X'

Sample error (using ssh -vvv):
Received disconnect from x.x.x.x: 2: Bad packet length 1951400441.
rsync: connection unexpectedly closed (481545707 bytes received so far) [receiver]
rsync error: error in rsync protocol data stream (code 12) at io.c(420)
rsync: connection unexpectedly closed (77800 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(420)

Version-Release number of selected component (if applicable):
openssh-4.1p1-3.1

How reproducible:
Sometimes

Steps to Reproduce:
1. Start an SSH transfer
2. wait around for it to crash...anywhere from 5 - 20 minutes
3.

Additional info:
Comment 1 Tomas Mraz 2005-08-03 03:46:25 EDT
This can happen if there is a broken router between the server and client machines. What exact versions of openssh are on the server and client? What happens if you downgrade them to, for example, the openssh version from FC3?
Comment 2 Phil Oester 2005-08-03 13:54:05 EDT
Version-Release number of selected component (if applicable): openssh-4.1p1-3.1

Could you elaborate on how my router could be broken? The two boxes are on an internal WAN, separated by a few cisco routers. Downgrading to FC3 openssh isn't really an option for me -- but what specific change between FC3 and FC4 openssh do you suspect?
Comment 3 Tomas Mraz 2005-08-03 14:18:45 EDT
So both the server and client machines run the same openssh version, both are the same arch (i386), and they are connected by a WAN with a few Ciscos on the route and nothing else. In that case (assuming ssh localhost works fine on both server and client, which I suppose it does) the problem must be in the WAN, most probably in the routers. The packets are somehow mangled, and ssh, being a secure encrypted protocol, easily detects it. This is my theory. I cannot reproduce your problem, so it's only a guess. You will have to do some experiments if you want the problem resolved.
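To illustrate why mangled packets are detected rather than silently accepted, here is a minimal sketch of the integrity check. It assumes an hmac-sha1 MAC computed over the sequence number followed by the unencrypted packet, as described in RFC 4253; the key, payload, and the helper name mac_ok are hypothetical and not taken from OpenSSH.

```python
import hashlib
import hmac


def mac_ok(key: bytes, seq: int, packet: bytes, tag: bytes) -> bool:
    """Recompute the MAC over sequence number + packet and compare it to the tag."""
    expected = hmac.new(key, seq.to_bytes(4, "big") + packet, hashlib.sha1).digest()
    return hmac.compare_digest(expected, tag)


# Hypothetical values; a real session key comes out of the key exchange.
key = b"illustrative-session-key"
packet = b"some rsync payload bytes"
tag = hmac.new(key, (7).to_bytes(4, "big") + packet, hashlib.sha1).digest()

# Flip a single bit "in transit", as faulty hardware on the path might.
mangled = bytes([packet[0] ^ 0x01]) + packet[1:]

print(mac_ok(key, 7, packet, tag))   # intact packet
print(mac_ok(key, 7, mangled, tag))  # corrupted packet
```

A single flipped bit is enough to make the check fail, which is why corruption anywhere on the path ends the session instead of passing bad data through.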
Comment 4 Phil Oester 2005-08-03 21:22:52 EDT
Yes, on the problematic transfers, both client and server have the same SSH version and same arch. However, I have experimented with some different versions (FC1 and FC4 boxes):

openssh-3.6.1p2-19 -> openssh-4.1p1-3.1 = FAIL
openssh-4.1p1-3.1 -> openssh-3.6.1p2-19 = OK
openssh-4.1p1-3.1 -> openssh-4.1p1-3.1 = FAIL

So the problem appears to be server based. I've collected a tcpdump which is interesting -- it was taken on an intermediate firewall between an east coast and a west coast box. It shows the disconnect occurring, seemingly just for a single lost packet. <attachment in next email>
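The "server based" conclusion can be checked mechanically by grouping the three results reported above by each side of the connection. This is only a sketch: it assumes the arrow reads client -> server, and the helper name outcomes_by is hypothetical.

```python
from collections import defaultdict

# (client version, server version, outcome), assuming "A -> B" means client -> server.
results = [
    ("openssh-3.6.1p2-19", "openssh-4.1p1-3.1", "FAIL"),
    ("openssh-4.1p1-3.1", "openssh-3.6.1p2-19", "OK"),
    ("openssh-4.1p1-3.1", "openssh-4.1p1-3.1", "FAIL"),
]


def outcomes_by(column: int) -> dict:
    """Collect the set of outcomes seen for each value in the given column."""
    grouped = defaultdict(set)
    for row in results:
        grouped[row[column]].add(row[2])
    return dict(grouped)


by_client = outcomes_by(0)
by_server = outcomes_by(1)

# A side "explains" the failures if each of its values maps to exactly one outcome.
print("client explains outcome:", all(len(v) == 1 for v in by_client.values()))
print("server explains outcome:", all(len(v) == 1 for v in by_server.values()))
```

Under that reading, only the server side separates FAIL from OK cleanly. Since each version here also stands for a particular box (FC1 or FC4), this points at the FC4 machine as a whole, not necessarily at its openssh package.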
Comment 5 Phil Oester 2005-08-03 21:24:20 EDT
Created attachment 117427 [details] tcpdump session annotated tcpdump
Comment 6 Tomas Mraz 2005-08-04 04:24:37 EDT
Are the FC1 and FC4 server boxes on the same LAN, so that when you connect from the box in another LAN only the connection to the FC4 server fails? One dropped packet in a TCP connection cannot make the connection fail, as TCP is a reliable protocol that retransmits lost data. Of course there is a theoretical possibility of a bug in the TCP implementation in the kernel on the FC4 box (you could try upgrading the kernel to a newer one from testing updates), but I don't think it's probable. At the least, you could try upgrading openssh on the FC1 box, although you'd have to rebuild it from the SRPM, and see whether the failures start to appear there.
Comment 7 Phil Oester 2005-08-05 13:12:51 EDT
Yes, the FC1 and FC4 boxes are on the same LAN on the west coast, and the client box is on the east coast. Ran additional tests from the FC4 to the FC1 box just to verify, and it succeeds every time.

BTW - the tcpdump session is likely garbage, since it just couldn't capture all the packets as quickly as they were going through. It can be ignored.

Conducted some more tests -- downloaded the FC1 openssh srpm, built it on the FC4 box, and downgraded to it -- it still fails. The error is either 'Bad packet length' or 'Corrupted MAC on input' -- it varies. (I also built the FC3 SRPM, with the same results.)

So perhaps the problem is not in openssh, but in one of the libraries it depends upon? Zlib? crypto?
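One plausible reading of why the error varies: each SSH packet begins with a 4-byte big-endian length field (RFC 4253), and that field is encrypted. If corruption hits the block holding it, the decrypted value is effectively random and fails a sanity check; if it hits elsewhere, the MAC check fails instead. The sketch below shows the length check only. The 256 KiB bound and the function name are assumptions for illustration, not OpenSSH's code.

```python
import struct

# Assumed upper bound for illustration; not copied from any OpenSSH release.
PACKET_MAX_SIZE = 256 * 1024


def check_packet_length(first_block: bytes) -> int:
    """Read the 4-byte big-endian length field and reject implausible values."""
    (length,) = struct.unpack(">I", first_block[:4])
    if length < 5 or length > PACKET_MAX_SIZE:
        raise ValueError(f"Bad packet length {length}.")
    return length


# A sane length field passes.
print(check_packet_length(struct.pack(">I", 44)))

# Four arbitrary bytes, standing in for a garbled length field, do not.
try:
    check_packet_length(bytes.fromhex("745001f9"))
except ValueError as err:
    print(err)
```

The bytes 74 50 01 f9 decode to 1951400441, the value in the sample error from the original report, which is consistent with a garbled length field rather than a real packet size.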
Comment 8 Phil Oester 2005-08-07 14:47:28 EDT
This seems similar to bug #110101, which was closed as NOTABUG long ago -- but does smell like a bug somewhere...
Comment 9 Phil Oester 2005-08-08 20:02:08 EDT
Some other things I've tried today...

* downgraded FC4 box kernel to 2.6.10: FAILS
* tried FC4 default kernel (I was using custom compiled kernel.org): FAILS

Attempted to compile SSH rpm with static libraries, but couldn't get it to work properly -- my hope was to be able to upgrade/downgrade various libraries for testing, but that may be fruitless.
Comment 10 Tomas Mraz 2005-08-09 04:00:42 EDT
In my opinion the problem must lie somewhere in the network between the FC4 server box and the client box. It is triggered by some change in the TCP implementation in kernel 2.6.x compared with kernel 2.4.x in FC1 (window scaling and so on). For example see: http://lwn.net/Articles/92727/ which may or may not help. I honestly don't think that the problem lies in any other software/libraries on the FC4 box.
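For anyone wanting to test the window scaling theory, a small sketch that reports whether the setting is enabled on a Linux box by reading the net.ipv4.tcp_window_scaling sysctl through /proc. The function name and the proc_root parameter are hypothetical conveniences; the function returns None when the file cannot be read.

```python
from pathlib import Path
from typing import Optional


def tcp_window_scaling(proc_root: str = "/proc") -> Optional[bool]:
    """Return True/False for the sysctl value, or None if it cannot be read."""
    path = Path(proc_root) / "sys" / "net" / "ipv4" / "tcp_window_scaling"
    try:
        return path.read_text().strip() == "1"
    except OSError:
        return None


print("tcp_window_scaling enabled:", tcp_window_scaling())
```

Turning the setting off on one end for a test run (as root) is one way to see whether the failures follow it, though nothing in this thread confirms that window scaling was involved.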
Comment 11 Tomas Mraz 2005-08-09 04:02:20 EDT
Also it could be a hardware bug (memory, motherboard...) of the FC4 box.
Comment 12 Phil Oester 2005-08-16 20:04:05 EDT
I have found that the solution in bug 149887 solves this for me, so it appears to be an e1000 driver issue instead of an ssh bug. I'll leave the duplicate marking to you...
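Since the cause turned out to be the network driver, a quick way to see which driver a given interface uses is to follow the sysfs driver link. This is a sketch assuming a Linux system with sysfs mounted at /sys; the function name and the interface name eth0 are illustrative, and the thread does not say what the fix in bug 149887 actually was.

```python
import os
from typing import Optional


def nic_driver(iface: str, sysfs_root: str = "/sys") -> Optional[str]:
    """Return the kernel driver name bound to iface, or None if unknown."""
    link = os.path.join(sysfs_root, "class", "net", iface, "device", "driver")
    try:
        return os.path.basename(os.readlink(link))
    except OSError:
        # Missing interface, virtual device without a driver link, or no sysfs.
        return None


print("eth0 driver:", nic_driver("eth0"))
```

Checking this on both ends of a failing transfer shows whether the suspect driver is in the path at all.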