Bug 164942
Summary: | Bad packet length error in SSH | |
---|---|---|---
Product: | [Fedora] Fedora | Reporter: | Phil Oester <bugzilla>
Component: | openssh | Assignee: | Tomas Mraz <tmraz>
Status: | CLOSED DUPLICATE | QA Contact: | Brian Brock <bbrock>
Severity: | medium | Priority: | medium
Version: | 4 | Hardware: | i386
OS: | Linux | Doc Type: | Bug Fix
Last Closed: | 2005-08-17 08:03:51 UTC | |

Description
Phil Oester
2005-08-02 22:18:44 UTC
This can happen if you have some broken router in between the server and client machines. What exact versions of openssh are on the server and client? What if you try, for example, to downgrade them to the openssh version from FC3?

Version-Release number of selected component (if applicable): openssh-4.1p1-3.1

Could you elaborate on how my router could be broken? The two boxes are on an internal WAN, separated by a few Cisco routers. Downgrading to FC3 openssh isn't really an option for me -- but what specific change between FC3 and FC4 openssh do you suspect?

So on both server and client machines there is the same openssh version. Both machines are the same arch (i386) and they are connected by a WAN with a few Ciscos on the route and nothing else. Then (if ssh localhost works fine on both server and client, which I suppose it does) the problem must be in the WAN, most probably in the routers. The packets are somehow mangled, and ssh, being a secure encrypted protocol, easily detects it. This is my theory. I cannot reproduce your problem, so this is only a guess. You will have to do some experiments if you want the problem resolved.

Yes, on the problematic transfers, both client and server have the same SSH version and same arch. However, I have experimented with some different versions (FC1 and FC4 boxes):

openssh-3.6.1p2-19 -> openssh-4.1p1-3.1 = FAIL
openssh-4.1p1-3.1 -> openssh-3.6.1p2-19 = OK
openssh-4.1p1-3.1 -> openssh-4.1p1-3.1 = FAIL

So the problem appears to be server based. I've collected a tcpdump which is interesting -- it was taken on an intermediate firewall between an east coast and a west coast box. It shows the disconnect occurring, seemingly just for a single lost packet.

<attachment in next email>

Created attachment 117427
tcpdump session
annotated tcpdump
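
To illustrate the theory above that mangled packets are detected by ssh: the protocol attaches a message authentication code (MAC) to each packet, so even a single flipped bit makes verification fail. The following is a minimal sketch of that idea only, not OpenSSH's actual packet format; the helper names `seal`, `verify` and `flip_bit`, and the choice of HMAC-SHA256, are assumptions made for illustration.

```python
import hashlib
import hmac
import os

TAG_LEN = 32  # length of an HMAC-SHA256 tag in bytes


def seal(key: bytes, seq: int, payload: bytes) -> bytes:
    """Return the payload followed by a MAC over (sequence number, payload)."""
    tag = hmac.new(key, seq.to_bytes(4, "big") + payload, hashlib.sha256).digest()
    return payload + tag


def verify(key: bytes, seq: int, packet: bytes) -> bool:
    """Recompute the MAC on the receiving side and compare it with the tag."""
    payload, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    expected = hmac.new(key, seq.to_bytes(4, "big") + payload, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Simulate corruption in transit by flipping one bit (hypothetical helper)."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


if __name__ == "__main__":
    key = os.urandom(32)
    packet = seal(key, 0, b"chunk of a file transfer")
    print(verify(key, 0, packet))               # True
    print(verify(key, 0, flip_bit(packet, 5)))  # False
```

A check like this cannot say where the corruption happened (router, NIC, driver or memory), only that the bytes received differ from the bytes sent.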
Are the FC1 and FC4 server boxes on the same LAN, so that when you connect from a box on another LAN only the connection to the FC4 server fails? One dropped packet in a TCP connection cannot make the connection fail, as TCP is a reliable protocol that retransmits lost data. Of course there is a theoretical possibility of a bug in the TCP implementation in the kernel on the FC4 box (you could try upgrading the kernel to a newer one from testing updates), but I don't think it's probable. At least you could try to upgrade the openssh on the FC1 box, although you'd have to rebuild it from the SRPM, and see if the failures start to appear there.

Yes, the FC1 and FC4 boxes are on the same LAN on the west coast, and the client box is on the east coast. Ran additional tests from the FC4 to the FC1 box just to verify, and it succeeds every time. BTW - the tcpdump session is likely garbage, since it just couldn't capture all the packets as quickly as they were going through. It can be ignored.

Conducted some more tests -- downloaded the FC1 openssh SRPM, built it on the FC4 box, and downgraded to it -- it still fails. The error is either 'Bad packet length' or 'Corrupted MAC on input' -- it varies. (I also built the FC3 SRPM, with the same results.) So perhaps the problem is not in openssh, but in one of the libraries it depends upon? Zlib? crypto? This seems similar to bug #110101, which was closed as NOTABUG long ago -- but does smell like a bug somewhere...

Some other things I've tried today...
* downgraded FC4 box kernel to 2.6.10: FAILS
* tried FC4 default kernel (I was using custom compiled kernel.org): FAILS

Attempted to compile the SSH rpm with static libraries, but couldn't get it to work properly -- my hope was to be able to upgrade/downgrade various libraries for testing, but that may be fruitless.

In my opinion the problem must lie somewhere in the network between the FC4 server box and the client box. This problem is triggered by some change in the TCP implementation in kernel 2.6.x compared with kernel 2.4.x in FC1 (window scaling and so on). For example see: http://lwn.net/Articles/92727/ which may or may not help. I honestly don't think that the problem lies in any other software/libraries of the FC4 box. Also it could be a hardware bug (memory, motherboard...) of the FC4 box.

I have found that the solution in bug 149887 solves this for me, so it appears to be an e1000 driver issue instead of an ssh bug. I'll leave the duplicate marking to you...

*** This bug has been marked as a duplicate of 149887 ***
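
For anyone comparing the TCP behaviour of two boxes as suggested in the thread (window scaling and so on), the relevant kernel settings can be read from `/proc/sys`. The sketch below assumes a Linux system; `read_sysctl` is a hypothetical helper written for this example, and the settings listed are only a suggested starting point, not a diagnosis from the bug.

```python
from pathlib import Path
from typing import Optional

SYSCTL_ROOT = Path("/proc/sys")

# Settings worth comparing between a working and a failing box (an assumption).
TCP_SETTINGS = (
    "net.ipv4.tcp_window_scaling",
    "net.ipv4.tcp_rmem",
    "net.ipv4.tcp_wmem",
)


def read_sysctl(name: str, root: Path = SYSCTL_ROOT) -> Optional[str]:
    """Return the value of a dotted sysctl name, or None if it cannot be read."""
    path = root.joinpath(*name.split("."))
    try:
        return path.read_text().strip()
    except OSError:
        # The setting does not exist on this kernel, or /proc is unavailable.
        return None


if __name__ == "__main__":
    for setting in TCP_SETTINGS:
        print(f"{setting} = {read_sysctl(setting)}")
```

Running it on both the FC1 and FC4 machines and comparing the output shows whether the two kernels negotiate TCP connections with different settings.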