The ssh client in openssh 2.9.1p1-3gss crashes repeatedly when connecting to "Remote protocol version 1.99, remote software version OpenSSH_2.9p1" (openssh-server-2.9p1-1 package). Downgrading to openssh-2.9p1-2 appears to make the problem go away (if it recurs, I'll let you know). The crashes may be related to window resizes, although I'm not certain about that. Either that, or they're related to when a large burst of data comes all at once.
I found the problem. There's code in clientloop.c that's zeroing out an fd_set structure using memset and assuming that the fd_set has one byte per file descriptor rather than one *bit* per file descriptor. In fact, it should just use FD_ZERO to zero out the fd_set. I will attach a patch (which you will of course forward back to the maintainers of openssh :-).
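To illustrate the byte-versus-bit mismatch described above, here is a minimal sketch. It assumes a genuine fixed-size fd_set from <sys/select.h>; the function names are hypothetical and this is not the code from clientloop.c. On typical systems sizeof(fd_set) is about FD_SETSIZE / 8, so a memset sized at one byte per descriptor writes well past the end of the structure.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical helper: the length a memset would use under the mistaken
 * assumption of one byte per file descriptor. */
size_t mistaken_clear_size(void)
{
    return (size_t)FD_SETSIZE;
}

/* Hypothetical helper: the real size of an fd_set, which stores one bit
 * per descriptor. */
size_t actual_fd_set_size(void)
{
    return sizeof(fd_set);
}

/* Set a descriptor in a fixed-size fd_set, clear the set with FD_ZERO,
 * and report whether the descriptor is no longer set (1 = cleared). */
int clear_and_check(int fd)
{
    fd_set set;

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&set);
    FD_SET(fd, &set);
    FD_ZERO(&set);   /* clears exactly sizeof(fd_set) bytes, no more */
    return !FD_ISSET(fd, &set);
}
```

Comparing mistaken_clear_size() with actual_fd_set_size() shows how far such a memset would run past the structure.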
Created attachment 23349 [details] Fix memory overrun in clientloop.c
The patch I submitted last night was wrong. How was I to know that the things openssh calls "fd_set"s aren't really "fd_set"s, but are instead arrays of dynamic length? :-) I'll attach a new patch.
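For a dynamically sized descriptor bitmask like the one described above, FD_ZERO is not appropriate because it clears a fixed sizeof(fd_set) bytes. A rough sketch of sizing and clearing such an array follows. All names here (mask_word, mask_bytes, mask_clear, mask_set, mask_isset) are hypothetical stand-ins, not the identifiers used in openssh or in the attached patch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

typedef unsigned long mask_word;               /* hypothetical stand-in for fd_mask */
#define WORD_BITS (sizeof(mask_word) * CHAR_BIT)

/* Bytes needed for a bitmask covering descriptors 0..max_fd,
 * rounded up to a whole number of words. */
size_t mask_bytes(int max_fd)
{
    size_t words;

    assert(max_fd >= 0);
    words = ((size_t)max_fd + 1 + WORD_BITS - 1) / WORD_BITS;
    return words * sizeof(mask_word);
}

/* Clear exactly the bytes the bitmask occupies: one bit per descriptor,
 * not one byte per descriptor. */
void mask_clear(mask_word *mask, int max_fd)
{
    memset(mask, 0, mask_bytes(max_fd));
}

/* Mark descriptor fd in the bitmask. */
void mask_set(mask_word *mask, int fd)
{
    assert(fd >= 0);
    mask[(size_t)fd / WORD_BITS] |= (mask_word)1 << ((size_t)fd % WORD_BITS);
}

/* Return 1 if descriptor fd is marked, 0 otherwise. */
int mask_isset(const mask_word *mask, int fd)
{
    assert(fd >= 0);
    return (int)((mask[(size_t)fd / WORD_BITS] >> ((size_t)fd % WORD_BITS)) & 1);
}
```

The point of the sketch is that the allocation and the memset share one size computation, so the clear can never be longer than the buffer.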
Created attachment 23388 [details] Another memory overrun fix
I bet this is a bug introduced by gss patches.
The code I patched is clearly buggy, and my patch applies cleanly even without the GSS-API patch applied first. I explained specifically what the bug is, and if you read my explanation and patch, it is clear that the code is wrong and needs to be fixed. I'm *sure* that if I used a version of SSH without the GSS-API patch and without my fix to this bug, SSH would continue to crash on me. Since I built a new version of ssh with my patch, it hasn't crashed on me once, even though I've been running it the entire time with ElectricFence.
Uhh, sorry for my uneducated guess. :-) One just has to wonder why this hasn't happened to anyone else. The gss patches, in one way or another, could have been the common factor... I can post the patch upstream and see what they think.
I suspect that other people *have* run into this; they probably just chalked it up to flakies and restarted ssh, as I did for a long time before I finally decided to track down the problem. It's also possible that the version of ssh in which this bug was introduced is not yet widely deployed. It's also possible that the particular usage paradigm which tickles the bug is not all that common. I frequently do port forwarding, X forwarding, agent forwarding, etc. I suspect you need to be using a good number of file descriptors before this bug kicks in.
The patch looks right to me (the old behavior cleared one byte for each FD, when the fd_set being packed requires clearing one bit); it will be integrated into 2.9p2-7 and later. Thanks!