Description of problem: OpenSSH stops functioning after a few minutes logged into a remote host.
Version-Release number of selected component (if applicable): OpenSSH_4.5p1, OpenSSL 0.9.8b 04 May 2006
How reproducible: ssh into a remote host and perform miscellaneous tasks; after about 3 - 5 minutes the ssh session becomes unresponsive.
Steps to Reproduce: 1. 2. 3.
Actual results:
Expected results:
Additional info:
I cannot reproduce this. Does it happen on every host you connect to? What is the remote host's sshd version?
I've been seeing a similar thing connecting from Fedora 7, Fedora Rawhide, or CentOS 5 machines to a CentOS 4 server. For me, it isn't a matter of minutes after connecting -- it seems to be when something dumps a few K of text to the screen at once. Maybe something with terminal control codes, even (although it is ssh that's hung, not the terminal). John, does that match what you're seeing? Tried upgrading openssh on the server to RHEL5 version -- didn't help.
Matt, I've tried to reproduce it and wasn't successful. What application triggers that for you?
It's frustratingly hard to reproduce on purpose. It'll happen in mutt, in ls, in less, in joe, in vi. But doing the same thing again won't necessarily recreate it. I need to do more diagnosing next time it happens -- at this point, I was looking for similar bugs that might offer clues.
At least it would be interesting to see full stack backtraces on both server and client.
I don't have a backtrace with debuginfo, but I do have an strace of the running process. The client is responsive to keystrokes, with the appropriate-looking selects, reads, and writes. On the server, though, it's just doing this:
    $ sudo strace -p 25726
    Process 25726 attached - interrupt to quit
    select(10, [3 6 9], [], NULL, NULL
and the non-debuginfo backtrace (um, with RHEL5 sshd rebuilt on CentOS 4 to see if that would help; it doesn't):
    #0  0xb7fe87a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
    #1  0x0042a64d in ___newselect_nocancel () from /lib/tls/libc.so.6
    #2  0x0011f847 in main () from /usr/sbin/sshd
Hmmm. That's interesting....
Are there any writes to the network socket on the client? Can you do tcpdump on server and client to see that there are really packets sent from the client and received on the server?
I'll do that next time I see the problem.
So, I should have done that before. Nothing from tcpdump on server or client when I type into the frozen window. However, there's some periodic (keepalive?) stuff going back and forth every (exactly) two minutes:
    15:48:26.533029 IP server.bu.edu.ssh > client.bu.edu.40941: P 6143502:6144510(1008) ack 3046252168 win 3228 <nop,nop,timestamp 1418955725 84682970>
    15:48:26.533926 IP client.bu.edu.40941 > server.bu.edu.ssh: . ack 1008 win 501 <nop,nop,timestamp 84888549 1418955725,nop,nop,sack 1 {0:1008}>
    15:48:28.301874 IP client.bu.edu.40941 > server.bu.edu.ssh: P 1:49(48) ack 1008 win 501 <nop,nop,timestamp 84890318 1418955725>
    15:50:26.506466 IP server.bu.edu.ssh > client.bu.edu.40941: P 0:1008(1008) ack 1 win 3228 <nop,nop,timestamp 1419075725 84682970>
    15:50:26.506516 IP client.bu.edu.40941 > server.bu.edu.ssh: . ack 1008 win 501 <nop,nop,timestamp 85008521 1419075725,nop,nop,sack 1 {0:1008}>
    15:50:28.303192 IP client.bu.edu.40941 > server.bu.edu.ssh: P 1:49(48) ack 1008 win 501 <nop,nop,timestamp 85010318 1419075725>
Simultaneously, on the server side:
    15:48:26.537106 IP server.bu.edu.ssh > client.bu.edu.40941: P 6143502:6144510(1008) ack 3046252168 win 3228 <nop,nop,timestamp 1418955725 84682970>
    15:48:26.537590 IP client.bu.edu.40941 > server.bu.edu.ssh: . ack 1008 win 501 <nop,nop,timestamp 84888549 1418955725,nop,nop,sack sack 1 {0:1008} >
    15:48:28.306478 IP client.bu.edu.40941 > server.bu.edu.ssh: P 1:49(48) ack 1008 win 501 <nop,nop,timestamp 84890318 1418955725>
    15:50:26.510830 IP server.bu.edu.ssh > client.bu.edu.40941: P 0:1008(1008) ack 1 win 3228 <nop,nop,timestamp 1419075725 84682970>
    15:50:26.511332 IP client.bu.edu.40941 > server.bu.edu.ssh: . ack 1008 win 501 <nop,nop,timestamp 85008521 1419075725,nop,nop,sack sack 1 {0:1008} >
    15:50:28.308029 IP client.bu.edu.40941 > server.bu.edu.ssh: P 1:49(48) ack 1008 win 501 <nop,nop,timestamp 85010318 1419075725>
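One way to read the capture above: the client's 48-byte segment (1:49) shows up twice, two minutes apart, which looks more like the same data being resent than like a keepalive. The sketch below is only an illustration of that check. It assumes tcpdump's default line format as shown in this comment, the sample lines are the client-side capture with the TCP options trimmed, and `resend_intervals` is a made-up helper name. The first server line prints absolute sequence numbers, so only the client's repeated range is matched here.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches tcpdump lines that carry data, e.g. "P 1:49(48)".
# Pure ACKs (flag ".") have no byte range and are skipped.
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d{6}) IP (?P<src>\S+) > (?P<dst>[^:]+): "
    r"P (?P<start>\d+):(?P<end>\d+)\((?P<length>\d+)\)"
)

# Client-side capture from this comment, TCP options trimmed for brevity.
CLIENT_CAPTURE = [
    "15:48:26.533029 IP server.bu.edu.ssh > client.bu.edu.40941: P 6143502:6144510(1008) ack 3046252168 win 3228",
    "15:48:26.533926 IP client.bu.edu.40941 > server.bu.edu.ssh: . ack 1008 win 501",
    "15:48:28.301874 IP client.bu.edu.40941 > server.bu.edu.ssh: P 1:49(48) ack 1008 win 501",
    "15:50:26.506466 IP server.bu.edu.ssh > client.bu.edu.40941: P 0:1008(1008) ack 1 win 3228",
    "15:50:26.506516 IP client.bu.edu.40941 > server.bu.edu.ssh: . ack 1008 win 501",
    "15:50:28.303192 IP client.bu.edu.40941 > server.bu.edu.ssh: P 1:49(48) ack 1008 win 501",
]


def resend_intervals(lines):
    """Return {(src, start, end): [seconds between successive sightings]}.

    Only byte ranges seen more than once from the same source are reported.
    """
    seen = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # pure ACK or a line this sketch does not understand
        ts = datetime.strptime(m["ts"], "%H:%M:%S.%f")
        seen[(m["src"], int(m["start"]), int(m["end"]))].append(ts)
    return {
        key: [(b - a).total_seconds() for a, b in zip(times, times[1:])]
        for key, times in seen.items()
        if len(times) > 1
    }


if __name__ == "__main__":
    for (src, start, end), gaps in resend_intervals(CLIENT_CAPTURE).items():
        print(src, f"{start}:{end}", [round(g, 3) for g in gaps])
```

For the sample lines this reports the client's 1:49 range with a gap of about 120 seconds. As far as I know, 120 seconds is also the maximum TCP retransmission timeout on Linux, which would fit a connection where neither side is accepting the other's segments, but that interpretation is a guess and not something the capture proves.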
Just tried typing <enter>~<ctrl-z> to see what would happen. Got the "^Z [suspend ssh]" output, but didn't seem to actually get a prompt. But then I accidentally closed the window so I'm not 100% sure. Helpful, I know!
I think the problem is rather on the client side - can you try to upgrade to the openssh-4.7p1-1.fc8 from here: http://koji.fedoraproject.org/koji/buildinfo?buildID=17988
Yeah, after seeing the client not send out any packets I gotta agree with your suspicion. :) I'll try the new packages and see what happens.
No luck; just happened with openssh-4.7p1-1.fc8 (client).
(In reply to comment #13)
> No luck; just happened with openssh-4.7p1-1.fc8 (client).
Damn, so now you can try to bisect versions of the client from various old Fedoras to see when it broke. I think the problem wasn't there in the 3.9p1 version; at least I don't see any such reports on RHEL-4. But that won't be fun :(. Note that the old clients should be more or less usable on the latest Fedoras. The servers not so much, because of SELinux changes.
Yeah, the problem definitely started for me when I upgraded my home system from a Fedora Core 4-based distribution, and for Paul Stauffer (who I added to the cc list) on upgrading to a RHEL5-based one. I'll see what more I can discover. Good times. :)
Actually, the system I'm seeing this problem on is Fedora 7. I have never seen this problem on any of several FC6 systems I've used regularly, so I'm guessing it's something that changed between 6 and 7.
I definitely see it on my BU Linux 5.0 ( = CentOS 5 / RHEL5) system too, with openssh-4.3p2-16.el5.centos.bu50.13. (BU changes are to the config file only, so it's unlikely that we've done anything that causes the problem. Plus my home system is currently unmodified rawhide.) Fedora Core 6 had openssh-4.3p2-10.src.rpm, but then that got updated to openssh-4.3p2-19.fc6.src.rpm. So it seems a bit odd that that would work and RHEL5 not. Maybe the problem is in some supporting library. Wouldn't that be fun. Paul, can you roll your home system back to 4.3p2-19.fc6?
Same problem as in comment #2. Server: CentOS 4, openssh-3.9p1-8.RHEL4.20.x86_64. Clients: FC5, FC6, F7. No problem from FC2.
I'm seeing what appears to be the same bug on a brand new F9-beta install - openssh-clients-4.7p1-9.fc9.i386.
...and downgrading all the way to openssh-4.5p1-6.fc7.i386.rpm and openssl-0.9.8b-15.fc7.i686.rpm doesn't help.
Moving to Rawhide as per comment #19. Because I think we all know this isn't getting fixed in Fedora 7. :)
Changing version to '9' as part of upcoming Fedora 9 GA. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Do you still see the problem with the openssh-5.1p1 which is now in F9 updates?
I don't have any more CentOS4 / RHEL4 servers to test against. (Some of them I upgraded specifically to avoid this bug.) The problem definitely does not appear when connecting to a CentOS 5 or Fedora 9+ system. But I'm suspicious that the problem is still there.
Oh, weird. So, at my new job, we have a lot of machines running under VMware ESX 3.5. I still don't have a RHEL4 server, but the CentOS 5 (w/ latest patches) vm I am working on now exhibits the exact same problem when connecting with openssh-5.1p1-2.fc10.x86_64. Could be a red herring, but the symptoms are exactly the same (including similar network behavior when I watch with tcpdump). Even if the VMware issue I'm seeing now happens to be unrelated, I think this is serious enough that we want to *know* the problem is solved, not just hope that it happens to go away with a new release. This bug probably needs to get the attention of someone on the RHEL side of things before RHEL 6, because it's extremely likely that after that release you'll see a lot of this in shops with RHEL 4 and the new 6 release.
Ohhhhhh. Hmmm. For the connect-to-centos5-under-vmware problem, I can reproduce frequently *when iptables is enabled*. If I drop the local host firewall, there are no problems. I had tested with the firewall off on the server before but I don't think it had occurred to me to test the client. I have an entirely stock iptables configuration:
    $ sudo service iptables status
    Table: filter
    Chain INPUT (policy ACCEPT)
    num  target     prot opt source               destination
    1    ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           state RELATED,ESTABLISHED
    2    ACCEPT     icmp --  0.0.0.0/0            0.0.0.0/0
    3    ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
    4    ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW tcp dpt:22
    5    ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW tcp dpt:22
    6    REJECT     all  --  0.0.0.0/0            0.0.0.0/0           reject-with icmp-host-prohibited
    Chain FORWARD (policy ACCEPT)
    num  target     prot opt source               destination
    1    REJECT     all  --  0.0.0.0/0            0.0.0.0/0           reject-with icmp-host-prohibited
    Chain OUTPUT (policy ACCEPT)
    num  target     prot opt source               destination
Can someone seeing this problem connecting to a stock RHEL4 host test if the issue goes away if you stop iptables on your client machine?
I am really curious what triggers it, because it seems apparent that the problem is not just some kind of version incompatibility. To me it rather seems like certain conditions on the network connection or on the client or server machines trigger some kind of race condition between the server and client, and the client gets into a deadlock.
Previously, it mostly triggered when I was paging through e-mail in mutt. This time, it came up when I was running system-config-*-tui commands, and I was able to pretty reliably reproduce it using that. (I can't guarantee that this latest example is really the same bug, though.) However, it wasn't limited to full-screen apps -- it'd sometimes happen when doing an ls. It seems to mostly happen when there's a bunch of data dumped to the screen all at once — although it never appeared to happen when doing scp.
It seems to me that this could be an iptables connection tracking issue. I caught it while logging all dropped packets while I was having problems with NFS. Although this seems to happen on its own at times, I can force the same result by restarting iptables on the server with some running ssh connections. Iptables then starts to drop seemingly random packets from what should be established connections. Sometimes it's only dropping a percentage of packets, making the connection seem slow (this was the problem I had with NFS), but often drops so many that the connection is effectively frozen. My workaround has been to remove the NEW requirement for ssh connections, and accept all tcp on 22:
    -A RH-Firewall-1-INPUT -m tcp -p tcp --dport 22 -j ACCEPT
I haven't ever had ssh freeze after this change.
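To make the reasoning behind that workaround concrete, here is a toy model of first-match rule evaluation. It is not how netfilter is implemented, and it is only a sketch under assumptions: the `Rule`, `Packet` and `verdict` names are invented for illustration, the rule list is a simplified version of the stock INPUT chain quoted earlier in this thread, and the unconditional "ACCEPT all" rule from that listing is left out on the assumption that it is a loopback-only rule whose interface column is not shown.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Rule:
    target: str                               # "ACCEPT" or "REJECT"
    proto: Optional[str] = None               # None matches any protocol
    dport: Optional[int] = None               # None matches any destination port
    states: Optional[Tuple[str, ...]] = None  # None means no state match at all


@dataclass(frozen=True)
class Packet:
    proto: str
    dport: int
    state: str  # how connection tracking classified it: NEW, ESTABLISHED, RELATED, INVALID


def verdict(rules, pkt, policy="ACCEPT"):
    """Return the target of the first rule that matches, else the chain policy."""
    for rule in rules:
        if rule.proto is not None and rule.proto != pkt.proto:
            continue
        if rule.dport is not None and rule.dport != pkt.dport:
            continue
        if rule.states is not None and pkt.state not in rule.states:
            continue
        return rule.target
    return policy


# Simplified stock chain (assumption: loopback rule omitted, duplicate rule merged).
STOCK = [
    Rule("ACCEPT", states=("RELATED", "ESTABLISHED")),
    Rule("ACCEPT", proto="icmp"),
    Rule("ACCEPT", proto="tcp", dport=22, states=("NEW",)),
    Rule("REJECT"),
]

# Same chain with the NEW requirement removed from the ssh rule.
WORKAROUND = [
    Rule("ACCEPT", states=("RELATED", "ESTABLISHED")),
    Rule("ACCEPT", proto="icmp"),
    Rule("ACCEPT", proto="tcp", dport=22),
    Rule("REJECT"),
]

if __name__ == "__main__":
    # A mid-session ssh packet that connection tracking no longer recognizes.
    stray = Packet(proto="tcp", dport=22, state="INVALID")
    print("stock:     ", verdict(STOCK, stray))
    print("workaround:", verdict(WORKAROUND, stray))
```

In this model a port-22 packet that connection tracking classifies as INVALID matches neither the RELATED,ESTABLISHED rule nor the NEW rule, so it falls through to the final REJECT. Without the state requirement it is accepted. Whether the dropped packets in the real case were actually classified as INVALID is an assumption that the logs in this report do not confirm.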
This bug appears to have been reported against 'rawhide' during the Fedora 10 development cycle. Changing version to '10'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
This package has changed ownership in the Fedora Package Database. Reassigning to the new owner of this component.
I used to see this problem a lot when configuring my firewall using shorewall. The problem was that a router somewhere was generating out-of-frame packets, which by default get marked as INVALID and dropped. The fix in this case was:
    echo 1 > /proc/sys/net/netfilter/nf_conntrack_tcp_be_liberal
It's a long shot, but I thought it might be helpful to mention this.
*** Bug 453752 has been marked as a duplicate of this bug. ***
We just experienced a problem similar to Matt Miller's (Comment #9 From Matthew Miller 2007-09-06 15:55:16 EDT). The only difference is that we saw the problem when performing fast/large SCP and HTTPS transfers. To correct the problem, we changed the SACK setting (TCP selective acknowledgements, aka selective acks) on the "client" (the one doing the acks). SACK is an option in the TCP header and is set when a connection uses it. On the "client" side, you could run the following command:
    echo 0 >/proc/sys/net/ipv4/tcp_sack
On the network side, you would want to look for dropped packets at the firewall (maybe related to an unset SACK flag if you are using SACK), or if SACK is set in the network equipment you could disable it. I'm not sure what the developers of OpenSSH can do to fix this particular issue. The sessions appear to time out on the "server" (the one NOT doing the acks) side, so maybe if that could be explored, it might provide some insight. Your mileage may vary. Good luck.
This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle. Changing version to '11'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
(In reply to comment #34)
> We just experienced a similar problem as Matt Miller (Comment #9 From Matthew
> Miller 2007-09-06 15:55:16 EDT). The only difference is we saw the problem
> performing fast/large SCP and HTTPS transfers.
>
> To correct the problem, we changed the "client" (the one doing the acks)
> setting for SACK (aka. tcp selective acknowledgements, aka. selective acks).
> SACK is an option in the TCP Header and is set when a connection uses it.
>
> On the "client" side, you could run the following command:
> echo 0 >/proc/sys/net/ipv4/tcp_sack
>
> On the network side, you would want to look for dropped packets at the firewall
> (maybe related to an unset SACK flag if you are using SACK), or if SACK is set
> in the network equipment you could disable it.
>
> I'm not sure what the developers of OpenSSH can do to fix this particular
> issue. The sessions appear to timeout on the "server" (the one NOT doing the
> acks) side, so maybe if that could be explored, it might provide some insight.
>
> Your mileage may vary. Good luck.
Please try turning off window scaling:
    echo 0 >/proc/sys/net/ipv4/tcp_window_scaling
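Before flipping any of the settings suggested in this thread (tcp_sack, tcp_window_scaling, nf_conntrack_tcp_be_liberal), it may help to record their current values so a change can be undone and so reports from different machines can be compared. The following is a minimal read-only sketch. The `read_settings` helper and its `proc_sys` parameter are hypothetical names introduced here, and the file paths are the ones quoted in the comments above.

```python
from pathlib import Path

# Paths relative to /proc/sys, as mentioned in earlier comments on this bug.
SETTINGS = (
    "net/ipv4/tcp_sack",
    "net/ipv4/tcp_window_scaling",
    "net/netfilter/nf_conntrack_tcp_be_liberal",
)


def read_settings(proc_sys="/proc/sys"):
    """Return {setting: value-as-string}, or None where the file cannot be read.

    A missing file is not necessarily an error: the conntrack entry only
    exists when the connection tracking module is loaded.
    """
    values = {}
    for name in SETTINGS:
        try:
            values[name] = (Path(proc_sys) / name).read_text().strip()
        except OSError:
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_settings().items():
        shown = value if value is not None else "(unavailable)"
        print(f"{name} = {shown}")
```

The sketch only reads the values and does not need root. Writing new values, as in the echo commands above, does need root and affects every TCP connection on the machine, not just ssh.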
Fedora 11 changed to end-of-life (EOL) status on 2010-06-25. Fedora 11 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.