Bug 1333105

Summary: ssh-agent enters busy loop when running out of fds
Product: [Fedora] Fedora
Component: openssh
Version: 24
Status: CLOSED RAWHIDE
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Reporter: Lennart Poettering <lpoetter>
Assignee: Jakub Jelen <jjelen>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: jjelen, mattias.ellert, mgrepl, plautrba, tmraz
Doc Type: Bug Fix
Type: Bug
Last Closed: 2017-03-21 14:28:59 UTC

Description Lennart Poettering 2016-05-04 16:28:09 UTC
ssh-agent starts eating 100% if it gets bombarded by connections, and runs out of file descriptors to use. Looking at strace, it starts to cycle in a select() loop, where the listening AF_UNIX socket is reported active, which makes ssh-agent invoke accept() which will then fail with EMFILE. It will then immediately invoke select() again, and be in a busy loop from then on.
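
The cycle described above can be reproduced outside ssh-agent. The following is a minimal Python sketch, not ssh-agent's code: the function name `count_emfile_wakeups`, the socket file name and the limit of 64 descriptors are all made up for illustration. It queues one client on an AF_UNIX listener, uses up the remaining file descriptors, and then counts how often select() reports the listener readable while accept() fails with EMFILE.

```python
import errno
import os
import resource
import select
import socket
import tempfile


def count_emfile_wakeups(rounds=5, fd_limit=64):
    """Count select() wakeups whose accept() fails with EMFILE.

    fd_limit is an arbitrary illustrative value, not an ssh-agent setting.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY or soft > fd_limit:
        low = fd_limit
    else:
        low = soft
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "agent.sock")  # hypothetical socket name
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    hogs = []
    failures = 0
    try:
        listener.bind(path)
        listener.listen(8)
        client.connect(path)  # sits in the backlog until accepted
        resource.setrlimit(resource.RLIMIT_NOFILE, (low, hard))
        try:
            while True:  # use up every remaining descriptor
                hogs.append(os.dup(listener.fileno()))
        except OSError as exc:
            if exc.errno != errno.EMFILE:
                raise
        for _ in range(rounds):
            readable = select.select([listener], [], [], 1.0)[0]
            if not readable:
                break
            try:
                conn, _addr = listener.accept()
            except OSError as exc:
                if exc.errno != errno.EMFILE:
                    raise
                # Nothing was dequeued, so the next select() returns at once.
                failures += 1
            else:
                conn.close()
                break
    finally:
        for fd in hogs:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
        client.close()
        listener.close()
        if os.path.exists(path):
            os.unlink(path)
        os.rmdir(tmpdir)
    return failures
```

Every round fails the same way because the pending connection never leaves the backlog. A loop without the `rounds` bound would spin at full speed, which matches the strace observation.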

I figure ssh-agent should enforce a limit on concurrent connections (that is much lower than RLIMIT_NOFILE) and quickly terminate further incoming connections when that limit is hit. Most internet software handles this that way, and I figure ssh-agent should do that too for incoming local clients.
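
One way to picture the proposed limit is the Python sketch below. It illustrates the general technique only and does not show how OpenSSH handles this: `MAX_CLIENTS`, `accept_with_cap` and `demo` are hypothetical names, and the cap of 4 is arbitrary. Connections over the cap are accepted and closed immediately, which drains the backlog so that select() stops reporting the listener as readable.

```python
import os
import select
import socket
import tempfile

MAX_CLIENTS = 4  # hypothetical cap, meant to sit far below RLIMIT_NOFILE


def accept_with_cap(listener, clients, max_clients=MAX_CLIENTS):
    """Accept one pending connection and keep it only if under the cap."""
    conn, _addr = listener.accept()
    if len(clients) >= max_clients:
        conn.close()  # reject quickly; the peer sees end-of-file
        return False
    clients.append(conn)
    return True


def demo(n_incoming=6, max_clients=MAX_CLIENTS):
    """Queue n_incoming local clients, return (kept, rejected, eof_seen)."""
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "agent.sock")  # hypothetical socket name
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    peers, clients = [], []
    kept = rejected = eof_seen = 0
    try:
        listener.bind(path)
        listener.listen(16)
        for _ in range(n_incoming):
            peer = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            peer.connect(path)  # queued in the backlog
            peers.append(peer)
        # Drain the backlog; the loop ends once select() times out.
        while select.select([listener], [], [], 0.1)[0]:
            if accept_with_cap(listener, clients, max_clients):
                kept += 1
            else:
                rejected += 1
        # Connections are accepted in arrival order, so the peers past
        # the cap are the rejected ones; each should read end-of-file.
        for peer in peers[max_clients:]:
            peer.settimeout(1.0)
            if peer.recv(1) == b"":
                eof_seen += 1
    finally:
        for sock in peers + clients + [listener]:
            sock.close()
        if os.path.exists(path):
            os.unlink(path)
        os.rmdir(tmpdir)
    return kept, rejected, eof_seen
```

Accepting and then closing needs one spare descriptor for a moment, which is one reason for keeping the cap well below RLIMIT_NOFILE.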

I noticed this while creating a ton of ssh connections to my local system in a tight loop, with authentication going through the ssh keyring.

(When ssh-agent is in this mode and you start further ssh instances with the & suffix in a shell (to run them in the background), they will also enter a busy loop handling SIGTTOU. I don't have further details about this, though; I was too lazy to figure out what is really going on there.)

Comment 1 Lennart Poettering 2016-05-04 16:29:13 UTC
That was supposed to say "starts eating 100% CPU"...

Comment 2 Jakub Jelen 2016-05-26 15:01:26 UTC
I tried to overload my virtual machine with a lot of requests to ssh-agent, but with only partial success. Still, the behavior you describe sounds plausible.

My test case:

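  # Start ssh-agent with a limit of 10 open file descriptors
  eval `ulimit -n 10; ssh-agent`
  # Load the key and authorize it for logins to localhost
  ssh-add rsa
  cat rsa.pub >> .ssh/authorized_keys
  # Launch 128 concurrent ssh sessions that authenticate via the agent
  for i in `seq 1 128`; do ssh localhost id & done
  # Count the file descriptors the agent currently has open
  ls /proc/$SSH_AGENT_PID/fd/ | wc -w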

In some cases I am left with a few busy-looping ssh processes, but ssh-agent itself does not live-lock.

This sounds like a reasonable feature for upstream. I will check, probably next week, what we can do here.

Comment 3 Jakub Jelen 2016-05-30 11:38:15 UTC
> Most internet software handles this that way.

sshd, as internet-facing software, handles it too, but ssh-agent is probably not considered internet software since it does not face the internet directly.

Parallel tests showed that it is quite hard to reach even the 10-FD limit with a single pair of VMs. Still, handling this case better would make sense.

The exact way of fixing this is probably up to the upstream developers. Opened upstream bug [1].

[1] https://bugzilla.mindrot.org/show_bug.cgi?id=2576

Comment 4 Fedora Update System 2016-07-27 10:40:16 UTC
openssh-7.2p2-11.fc24 selinux-policy-3.13.1-191.8.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2016-99191c4aab

Comment 5 Lukas Vrabec 2016-07-27 10:52:09 UTC
Reverting to NEW state. The BZ was switched to MODIFIED due to a wrong Bodhi update.

Comment 6 Jakub Jelen 2017-03-21 14:28:59 UTC
This should be fixed in rawhide (upstream openssh-7.5 version).