Bug 1333105 - ssh-agent enters busy loop when running out of fds
Summary: ssh-agent enters busy loop when running out of fds
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: openssh
Version: 24
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Jakub Jelen
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-05-04 16:28 UTC by Lennart Poettering
Modified: 2017-03-21 14:28 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-21 14:28:59 UTC
Type: Bug
Embargoed:




Links
System          | ID   | Private | Priority | Status | Summary | Last Updated
OpenSSH Project | 2576 | 0       | None     | None   | None    | 2016-05-30 11:38:14 UTC

Description Lennart Poettering 2016-05-04 16:28:09 UTC
ssh-agent starts eating 100% if it gets bombarded by connections and runs out of file descriptors. Looking at strace, it cycles in a select() loop: the listening AF_UNIX socket is reported as readable, so ssh-agent calls accept(), which then fails with EMFILE. It immediately calls select() again and stays in this busy loop from then on.
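
Because the loop starts only once the agent's descriptor table is full, one quick check is to compare how many descriptors the agent holds with its soft RLIMIT_NOFILE. A minimal sketch, assuming a Linux /proc filesystem; the function name fd_headroom is hypothetical, and reading another process's /proc entries may require root:

```shell
# Hypothetical helper (not part of OpenSSH): print "<open fds> <soft limit>"
# for a process, using Linux /proc. May need root for some processes.
fd_headroom() {
    pid=$1
    # One directory entry per open descriptor
    used=$(ls "/proc/$pid/fd" | wc -l)
    # In the "Max open files" row, field 4 is the soft limit, field 5 the hard one
    limit=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
    echo "$used $limit"
}
```

Usage: fd_headroom "$SSH_AGENT_PID". Once the first number has reached the second, every further accept() on the listening socket fails with EMFILE while the pending connection stays queued, so select() keeps reporting the socket as readable.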

I figure ssh-agent should enforce a limit on concurrent connections (that is much lower than RLIMIT_NOFILE) and quickly terminate further incoming connections when that limit is hit. Most internet software handles this that way, and I figure ssh-agent should do that too for incoming local clients.

I noticed this while creating a ton of ssh connections to my local system in a tight loop, each of which uses the ssh keyring.

(When ssh-agent is in this state and you start further ssh instances in the background with the & suffix in a shell, they also enter a busy loop handling SIGTTOU. I don't have further details about this, though; I was too lazy to figure out what is really going on there.)

Comment 1 Lennart Poettering 2016-05-04 16:29:13 UTC
That was supposed to say "starts eating 100% CPU"...

Comment 2 Jakub Jelen 2016-05-26 15:01:26 UTC
I tried to overload my virtual machine with a lot of requests to ssh-agent, with only partial success. The behavior you describe sounds plausible, though.

My test case:

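  # Start an agent whose fd limit is only 10 (ssh-agent inherits the limit)
  eval "$(ulimit -n 10; ssh-agent)"
  ssh-add rsa                              # load the test key into the agent
  cat rsa.pub >> ~/.ssh/authorized_keys    # allow key login to localhost
  # Fire 128 concurrent key-authenticated logins at the agent
  for i in $(seq 1 128); do ssh localhost id & done
  # Count the file descriptors the agent currently holds open
  ls /proc/$SSH_AGENT_PID/fd/ | wc -w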

In some cases I am left with a few spinning ssh processes, but never with a live-locked ssh-agent.
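
To tell a live-locked agent apart from one that is merely busy, it helps to measure how much CPU it burns over a few seconds. A minimal sketch, assuming Linux /proc and a whole-second interval; the function names proc_ticks and cpu_percent are hypothetical:

```shell
# Hypothetical helpers (not part of OpenSSH), Linux /proc only.
# proc_ticks PID: utime + stime of PID, in clock ticks.
proc_ticks() {
    # Strip "pid (comm) " so numbering does not depend on the command name;
    # utime and stime (fields 14 and 15 of /proc/PID/stat) become $12 and $13.
    sed 's/.*) //' "/proc/$1/stat" | awk '{print $12 + $13}'
}

# cpu_percent PID [SECONDS]: average CPU use of PID over a whole-second interval.
cpu_percent() {
    pid=$1
    secs=${2:-1}
    t0=$(proc_ticks "$pid")
    sleep "$secs"
    t1=$(proc_ticks "$pid")
    hz=$(getconf CLK_TCK)
    echo $(( (t1 - t0) * 100 / (hz * secs) ))
}
```

Usage: cpu_percent "$SSH_AGENT_PID" 5. An agent stuck in the select()/accept() loop from the description should report a value close to 100, while an idle agent should stay near 0.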

This sounds like a reasonable feature for upstream. I will probably look next week at what we can do here.

Comment 3 Jakub Jelen 2016-05-30 11:38:15 UTC
> Most internet software handles this that way.

sshd, as internet-facing software, handles it too, but ssh-agent is probably not considered internet software since it does not face the internet directly.

Parallel tests showed that it is quite hard to reach even the 10-fd limit with a single pair of VMs. Still, handling this case better would make sense.

The exact fix is probably up to the upstream developers. I opened an upstream bug [1].

[1] https://bugzilla.mindrot.org/show_bug.cgi?id=2576

Comment 4 Fedora Update System 2016-07-27 10:40:16 UTC
openssh-7.2p2-11.fc24 selinux-policy-3.13.1-191.8.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2016-99191c4aab

Comment 5 Lukas Vrabec 2016-07-27 10:52:09 UTC
Reverting to NEW. The bug was switched to MODIFIED by mistake, due to an incorrect Bodhi update.

Comment 6 Jakub Jelen 2017-03-21 14:28:59 UTC
This should be fixed in Rawhide (upstream OpenSSH 7.5).

