ssh-agent starts eating 100% CPU if it gets bombarded by connections and runs out of file descriptors. Looking at strace, it cycles in a select() loop: the listening AF_UNIX socket is reported active, so ssh-agent invokes accept(), which then fails with EMFILE. It immediately invokes select() again and stays in a busy loop from then on. I figure ssh-agent should enforce a limit on concurrent connections (much lower than RLIMIT_NOFILE) and quickly terminate further incoming connections once that limit is hit. Most internet-facing software handles it that way, and I figure ssh-agent should do the same for incoming local clients. I noticed this while creating a ton of ssh connections to my local system in a tight loop, which uses the ssh keyring. (When ssh-agent is in this state and you start further ssh instances with the & suffix in a shell (to background them), they also enter a busy loop handling SIGTTOU. I don't have further details about that, though; I was too lazy to figure out what is really going on there.)
That was supposed to say "starts eating 100% CPU"...
I was trying to overwhelm my virtual machine with a lot of requests to ssh-agent, but only with partial success. The behavior you describe sounds possible, though. My test case:

  eval `ulimit -n 10; ssh-agent`
  ssh-add rsa
  cat rsa.pub >> .ssh/authorized_keys
  for i in `seq 1 128`; do ssh localhost id & done
  ls /proc/$SSH_AGENT_PID/fd/ | wc -w

In some cases I am left with a few cycling ssh processes, but not with ssh-agent live-locked. This sounds like a reasonable feature for upstream. I will check, probably next week, what we can do here.
> Most internet software handles this that way.

sshd, as internet-facing software, handles it too, but ssh-agent is probably not considered internet-facing software since it does not face the internet directly. Parallel tests showed that it is quite hard to reach even the 10-FD limit with a single pair of VMs. Still, handling this case in a better way would make sense. The exact way of fixing this is up to the upstream developers. Opened upstream bug [1].

[1] https://bugzilla.mindrot.org/show_bug.cgi?id=2576
openssh-7.2p2-11.fc24 selinux-policy-3.13.1-191.8.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2016-99191c4aab
Reverting to NEW state. The BZ was switched to MODIFIED due to a wrong bodhi update.
This should be fixed in rawhide (upstream openssh-7.5 version).