Bug 1242682 - connections with controlpersist/controlmaster reliably hang forever
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: openssh
Version: 22
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Jakub Jelen
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-07-14 00:01 UTC by Eric Paris
Modified: 2015-07-29 07:38 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-07-29 07:38:08 UTC


Attachments
journalctl -u sshd (21.69 KB, text/plain)
2015-07-14 14:33 UTC, Eric Paris

Description Eric Paris 2015-07-14 00:01:29 UTC
My end: openssh-6.9p1-1.fc22.x86_64

The other end is CentOS 7 (openssh-6.6.1p1-12.el7_1.x86_64), although the same behavior was seen with F22 on the other end.


I have a 99.99999% reproducer (I think it might have worked once?). I'm running an Ansible playbook that makes extensive use of ssh, in particular with ControlMaster=auto.

Every time the playbook runs, ssh hangs on the exact same play. Once it is hung, the hang is easy to demonstrate and repeat: simply executing the following will hang forever:

ssh -o ControlMaster=auto -o ControlPath="/home/eparis/.ansible/cp/ansible-ssh-%h-%p-%r" vagrant@192.168.121.83
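
A hang like this can be confirmed without tying up the terminal by running the command under a time limit. A minimal sketch, assuming GNU coreutils timeout(1) is available; the helper name probe is hypothetical and not part of the original report:

```shell
# probe LIMIT CMD...: run CMD under timeout(1) and report whether it
# finished or had to be killed. timeout exits with 124 when the limit hits.
probe() {
    limit=$1
    shift
    rc=0
    timeout "$limit" "$@" >/dev/null 2>&1 || rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "hung"
    else
        echo "exit=$rc"
    fi
}
```

For example, `probe 15 ssh -o ControlMaster=auto -o ControlPath="/home/eparis/.ansible/cp/ansible-ssh-%h-%p-%r" vagrant@192.168.121.83 true` would be expected to print hung while the mux master is wedged. The 15-second limit and the remote `true` command are arbitrary choices.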

If I look at the output of ps | grep ssh, I see:

root      3285     1  0 Jul11 ?        00:00:00 /usr/sbin/sshd -D
eparis   12629     1  0 19:27 ?        00:00:01 ssh: /home/eparis/.ansible/cp/ansible-ssh-192.168.121.83-22-vagrant [mux]
root     17156  3285  0 17:54 ?        00:00:00 sshd: eparis [priv]
eparis   17161 17156  0 17:54 ?        00:00:00 sshd: eparis@pts/0
eparis   18777     1  0 19:42 pts/0    00:00:00 ssh -C -tt -q -o UserKnownHostsFile=/dev/null -o ControlMaster=auto -o ControlPersist=10s -o ControlPath="/home/eparis/.ansible/cp/ansible-ssh-%h-%p-%r" -o StrictHostKeyChecking=no -o Port=22 -o IdentityFile="/home/eparis/.vagrant.d/insecure_private_key" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=vagrant -o ConnectTimeout=10 192.168.121.83 $SHELL -c 'sudo -k && sudo -H -S -p "[SNIP HUGE LONG LINE]"
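
The [mux] entry in a listing like the one above is the persistent control master. A small sketch for picking such processes out of the listing, assuming `ps -ef` style columns (PID in the second field) and a control path without spaces; the helper name list_mux_masters is hypothetical:

```shell
# list_mux_masters: read `ps -ef` style lines on stdin and print
# "PID CONTROL_PATH" for every ssh control master (title ends in "[mux]").
list_mux_masters() {
    awk '$NF == "[mux]" { print $2, $(NF-1) }'
}
```

Typical use would be `ps -ef | list_mux_masters`. A master found this way can be asked to exit with `ssh -O exit -o ControlPath=<path> <host>`, although a wedged master may not respond to that request.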

The reproducer should be roughly as follows:

yum install -y ansible vagrant-libvirt
git clone https://github.com/eparis/kubernetes.git
cd kubernetes
git reset --hard origin/total-ansible
edit group_vars/all.yml and change
  source_type: localBuild
to
  source_type: packageManager
cd contrib/ansible/vagrant
export VAGRANT_DEFAULT_PROVIDER=libvirt
vagrant up --no-provision
vagrant provision

Wait until it hangs on `TASK: node | Get the node token values`. That takes 5ish minutes or so; I usually start it, get bored, and walk away.

I'm very happy to work with you to get the reproducer to reproduce the indefinite hang. Internal to Red Hat I'd be willing to get you a box where I set it up.

Comment 1 Eric Paris 2015-07-14 00:22:03 UTC
I have just downgraded to
openssh-6.8p1-8.fc22
and then to
openssh-6.7p1-11.fc22
and both could be reliably locked up

I'm now testing 
6.6.1p1-11.1.fc22

Comment 2 Jakub Jelen 2015-07-14 06:09:41 UTC
We have a similar report against RHEL7, Bug #1240613 (internal), and I have not yet fixed it in Fedora. However, the behaviour there is a little different from what you describe, so there may be another race condition in this special case. I will have a look at this, but to start with, it would be helpful to see verbose logs from the server, or at least from the client, to get some idea of where it hangs.
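
One way to see where the client stalls is to timestamp its debug output, so the last line before the gap marks the hang. A rough sketch under stated assumptions: the helper name stamp_stderr is hypothetical, the command's stdout is discarded, and `ssh -vvv` is used for the client's most verbose debug level:

```shell
# stamp_stderr LOG CMD...: run CMD, prefix each stderr line with the wall-clock
# time, and write the result to LOG. stdout of CMD is thrown away.
stamp_stderr() {
    log=$1
    shift
    # "2>&1 >/dev/null" sends stderr into the pipe and stdout to /dev/null.
    { "$@" 2>&1 >/dev/null; } | while IFS= read -r line; do
        printf '%s %s\n' "$(date +%H:%M:%S)" "$line"
    done >"$log"
}
```

It could be invoked as `stamp_stderr client.log ssh -vvv -o ControlMaster=auto -o ControlPath="/home/eparis/.ansible/cp/ansible-ssh-%h-%p-%r" vagrant@192.168.121.83 true`, interrupting with Ctrl-C once it hangs. The log is written line by line, so it survives the interrupt. Note that a client attaching to an existing master may log less than a fresh connection would.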

Comment 3 Eric Paris 2015-07-14 14:33:01 UTC
Created attachment 1051873
journalctl -u sshd

I updated the sshd server with a scratch build from the above-mentioned bug and the problem went away.

The attached file contains the journalctl -u sshd logs taken after I added LogLevel DEBUG3 and restarted sshd, but BEFORE I updated to the scratch build.
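
For anyone collecting the same server-side logs, the LogLevel change can be sketched as below. This assumes GNU sed and an sshd_config-style file that does not end inside a Match block; the helper name set_loglevel is hypothetical:

```shell
# set_loglevel FILE LEVEL: set the LogLevel directive in an sshd_config-style
# file, rewriting an existing (possibly commented-out) line or appending one.
set_loglevel() {
    file=$1
    level=$2
    if grep -Eq '^[[:space:]]*#?[[:space:]]*LogLevel[[:space:]]' "$file"; then
        sed -i -E "s/^[[:space:]]*#?[[:space:]]*LogLevel[[:space:]].*/LogLevel $level/" "$file"
    else
        # Assumption: appending at the end is safe (no trailing Match block).
        printf 'LogLevel %s\n' "$level" >>"$file"
    fi
}
```

Run as root against the server's configuration (usually /etc/ssh/sshd_config), for example `set_loglevel /etc/ssh/sshd_config DEBUG3`, then restart sshd with `systemctl restart sshd` and collect the output with `journalctl -u sshd`.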

The log after updating is not attached, but I have it as well.

Comment 4 Eric Paris 2015-07-14 17:02:04 UTC
Updating F22 server with http://koji.fedoraproject.org/koji/taskinfo?taskID=10354323 also resolves the problem.

It appears as though you have fixed it!  Yay!

Comment 5 Jakub Jelen 2015-07-15 07:47:07 UTC
Thank you for the feedback. A build is on its way.

Comment 6 Fedora Update System 2015-07-16 06:16:52 UTC
openssh-6.9p1-2.fc22 has been submitted as an update for Fedora 22.
https://admin.fedoraproject.org/updates/openssh-6.9p1-2.fc22

Comment 7 Fedora Update System 2015-07-21 08:10:04 UTC
openssh-6.9p1-2.fc22 has been pushed to the Fedora 22 stable repository.  If problems still persist, please make note of it in this bug report.

