My end (client): openssh-6.9p1-1.fc22.x86_64
Other end (server): CentOS 7 with openssh-6.6.1p1-12.el7_1.x86_64, although the same behavior was seen with F22 on the other end.
I have a 99.99999% reproducer (I think it might have worked once?). I'm running an Ansible playbook which makes extensive use of ssh, in particular with ControlMaster=auto.
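As a rough mental model (a simplification, not OpenSSH's actual code), ControlMaster=auto means: if something is already listening on the ControlPath Unix socket, act as a mux client through it; otherwise become the master. The sketch below models only that first check, and the helper names are hypothetical. Note that a master which accepts the connection but never answers would still look healthy to this check.

```python
import socket


def master_is_listening(control_path: str, timeout: float = 2.0) -> bool:
    """Return True if some process accepts connections on the Unix socket at control_path.

    Hypothetical helper: it only checks that connect() succeeds, not that the
    master actually speaks the mux protocol afterwards.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(control_path)
        return True
    except OSError:
        # Missing socket file, stale socket with no listener (ECONNREFUSED), or timeout.
        return False
    finally:
        sock.close()


def control_master_auto_role(control_path: str) -> str:
    """Simplified model of ControlMaster=auto: reuse a live master or become one."""
    return "client" if master_is_listening(control_path) else "master"
```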
Every time the playbook runs, ssh hangs on the exact same play. Once it is hung, the hang is easy to demonstrate and repeat: simply executing the following will hang forever:
ssh -o ControlMaster=auto -o ControlPath="/home/eparis/.ansible/cp/ansible-ssh-%h-%p-%r" email@example.com
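To turn "hangs forever" into a scriptable pass/fail check, one option is to run the same command under a timeout. This is a sketch under assumptions: the helper names are hypothetical, the host is the redacted one from the command above, and the trailing remote command `true` is an addition (the original opens an interactive shell) so that a healthy run exits on its own.

```python
import subprocess


def repro_command(host: str = "email@example.com") -> list[str]:
    """The hanging command from the report as an argument list (no shell quoting needed)."""
    return [
        "ssh",
        "-o", "ControlMaster=auto",
        "-o", "ControlPath=/home/eparis/.ansible/cp/ansible-ssh-%h-%p-%r",
        host,
        "true",  # assumption: trivial remote command instead of an interactive shell
    ]


def finishes_within(cmd: list[str], seconds: float) -> bool:
    """Return True if cmd exits (with any status) before the timeout, False if it hangs.

    subprocess.run kills the child when the timeout expires and raises
    TimeoutExpired, which is treated here as "hung".
    """
    try:
        subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=seconds,
        )
        return True
    except subprocess.TimeoutExpired:
        return False
```

With a stuck mux master, `finishes_within(repro_command(), 30)` would be expected to return False; the 30-second budget is arbitrary.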
Looking at ps | grep ssh, I see:
root 3285 1 0 Jul11 ? 00:00:00 /usr/sbin/sshd -D
eparis 12629 1 0 19:27 ? 00:00:01 ssh: /home/eparis/.ansible/cp/ansible-ssh-192.168.121.83-22-vagrant [mux]
root 17156 3285 0 17:54 ? 00:00:00 sshd: eparis [priv]
eparis 17161 17156 0 17:54 ? 00:00:00 sshd: eparis@pts/0
eparis 18777 1 0 19:42 pts/0 00:00:00 ssh -C -tt -q -o UserKnownHostsFile=/dev/null -o ControlMaster=auto -o ControlPersist=10s -o ControlPath="/home/eparis/.ansible/cp/ansible-ssh-%h-%p-%r" -o StrictHostKeyChecking=no -o Port=22 -o IdentityFile="/home/eparis/.vagrant.d/insecure_private_key" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=vagrant -o ConnectTimeout=10 192.168.121.83 $SHELL -c 'sudo -k && sudo -H -S -p "[SNIP HUGE LONG LINE]"
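The mux socket name in the ps output above follows from the ControlPath tokens: per ssh_config(5), %h is the remote host, %p the port and %r the remote user name, so `ansible-ssh-%h-%p-%r` became `ansible-ssh-192.168.121.83-22-vagrant`. The sketch below handles only those three tokens plus `%%` (ssh supports more); it is an illustration, not ssh's own expansion code.

```python
import re


def expand_control_path(template: str, host: str, port: int, user: str) -> str:
    """Expand the %h, %p, %r and %% tokens of an ssh ControlPath template.

    Only these four tokens are handled here; any other %x token raises ValueError.
    """
    values = {"h": host, "p": str(port), "r": user, "%": "%"}

    def substitute(match: re.Match) -> str:
        token = match.group(1)
        if token not in values:
            raise ValueError(f"unsupported ControlPath token: %{token}")
        return values[token]

    # Scan left to right so "%%h" yields a literal "%h" rather than the host.
    return re.sub(r"%(.)", substitute, template)
```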
The reproducer should be roughly as follows:
yum install -y ansible vagrant-libvirt
git clone https://github.com/eparis/kubernetes.git
git reset --hard origin/total-ansible
edit group_vars/all.yml and change
vagrant up --no-provision
Wait until it hangs on `TASK: node | Get the node token values`. This takes about 5 minutes; I usually start it, get bored, and walk away.
I'm very happy to work with you to get the reproducer to trigger the indefinite hang. Within Red Hat, I'd be willing to set up a box for you with it already configured.
I have just downgraded to
and then to
and both could be reliably locked up
I'm now testing
We have a similar report against RHEL 7, Bug #1240613 (internal), and I have not yet fixed it in Fedora. However, the behaviour there is a little different from what you describe, so maybe there is another race condition in this particular case. I will have a look, but to start with, it would be helpful to see verbose logs from the server, or at least from the client, to get some idea of where it hangs.
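One way to gather the client-side log requested above is to rerun the hanging command with `-vvv` (maximum client debug verbosity), send stderr, where ssh writes its debug output, to a file, and kill the client after a deadline. This is a sketch: the helper names, the log file name and the timeout are arbitrary assumptions.

```python
import subprocess
from pathlib import Path


def verbose_ssh_command(ssh_args: list[str]) -> list[str]:
    """Prepend `ssh -vvv`; the debug messages are written to stderr."""
    return ["ssh", "-vvv", *ssh_args]


def run_with_stderr_log(cmd: list[str], log_file: Path, seconds: float,
                        tail: int = 20) -> tuple[bool, list[str]]:
    """Run cmd with stderr redirected to log_file, killing it after `seconds`.

    Returns (finished, last_lines): finished is False when the timeout fired,
    and last_lines are the final `tail` lines of the log, which is where the
    last step before a hang would show up.
    """
    finished = True
    with log_file.open("w") as log:
        try:
            subprocess.run(cmd, stdin=subprocess.DEVNULL,
                           stdout=subprocess.DEVNULL, stderr=log,
                           timeout=seconds)
        except subprocess.TimeoutExpired:
            finished = False
    return finished, log_file.read_text().splitlines()[-tail:]
```

For this report, the arguments would be the ControlMaster/ControlPath options and host from the command near the top, e.g. `run_with_stderr_log(verbose_ssh_command(args), Path("client.log"), 60)`; the full client.log can then be attached.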
Created attachment 1051873
journalctl -u sshd
I updated the sshd server with a scratch build from the above-mentioned bug and the problem went away.
The attached file is the journalctl -u sshd logs after I added LogLevel DEBUG3 and restarted sshd but BEFORE I updated to the scratch build.
The log after updating is not attached, but I have it as well.
Updating F22 server with http://koji.fedoraproject.org/koji/taskinfo?taskID=10354323 also resolves the problem.
It appears as though you have fixed it! Yay!
Thank you for the feedback. The build is on its way.
openssh-6.9p1-2.fc22 has been submitted as an update for Fedora 22.
openssh-6.9p1-2.fc22 has been pushed to the Fedora 22 stable repository. If problems still persist, please make note of it in this bug report.