1427025 – Randomly hit "Timeout (12s) waiting for privilege escalation prompt" when installing with non-root user

Keywords:
Status:	CLOSED INSUFFICIENT_DATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Installer
Sub Component:
Version:	3.5.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Scott Dodson
QA Contact:	Gan Huang
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2017-02-27 06:37 UTC by Gan Huang
Modified:	2017-03-03 08:12 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2017-03-03 08:12:23 UTC
Target Upstream Version:
Embargoed:

Description Gan Huang 2017-02-27 06:37:23 UTC

Description of problem:
Randomly hit "Timeout (12s) waiting for privilege escalation prompt" when installing with non-root user

Version-Release number of selected component (if applicable):
ansible-2.2.1.0-2.el7.noarch
openshift-ansible-3.5.15-1.git.0.8d2a456.el7.noarch

Ansible Host:
# uname -r
3.10.0-514.2.2.el7.x86_64
# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.3 (Maipo)

The hosts to be installed:
# atomic host status
State: idle
Deployments:
● rhel-atomic-host:rhel-atomic-host/7/x86_64/standard
       Version: 7.3.3 (2017-02-23 22:16:59)
        Commit: fbeed59bb47b14e32a6b28e13aaa1cad96e88188930a5bf880f949728b7f36ea
        OSName: rhel-atomic-host


How reproducible:
sometimes

Steps to Reproduce:
1. Trigger an installation.

# cat inventory_hosts

<--snip-->
[OSEv3:vars]
ansible_ssh_user=cloud-user
ansible_become=yes
<--snip-->

Actual results:
The installer may fail at different tasks:

1)##########################
TASK [os_firewall : Remove firewalld allow rules] ******************************

TASK [os_firewall : Ensure firewalld service is not enabled] *******************
ok: [openshift-136.lab.sjc.redhat.com] => {
    "changed": false, 
    "failed": false, 
    "failed_when_result": false
}

MSG:

Could not find the requested service firewalld: cannot mask

ok: [openshift-147.lab.sjc.redhat.com] => {
    "changed": false, 
    "failed": false, 
    "failed_when_result": false
}

MSG:

Could not find the requested service firewalld: cannot mask

fatal: [openshift-137.lab.sjc.redhat.com]: FAILED! => {
    "failed": true
}

MSG:

Timeout (12s) waiting for privilege escalation prompt: 


NO MORE HOSTS LEFT *************************************************************
	to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/config.retry

PLAY RECAP *********************************************************************

2)##########################
TASK [etcd : Install etcd container service file] ******************************
task path: /usr/share/ansible/openshift-ansible/roles/etcd/tasks/main.yml:49
<--snip-->
<openshift-147.lab.sjc.redhat.com> ESTABLISH SSH CONNECTION FOR USER: cloud-user
<openshift-147.lab.sjc.redhat.com> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o 'IdentityFile="/home/slave4/workspace/Launch-Environment-Flexy/private/config/keys/libra.pem"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=cloud-user -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/ansible-ssh-%h-%p-%r openshift-147.lab.sjc.redhat.com '/bin/sh -c '"'"'chmod u+x /home/cloud-user/.ansible/tmp/ansible-tmp-1488164379.53-269179720745828/ /home/cloud-user/.ansible/tmp/ansible-tmp-1488164379.53-269179720745828/stat.py && sleep 0'"'"''
<openshift-147.lab.sjc.redhat.com> ESTABLISH SSH CONNECTION FOR USER: cloud-user
<openshift-147.lab.sjc.redhat.com> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o 'IdentityFile="/home/slave4/workspace/Launch-Environment-Flexy/private/config/keys/libra.pem"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=cloud-user -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/ansible-ssh-%h-%p-%r -tt openshift-147.lab.sjc.redhat.com '/bin/sh -c '"'"'sudo -H -S -n -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-fpiujtgkgmslktrpyfgryucyxyhvubnw; /usr/bin/python /home/cloud-user/.ansible/tmp/ansible-tmp-1488164379.53-269179720745828/stat.py'"'"'"'"'"'"'"'"' && sleep 0'"'"''
fatal: [openshift-147.lab.sjc.redhat.com]: FAILED! => {
    "changed": false, 
    "failed": true, 
    "invocation": {
        "module_args": {
            "dest": "/etc/systemd/system/etcd_container.service", 
            "src": "etcd.docker.service"
        }, 
        "module_name": "template"
    }
}

MSG:

Timeout (12s) waiting for privilege escalation prompt: 


NO MORE HOSTS LEFT *************************************************************
	to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/config.retry

3)#########################
TASK [openshift_excluder : Determine if docker packages are installed] *********
ok: [openshift-147.lab.sjc.redhat.com]
ok: [openshift-136.lab.sjc.redhat.com]
ok: [openshift-106.lab.sjc.redhat.com]
ok: [openshift-127.lab.sjc.redhat.com]
ok: [openshift-138.lab.sjc.redhat.com]
fatal: [openshift-137.lab.sjc.redhat.com]: FAILED! => {
    "failed": true
}

MSG:

Timeout (12s) waiting for privilege escalation prompt: 

Expected results:
No errors.

Additional info:
Similar issue: https://github.com/ansible/ansible/issues/14426

Comment 1 Scott Dodson 2017-02-27 13:39:34 UTC

Can you try the suggested workaround from that issue? Put this in /etc/ansible/ansible.cfg:

[defaults]
timeout = 30
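
For anyone applying this workaround across several Ansible hosts, here is a minimal sketch (not part of the original comment) that sets the `timeout` key in the `[defaults]` section of an ansible.cfg-style file using Python's standard `configparser`. The helper name `set_ansible_timeout` is hypothetical. The 12s in the error message appears consistent with the default timeout of 10 seconds plus a small margin, which is an assumption rather than something confirmed in this report.

```python
import configparser


def set_ansible_timeout(cfg_path, seconds=30):
    """Set [defaults] timeout in an ansible.cfg-style INI file.

    Hypothetical helper for illustration; other keys are preserved,
    but comments in the file are not.
    """
    # interpolation=None so that literal '%' characters in values are left alone
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(cfg_path)  # a missing file is silently skipped
    if not parser.has_section("defaults"):
        parser.add_section("defaults")
    parser.set("defaults", "timeout", str(seconds))
    with open(cfg_path, "w") as handle:
        parser.write(handle)
    return parser.getint("defaults", "timeout")
```

Calling `set_ansible_timeout("/etc/ansible/ansible.cfg", 30)` would write the same setting as above. Because `configparser` drops comments when it rewrites a file, editing the file by hand may be preferable for a heavily commented config.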

Comment 2 Gan Huang 2017-02-28 10:15:50 UTC

My OCP environment was recreated, and I can no longer reproduce this issue with the default Ansible config. Over the next few days I'll try to recreate the environment from comment 0, and I'll close this bug if the issue still can't be reproduced. Lowering the severity and priority.

Comment 3 Scott Dodson 2017-02-28 16:26:46 UTC

Cool, thanks for the feedback.

Comment 4 Gan Huang 2017-03-03 08:12:23 UTC

I can only reproduce the issue in the same environment as comment 0.

After applying the suggestion in comment 1, the issue went away. It seems to be a network issue.

Let's close this for now, as I don't have sufficient data.