Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1840905

Summary: [OVN][Migration] Timeout in reduce-dhcp-renewal-time.yml trying to connect to controller nodes using ssh
Product: Red Hat OpenStack
Reporter: Roman Safronov <rsafrono>
Component: python-networking-ovn
Assignee: Roman Safronov <rsafrono>
Status: CLOSED CURRENTRELEASE
QA Contact: Eran Kuris <ekuris>
Severity: high
Docs Contact:
Priority: high
Version: 16.0 (Train)
CC: apevec, jlibosva, lhh, majopela, scohen, tfreger
Target Milestone: ---
Keywords: AutomationBlocker, Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1845886
Environment:
Last Closed: 2020-08-10 12:49:12 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1845886

Description Roman Safronov 2020-05-27 20:25:23 UTC
Description of problem:

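The downstream CI ml2ovs -> ml2ovn migration job fails.

The playbook reduce-dhcp-renewal-time.yml tries to connect to all controller nodes over SSH, and the connection times out. See [0] below.

This happens because connecting by the short hostname does not work: a name such as controller-0 resolves, via /etc/hosts, to an address on the 172.17.1.0/24 (internal API) subnet, and SSH to that address times out. This used to work in the past, but it no longer does with OSP 16.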
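The lookup involved can be sketched as follows. This is a minimal illustration, assuming the undercloud resolves these names from /etc/hosts only (a real resolver may also consult DNS, depending on /etc/nsswitch.conf); it parses a few entries copied from [2] and reports which subnet each name lands on. The helpers `resolve_from_hosts` and `subnet_role` are hypothetical and are not part of the migration tool.

```python
import ipaddress

# A few entries copied from the undercloud's /etc/hosts (see [2]).
HOSTS_SAMPLE = """\
172.17.1.104 controller-0.redhat.local controller-0
172.17.1.104 controller-0.internalapi.redhat.local controller-0.internalapi
192.168.24.43 controller-0.ctlplane.redhat.local controller-0.ctlplane
"""

# Subnet roles as suggested by the host names listed in [2].
INTERNAL_API = ipaddress.ip_network("172.17.1.0/24")
CTLPLANE = ipaddress.ip_network("192.168.24.0/24")


def resolve_from_hosts(name, hosts_text):
    """Return the address of the first hosts line that lists `name`, or None."""
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # ignore comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        if name in names:
            return ipaddress.ip_address(address)
    return None


def subnet_role(address):
    """Label an address with the overcloud network it belongs to."""
    if address in INTERNAL_API:
        return "internal API"
    if address in CTLPLANE:
        return "ctlplane"
    return "other"


for host in ("controller-0", "controller-0.ctlplane"):
    addr = resolve_from_hosts(host, HOSTS_SAMPLE)
    print(f"{host} -> {addr} ({subnet_role(addr)})")
```

With the sample entries it prints `controller-0 -> 172.17.1.104 (internal API)` and `controller-0.ctlplane -> 192.168.24.43 (ctlplane)`: the short name and the `.ctlplane` name lead to different networks.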

(overcloud) [stack@undercloud-0 ~]$ ssh controller-0
^C
Result: not responding

Running the exact command that Ansible uses (the same one logged in [0]) gives the same result:
(overcloud) [stack@undercloud-0 ~]$ ssh   -o ControlMaster=auto -o ControlPersist=270s -o ServerAliveInterval=30 -o GSSAPIAuthentication=no -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User="heat-admin" -o ConnectTimeout=10 -o ControlPath=/home/stack/.ansible/cp/%h-%r controller-0 /bin/sh -c 'echo ~heat-admin && sleep 0'
Result: not responding


After checking /etc/hosts (see its content in [2] below), I found that an SSH connection to controller-0.ctlplane works:

(overcloud) [stack@undercloud-0 ~]$ ssh controller-0.ctlplane
Warning: Permanently added 'controller-0.ctlplane,192.168.24.43' (ECDSA) to the list of known hosts.
This system is not registered to Red Hat Insights. See https://cloud.redhat.com/
To register this system, run: insights-client --register

Last login: Wed May 27 19:27:10 2020 from 192.168.24.1
[heat-admin@controller-0 ~]$
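To tell a network problem apart from an SSH authentication problem, a plain TCP probe of port 22 is enough. The following is a minimal sketch, assuming Python 3 on the undercloud; `ssh_port_reachable` is a hypothetical helper, and its 10-second default only mirrors the `ConnectTimeout=10` option that Ansible passes in the command above.

```python
import socket


def ssh_port_reachable(host, port=22, timeout=10.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Only the TCP handshake is tested; no SSH authentication takes place.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers timeouts, refused connections and name-resolution errors
        return False
```

On the undercloud described here, `ssh_port_reachable("controller-0")` would be expected to return False and `ssh_port_reachable("controller-0.ctlplane")` to return True, matching the manual ssh attempts.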


See also the inventory file /home/stack/hosts_for_migration in [1] below.
It seems that the Ansible inventory file hosts_for_migration should be generated with the 'ctlplane' domain name for all hosts (e.g. controller-0.ctlplane), so that Ansible connects via addresses on the 192.168.24.0/24 subnet.
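A minimal sketch of that proposal, assuming the inventory keeps the `ansible_host=<name>` format shown in [1]: the hypothetical helper `use_ctlplane_names` appends `.ctlplane` to every bare `ansible_host` value. It is an illustration only, not code from the migration tool, and not necessarily how the issue was eventually fixed.

```python
import re

# Matches ansible_host=<value> when the value has no domain suffix yet.
ANSIBLE_HOST = re.compile(r"\bansible_host=([^\s.]+)(?=\s|$)")


def use_ctlplane_names(inventory_text, suffix=".ctlplane"):
    """Append `suffix` to every bare ansible_host value in an INI inventory."""
    return ANSIBLE_HOST.sub(lambda m: f"ansible_host={m.group(1)}{suffix}", inventory_text)


line = "controller-0 ansible_host=controller-0 ovn_central=true ansible_ssh_user=heat-admin"
print(use_ctlplane_names(line))
```

Only `ansible_host` changes, so the inventory host names and groups that the playbooks reference stay the same; values that already carry a suffix are left untouched, so running the helper twice is harmless.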



Version-Release number of selected component (if applicable):
Found on OSP 16, using puddle RHOS_TRUNK-16.0-RHEL-8-20200513.n.1
python3-networking-ovn-migration-tool-7.1.1-0.20200403214619.4114bc5.el8ost.noarch
python3-networking-ovn-7.1.1-0.20200403214619.4114bc5.el8ost.noarch


How reproducible:
100%

Steps to Reproduce:
1. Run downstream CI ml2ovs->ml2ovn migration job

Actual results:
Migration job fails on reduce-dhcp-renewal-time.yml

Expected results:
Migration job does not fail on reduce-dhcp-renewal-time.yml



Additional info:


[0]

from /home/stack/ovn_migration/setup-mtu-t1.log

<controller-2> ssh_retry: attempt: 3, ssh return code is 255. cmd ([b'ssh', b'-o', b'ControlMaster=auto', b'-o', b'ControlPersist=270s', b'-o', b'ServerAliveInterval=30', b'-o', b'GSSAPIAuthentication=no', b'-o', b'StrictHostKeyChecking=no', b'-o', b'KbdInteractiveAuthentication=no', b'-o', b'PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey', b'-o', b'PasswordAuthentication=no', b'-o', b'User="heat-admin"', b'-o', b'ConnectTimeout=10', b'-o', b'ControlPath=/home/stack/.ansible/cp/%h-%r', b'controller-2', b"/bin/sh -c 'echo ~heat-admin && sleep 0'"]...), pausing for 3 seconds
<controller-0> ssh_retry: attempt: 3, ssh return code is 255. cmd ([b'ssh', b'-o', b'ControlMaster=auto', b'-o', b'ControlPersist=270s', b'-o', b'ServerAliveInterval=30', b'-o', b'GSSAPIAuthentication=no', b'-o', b'StrictHostKeyChecking=no', b'-o', b'KbdInteractiveAuthentication=no', b'-o', b'PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey', b'-o', b'PasswordAuthentication=no', b'-o', b'User="heat-admin"', b'-o', b'ConnectTimeout=10', b'-o', b'ControlPath=/home/stack/.ansible/cp/%h-%r', b'controller-0', b"/bin/sh -c 'echo ~heat-admin && sleep 0'"]...), pausing for 3 seconds
<controller-1> ssh_retry: attempt: 3, ssh return code is 255. cmd ([b'ssh', b'-o', b'ControlMaster=auto', b'-o', b'ControlPersist=270s', b'-o', b'ServerAliveInterval=30', b'-o', b'GSSAPIAuthentication=no', b'-o', b'StrictHostKeyChecking=no', b'-o', b'KbdInteractiveAuthentication=no', b'-o', b'PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey', b'-o', b'PasswordAuthentication=no', b'-o', b'User="heat-admin"', b'-o', b'ConnectTimeout=10', b'-o', b'ControlPath=/home/stack/.ansible/cp/%h-%r', b'controller-1', b"/bin/sh -c 'echo ~heat-admin && sleep 0'"]...), pausing for 3 seconds
fatal: [controller-1]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host controller-1 port 22: Connection timed out", "unreachable": true}
fatal: [controller-2]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host controller-2 port 22: Connection timed out", "unreachable": true}
fatal: [controller-0]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host controller-0 port 22: Connection timed out", "unreachable": true}

PLAY RECAP *********************************************************************
controller-0               : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0   
controller-1               : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0   
controller-2               : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0   

Wednesday 27 May 2020  18:45:07 +0000 (0:00:44.123)       0:00:44.243 ********* 
=============================================================================== 
Gathering Facts -------------------------------------------------------- 44.12s
/home/stack/ovn_migration/playbooks/reduce-dhcp-renewal-time.yml:3 ------------






[1]

(overcloud) [stack@undercloud-0 ~]$ cat hosts_for_migration
[ovn-dbs]
controller-0 ansible_host=controller-0 ovn_central=true ansible_ssh_user=heat-admin ansible_become=true
controller-1 ansible_host=controller-1 ansible_ssh_user=heat-admin ansible_become=true
controller-2 ansible_host=controller-2 ansible_ssh_user=heat-admin ansible_become=true

[ovn-controllers]
controller-0 ansible_host=controller-0 ansible_ssh_user=heat-admin ansible_become=true ovn_controller=true
controller-1 ansible_host=controller-1 ansible_ssh_user=heat-admin ansible_become=true ovn_controller=true
controller-2 ansible_host=controller-2 ansible_ssh_user=heat-admin ansible_become=true ovn_controller=true
compute-0 ansible_host=compute-0 ansible_ssh_user=heat-admin ansible_become=true ovn_controller=true
compute-1 ansible_host=compute-1 ansible_ssh_user=heat-admin ansible_become=true ovn_controller=true


[overcloud-controllers:children]
ovn-dbs

[overcloud:children]
ovn-controllers
ovn-dbs


[overcloud:vars]
remote_user=heat-admin
public_network_name=nova
image_name=cirros
working_dir=/home/stack/ovn_migration
server_user_name=cirros
validate_migration=True
overcloud_ovn_deploy_script=/home/stack/overcloud-deploy-ovn.sh
overcloudrc=/home/stack/overcloudrc
ovn_migration_backups=/var/lib/ovn-migration-backup

[overcloud-controllers:vars]
remote_user=heat-admin
public_network_name=nova
image_name=cirros
working_dir=/home/stack/ovn_migration
server_user_name=cirros
validate_migration=True
overcloud_ovn_deploy_script=/home/stack/overcloud-deploy-ovn.sh
overcloudrc=/home/stack/overcloudrc
ovn_migration_backups=/var/lib/ovn-migration-backup





[2]

(overcloud) [stack@undercloud-0 ~]$ cat /etc/hosts
# BEGIN ANSIBLE MANAGED BLOCK
172.17.1.17 compute-0.redhat.local compute-0
172.17.3.43 compute-0.storage.redhat.local compute-0.storage
172.17.1.17 compute-0.internalapi.redhat.local compute-0.internalapi
172.17.2.122 compute-0.tenant.redhat.local compute-0.tenant
192.168.24.37 compute-0.ctlplane.redhat.local compute-0.ctlplane
172.17.1.98 compute-1.redhat.local compute-1
172.17.3.103 compute-1.storage.redhat.local compute-1.storage
172.17.1.98 compute-1.internalapi.redhat.local compute-1.internalapi
172.17.2.59 compute-1.tenant.redhat.local compute-1.tenant
192.168.24.27 compute-1.ctlplane.redhat.local compute-1.ctlplane
172.17.1.104 controller-0.redhat.local controller-0
172.17.3.94 controller-0.storage.redhat.local controller-0.storage
172.17.4.127 controller-0.storagemgmt.redhat.local controller-0.storagemgmt
172.17.1.104 controller-0.internalapi.redhat.local controller-0.internalapi
172.17.2.133 controller-0.tenant.redhat.local controller-0.tenant
10.0.0.144 controller-0.external.redhat.local controller-0.external
192.168.24.43 controller-0.ctlplane.redhat.local controller-0.ctlplane
172.17.1.59 controller-1.redhat.local controller-1
172.17.3.37 controller-1.storage.redhat.local controller-1.storage
172.17.4.117 controller-1.storagemgmt.redhat.local controller-1.storagemgmt
172.17.1.59 controller-1.internalapi.redhat.local controller-1.internalapi
172.17.2.33 controller-1.tenant.redhat.local controller-1.tenant
10.0.0.107 controller-1.external.redhat.local controller-1.external
192.168.24.17 controller-1.ctlplane.redhat.local controller-1.ctlplane
172.17.1.62 controller-2.redhat.local controller-2
172.17.3.44 controller-2.storage.redhat.local controller-2.storage
172.17.4.94 controller-2.storagemgmt.redhat.local controller-2.storagemgmt
172.17.1.62 controller-2.internalapi.redhat.local controller-2.internalapi
172.17.2.135 controller-2.tenant.redhat.local controller-2.tenant
10.0.0.121 controller-2.external.redhat.local controller-2.external
192.168.24.26 controller-2.ctlplane.redhat.local controller-2.ctlplane

Comment 4 Jakub Libosvar 2020-08-10 12:49:12 UTC
This bug is fixed with the fix for bug 1847463 in OSP 16.1