Bug 1438662 - CI tests failed with ssh timeout error
Summary: CI tests failed with ssh timeout error
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 11.0 (Ocata)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: z1
Target Release: 11.0 (Ocata)
Assignee: Ihar Hrachyshka
QA Contact: Eran Kuris
URL:
Whiteboard:
Duplicates: 1433685 1433688 1433702 1433710 1433712 1438346
Depends On: 1450203 1450205
Blocks:
Reported: 2017-04-04 05:53 UTC by Eran Kuris
Modified: 2020-09-10 10:25 UTC
CC List: 8 users

Fixed In Version: openstack-neutron-10.0.1-4.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1457504
Environment:
Last Closed: 2017-07-19 17:03:14 UTC
Target Upstream Version:
Embargoed:


Attachments
log1 (97.43 KB, text/plain)
2017-04-04 05:59 UTC, Eran Kuris
log2 (42.63 KB, text/plain)
2017-04-04 06:01 UTC, Eran Kuris


Links
OpenStack gerrit 463816 (MERGED): Send both gratuitous ARP REQUESTs and REPLYs; last updated 2020-09-17 01:30:02 UTC
OpenStack gerrit 464020 (MERGED): Wait 2 seconds between gratuitous ARP updates instead of 1 second; last updated 2020-09-17 01:30:00 UTC
Red Hat Product Errata RHBA-2017:1785 (SHIPPED_LIVE, priority normal): openstack-neutron bug fix advisory; last updated 2017-07-19 21:00:36 UTC
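
Taken together, the titles of the two merged Gerrit changes describe a simple send pattern: each announcement round emits both a gratuitous ARP REQUEST and a REPLY, and rounds are spaced 2 seconds apart instead of 1. The Python sketch below only models that pattern as a plan of transmissions. It is not Neutron's code; `GarpSend`, `garp_schedule`, `count`, and `interval` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GarpSend:
    """One planned gratuitous ARP transmission (hypothetical model)."""
    at_seconds: float  # offset from the start of the announcement
    kind: str          # "REQUEST" or "REPLY"


def garp_schedule(count: int, interval: float = 2.0) -> List[GarpSend]:
    """Plan `count` rounds of gratuitous ARPs spaced `interval` seconds apart.

    Each round sends one REQUEST and one REPLY, so a receiver that only
    acts on one of the two opcodes still sees an update in every round.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    if interval <= 0:
        raise ValueError("interval must be positive")
    plan: List[GarpSend] = []
    for round_no in range(count):
        t = round_no * interval
        plan.append(GarpSend(t, "REQUEST"))
        plan.append(GarpSend(t, "REPLY"))
    return plan


if __name__ == "__main__":
    for send in garp_schedule(count=3):
        print(f"t={send.at_seconds:.0f}s {send.kind}")
```

With `count=3` this plans six transmissions at 0, 2, and 4 seconds, one REQUEST and one REPLY at each point.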

Description Eran Kuris 2017-04-04 05:53:53 UTC
Description of problem:
When running Tempest, the following tests fail:

1. tempest.scenario.test_network_advanced_server_ops.TestNetworkAdvancedServerOps.test_server_connectivity_stop_start

2. tempest.scenario.test_network_advanced_server_ops.TestNetworkAdvancedServerOps.test_server_connectivity_suspend_resume

3. tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_subnet_details


Version-Release number of selected component (if applicable):
python-neutron-lib-1.1.0-0.20170213120052.9b3ea8f.el7ost.noarch
openstack-neutron-ml2-10.0.0-5.el7ost.noarch
openstack-neutron-lbaas-10.0.1-0.20170222151526.c6011fb.el7ost.noarch
python-neutronclient-6.1.0-0.20170208193918.1a2820d.el7ost.noarch
openstack-neutron-common-10.0.0-5.el7ost.noarch
openstack-neutron-10.0.0-5.el7ost.noarch
openstack-neutron-openvswitch-10.0.0-5.el7ost.noarch
puppet-neutron-10.3.0-1.el7ost.noarch
python-neutron-10.0.0-5.el7ost.noarch
python-neutron-lbaas-10.0.1-0.20170222151526.c6011fb.el7ost.noarch

How reproducible:
Always

Steps to Reproduce:
1. Deploy RH-OSP 11 in HA mode (3 controllers, 2 compute nodes)
2. Run tempest network scenario tests


Actual results:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/tempest/test.py", line 103, in wrapper
    return f(self, *func_args, **func_kwargs)
  File "/usr/lib/python2.7/site-packages/tempest/scenario/test_network_advanced_server_ops.py", line 176, in test_server_connectivity_suspend_resume
    server, keypair, floating_ip)
  File "/usr/lib/python2.7/site-packages/tempest/scenario/test_network_advanced_server_ops.py", line 101, in _wait_server_status_and_check_network_connectivity
    self._check_network_connectivity(server, keypair, floating_ip)
  File "/usr/lib/python2.7/site-packages/tempest/scenario/test_network_advanced_server_ops.py", line 94, in _check_network_connectivity
    servers=[server])
  File "/usr/lib/python2.7/site-packages/tempest/scenario/manager.py", line 608, in check_public_network_connectivity
    mtu=mtu)
  File "/usr/lib/python2.7/site-packages/tempest/scenario/manager.py", line 591, in check_vm_connectivity
    msg=msg)
  File "/usr/lib/python2.7/site-packages/unittest2/case.py", line 678, in assertTrue
    raise self.failureException(msg)
AssertionError: False is not true : Timed out waiting for 10.0.0.211 to become reachable

Expected results:
All tests pass.

Additional info:
https://rhos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/RHOS/view/RHOS11/job/qe-DFG-neutron-11_director-rhel-7.3-virthost-3cont_2comp-ipv4-gre-lvm-lbaas/lastCompletedBuild/testReport/tempest.scenario.test_network_advanced_server_ops/TestNetworkAdvancedServerOps/test_server_connectivity_suspend_resume_compute_id_5cdf9499_541d_4923_804e_b9a60620a7f0_network_/

Comment 1 Eran Kuris 2017-04-04 05:59:07 UTC
The following tests fail with a different traceback, related to SSHTimeout:

1. tempest.scenario.test_network_v6.TestGettingAddress.test_dualnet_multi_prefix_dhcpv6_stateless

2. tempest.scenario.test_network_v6.TestGettingAddress.test_dualnet_dhcp6_stateless_from_os

3. tempest.scenario.test_network_v6.TestGettingAddress.test_multi_prefix_slaac

4. tempest.scenario.test_security_groups_basic_ops.TestSecurityGroupsBasicOps.test_cross_tenant_traffic

5. neutron.tests.tempest.scenario.test_basic.NetworkBasicTest.test_basic_instance

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/tempest/test.py", line 103, in wrapper
    return f(self, *func_args, **func_kwargs)
  File "/usr/lib/python2.7/site-packages/tempest/scenario/test_network_v6.py", line 246, in test_dualnet_dhcp6_stateless_from_os
    self._prepare_and_test(address6_mode='dhcpv6-stateless', dualnet=True)
  File "/usr/lib/python2.7/site-packages/tempest/scenario/test_network_v6.py", line 161, in _prepare_and_test
    sshv4_1, ips_from_api_1, sid1 = self.prepare_server(networks=net_list)
  File "/usr/lib/python2.7/site-packages/tempest/scenario/test_network_v6.py", line 134, in prepare_server
    username=username)
  File "/usr/lib/python2.7/site-packages/tempest/scenario/manager.py", line 351, in get_remote_client
    linux_client.validate_authentication()
  File "/usr/lib/python2.7/site-packages/tempest/common/utils/linux/remote_client.py", line 55, in wrapper
    six.reraise(*original_exception)
  File "/usr/lib/python2.7/site-packages/tempest/common/utils/linux/remote_client.py", line 36, in wrapper
    return function(self, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/tempest/common/utils/linux/remote_client.py", line 100, in validate_authentication
    self.ssh_client.test_connection_auth()
  File "/usr/lib/python2.7/site-packages/tempest/lib/common/ssh.py", line 206, in test_connection_auth
    connection = self._get_ssh_connection()
  File "/usr/lib/python2.7/site-packages/tempest/lib/common/ssh.py", line 120, in _get_ssh_connection
    password=self.password)
tempest.lib.exceptions.SSHTimeout: Connection to the 10.0.0.221 via SSH timed out.
User: cirros, Password: None


https://rhos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/RHOS/view/RHOS11/job/qe-DFG-neutron-11_director-rhel-7.3-virthost-1cont_2comp-ipv4-vxlan-lvm-lbaas/lastCompletedBuild/testReport/tempest.scenario.test_network_v6/TestGettingAddress/test_dualnet_dhcp6_stateless_from_os_compute_id_76f26acd_9688_42b4_bc3e_cd134c4cb09e_network_slow_/

Comment 2 Eran Kuris 2017-04-04 05:59:46 UTC
Created attachment 1268575
log1

Comment 3 Eran Kuris 2017-04-04 06:01:03 UTC
Created attachment 1268576
log2

Comment 4 Eran Kuris 2017-04-04 06:01:37 UTC
*** Bug 1433712 has been marked as a duplicate of this bug. ***

Comment 5 Eran Kuris 2017-04-04 06:02:11 UTC
*** Bug 1433710 has been marked as a duplicate of this bug. ***

Comment 6 Eran Kuris 2017-04-04 06:02:39 UTC
*** Bug 1433702 has been marked as a duplicate of this bug. ***

Comment 7 Eran Kuris 2017-04-04 06:03:02 UTC
*** Bug 1433688 has been marked as a duplicate of this bug. ***

Comment 8 Eran Kuris 2017-04-04 06:03:41 UTC
*** Bug 1433685 has been marked as a duplicate of this bug. ***

Comment 10 Ihar Hrachyshka 2017-05-11 19:55:52 UTC
I reported the following bugs against the kernel:

https://bugzilla.redhat.com/show_bug.cgi?id=1450203
https://bugzilla.redhat.com/show_bug.cgi?id=1450205

Comment 12 Ihar Hrachyshka 2017-05-15 14:40:38 UTC
*** Bug 1438346 has been marked as a duplicate of this bug. ***

Comment 13 Ihar Hrachyshka 2017-05-18 17:15:47 UTC
A workaround has been applied on the Neutron side. This should help the CI runs.

Comment 16 Ihar Hrachyshka 2017-05-24 01:00:46 UTC
The jobs still fail in the gate. We probably need gratuitous ARP to override existing ARP entries. With the current kernel, only gratuitous ARP REQUESTs do that, so we need to backport https://review.openstack.org/#/c/463816/, which sends both REQUESTs and REPLYs. We also need to set arp_accept = 1 on eth2 on the undercloud.
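
To make the REQUEST/REPLY distinction concrete, the Python sketch below builds both kinds of gratuitous ARP frame. In a gratuitous ARP the sender and target IP addresses are both the announced address, so in this sketch the two frames differ only in the ARP opcode field (1 for REQUEST, 2 for REPLY). This is an illustration, not Neutron's implementation: the MAC address is hypothetical, and setting the target hardware address to zeros is an assumption, since tools fill that field differently. The helper `arp_accept_sysctl_path` only computes the standard Linux per-interface sysctl path for `arp_accept`; actually writing `1` to it requires root and is left out.

```python
import socket
import struct

ETH_P_ARP = 0x0806          # EtherType for ARP
ARP_HTYPE_ETHERNET = 1      # hardware type: Ethernet
ARP_PTYPE_IPV4 = 0x0800     # protocol type: IPv4
ARP_REQUEST = 1
ARP_REPLY = 2
BROADCAST_MAC = b"\xff" * 6
ZERO_MAC = b"\x00" * 6


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    parts = mac.split(":")
    if len(parts) != 6:
        raise ValueError(f"invalid MAC address: {mac!r}")
    return bytes(int(part, 16) for part in parts)


def build_garp_frame(sender_mac: str, ip: str, opcode: int) -> bytes:
    """Build a broadcast Ethernet frame carrying a gratuitous ARP packet.

    Sender and target protocol addresses are both `ip`. The target
    hardware address is set to zeros (an assumption of this sketch).
    """
    if opcode not in (ARP_REQUEST, ARP_REPLY):
        raise ValueError("opcode must be ARP_REQUEST or ARP_REPLY")
    smac = mac_to_bytes(sender_mac)
    ip_bytes = socket.inet_aton(ip)
    # Ethernet header: destination MAC, source MAC, EtherType.
    eth_header = BROADCAST_MAC + smac + struct.pack("!H", ETH_P_ARP)
    # ARP header: htype, ptype, hardware length, protocol length, opcode.
    arp_header = struct.pack("!HHBBH", ARP_HTYPE_ETHERNET, ARP_PTYPE_IPV4,
                             6, 4, opcode)
    # Sender MAC, sender IP, target MAC, target IP (same IP for GARP).
    arp_body = smac + ip_bytes + ZERO_MAC + ip_bytes
    return eth_header + arp_header + arp_body


def arp_accept_sysctl_path(iface: str) -> str:
    """Path of the per-interface arp_accept sysctl on Linux."""
    return f"/proc/sys/net/ipv4/conf/{iface}/arp_accept"


if __name__ == "__main__":
    # Hypothetical MAC; the IP is the floating IP from the description.
    for op in (ARP_REQUEST, ARP_REPLY):
        frame = build_garp_frame("fa:16:3e:00:00:01", "10.0.0.211", op)
        print(op, len(frame), frame.hex())
    print(arp_accept_sysctl_path("eth2"))
```

Each frame is 42 bytes: a 14-byte Ethernet header followed by a 28-byte ARP packet, with the opcode at byte offsets 20 and 21.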

Comment 17 Ihar Hrachyshka 2017-05-31 21:00:38 UTC
This fix should be tested together with the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1457504 on the TripleO side.

Comment 24 errata-xmlrpc 2017-07-19 17:03:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1785

