Bug 1438662

Summary: CI tests failed with ssh timeout error
Product: Red Hat OpenStack Reporter: Eran Kuris <ekuris>
Component: openstack-neutron Assignee: Ihar Hrachyshka <ihrachys>
Status: CLOSED ERRATA QA Contact: Eran Kuris <ekuris>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 11.0 (Ocata) CC: abregman, afazekas, amuller, chrisw, jjoyce, nyechiel, oblaut, srevivo
Target Milestone: z1 Keywords: AutomationBlocker, Triaged, ZStream
Target Release: 11.0 (Ocata)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-neutron-10.0.1-4.el7ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1457504 Environment:
Last Closed: 2017-07-19 17:03:14 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1450203, 1450205    
Bug Blocks:    
Attachments:
Description  Flags
log1         none
log2         none

Description Eran Kuris 2017-04-04 05:53:53 UTC
Description of problem:
 
When running tempest, the following tests fail:

1. tempest.scenario.test_network_advanced_server_ops.TestNetworkAdvancedServerOps.test_server_connectivity_stop_start

2. tempest.scenario.test_network_advanced_server_ops.TestNetworkAdvancedServerOps.test_server_connectivity_suspend_resume

3. tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_subnet_details


Version-Release number of selected component (if applicable):
python-neutron-lib-1.1.0-0.20170213120052.9b3ea8f.el7ost.noarch
openstack-neutron-ml2-10.0.0-5.el7ost.noarch
openstack-neutron-lbaas-10.0.1-0.20170222151526.c6011fb.el7ost.noarch
python-neutronclient-6.1.0-0.20170208193918.1a2820d.el7ost.noarch
openstack-neutron-common-10.0.0-5.el7ost.noarch
openstack-neutron-10.0.0-5.el7ost.noarch
openstack-neutron-openvswitch-10.0.0-5.el7ost.noarch
puppet-neutron-10.3.0-1.el7ost.noarch
python-neutron-10.0.0-5.el7ost.noarch
python-neutron-lbaas-10.0.1-0.20170222151526.c6011fb.el7ost.noarch

How reproducible:
always

Steps to Reproduce:
1. Deploy RH-OSP 11 HA (3 controllers, 2 compute nodes)
2. Run tempest network scenario tests


Actual results:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/tempest/test.py", line 103, in wrapper
    return f(self, *func_args, **func_kwargs)
  File "/usr/lib/python2.7/site-packages/tempest/scenario/test_network_advanced_server_ops.py", line 176, in test_server_connectivity_suspend_resume
    server, keypair, floating_ip)
  File "/usr/lib/python2.7/site-packages/tempest/scenario/test_network_advanced_server_ops.py", line 101, in _wait_server_status_and_check_network_connectivity
    self._check_network_connectivity(server, keypair, floating_ip)
  File "/usr/lib/python2.7/site-packages/tempest/scenario/test_network_advanced_server_ops.py", line 94, in _check_network_connectivity
    servers=[server])
  File "/usr/lib/python2.7/site-packages/tempest/scenario/manager.py", line 608, in check_public_network_connectivity
    mtu=mtu)
  File "/usr/lib/python2.7/site-packages/tempest/scenario/manager.py", line 591, in check_vm_connectivity
    msg=msg)
  File "/usr/lib/python2.7/site-packages/unittest2/case.py", line 678, in assertTrue
    raise self.failureException(msg)
AssertionError: False is not true : Timed out waiting for 10.0.0.211 to become reachable
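
The assertion above is raised once a connectivity check gives up polling the floating IP. The sketch below isolates that poll-until-deadline pattern; it is not tempest's own code, `wait_until_reachable` is a hypothetical name, and in a real run the `probe` callable would be something like a single ping to 10.0.0.211.

```python
import time
from typing import Callable


def wait_until_reachable(probe: Callable[[], bool], timeout: float,
                         interval: float = 1.0) -> bool:
    """Call probe() until it returns True or `timeout` seconds elapse.

    Returns True on success and False once the deadline passes, which is
    the point where a test would fail with "Timed out waiting for ...".
    """
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        # Wait before the next probe so a slow-to-converge path gets time to recover.
        time.sleep(interval)
```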

Expected results:
All tests pass.

Additional info:
https://rhos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/RHOS/view/RHOS11/job/qe-DFG-neutron-11_director-rhel-7.3-virthost-3cont_2comp-ipv4-gre-lvm-lbaas/lastCompletedBuild/testReport/tempest.scenario.test_network_advanced_server_ops/TestNetworkAdvancedServerOps/test_server_connectivity_suspend_resume_compute_id_5cdf9499_541d_4923_804e_b9a60620a7f0_network_/

Comment 1 Eran Kuris 2017-04-04 05:59:07 UTC
The following tests fail with a different traceback, related to an SSH timeout (SSHTimeout):

1. tempest.scenario.test_network_v6.TestGettingAddress.test_dualnet_multi_prefix_dhcpv6_stateless

2. tempest.scenario.test_network_v6.TestGettingAddress.test_dualnet_dhcp6_stateless_from_os

3. tempest.scenario.test_network_v6.TestGettingAddress.test_multi_prefix_slaac

4. tempest.scenario.test_security_groups_basic_ops.TestSecurityGroupsBasicOps.test_cross_tenant_traffic

5. neutron.tests.tempest.scenario.test_basic.NetworkBasicTest.test_basic_instance

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/tempest/test.py", line 103, in wrapper
    return f(self, *func_args, **func_kwargs)
  File "/usr/lib/python2.7/site-packages/tempest/scenario/test_network_v6.py", line 246, in test_dualnet_dhcp6_stateless_from_os
    self._prepare_and_test(address6_mode='dhcpv6-stateless', dualnet=True)
  File "/usr/lib/python2.7/site-packages/tempest/scenario/test_network_v6.py", line 161, in _prepare_and_test
    sshv4_1, ips_from_api_1, sid1 = self.prepare_server(networks=net_list)
  File "/usr/lib/python2.7/site-packages/tempest/scenario/test_network_v6.py", line 134, in prepare_server
    username=username)
  File "/usr/lib/python2.7/site-packages/tempest/scenario/manager.py", line 351, in get_remote_client
    linux_client.validate_authentication()
  File "/usr/lib/python2.7/site-packages/tempest/common/utils/linux/remote_client.py", line 55, in wrapper
    six.reraise(*original_exception)
  File "/usr/lib/python2.7/site-packages/tempest/common/utils/linux/remote_client.py", line 36, in wrapper
    return function(self, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/tempest/common/utils/linux/remote_client.py", line 100, in validate_authentication
    self.ssh_client.test_connection_auth()
  File "/usr/lib/python2.7/site-packages/tempest/lib/common/ssh.py", line 206, in test_connection_auth
    connection = self._get_ssh_connection()
  File "/usr/lib/python2.7/site-packages/tempest/lib/common/ssh.py", line 120, in _get_ssh_connection
    password=self.password)
tempest.lib.exceptions.SSHTimeout: Connection to the 10.0.0.221 via SSH timed out.
User: cirros, Password: None
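
The SSHTimeout above means no SSH session to the floating IP could be established within the timeout. As a hedged illustration (not tempest's implementation), the sketch below only checks whether TCP port 22 on the guest accepts connections before a deadline, which can help separate a network-path failure from an SSH authentication problem. `wait_for_tcp_port` is a hypothetical helper.

```python
import socket
import time


def wait_for_tcp_port(host: str, port: int = 22, timeout: float = 60.0,
                      interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False after timeout."""
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        try:
            # Bound each attempt by the remaining budget so the total wait stays near `timeout`.
            with socket.create_connection((host, port), timeout=min(remaining, 5.0)):
                return True
        except OSError:
            # Covers refused connections and per-attempt timeouts (socket.timeout is an OSError).
            time.sleep(min(interval, max(deadline - time.monotonic(), 0.0)))
```

For example, `wait_for_tcp_port("10.0.0.221", 22, timeout=120)` returning False would suggest the problem lies in the network path rather than in the cirros credentials.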


https://rhos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/RHOS/view/RHOS11/job/qe-DFG-neutron-11_director-rhel-7.3-virthost-1cont_2comp-ipv4-vxlan-lvm-lbaas/lastCompletedBuild/testReport/tempest.scenario.test_network_v6/TestGettingAddress/test_dualnet_dhcp6_stateless_from_os_compute_id_76f26acd_9688_42b4_bc3e_cd134c4cb09e_network_slow_/

Comment 2 Eran Kuris 2017-04-04 05:59:46 UTC
Created attachment 1268575 [details]
log1

Comment 3 Eran Kuris 2017-04-04 06:01:03 UTC
Created attachment 1268576 [details]
log2

Comment 4 Eran Kuris 2017-04-04 06:01:37 UTC
*** Bug 1433712 has been marked as a duplicate of this bug. ***

Comment 5 Eran Kuris 2017-04-04 06:02:11 UTC
*** Bug 1433710 has been marked as a duplicate of this bug. ***

Comment 6 Eran Kuris 2017-04-04 06:02:39 UTC
*** Bug 1433702 has been marked as a duplicate of this bug. ***

Comment 7 Eran Kuris 2017-04-04 06:03:02 UTC
*** Bug 1433688 has been marked as a duplicate of this bug. ***

Comment 8 Eran Kuris 2017-04-04 06:03:41 UTC
*** Bug 1433685 has been marked as a duplicate of this bug. ***

Comment 10 Ihar Hrachyshka 2017-05-11 19:55:52 UTC
I reported the following bugs against kernel:

https://bugzilla.redhat.com/show_bug.cgi?id=1450203
https://bugzilla.redhat.com/show_bug.cgi?id=1450205

Comment 12 Ihar Hrachyshka 2017-05-15 14:40:38 UTC
*** Bug 1438346 has been marked as a duplicate of this bug. ***

Comment 13 Ihar Hrachyshka 2017-05-18 17:15:47 UTC
A workaround has been applied on the neutron side. This should help CI runs.

Comment 16 Ihar Hrachyshka 2017-05-24 01:00:46 UTC
The jobs still fail in the gate. We probably need gratuitous ARP to override existing entries, but with the current kernel only ARP REQUESTs will do that, so we need to backport https://review.openstack.org/#/c/463816/. We then also need to set arp_accept = 1 on eth2 on the undercloud.
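
The comment above hinges on the difference between gratuitous ARP REQUESTs (opcode 1) and REPLYs (opcode 2); arp_accept is the per-interface sysctl (for example /proc/sys/net/ipv4/conf/eth2/arp_accept) that lets gratuitous ARP frames create new ARP table entries. As an illustration only, the sketch below builds both frame variants byte for byte so the opcode difference is visible. `gratuitous_arp` is a hypothetical helper, the MAC and IP values in any call are placeholders, and actually sending the frame (for example through a raw AF_PACKET socket, which needs root) is deliberately left out.

```python
import ipaddress
import struct

ETH_P_ARP = 0x0806   # EtherType for ARP
ARP_REQUEST = 1
ARP_REPLY = 2


def mac_bytes(mac: str) -> bytes:
    """Convert "aa:bb:cc:dd:ee:ff" into 6 raw bytes."""
    parts = mac.split(":")
    if len(parts) != 6:
        raise ValueError(f"bad MAC address: {mac!r}")
    return bytes(int(part, 16) for part in parts)


def gratuitous_arp(mac: str, ip: str, op: int = ARP_REQUEST) -> bytes:
    """Build a broadcast Ethernet frame carrying a gratuitous ARP for `ip` at `mac`.

    In a gratuitous ARP the sender and target protocol addresses are both `ip`.
    """
    src = mac_bytes(mac)
    addr = ipaddress.IPv4Address(ip).packed
    broadcast = b"\xff" * 6
    eth_header = struct.pack("!6s6sH", broadcast, src, ETH_P_ARP)
    # Requests leave the target hardware address zeroed; for replies this sketch
    # uses broadcast, a common convention rather than a requirement.
    tha = b"\x00" * 6 if op == ARP_REQUEST else broadcast
    # htype=1 (Ethernet), ptype=0x0800 (IPv4), hlen=6, plen=4, then the opcode.
    arp = struct.pack("!HHBBH6s4s6s4s", 1, 0x0800, 6, 4, op, src, addr, tha, addr)
    return eth_header + arp
```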

Comment 17 Ihar Hrachyshka 2017-05-31 21:00:38 UTC
This fix should be tested together with the tripleo-side fix for https://bugzilla.redhat.com/show_bug.cgi?id=1457504.

Comment 24 errata-xmlrpc 2017-07-19 17:03:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1785