Bug 1468868

Summary: CI tests failed with ssh timeout error when ovs-security-group is enabled
Product: Red Hat OpenStack
Reporter: Eran Kuris <ekuris>
Component: openstack-neutron
Assignee: Jakub Libosvar <jlibosva>
Status: CLOSED DUPLICATE
QA Contact: Toni Freger <tfreger>
Severity: high
Priority: high
Version: 11.0 (Ocata)
CC: amuller, chrisw, nyechiel, srevivo
Target Milestone: ---
Keywords: Triaged
Target Release: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Last Closed: 2017-11-16 17:23:47 UTC
Type: Bug

Description Eran Kuris 2017-07-09 06:17:26 UTC
Description of problem:
The SSH timeout errors reproduce in our CI job only when the job is configured to use OVS security groups.

While debugging with Ihar (dev), we found the following:

1. https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/network/view/neutron-lbaas/job/DFG-network-neutron-lbaas-11_director-7.3-virthost-3cont_2comp-ipv4-vxlan-ovs-secgroups-with-custom-guest-image/10/testReport/tempest.scenario.test_network_v6/TestGettingAddress/test_dualnet_dhcp6_stateless_from_os_compute_id_76f26acd_9688_42b4_bc3e_cd134c4cb09e_network_slow_/

The test reports a connectivity failure, but the tempest log shows that it successfully reached the node via SSH:

2017-07-04 05:25:57,102 21550 INFO     [paramiko.transport] Connected (version 2.0, client OpenSSH_6.6.1)
2017-07-04 05:25:57,221 21550 INFO     [paramiko.transport] Authentication (publickey) successful!
2017-07-04 05:25:57,253 21550 INFO     [tempest.lib.common.ssh] ssh connection to cloud-user.0.212 successfully created

The test verifies connectivity by logging in via SSH, running 'ip address', and checking that the expected address appears in the output.
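A minimal sketch of that output check, assuming the 'ip address' output has already been captured over SSH. The helper name address_in_ip_output is hypothetical; this is not tempest's actual code.

```python
import ipaddress
import re

# Matches lines such as "inet 10.0.0.5/24" and "inet6 2001:db8::5/64" in `ip address` output.
_ADDR_RE = re.compile(r"^\s*inet6?\s+([0-9a-fA-F:.]+)/\d+", re.MULTILINE)


def address_in_ip_output(ip_output: str, expected: str) -> bool:
    """Return True if `expected` is configured on any interface in `ip address` output."""
    want = ipaddress.ip_address(expected)
    for match in _ADDR_RE.finditer(ip_output):
        try:
            # Comparing parsed addresses normalizes IPv6 notation (e.g. "::" vs ":0:0:").
            if ipaddress.ip_address(match.group(1)) == want:
                return True
        except ValueError:
            # Skip tokens that look like addresses but do not parse.
            continue
    return False
```

Parsing the addresses instead of doing a plain substring search avoids false positives such as 10.0.0.21 matching inside 10.0.0.212, and lets equivalent IPv6 spellings compare equal, which matters for the DHCPv6 test above.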

2. https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/network/view/neutron-lbaas/job/DFG-network-neutron-lbaas-11_director-7.3-virthost-3cont_2comp-ipv4-vxlan-ovs-secgroups-with-custom-guest-image/10/testReport/neutron.tests.tempest.scenario.test_qos/QoSTest/test_qos_id_1f7ed39b_428f_410a_bd2b_db9f465680df_/

Again, the tempest output shows that SSH connectivity is OK, but the test still fails to connect to the target address to run the bandwidth measurement.

3. https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/network/view/neutron-lbaas/job/DFG-network-neutron-lbaas-11_director-7.3-virthost-3cont_2comp-ipv4-vxlan-ovs-secgroups-with-custom-guest-image/10/testReport/neutron_lbaas.tests.tempest.v2.scenario.test_listener_basic/TestListenerBasic/test_listener_basic_compute_network_/

This one explicitly reports a timeout, but again SSH connectivity is fine; it just seems that the 'nc' listener started in the instance cannot be reached.
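One hypothetical way to narrow this down (not part of the tempest test) is to probe the instance port directly and distinguish a refused connection, meaning packets reach the guest but nothing listens on the port, from a timeout, which is consistent with packets being silently dropped somewhere on the path. The function name probe_tcp is illustrative only.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify a TCP connect attempt as 'open', 'refused', 'timeout' or 'error'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # A RST came back: the host is reachable but nothing listens on the port.
        return "refused"
    except socket.timeout:
        # No answer at all: consistent with packets being dropped on the path.
        return "timeout"
    except OSError:
        # Anything else, e.g. "no route to host".
        return "error"
```

The host and port would be the instance address and the port the test's 'nc' listens on; both are placeholders here, since the report does not give them.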

4. https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/network/view/neutron-lbaas/job/DFG-network-neutron-lbaas-11_director-7.3-virthost-3cont_2comp-ipv4-vxlan-ovs-secgroups-with-custom-guest-image/10/testReport/tempest.scenario.test_network_basic_ops/TestNetworkBasicOps/test_update_instance_port_admin_state_compute_id_f5dfcc22_45fd_409f_954c_5bd500d7890b_network_slow_/

The failure is in SSH ("Error reading SSH protocol banner"), but note that pinging the address succeeded. The console log also shows: "[   20.541718] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready"


Full test report for this build:
https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/network/view/neutron-lbaas/job/DFG-network-neutron-lbaas-11_director-7.3-virthost-3cont_2comp-ipv4-vxlan-ovs-secgroups-with-custom-guest-image/10/testReport/

Version-Release number of selected component (if applicable):
python-neutron-10.0.2-1.el7ost.noarch
openstack-neutron-openvswitch-10.0.2-1.el7ost.noarch
puppet-neutron-10.3.1-1.el7ost.noarch
python-neutron-tests-10.0.2-1.el7ost.noarch
openstack-neutron-ml2-10.0.2-1.el7ost.noarch
openstack-neutron-10.0.2-1.el7ost.noarch
How reproducible:
always

Steps to Reproduce:
1. Run the job: https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/network/view/neutron-lbaas/job/DFG-network-neutron-lbaas-11_director-7.3-virthost-3cont_2comp-ipv4-vxlan-ovs-secgroups-with-custom-guest-image/

Comment 1 Assaf Muller 2017-07-17 18:27:45 UTC
We should get an OVS-FW job stable for OSP 12.

Comment 2 Jakub Libosvar 2017-11-16 17:23:47 UTC

*** This bug has been marked as a duplicate of bug 1508738 ***