Bug 1565055 - Functional tests fail on L3HATestFailover
Summary: Functional tests fail on L3HATestFailover
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 11.0 (Ocata)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: z5
Target Release: 11.0 (Ocata)
Assignee: Bernard Cafarelli
QA Contact: Toni Freger
URL:
Whiteboard:
Depends On:
Blocks: 1567493
Reported: 2018-04-09 09:24 UTC by Bernard Cafarelli
Modified: 2018-05-18 16:56 UTC
CC List: 4 users

Fixed In Version: openstack-neutron-10.0.4-8.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1567493
Environment:
Last Closed: 2018-05-18 16:56:11 UTC
Target Upstream Version:


Attachments
journalctl log while running failing tests (336.06 KB, text/plain)
2018-04-18 12:44 UTC, Bernard Cafarelli


Links
Launchpad 1674780 (last updated 2018-04-23 10:31:24 UTC)
Red Hat Product Errata RHBA-2018:1614 (last updated 2018-05-18 16:56:45 UTC)

Description Bernard Cafarelli 2018-04-09 09:24:07 UTC
Description of problem:

neutron.tests.functional.agent.l3.test_ha_router.L3HATestFailover.test_both_ha_router_lost_gw_connection may fail with a timeout
neutron.tests.functional.agent.l3.test_ha_router.L3HATestFailover.test_ha_router_lost_gw_connection may fail with a MismatchError

Version-Release number of selected component (if applicable): latest OSP 11


How reproducible: intermittent; recent test runs show one of these two failures
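Since both failures are intermittent, one way to quantify reproducibility is to rerun a single test in a loop and count the failing runs. This Python 3 sketch is hypothetical: `count_failures` is an illustrative helper, not part of Neutron, and invoking a test through `python -m testtools.run` is an assumption about the local setup (Neutron functional tests usually need a dedicated environment with tox and sudo rights).

```python
import subprocess


def count_failures(cmd, runs):
    """Run `cmd` `runs` times and return how many runs exited non-zero.

    Hypothetical helper for measuring how often an intermittent test fails.
    """
    failures = 0
    for _ in range(runs):
        # Discard output; only the exit status matters here.
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        # Any non-zero exit status is counted as a failed run.
        if result.returncode != 0:
            failures += 1
    return failures
```

For example, `count_failures([sys.executable, "-m", "testtools.run", "neutron.tests.functional.agent.l3.test_ha_router.L3HATestFailover.test_ha_router_lost_gw_connection"], 20)` would report how many of 20 runs failed. Relying only on the exit status keeps the helper independent of the test runner's output format.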

Comment 5 Bernard Cafarelli 2018-04-18 12:44:16 UTC
More tests:
* upstream stable/ocata devstack on centos 7.4: tests do not fail (left running all night)
* osp 11 installed with packstack on rhel 7.4 + checkout of rhos-11.0-patches: tests do not fail
* on one of the CI nodes (rhel 7.5): tests now fail almost every time; sample output with both the timeout and the mismatch below

neutron.tests.functional.agent.l3.test_ha_router.L3HATestFailover.test_ha_router_lost_gw_connection
---------------------------------------------------------------------------------------------------

Captured traceback:
~~~~~~~~~~~~~~~~~~~
    Traceback (most recent call last):
      File "neutron/tests/base.py", line 117, in func
        return f(self, *args, **kwargs)
      File "neutron/tests/functional/agent/l3/test_ha_router.py", line 391, in test_ha_router_lost_gw_connection
        self.assertEqual(master_router, new_slave)
      File "/usr/lib/python2.7/site-packages/testtools/testcase.py", line 350, in assertEqual
        self.assertThat(observed, matcher, message)
      File "/usr/lib/python2.7/site-packages/testtools/testcase.py", line 435, in assertThat
        raise mismatch_error
    testtools.matchers._impl.MismatchError: !=:
    reference = <neutron.agent.l3.ha_router.HaRouter object at 0x7f79d481d9d0>
    actual    = <neutron.agent.l3.ha_router.HaRouter object at 0x7f79d46ad210>
    
    

Captured stderr:
~~~~~~~~~~~~~~~~
    neutron/agent/ovsdb/native/connection.py:116: DeprecationWarning: Using function/method 'Connection._idl_factory()' is deprecated in version 'Ocata' and will be removed in version 'Pike': Use an idl_factory function instead
      self.idl = self.idl_factory()
    neutron/agent/ovsdb/native/connection.py:98: DeprecationWarning: Using function/method 'Connection.get_schema_helper()' is deprecated in version 'Ocata' and will be removed in version 'Pike': Use idlutils.get_schema_helper(conn, schema, retry=True)
      helper = self.get_schema_helper()
    neutron/agent/ovsdb/native/connection.py:99: DeprecationWarning: Using function/method 'Connection.update_schema_helper()' is deprecated in version 'Ocata' and will be removed in version 'Pike': Use an idl_factory and ovs.db.SchemaHelper for filtering
      self.update_schema_helper(helper)
    neutron/common/utils.py:804: DeprecationWarning: Raising eventlet.TimeoutError by default has been deprecated in version 'Ocata' and will be removed in version 'Pike': wait_until_true() now raises WaitTimeout error by default.
      removal_version="Pike")
    

neutron.tests.functional.agent.l3.test_ha_router.L3HATestFailover.test_both_ha_router_lost_gw_connection
--------------------------------------------------------------------------------------------------------

Captured traceback:
~~~~~~~~~~~~~~~~~~~
    Traceback (most recent call last):
      File "neutron/tests/base.py", line 119, in func
        self.fail('Execution of this test timed out: %s' % e)
      File "/usr/lib/python2.7/site-packages/unittest2/case.py", line 666, in fail
        raise self.failureException(msg)
    AssertionError: Execution of this test timed out: Timed out after 60 seconds
    

Captured stderr:
~~~~~~~~~~~~~~~~
    neutron/agent/ovsdb/native/connection.py:116: DeprecationWarning: Using function/method 'Connection._idl_factory()' is deprecated in version 'Ocata' and will be removed in version 'Pike': Use an idl_factory function instead
      self.idl = self.idl_factory()
    neutron/agent/ovsdb/native/connection.py:98: DeprecationWarning: Using function/method 'Connection.get_schema_helper()' is deprecated in version 'Ocata' and will be removed in version 'Pike': Use idlutils.get_schema_helper(conn, schema, retry=True)
      helper = self.get_schema_helper()
    neutron/agent/ovsdb/native/connection.py:99: DeprecationWarning: Using function/method 'Connection.update_schema_helper()' is deprecated in version 'Ocata' and will be removed in version 'Pike': Use an idl_factory and ovs.db.SchemaHelper for filtering
      self.update_schema_helper(helper)
    neutron/common/utils.py:804: DeprecationWarning: Raising eventlet.TimeoutError by default has been deprecated in version 'Ocata' and will be removed in version 'Pike': wait_until_true() now raises WaitTimeout error by default.
      removal_version="Pike")

I captured the journalctl output and am attaching it to this bug.

Comment 6 Bernard Cafarelli 2018-04-18 12:44:53 UTC
Created attachment 1423562
journalctl log while running failing tests

Comment 7 Bernard Cafarelli 2018-04-19 10:25:38 UTC
The differing OS versions led me to test a 7.4 to 7.5 update; it looks like the failure only appears with the 7.5 packages.

On the "osp 11 installed with packstack on rhel 7.4 + checkout of rhos-11.0-patches" setup, I can reproduce the failures after running yum update (system packages only). Looking into which changes could be responsible.

Comment 9 Bernard Cafarelli 2018-04-19 15:20:37 UTC
Updating only keepalived to the 7.5 version is enough to trigger the failures. Checking the specific changes between the two builds (both are based on upstream 1.3.5).

Comment 10 Bernard Cafarelli 2018-04-19 15:48:39 UTC
Since build 1.3.5-4 ("Fix bugs related to failures when load modules and/or segfaults", for #1508435), the health check script can no longer be found:

Before:
defiant-rhos Keepalived_vrrp: VRRP_Script(ha_health_check_1) succeeded
After:
defiant-rhos Keepalived_vrrp: WARNING - default user 'keepalived_script' for script execution does not exist - please create.
defiant-rhos Keepalived_vrrp: Unable to access script `/tmp/tmpBwr_Ci/tmpQckjUG/ha_confs/83c87422-74b7-46c8-8f44-79f3fee54062/ha_check_script_1.sh`
defiant-rhos Keepalived_vrrp: Disabling track script ha_health_check_1 since not found

That probably explains the timeouts observed in the tests.
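A quick way to check a host for the two conditions keepalived complains about in the log excerpt above is sketched below. It is a hypothetical Python 3 helper (`check_track_script` is not part of Neutron or keepalived) and only approximates the logged checks: whether the default `keepalived_script` user exists, and whether the script path is executable by the current process.

```python
import os
import pwd


def check_track_script(script_path, script_user="keepalived_script"):
    """Return a list of problems matching the keepalived log messages.

    Hypothetical pre-flight check; it does not reproduce keepalived's own logic.
    """
    problems = []
    # Corresponds to: "default user 'keepalived_script' ... does not exist".
    try:
        pwd.getpwnam(script_user)
    except KeyError:
        problems.append("script user %r does not exist" % script_user)
    # Corresponds to: "Unable to access script `...`" (missing or not executable).
    if not os.access(script_path, os.X_OK):
        problems.append("cannot access script %s" % script_path)
    return problems
```

An empty list means neither condition was detected for that path and user; it does not prove keepalived will accept the script.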

Comment 12 Bernard Cafarelli 2018-04-23 10:26:46 UTC
OK, so the possible race fixed by https://bugs.launchpad.net/neutron/+bug/1674780 is hit more often with the keepalived package from RHEL 7.5.

That explains the "Unable to access script" logs even though nothing had changed in the generation, path, or content of the script file.

Backporting this change to get feedback from CI.
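The race can be illustrated with a minimal sketch, assuming (consistent with the "Unable to access script" log) that it comes down to ordering: if keepalived starts before the check script exists, it disables the track script. The helpers `spawn_then_write`, `write_then_spawn` and `fake_keepalived` are hypothetical stand-ins; this is not the actual Neutron patch for Launchpad bug 1674780.

```python
import os
import stat


def write_check_script(path, body):
    """Write the health check script and add execute permission bits."""
    with open(path, "w") as f:
        f.write(body)
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


def spawn_then_write(path, body, spawn):
    """Racy ordering: keepalived may look up the script before it exists."""
    result = spawn(path)
    write_check_script(path, body)
    return result


def write_then_spawn(path, body, spawn):
    """Safe ordering: the script is in place before keepalived starts."""
    write_check_script(path, body)
    return spawn(path)


def fake_keepalived(path):
    """Hypothetical stand-in for keepalived startup checking its track script."""
    if os.access(path, os.X_OK):
        return "VRRP_Script succeeded"
    return "Disabling track script since not found"
```

Here the outcome is deterministic because the stand-in checks synchronously; in the real agent it depends on timing, which would fit the observation that the newer keepalived build exposes the race more often.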

Comment 18 errata-xmlrpc 2018-05-18 16:56:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1614

