Bug 1610889

Summary: [Scale] Change restconf_poll_interval default to 15 seconds
Product: Red Hat OpenStack
Reporter: Victor Pickard <vpickard>
Component: python-networking-odl
Assignee: Tim Rozet <trozet>
Status: CLOSED DUPLICATE
QA Contact: Noam Manos <nmanos>
Severity: unspecified
Priority: unspecified
Version: 13.0 (Queens)
CC: aadam, mkolesni, sgaddam
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Doc Type: If docs needed, set a value
Environment: N/A
Last Closed: 2018-08-07 11:18:57 UTC
Type: Bug

Description Victor Pickard 2018-08-01 14:46:26 UTC
Description of problem:

Currently, networking_odl uses a default restconf_poll_interval of 30 seconds.

In a clustered environment, it is a race as to which controller's networking-odl instance runs the hostconfig task. The hostconfig timer fires on every controller, but there are logic checks to make sure the task did not already complete within the last timer interval, that it is not already running, and so on.

Because of these checks, in a clustered environment the hostconfig task on a given controller node may not actually run (with the default 30-second interval) until 55 or more seconds have elapsed.
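To make the race concrete, here is a minimal, hypothetical model of the checks described above. It is not networking-odl code: it assumes every controller shares one "last completed" timestamp (as a cluster-wide record would), that each timer pop is skipped if the task is still running or finished less than one interval ago, and that a run takes a fixed `duration`. The names `simulate` and `max_gap` are illustrative only.

```python
"""Hypothetical model of the clustered hostconfig timer race (not networking-odl code)."""


def simulate(interval, offsets, duration, horizon):
    """Return the start times of hostconfig runs across the whole cluster.

    interval: poll interval in seconds (assumed identical on every controller)
    offsets:  phase of each controller's timer, in seconds
    duration: assumed time one hostconfig run takes to complete
    horizon:  how long to simulate, in seconds
    """
    # Every timer pop on every controller, in time order.
    pops = sorted(
        offset + k * interval
        for offset in offsets
        for k in range(horizon // interval + 1)
        if offset + k * interval <= horizon
    )
    starts = []
    last_completed = None  # assumed to be shared by all controllers
    running_until = None
    for now in pops:
        # Check 1: skip if the task is already running on some controller.
        if running_until is not None and now < running_until:
            continue
        # Check 2: skip if the task completed within the last interval.
        if last_completed is not None and now - last_completed < interval:
            continue
        starts.append(now)
        running_until = now + duration
        last_completed = now + duration
    return starts


def max_gap(starts):
    """Longest time between two consecutive hostconfig runs."""
    return max(later - earlier for earlier, later in zip(starts, starts[1:]))


if __name__ == "__main__":
    # Two controllers whose timers fire almost one full interval apart.
    for interval in (30, 15):
        runs = simulate(interval, offsets=[0, interval - 1], duration=1, horizon=300)
        print(f"interval={interval}s worst gap={max_gap(runs)}s")
```

Under these assumptions the script prints a worst gap of 59 s for a 30 s interval and 29 s for a 15 s interval: a pop that lands just before a full interval has passed since the last completion is skipped, so the next run waits for almost a second interval.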

Given that the dead_agent_timer is 75 seconds, this means that, depending on the timer cycles of the networking-odl hostconfig agent, we may get only one attempt to fetch the hostconfig from ODL before the agent is considered dead.
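The "only one attempt" arithmetic can be checked with a simple bound, sketched below under the assumption that consecutive hostconfig runs are never more than `max_gap` seconds apart: any window of `window` seconds then contains at least `window // max_gap` runs. The function name is hypothetical, and the gap values are the figures quoted in this report (55+ s with the 30 s default, 29 s as the target).

```python
"""Back-of-the-envelope check: hostconfig fetches per dead-agent window."""

DEAD_AGENT_TIMER = 75  # seconds, as stated in this bug report


def guaranteed_attempts(window, max_gap):
    """Fewest hostconfig runs that must fall inside any window-second span.

    Assumes consecutive runs are never more than max_gap seconds apart.
    """
    return window // max_gap


if __name__ == "__main__":
    # 55 s: the observed "55+" gap with the 30 s default.
    # 59 s: just under two 30 s intervals.
    # 29 s: the target worst-case gap with a 15 s interval.
    for max_gap in (55, 59, 29):
        print(max_gap, guaranteed_attempts(DEAD_AGENT_TIMER, max_gap))
```

With a worst-case gap of 55 to 59 s, only one fetch is guaranteed inside 75 s; bringing the gap down to 29 s guarantees two.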


Version-Release number of selected component (if applicable):


How reproducible:

Intermittent


Steps to Reproduce:
1. Deploy cluster
2. Monitor hostconfig collection interval

Actual results:

The interval between hostconfig task runs can exceed 30 seconds.


Expected results:

To allow at least two attempts to fetch the ODL hostconfig, change the restconf_poll_interval default to 15 seconds, so that the maximum time between hostconfig task runs in networking-odl is 29 seconds.


Additional info:

We should change the default to 15 seconds, and also provide a way to configure the interval in production (a hidden config option, to be changed only with proper instructions from support staff).

Comment 1 Mike Kolesnik 2018-08-07 11:18:57 UTC
Closing as a duplicate of bug 1610546, since we need to solve the bigger picture of what these agent statuses mean and how an agent is determined to be alive or dead.

*** This bug has been marked as a duplicate of bug 1610546 ***