Bug 1774764

Summary: When starting neutron-dhcp-agent, it loops over network state syncing and times out at various locations
Product: Red Hat OpenStack
Reporter: David Hill <dhill>
Component: openstack-neutron
Assignee: Slawek Kaplonski <skaplons>
Status: CLOSED ERRATA
QA Contact: Candido Campos <ccamposr>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 10.0 (Newton)
CC: afariasa, amuller, chrisw, ebarrera, ekuris, kmehta, scohen, skaplons, slinaber, wesun
Target Milestone: ---
Keywords: Triaged, ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: openstack-neutron-9.4.1-52.el7ost
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 1789718
Environment:
Last Closed: 2020-01-09 15:32:15 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1789718    

Description David Hill 2019-11-20 21:40:33 UTC
Description of problem:
When starting neutron-dhcp-agent, it loops over network state syncing and times out at various locations.

2019-11-20 14:31:38.688 80689 ERROR neutron.common.rpc [req-937b3181-3565-4794-8228-26ab3ce26c64 - - - - -] Timeout in RPC method dhcp_ready_on_ports. Waiting for 18 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.
2019-11-20 14:31:38.689 80689 WARNING neutron.common.rpc [req-937b3181-3565-4794-8228-26ab3ce26c64 - - - - -] Increasing timeout for dhcp_ready_on_ports calls to 120 seconds. Restart the agent to restore it to the default value.
2019-11-20 14:31:56.643 80689 ERROR neutron.agent.dhcp.agent [req-937b3181-3565-4794-8228-26ab3ce26c64 - - - - -] Timeout notifying server of ports ready. Retrying...
2019-11-20 14:33:56.775 80689 ERROR neutron.common.rpc [req-a3fd2384-d9f7-4a29-9752-89ce540ab09e - - - - -] Timeout in RPC method dhcp_ready_on_ports. Waiting for 40 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.
2019-11-20 14:33:56.775 80689 WARNING neutron.common.rpc [req-a3fd2384-d9f7-4a29-9752-89ce540ab09e - - - - -] Increasing timeout for dhcp_ready_on_ports calls to 240 seconds. Restart the agent to restore it to the default value.
2019-11-20 14:34:36.712 80689 ERROR neutron.agent.dhcp.agent [req-a3fd2384-d9f7-4a29-9752-89ce540ab09e - - - - -] Timeout notifying server of ports ready. Retrying...
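
The log lines above show the timeout for dhcp_ready_on_ports doubling (120, then 240 seconds) with a varying wait before each retry. A minimal Python sketch of that retry pattern follows. It is not Neutron's actual implementation: the names `call_with_backoff` and `RpcTimeout`, the 60-second starting value, the `max_timeout` cap, and the randomized wait are all assumptions chosen for illustration.

```python
import random
import time


class RpcTimeout(Exception):
    """Stand-in for an RPC timeout error (hypothetical, for illustration)."""


def call_with_backoff(rpc_call, base_timeout=60, max_timeout=600,
                      max_attempts=5, sleep=time.sleep):
    """Retry rpc_call, doubling the timeout after each timeout.

    Returns the result of the first successful call. Raises RpcTimeout
    if every attempt times out.
    """
    timeout = base_timeout
    for attempt in range(1, max_attempts + 1):
        try:
            return rpc_call(timeout)
        except RpcTimeout:
            if attempt == max_attempts:
                raise
            # Wait a random interval so many agents do not retry in lockstep.
            sleep(random.uniform(0, timeout))
            # Double the timeout for the next attempt, up to the cap.
            timeout = min(timeout * 2, max_timeout)
```

With an overloaded server, each retry holds the agent for longer than the previous one, which is consistent with the multi-minute gaps between the log entries above.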

We might need these commits backported to RHOSP 10:

https://review.opendev.org/#/c/659274/
https://review.opendev.org/#/c/694561/

Version-Release number of selected component (if applicable):
openstack-neutron-9.1.0-8.el7ost.noarch

How reproducible:
Always now

Steps to Reproduce:
1. Restart neutron-dhcp-agent and wait a long time for it to become available, if it ever does

Actual results:
The agent loops over state synchronization and its RPC calls to the Neutron server time out.

Expected results:
The agent starts faster and does not loop over repeated syncs.

Additional info:

neutron agent-list shows the agent as "xxx" (not alive) for 60 minutes or so; it then becomes available for some time and goes down again.

Comment 1 David Hill 2019-11-20 21:41:30 UTC
2019-11-20 07:01:16.943 463987 INFO neutron.agent.dhcp.agent [req-5b93abe5-92ee-478e-9fc0-57f4518e5660 - - - - -] Synchronizing state
2019-11-20 07:01:28.380 463987 INFO neutron.agent.dhcp.agent [req-2e20c066-928b-43a9-8d72-e426d0aca994 - - - - -] Synchronizing state complete
2019-11-20 10:27:20.094 463987 INFO neutron.agent.dhcp.agent [req-acbba601-c1e1-442e-9658-2a0770b79157 - - - - -] Agent has just been revived. Scheduling full sync
2019-11-20 13:40:27.182 30900 INFO neutron.agent.dhcp.agent [-] Synchronizing state
2019-11-20 13:40:27.241 30900 INFO neutron.agent.dhcp.agent [req-c205ad86-5ebd-42cd-8a78-f111022967e2 - - - - -] Agent has just been revived. Scheduling full sync
2019-11-20 13:46:53.401 30900 INFO neutron.agent.dhcp.agent [req-c205ad86-5ebd-42cd-8a78-f111022967e2 - - - - -] Synchronizing state
2019-11-20 13:48:33.622 80689 INFO neutron.agent.dhcp.agent [-] Synchronizing state
2019-11-20 13:48:33.731 80689 INFO neutron.agent.dhcp.agent [req-dac68e42-dfb4-48d7-8e0a-7f46d14294de - - - - -] Agent has just been revived. Scheduling full sync
2019-11-20 14:09:18.170 80689 INFO neutron.agent.dhcp.agent [req-6a1643e7-3a5f-4e57-a130-b0d00c7dea28 - - - - -] Synchronizing state complete
2019-11-20 14:09:18.199 80689 INFO neutron.agent.dhcp.agent [req-dac68e42-dfb4-48d7-8e0a-7f46d14294de - - - - -] Synchronizing state
2019-11-20 14:30:20.230 80689 INFO neutron.agent.dhcp.agent [req-7c13ec5e-2c45-4994-9412-6ec514b41f53 - - - - -] Synchronizing state complete
2019-11-20 14:30:37.439 80689 INFO neutron.agent.dhcp.agent [req-8a3258f2-ba26-4997-958b-c8b8c32b9ce2 - - - - -] Agent has just been revived. Scheduling full sync
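
The sync durations in the log above can be computed by pairing each "Synchronizing state" line with the next "Synchronizing state complete" line from the same process ID. A minimal sketch, assuming the log line layout shown in this comment; `sync_durations` is a hypothetical helper, not part of Neutron.

```python
import re
from datetime import datetime

# Timestamp, PID, then the rest of the line (layout assumed from the log above).
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) (\d+) (.*)$")


def sync_durations(lines):
    """Return (pid, start, seconds) for each completed state sync."""
    started = {}  # pid -> timestamp of the most recent unfinished sync
    results = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        pid, rest = match.group(2), match.group(3)
        if rest.endswith("Synchronizing state complete"):
            if pid in started:
                start = started.pop(pid)
                results.append((pid, start, (stamp - start).total_seconds()))
        elif rest.endswith("Synchronizing state"):
            started[pid] = stamp
    return results
```

Applied to the lines in this comment, the pairing gives about 11 seconds for the 07:01 sync by PID 463987 and about 21 minutes for each of the two syncs completed by PID 80689.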

Comment 8 Eran Kuris 2019-11-28 10:02:54 UTC
 cat /etc/rhosp-release 
Red Hat OpenStack Platform release 10.0.13 (Newton)

[root@controller-0 ~]# rpm -qa | grep openstack-neutron-9.
openstack-neutron-9.4.1-53.el7ost.noarch
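
To check whether an installed build is at or after the "Fixed In Version" value, the two version-release strings can be compared. A simplified sketch, assuming purely numeric dot-separated version fields and a numeric release before the dist tag; this is not rpm's own comparison algorithm, and `parse_nvr` and `has_fix` are hypothetical helpers.

```python
import re


def parse_nvr(package):
    """Split e.g. 'openstack-neutron-9.4.1-53.el7ost' into comparable parts."""
    match = re.match(r"^(.+)-(\d+(?:\.\d+)*)-(\d+)", package)
    if not match:
        raise ValueError("unrecognized package string: %r" % package)
    name, version, release = match.groups()
    return name, tuple(int(p) for p in version.split(".")), int(release)


def has_fix(installed, fixed_in):
    """True if installed is the same package at or after fixed_in."""
    name_a, ver_a, rel_a = parse_nvr(installed)
    name_b, ver_b, rel_b = parse_nvr(fixed_in)
    return name_a == name_b and (ver_a, rel_a) >= (ver_b, rel_b)
```

Under these assumptions, the build listed above (9.4.1-53) compares as at or after 9.4.1-52, while the originally reported 9.1.0-8 does not.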

Comment 9 wesun 2019-12-06 17:28:48 UTC
We are experiencing a similar issue. When and where can we get the RPM build of openstack-neutron-9.4.1-53 for testing?

Thanks. --weiguo

Comment 10 Andre 2019-12-10 13:33:09 UTC
Hi,

I'd like to know the status of this BZ. It says fixed in version openstack-neutron-9.4.1-52.el7ost, but Eran's comment makes it unclear: is it fixed in openstack-neutron-9.4.1-52 or in openstack-neutron-9.4.1-53? Or do we still face the issue on openstack-neutron-9.4.1-53?

Comment 15 errata-xmlrpc 2019-12-17 16:52:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:4298