Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1645049

Summary: Ansible Networking - nuking Neutron server.log in case of connection failure + logged message
Product: Red Hat OpenStack
Reporter: Arkady Shtempler <ashtempl>
Component: python-networking-ansible
Assignee: Dan Radez <dradez>
Status: CLOSED NOTABUG
QA Contact: Arkady Shtempler <ashtempl>
Severity: medium
Priority: medium
Version: 14.0 (Rocky)
CC: jlibosva, michapma
Target Milestone: Upstream M2
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Linux
Last Closed: 2019-08-29 16:43:21 UTC
Type: Bug

Description Arkady Shtempler 2018-11-01 10:18:26 UTC
When an incorrect IP address is configured for the switch:


neutron-ml2-ansible.yaml:
resource_registry:
  OS::TripleO::Services::NeutronCorePlugin: OS::TripleO::Services::NeutronCorePluginML2Ansible
parameter_defaults:
  NeutronMechanismDrivers: openvswitch,ansible
  NeutronTypeDrivers: local,vxlan,vlan,flat
  NeutronNetworkType: vlan
  ML2HostConfigs:
    switch1:
      ansible_network_os: junos
      ansible_host: 10.9.95.26  # non-existent IP
      ansible_user: ansible
      ansible_ssh_pass: N3tAutomation!
      #manage_vlans: false
                  
There are two issues:

1 - Loop
The error message is logged to Neutron's server.log roughly every 2-3 seconds:

For example:
2018-11-01 07:20:40.866 34 ERROR neutron.plugins.ml2.managers  fatal: [switch1]: FAILED! => {"msg": "[Errno -2] Name or service not known"}
[root@overcloud-controller-0 heat-admin]# cat /var/log/containers/neutron/server.log | grep 'Name or service not known' | grep switch1 | grep 2018
2018-11-01 06:55:29.903 33 ERROR neutron.plugins.ml2.managers  fatal: [switch1]: FAILED! => {"msg": "[Errno -2] Name or service not known"}
2018-11-01 06:55:32.359 33 ERROR neutron.plugins.ml2.managers  fatal: [switch1]: FAILED! => {"msg": "[Errno -2] Name or service not known"}
2018-11-01 06:55:34.744 33 ERROR neutron.plugins.ml2.managers  fatal: [switch1]: FAILED! => {"msg": "[Errno -2] Name or service not known"}
2018-11-01 06:55:37.545 33 ERROR neutron.plugins.ml2.managers  fatal: [switch1]: FAILED! => {"msg": "[Errno -2] Name or service not known"}
2018-11-01 06:55:39.930 33 ERROR neutron.plugins.ml2.managers  fatal: [switch1]: FAILED! => {"msg": "[Errno -2] Name or service not known"}
2018-11-01 06:55:42.275 33 ERROR neutron.plugins.ml2.managers  fatal: [switch1]: FAILED! => {"msg": "[Errno -2] Name or service not known"}
2018-11-01 06:55:44.665 33 ERROR neutron.plugins.ml2.managers  fatal: [switch1]: FAILED! => {"msg": "[Errno -2] Name or service not known"}
2018-11-01 06:55:47.126 33 ERROR neutron.plugins.ml2.managers  fatal: [switch1]: FAILED! => {"msg": "[Errno -2] Name or service not known"}
2018-11-01 06:55:49.641 33 ERROR neutron.plugins.ml2.managers  fatal: [switch1]: FAILED! => {"msg": "[Errno -2] Name or service not known"}
2018-11-01 06:55:52.005 33 ERROR neutron.plugins.ml2.managers  fatal: [switch1]: FAILED! => {"msg": "[Errno -2] Name or service not known"}
2018-11-01 06:55:53.426 29 ERROR neutron.plugins.ml2.managers  fatal: [switch1]: FAILED! => {"msg": "[Errno -2] Name or service not known"}
2018-11-01 06:55:56.054 29 ERROR neutron.plugins.ml2.managers  fatal: [switch1]: FAILED! => {"msg": "[Errno -2] Name or service not known"}
2018-11-01 06:55:58.478 29 ERROR neutron.plugins.ml2.managers  fatal: [switch1]: FAILED! => {"msg": "[Errno -2] Name or service not known"}

As far as I know, a retry mechanism in such cases usually starts with a short delay that is increased after each failure (exponential backoff) until a final/fatal decision is made.

2 - More meaningful message
The actual message is: "Name or service not known"

I think we should improve the experience for operators. One thing is to look at the loop and understand why we retry so many times; the other is to provide a more meaningful message, as sketched below.

Comment 1 Arkady Shtempler 2019-08-29 16:43:21 UTC
Not reproducible