Bug 1188199

Summary: neutron-server fails to start due to systemd timeout if DB is unavailable
Product: Red Hat OpenStack
Reporter: Javier Peña <jpena>
Component: openstack-neutron
Assignee: lpeer <lpeer>
Status: CLOSED DUPLICATE
QA Contact: Ofer Blaut <oblaut>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 6.0 (Juno)
CC: chrisw, nyechiel, yeylon
Target Milestone: ---
Target Release: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-02-02 10:29:32 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Javier Peña 2015-02-02 10:27:19 UTC
Description of problem: If the database is unavailable when neutron-server starts, Neutron retries the connection a pre-configured number of times, as specified by max_retries.

max_retries can be set to -1 to retry indefinitely, but the indefinite retry never gets the chance to run: the systemd start timeout for the unit expires first and systemd terminates the service.

This is a problem in an HA environment after a complete cluster restart: Neutron can only start once the Galera DB cluster has formed, and that ordering can only be enforced when using Pacemaker. In other architectures the ordering cannot be guaranteed, so neutron-server has to be started manually after bootstrapping the Galera cluster.

This behaviour can be fixed by setting Restart=on-failure in the systemd unit file, so that systemd starts the service again after the timeout.
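
A minimal sketch of that change, written as a systemd drop-in so that the packaged unit file stays untouched. Only Restart=on-failure comes from this report; the drop-in path and file name are assumptions chosen for illustration:

```ini
# Hypothetical drop-in file (path and name are assumptions, not from the report):
#   /etc/systemd/system/neutron-server.service.d/restart.conf
[Service]
# Restart the service after an unclean exit or a timeout, which covers
# the "Result: timeout" failure seen when the database is unreachable.
Restart=on-failure
```

After creating the file, systemctl daemon-reload makes systemd pick up the override. Each restart gives neutron-server a new start window in which to reach the database, so with max_retries = -1 the service keeps trying until the database is back.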

Version-Release number of selected component (if applicable):
openstack-neutron-2014.2.1-6.el7ost.noarch

How reproducible: always


Steps to Reproduce:
1. Set up an OpenStack environment and set MariaDB startup to disabled.
2. Set max_retries=-1 in /etc/neutron/neutron.conf
3. Try to start neutron-server

Actual results:
After a number of retries, systemd terminates neutron-server and no further start is attempted.

Expected results:
neutron-server retries until the database connection is re-established.

Additional info:
Some information gathered from my test environment:

[root@hacontroller1 ~]# systemctl status neutron-server
neutron-server.service - OpenStack Neutron Server
   Loaded: loaded (/usr/lib/systemd/system/neutron-server.service; enabled)
   Active: failed (Result: timeout) since lun 2015-02-02 11:13:25 CET; 4s ago
 Main PID: 1495
   CGroup: /system.slice/neutron-server.service

feb 02 11:13:25 hacontroller1.example.com systemd[1]: neutron-server.service operation timed out. Terminating.
feb 02 11:13:25 hacontroller1.example.com systemd[1]: Failed to start OpenStack Neutron Server.
feb 02 11:13:25 hacontroller1.example.com systemd[1]: Unit neutron-server.service entered failed state.

[root@hacontroller1 ~]# grep retries /etc/neutron/neutron.conf 
# Maximum amount of retries to generate a unique MAC address
# mac_generation_retries = 16
# How long to backoff for between retries when connecting to
# Maximum number of RabbitMQ connection retries. Default is 0
#rabbit_max_retries=0
max_retries = -1
# max_retries = 10


[root@hacontroller1 neutron]# tail -n 25 /var/log/neutron/server.log 
2015-02-02 11:12:11.719 1495 INFO neutron.manager [-] Loading core plugin: neutron.plugins.ml2.plugin.Ml2Plugin
2015-02-02 11:12:14.076 1495 INFO neutron.plugins.ml2.managers [-] Configured type driver names: ['local', 'gre', 'flat', 'vxlan', 'vlan']
2015-02-02 11:12:14.099 1495 INFO neutron.plugins.ml2.drivers.type_flat [-] Arbitrary flat physical_network names allowed
2015-02-02 11:12:14.128 1495 INFO neutron.plugins.ml2.drivers.type_vlan [-] Network VLAN ranges: {}
2015-02-02 11:12:14.188 1495 INFO neutron.plugins.ml2.drivers.type_local [-] ML2 LocalTypeDriver initialization complete
2015-02-02 11:12:14.326 1495 INFO neutron.plugins.ml2.managers [-] Loaded type driver names: ['flat', 'vlan', 'local', 'gre', 'vxlan']
2015-02-02 11:12:14.327 1495 INFO neutron.plugins.ml2.managers [-] Registered types: ['flat', 'vlan', 'local', 'gre', 'vxlan']
2015-02-02 11:12:14.328 1495 INFO neutron.plugins.ml2.managers [-] Tenant network_types: ['vxlan']
2015-02-02 11:12:14.328 1495 INFO neutron.plugins.ml2.managers [-] Configured extension driver names: []
2015-02-02 11:12:14.330 1495 INFO neutron.plugins.ml2.managers [-] Loaded extension driver names: []
2015-02-02 11:12:14.330 1495 INFO neutron.plugins.ml2.managers [-] Registered extension drivers: []
2015-02-02 11:12:14.331 1495 INFO neutron.plugins.ml2.managers [-] Configured mechanism driver names: ['openvswitch']
2015-02-02 11:12:14.455 1495 INFO neutron.plugins.ml2.managers [-] Loaded mechanism driver names: ['openvswitch']
2015-02-02 11:12:14.455 1495 INFO neutron.plugins.ml2.managers [-] Registered mechanism drivers: ['openvswitch']
2015-02-02 11:12:14.638 1495 INFO neutron.plugins.ml2.managers [-] Initializing driver for type 'flat'
2015-02-02 11:12:14.638 1495 INFO neutron.plugins.ml2.drivers.type_flat [-] ML2 FlatTypeDriver initialization complete
2015-02-02 11:12:14.639 1495 INFO neutron.plugins.ml2.managers [-] Initializing driver for type 'vlan'
2015-02-02 11:12:14.773 1495 WARNING oslo.db.sqlalchemy.session [-] SQL connection failed. -1 attempts left.
2015-02-02 11:12:24.788 1495 WARNING oslo.db.sqlalchemy.session [-] SQL connection failed. -2 attempts left.
2015-02-02 11:12:34.801 1495 WARNING oslo.db.sqlalchemy.session [-] SQL connection failed. -3 attempts left.
2015-02-02 11:12:44.817 1495 WARNING oslo.db.sqlalchemy.session [-] SQL connection failed. -4 attempts left.
2015-02-02 11:12:54.830 1495 WARNING oslo.db.sqlalchemy.session [-] SQL connection failed. -5 attempts left.
2015-02-02 11:13:04.842 1495 WARNING oslo.db.sqlalchemy.session [-] SQL connection failed. -6 attempts left.
2015-02-02 11:13:14.854 1495 WARNING oslo.db.sqlalchemy.session [-] SQL connection failed. -7 attempts left.
2015-02-02 11:13:24.867 1495 WARNING oslo.db.sqlalchemy.session [-] SQL connection failed. -8 attempts left.

Comment 3 Javier Peña 2015-02-02 10:29:12 UTC
For some reason the bug has been duplicated. Sorry about that.

Comment 4 Javier Peña 2015-02-02 10:29:32 UTC

*** This bug has been marked as a duplicate of bug 1188198 ***