Description of problem:

OSP14 + OpenDaylight deployment (OpenDaylight is probably not relevant here) can't spawn new VMs on the overcloud. The deployment passed a few Tempest tests, but when a VM boot was attempted manually a few hours after the Tempest run, nova-scheduler.log showed:

2018-10-03 12:47:09.796 24 DEBUG oslo_concurrency.lockutils [req-d0aa38c6-5579-40a3-a3f5-9706b6240242 a8deb6344f3a4546b786fd40da85c012 873e0d69004c4b0381c701bd2ac85bf6 - default default] Lock "placement_client" released by "nova.scheduler.client.report._create_client" :: held 0.016s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285
2018-10-03 12:47:10.677 24 ERROR nova.scheduler.client.report [req-d0aa38c6-5579-40a3-a3f5-9706b6240242 a8deb6344f3a4546b786fd40da85c012 873e0d69004c4b0381c701bd2ac85bf6 - default default] Failed to retrieve allocation candidates from placement API for filters: RequestGroup(use_same_provider=False, resources={DISK_GB:1, MEMORY_MB:512, VCPU:1}, traits=[], aggregates=[]) Got 500: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>500 Internal Server Error</title>
</head><body>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error or misconfiguration and was unable to complete your request.</p>
<p>Please contact the server administrator at [no address given] to inform them of the time this error occurred, and the actions you performed just before this error.</p>
<p>More information about this error may be available in the server error log.</p>
</body></html>
.
2018-10-03 12:47:10.678 24 DEBUG nova.scheduler.manager [req-d0aa38c6-5579-40a3-a3f5-9706b6240242 a8deb6344f3a4546b786fd40da85c012 873e0d69004c4b0381c701bd2ac85bf6 - default default] Got no allocation candidates from the Placement API. This could be due to insufficient resources or a temporary occurrence as compute nodes start up. select_destinations /usr/lib/python2.7/site-packages/nova/scheduler/manager.py:150
2018-10-03 12:47:10.706 24 ERROR oslo.messaging._drivers.impl_rabbit [req-d0aa38c6-5579-40a3-a3f5-9706b6240242 a8deb6344f3a4546b786fd40da85c012 873e0d69004c4b0381c701bd2ac85bf6 - default default] [3d3b34ca-779a-4eef-ac40-5a87c12c02ea] AMQP server on controller-1.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.: error: [Errno 111] ECONNREFUSED
2018-10-03 12:47:11.727 24 ERROR oslo.messaging._drivers.impl_rabbit [req-d0aa38c6-5579-40a3-a3f5-9706b6240242 a8deb6344f3a4546b786fd40da85c012 873e0d69004c4b0381c701bd2ac85bf6 - default default] [3d3b34ca-779a-4eef-ac40-5a87c12c02ea] AMQP server on controller-1.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.: error: [Errno 111] ECONNREFUSED
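For what it's worth, the scheduler's failing query can be reproduced directly against the placement API; a minimal sketch (the endpoint URL is a placeholder based on the IP/port in the health-check output below, and the /placement path prefix may differ per deployment):

# Sketch: re-issue the scheduler's allocation-candidates request by hand.
# Take the real endpoint from `openstack endpoint list --service placement`.
TOKEN=$(openstack token issue -f value -c id)
PLACEMENT=http://172.17.1.22:8778/placement
curl -i -H "X-Auth-Token: $TOKEN" \
     -H "OpenStack-API-Version: placement 1.10" \
     "$PLACEMENT/allocation_candidates?resources=DISK_GB:1,MEMORY_MB:512,VCPU:1"

In this state it returns the same Apache 500 page the scheduler logs above.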
### NOTE THE LONG LASTING UNHEALTHY STATE

[root@controller-0 ~]# docker ps -a | grep -i nova_placement
ef1df61022c5   192.168.24.1:8787/rhosp14/openstack-nova-placement-api:2018-09-26.1   "kolla_start"            23 hours ago   Up 14 hours (unhealthy)   nova_placement
8dac25359c8c   192.168.24.1:8787/rhosp14/openstack-nova-placement-api:2018-09-26.1   "/bin/bash -c 'cho..."   23 hours ago   Exited (0) 23 hours ago   nova_placement_init_log

docker inspect nova_placement shows:

"Health": {
    "Status": "unhealthy",
    "FailingStreak": 1742,
    "Log": [
        {
            "Start": "2018-10-03T13:27:13.152906462+01:00",
            "End": "2018-10-03T13:27:13.453833088+01:00",
            "ExitCode": 1,
            "Output": "\n500 172.17.1.22:8778 0.010 seconds\ncurl: (22) The requested URL returned error: 500 Internal Server Error\n"
        },
        {
            "Start": "2018-10-03T13:27:43.454237889+01:00",
            "End": "2018-10-03T13:27:43.849069205+01:00",
            "ExitCode": 1,
            "Output": "\n500 172.17.1.22:8778 0.009 seconds\ncurl: (22) The requested URL returned error: 500 Internal Server Error\n"
        },
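Note that docker itself only records the failing health check (STATUS stays "unhealthy" and FailingStreak keeps growing); it never restarts a container based on health state. As a manual stop-gap (a sketch, not the fix this bug is asking for), containers stuck in this state can be bounced with:

# Restart every container currently reporting an unhealthy health check;
# docker only reports this state, it takes no action on it by itself.
docker ps -q --filter health=unhealthy | xargs -r docker restart

After the restart, placement re-runs its startup DB work and should come back healthy, provided galera/rabbitmq are reachable again.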
It looks like nova placement 'died' after many failed attempts to connect to MySQL and never recovered:

2018-10-02 22:52:15.303 16 DEBUG nova.api.openstack.placement.wsgi [-] key_manager.fixed_key = None log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:3031
2018-10-02 22:52:15.304 16 DEBUG nova.api.openstack.placement.wsgi [-] osapi_v21.project_id_regex = None log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:3031
2018-10-02 22:52:15.305 16 DEBUG nova.api.openstack.placement.wsgi [-] ******************************************************************************** log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:3033
2018-10-02 22:52:15.340 16 WARNING keystonemiddleware.auth_token [-] AuthToken middleware is set with keystone_authtoken.service_token_roles_required set to False. This is backwards compatible but deprecated behaviour. Please set this to True.
2018-10-02 22:52:15.350 16 DEBUG oslo_concurrency.lockutils [-] Acquired semaphore "trait_sync" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:212
2018-10-02 22:52:15.375 16 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 10 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-10-02 22:52:15.378 17 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 2 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-10-02 22:52:24.279 14 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 5 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-10-02 22:52:25.386 17 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 1 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-10-02 22:52:25.389 16 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 9 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-10-02 22:52:34.291 14 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 4 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-10-02 22:52:35.397 17 DEBUG oslo_concurrency.lockutils [-] Releasing semaphore "trait_sync" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:228
2018-10-02 22:52:35.400 16 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 8 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-10-02 22:52:44.304 14 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 3 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-10-02 22:52:45.412 16 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 7 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-10-02 22:52:54.310 14 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 2 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-10-02 22:52:55.425 16 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 6 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-10-02 22:53:04.322 14 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 1 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-10-02 22:53:05.440 16 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 5 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
2018-10-02 22:53:14.333 14 DEBUG oslo_concurrency.lockutils [-] Releasing semaphore "trait_sync" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:228
2018-10-02 22:53:15.453 16 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 4 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '172.17.1.16' ([Errno 111] Connection refused)") (Background on this error at: http://sqlalche.me/e/e3q8)
2018-10-02 22:53:26.463 16 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 3 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '172.17.1.16' ([Errno 113] No route to host)") (Background on this error at: http://sqlalche.me/e/e3q8)
2018-10-02 22:53:38.487 16 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 2 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '172.17.1.16' ([Errno 113] No route to host)") (Background on this error at: http://sqlalche.me/e/e3q8)
2018-10-02 22:53:50.511 16 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. 1 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '172.17.1.16' ([Errno 113] No route to host)") (Background on this error at: http://sqlalche.me/e/e3q8)
2018-10-02 22:54:00.521 16 DEBUG oslo_concurrency.lockutils [-] Releasing semaphore "trait_sync" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:228
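The "N attempts left" countdown per worker comes from oslo.db's connection retry loop (max_retries attempts, retry_interval seconds apart); once a worker exhausts it, the DBConnectionError propagates and, as seen above, the wsgi workers never become functional again. Purely as an illustration of the knobs involved (whether TripleO/puppet-nova exposes them for placement in OSP14 is an assumption, and on this release the section may be [placement_database] rather than [database]), in nova.conf this would look like:

[database]
# oslo.db: number of startup connection attempts; -1 retries forever
# instead of giving up after the default of 10 tries
max_retries = -1
# seconds between attempts
retry_interval = 10

Raising this would only paper over the symptom, though; the bug here is that the service never recovers once the retries are exhausted.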
NOTE: there were serious networking issues (tracked in another bug) between the overcloud controllers, hence the 'no route to host' failures, but I think nova placement should recover from this instead of staying in an unhealthy state indefinitely.

Version-Release number of selected component (if applicable):

python2-novaclient-11.0.0-0.20180809174649.f1005ce.el7ost.noarch
python-nova-18.0.1-0.20180922012453.518fdc7.el7ost.noarch
openstack-nova-placement-api-18.0.1-0.20180922012453.518fdc7.el7ost.noarch
puppet-nova-13.3.1-0.20180917153244.6fdb591.el7ost.noarch
openstack-nova-common-18.0.1-0.20180922012453.518fdc7.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1.
2.
3.

Actual results:
nova placement doesn't recover from the 'unhealthy' container state

Expected results:
nova placement recovers once the database and messaging services are reachable again

Additional info:
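For completeness, the health state and the probe itself can be checked by hand while the container is in this condition (the /openstack/healthcheck path is an assumption based on other OSP14 kolla images):

# Current health state as docker sees it:
docker inspect -f '{{.State.Health.Status}} (failing streak: {{.State.Health.FailingStreak}})' nova_placement
# Re-run the configured probe manually inside the container
# (assumed probe path: /openstack/healthcheck):
docker exec nova_placement /openstack/healthcheck; echo "exit code: $?"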
*** This bug has been marked as a duplicate of bug 1630069 ***