Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 914736

Summary: OpenStack compute MySQL/Qpid timeouts
Product: Red Hat OpenStack
Reporter: Dan Prince <dprince>
Component: openstack-packstack
Assignee: RHOS Maint <rhos-maint>
Status: CLOSED NOTABUG
QA Contact: Nir Magnezi <nmagnezi>
Severity: high
Priority: unspecified
Version: 2.1
CC: aortega, apevec, derekh, ykaul
Target Milestone: snapshot5
Target Release: 2.1
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2013-03-06 13:58:00 UTC
Attachments: Nova log files, nova.conf

Description Dan Prince 2013-02-22 16:30:05 UTC
Description of problem:

I'm seeing a small number of CI runs that install RHOS via Packstack fail after installation due to MySQL connection and Qpid heartbeat timeout issues in compute.log. Packstack itself seems to finish installing fine... but ultimately the first instance booted after the installation fails to start (goes to ERROR state).

When the issue occurs, here is what I'm seeing in Nova's compute.log file:

2013-02-22 10:29:06 AUDIT nova.compute.resource_tracker [req-acb984a0-72b5-426a-9441-9b048efcd717 d3d9444c99d9453daa2cadee4d38e518 e67d4782497d438190016f23fac0bb11] VCPU limit not specified, defaulting to unlimited
2013-02-22 10:46:43 11394 WARNING nova.db.sqlalchemy.session [-] Got mysql server has gone away: (2013, 'Lost connection to MySQL server during query')
2013-02-22 10:46:43 11394 ERROR nova.openstack.common.rpc.impl_qpid [-] Failed to consume message from queue: heartbeat timeout
2013-02-22 10:46:43 11394 TRACE nova.openstack.common.rpc.impl_qpid Traceback (most recent call last):
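For reference, both timeouts in the log above are tunable in nova.conf in this (Folsom-era) release; the values below are illustrative defaults, not the settings from the attached nova.conf:

```ini
[DEFAULT]
# Recycle idle SQLAlchemy connections before MySQL's server-side
# wait_timeout closes them, which otherwise surfaces as error 2013
# ('Lost connection to MySQL server during query' / 'gone away').
sql_idle_timeout = 3600
# Seconds between Qpid heartbeats; too-aggressive a value can turn a
# brief broker stall into the 'heartbeat timeout' consume failure above.
qpid_heartbeat = 60
```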

Version-Release number of selected component (if applicable):

-bash-4.1# rpm -qa | grep openstack
openstack-cinder-2012.2.3-2.el6ost.noarch
openstack-nova-common-2012.2.3-1.el6ost.noarch
openstack-nova-volume-2012.2.3-1.el6ost.noarch
openstack-nova-objectstore-2012.2.3-1.el6ost.noarch
openstack-nova-cert-2012.2.3-1.el6ost.noarch
openstack-nova-novncproxy-0.4-2.el6.noarch
openstack-swift-1.7.4-7.el6ost.noarch
openstack-swift-container-1.7.4-7.el6ost.noarch
openstack-keystone-2012.2.3-3.el6ost.noarch
openstack-glance-2012.2.3-1.el6ost.noarch
openstack-nova-api-2012.2.3-1.el6ost.noarch
openstack-packstack-2012.2.2-1.0.dev408.el6ost.noarch
openstack-nova-console-2012.2.3-1.el6ost.noarch
openstack-nova-2012.2.3-1.el6ost.noarch
openstack-selinux-0.1.2-4.el6ost.noarch
openstack-dashboard-2012.2.3-2.el6ost.noarch
openstack-swift-plugin-swift3-1.0.0-0.20120711git.el6.noarch
openstack-swift-proxy-1.7.4-7.el6ost.noarch
openstack-swift-account-1.7.4-7.el6ost.noarch
openstack-nova-scheduler-2012.2.3-1.el6ost.noarch
openstack-nova-network-2012.2.3-1.el6ost.noarch
openstack-nova-compute-2012.2.3-1.el6ost.noarch
python-django-openstack-auth-1.0.2-3.1.el6.noarch
openstack-swift-object-1.7.4-7.el6ost.noarch
openstack-utils-2012.2-6.1.el6ost.noarch

How reproducible:

Seems to occur once every couple of days.

Steps to Reproduce:
1. Take bare bones RHEL 6.4 VM.
2. Install latest RHOS packages via Packstack.
3. Run Torpedo (a fast-running smoke test suite).

Comment 1 Dan Prince 2013-02-22 16:31:07 UTC
Created attachment 701255 [details]
Nova log files

Comment 2 Dan Prince 2013-02-22 16:31:58 UTC
Created attachment 701257 [details]
nova.conf

Comment 4 Alan Pevec 2013-02-26 11:20:43 UTC
>  WARNING nova.db.sqlalchemy.session [-] Got mysql server has gone away: (2013, 'Lost connection to MySQL server during query')
> ERROR nova.openstack.common.rpc.impl_qpid [-] Failed to consume message from queue: heartbeat timeout

AFAIK Nova should reconnect to both db and message broker.
Dan, are mysqld and qpidd running? What state are they in when that happens?
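The reconnect behavior described here, retrying a query on a fresh connection after the server drops, can be sketched roughly as follows. All names below (ConnectionLost, with_reconnect) are illustrative, not Nova's actual session API:

```python
import time


class ConnectionLost(Exception):
    """Stand-in for MySQL error 2013 ('Lost connection ... during query')."""


def with_reconnect(connect, max_retries=3, retry_interval=0.01):
    """Return a query runner that reopens the connection when it drops.

    `connect` is any zero-argument callable returning a connection-like
    object with an execute(query) method.
    """
    def run(query):
        conn = connect()
        for attempt in range(max_retries + 1):
            try:
                return conn.execute(query)
            except ConnectionLost:
                if attempt == max_retries:
                    raise  # give up after exhausting retries
                time.sleep(retry_interval)
                conn = connect()  # reopen, as Nova is expected to do
    return run
```

If the services are up but Nova never retries, the failure is in (or around) a wrapper like this; if mysqld/qpidd are actually down or wedged, no amount of client-side retrying will help.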

Comment 5 Dan Prince 2013-03-06 13:58:00 UTC
Closing for now as we can't reproduce this anymore. Will reopen if it happens again.