Bug 1271002 - Ceilometer dbsync failing during HA deployment
Summary: Ceilometer dbsync failing during HA deployment
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: RDO
Classification: Community
Component: openstack-ceilometer
Version: Liberty
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: GA
Target Release: Liberty
Assignee: Alan Pevec (Fedora)
QA Contact: Yurii Prokulevych
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-10-12 21:22 UTC by Marius Cornea
Modified: 2016-03-30 23:09 UTC

Fixed In Version: openstack-ceilometer-5.0.0-1.el7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-03-30 23:09:26 UTC
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1505669 0 None None None Never
OpenStack gerrit 234254 0 None None None Never

Description Marius Cornea 2015-10-12 21:22:27 UTC
Description of problem:
Ceilometer dbsync is failing during HA deployment:
Error: /Stage[main]/Ceilometer::Db::Sync/Exec[ceilometer-dbsync]: Command exceeded timeout

In /var/log/ceilometer/ceilometer-dbsync.log:
CRITICAL ceilometer [-] ServerSelectionTimeoutError: No replica set members available for replica set name ""

Version-Release number of selected component (if applicable):
openstack-heat-templates-0.0.1-dev381.el7.centos.noarch
openstack-tripleo-heat-templates-0.8.7-dev277.el7.centos.noarch

How reproducible:
100%

Steps to Reproduce:
1. openstack overcloud deploy --templates ~/templates/my-overcloud -e ~/templates/my-overcloud/environments/network-isolation.yaml -e ~/templates/network-environment.yaml --control-scale 3 --compute-scale 1 --libvirt-type qemu -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml --ntp-server clock.redhat.com

Actual results:
Deployment fails. os-collect-config logs show:
Error: /Stage[main]/Ceilometer::Db::Sync/Exec[ceilometer-dbsync]: Failed to call refresh: Command exceeded timeout
Error: /Stage[main]/Ceilometer::Db::Sync/Exec[ceilometer-dbsync]: Command exceeded timeout

Expected results:
Deployment succeeds.

Additional info:
[root@overcloud-controller-0 ~]# grep mongo /etc/ceilometer/ceilometer.conf 
connection=mongodb://172.16.20.12:27017,172.16.20.15:27017,172.16.20.13:27017/ceilometer?replicaSet=tripleo

[root@overcloud-controller-0 ~]# mongo --host 172.16.20.12 <<<'rs.status()'
MongoDB shell version: 2.6.11
connecting to: 172.16.20.12:27017/test
{
	"set" : "tripleo",
	"date" : ISODate("2015-10-12T21:21:42Z"),
	"myState" : 1,
	"members" : [
		{
			"_id" : 0,
			"name" : "172.16.20.12:27017",
			"health" : 1,
			"state" : 1,
			"stateStr" : "PRIMARY",
			"uptime" : 5273,
			"optime" : Timestamp(1444679633, 1),
			"optimeDate" : ISODate("2015-10-12T19:53:53Z"),
			"electionTime" : Timestamp(1444679641, 1),
			"electionDate" : ISODate("2015-10-12T19:54:01Z"),
			"self" : true
		},
		{
			"_id" : 1,
			"name" : "172.16.20.15:27017",
			"health" : 1,
			"state" : 2,
			"stateStr" : "SECONDARY",
			"uptime" : 5269,
			"optime" : Timestamp(1444679633, 1),
			"optimeDate" : ISODate("2015-10-12T19:53:53Z"),
			"lastHeartbeat" : ISODate("2015-10-12T21:21:41Z"),
			"lastHeartbeatRecv" : ISODate("2015-10-12T21:21:42Z"),
			"pingMs" : 0,
			"syncingTo" : "172.16.20.12:27017"
		},
		{
			"_id" : 2,
			"name" : "172.16.20.13:27017",
			"health" : 1,
			"state" : 2,
			"stateStr" : "SECONDARY",
			"uptime" : 5269,
			"optime" : Timestamp(1444679633, 1),
			"optimeDate" : ISODate("2015-10-12T19:53:53Z"),
			"lastHeartbeat" : ISODate("2015-10-12T21:21:41Z"),
			"lastHeartbeatRecv" : ISODate("2015-10-12T21:21:41Z"),
			"pingMs" : 0,
			"syncingTo" : "172.16.20.12:27017"
		}
	],
	"ok" : 1
}
bye
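
For reference, the replica set name that the log reports as empty ("") is present in the connection URL from the ceilometer.conf output above. The following is a minimal sketch using only the Python standard library to split that URL into its parts; the helper name describe_mongo_url is hypothetical and is not part of ceilometer or pymongo.

```python
from urllib.parse import urlsplit, parse_qs

# Connection string copied from the ceilometer.conf output in this report.
URL = ("mongodb://172.16.20.12:27017,172.16.20.15:27017,"
       "172.16.20.13:27017/ceilometer?replicaSet=tripleo")


def describe_mongo_url(url):
    """Split a mongodb:// URL into hosts, database name and replica set name.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    hosts = parts.netloc.split(",")
    database = parts.path.lstrip("/")
    # parse_qs returns a list per key; take the first value if present.
    replica_set = parse_qs(parts.query).get("replicaSet", [None])[0]
    return hosts, database, replica_set


hosts, database, replica_set = describe_mongo_url(URL)
print(hosts, database, replica_set)
```

The URL therefore names the replica set "tripleo", which matches the "set" field in the rs.status() output, so the empty name in the error has to come from somewhere other than the URL.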

Comment 1 Marius Cornea 2015-10-13 10:35:01 UTC
I was able to run ceilometer-dbsync after applying this:

https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/mongo/utils.py#L269-L276

Installed ceilometer packages:
python-ceilometer-5.0.0.0-rc1.el7.centos.noarch
openstack-ceilometer-central-5.0.0.0-rc1.el7.centos.noarch
python-ceilometerclient-1.5.1-dev1.el7.centos.noarch
openstack-ceilometer-collector-5.0.0.0-rc1.el7.centos.noarch
openstack-ceilometer-alarm-5.0.0.0-rc1.el7.centos.noarch
openstack-ceilometer-polling-5.0.0.0-rc1.el7.centos.noarch
openstack-ceilometer-api-5.0.0.0-rc1.el7.centos.noarch
openstack-ceilometer-common-5.0.0.0-rc1.el7.centos.noarch
openstack-ceilometer-notification-5.0.0.0-rc1.el7.centos.noarch
openstack-ceilometer-compute-5.0.0.0-rc1.el7.centos.noarch

Comment 2 Alan Pevec 2015-10-13 11:19:39 UTC
(In reply to Marius Cornea from comment #1)
> I was able to run ceilometer-dbsync after applying this:

I don't see how master code could help vs what's in stable/liberty?
https://github.com/openstack/ceilometer/blob/stable/liberty/ceilometer/storage/mongo/utils.py#L269-L281

Could it be just a timing issue, with mongodb not fully ready on the first attempt?
Please try reverting to the original stable/liberty code and try again.
 
> https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/mongo/utils.py#L269-L276
> 
> Installed ceilometer packages:
> python-ceilometer-5.0.0.0-rc1.el7.centos.noarch

There was ceilometer RC2 in the meantime but _mongo_connect was not changed rc1..rc2

Comment 3 Marius Cornea 2015-10-13 11:30:31 UTC
I tried to manually run dbsync after the cluster was up and got the following error:

/usr/bin/python2 /usr/bin/ceilometer-dbsync --config-file=/etc/ceilometer/ceilometer.conf --debug
Unable to reconnect to the primary mongodb: No replica set members available for replica set name "". Trying again in 10 seconds.

After switching to the master chunk in utils.py the dbsync finished:

[root@overcloud-controller-2 ~]# /usr/bin/python2 /usr/bin/ceilometer-dbsync --config-file=/etc/ceilometer/ceilometer.conf --debug
No handlers could be found for logger "oslo_config.cfg"
2015-10-13 11:29:27.270 29659 DEBUG ceilometer.storage [-] looking for 'mongodb' driver in 'ceilometer.metering.storage' get_connection /usr/lib/python2.7/site-packages/ceilometer/storage/__init__.py:149
2015-10-13 11:29:27.329 29659 INFO ceilometer.storage.mongo.utils [-] Connecting to mongodb on [('172.16.20.15', 27017), ('172.16.20.13', 27017), ('172.16.20.14', 27017)]
2015-10-13 11:29:27.343 29659 DEBUG ceilometer.storage [-] looking for 'mongodb' driver in 'ceilometer.alarm.storage' get_connection /usr/lib/python2.7/site-packages/ceilometer/storage/__init__.py:149
2015-10-13 11:29:27.418 29659 INFO ceilometer.storage.mongo.utils [-] Connecting to mongodb on [('172.16.20.15', 27017), ('172.16.20.13', 27017), ('172.16.20.14', 27017)]
2015-10-13 11:29:27.424 29659 WARNING oslo_config.cfg [-] Option "alarm_history_time_to_live" from group "database" is deprecated for removal.  Its value may be silently ignored in the future.
2015-10-13 11:29:27.428 29659 DEBUG ceilometer.storage [-] looking for 'mongodb' driver in 'ceilometer.event.storage' get_connection /usr/lib/python2.7/site-packages/ceilometer/storage/__init__.py:149
2015-10-13 11:29:27.429 29659 INFO ceilometer.storage.mongo.utils [-] Connecting to mongodb on [('172.16.20.15', 27017), ('172.16.20.13', 27017), ('172.16.20.14', 27017)]

Comment 4 John Trowbridge 2015-10-13 11:59:31 UTC
I have also seen this issue in my HA RDO-Manager deploys. It actually makes sense that master code is the fix, since the upstream tripleoci does not see this issue.

Also, if I remove the ceilometer dbsync from the tripleo heat templates, I get a successful deploy. So, this is the only issue blocking working HA for RDO-Manager.

Comment 5 Chris Dent 2015-10-13 12:16:39 UTC
I've had a look at the change in https://review.openstack.org/#/c/227909/ and I don't think we can do a straight backport of that.

Instead I think there's been a bug in the liberty code for some time where the code in ceilometer.storage.mongo.utils needs to have a conditional like the one in the test: https://github.com/openstack/ceilometer/commit/a6d608a33235dfa0d4ef91e3a3d69359ceb0263f#diff-0a4e8fdfc30fefb2d0aab976822c386bL3592

Basically, for liberty, if replica_set is set, use it, otherwise, use the URL without passing a replica_set argument.
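
The conditional described above could look roughly like the sketch below. This is not the ceilometer code or the eventual upstream patch; client_kwargs is a hypothetical helper, and replica_set stands for the value of [database]mongodb_replica_set. The only assumption about the driver is that the client constructor accepts host and replicaSet keyword arguments, as pymongo's MongoClient does.

```python
def client_kwargs(url, replica_set=None):
    """Build keyword arguments for a MongoDB client constructor.

    Hypothetical helper, not taken from ceilometer. `replica_set` is the
    configured replica set name, or None/"" when the option is unset.
    """
    kwargs = {"host": url}
    if replica_set:
        # Pass the name only when it is actually configured. Judging by
        # the error in this report, an empty value appears to be treated
        # as a request for a replica set named "".
        kwargs["replicaSet"] = replica_set
    return kwargs


# Option unset: rely on whatever the URL itself specifies.
print(client_kwargs("mongodb://172.16.20.12:27017/ceilometer?replicaSet=tripleo"))
# Option set: pass it through explicitly.
print(client_kwargs("mongodb://172.16.20.12:27017/ceilometer", "tripleo"))
```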

I'll make an upstream bug about this, targeting liberty and see where that gets us (and link it back here).

Comment 6 Chris Dent 2015-10-13 12:31:04 UTC
A potential workaround (proven by mcornea) to this problem is to:

* _not_ use replica_set parameter in the database connection url
* set [database]mongodb_replica_set in ceilometer.conf to the name of the replica set

Comment 7 Marius Cornea 2015-10-13 12:58:20 UTC
Adding mongodb_replica_set=tripleo to the [database] section of ceilometer.conf (I left the mongodb URL untouched) made dbsync pass.
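
As an illustration of that change, here is a minimal sketch that adds the option with Python's standard configparser. The ORIGINAL text is a trimmed stand-in for /etc/ceilometer/ceilometer.conf containing only the connection line shown in this report, and add_replica_set is a hypothetical helper.

```python
import configparser
import io

# Trimmed stand-in for /etc/ceilometer/ceilometer.conf; only the option
# quoted in this report is included.
ORIGINAL = """\
[database]
connection=mongodb://172.16.20.12:27017,172.16.20.15:27017,172.16.20.13:27017/ceilometer?replicaSet=tripleo
"""


def add_replica_set(conf_text, name):
    """Return conf_text with [database]mongodb_replica_set set to name."""
    # interpolation=None keeps any '%' characters in values literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    parser.set("database", "mongodb_replica_set", name)
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


print(add_replica_set(ORIGINAL, "tripleo"))
```

Rewriting a real configuration file this way would drop its comments, so on a live controller editing the file by hand or with a dedicated tool is probably the safer route.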

Comment 8 Chris Dent 2015-10-23 11:52:20 UTC
Upstream fix has merged to stable/liberty and been confirmed by trown.

