Bug 1271002 - Ceilometer dbsync failing during HA deployment
Status: CLOSED CURRENTRELEASE
Product: RDO
Classification: Community
Component: openstack-ceilometer
Version: Liberty
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: GA
Target Release: Liberty
Assigned To: Alan Pevec
QA Contact: Yurii Prokulevych
Reported: 2015-10-12 17:22 EDT by Marius Cornea
Modified: 2016-03-30 19:09 EDT
CC List: 7 users

Fixed In Version: openstack-ceilometer-5.0.0-1.el7
Doc Type: Bug Fix
Last Closed: 2016-03-30 19:09:26 EDT
Type: Bug


External Trackers
Launchpad: 1505669
OpenStack gerrit: 234254

Description Marius Cornea 2015-10-12 17:22:27 EDT
Description of problem:
Ceilometer dbsync is failing during HA deployment:
Error: /Stage[main]/Ceilometer::Db::Sync/Exec[ceilometer-dbsync]: Command exceeded timeout

In /var/log/ceilometer/ceilometer-dbsync.log:
CRITICAL ceilometer [-] ServerSelectionTimeoutError: No replica set members available for replica set name ""

Version-Release number of selected component (if applicable):
openstack-heat-templates-0.0.1-dev381.el7.centos.noarch
openstack-tripleo-heat-templates-0.8.7-dev277.el7.centos.noarch

How reproducible:
100%

Steps to Reproduce:
1. openstack overcloud deploy --templates ~/templates/my-overcloud -e ~/templates/my-overcloud/environments/network-isolation.yaml -e ~/templates/network-environment.yaml --control-scale 3 --compute-scale 1 --libvirt-type qemu -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml --ntp-server clock.redhat.com

Actual results:
Deployment fails. os-collect-config logs show:
Error: /Stage[main]/Ceilometer::Db::Sync/Exec[ceilometer-dbsync]: Failed to call refresh: Command exceeded timeout
Error: /Stage[main]/Ceilometer::Db::Sync/Exec[ceilometer-dbsync]: Command exceeded timeout

Expected results:
Deployment succeeds.

Additional info:
[root@overcloud-controller-0 ~]# grep mongo /etc/ceilometer/ceilometer.conf 
connection=mongodb://172.16.20.12:27017,172.16.20.15:27017,172.16.20.13:27017/ceilometer?replicaSet=tripleo

[root@overcloud-controller-0 ~]# mongo --host 172.16.20.12 <<<'rs.status()'
MongoDB shell version: 2.6.11
connecting to: 172.16.20.12:27017/test
{
	"set" : "tripleo",
	"date" : ISODate("2015-10-12T21:21:42Z"),
	"myState" : 1,
	"members" : [
		{
			"_id" : 0,
			"name" : "172.16.20.12:27017",
			"health" : 1,
			"state" : 1,
			"stateStr" : "PRIMARY",
			"uptime" : 5273,
			"optime" : Timestamp(1444679633, 1),
			"optimeDate" : ISODate("2015-10-12T19:53:53Z"),
			"electionTime" : Timestamp(1444679641, 1),
			"electionDate" : ISODate("2015-10-12T19:54:01Z"),
			"self" : true
		},
		{
			"_id" : 1,
			"name" : "172.16.20.15:27017",
			"health" : 1,
			"state" : 2,
			"stateStr" : "SECONDARY",
			"uptime" : 5269,
			"optime" : Timestamp(1444679633, 1),
			"optimeDate" : ISODate("2015-10-12T19:53:53Z"),
			"lastHeartbeat" : ISODate("2015-10-12T21:21:41Z"),
			"lastHeartbeatRecv" : ISODate("2015-10-12T21:21:42Z"),
			"pingMs" : 0,
			"syncingTo" : "172.16.20.12:27017"
		},
		{
			"_id" : 2,
			"name" : "172.16.20.13:27017",
			"health" : 1,
			"state" : 2,
			"stateStr" : "SECONDARY",
			"uptime" : 5269,
			"optime" : Timestamp(1444679633, 1),
			"optimeDate" : ISODate("2015-10-12T19:53:53Z"),
			"lastHeartbeat" : ISODate("2015-10-12T21:21:41Z"),
			"lastHeartbeatRecv" : ISODate("2015-10-12T21:21:41Z"),
			"pingMs" : 0,
			"syncingTo" : "172.16.20.12:27017"
		}
	],
	"ok" : 1
}
bye
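The output above suggests the replica set itself is healthy. As a hedged cross-check, the sketch below parses the `connection` URL and compares it with a hand-trimmed excerpt of the `rs.status()` output. `parse_mongo_url` and `check_replica_set` are hypothetical helpers, not Ceilometer or pymongo APIs, and hosts without an explicit port are assumed to use 27017.

```python
from urllib.parse import parse_qs, urlsplit

# Connection URL copied from /etc/ceilometer/ceilometer.conf above.
CONNECTION = ("mongodb://172.16.20.12:27017,172.16.20.15:27017,"
              "172.16.20.13:27017/ceilometer?replicaSet=tripleo")

# Hand-trimmed excerpt of the rs.status() output above (only the fields used here).
RS_STATUS = {
    "set": "tripleo",
    "members": [
        {"name": "172.16.20.12:27017", "health": 1, "stateStr": "PRIMARY"},
        {"name": "172.16.20.15:27017", "health": 1, "stateStr": "SECONDARY"},
        {"name": "172.16.20.13:27017", "health": 1, "stateStr": "SECONDARY"},
    ],
}


def parse_mongo_url(url):
    """Split a multi-host mongodb:// URL into (hosts, database, options)."""
    parts = urlsplit(url)
    hosts = []
    for item in parts.netloc.split(","):
        host, sep, port = item.rpartition(":")
        if not sep:  # no explicit port: assume the MongoDB default
            host, port = item, "27017"
        hosts.append((host, int(port)))
    options = {key: values[-1] for key, values in parse_qs(parts.query).items()}
    return hosts, parts.path.lstrip("/"), options


def check_replica_set(url, status):
    """Return a list of problems found when comparing the URL to rs.status()."""
    hosts, _, options = parse_mongo_url(url)
    problems = []
    if options.get("replicaSet") != status["set"]:
        problems.append("replicaSet in URL does not match set name %r" % status["set"])
    members = {m["name"]: m for m in status["members"]}
    for host, port in hosts:
        member = members.get("%s:%d" % (host, port))
        if member is None or member["health"] != 1:
            problems.append("%s:%d is missing or unhealthy" % (host, port))
    if not any(m["stateStr"] == "PRIMARY" for m in status["members"]):
        problems.append("no PRIMARY member")
    return problems


hosts, db, options = parse_mongo_url(CONNECTION)
print(hosts)
print(db, options)
print(check_replica_set(CONNECTION, RS_STATUS))
```

With these values the check returns an empty list: the URL asks for `tripleo`, the server reports set `tripleo`, and a PRIMARY is up. That points the failure at the client side, which, per the dbsync error, was looking for a replica set named `""`.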
Comment 1 Marius Cornea 2015-10-13 06:35:01 EDT
I was able to run ceilometer-dbsync after applying this:

https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/mongo/utils.py#L269-L276

Installed ceilometer packages:
python-ceilometer-5.0.0.0-rc1.el7.centos.noarch
openstack-ceilometer-central-5.0.0.0-rc1.el7.centos.noarch
python-ceilometerclient-1.5.1-dev1.el7.centos.noarch
openstack-ceilometer-collector-5.0.0.0-rc1.el7.centos.noarch
openstack-ceilometer-alarm-5.0.0.0-rc1.el7.centos.noarch
openstack-ceilometer-polling-5.0.0.0-rc1.el7.centos.noarch
openstack-ceilometer-api-5.0.0.0-rc1.el7.centos.noarch
openstack-ceilometer-common-5.0.0.0-rc1.el7.centos.noarch
openstack-ceilometer-notification-5.0.0.0-rc1.el7.centos.noarch
openstack-ceilometer-compute-5.0.0.0-rc1.el7.centos.noarch
Comment 2 Alan Pevec 2015-10-13 07:19:39 EDT
(In reply to Marius Cornea from comment #1)
> I was able to run ceilometer-dbsync after applying this:

I don't see how the master code could help compared to what's in stable/liberty:
https://github.com/openstack/ceilometer/blob/stable/liberty/ceilometer/storage/mongo/utils.py#L269-L281

Could it just be a timing issue, with mongodb not yet fully ready on the first attempt?
Please revert to the original stable/liberty code and try again.
 
> https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/mongo/utils.py#L269-L276
> 
> Installed ceilometer packages:
> python-ceilometer-5.0.0.0-rc1.el7.centos.noarch

There was a ceilometer RC2 release in the meantime, but _mongo_connect did not change between rc1 and rc2.
Comment 3 Marius Cornea 2015-10-13 07:30:31 EDT
I tried to manually run dbsync after the cluster was up and got the following error:

/usr/bin/python2 /usr/bin/ceilometer-dbsync --config-file=/etc/ceilometer/ceilometer.conf --debug
Unable to reconnect to the primary mongodb: No replica set members available for replica set name "". Trying again in 10 seconds.
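Comment 2 raises the possibility of a timing issue. The log line above comes from a retry loop; the sketch below (hypothetical names, not Ceilometer's actual retry code) shows why retrying helps when the server is merely slow to start, but can never fix a connection that fails the same way every time, such as one looking for a replica set named `""`.

```python
import time


class ConnectionFailure(Exception):
    """Hypothetical stand-in for the driver's connection error."""


def connect_with_retries(connect, max_retries, retry_interval, sleep=time.sleep):
    """Call connect() until it succeeds, retrying up to max_retries times.

    After each failure the error is reported and the loop waits
    retry_interval seconds, like the "Trying again in 10 seconds" log line.
    """
    retries = 0
    while True:
        try:
            return connect()
        except ConnectionFailure as exc:
            if retries >= max_retries:
                raise  # give up and surface the last error
            retries += 1
            print("Unable to connect: %s. Trying again in %d seconds."
                  % (exc, retry_interval))
            sleep(retry_interval)


def make_flaky_connect(failures):
    """Simulate a server still starting: fail `failures` times, then succeed."""
    state = {"calls": 0}

    def connect():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise ConnectionFailure("server not ready yet")
        return "client"

    return connect


def misconfigured_connect():
    """Simulate a wrong replica set name: fails identically on every attempt."""
    raise ConnectionFailure('No replica set members available for replica set name ""')


if __name__ == "__main__":
    # A slow-starting server is eventually reached...
    print(connect_with_retries(make_flaky_connect(2), 5, 10, sleep=lambda s: None))
    # ...but a misconfiguration exhausts every retry.
    try:
        connect_with_retries(misconfigured_connect, 3, 10, sleep=lambda s: None)
    except ConnectionFailure as exc:
        print("gave up:", exc)
```

The injectable `sleep` argument is only there so the loop can be exercised without actually waiting.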

After switching to the master version of that chunk in utils.py, dbsync finished:

[root@overcloud-controller-2 ~]# /usr/bin/python2 /usr/bin/ceilometer-dbsync --config-file=/etc/ceilometer/ceilometer.conf --debug
No handlers could be found for logger "oslo_config.cfg"
2015-10-13 11:29:27.270 29659 DEBUG ceilometer.storage [-] looking for 'mongodb' driver in 'ceilometer.metering.storage' get_connection /usr/lib/python2.7/site-packages/ceilometer/storage/__init__.py:149
2015-10-13 11:29:27.329 29659 INFO ceilometer.storage.mongo.utils [-] Connecting to mongodb on [('172.16.20.15', 27017), ('172.16.20.13', 27017), ('172.16.20.14', 27017)]
2015-10-13 11:29:27.343 29659 DEBUG ceilometer.storage [-] looking for 'mongodb' driver in 'ceilometer.alarm.storage' get_connection /usr/lib/python2.7/site-packages/ceilometer/storage/__init__.py:149
2015-10-13 11:29:27.418 29659 INFO ceilometer.storage.mongo.utils [-] Connecting to mongodb on [('172.16.20.15', 27017), ('172.16.20.13', 27017), ('172.16.20.14', 27017)]
2015-10-13 11:29:27.424 29659 WARNING oslo_config.cfg [-] Option "alarm_history_time_to_live" from group "database" is deprecated for removal.  Its value may be silently ignored in the future.
2015-10-13 11:29:27.428 29659 DEBUG ceilometer.storage [-] looking for 'mongodb' driver in 'ceilometer.event.storage' get_connection /usr/lib/python2.7/site-packages/ceilometer/storage/__init__.py:149
2015-10-13 11:29:27.429 29659 INFO ceilometer.storage.mongo.utils [-] Connecting to mongodb on [('172.16.20.15', 27017), ('172.16.20.13', 27017), ('172.16.20.14', 27017)]
Comment 4 John Trowbridge 2015-10-13 07:59:31 EDT
I have also seen this issue in my HA RDO-Manager deploys. It actually makes sense that master code is the fix, since the upstream tripleoci does not see this issue.

Also, if I remove the ceilometer dbsync from the tripleo heat templates, the deploy succeeds. So this is the only issue blocking a working HA deployment with RDO-Manager.
Comment 5 Chris Dent 2015-10-13 08:16:39 EDT
I've had a look at the change in https://review.openstack.org/#/c/227909/ and I don't think we can do a straight backport of that.

Instead, I think the liberty code has had a bug for some time: the code in ceilometer.storage.mongo.utils needs a conditional like the one in the test: https://github.com/openstack/ceilometer/commit/a6d608a33235dfa0d4ef91e3a3d69359ceb0263f#diff-0a4e8fdfc30fefb2d0aab976822c386bL3592

Basically, for liberty: if replica_set is set, use it; otherwise, use the URL without passing a replica_set argument.
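A minimal sketch of that conditional, in plain Python so it runs without MongoDB. It assumes, as the empty name in the error message suggests, that the liberty code always passed the `[database]mongodb_replica_set` value (empty by default) to the client, and that a keyword argument takes precedence over the same option in the URL (recent pymongo documentation states this for `MongoClient`; treat it as an assumption for the driver version in use here). `buggy_client_kwargs`, `fixed_client_kwargs` and `effective_replica_set` are hypothetical names.

```python
from urllib.parse import parse_qs, urlsplit

URL = ("mongodb://172.16.20.12:27017,172.16.20.15:27017,"
       "172.16.20.13:27017/ceilometer?replicaSet=tripleo")


def buggy_client_kwargs(replica_set=""):
    # Assumed pre-fix behaviour: the option is passed even when it is empty.
    return {"replicaSet": replica_set}


def fixed_client_kwargs(replica_set=""):
    # Behaviour described above: pass it only when it is set,
    # otherwise rely on whatever the URL says.
    return {"replicaSet": replica_set} if replica_set else {}


def effective_replica_set(url, kwargs):
    """Replica set name the client ends up looking for.

    Models the assumed rule that an explicit keyword argument takes
    precedence over the same option in the connection URL.
    """
    if "replicaSet" in kwargs:
        return kwargs["replicaSet"]
    return parse_qs(urlsplit(url).query).get("replicaSet", [""])[-1]


print(repr(effective_replica_set(URL, buggy_client_kwargs())))           # ''
print(repr(effective_replica_set(URL, fixed_client_kwargs())))           # 'tripleo'
print(repr(effective_replica_set(URL, buggy_client_kwargs("tripleo"))))  # 'tripleo'
```

The third case corresponds to the workaround in comments 6 and 7: once mongodb_replica_set names the real set, even the unconditional keyword carries the right value.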

I'll file an upstream bug about this targeting liberty, see where that gets us, and link it back here.
Comment 6 Chris Dent 2015-10-13 08:31:04 EDT
A potential workaround to this problem (proven by mcornea) is to:

* _not_ use replica_set parameter in the database connection url
* set [database]mongodb_replica_set in ceilometer.conf to the name of the replica set
Comment 7 Marius Cornea 2015-10-13 08:58:20 EDT
Adding mongodb_replica_set=tripleo to the [database] section of ceilometer.conf (I left the mongodb URL untouched) made dbsync pass.
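The workaround amounts to making `[database]mongodb_replica_set` agree with the replica set the deployment actually uses. A small, hypothetical sanity check (not part of Ceilometer) that reads both values from `ceilometer.conf` with Python's standard `configparser`; the config text is an assumed excerpt matching comment 7, and treating an empty option as the risky case reflects the assumption that the liberty bug triggers when it is left unset.

```python
import configparser
from urllib.parse import parse_qs, urlsplit

# Hypothetical ceilometer.conf excerpt after the workaround in comment 7:
# the connection URL is unchanged and mongodb_replica_set is added.
CONF_TEXT = """
[database]
connection=mongodb://172.16.20.12:27017,172.16.20.15:27017,172.16.20.13:27017/ceilometer?replicaSet=tripleo
mongodb_replica_set=tripleo
"""


def replica_set_settings(conf_text):
    """Return (replicaSet from the URL, mongodb_replica_set) from [database]."""
    # interpolation=None so a '%' in a URL would be taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    db = parser["database"]
    query = urlsplit(db.get("connection", "")).query
    url_name = parse_qs(query).get("replicaSet", [""])[-1]
    return url_name, db.get("mongodb_replica_set", "")


url_name, option_name = replica_set_settings(CONF_TEXT)
if not option_name:
    print("mongodb_replica_set is empty (assumed to trigger the liberty bug)")
elif option_name != url_name:
    print("mismatch: URL says %r, option says %r" % (url_name, option_name))
else:
    print("replica set %r configured consistently" % option_name)
```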
Comment 8 Chris Dent 2015-10-23 07:52:20 EDT
The upstream fix has merged to stable/liberty and has been confirmed by trown.
