Bug 1271002
Summary: | Ceilometer dbsync failing during HA deployment | ||
---|---|---|---|
Product: | [Community] RDO | Reporter: | Marius Cornea <mcornea> |
Component: | openstack-ceilometer | Assignee: | Alan Pevec (Fedora) <apevec> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Yurii Prokulevych <yprokule> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | Liberty | CC: | apevec, eglynn, jruzicka, jtrowbri, mburns, mcornea, yeylon |
Target Milestone: | GA | ||
Target Release: | Liberty | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | openstack-ceilometer-5.0.0-1.el7 | Doc Type: | Bug Fix |
Last Closed: | 2016-03-30 23:09:26 UTC | Type: | Bug |
Description
Marius Cornea
2015-10-12 21:22:27 UTC
I was able to run ceilometer-dbsync after applying this:
https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/mongo/utils.py#L269-L276

Installed ceilometer packages:
python-ceilometer-5.0.0.0-rc1.el7.centos.noarch
openstack-ceilometer-central-5.0.0.0-rc1.el7.centos.noarch
python-ceilometerclient-1.5.1-dev1.el7.centos.noarch
openstack-ceilometer-collector-5.0.0.0-rc1.el7.centos.noarch
openstack-ceilometer-alarm-5.0.0.0-rc1.el7.centos.noarch
openstack-ceilometer-polling-5.0.0.0-rc1.el7.centos.noarch
openstack-ceilometer-api-5.0.0.0-rc1.el7.centos.noarch
openstack-ceilometer-common-5.0.0.0-rc1.el7.centos.noarch
openstack-ceilometer-notification-5.0.0.0-rc1.el7.centos.noarch
openstack-ceilometer-compute-5.0.0.0-rc1.el7.centos.noarch

(In reply to Marius Cornea from comment #1)
> I was able to run ceilometer-dbsync after applying this:

I don't see how master code could help vs what's in stable/liberty?
https://github.com/openstack/ceilometer/blob/stable/liberty/ceilometer/storage/mongo/utils.py#L269-L281
Could it be just a timing issue, and mongodb was fully ready on the first attempt? Please try reverting to the original stable/liberty code and try again.

> https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/mongo/utils.py#L269-L276
>
> Installed ceilometer packages:
> python-ceilometer-5.0.0.0-rc1.el7.centos.noarch

There was a ceilometer RC2 in the meantime, but _mongo_connect was not changed between rc1 and rc2.

I tried to manually run dbsync after the cluster was up and got the following error:

/usr/bin/python2 /usr/bin/ceilometer-dbsync --config-file=/etc/ceilometer/ceilometer.conf --debug
Unable to reconnect to the primary mongodb: No replica set members available for replica set name "". Trying again in 10 seconds.
After switching to the master chunk in utils.py the dbsync finished:

[root@overcloud-controller-2 ~]# /usr/bin/python2 /usr/bin/ceilometer-dbsync --config-file=/etc/ceilometer/ceilometer.conf --debug
No handlers could be found for logger "oslo_config.cfg"
2015-10-13 11:29:27.270 29659 DEBUG ceilometer.storage [-] looking for 'mongodb' driver in 'ceilometer.metering.storage' get_connection /usr/lib/python2.7/site-packages/ceilometer/storage/__init__.py:149
2015-10-13 11:29:27.329 29659 INFO ceilometer.storage.mongo.utils [-] Connecting to mongodb on [('172.16.20.15', 27017), ('172.16.20.13', 27017), ('172.16.20.14', 27017)]
2015-10-13 11:29:27.343 29659 DEBUG ceilometer.storage [-] looking for 'mongodb' driver in 'ceilometer.alarm.storage' get_connection /usr/lib/python2.7/site-packages/ceilometer/storage/__init__.py:149
2015-10-13 11:29:27.418 29659 INFO ceilometer.storage.mongo.utils [-] Connecting to mongodb on [('172.16.20.15', 27017), ('172.16.20.13', 27017), ('172.16.20.14', 27017)]
2015-10-13 11:29:27.424 29659 WARNING oslo_config.cfg [-] Option "alarm_history_time_to_live" from group "database" is deprecated for removal. Its value may be silently ignored in the future.
2015-10-13 11:29:27.428 29659 DEBUG ceilometer.storage [-] looking for 'mongodb' driver in 'ceilometer.event.storage' get_connection /usr/lib/python2.7/site-packages/ceilometer/storage/__init__.py:149
2015-10-13 11:29:27.429 29659 INFO ceilometer.storage.mongo.utils [-] Connecting to mongodb on [('172.16.20.15', 27017), ('172.16.20.13', 27017), ('172.16.20.14', 27017)]

I have also seen this issue in my HA RDO-Manager deploys. It actually makes sense that master code is the fix, since the upstream tripleoci does not see this issue. Also, if I remove the ceilometer dbsync from the tripleo heat templates, I get a successful deploy. So, this is the only issue blocking working HA for RDO-Manager.
I've had a look at the change in https://review.openstack.org/#/c/227909/ and I don't think we can do a straight backport of that. Instead, I think there has been a bug in the liberty code for some time: the code in ceilometer.storage.mongo.utils needs a conditional like the one in the test:
https://github.com/openstack/ceilometer/commit/a6d608a33235dfa0d4ef91e3a3d69359ceb0263f#diff-0a4e8fdfc30fefb2d0aab976822c386bL3592

Basically, for liberty, if replica_set is set, use it; otherwise, use the URL without passing a replica_set argument. I'll file an upstream bug about this, targeting liberty, and see where that gets us (and link it back here).

A potential workaround (proven by mcornea) to this problem is to:
* _not_ use the replica_set parameter in the database connection url
* set [database]mongodb_replica_set in ceilometer.conf to the name of the replica set

Adding mongodb_replica_set=tripleo to the database section of ceilometer.conf (I left the mongodb url untouched) made dbsync pass.

Upstream fix has merged to stable/liberty and been confirmed by trown.
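
For readers following along, here is a minimal sketch of the conditional described above together with the workaround configuration. It is not the ceilometer code or the merged fix. The helper names `build_client_kwargs` and `kwargs_from_conf` are hypothetical, the database name in the URL is an assumption, and `replicaSet` is used only because it is the keyword pymongo's `MongoClient` accepts for a replica set name. The host addresses and the `tripleo` replica set name come from the comments in this report.

```python
import configparser

# Illustrative ceilometer.conf fragment for the workaround: no replica set
# parameter in the connection URL, replica set name given as a separate option.
# The "/ceilometer" database name is an assumption made for this sketch.
SAMPLE_CONF = """
[database]
connection = mongodb://172.16.20.15:27017,172.16.20.13:27017,172.16.20.14:27017/ceilometer
mongodb_replica_set = tripleo
"""


def build_client_kwargs(url, replica_set=None):
    """Hypothetical helper: build keyword arguments for a MongoDB client.

    The replicaSet argument is added only when a non-empty name is
    configured, so an unset option never reaches the driver as "".
    """
    kwargs = {"host": url}
    if replica_set:
        kwargs["replicaSet"] = replica_set
    return kwargs


def kwargs_from_conf(conf_text):
    """Hypothetical helper: read [database] and derive the client arguments."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    database = parser["database"]
    return build_client_kwargs(
        database["connection"],
        database.get("mongodb_replica_set", fallback=None),
    )


if __name__ == "__main__":
    print(kwargs_from_conf(SAMPLE_CONF))
```

With the option present, the replica set name is passed explicitly. With it absent or empty, only the URL is used, which is the behaviour the comment above asks for in liberty.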