Bug 1446798
| Summary: | Changing DB password requires services restart | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Gregory Charot <gcharot> |
| Component: | rhosp-director | Assignee: | Emilien Macchi <emacchi> |
| Status: | CLOSED EOL | QA Contact: | |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 10.0 (Newton) | CC: | aherr, aschultz, chjones, dbecker, ealcaniz, emacchi, gcharot, mburns, michele, morazi, ushkalim, vcojot |
| Target Milestone: | --- | Keywords: | Triaged, ZStream |
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-10-08 15:17:14 UTC | Type: | Bug |
| Bug Blocks: | 1441944 | ||
|
Description
Gregory Charot
2017-04-28 23:44:48 UTC
Can you confirm that MySQL has the new password? If that's the case, it may be an orchestration issue where the password is updated in nova.conf before it is actually updated in MySQL. I tried to reproduce and failed to hit the same error: my password was updated in MySQL and the service restarted. Can you also describe what environment you're deploying? Thanks for the answers.

MySQL and the service config files were updated; I just needed to restart the service(s). Probably an orchestration issue indeed. It was deployed on a fully virtualised environment with rhos-release (osp10). I will try to reproduce as soon as I can free one of our lab's nodes.

Reproduced the issue on our team lab. Steps are:

1) Deploy OSP10 (no user-defined password)
2) Redeploy with custom passwords via an env file:

    parameter_defaults:
      NeutronPassword: blabla
      GlancePassword: blabla
      CinderPassword: blabla
      NovaPassword: blabla
      (...)

The result is errors in the services' log files:

    [root@lab-controller01 ~]# tail /var/log/nova/nova-conductor.log
    (...)
    2017-07-31 15:51:58.775 74672 ERROR nova.servicegroup.drivers.db OperationalError: (pymysql.err.OperationalError) (1045, u"Access denied for user 'nova'@'172.17.1.203' (using password: YES)")

The same kind of errors appear for the nova and cinder schedulers as well as gnocchi. I have not tested changing the password for all services, so more may be affected.

Passwords are updated in the services' config files. mysql -u nova -p'blabla' works, so passwords are correctly updated in the DB.

The environment is at your disposal if you want to have a look at it. Please contact me on IRC (gcharot) and I'll give you access.
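To check which service users are being rejected without reading every log by hand, the 1045 error quoted above can be matched with a regular expression. This is a minimal sketch, not something from the original report: the function name `find_access_denied` is hypothetical, and the pattern only assumes the pymysql message format shown in the nova-conductor log line above.

```python
import re

# Matches the MySQL 1045 "Access denied" message as it appears in the
# nova-conductor log line quoted in this report.
ACCESS_DENIED = re.compile(
    r"""\(1045, u?"Access denied for user '(?P<user>[^']+)'@'(?P<host>[^']+)'"""
)


def find_access_denied(lines):
    """Return (user, host) pairs for every 1045 error found in `lines`."""
    hits = []
    for line in lines:
        match = ACCESS_DENIED.search(line)
        if match:
            hits.append((match.group("user"), match.group("host")))
    return hits


sample = [
    # Log line copied from the report.
    "2017-07-31 15:51:58.775 74672 ERROR nova.servicegroup.drivers.db "
    "OperationalError: (pymysql.err.OperationalError) (1045, u\"Access denied "
    "for user 'nova'@'172.17.1.203' (using password: YES)\")",
    # Filler line (not from the report) to show that other lines are ignored.
    "an unrelated line",
]

print(find_access_denied(sample))  # [('nova', '172.17.1.203')]
```

Feeding it the lines of each service log (nova, cinder, gnocchi) would give a quick list of the users still connecting with a stale password.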
Package versions:

    rpm -qa | grep tripleo
    openstack-tripleo-image-elements-5.2.0-2.el7ost.noarch
    openstack-tripleo-0.0.8-0.2.4de13b3git.el7ost.noarch
    openstack-tripleo-validations-5.1.1-1.el7ost.noarch
    puppet-tripleo-5.6.0-4.el7ost.noarch
    openstack-tripleo-puppet-elements-5.3.0-1.el7ost.noarch
    openstack-tripleo-ui-1.2.0-1.el7ost.noarch
    openstack-tripleo-common-5.4.2-2.el7ost.noarch
    python-tripleoclient-5.4.2-1.el7ost.noarch
    openstack-tripleo-heat-templates-5.2.0-25.el7ost.noarch

Thanks Greg for the details. I think I found something weird (still investigating right now): it sounds like Pacemaker doesn't restart OpenStack services when their configuration files change. I'm looking at it now.

I talked with bandini and aschultz, and we need to know which services are running in Pacemaker in your environment. Thanks.

I reproduced the issue and I'm still investigating. It seems that nova.conf is correctly updated on the controllers and services are restarted after the nova user is updated in MySQL, but according to the logs nova-conductor is not working, though nova-api is. See my logs and timestamps: https://clbin.com/fIPhI I'll continue tomorrow.

It's definitely a race: nova-conductor works well on controller-1 but not on controller-0.
My theory is that nova-conductor was restarted with the new credentials during the stack update on controller-0, but the service then tried to connect to MySQL on controller-1, where the Mysql_user resource had not yet been updated and the old password was still in place (maybe Galera replication took some time?).
Here is the pcs status output:
Online: [ controller-0 controller-1 controller-2 ]
Full list of resources:
 ip-172.17.4.10 (ocf::heartbeat:IPaddr2): Started controller-0
 Clone Set: haproxy-clone [haproxy]
     Started: [ controller-0 controller-1 controller-2 ]
 Master/Slave Set: galera-master [galera]
     Masters: [ controller-0 controller-1 controller-2 ]
 ip-192.168.24.7 (ocf::heartbeat:IPaddr2): Started controller-1
 ip-172.17.3.19 (ocf::heartbeat:IPaddr2): Started controller-2
 Clone Set: rabbitmq-clone [rabbitmq]
     Started: [ controller-0 controller-1 controller-2 ]
 ip-172.17.1.12 (ocf::heartbeat:IPaddr2): Started controller-0
 Master/Slave Set: redis-master [redis]
     Masters: [ controller-1 ]
     Slaves: [ controller-0 controller-2 ]
 ip-172.17.1.17 (ocf::heartbeat:IPaddr2): Started controller-1
 ip-10.0.0.105 (ocf::heartbeat:IPaddr2): Started controller-2
 openstack-cinder-volume (systemd:openstack-cinder-volume): Started controller-0
So Puppet updated the MySQL password on controller-0, but since controller-1 is master, maybe it didn't get the user update in time via replication?
Michele, any chance to comment on my theories here?
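On the earlier question of which services run under Pacemaker, the resource names can be pulled out of pcs status text like the output above. The sketch below is only an illustration: `pacemaker_resources` is a hypothetical helper, and the two patterns assume the line shapes seen in this report (primitive lines ending in "Started <node>", and "Clone Set" / "Master/Slave Set" headers). Other pcs versions may print different layouts.

```python
import re

# A primitive resource line looks like:
#   openstack-cinder-volume (systemd:openstack-cinder-volume): Started controller-0
PRIMITIVE = re.compile(
    r"^\s*(?P<name>\S+)\s+\((?P<agent>[^)]+)\):\s+Started\s+(?P<node>\S+)"
)
# Clone and master/slave set headers look like:
#   Clone Set: haproxy-clone [haproxy]
RESOURCE_SET = re.compile(
    r"^\s*(?:Clone|Master/Slave) Set:\s+\S+\s+\[(?P<name>[^\]]+)\]"
)


def pacemaker_resources(pcs_status_text):
    """Return resource names found in `pcs status` text, skipping virtual IPs."""
    names = []
    for line in pcs_status_text.splitlines():
        primitive = PRIMITIVE.match(line)
        if primitive:
            # IPaddr2 resources are virtual IPs, not services.
            if not primitive.group("agent").endswith(":IPaddr2"):
                names.append(primitive.group("name"))
            continue
        resource_set = RESOURCE_SET.match(line)
        if resource_set:
            names.append(resource_set.group("name"))
    return names


# Trimmed copy of the output quoted in this report.
PCS_STATUS = """
ip-172.17.4.10 (ocf::heartbeat:IPaddr2): Started controller-0
Clone Set: haproxy-clone [haproxy]
Started: [ controller-0 controller-1 controller-2 ]
Master/Slave Set: galera-master [galera]
Masters: [ controller-0 controller-1 controller-2 ]
openstack-cinder-volume (systemd:openstack-cinder-volume): Started controller-0
"""

print(pacemaker_resources(PCS_STATUS))
# ['haproxy', 'galera', 'openstack-cinder-volume']
```

Run against the full output above, the same function would also list rabbitmq and redis; nova-conductor does not appear among the resources in that output.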
So first a little preface about this issue. There is just no way in a distributed system to change the passwords of *existing* service users (be it MySQL, Redis or MongoDB, it does not matter) in a race-free way. Say you change the password for the nova user in the DB at time t1 (let's ignore the Galera cluster for a sec). Now you change the nova password around the overcloud at t2 (let's assume we have only one server for the sake of argument). Any DB connection attempted between t1 and t2 will be rejected with an access-denied error.

The only race-free way to change passwords would be to:

1) Add a new service user (say novatmp)
2) Switch over the services to use the new novatmp user/pass
3) Remove the old nova user

(Potentially redo the same steps with the nova user if the user has to be named 'nova' and not 'novatmp'.)

The above is a rather long way to say that if we change passwords for existing service users, there will always be some sort of downtime. If this BZ is not about the potential downtime, but about the fact that this should eventually all just work even with a bit of downtime, then by all means let's look into it. If it isn't, please speak up ;)

So Galera users are a special case: they get replicated right away (i.e. a modifying SQL statement on MySQL users gets replicated on all nodes and *then* returns). So I don't think the theory at comment 9 is the whole story. I'd need to try it out myself and see all the logs in detail.
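The t1/t2 argument and the three-step alternative above can be illustrated with a toy model. This is a sketch under strong assumptions: `FakeDatabase` is an in-memory stand-in rather than a real MySQL client, there is a single service and a single server, and all names (`rotate_in_place`, `rotate_with_temp_user`, `novatmp`) are illustrative only.

```python
class FakeDatabase:
    """In-memory stand-in for the DB user table (not a real driver)."""

    def __init__(self):
        self.users = {}

    def set_password(self, user, password):
        self.users[user] = password

    def drop_user(self, user):
        self.users.pop(user, None)

    def can_connect(self, user, password):
        return self.users.get(user) == password


def rotate_in_place(db, service_conf, new_password):
    """Change the existing user's password, then the service config.

    Returns whether the service could connect between the two steps.
    """
    db.set_password(service_conf["user"], new_password)   # t1: DB updated
    connected_in_window = db.can_connect(**service_conf)  # between t1 and t2
    service_conf["password"] = new_password               # t2: config updated
    return connected_in_window


def rotate_with_temp_user(db, service_conf, temp_user, new_password):
    """Follow the three steps: add user, switch service, remove old user.

    Returns True only if the service could connect at every checkpoint.
    """
    old_user = service_conf["user"]
    db.set_password(temp_user, new_password)                    # step 1
    before_switch = db.can_connect(**service_conf)
    service_conf.update(user=temp_user, password=new_password)  # step 2
    after_switch = db.can_connect(**service_conf)
    db.drop_user(old_user)                                      # step 3
    after_drop = db.can_connect(**service_conf)
    return before_switch and after_switch and after_drop


db = FakeDatabase()
db.set_password("nova", "old")
conf = {"user": "nova", "password": "old"}
print(rotate_in_place(db, conf, "blabla"))  # False: rejected between t1 and t2

db = FakeDatabase()
db.set_password("nova", "old")
conf = {"user": "nova", "password": "old"}
print(rotate_with_temp_user(db, conf, "novatmp", "blabla"))  # True
```

The model leaves out service restarts and multiple controllers, so it only shows why the credentials stay valid at each step of the second approach, not how long a real switchover would take.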
From the logs at comment 8 I can only tell that:

1) The mysql user password was changed at:

    heat-debug.log:2017-07-31 22:49:15 +0000 /Stage[main]/Nova::Db::Mysql/Openstacklib::Db::Mysql[nova]/Openstacklib::Db::Mysql::Host_access[nova_172.17.1.20]/Mysql_user[nova.1.20]/password_hash (notice): password_hash changed '*EDD2A49E14EFE3014782FC88531DF56D1A2BC488' to '*73C98624E32963F3D4828B9398FD3F67B8D58E40'

2) nova-api was restarted (on the same box) afterwards:

    heat-debug.log:2017-07-31 22:53:45 +0000 Puppet (debug): Executing '/usr/bin/systemctl restart openstack-nova-api'

3) nova-conductor was never restarted (assuming it was running on the box where the logs were collected):

    $ grep -ir conductor log-puppet-update | grep -i restart
    $

Emilien, I am on PTO tomorrow and Fri, but happy to tag-team this issue on Monday, if that works?

Greg, this sounds a bit more like an RFE here (i.e. there is no way to make a race-free password change with a simple deploy). Can we chat a bit about this bug in the next weeks when you have some time?

Changed PM Score to 100 to reflect that this BZ is being worked on and will take quite a bit of time. |
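The log reading in the earlier comment (password hash change first, then a check for which services were restarted afterwards) can be sketched as a small script. It is an illustration only: `restarts_after_password_change` is a hypothetical helper, and it assumes the timestamp and message formats of the two heat-debug.log lines quoted in that comment.

```python
import re
from datetime import datetime

TIMESTAMP = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \+0000")
RESTART = re.compile(r"systemctl restart ([\w.@-]+)")


def _when(line):
    """Parse the first UTC timestamp in a log line, or return None."""
    match = TIMESTAMP.search(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")


def restarts_after_password_change(lines, user="nova"):
    """List services restarted at or after the first password change for `user`."""
    change_times = [
        _when(line)
        for line in lines
        if "Mysql_user[" + user in line and "password_hash changed" in line
    ]
    change_times = [t for t in change_times if t is not None]
    if not change_times:
        return []
    first_change = min(change_times)
    restarted = []
    for line in lines:
        match = RESTART.search(line)
        when = _when(line)
        if match and when is not None and when >= first_change:
            restarted.append(match.group(1))
    return restarted


# The two log lines quoted in the comment.
LOG_LINES = [
    "heat-debug.log:2017-07-31 22:49:15 +0000 /Stage[main]/Nova::Db::Mysql/"
    "Openstacklib::Db::Mysql[nova]/Openstacklib::Db::Mysql::Host_access"
    "[nova_172.17.1.20]/Mysql_user[nova.1.20]/password_hash (notice): "
    "password_hash changed '*EDD2A49E14EFE3014782FC88531DF56D1A2BC488' to "
    "'*73C98624E32963F3D4828B9398FD3F67B8D58E40'",
    "heat-debug.log:2017-07-31 22:53:45 +0000 Puppet (debug): Executing "
    "'/usr/bin/systemctl restart openstack-nova-api'",
]

print(restarts_after_password_change(LOG_LINES))  # ['openstack-nova-api']
```

Any service that uses the changed credentials but is missing from the returned list, as nova-conductor is here, would be a candidate for the stale-password errors.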