| Summary: | [documentation] Controller replacement fails during step 14. Wait until the Galera service starts on all nodes. | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Marius Cornea <mcornea> |
| Component: | documentation | Assignee: | Dan Macpherson <dmacpher> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | RHOS Documentation Team <rhos-docs> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 8.0 (Liberty) | CC: | dbecker, dciabrin, djuran, dmacpher, jschluet, jslagle, mburns, mcornea, morazi, ochalups, rhel-osp-director-maint, srevivo |
| Target Milestone: | ga | Keywords: | Documentation, Regression |
| Target Release: | 10.0 (Newton) | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-03-08 12:41:29 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description
Marius Cornea
2016-08-30 18:29:11 UTC
The same error happens with OSP9. I see that we introduced password authentication for mysql and there's no /root/.my.cnf file on the replaced controller. Now even if I add it, run 'pcs resource cleanup galera overcloud-controller-3' I end up with the same error:
* galera_promote_0 on overcloud-controller-3 'unknown error' (1): call=565, status=complete, exitreason='Failed initial monitor action',
last-rc-change='Wed Aug 31 06:40:54 2016', queued=0ms, exec=8358ms
Aug 31 06:41:02 overcloud-controller-3 galera(galera)[28927]: ERROR: Unable to retrieve wsrep_cluster_status, verify check_user '' has permissions to view status
Aug 31 06:41:02 overcloud-controller-3 galera(galera)[28927]: ERROR: local node <overcloud-controller-3> is started, but not in primary mode. Unknown state.
Aug 31 06:41:02 overcloud-controller-3 galera(galera)[28927]: ERROR: Failed initial monitor action
Aug 31 06:41:02 overcloud-controller-3 lrmd[3410]: notice: galera_promote_0:28927:stderr [ ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: NO) ]
Aug 31 06:41:02 overcloud-controller-3 lrmd[3410]: notice: galera_promote_0:28927:stderr [ ocf-exit-reason:Unable to retrieve wsrep_cluster_status, verify check_user '' has permissions to view status ]
Aug 31 06:41:02 overcloud-controller-3 lrmd[3410]: notice: galera_promote_0:28927:stderr [ ocf-exit-reason:local node <overcloud-controller-3> is started, but not in primary mode. Unknown state. ]
Aug 31 06:41:02 overcloud-controller-3 lrmd[3410]: notice: galera_promote_0:28927:stderr [ ocf-exit-reason:Failed initial monitor action ]
Aug 31 06:41:02 overcloud-controller-3 crmd[3413]: notice: Operation galera_promote_0: unknown error (node=overcloud-controller-3, call=565, rc=1, cib-update=220, confirmed=true)
Aug 31 06:41:02 overcloud-controller-3 crmd[3413]: notice: overcloud-controller-3-galera_promote_0:565 [ ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: NO)\nocf-exit-reason:Unable to retrieve wsrep_cluster_status, verify check_user '' has permissions to view status\nocf-exit-reason:local node <overcloud-controller-3> is started, but not in primary mode. Unknown state.\nocf-exit-reason:Failed initial monitor action\n ]
Aug 31 06:41:05 overcloud-controller-3 os-collect-config: /var/lib/os-collect-config/local-data not found. Skipping
Update: the missing file on the replaced controller was /etc/sysconfig/clustercheck. I'm going to rerun the procedure and copy it before step 14.

OK, so we need both /root/.my.cnf and /etc/sysconfig/clustercheck copied from one of the existing controllers to the replaced controller before running step 13 that brings the cluster out of maintenance. Moving this to the docs component. Dan, do you think we can add these steps to the docs please? Thank you.

Hi Marius,

Sorry for the long wait on this BZ. I originally modified the OSP10 docs to include these steps with an intention to backport to OSP 9 and 8.

I've now pushed an update to the OSP9 and OSP8 docs to include the following two steps (step 8 and step 9) as part of the process:

8. Configure the Galera cluster check on the new node. Copy the /etc/sysconfig/clustercheck from the existing node to the same location on the new node.

9. Configure the root user’s Galera access on the new node. Copy the /root/.my.cnf from the existing node to the same location on the new node.

OSP8 version:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/8/html-single/director_installation_and_usage/#sect-Replacing_Controller_Nodes-Manual_Intervention

OSP9 version:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/9/html-single/director_installation_and_usage/#sect-Replacing_Controller_Nodes-Manual_Intervention

Was there anything further required for this BZ?

(In reply to Dan Macpherson from comment #8)
That is all. Thank you, Dan!

Thanks, Marius. And again sorry about the long wait.
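For readers applying steps 8 and 9, the two copies could be sketched roughly as below. This is not taken from the Red Hat documentation. The script only prints the commands so they can be reviewed before anything is run. The function name `print_copy_commands`, the `heat-admin` login user with sudo rights, and the source host `overcloud-controller-0` are assumptions made for illustration. Only the two file paths and `overcloud-controller-3` come from this report.

```shell
#!/bin/sh
# Sketch only: prints the copy commands for review instead of running them.
# ASSUMPTION: the overcloud nodes accept SSH logins as a sudo-capable user;
# heat-admin is a guess at that user and can be overridden via SSH_USER.
SSH_USER="${SSH_USER:-heat-admin}"

# print_copy_commands SRC_HOST DST_HOST (hypothetical helper)
# Emits one command per file named in this bug. Each command reads the file
# with sudo on the existing controller and writes it to the same path on the
# replaced controller.
print_copy_commands() {
    src="$1"
    dst="$2"
    for file in /etc/sysconfig/clustercheck /root/.my.cnf; do
        printf "ssh %s@%s sudo cat %s | ssh %s@%s 'sudo tee %s >/dev/null'\n" \
            "$SSH_USER" "$src" "$file" "$SSH_USER" "$dst" "$file"
    done
}

# Print the commands only when both hosts are given on the command line.
if [ "$#" -eq 2 ]; then
    print_copy_commands "$1" "$2"
fi
```

Saved under a hypothetical name such as `galera-copy.sh`, it could be invoked as `sh galera-copy.sh overcloud-controller-0 overcloud-controller-3`. Because `tee` creates files with the default umask, the ownership and mode of both files on the new node should be compared with the source node afterwards, since /root/.my.cnf holds the MySQL root password.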