Description of problem:
During the controller replacement procedure, the new controller has to be added to the pcs cluster:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/12/html-single/director_installation_and_usage/#sect-Replacing_Controller_Nodes
Section 10.4.3.6, "Add the new node to the cluster".

Sometimes this step fails with HTTP error 408:

[heat-admin@controller-0 ~]$ sudo pcs cluster node add controller-3 --wait=90
Disabling SBD service...
controller-3: sbd disabled
Sending 'corosync authkey' to 'controller-3'
controller-3: successful distribution of the file 'corosync authkey'
Sending remote node configuration files to 'controller-3'
Error: Error connecting to controller-3 (HTTP error: 408)
Error: Errors have occurred, therefore pcs is unable to continue

A later run with --wait=500 completed:

[heat-admin@controller-0 ~]$ sudo pcs cluster node add controller-3 --wait=500
Disabling SBD service...
controller-3: sbd disabled
Sending 'corosync authkey' to 'controller-3'
controller-3: successful distribution of the file 'corosync authkey'
Sending remote node configuration files to 'controller-3'
controller-3: successful distribution of the file 'pacemaker_remote authkey'
controller-0: Corosync updated
controller-2: Corosync updated
Setting up corosync...
controller-3: Succeeded
Synchronizing pcsd certificates on nodes controller-3...
controller-3: Success
Restarting pcsd on the nodes in order to reload the certificates...
controller-3: Success

Version-Release number of selected component (if applicable):
OSP13

[heat-admin@controller-0 ~]$ sudo rpm -qa "*pacemaker*"
pacemaker-cli-1.1.18.notifyfix-11.el7.x86_64
ansible-pacemaker-1.0.4-0.20180220234310.0e4d7c0.el7ost.noarch
pacemaker-cluster-libs-1.1.18.notifyfix-11.el7.x86_64
pacemaker-libs-1.1.18.notifyfix-11.el7.x86_64
pacemaker-1.1.18.notifyfix-11.el7.x86_64
puppet-pacemaker-0.7.2-0.20180301221314.2d2d877.el7ost.noarch
pacemaker-nagios-plugins-metadata-1.1.18.notifyfix-11.el7.x86_64
pacemaker-remote-1.1.18.notifyfix-11.el7.x86_64

[heat-admin@controller-0 ~]$ sudo rpm -qa "*pcs*"
pcs-0.9.162-5.el7.x86_64

How reproducible:
In 70% of tests

Steps to Reproduce:
1. Deploy OSP13
2. Try to replace a controller using the official documentation

Actual results:
Error: Error connecting to controller-3 (HTTP error: 408)
Error: Errors have occurred, therefore pcs is unable to continue

Expected results:
controller-3: successful distribution of the file 'corosync authkey'
Sending remote node configuration files to 'controller-3'
controller-3: successful distribution of the file 'pacemaker_remote authkey'
controller-0: Corosync updated
controller-2: Corosync updated
Setting up corosync...
controller-3: Succeeded
Synchronizing pcsd certificates on nodes controller-3...
controller-3: Success
Restarting pcsd on the nodes in order to reload the certificates...
controller-3: Success

Additional info:
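Since the failure is intermittent and a rerun of the same command went through, a retry wrapper may serve as a stopgap. The following is only a sketch: the `retry` helper, the attempt count and the delay are illustrative assumptions, not part of pcs or of the documented procedure, and it has not been confirmed that rerunning after a partial failure is always safe.

```shell
# Hypothetical helper: run a command up to max_attempts times,
# sleeping delay seconds between failed attempts.
retry() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # Run the remaining arguments as the command; stop on success.
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max_attempts failed: $*" >&2
        attempt=$((attempt + 1))
        # Only wait if another attempt will follow.
        if [ "$attempt" -le "$max_attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Intended use on the controller (attempt count and delay are assumptions):
# retry 3 30 sudo pcs cluster node add controller-3 --wait=500
```

The helper returns 0 as soon as one attempt succeeds and 1 if every attempt fails, so the calling script can still stop on a persistent error.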
Hey Artem, we lack sosreports for this one. Could you please rerun the controller replacement procedure and let us know whether we can access the env, or where to get the sosreports? Thanks!
Was not reproduced in the last two attempts.
Works in the passed_phase2 puddle 2018-04-26.3.
Reproduced on OSP13 puddle 2018-10-18.1.
The reports should be available here: http://rhos-release.virt.bos.redhat.com/log/bz1564218

[heat-admin@controller-0 ~]$ sudo pcs cluster node add controller-3
Disabling SBD service...
controller-3: sbd disabled
Sending 'corosync authkey' to 'controller-3'
controller-3: successful distribution of the file 'corosync authkey'
Sending remote node configuration files to 'controller-3'
Error: Error connecting to controller-3 (HTTP error: 408)
Error: Errors have occurred, therefore pcs is unable to continue
This has been fixed with pcs-0.9.165-2.el7.

*** This bug has been marked as a duplicate of bug 1600169 ***