Bug 1564218

Summary: [osp13][controller replacement] Cannot add new node to pcs cluster - Error: Error connecting to controller-3 (HTTP error: 408)
Product: Red Hat OpenStack
Reporter: Artem Hrechanychenko <ahrechan>
Component: rhosp-director
Assignee: Damien Ciabrini <dciabrin>
Status: CLOSED DUPLICATE
QA Contact: Artem Hrechanychenko <ahrechan>
Severity: high
Priority: high
Version: 13.0 (Queens)
CC: ahrechan, dbecker, jschluet, mburns, michele, morazi, rhel-osp-director-maint
Target Milestone: ga
Keywords: Reopened, Triaged
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: If docs needed, set a value
Story Points: ---
Last Closed: 2018-11-21 20:57:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---

Description Artem Hrechanychenko 2018-04-05 17:11:19 UTC
Description of problem:

During the controller replacement procedure, the new controller has to be added to the pcs cluster:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/12/html-single/director_installation_and_usage/#sect-Replacing_Controller_Nodes

Step 10.4.3.6, "Add the new node to the cluster":

Sometimes this step fails with HTTP 408:

[heat-admin@controller-0 ~]$ sudo pcs cluster node add controller-3 --wait=90
Disabling SBD service...
controller-3: sbd disabled
Sending 'corosync authkey' to 'controller-3'
controller-3: successful distribution of the file 'corosync authkey'
Sending remote node configuration files to 'controller-3'
Error: Error connecting to controller-3 (HTTP error: 408)
Error: Errors have occurred, therefore pcs is unable to continue

[heat-admin@controller-0 ~]$ sudo pcs cluster node add controller-3 --wait=500
Disabling SBD service...
controller-3: sbd disabled
Sending 'corosync authkey' to 'controller-3'
controller-3: successful distribution of the file 'corosync authkey'
Sending remote node configuration files to 'controller-3'
controller-3: successful distribution of the file 'pacemaker_remote authkey'
controller-0: Corosync updated
controller-2: Corosync updated
Setting up corosync...
controller-3: Succeeded
Synchronizing pcsd certificates on nodes controller-3...
controller-3: Success
Restarting pcsd on the nodes in order to reload the certificates...
controller-3: Success
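
The two runs above show the same command failing once with HTTP 408 and completing on a later run. As a rough sketch only, a wrapper along the following lines could rerun the step when that specific error appears. The function name retry_on_408 and its arguments are hypothetical, it assumes pcs exits non-zero when it prints the error, and it assumes rerunning the step after a partial failure is safe, which this report does not confirm.

```shell
#!/bin/bash
# Hypothetical helper (not part of pcs): rerun a command while its output
# contains "HTTP error: 408", up to a maximum number of attempts.
# Usage: retry_on_408 MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_on_408() {
    local max_attempts=$1 delay=$2
    shift 2
    local attempt=1 output status
    while :; do
        # Capture stdout and stderr together, and remember the exit status.
        if output=$("$@" 2>&1); then status=0; else status=$?; fi
        printf '%s\n' "$output"
        # Success: stop immediately.
        if [ "$status" -eq 0 ]; then
            return 0
        fi
        # Failures other than the 408 timeout are returned without a retry.
        case $output in
            *"HTTP error: 408"*) ;;
            *) return "$status" ;;
        esac
        # Give up once the attempt budget is used.
        if [ "$attempt" -ge "$max_attempts" ]; then
            return "$status"
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

For example, "retry_on_408 3 30 sudo pcs cluster node add controller-3 --wait=90" would make up to three attempts, 30 seconds apart. This only works around the symptom and does not address the underlying timeout.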

Version-Release number of selected component (if applicable):
OSP13
[heat-admin@controller-0 ~]$ sudo rpm -qa "*pacemaker*"
pacemaker-cli-1.1.18.notifyfix-11.el7.x86_64
ansible-pacemaker-1.0.4-0.20180220234310.0e4d7c0.el7ost.noarch
pacemaker-cluster-libs-1.1.18.notifyfix-11.el7.x86_64
pacemaker-libs-1.1.18.notifyfix-11.el7.x86_64
pacemaker-1.1.18.notifyfix-11.el7.x86_64
puppet-pacemaker-0.7.2-0.20180301221314.2d2d877.el7ost.noarch
pacemaker-nagios-plugins-metadata-1.1.18.notifyfix-11.el7.x86_64
pacemaker-remote-1.1.18.notifyfix-11.el7.x86_64

[heat-admin@controller-0 ~]$ sudo rpm -qa "*pcs*"
pcs-0.9.162-5.el7.x86_64

How reproducible:
In 70% of test runs

Steps to Reproduce:
1. Deploy OSP13
2. Try to replace a controller using the official documentation


Actual results:
Error: Error connecting to controller-3 (HTTP error: 408)
Error: Errors have occurred, therefore pcs is unable to continue

Expected results:
controller-3: successful distribution of the file 'corosync authkey'
Sending remote node configuration files to 'controller-3'
controller-3: successful distribution of the file 'pacemaker_remote authkey'
controller-0: Corosync updated
controller-2: Corosync updated
Setting up corosync...
controller-3: Succeeded
Synchronizing pcsd certificates on nodes controller-3...
controller-3: Success
Restarting pcsd on the nodes in order to reload the certificates...
controller-3: Success


Additional info:

Comment 3 Damien Ciabrini 2018-04-18 12:24:22 UTC
Hey Artem, we lack sosreports for this one. Could you please rerun the controller replacement procedure and let us know whether we can access the environment or where to get the sosreports?

Thanks!

Comment 4 Artem Hrechanychenko 2018-05-01 14:54:32 UTC
Was not reproduced in the last two attempts.

Comment 5 Artem Hrechanychenko 2018-05-03 13:07:57 UTC
Works in the passed_phase2 puddle 2018-04-26.3.

Comment 6 Artem Hrechanychenko 2018-10-25 16:33:37 UTC
Reproduced with OSP13 puddle 2018-10-18.1.

The reports should be available here: http://rhos-release.virt.bos.redhat.com/log/bz1564218


[heat-admin@controller-0 ~]$ sudo pcs cluster node add controller-3
Disabling SBD service...
controller-3: sbd disabled
Sending 'corosync authkey' to 'controller-3'
controller-3: successful distribution of the file 'corosync authkey'
Sending remote node configuration files to 'controller-3'
Error: Error connecting to controller-3 (HTTP error: 408)
Error: Errors have occurred, therefore pcs is unable to continue

Comment 7 Michele Baldessari 2018-11-21 20:57:26 UTC
This has been fixed with pcs-0.9.165-2.el7.

*** This bug has been marked as a duplicate of bug 1600169 ***
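
Comment 7 names pcs-0.9.165-2.el7 as the fixed build, while the description lists pcs-0.9.162-5.el7 on controller-0. A minimal sketch for checking whether a node already has the fixed build could look like the following. The function names are hypothetical, and the comparison relies on GNU "sort -V", which approximates RPM's version ordering but is not identical to it.

```shell
#!/bin/bash
# Fixed build named in comment 7.
FIXED_PCS="0.9.165-2.el7"

# version_at_least INSTALLED REQUIRED (hypothetical helper)
# Returns 0 when INSTALLED sorts at or after REQUIRED under GNU `sort -V`.
# Note: `sort -V` is an approximation of RPM's own comparison rules.
version_at_least() {
    local installed=$1 required=$2 lowest
    lowest=$(printf '%s\n%s\n' "$installed" "$required" | sort -V | head -n 1)
    [ "$lowest" = "$required" ]
}

# pcs_has_fix [VERSION-RELEASE] (hypothetical helper)
# Without an argument, asks rpm for the installed pcs version-release.
pcs_has_fix() {
    local installed=${1:-$(rpm -q --queryformat '%{VERSION}-%{RELEASE}' pcs)}
    version_at_least "$installed" "$FIXED_PCS"
}
```

Running "pcs_has_fix" with no argument on a controller queries the installed package. Passing a string such as "0.9.162-5.el7" checks that value instead, which is useful when reading versions out of a report like this one.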