Bug 1564218 - [osp13][controller replacement] Cannot add new node to pcs cluster - Error: Error connecting to controller-3 (HTTP error: 408)
Keywords:
Status: CLOSED DUPLICATE of bug 1600169
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 13.0 (Queens)
Hardware: x86_64
OS: Linux
Severity: high
Priority: high
Target Milestone: ga
Target Release: ---
Assignee: Damien Ciabrini
QA Contact: Artem Hrechanychenko
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-04-05 17:11 UTC by Artem Hrechanychenko
Modified: 2018-11-21 20:57 UTC (History)
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-11-21 20:57:26 UTC
Target Upstream Version:
Embargoed:



Description Artem Hrechanychenko 2018-04-05 17:11:19 UTC
Description of problem:

During the controller replacement procedure, the new controller has to be added to the pcs cluster:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/12/html-single/director_installation_and_usage/#sect-Replacing_Controller_Nodes

Section 10.4.3.6, step "Add the new node to the cluster".

This step sometimes fails with HTTP error 408:

[heat-admin@controller-0 ~]$ sudo pcs cluster node add controller-3 --wait=90
Disabling SBD service...
controller-3: sbd disabled
Sending 'corosync authkey' to 'controller-3'
controller-3: successful distribution of the file 'corosync authkey'
Sending remote node configuration files to 'controller-3'
Error: Error connecting to controller-3 (HTTP error: 408)
Error: Errors have occurred, therefore pcs is unable to continue

[heat-admin@controller-0 ~]$ sudo pcs cluster node add controller-3 --wait=500
Disabling SBD service...
controller-3: sbd disabled
Sending 'corosync authkey' to 'controller-3'
controller-3: successful distribution of the file 'corosync authkey'
Sending remote node configuration files to 'controller-3'
controller-3: successful distribution of the file 'pacemaker_remote authkey'
controller-0: Corosync updated
controller-2: Corosync updated
Setting up corosync...
controller-3: Succeeded
Synchronizing pcsd certificates on nodes controller-3...
controller-3: Success
Restarting pcsd on the nodes in order to reload the certificates...
controller-3: Success
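
Because the failure above is intermittent and a second invocation went through, the step could be wrapped in a bounded retry while the root cause is investigated. The following is only a minimal sketch: the `retry` function, the attempt count, and the delay are hypothetical and not part of pcs, and this report does not establish that re-running the command after a partial failure is always safe.

```shell
#!/bin/sh
# Hypothetical helper, not part of pcs.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
retry() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        # Stop as soon as the command succeeds.
        "$@" && return 0
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed; retrying in ${delay}s" >&2
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Example invocation (illustrative values: 3 attempts, 30 seconds apart):
# retry 3 30 sudo pcs cluster node add controller-3 --wait=90
```

The example invocation is left commented out so that sourcing the snippet does not touch the cluster.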

Version-Release number of selected component (if applicable):
OSP13
[heat-admin@controller-0 ~]$ sudo rpm -qa "*pacemaker*"
pacemaker-cli-1.1.18.notifyfix-11.el7.x86_64
ansible-pacemaker-1.0.4-0.20180220234310.0e4d7c0.el7ost.noarch
pacemaker-cluster-libs-1.1.18.notifyfix-11.el7.x86_64
pacemaker-libs-1.1.18.notifyfix-11.el7.x86_64
pacemaker-1.1.18.notifyfix-11.el7.x86_64
puppet-pacemaker-0.7.2-0.20180301221314.2d2d877.el7ost.noarch
pacemaker-nagios-plugins-metadata-1.1.18.notifyfix-11.el7.x86_64
pacemaker-remote-1.1.18.notifyfix-11.el7.x86_64

[heat-admin@controller-0 ~]$ sudo rpm -qa "*pcs*"
pcs-0.9.162-5.el7.x86_64

How reproducible:
In 70% of test runs.

Steps to Reproduce:
1. Deploy OSP13
2. Try to replace a controller using the official documentation


Actual results:
Error: Error connecting to controller-3 (HTTP error: 408)
Error: Errors have occurred, therefore pcs is unable to continue

Expected results:
controller-3: successful distribution of the file 'corosync authkey'
Sending remote node configuration files to 'controller-3'
controller-3: successful distribution of the file 'pacemaker_remote authkey'
controller-0: Corosync updated
controller-2: Corosync updated
Setting up corosync...
controller-3: Succeeded
Synchronizing pcsd certificates on nodes controller-3...
controller-3: Success
Restarting pcsd on the nodes in order to reload the certificates...
controller-3: Success


Additional info:

Comment 3 Damien Ciabrini 2018-04-18 12:24:22 UTC
Hey Artem, we lack sosreports for this one. Could you please rerun the controller replacement procedure and let us know whether we can access the environment, or where to get the sosreports?

Thanks!

Comment 4 Artem Hrechanychenko 2018-05-01 14:54:32 UTC
It was not reproduced in the last two attempts.

Comment 5 Artem Hrechanychenko 2018-05-03 13:07:57 UTC
Works in the passed_phase2 puddle 2018-04-26.3.

Comment 6 Artem Hrechanychenko 2018-10-25 16:33:37 UTC
Reproduced on OSP13 puddle 2018-10-18.1.

The reports should be available here: http://rhos-release.virt.bos.redhat.com/log/bz1564218


[heat-admin@controller-0 ~]$ sudo pcs cluster node add controller-3
Disabling SBD service...
controller-3: sbd disabled
Sending 'corosync authkey' to 'controller-3'
controller-3: successful distribution of the file 'corosync authkey'
Sending remote node configuration files to 'controller-3'
Error: Error connecting to controller-3 (HTTP error: 408)
Error: Errors have occurred, therefore pcs is unable to continue

Comment 7 Michele Baldessari 2018-11-21 20:57:26 UTC
This has been fixed with pcs-0.9.165-2.el7

*** This bug has been marked as a duplicate of bug 1600169 ***
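
To see whether a given controller already carries the fixed package, the installed pcs version can be compared against pcs-0.9.165-2.el7. The sketch below is an approximation under stated assumptions: `version_at_least` is a hypothetical helper, and it relies on GNU `sort -V`, which orders version strings similarly to RPM but is not RPM's own comparison algorithm.

```shell
#!/bin/sh
# Hypothetical helper (not part of pcs or rpm): succeeds when the
# version-release string $1 sorts at or after $2 under GNU `sort -V`.
version_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

fixed="0.9.165-2.el7"   # version named in comment 7

# Only query the RPM database where rpm exists and pcs is installed.
if command -v rpm >/dev/null 2>&1 &&
   installed=$(rpm -q --queryformat '%{VERSION}-%{RELEASE}' pcs 2>/dev/null); then
    if version_at_least "$installed" "$fixed"; then
        echo "pcs $installed is at or above $fixed"
    else
        echo "pcs $installed predates $fixed"
    fi
fi
```

With the pcs-0.9.162-5.el7 package listed in the description, the comparison would fall into the "predates" branch.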

