Bug 1348489 - During overcloud deployment the ceph monitor gets started only on one of the controllers which causes the deployment to get stuck for some time
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 10.0 (Newton)
Assignee: Angus Thomas
QA Contact: Omri Hochman
URL:
Whiteboard:
Depends On:
Blocks: 1349456
Reported: 2016-06-21 10:08 UTC by Marius Cornea
Modified: 2016-10-12 15:58 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1349456
Environment:
Last Closed: 2016-10-12 15:58:42 UTC
Target Upstream Version:
Embargoed:



Description Marius Cornea 2016-06-21 10:08:51 UTC
Description of problem:
During overcloud deployment, the ceph monitor is started only on controller-0. This delays the deployment because the ceph client keeps trying to reach the other two monitors (set in the conf file), which are not reachable.

During the following step:
overcloud-ControllerNodesPostDeployment-r42rlskbzdeb-ControllerServicesBaseDeployment_Step2-sezqtmy5qs4r

On controller-0, os-collect-config shows that it is stuck at:
Running /var/lib/heat-config/hooks/puppet < /var/lib/heat-config/deployed/101ffc97-51c5-4ceb-bb3e-00a537697ad3.json

Running the puppet command manually with debug enabled:
FACTER_heat_outputs_path="/var/run/heat-config/heat-config-puppet/101ffc97-51c5-4ceb-bb3e-00a537697ad3"  FACTER_fqdn="overcloud-controller-0.localdomain"  FACTER_deploy_config_name="ControllerServicesBaseDeployment_Step2"  puppet apply --detailed-exitcodes /var/lib/heat-config/heat-config-puppet/101ffc97-51c5-4ceb-bb3e-00a537697ad3.pp -d


The debug output shows that it is stuck at:

Debug: Executing 'service ceph status mon.overcloud-controller-0'
Debug: Exec[rm-keyring-overcloud-controller-0](provider=posix): Executing check '/bin/true # comment to satisfy puppet syntax requirements
set -ex
test ! -e /tmp/ceph-mon-keyring-overcloud-controller-0
'
Debug: Executing '/bin/true # comment to satisfy puppet syntax requirements
set -ex
test ! -e /tmp/ceph-mon-keyring-overcloud-controller-0
'
Debug: /Stage[main]/Ceph::Profile::Mon/Ceph::Mon[overcloud-controller-0]/Exec[rm-keyring-overcloud-controller-0]/unless: + test '!' -e /tmp/ceph-mon-keyring-overcloud-controller-0
Debug: Exec[ceph-injectkey-client.admin](provider=posix): Executing check '/bin/true # comment to satisfy puppet syntax requirements
set -ex
ceph   --name 'mon.'   --keyring '/var/lib/ceph/mon/ceph-overcloud-controller-0/keyring'  auth get client.admin | grep AQDqCGlXAAAAABAAAk3LxlGiCOI4D0xpBpHxIg=='
Debug: Executing '/bin/true # comment to satisfy puppet syntax requirements
set -ex
ceph   --name 'mon.'   --keyring '/var/lib/ceph/mon/ceph-overcloud-controller-0/keyring'  auth get client.admin | grep AQDqCGlXAAAAABAAAk3LxlGiCOI4D0xpBpHxIg=='


If I run the ceph command manually I get the following output:
[root@overcloud-controller-0 heat-admin]# ceph   --name 'mon.'   --keyring '/var/lib/ceph/mon/ceph-overcloud-controller-0/keyring'  auth get client.admin | grep AQDqCGlXAAAAABAAAk3LxlGiCOI4D0xpBpHxIg==
2016-06-21 09:53:33.873039 7f1bce6fb700  0 -- 10.0.0.139:0/1025709 >> 10.0.0.143:6789/0 pipe(0x7f1bc0000c00 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f1bc0004ef0).fault
2016-06-21 09:53:36.873332 7f1bce7fc700  0 -- 10.0.0.139:0/1025709 >> 10.0.0.140:6789/0 pipe(0x7f1bc0008280 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f1bc000c520).fault
2016-06-21 09:53:42.874206 7f1bd4184700  0 -- 10.0.0.139:0/1025709 >> 10.0.0.140:6789/0 pipe(0x7f1bc0008280 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f1bc0007790).fault
2016-06-21 09:53:45.874477 7f1bce7fc700  0 -- 10.0.0.139:0/1025709 >> 10.0.0.143:6789/0 pipe(0x7f1bc0000c00 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f1bc0005160).fault
2016-06-21 09:53:51.875374 7f1bce6fb700  0 -- 10.0.0.139:0/1025709 >> 10.0.0.143:6789/0 pipe(0x7f1bc0000c00 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f1bc0005ab0).fault
2016-06-21 09:53:57.875759 7f1bd4184700  0 -- 10.0.0.139:0/1025709 >> 10.0.0.140:6789/0 pipe(0x7f1bc0000c00 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f1bc000e270).fault
2016-06-21 09:54:00.876063 7f1bce6fb700  0 -- 10.0.0.139:0/1025709 >> 10.0.0.143:6789/0 pipe(0x7f1bc0008280 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f1bc000e770).fault
2016-06-21 09:54:03.876455 7f1bd4184700  0 -- 10.0.0.139:0/1025709 >> 10.0.0.140:6789/0 pipe(0x7f1bc0000c00 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f1bc0013090).fault

10.0.0.143 and 10.0.0.140 are the IP addresses of overcloud-controller-2 and overcloud-controller-1 respectively. The ceph service is not listening on either of them because there is no ceph.conf file:

[root@overcloud-controller-0 heat-admin]# facter ipaddress_vlan300
10.0.0.139

[heat-admin@overcloud-controller-1 ~]$ facter ipaddress_vlan300
10.0.0.140

[root@overcloud-controller-2 heat-admin]# facter ipaddress_vlan300
10.0.0.143


[root@overcloud-controller-2 ~]# ls -l /etc/ceph/
total 4
-rwxr-xr-x. 1 root root 92 May 23 21:30 rbdmap

systemctl status ceph also reports that there is no /etc/ceph/ceph.conf file.
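The two checks above (is ceph.conf present, is the monitor port reachable) can be scripted so that they are quick to repeat on each controller. This is only a sketch under assumptions: the function names are hypothetical, the port probe relies on bash's /dev/tcp redirection and the coreutils timeout command being available, and 6789 is the monitor port seen in the client output above.

```shell
# Hypothetical helper: succeed if <dir>/ceph.conf exists and is non-empty.
# The directory defaults to /etc/ceph.
ceph_conf_present() {
    conf_dir="${1:-/etc/ceph}"
    [ -s "$conf_dir/ceph.conf" ]
}

# Hypothetical helper: succeed if a TCP connection to <host>:<port> can be
# opened within 3 seconds. The port defaults to 6789.
mon_port_open() {
    mon_host="$1"
    mon_port="${2:-6789}"
    # Host and port are passed as positional arguments to the inner bash.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$mon_host" "$mon_port" 2>/dev/null
}
```

For example, on controller-0: ceph_conf_present || echo "no ceph.conf"; mon_port_open 10.0.0.140 || echo "mon on 10.0.0.140 not reachable"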

Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Deploy overcloud with HA controllers
2. Wait for ControllerServicesBaseDeployment_Step2 
3. SSH to controller-0 
4. Check the os-collect-config journal
5. Run puppet manually with debug mode

Actual results:
The ceph client is trying to reach the other controllers where the ceph service is not running:
ceph   --name 'mon.'   --keyring '/var/lib/ceph/mon/ceph-overcloud-controller-0/keyring'  auth get client.admin | grep AQDqCGlXAAAAABAAAk3LxlGiCOI4D0xpBpHxIg==
2016-06-21 09:53:33.873039 7f1bce6fb700  0 -- 10.0.0.139:0/1025709 >> 10.0.0.143:6789/0 pipe(0x7f1bc0000c00 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f1bc0004ef0).fault
2016-06-21 09:53:36.873332 7f1bce7fc700  0 -- 10.0.0.139:0/1025709 >> 10.0.0.140:6789/0 pipe(0x7f1bc0008280 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f1bc000c520).fault


Expected results:
The ceph service is configured on the other controllers as well and the client can reach them.

Additional info:

Comment 2 Giulio Fidente 2016-10-12 15:51:40 UTC
Marius, is this an issue in OSPd10?

Comment 3 Marius Cornea 2016-10-12 15:58:42 UTC
No, not anymore. Closing it.

