Description of problem:
During overcloud deployment the ceph monitor gets started only on controller-0. This delays the deployment because the ceph client keeps trying to reach the other two monitors (set in the conf file), which are not reachable.

During the following step:
overcloud-ControllerNodesPostDeployment-r42rlskbzdeb-ControllerServicesBaseDeployment_Step2-sezqtmy5qs4r

os-collect-config on controller-0 shows that it is stuck at:
Running /var/lib/heat-config/hooks/puppet < /var/lib/heat-config/deployed/101ffc97-51c5-4ceb-bb3e-00a537697ad3.json

Running the puppet command manually with debug enabled:
FACTER_heat_outputs_path="/var/run/heat-config/heat-config-puppet/101ffc97-51c5-4ceb-bb3e-00a537697ad3" FACTER_fqdn="overcloud-controller-0.localdomain" FACTER_deploy_config_name="ControllerServicesBaseDeployment_Step2" puppet apply --detailed-exitcodes /var/lib/heat-config/heat-config-puppet/101ffc97-51c5-4ceb-bb3e-00a537697ad3.pp -d

shows that it is stuck at:
Debug: Executing 'service ceph status mon.overcloud-controller-0'
Debug: Exec[rm-keyring-overcloud-controller-0](provider=posix): Executing check '/bin/true # comment to satisfy puppet syntax requirements
set -ex
test ! -e /tmp/ceph-mon-keyring-overcloud-controller-0
'
Debug: Executing '/bin/true # comment to satisfy puppet syntax requirements
set -ex
test ! -e /tmp/ceph-mon-keyring-overcloud-controller-0
'
Debug: /Stage[main]/Ceph::Profile::Mon/Ceph::Mon[overcloud-controller-0]/Exec[rm-keyring-overcloud-controller-0]/unless: + test '!' -e /tmp/ceph-mon-keyring-overcloud-controller-0
Debug: Exec[ceph-injectkey-client.admin](provider=posix): Executing check '/bin/true # comment to satisfy puppet syntax requirements
set -ex
ceph --name 'mon.' --keyring '/var/lib/ceph/mon/ceph-overcloud-controller-0/keyring' auth get client.admin | grep AQDqCGlXAAAAABAAAk3LxlGiCOI4D0xpBpHxIg=='
Debug: Executing '/bin/true # comment to satisfy puppet syntax requirements
set -ex
ceph --name 'mon.' --keyring '/var/lib/ceph/mon/ceph-overcloud-controller-0/keyring' auth get client.admin | grep AQDqCGlXAAAAABAAAk3LxlGiCOI4D0xpBpHxIg=='

If I run the ceph command manually I get the following output:
[root@overcloud-controller-0 heat-admin]# ceph --name 'mon.' --keyring '/var/lib/ceph/mon/ceph-overcloud-controller-0/keyring' auth get client.admin | grep AQDqCGlXAAAAABAAAk3LxlGiCOI4D0xpBpHxIg==
2016-06-21 09:53:33.873039 7f1bce6fb700 0 -- 10.0.0.139:0/1025709 >> 10.0.0.143:6789/0 pipe(0x7f1bc0000c00 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f1bc0004ef0).fault
2016-06-21 09:53:36.873332 7f1bce7fc700 0 -- 10.0.0.139:0/1025709 >> 10.0.0.140:6789/0 pipe(0x7f1bc0008280 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f1bc000c520).fault
2016-06-21 09:53:42.874206 7f1bd4184700 0 -- 10.0.0.139:0/1025709 >> 10.0.0.140:6789/0 pipe(0x7f1bc0008280 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f1bc0007790).fault
2016-06-21 09:53:45.874477 7f1bce7fc700 0 -- 10.0.0.139:0/1025709 >> 10.0.0.143:6789/0 pipe(0x7f1bc0000c00 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f1bc0005160).fault
2016-06-21 09:53:51.875374 7f1bce6fb700 0 -- 10.0.0.139:0/1025709 >> 10.0.0.143:6789/0 pipe(0x7f1bc0000c00 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f1bc0005ab0).fault
2016-06-21 09:53:57.875759 7f1bd4184700 0 -- 10.0.0.139:0/1025709 >> 10.0.0.140:6789/0 pipe(0x7f1bc0000c00 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f1bc000e270).fault
2016-06-21 09:54:00.876063 7f1bce6fb700 0 -- 10.0.0.139:0/1025709 >> 10.0.0.143:6789/0 pipe(0x7f1bc0008280 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f1bc000e770).fault
2016-06-21 09:54:03.876455 7f1bd4184700 0 -- 10.0.0.139:0/1025709 >> 10.0.0.140:6789/0 pipe(0x7f1bc0000c00 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f1bc0013090).fault

10.0.0.143 and 10.0.0.140 are the IP addresses of overcloud-controller-2 and overcloud-controller-1 respectively, where the ceph service is not listening because there is no ceph.conf file:

[root@overcloud-controller-0 heat-admin]# facter ipaddress_vlan300
10.0.0.139
[heat-admin@overcloud-controller-1 ~]$ facter ipaddress_vlan300
10.0.0.140
[root@overcloud-controller-2 heat-admin]# facter ipaddress_vlan300
10.0.0.143

[root@overcloud-controller-2 ~]# ls -l /etc/ceph/
total 4
-rwxr-xr-x. 1 root root 92 May 23 21:30 rbdmap

systemctl status ceph reports that there is no /etc/ceph/ceph.conf file.

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Deploy overcloud with HA controllers
2. Wait for ControllerServicesBaseDeployment_Step2
3. SSH to controller-0
4. Check the os-collect-config journal
5. Run puppet manually with debug mode

Actual results:
The ceph client is trying to reach the other controllers, where the ceph service is not running:

ceph --name 'mon.' --keyring '/var/lib/ceph/mon/ceph-overcloud-controller-0/keyring' auth get client.admin | grep AQDqCGlXAAAAABAAAk3LxlGiCOI4D0xpBpHxIg==
2016-06-21 09:53:33.873039 7f1bce6fb700 0 -- 10.0.0.139:0/1025709 >> 10.0.0.143:6789/0 pipe(0x7f1bc0000c00 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f1bc0004ef0).fault
2016-06-21 09:53:36.873332 7f1bce7fc700 0 -- 10.0.0.139:0/1025709 >> 10.0.0.140:6789/0 pipe(0x7f1bc0008280 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f1bc000c520).fault

Expected results:
The ceph service is configured on the other controllers as well and the client can reach them.

Additional info:
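For triage, the repeated .fault lines in the output above can be tallied per peer to see at a glance which monitor addresses the client keeps failing to reach. The following is only a minimal sketch: the function name fault_peers is made up for illustration, and it assumes the "-- <local> >> <peer>/<nonce> pipe(...).fault" line layout shown in this report.

```shell
#!/bin/sh
# Hypothetical helper (not part of ceph or TripleO): reads ceph client output
# on stdin and prints "<count> <peer ip:port>" for every peer that appears in
# a messenger ".fault" line. Assumes the line layout seen in this report:
#   <date> <time> <thread> 0 -- <local> >> <peer>/<nonce> pipe(...).fault
fault_peers() {
    awk '/\.fault[ \t]*$/ {
        for (i = 1; i < NF; i++) {
            if ($i == ">>") {
                # The field after ">>" is "<ip>:<port>/<nonce>"; drop the nonce.
                split($(i + 1), peer, "/")
                count[peer[1]]++
            }
        }
    }
    END {
        for (p in count) print count[p], p
    }' | sort -k2
}
```

A possible way to use it on controller-0, assuming the fault messages are written to stderr (as the grep pipeline in this report suggests) and bounding the otherwise hanging command with timeout:

timeout 30 ceph --name 'mon.' --keyring '/var/lib/ceph/mon/ceph-overcloud-controller-0/keyring' auth get client.admin 2>&1 | fault_peers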
Marius, is this an issue in OSPd10?
No, not anymore. Closing it.