Bug 1508038 - [SPLIT-STACK] Failed to deploy ceph: cannot stat '/var/run/ceph/ceph-mon.controller-1.asok
Status: CLOSED DUPLICATE of bug 1507888
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: ceph-ansible
Version: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: 12.0 (Pike)
Assignee: Giulio Fidente
QA Contact: Yogev Rabl
 
Reported: 2017-10-31 17:51 UTC by Yurii Prokulevych
Modified: 2017-11-02 10:08 UTC
CC List: 8 users

Last Closed: 2017-11-02 10:08:36 UTC


Description Yurii Prokulevych 2017-10-31 17:51:35 UTC
Description of problem:
-----------------------
Attempt to deploy RHOS-12 failed:

...
2017-10-31 17:34:45Z [overcloud.AllNodesDeploySteps]: CREATE_FAILED  Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: ERROR
2017-10-31 17:34:46Z [overcloud.AllNodesDeploySteps]: CREATE_FAILED  resources.AllNodesDeploySteps: Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: ERROR
2017-10-31 17:34:46Z [overcloud]: CREATE_FAILED  Resource CREATE failed: resources.AllNodesDeploySteps: Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: ERROR

 Stack overcloud CREATE_FAILED 

overcloud.AllNodesDeploySteps.WorkflowTasks_Step2_Execution:
  resource_type: OS::Mistral::ExternalResource
  physical_resource_id: b61b5e78-2e2e-4657-9aaa-db91b9f5ddd8
  status: CREATE_FAILED
  status_reason: |
    resources.WorkflowTasks_Step2_Execution: ERROR
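
The Heat output only says that the Mistral execution failed; the execution ID it prints can be queried directly for the underlying error. A minimal diagnostic sketch (not part of the original report), run from the undercloud and reusing the physical_resource_id shown above:

# Show the failed Mistral workflow execution and its output
openstack workflow execution show b61b5e78-2e2e-4657-9aaa-db91b9f5ddd8
openstack workflow execution output show b61b5e78-2e2e-4657-9aaa-db91b9f5ddd8

# Summarise the failed resources of the overcloud stack
openstack stack failures list overcloud --long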

In ceph-ansible.log
--------------------
2017-10-31 13:33:24,178 p=23674 u=mistral |  changed: [192.168.24.51]
2017-10-31 13:33:24,186 p=23674 u=mistral |  TASK [ceph-mon : systemd start mon container] **********************************
2017-10-31 13:33:24,734 p=23674 u=mistral |  ok: [192.168.24.51]
2017-10-31 13:33:24,743 p=23674 u=mistral |  TASK [ceph-mon : wait for monitor socket to exist] *****************************
2017-10-31 13:33:25,145 p=23674 u=mistral |  FAILED - RETRYING: wait for monitor socket to exist (5 retries left).
2017-10-31 13:33:40,509 p=23674 u=mistral |  FAILED - RETRYING: wait for monitor socket to exist (4 retries left).
2017-10-31 13:33:55,875 p=23674 u=mistral |  FAILED - RETRYING: wait for monitor socket to exist (3 retries left).
2017-10-31 13:34:11,248 p=23674 u=mistral |  FAILED - RETRYING: wait for monitor socket to exist (2 retries left).
2017-10-31 13:34:26,584 p=23674 u=mistral |  FAILED - RETRYING: wait for monitor socket to exist (1 retries left).
2017-10-31 13:34:41,938 p=23674 u=mistral |  fatal: [192.168.24.51]: FAILED! => {"attempts": 5, "changed": true, "cmd": ["docker", "exec", "ceph-mon-controller-1", "stat", "/var/run/ceph/ceph-mon.controller-1.asok"], "delta": "0:00:00.069564", "end": "2017-10-31 17:34:41.848227", "failed": true, "msg": "non-zero return code", "rc": 1, "start": "2017-10-31 17:34:41.778663", "stderr": "stat: cannot stat '/var/run/ceph/ceph-mon.controller-1.asok': No such file or directory", "stderr_lines": ["stat: cannot stat '/var/run/ceph/ceph-mon.controller-1.asok': No such file or directory"], "stdout": "", "stdout_lines": []}
2017-10-31 13:34:41,938 p=23674 u=mistral |  RUNNING HANDLER [ceph-defaults : copy mon restart script] **********************
2017-10-31 13:34:41,939 p=23674 u=mistral |  RUNNING HANDLER [ceph-defaults : restart ceph mon daemon(s) - non container] ***
2017-10-31 13:34:41,939 p=23674 u=mistral |  RUNNING HANDLER [ceph-defaults : restart ceph mon daemon(s) - container] *******
2017-10-31 13:34:41,939 p=23674 u=mistral |  RUNNING HANDLER [ceph-defaults : copy osd restart script] **********************
2017-10-31 13:34:41,939 p=23674 u=mistral |  RUNNING HANDLER [ceph-defaults : restart ceph osds daemon(s) - non container] ***
2017-10-31 13:34:41,940 p=23674 u=mistral |  RUNNING HANDLER [ceph-defaults : restart ceph osds daemon(s) - container] ******
2017-10-31 13:34:41,940 p=23674 u=mistral |  RUNNING HANDLER [ceph-defaults : copy mds restart script] **********************
2017-10-31 13:34:41,940 p=23674 u=mistral |  RUNNING HANDLER [ceph-defaults : restart ceph mds daemon(s) - non container] ***
2017-10-31 13:34:41,940 p=23674 u=mistral |  RUNNING HANDLER [ceph-defaults : restart ceph mds daemon(s) - container] *******
2017-10-31 13:34:41,940 p=23674 u=mistral |  RUNNING HANDLER [ceph-defaults : copy rgw restart script] **********************
2017-10-31 13:34:41,940 p=23674 u=mistral |  RUNNING HANDLER [ceph-defaults : restart ceph rgw daemon(s) - non container] ***
2017-10-31 13:34:41,940 p=23674 u=mistral |  RUNNING HANDLER [ceph-defaults : restart ceph rgw daemon(s) - container] *******
2017-10-31 13:34:41,941 p=23674 u=mistral |  RUNNING HANDLER [ceph-defaults : copy rbd mirror restart script] ***************
2017-10-31 13:34:41,941 p=23674 u=mistral |  RUNNING HANDLER [ceph-defaults : restart ceph rbd mirror daemon(s) - non container] ***
2017-10-31 13:34:41,941 p=23674 u=mistral |  RUNNING HANDLER [ceph-defaults : restart ceph rbd mirror daemon(s) - container] ***
2017-10-31 13:34:41,941 p=23674 u=mistral |  RUNNING HANDLER [ceph-defaults : copy mgr restart script] **********************
2017-10-31 13:34:41,941 p=23674 u=mistral |  RUNNING HANDLER [ceph-defaults : restart ceph mgr daemon(s) - non container] ***
2017-10-31 13:34:41,942 p=23674 u=mistral |  RUNNING HANDLER [ceph-defaults : restart ceph mgr daemon(s) - container] *******
2017-10-31 13:34:41,942 p=23674 u=mistral |  PLAY RECAP *********************************************************************
2017-10-31 13:34:41,942 p=23674 u=mistral |  192.168.24.30              : ok=1    changed=0    unreachable=0    failed=0   
2017-10-31 13:34:41,942 p=23674 u=mistral |  192.168.24.31              : ok=1    changed=0    unreachable=0    failed=0   
2017-10-31 13:34:41,942 p=23674 u=mistral |  192.168.24.32              : ok=1    changed=0    unreachable=0    failed=0   
2017-10-31 13:34:41,942 p=23674 u=mistral |  192.168.24.40              : ok=1    changed=0    unreachable=0    failed=0   
2017-10-31 13:34:41,943 p=23674 u=mistral |  192.168.24.41              : ok=1    changed=0    unreachable=0    failed=0   
2017-10-31 13:34:41,943 p=23674 u=mistral |  192.168.24.50              : ok=1    changed=0    unreachable=0    failed=0   
2017-10-31 13:34:41,943 p=23674 u=mistral |  192.168.24.51              : ok=34   changed=3    unreachable=0    failed=1   
2017-10-31 13:34:41,943 p=23674 u=mistral |  192.168.24.52              : ok=1    changed=0    unreachable=0    failed=0   
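
The failing task is just a retried stat of the monitor admin socket inside the ceph-mon container. The same check can be reproduced by hand on controller-1 to confirm the socket never appears; a minimal sketch (not part of the original report), based on the command shown in the fatal line above:

# Re-run the check that ceph-ansible performs
docker exec ceph-mon-controller-1 stat /var/run/ceph/ceph-mon.controller-1.asok

# The socket is only created once the monitor has started successfully,
# so also check the container state and its recent output
docker ps --filter name=ceph-mon-controller-1
docker logs --tail 50 ceph-mon-controller-1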

Logs from container:
--------------------
docker logs ceph-mon-controller-1 
creating /etc/ceph/ceph.client.admin.keyring
creating /etc/ceph/ceph.mon.keyring
creating /var/lib/ceph/bootstrap-osd/ceph.keyring
creating /var/lib/ceph/bootstrap-mds/ceph.keyring
creating /var/lib/ceph/bootstrap-rgw/ceph.keyring
monmaptool: monmap file /etc/ceph/monmap-ceph
monmaptool: set fsid to f2988754-be37-11e7-aaf8-525400d614f5
monmaptool: writing epoch 0 to /etc/ceph/monmap-ceph (1 monitors)
importing contents of /var/lib/ceph/bootstrap-osd/ceph.keyring into /etc/ceph/ceph.mon.keyring
importing contents of /var/lib/ceph/bootstrap-mds/ceph.keyring into /etc/ceph/ceph.mon.keyring
importing contents of /var/lib/ceph/bootstrap-rgw/ceph.keyring into /etc/ceph/ceph.mon.keyring
importing contents of /etc/ceph/ceph.client.admin.keyring into /etc/ceph/ceph.mon.keyring
ceph-mon: renaming mon.noname-a 172.17.3.16:6789/0 to mon.controller-1.localdomain
ceph-mon: set fsid to f2988754-be37-11e7-aaf8-525400d614f5
ceph-mon: created monfs at /var/lib/ceph/mon/ceph-controller-1.localdomain for mon.controller-1.localdomain
2017-10-31 17:33:25  /entrypoint.sh: SUCCESS
2017-10-31 17:33:25.328773 7f0180018700  0 set uid:gid to 167:167 (ceph:ceph)
2017-10-31 17:33:25.328821 7f0180018700  0 ceph version 10.2.7-48.el7cp (cf7751bcd460c757e596d3ee2991884e13c37b96), process ceph-mon, pid 1
2017-10-31 17:33:25.328910 7f0180018700  0 pidfile_write: ignore empty --pid-file
2017-10-31 17:33:25.356076 7f0180018700  1 leveldb: Recovering log #3
2017-10-31 17:33:25.356140 7f0180018700  1 leveldb: Level-0 table #5: started
2017-10-31 17:33:25.356852 7f0180018700  1 leveldb: Level-0 table #5: 1373 bytes OK
2017-10-31 17:33:25.357746 7f0180018700  1 leveldb: Delete type=0 #3
2017-10-31 17:33:25.357798 7f0180018700  1 leveldb: Delete type=3 #2
starting mon.controller-1.localdomain rank 1 at 172.17.3.16:6789/0 mon_data /var/lib/ceph/mon/ceph-controller-1.localdomain fsid f2988754-be37-11e7-aaf8-525400d614f5
2017-10-31 17:33:25.357985 7f0180018700  0 starting mon.controller-1.localdomain rank 1 at 172.17.3.16:6789/0 mon_data /var/lib/ceph/mon/ceph-controller-1.localdomain fsid f2988754-be37-11e7-aaf8-525400d614f5
2017-10-31 17:33:25.358891 7f0180018700  1 mon.controller-1.localdomain@-1(probing) e0 preinit fsid f2988754-be37-11e7-aaf8-525400d614f5
2017-10-31 17:33:25.358938 7f0180018700  1 mon.controller-1.localdomain@-1(probing) e0  initial_members controller-1,controller-0,controller-2, filtering seed monmap
2017-10-31 17:33:25.359729 7f0180004700  0 -- 172.17.3.16:6789/0 >> 0.0.0.0:0/1 pipe(0x5557e8388000 sd=10 :0 s=1 pgs=0 cs=0 l=0 c=0x5557e80e0d80).fault
2017-10-31 17:33:25.360026 7f0176d1a700  0 -- 172.17.3.16:6789/0 >> 0.0.0.0:0/3 pipe(0x5557e838a800 sd=22 :0 s=1 pgs=0 cs=0 l=0 c=0x5557e80e1080).fault
2017-10-31 17:33:25.360216 7f0176e1b700  0 -- 172.17.3.16:6789/0 >> 0.0.0.0:0/2 pipe(0x5557e8389400 sd=11 :0 s=1 pgs=0 cs=0 l=0 c=0x5557e80e0f00).fault
2017-10-31 17:33:25.360773 7f0176c19700  0 -- 172.17.3.16:6789/0 >> 172.17.3.15:6789/0 pipe(0x5557e8392000 sd=12 :0 s=1 pgs=0 cs=0 l=0 c=0x5557e80e1200).fault
2017-10-31 17:33:25.361522 7f0176b18700  0 -- 172.17.3.16:6789/0 >> 172.17.3.18:6789/0 pipe(0x5557e8393400 sd=21 :0 s=1 pgs=0 cs=0 l=0 c=0x5557e80e1380).fault
2017-10-31 17:34:25.359260 7f0178e1f700  0 mon.controller-1.localdomain@-1(probing).data_health(0) update_stats avail 82% total 40947 MB, used 7308 MB, avail 33639 MB
2017-10-31 17:35:25.359623 7f0178e1f700  0 mon.controller-1.localdomain@-1(probing).data_health(0) update_stats avail 82% total 40947 MB, used 7308 MB, avail 33639 MB
2017-10-31 17:36:25.359958 7f0178e1f700  0 mon.controller-1.localdomain@-1(probing).data_health(0) update_stats avail 82% total 40947 MB, used 7308 MB, avail 33639 MB
2017-10-31 17:37:25.360386 7f0178e1f700  0 mon.controller-1.localdomain@-1(probing).data_health(0) update_stats avail 82% total 40947 MB, used 7307 MB, avail 33639 MB
2017-10-31 17:38:25.360679 7f0178e1f700  0 mon.controller-1.localdomain@-1(probing).data_health(0) update_stats avail 82% total 40947 MB, used 7307 MB, avail 33639 MB
2017-10-31 17:39:25.360984 7f0178e1f700  0 mon.controller-1.localdomain@-1(probing).data_health(0) update_stats avail 82% total 40947 MB, used 7307 MB, avail 33639 MB
2017-10-31 17:40:25.361559 7f0178e1f700  0 mon.controller-1.localdomain@-1(probing).data_health(0) update_stats avail 82% total 40947 MB, used 7308 MB, avail 33639 MB
2017-10-31 17:41:25.362163 7f0178e1f700  0 mon.controller-1.localdomain@-1(probing).data_health(0) update_stats avail 82% total 40947 MB, used 7308 MB, avail 33639 MB
2017-10-31 17:42:25.362581 7f0178e1f700  0 mon.controller-1.localdomain@-1(probing).data_health(0) update_stats avail 82% total 40947 MB, used 7307 MB, avail 33639 MB
2017-10-31 17:43:25.362974 7f0178e1f700  0 mon.controller-1.localdomain@-1(probing).data_health(0) update_stats avail 82% total 40947 MB, used 7307 MB, avail 33639 MB
2017-10-31 17:44:25.363254 7f0178e1f700  0 mon.controller-1.localdomain@-1(probing).data_health(0) update_stats avail 82% total 40947 MB, used 7307 MB, avail 33639 MB
2017-10-31 17:45:25.363634 7f0178e1f700  0 mon.controller-1.localdomain@-1(probing).data_health(0) update_stats avail 82% total 40947 MB, used 7307 MB, avail 33639 MB
2017-10-31 17:46:25.363931 7f0178e1f700  0 mon.controller-1.localdomain@-1(probing).data_health(0) update_stats avail 82% total 40947 MB, used 7307 MB, avail 33639 MB
2017-10-31 17:47:25.364180 7f0178e1f700  0 mon.controller-1.localdomain@-1(probing).data_health(0) update_stats avail 82% total 40947 MB, used 7307 MB, avail 33639 MB
2017-10-31 17:48:25.364492 7f0178e1f700  0 mon.controller-1.localdomain@-1(probing).data_health(0) update_stats avail 82% total 40947 MB, used 7307 MB, avail 33639 MB
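
The monitor stays in the probing state and logs pipe .fault errors towards its peers at 172.17.3.15:6789 and 172.17.3.18:6789, so it never joins quorum and never creates its admin socket. A quick way to confirm whether the peer monitors are reachable on the storage network; a minimal sketch (not part of the original report), assumed to be run as root on controller-1:

# Check TCP connectivity to the other monitors on the Ceph mon port
for peer in 172.17.3.15 172.17.3.18; do
    timeout 2 bash -c "echo > /dev/tcp/${peer}/6789" \
        && echo "${peer}:6789 reachable" \
        || echo "${peer}:6789 NOT reachable"
done

# List firewall rules that mention the mon port, in case 6789 is blocked
iptables -nL | grep 6789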



Version-Release number of selected component (if applicable):
-------------------------------------------------------------
ceph-ansible-3.0.3-1.el7cp.noarch
puppet-ceph-2.4.2-0.20170927195215.718a5ff.el7ost.noarch


Deploy command:
---------------
timeout 240m openstack overcloud deploy \
    --disable-validations \
    --templates /usr/share/openstack-tripleo-heat-templates \
    -r /usr/share/openstack-tripleo-heat-templates/deployed-server/deployed-server-roles-data.yaml \
    --libvirt-type kvm \
    --ntp-server clock.redhat.com \
    -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
    -e /home/stack/SPLIT-STACK-ENV/internal.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
    -e /home/stack/SPLIT-STACK-ENV/network-environment.yaml \
    -e /home/stack/SPLIT-STACK-ENV/enable-tls.yaml \
    -e /home/stack/SPLIT-STACK-ENV/inject-trust-anchor.yaml \
    -e /home/stack/SPLIT-STACK-ENV/public_vip.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml \
    -e /home/stack/SPLIT-STACK-ENV/hostnames.yml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
    -e /home/stack/SPLIT-STACK-ENV/debug.yaml \
    -e /home/stack/SPLIT-STACK-ENV/docker-images.yaml \
    -e /home/stack/SPLIT-STACK-ENV/nodes_data.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/config-debug.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/deployed-server-bootstrap-environment-rhel.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/deployed-server-pacemaker-environment.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/docker.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml \
    -e /home/stack/SPLIT-STACK-ENV/disable-firewall.yaml \
    -e /home/stack/SPLIT-STACK-ENV/ctlplane-net-ports.yaml \
    -e /home/stack/SPLIT-STACK-ENV/deployed-server-env.yaml \
    -e /home/stack/SPLIT-STACK-ENV/deployment-swift-data-map.yaml \
    -e /home/stack/SPLIT-STACK-ENV/network-interface-mappings.yaml



Actual results:
---------------
Overcloud deployment failed.

Additional info:
----------------
Virtual setup: 3 controllers + 2 computes + 3 ceph nodes

Comment 3 Giulio Fidente 2017-11-01 10:34:24 UTC
Yuri, can you see if the same fix suggested in https://bugzilla.redhat.com/show_bug.cgi?id=1507888#c5 works for you?

Comment 4 Giulio Fidente 2017-11-02 10:08:36 UTC

*** This bug has been marked as a duplicate of bug 1507888 ***

