Description of problem:

We are not able to deploy with the following counts:

parameter_defaults:
  ControllerCount: 3
  CephStorageCount: 3
  R620ComputeCount: 77
  R630ComputeCount: 1
  6018RComputeCount: 1
  R930ComputeCount: 1
  1029pComputeCount: 0
  1029uComputeCount: 1
  1028rComputeCount: 1
  R730ComputeCount: 1
  ComputeCount: 0

ceph-ansible has failed the past 2 times I tried to deploy this. It fails with:

[root@b04-h01-1029p stack]# tail -200 /var/log/mistral/ceph-install-workflow.log | grep unreachable=1
2018-02-26 20:02:35,753 p=257284 u=mistral | 192.168.25.103 : ok=0 changed=0 unreachable=1 failed=0
2018-02-26 20:02:35,754 p=257284 u=mistral | 192.168.25.123 : ok=0 changed=0 unreachable=1 failed=0
2018-02-26 20:02:35,754 p=257284 u=mistral | 192.168.25.130 : ok=0 changed=0 unreachable=1 failed=0
2018-02-26 20:02:35,756 p=257284 u=mistral | 192.168.25.69 : ok=0 changed=0 unreachable=1 failed=0

And the second time:

[root@b04-h01-1029p stack]# tail -200 /var/log/mistral/ceph-install-workflow.log | grep unreachable=1
2018-02-27 02:32:03,854 p=55900 u=mistral | 192.168.25.79 : ok=0 changed=0 unreachable=1 failed=0
2018-02-27 02:32:03,854 p=55900 u=mistral | 192.168.25.87 : ok=0 changed=0 unreachable=1 failed=0

However, each of these nodes was reachable via the heat-admin user.
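For reference, the reachability mentioned above was checked over SSH as heat-admin from the undercloud; a minimal sketch of that kind of check (IPs taken from the grep output above, exact options not important):

for ip in 192.168.25.103 192.168.25.123 192.168.25.130 192.168.25.69; do
  ssh -o ConnectTimeout=5 heat-admin@$ip hostname   # succeeds even though ceph-ansible reported unreachable=1
done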
Latest attempt ended in a timeout:

2018-02-27 20:48:12Z [overcloud.AllNodesDeploySteps.WorkflowTasks_Step2]: CREATE_IN_PROGRESS state changed
2018-02-27 20:48:14Z [overcloud.AllNodesDeploySteps.WorkflowTasks_Step2]: CREATE_COMPLETE state changed
2018-02-27 20:48:14Z [overcloud.AllNodesDeploySteps.WorkflowTasks_Step2_Execution]: CREATE_IN_PROGRESS state changed
2018-02-27 23:31:03Z [overcloud.AllNodesDeploySteps]: CREATE_FAILED CREATE aborted
2018-02-27 23:31:03Z [overcloud]: CREATE_FAILED Create timed out

Stack overcloud CREATE_FAILED

overcloud.AllNodesDeploySteps:
  resource_type: OS::TripleO::PostDeploySteps
  physical_resource_id: 32ddd927-e685-40b7-ae87-583a40744e12
  status: CREATE_FAILED
  status_reason: |
    CREATE aborted
Heat Stack create failed.
Heat Stack create failed.

real    243m5.161s
user    0m10.601s
sys     0m0.947s
Tue Feb 27 23:31:23 UTC 2018

Most of the time is spent in ceph-ansible. First I will increase the timeout; if that doesn't help, I will set fork_count to 100 to see whether that helps with the concurrency...
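For the timeout part, `openstack overcloud deploy` accepts a --timeout value in minutes; a sketch only, since the full deploy command line isn't shown in this bug and 360 is just an example above the ~240 minutes the failed run took:

openstack overcloud deploy --templates ... --timeout 360   # raise the Heat stack timeout; "..." stands for the rest of the existing deploy arguments

How the ceph-ansible fork count is raised depends on the TripleO version, so no command is shown for that here.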
For the sake of contrast, here is a successful deployment of the same number of nodes, just minus the 3 ceph nodes:

2018-02-27 17:16:16Z [overcloud.AllNodesDeploySteps.1029uComputePostConfig]: CREATE_COMPLETE state changed
2018-02-27 17:16:16Z [overcloud.AllNodesDeploySteps.ObjectStoragePostConfig]: CREATE_COMPLETE state changed
2018-02-27 17:16:16Z [overcloud.AllNodesDeploySteps.1028rComputePostConfig]: CREATE_COMPLETE state changed
2018-02-27 17:16:16Z [overcloud.AllNodesDeploySteps.ComputePostConfig]: CREATE_COMPLETE state changed
2018-02-27 17:16:16Z [overcloud.AllNodesDeploySteps]: CREATE_COMPLETE Stack CREATE completed successfully
2018-02-27 17:16:16Z [overcloud.AllNodesDeploySteps]: CREATE_COMPLETE state changed
2018-02-27 17:16:17Z [overcloud]: CREATE_COMPLETE Stack CREATE completed successfully

Stack overcloud CREATE_COMPLETE

Host 172.168.30.17 not found in /home/stack/.ssh/known_hosts
Overcloud Endpoint: http://172.168.30.17:5000/v2.0
Overcloud Deployed

real    96m31.399s
user    0m6.422s
sys     0m0.723s
Tue Feb 27 17:16:32 UTC 2018
Joe, can you attach the full version of /var/log/mistral/ceph-install-workflow.log?
Created attachment 1401854 [details] mistral ceph-ansible log
@Giulio - So the latest failure seems to be due to : 2018-02-28 02:45:53,509 p=31523 u=mistral | failed: [192.168.25.67] (item=[{'_ansible_parsed': True, 'stderr_lines': [u'Error: /dev/sdb: unrecognised disk label'], u'cmd': u"parted --script /dev/sdb print | egrep -sq '^ 1.*ceph'", u'end': u'2018-02-28 02:38:23.104123', '_ansible_no_log': False, u'stdout': u'', '_ansible_item_result': True, u'changed': False, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': True, u'_raw_params': u"parted --script /dev/sdb print | egrep -sq '^ 1.*ceph'", u'removes': None, u'creates': None, u'chdir': None, u'stdin': None}}, u'start': u'2018-02-28 02:38:23.078030', u'delta': u'0:00:00.026093', 'item': u'/dev/sdb', u'rc': 1, u'msg': u'non-zero return code', 'stdout_lines': [], 'failed_when_result': False, u'stderr': u'Error: /dev/sdb: unrecognised disk label', '_ansible_ignore_errors': None, u'failed': False}, u'/dev/sdb', u'/dev/nvme0n1']) => {"changed": true, "cmd": "docker run --net=host --pid=host --privileged=true --name=ceph-osd-prepare-overcloud-cephstorage-2-sdb -v /etc/ceph:/etc/ceph -v /var/lib/ceph/:/var/lib/ceph/ -v /dev:/dev -v /etc/localtime:/etc/localtime:ro -e DEBUG=verbose -e CLUSTER=ceph -e CEPH_DAEMON=OSD_CEPH_DISK_PREPARE -e OSD_DEVICE=/dev/sdb -e OSD_JOURNAL=/dev/nvme0n1 -e OSD_BLUESTORE=0 -e OSD_FILESTORE=1 -e OSD_DMCRYPT=0 -e OSD_JOURNAL_SIZE=5120 docker-registry.engineering.redhat.com/ceph/rhceph-2-rhel7:latest", "delta": "0:00:05.313413", "end": "2018-02-28 02:45:53.468563", "item": [{"_ansible_ignore_errors": null, "_ansible_item_result": true, "_ansible_no_log": false, "_ansible_parsed": true, "changed": false, "cmd": "parted --script /dev/sdb print | egrep -sq '^ 1.*ceph'", "delta": "0:00:00.026093", "end": "2018-02-28 02:38:23.104123", "failed": false, "failed_when_result": false, "invocation": {"module_args": {"_raw_params": "parted --script /dev/sdb print | egrep -sq '^ 1.*ceph'", "_uses_shell": true, "chdir": null, "creates": null, "executable": null, "removes": null, "stdin": null, "warn": true}}, "item": "/dev/sdb", "msg": "non-zero return code", "rc": 1, "start": "2018-02-28 02:38:23.078030", "stderr": "Error: /dev/sdb: unrecognised disk label", "stderr_lines": ["Error: /dev/sdb: unrecognised disk label"], "stdout": "", "stdout_lines": []}, "/dev/sdb", "/dev/nvme0n1"], "msg": "non-zero return code", "rc": 1, "start": "2018-02-28 02:45:48.155150", "stderr": "+ case \"$KV_TYPE\" in\n+ source /config.static.sh\n++ set -e\n++ to_lowercase OSD_CEPH_DISK_PREPARE\n++ echo osd_ceph_disk_prepare\n+ CEPH_DAEMON=osd_ceph_disk_prepare\n+ create_mandatory_directories\n+ for keyring in '$OSD_BOOTSTRAP_KEYRING' '$MDS_BOOTSTRAP_KEYRING' '$RGW_BOOTSTRAP_KEYRING'\n++ dirname /var/lib/ceph/bootstrap-osd/ceph.keyring\n+ mkdir -p /var/lib/ceph/bootstrap-osd\n+ for keyring in '$OSD_BOOTSTRAP_KEYRING' '$MDS_BOOTSTRAP_KEYRING' '$RGW_BOOTSTRAP_KEYRING'\n++ dirname /var/lib/ceph/bootstrap-mds/ceph.keyring\n+ mkdir -p /var/lib/ceph/bootstrap-mds\n+ for keyring in '$OSD_BOOTSTRAP_KEYRING' '$MDS_BOOTSTRAP_KEYRING' '$RGW_BOOTSTRAP_KEYRING'\n++ dirname /var/lib/ceph/bootstrap-rgw/ceph.keyring\n+ mkdir -p /var/lib/ceph/bootstrap-rgw\n+ for directory in mon osd mds radosgw tmp mgr\n+ mkdir -p /var/lib/ceph/mon\n+ for directory in mon osd mds radosgw tmp mgr\n+ mkdir -p /var/lib/ceph/osd\n+ for directory in mon osd mds radosgw tmp mgr\n+ mkdir -p /var/lib/ceph/mds\n+ for directory in mon osd mds radosgw tmp mgr\n+ mkdir -p /var/lib/ceph/radosgw\n+ for directory in mon osd 
mds radosgw tmp mgr\n+ mkdir -p /var/lib/ceph/tmp\n+ for directory in mon osd mds radosgw tmp mgr\n+ mkdir -p /var/lib/ceph/mgr\n+ mkdir -p /var/lib/ceph/mon/ceph-overcloud-cephstorage-2\n+ mkdir -p /var/run/ceph\n+ mkdir -p /var/lib/ceph/radosgw/overcloud-cephstorage-2\n+ mkdir -p /var/lib/ceph/mds/ceph-mds-overcloud-cephstorage-2\n+ mkdir -p /var/lib/ceph/mgr/ceph-\n+ chown -R ceph. /var/run/ceph/ /var/lib/ceph/bootstrap-mds /var/lib/ceph/bootstrap-osd /var/lib/ceph/bootstrap-rbd /var/lib/ceph/bootstrap-rgw /var/lib/ceph/mds /var/lib/ceph/mgr /var/lib/ceph/mon /var/lib/ceph/osd /var/lib/ceph/radosgw /var/lib/ceph/tmp\n+ case \"$CEPH_DAEMON\" in\n+ source start_osd.sh\n++ set -e\n++ is_redhat\n++ get_package_manager\n++ is_available rpm\n++ command -v rpm\n++ OS_VENDOR=redhat\n++ [[ redhat == \\r\\e\\d\\h\\a\\t ]]\n++ source /etc/sysconfig/ceph\n+++ TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728\n+++ CEPH_AUTO_RESTART_ON_UPGRADE=no\n+ OSD_TYPE=prepare\n+ start_osd\n+ get_config\n+ log 'static: does not generate config'\n+ '[' -z 'static: does not generate config' ']'\n++ date '+%F %T'\n+ TIMESTAMP='2018-02-28 02:45:48'\n+ echo '2018-02-28 02:45:48 /entrypoint.sh: static: does not generate config'\n+ return 0\n+ check_config\n+ [[ ! -e /etc/ceph/ceph.conf ]]\n+ '[' 0 -eq 1 ']'\n+ case \"$OSD_TYPE\" in\n+ source osd_disk_prepare.sh\n++ set -e\n+ osd_disk_prepare\n+ [[ -z /dev/sdb ]]\n+ [[ ! -e /dev/sdb ]]\n+ '[' '!' -e /var/lib/ceph/bootstrap-osd/ceph.keyring ']'\n+ timeout 10 ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring health\n+ parted --script /dev/sdb print\n++ parted --script /dev/sdb print\n++ egrep '^ 1.*ceph data'\n+ [[ -n '' ]]\n+ [[ 0 -eq 1 ]]\n+ [[ 0 -eq 1 ]]\n+ ceph-disk -v prepare --cluster ceph --journal-uuid 26726772-a4e4-48ac-bce2-b6fc1bb6529c /dev/sdb /dev/nvme0n1\ncommand: Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid\ncommand: Running command: /usr/bin/ceph-osd --check-allows-journal -i 0 --log-file $run_dir/$cluster-osd-check.log --cluster ceph --setuser ceph --setgroup ceph\ncommand: Running command: /usr/bin/ceph-osd --check-wants-journal -i 0 --log-file $run_dir/$cluster-osd-check.log --cluster ceph --setuser ceph --setgroup ceph\ncommand: Running command: /usr/bin/ceph-osd --check-needs-journal -i 0 --log-file $run_dir/$cluster-osd-check.log --cluster ceph --setuser ceph --setgroup ceph\nget_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid\ncommand: Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=osd_journal_size\nget_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid\nget_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid\nget_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid\ncommand: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_type\ncommand: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_options_xfs\ncommand: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. 
--lookup osd_mount_options_xfs\nget_dm_uuid: get_dm_uuid /dev/nvme0n1 uuid path is /sys/dev/block/259:0/dm/uuid\nprepare_device: OSD will not be hot-swappable if journal is not the same device as the osd data\nget_dm_uuid: get_dm_uuid /dev/nvme0n1 uuid path is /sys/dev/block/259:0/dm/uuid\nptype_tobe_for_name: name = journal\nget_dm_uuid: get_dm_uuid /dev/nvme0n1 uuid path is /sys/dev/block/259:0/dm/uuid\ncommand: Running command: /usr/sbin/parted --machine -- /dev/nvme0n1 print\nget_free_partition_index: get_free_partition_index: analyzing \r \rBYT;\n/dev/nvme0n1:800GB:unknown:512:512:unknown:Unknown:;\n\ncreate_partition: Creating journal partition num 1 size 5120 on /dev/nvme0n1\ncommand_check_call: Running command: /usr/sbin/sgdisk --new=1:0:+5120M --change-name=1:ceph journal --partition-guid=1:26726772-a4e4-48ac-bce2-b6fc1bb6529c --typecode=1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/nvme0n1\nupdate_partition: Calling partprobe on created device /dev/nvme0n1\ncommand_check_call: Running command: /usr/bin/udevadm settle --timeout=600\ncommand: Running command: /usr/bin/flock -s /dev/nvme0n1 /usr/sbin/partprobe /dev/nvme0n1\ncommand_check_call: Running command: /usr/bin/udevadm settle --timeout=600\nget_dm_uuid: get_dm_uuid /dev/nvme0n1 uuid path is /sys/dev/block/259:0/dm/uuid\nget_dm_uuid: get_dm_uuid /dev/nvme0n1 uuid path is /sys/dev/block/259:0/dm/uuid\nget_dm_uuid: get_dm_uuid /dev/nvme0n1p1 uuid path is /sys/dev/block/259:3/dm/uuid\nprepare_device: Journal is GPT partition /dev/disk/by-partuuid/26726772-a4e4-48ac-bce2-b6fc1bb6529c\nprepare_device: Journal is GPT partition /dev/disk/by-partuuid/26726772-a4e4-48ac-bce2-b6fc1bb6529c\nget_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid\nset_data_partition: Creating osd partition on /dev/sdb\nget_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid\nptype_tobe_for_name: name = data\nget_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid\ncreate_partition: Creating data partition num 1 size 0 on /dev/sdb\ncommand_check_call: Running command: /usr/sbin/sgdisk --largest-new=1 --change-name=1:ceph data --partition-guid=1:6d2a74b6-041c-4a5c-b9da-6b11092c25aa --typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be --mbrtogpt -- /dev/sdb\nupdate_partition: Calling partprobe on created device /dev/sdb\ncommand_check_call: Running command: /usr/bin/udevadm settle --timeout=600\ncommand: Running command: /usr/bin/flock -s /dev/sdb /usr/sbin/partprobe /dev/sdb\ncommand_check_call: Running command: /usr/bin/udevadm settle --timeout=600\nget_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid\nget_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid\nget_dm_uuid: get_dm_uuid /dev/sdb1 uuid path is /sys/dev/block/8:17/dm/uuid\npopulate_data_path_device: Creating xfs fs on /dev/sdb1\ncommand_check_call: Running command: /usr/sbin/mkfs -t xfs -f -i size=2048 -f -- /dev/sdb1\nmkfs.xfs: cannot open /dev/sdb1: Device or resource busy\nTraceback (most recent call last):\n File \"/usr/sbin/ceph-disk\", line 9, in <module>\n load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()\n File \"/usr/lib/python2.7/site-packages/ceph_disk/main.py\", line 5343, in run\n main(sys.argv[1:])\n File \"/usr/lib/python2.7/site-packages/ceph_disk/main.py\", line 5294, in main\n args.func(args)\n File \"/usr/lib/python2.7/site-packages/ceph_disk/main.py\", line 1896, in main\n Prepare.factory(args).prepare()\n File 
\"/usr/lib/python2.7/site-packages/ceph_disk/main.py\", line 1885, in prepare\n self.prepare_locked()\n File \"/usr/lib/python2.7/site-packages/ceph_disk/main.py\", line 1916, in prepare_locked\n self.data.prepare(self.journal)\n File \"/usr/lib/python2.7/site-packages/ceph_disk/main.py\", line 2583, in prepare\n self.prepare_device(*to_prepare_list)\n File \"/usr/lib/python2.7/site-packages/ceph_disk/main.py\", line 2747, in prepare_device\n self.populate_data_path_device(*to_prepare_list)\n File \"/usr/lib/python2.7/site-packages/ceph_disk/main.py\", line 2702, in populate_data_path_device\n raise Error(e)\nceph_disk.main.Error: Error: Command '['/usr/sbin/mkfs', '-t', u'xfs', u'-f', u'-i', u'size=2048', '-f', '--', '/dev/sdb1']' returned non-zero exit status 1", "stderr_lines": ["+ case \"$KV_TYPE\" in", "+ source /config.static.sh", "++ set -e", "++ to_lowercase OSD_CEPH_DISK_PREPARE", "++ echo osd_ceph_disk_prepare", "+ CEPH_DAEMON=osd_ceph_disk_prepare", "+ create_mandatory_directories", "+ for keyring in '$OSD_BOOTSTRAP_KEYRING' '$MDS_BOOTSTRAP_KEYRING' '$RGW_BOOTSTRAP_KEYRING'", "++ dirname /var/lib/ceph/bootstrap-osd/ceph.keyring", "+ mkdir -p /var/lib/ceph/bootstrap-osd", "+ for keyring in '$OSD_BOOTSTRAP_KEYRING' '$MDS_BOOTSTRAP_KEYRING' '$RGW_BOOTSTRAP_KEYRING'", "++ dirname /var/lib/ceph/bootstrap-mds/ceph.keyring", "+ mkdir -p /var/lib/ceph/bootstrap-mds", "+ for keyring in '$OSD_BOOTSTRAP_KEYRING' '$MDS_BOOTSTRAP_KEYRING' '$RGW_BOOTSTRAP_KEYRING'", "++ dirname /var/lib/ceph/bootstrap-rgw/ceph.keyring", "+ mkdir -p /var/lib/ceph/bootstrap-rgw", "+ for directory in mon osd mds radosgw tmp mgr", "+ mkdir -p /var/lib/ceph/mon", "+ for directory in mon osd mds radosgw tmp mgr", "+ mkdir -p /var/lib/ceph/osd", "+ for directory in mon osd mds radosgw tmp mgr", "+ mkdir -p /var/lib/ceph/mds", "+ for directory in mon osd mds radosgw tmp mgr", "+ mkdir -p /var/lib/ceph/radosgw", "+ for directory in mon osd mds radosgw tmp mgr", "+ mkdir -p /var/lib/ceph/tmp", "+ for directory in mon osd mds radosgw tmp mgr", "+ mkdir -p /var/lib/ceph/mgr", "+ mkdir -p /var/lib/ceph/mon/ceph-overcloud-cephstorage-2", "+ mkdir -p /var/run/ceph", "+ mkdir -p /var/lib/ceph/radosgw/overcloud-cephstorage-2", "+ mkdir -p /var/lib/ceph/mds/ceph-mds-overcloud-cephstorage-2", "+ mkdir -p /var/lib/ceph/mgr/ceph-", "+ chown -R ceph. /var/run/ceph/ /var/lib/ceph/bootstrap-mds /var/lib/ceph/bootstrap-osd /var/lib/ceph/bootstrap-rbd /var/lib/ceph/bootstrap-rgw /var/lib/ceph/mds /var/lib/ceph/mgr /var/lib/ceph/mon /var/lib/ceph/osd /var/lib/ceph/radosgw /var/lib/ceph/tmp", "+ case \"$CEPH_DAEMON\" in", "+ source start_osd.sh", "++ set -e", "++ is_redhat", "++ get_package_manager", "++ is_available rpm", "++ command -v rpm", "++ OS_VENDOR=redhat", "++ [[ redhat == \\r\\e\\d\\h\\a\\t ]]", "++ source /etc/sysconfig/ceph", "+++ TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728", "+++ CEPH_AUTO_RESTART_ON_UPGRADE=no", "+ OSD_TYPE=prepare", "+ start_osd", "+ get_config", "+ log 'static: does not generate config'", "+ '[' -z 'static: does not generate config' ']'", "++ date '+%F %T'", "+ TIMESTAMP='2018-02-28 02:45:48'", "+ echo '2018-02-28 02:45:48 /entrypoint.sh: static: does not generate config'", "+ return 0", "+ check_config", "+ [[ ! -e /etc/ceph/ceph.conf ]]", "+ '[' 0 -eq 1 ']'", "+ case \"$OSD_TYPE\" in", "+ source osd_disk_prepare.sh", "++ set -e", "+ osd_disk_prepare", "+ [[ -z /dev/sdb ]]", "+ [[ ! -e /dev/sdb ]]", "+ '[' '!' 
-e /var/lib/ceph/bootstrap-osd/ceph.keyring ']'", "+ timeout 10 ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring health", "+ parted --script /dev/sdb print", "++ parted --script /dev/sdb print", "++ egrep '^ 1.*ceph data'", "+ [[ -n '' ]]", "+ [[ 0 -eq 1 ]]", "+ [[ 0 -eq 1 ]]", "+ ceph-disk -v prepare --cluster ceph --journal-uuid 26726772-a4e4-48ac-bce2-b6fc1bb6529c /dev/sdb /dev/nvme0n1", "command: Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid", "command: Running command: /usr/bin/ceph-osd --check-allows-journal -i 0 --log-file $run_dir/$cluster-osd-check.log --cluster ceph --setuser ceph --setgroup ceph", "command: Running command: /usr/bin/ceph-osd --check-wants-journal -i 0 --log-file $run_dir/$cluster-osd-check.log --cluster ceph --setuser ceph --setgroup ceph", "command: Running command: /usr/bin/ceph-osd --check-needs-journal -i 0 --log-file $run_dir/$cluster-osd-check.log --cluster ceph --setuser ceph --setgroup ceph", "get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid", "command: Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=osd_journal_size", "get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid", "get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid", "get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid", "command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_type", "command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_options_xfs", "command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs", "get_dm_uuid: get_dm_uuid /dev/nvme0n1 uuid path is /sys/dev/block/259:0/dm/uuid", "prepare_device: OSD will not be hot-swappable if journal is not the same device as the osd data", "get_dm_uuid: get_dm_uuid /dev/nvme0n1 uuid path is /sys/dev/block/259:0/dm/uuid", "ptype_tobe_for_name: name = journal", "get_dm_uuid: get_dm_uuid /dev/nvme0n1 uuid path is /sys/dev/block/259:0/dm/uuid", "command: Running command: /usr/sbin/parted --machine -- /dev/nvme0n1 print", "get_free_partition_index: get_free_partition_index: analyzing ", " ", "BYT;", "/dev/nvme0n1:800GB:unknown:512:512:unknown:Unknown:;", "", "create_partition: Creating journal partition num 1 size 5120 on /dev/nvme0n1", "command_check_call: Running command: /usr/sbin/sgdisk --new=1:0:+5120M --change-name=1:ceph journal --partition-guid=1:26726772-a4e4-48ac-bce2-b6fc1bb6529c --typecode=1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/nvme0n1", "update_partition: Calling partprobe on created device /dev/nvme0n1", "command_check_call: Running command: /usr/bin/udevadm settle --timeout=600", "command: Running command: /usr/bin/flock -s /dev/nvme0n1 /usr/sbin/partprobe /dev/nvme0n1", "command_check_call: Running command: /usr/bin/udevadm settle --timeout=600", "get_dm_uuid: get_dm_uuid /dev/nvme0n1 uuid path is /sys/dev/block/259:0/dm/uuid", "get_dm_uuid: get_dm_uuid /dev/nvme0n1 uuid path is /sys/dev/block/259:0/dm/uuid", "get_dm_uuid: get_dm_uuid /dev/nvme0n1p1 uuid path is /sys/dev/block/259:3/dm/uuid", "prepare_device: Journal is GPT partition /dev/disk/by-partuuid/26726772-a4e4-48ac-bce2-b6fc1bb6529c", "prepare_device: Journal is GPT partition /dev/disk/by-partuuid/26726772-a4e4-48ac-bce2-b6fc1bb6529c", "get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid", "set_data_partition: Creating osd 
partition on /dev/sdb", "get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid", "ptype_tobe_for_name: name = data", "get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid", "create_partition: Creating data partition num 1 size 0 on /dev/sdb", "command_check_call: Running command: /usr/sbin/sgdisk --largest-new=1 --change-name=1:ceph data --partition-guid=1:6d2a74b6-041c-4a5c-b9da-6b11092c25aa --typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be --mbrtogpt -- /dev/sdb", "update_partition: Calling partprobe on created device /dev/sdb", "command_check_call: Running command: /usr/bin/udevadm settle --timeout=600", "command: Running command: /usr/bin/flock -s /dev/sdb /usr/sbin/partprobe /dev/sdb", "command_check_call: Running command: /usr/bin/udevadm settle --timeout=600", "get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid", "get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid", "get_dm_uuid: get_dm_uuid /dev/sdb1 uuid path is /sys/dev/block/8:17/dm/uuid", "populate_data_path_device: Creating xfs fs on /dev/sdb1", "command_check_call: Running command: /usr/sbin/mkfs -t xfs -f -i size=2048 -f -- /dev/sdb1", "mkfs.xfs: cannot open /dev/sdb1: Device or resource busy", "Traceback (most recent call last):", " File \"/usr/sbin/ceph-disk\", line 9, in <module>", " load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()", " File \"/usr/lib/python2.7/site-packages/ceph_disk/main.py\", line 5343, in run", " main(sys.argv[1:])", " File \"/usr/lib/python2.7/site-packages/ceph_disk/main.py\", line 5294, in main", " args.func(args)", " File \"/usr/lib/python2.7/site-packages/ceph_disk/main.py\", line 1896, in main", " Prepare.factory(args).prepare()", " File \"/usr/lib/python2.7/site-packages/ceph_disk/main.py\", line 1885, in prepare", " self.prepare_locked()", " File \"/usr/lib/python2.7/site-packages/ceph_disk/main.py\", line 1916, in prepare_locked", " self.data.prepare(self.journal)", " File \"/usr/lib/python2.7/site-packages/ceph_disk/main.py\", line 2583, in prepare", " self.prepare_device(*to_prepare_list)", " File \"/usr/lib/python2.7/site-packages/ceph_disk/main.py\", line 2747, in prepare_device", " self.populate_data_path_device(*to_prepare_list)", " File \"/usr/lib/python2.7/site-packages/ceph_disk/main.py\", line 2702, in populate_data_path_device", " raise Error(e)", "ceph_disk.main.Error: Error: Command '['/usr/sbin/mkfs', '-t', u'xfs', u'-f', u'-i', u'size=2048', '-f', '--', '/dev/sdb1']' returned non-zero exit status 1"], "stdout": "VERBOSE: activating bash debugging mode.\n2018-02-28 02:45:48 /entrypoint.sh: static: does not generate config\nHEALTH_ERR 20544 pgs are stuck inactive for more than 300 seconds; 20544 pgs stuck inactive; 20544 pgs stuck unclean; no osds\nCreating new GPT entries.\nThe operation has completed successfully.\nThe operation has completed successfully.", "stdout_lines": ["VERBOSE: activating bash debugging mode.", "2018-02-28 02:45:48 /entrypoint.sh: static: does not generate config", "HEALTH_ERR 20544 pgs are stuck inactive for more than 300 seconds; 20544 pgs stuck inactive; 20544 pgs stuck unclean; no osds", "Creating new GPT entries.", "The operation has completed successfully.", "The operation has completed successfully."]}
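For anyone digging into the mkfs.xfs "cannot open /dev/sdb1: Device or resource busy" failure shown in the log above, a quick way to look at the state of the OSD data and journal devices on the affected ceph node is something like the following (a sketch only; device and container names are taken from the log above, and this was not part of the original debugging):

# run on the affected ceph node (here overcloud-cephstorage-2), as root
lsblk /dev/sdb /dev/nvme0n1            # current partition layout of the OSD data and journal devices
parted --script /dev/sdb print         # same label check ceph-ansible runs before preparing the disk
fuser -v /dev/sdb1                     # any process still holding the freshly created partition open
docker ps -a | grep ceph-osd-prepare   # leftover prepare containers from earlier deploy attempts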
@Giulio - Another point I failed to make is that this exact configuration worked fine when deployed with:

parameter_defaults:
  DnsServers: ["10.16.36.29","10.11.5.19"]
  ControllerCount: 3
  CephStorageCount: 3
  R620ComputeCount: 1
  R630ComputeCount: 0
  6018RComputeCount: 1
  R930ComputeCount: 1
  1029pComputeCount: 0
  1029uComputeCount: 1
  1028rComputeCount: 1
  R730ComputeCount: 1
  ComputeCount: 0
(In reply to Joe Talerico from comment #6)
> @Giulio - Another point I failed to make is that this exact configuration
> worked fine when deployed with:
>
> parameter_defaults:
>   DnsServers: ["10.16.36.29","10.11.5.19"]
>   ControllerCount: 3
>   CephStorageCount: 3
>   R620ComputeCount: 1
>   R630ComputeCount: 0
>   6018RComputeCount: 1
>   R930ComputeCount: 1
>   1029pComputeCount: 0
>   1029uComputeCount: 1
>   1028rComputeCount: 1
>   R730ComputeCount: 1
>   ComputeCount: 0

Thanks Joe, good point. I think this might have been fixed by:

https://github.com/openstack/tripleo-heat-templates/commit/7b762a6a0c6a2931d3a11eecaad246a16f66f4e0

Can you check whether you have that change in your templates, and try applying it manually if you do not?
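A minimal sketch of checking for and applying that commit against packaged templates (assuming the default /usr/share/openstack-tripleo-heat-templates location and GitHub's .patch view of the commit; adjust the path to wherever the templates actually live):

cd /usr/share/openstack-tripleo-heat-templates
curl -sL https://github.com/openstack/tripleo-heat-templates/commit/7b762a6a0c6a2931d3a11eecaad246a16f66f4e0.patch -o /tmp/ceph-workflow-fix.patch
patch -p1 --dry-run < /tmp/ceph-workflow-fix.patch   # reports "previously applied" if the change is already present
patch -p1 < /tmp/ceph-workflow-fix.patch             # apply it if the dry run is clean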
This has seemed to help, Giulio!

2018-02-28 21:14:56Z [overcloud.AllNodesDeploySteps.ObjectStoragePostConfig]: CREATE_COMPLETE state changed
2018-02-28 21:14:56Z [overcloud.AllNodesDeploySteps.ComputePostConfig]: CREATE_COMPLETE state changed
2018-02-28 21:14:56Z [overcloud.AllNodesDeploySteps.R620ComputePostConfig]: CREATE_COMPLETE state changed
2018-02-28 21:14:56Z [overcloud.AllNodesDeploySteps.6018rComputePostConfig]: CREATE_COMPLETE state changed
2018-02-28 21:14:56Z [overcloud.AllNodesDeploySteps]: CREATE_COMPLETE Stack CREATE completed successfully
2018-02-28 21:14:57Z [overcloud.AllNodesDeploySteps]: CREATE_COMPLETE state changed
2018-02-28 21:14:57Z [overcloud]: CREATE_COMPLETE Stack CREATE completed successfully

Stack overcloud CREATE_COMPLETE

Overcloud Endpoint: http://172.168.30.19:5000/v2.0
Overcloud Deployed

real    165m31.403s
user    0m8.251s
sys     0m0.932s
Wed Feb 28 21:15:21 UTC 2018
verified on openstack-tripleo-heat-templates-8.0.2-0.20180327213843.f25e2d8.el7ost.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2086