Bug 1549775 - ceph-ansible is failing to communicate with compute nodes, causing failures
Summary: ceph-ansible is failing to communicate with compute nodes, causing failures
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 12.0 (Pike)
Hardware: All
OS: All
Priority: urgent
Severity: urgent
Target Milestone: beta
Target Release: 13.0 (Queens)
Assignee: Martin André
QA Contact: Yogev Rabl
URL:
Whiteboard: scale_lab
Depends On:
Blocks:
 
Reported: 2018-02-27 19:51 UTC by Joe Talerico
Modified: 2018-06-27 13:46 UTC (History)
6 users

Fixed In Version: openstack-tripleo-heat-templates-8.0.0-0.20180227121938.e0f59ee.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-27 13:45:51 UTC
Target Upstream Version:


Attachments
mistral ceph-ansible log (91 bytes, text/plain)
2018-02-28 12:36 UTC, Joe Talerico


Links
System ID Priority Status Summary Last Updated
OpenStack gerrit 544957 None None None 2018-03-01 08:59:12 UTC
Red Hat Product Errata RHEA-2018:2086 None None None 2018-06-27 13:46:28 UTC

Description Joe Talerico 2018-02-27 19:51:35 UTC
Description of problem:
We are not able to deploy:
parameter_defaults:
  ControllerCount: 3
  CephStorageCount: 3
  R620ComputeCount:  77
  R630ComputeCount: 1 
  6018RComputeCount: 1
  R930ComputeCount: 1
  1029pComputeCount: 0
  1029uComputeCount: 1
  1028rComputeCount: 1
  R730ComputeCount: 1
  ComputeCount: 0

ceph-ansible has failed the past 2 times I tried to deploy this. It fails with:

[root@b04-h01-1029p stack]# tail -200 /var/log/mistral/ceph-install-workflow.log | grep unreachable=1
2018-02-26 20:02:35,753 p=257284 u=mistral |  192.168.25.103             : ok=0    changed=0    unreachable=1    failed=0   
2018-02-26 20:02:35,754 p=257284 u=mistral |  192.168.25.123             : ok=0    changed=0    unreachable=1    failed=0   
2018-02-26 20:02:35,754 p=257284 u=mistral |  192.168.25.130             : ok=0    changed=0    unreachable=1    failed=0   
2018-02-26 20:02:35,756 p=257284 u=mistral |  192.168.25.69              : ok=0    changed=0    unreachable=1    failed=0 

And the second time:
[root@b04-h01-1029p stack]# tail -200 /var/log/mistral/ceph-install-workflow.log | grep unreachable=1
2018-02-27 02:32:03,854 p=55900 u=mistral |  192.168.25.79              : ok=0    changed=0    unreachable=1    failed=0   
2018-02-27 02:32:03,854 p=55900 u=mistral |  192.168.25.87              : ok=0    changed=0    unreachable=1    failed=0 

However, each of these nodes was reachable via the heat-admin user.
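The PLAY RECAP lines above can be pulled out of the workflow log mechanically. A minimal sketch (the `unreachable_hosts` helper name is illustrative; it assumes the standard ansible recap format shown above):

```shell
# Extract the host IPs from ansible PLAY RECAP lines that report
# unreachable=1. Log path and recap format are as shown in this report;
# the function name is illustrative.
unreachable_hosts() {
  awk '/unreachable=1/ {
         for (i = 1; i <= NF; i++)
           if ($i ~ /^[0-9]+(\.[0-9]+){3}$/) print $i
       }' "$1"
}
```

Run as `unreachable_hosts /var/log/mistral/ceph-install-workflow.log` to get just the affected IPs, one per line.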

Comment 1 Joe Talerico 2018-02-27 23:58:01 UTC
Latest attempt ended in a timeout:
2018-02-27 20:48:12Z [overcloud.AllNodesDeploySteps.WorkflowTasks_Step2]: CREATE_IN_PROGRESS  state changed
2018-02-27 20:48:14Z [overcloud.AllNodesDeploySteps.WorkflowTasks_Step2]: CREATE_COMPLETE  state changed
2018-02-27 20:48:14Z [overcloud.AllNodesDeploySteps.WorkflowTasks_Step2_Execution]: CREATE_IN_PROGRESS  state changed
2018-02-27 23:31:03Z [overcloud.AllNodesDeploySteps]: CREATE_FAILED  CREATE aborted
2018-02-27 23:31:03Z [overcloud]: CREATE_FAILED  Create timed out

 Stack overcloud CREATE_FAILED 

overcloud.AllNodesDeploySteps:
  resource_type: OS::TripleO::PostDeploySteps
  physical_resource_id: 32ddd927-e685-40b7-ae87-583a40744e12
  status: CREATE_FAILED
  status_reason: |
    CREATE aborted
Heat Stack create failed.
Heat Stack create failed.

real    243m5.161s
user    0m10.601s
sys     0m0.947s
Tue Feb 27 23:31:23 UTC 2018

Most of the time is spent on ceph-ansible. First I will increase the timeout; if that doesn't help, I am going to set fork_count to 100 to see if that helps with the concurrency...
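For reference, the two knobs mentioned can be sketched as follows; the ansible.cfg location for the Mistral-driven run is an assumption and may differ on this release:

```ini
# Hypothetical ansible.cfg fragment for the Mistral-driven ceph-ansible run;
# whether this file is honored or regenerated per deploy is an assumption.
[defaults]
forks = 100
```

The overall stack timeout is raised with the deploy command's `--timeout` option (in minutes), e.g. `openstack overcloud deploy ... --timeout 360`.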

Comment 2 Joe Talerico 2018-02-27 23:59:56 UTC
For the sake of contrast, here is a successful deployment of the same number of nodes, just minus the 3 Ceph nodes.

2018-02-27 17:16:16Z [overcloud.AllNodesDeploySteps.1029uComputePostConfig]: CREATE_COMPLETE  state changed
Host 172.168.30.17 not found in /home/stack/.ssh/known_hosts
2018-02-27 17:16:16Z [overcloud.AllNodesDeploySteps.ObjectStoragePostConfig]: CREATE_COMPLETE  state changed
2018-02-27 17:16:16Z [overcloud.AllNodesDeploySteps.1028rComputePostConfig]: CREATE_COMPLETE  state changed
2018-02-27 17:16:16Z [overcloud.AllNodesDeploySteps.ComputePostConfig]: CREATE_COMPLETE  state changed
2018-02-27 17:16:16Z [overcloud.AllNodesDeploySteps]: CREATE_COMPLETE  Stack CREATE completed successfully
2018-02-27 17:16:16Z [overcloud.AllNodesDeploySteps]: CREATE_COMPLETE  state changed
2018-02-27 17:16:17Z [overcloud]: CREATE_COMPLETE  Stack CREATE completed successfully

 Stack overcloud CREATE_COMPLETE 

Overcloud Endpoint: http://172.168.30.17:5000/v2.0
Overcloud Deployed

real    96m31.399s
user    0m6.422s
sys     0m0.723s
Tue Feb 27 17:16:32 UTC 2018

Comment 3 Giulio Fidente 2018-02-28 11:11:44 UTC
Joe, can you attach the full version of /var/log/mistral/ceph-install-workflow.log?

Comment 4 Joe Talerico 2018-02-28 12:36:34 UTC
Created attachment 1401854 [details]
mistral ceph-ansible log

Comment 5 Joe Talerico 2018-02-28 12:40:35 UTC
@Giulio - So the latest failure seems to be due to:
2018-02-28 02:45:53,509 p=31523 u=mistral |  failed: [192.168.25.67] (item=[{'_ansible_parsed': True, 'stderr_lines': [u'Error: /dev/sdb: unrecognised disk label'], u'cmd': u"parted --script /dev/sdb print | egrep -sq '^ 1.*ceph'", u'end': u'2018-02-28 02:38:23.104123', '_ansible_no_log': False, u'stdout': u'', '_ansible_item_result': True, u'changed': False, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': True, u'_raw_params': u"parted --script /dev/sdb print | egrep -sq '^ 1.*ceph'", u'removes': None, u'creates': None, u'chdir': None, u'stdin': None}}, u'start': u'2018-02-28 02:38:23.078030', u'delta': u'0:00:00.026093', 'item': u'/dev/sdb', u'rc': 1, u'msg': u'non-zero return code', 'stdout_lines': [], 'failed_when_result': False, u'stderr': u'Error: /dev/sdb: unrecognised disk label', '_ansible_ignore_errors': None, u'failed': False}, u'/dev/sdb', u'/dev/nvme0n1']) => {"changed": true, "cmd": "docker run --net=host --pid=host --privileged=true --name=ceph-osd-prepare-overcloud-cephstorage-2-sdb -v /etc/ceph:/etc/ceph -v /var/lib/ceph/:/var/lib/ceph/ -v /dev:/dev -v /etc/localtime:/etc/localtime:ro -e DEBUG=verbose -e CLUSTER=ceph -e CEPH_DAEMON=OSD_CEPH_DISK_PREPARE -e OSD_DEVICE=/dev/sdb -e OSD_JOURNAL=/dev/nvme0n1 -e OSD_BLUESTORE=0 -e OSD_FILESTORE=1 -e OSD_DMCRYPT=0 -e OSD_JOURNAL_SIZE=5120 docker-registry.engineering.redhat.com/ceph/rhceph-2-rhel7:latest", "delta": "0:00:05.313413", "end": "2018-02-28 02:45:53.468563", "item": [{"_ansible_ignore_errors": null, "_ansible_item_result": true, "_ansible_no_log": false, "_ansible_parsed": true, "changed": false, "cmd": "parted --script /dev/sdb print | egrep -sq '^ 1.*ceph'", "delta": "0:00:00.026093", "end": "2018-02-28 02:38:23.104123", "failed": false, "failed_when_result": false, "invocation": {"module_args": {"_raw_params": "parted --script /dev/sdb print | egrep -sq '^ 1.*ceph'", "_uses_shell": true, "chdir": null, "creates": null, "executable": null, "removes": null, 
"stdin": null, "warn": true}}, "item": "/dev/sdb", "msg": "non-zero return code", "rc": 1, "start": "2018-02-28 02:38:23.078030", "stderr": "Error: /dev/sdb: unrecognised disk label", "stderr_lines": ["Error: /dev/sdb: unrecognised disk label"], "stdout": "", "stdout_lines": []}, "/dev/sdb", "/dev/nvme0n1"], "msg": "non-zero return code", "rc": 1, "start": "2018-02-28 02:45:48.155150", "stderr": "+ case \"$KV_TYPE\" in\n+ source /config.static.sh\n++ set -e\n++ to_lowercase OSD_CEPH_DISK_PREPARE\n++ echo osd_ceph_disk_prepare\n+ CEPH_DAEMON=osd_ceph_disk_prepare\n+ create_mandatory_directories\n+ for keyring in '$OSD_BOOTSTRAP_KEYRING' '$MDS_BOOTSTRAP_KEYRING' '$RGW_BOOTSTRAP_KEYRING'\n++ dirname /var/lib/ceph/bootstrap-osd/ceph.keyring\n+ mkdir -p /var/lib/ceph/bootstrap-osd\n+ for keyring in '$OSD_BOOTSTRAP_KEYRING' '$MDS_BOOTSTRAP_KEYRING' '$RGW_BOOTSTRAP_KEYRING'\n++ dirname /var/lib/ceph/bootstrap-mds/ceph.keyring\n+ mkdir -p /var/lib/ceph/bootstrap-mds\n+ for keyring in '$OSD_BOOTSTRAP_KEYRING' '$MDS_BOOTSTRAP_KEYRING' '$RGW_BOOTSTRAP_KEYRING'\n++ dirname /var/lib/ceph/bootstrap-rgw/ceph.keyring\n+ mkdir -p /var/lib/ceph/bootstrap-rgw\n+ for directory in mon osd mds radosgw tmp mgr\n+ mkdir -p /var/lib/ceph/mon\n+ for directory in mon osd mds radosgw tmp mgr\n+ mkdir -p /var/lib/ceph/osd\n+ for directory in mon osd mds radosgw tmp mgr\n+ mkdir -p /var/lib/ceph/mds\n+ for directory in mon osd mds radosgw tmp mgr\n+ mkdir -p /var/lib/ceph/radosgw\n+ for directory in mon osd mds radosgw tmp mgr\n+ mkdir -p /var/lib/ceph/tmp\n+ for directory in mon osd mds radosgw tmp mgr\n+ mkdir -p /var/lib/ceph/mgr\n+ mkdir -p /var/lib/ceph/mon/ceph-overcloud-cephstorage-2\n+ mkdir -p /var/run/ceph\n+ mkdir -p /var/lib/ceph/radosgw/overcloud-cephstorage-2\n+ mkdir -p /var/lib/ceph/mds/ceph-mds-overcloud-cephstorage-2\n+ mkdir -p /var/lib/ceph/mgr/ceph-\n+ chown -R ceph. 
/var/run/ceph/ /var/lib/ceph/bootstrap-mds /var/lib/ceph/bootstrap-osd /var/lib/ceph/bootstrap-rbd /var/lib/ceph/bootstrap-rgw /var/lib/ceph/mds /var/lib/ceph/mgr /var/lib/ceph/mon /var/lib/ceph/osd /var/lib/ceph/radosgw /var/lib/ceph/tmp\n+ case \"$CEPH_DAEMON\" in\n+ source start_osd.sh\n++ set -e\n++ is_redhat\n++ get_package_manager\n++ is_available rpm\n++ command -v rpm\n++ OS_VENDOR=redhat\n++ [[ redhat == \\r\\e\\d\\h\\a\\t ]]\n++ source /etc/sysconfig/ceph\n+++ TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728\n+++ CEPH_AUTO_RESTART_ON_UPGRADE=no\n+ OSD_TYPE=prepare\n+ start_osd\n+ get_config\n+ log 'static: does not generate config'\n+ '[' -z 'static: does not generate config' ']'\n++ date '+%F %T'\n+ TIMESTAMP='2018-02-28 02:45:48'\n+ echo '2018-02-28 02:45:48  /entrypoint.sh: static: does not generate config'\n+ return 0\n+ check_config\n+ [[ ! -e /etc/ceph/ceph.conf ]]\n+ '[' 0 -eq 1 ']'\n+ case \"$OSD_TYPE\" in\n+ source osd_disk_prepare.sh\n++ set -e\n+ osd_disk_prepare\n+ [[ -z /dev/sdb ]]\n+ [[ ! -e /dev/sdb ]]\n+ '[' '!' 
-e /var/lib/ceph/bootstrap-osd/ceph.keyring ']'\n+ timeout 10 ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring health\n+ parted --script /dev/sdb print\n++ parted --script /dev/sdb print\n++ egrep '^ 1.*ceph data'\n+ [[ -n '' ]]\n+ [[ 0 -eq 1 ]]\n+ [[ 0 -eq 1 ]]\n+ ceph-disk -v prepare --cluster ceph --journal-uuid 26726772-a4e4-48ac-bce2-b6fc1bb6529c /dev/sdb /dev/nvme0n1\ncommand: Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid\ncommand: Running command: /usr/bin/ceph-osd --check-allows-journal -i 0 --log-file $run_dir/$cluster-osd-check.log --cluster ceph --setuser ceph --setgroup ceph\ncommand: Running command: /usr/bin/ceph-osd --check-wants-journal -i 0 --log-file $run_dir/$cluster-osd-check.log --cluster ceph --setuser ceph --setgroup ceph\ncommand: Running command: /usr/bin/ceph-osd --check-needs-journal -i 0 --log-file $run_dir/$cluster-osd-check.log --cluster ceph --setuser ceph --setgroup ceph\nget_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid\ncommand: Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=osd_journal_size\nget_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid\nget_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid\nget_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid\ncommand: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_type\ncommand: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_options_xfs\ncommand: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. 
--lookup osd_mount_options_xfs\nget_dm_uuid: get_dm_uuid /dev/nvme0n1 uuid path is /sys/dev/block/259:0/dm/uuid\nprepare_device: OSD will not be hot-swappable if journal is not the same device as the osd data\nget_dm_uuid: get_dm_uuid /dev/nvme0n1 uuid path is /sys/dev/block/259:0/dm/uuid\nptype_tobe_for_name: name = journal\nget_dm_uuid: get_dm_uuid /dev/nvme0n1 uuid path is /sys/dev/block/259:0/dm/uuid\ncommand: Running command: /usr/sbin/parted --machine -- /dev/nvme0n1 print\nget_free_partition_index: get_free_partition_index: analyzing \r                                                                          \rBYT;\n/dev/nvme0n1:800GB:unknown:512:512:unknown:Unknown:;\n\ncreate_partition: Creating journal partition num 1 size 5120 on /dev/nvme0n1\ncommand_check_call: Running command: /usr/sbin/sgdisk --new=1:0:+5120M --change-name=1:ceph journal --partition-guid=1:26726772-a4e4-48ac-bce2-b6fc1bb6529c --typecode=1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/nvme0n1\nupdate_partition: Calling partprobe on created device /dev/nvme0n1\ncommand_check_call: Running command: /usr/bin/udevadm settle --timeout=600\ncommand: Running command: /usr/bin/flock -s /dev/nvme0n1 /usr/sbin/partprobe /dev/nvme0n1\ncommand_check_call: Running command: /usr/bin/udevadm settle --timeout=600\nget_dm_uuid: get_dm_uuid /dev/nvme0n1 uuid path is /sys/dev/block/259:0/dm/uuid\nget_dm_uuid: get_dm_uuid /dev/nvme0n1 uuid path is /sys/dev/block/259:0/dm/uuid\nget_dm_uuid: get_dm_uuid /dev/nvme0n1p1 uuid path is /sys/dev/block/259:3/dm/uuid\nprepare_device: Journal is GPT partition /dev/disk/by-partuuid/26726772-a4e4-48ac-bce2-b6fc1bb6529c\nprepare_device: Journal is GPT partition /dev/disk/by-partuuid/26726772-a4e4-48ac-bce2-b6fc1bb6529c\nget_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid\nset_data_partition: Creating osd partition on /dev/sdb\nget_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid\nptype_tobe_for_name: name = 
data\nget_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid\ncreate_partition: Creating data partition num 1 size 0 on /dev/sdb\ncommand_check_call: Running command: /usr/sbin/sgdisk --largest-new=1 --change-name=1:ceph data --partition-guid=1:6d2a74b6-041c-4a5c-b9da-6b11092c25aa --typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be --mbrtogpt -- /dev/sdb\nupdate_partition: Calling partprobe on created device /dev/sdb\ncommand_check_call: Running command: /usr/bin/udevadm settle --timeout=600\ncommand: Running command: /usr/bin/flock -s /dev/sdb /usr/sbin/partprobe /dev/sdb\ncommand_check_call: Running command: /usr/bin/udevadm settle --timeout=600\nget_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid\nget_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid\nget_dm_uuid: get_dm_uuid /dev/sdb1 uuid path is /sys/dev/block/8:17/dm/uuid\npopulate_data_path_device: Creating xfs fs on /dev/sdb1\ncommand_check_call: Running command: /usr/sbin/mkfs -t xfs -f -i size=2048 -f -- /dev/sdb1\nmkfs.xfs: cannot open /dev/sdb1: Device or resource busy\nTraceback (most recent call last):\n  File \"/usr/sbin/ceph-disk\", line 9, in <module>\n    load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()\n  File \"/usr/lib/python2.7/site-packages/ceph_disk/main.py\", line 5343, in run\n    main(sys.argv[1:])\n  File \"/usr/lib/python2.7/site-packages/ceph_disk/main.py\", line 5294, in main\n    args.func(args)\n  File \"/usr/lib/python2.7/site-packages/ceph_disk/main.py\", line 1896, in main\n    Prepare.factory(args).prepare()\n  File \"/usr/lib/python2.7/site-packages/ceph_disk/main.py\", line 1885, in prepare\n    self.prepare_locked()\n  File \"/usr/lib/python2.7/site-packages/ceph_disk/main.py\", line 1916, in prepare_locked\n    self.data.prepare(self.journal)\n  File \"/usr/lib/python2.7/site-packages/ceph_disk/main.py\", line 2583, in prepare\n    self.prepare_device(*to_prepare_list)\n  File 
\"/usr/lib/python2.7/site-packages/ceph_disk/main.py\", line 2747, in prepare_device\n    self.populate_data_path_device(*to_prepare_list)\n  File \"/usr/lib/python2.7/site-packages/ceph_disk/main.py\", line 2702, in populate_data_path_device\n    raise Error(e)\nceph_disk.main.Error: Error: Command '['/usr/sbin/mkfs', '-t', u'xfs', u'-f', u'-i', u'size=2048', '-f', '--', '/dev/sdb1']' returned non-zero exit status 1", "stderr_lines": ["+ case \"$KV_TYPE\" in", "+ source /config.static.sh", "++ set -e", "++ to_lowercase OSD_CEPH_DISK_PREPARE", "++ echo osd_ceph_disk_prepare", "+ CEPH_DAEMON=osd_ceph_disk_prepare", "+ create_mandatory_directories", "+ for keyring in '$OSD_BOOTSTRAP_KEYRING' '$MDS_BOOTSTRAP_KEYRING' '$RGW_BOOTSTRAP_KEYRING'", "++ dirname /var/lib/ceph/bootstrap-osd/ceph.keyring", "+ mkdir -p /var/lib/ceph/bootstrap-osd", "+ for keyring in '$OSD_BOOTSTRAP_KEYRING' '$MDS_BOOTSTRAP_KEYRING' '$RGW_BOOTSTRAP_KEYRING'", "++ dirname /var/lib/ceph/bootstrap-mds/ceph.keyring", "+ mkdir -p /var/lib/ceph/bootstrap-mds", "+ for keyring in '$OSD_BOOTSTRAP_KEYRING' '$MDS_BOOTSTRAP_KEYRING' '$RGW_BOOTSTRAP_KEYRING'", "++ dirname /var/lib/ceph/bootstrap-rgw/ceph.keyring", "+ mkdir -p /var/lib/ceph/bootstrap-rgw", "+ for directory in mon osd mds radosgw tmp mgr", "+ mkdir -p /var/lib/ceph/mon", "+ for directory in mon osd mds radosgw tmp mgr", "+ mkdir -p /var/lib/ceph/osd", "+ for directory in mon osd mds radosgw tmp mgr", "+ mkdir -p /var/lib/ceph/mds", "+ for directory in mon osd mds radosgw tmp mgr", "+ mkdir -p /var/lib/ceph/radosgw", "+ for directory in mon osd mds radosgw tmp mgr", "+ mkdir -p /var/lib/ceph/tmp", "+ for directory in mon osd mds radosgw tmp mgr", "+ mkdir -p /var/lib/ceph/mgr", "+ mkdir -p /var/lib/ceph/mon/ceph-overcloud-cephstorage-2", "+ mkdir -p /var/run/ceph", "+ mkdir -p /var/lib/ceph/radosgw/overcloud-cephstorage-2", "+ mkdir -p /var/lib/ceph/mds/ceph-mds-overcloud-cephstorage-2", "+ mkdir -p /var/lib/ceph/mgr/ceph-", "+ chown -R ceph. 
/var/run/ceph/ /var/lib/ceph/bootstrap-mds /var/lib/ceph/bootstrap-osd /var/lib/ceph/bootstrap-rbd /var/lib/ceph/bootstrap-rgw /var/lib/ceph/mds /var/lib/ceph/mgr /var/lib/ceph/mon /var/lib/ceph/osd /var/lib/ceph/radosgw /var/lib/ceph/tmp", "+ case \"$CEPH_DAEMON\" in", "+ source start_osd.sh", "++ set -e", "++ is_redhat", "++ get_package_manager", "++ is_available rpm", "++ command -v rpm", "++ OS_VENDOR=redhat", "++ [[ redhat == \\r\\e\\d\\h\\a\\t ]]", "++ source /etc/sysconfig/ceph", "+++ TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728", "+++ CEPH_AUTO_RESTART_ON_UPGRADE=no", "+ OSD_TYPE=prepare", "+ start_osd", "+ get_config", "+ log 'static: does not generate config'", "+ '[' -z 'static: does not generate config' ']'", "++ date '+%F %T'", "+ TIMESTAMP='2018-02-28 02:45:48'", "+ echo '2018-02-28 02:45:48  /entrypoint.sh: static: does not generate config'", "+ return 0", "+ check_config", "+ [[ ! -e /etc/ceph/ceph.conf ]]", "+ '[' 0 -eq 1 ']'", "+ case \"$OSD_TYPE\" in", "+ source osd_disk_prepare.sh", "++ set -e", "+ osd_disk_prepare", "+ [[ -z /dev/sdb ]]", "+ [[ ! -e /dev/sdb ]]", "+ '[' '!' 
-e /var/lib/ceph/bootstrap-osd/ceph.keyring ']'", "+ timeout 10 ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring health", "+ parted --script /dev/sdb print", "++ parted --script /dev/sdb print", "++ egrep '^ 1.*ceph data'", "+ [[ -n '' ]]", "+ [[ 0 -eq 1 ]]", "+ [[ 0 -eq 1 ]]", "+ ceph-disk -v prepare --cluster ceph --journal-uuid 26726772-a4e4-48ac-bce2-b6fc1bb6529c /dev/sdb /dev/nvme0n1", "command: Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid", "command: Running command: /usr/bin/ceph-osd --check-allows-journal -i 0 --log-file $run_dir/$cluster-osd-check.log --cluster ceph --setuser ceph --setgroup ceph", "command: Running command: /usr/bin/ceph-osd --check-wants-journal -i 0 --log-file $run_dir/$cluster-osd-check.log --cluster ceph --setuser ceph --setgroup ceph", "command: Running command: /usr/bin/ceph-osd --check-needs-journal -i 0 --log-file $run_dir/$cluster-osd-check.log --cluster ceph --setuser ceph --setgroup ceph", "get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid", "command: Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=osd_journal_size", "get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid", "get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid", "get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid", "command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_type", "command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_options_xfs", "command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. 
--lookup osd_mount_options_xfs", "get_dm_uuid: get_dm_uuid /dev/nvme0n1 uuid path is /sys/dev/block/259:0/dm/uuid", "prepare_device: OSD will not be hot-swappable if journal is not the same device as the osd data", "get_dm_uuid: get_dm_uuid /dev/nvme0n1 uuid path is /sys/dev/block/259:0/dm/uuid", "ptype_tobe_for_name: name = journal", "get_dm_uuid: get_dm_uuid /dev/nvme0n1 uuid path is /sys/dev/block/259:0/dm/uuid", "command: Running command: /usr/sbin/parted --machine -- /dev/nvme0n1 print", "get_free_partition_index: get_free_partition_index: analyzing ", "                                                                          ", "BYT;", "/dev/nvme0n1:800GB:unknown:512:512:unknown:Unknown:;", "", "create_partition: Creating journal partition num 1 size 5120 on /dev/nvme0n1", "command_check_call: Running command: /usr/sbin/sgdisk --new=1:0:+5120M --change-name=1:ceph journal --partition-guid=1:26726772-a4e4-48ac-bce2-b6fc1bb6529c --typecode=1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/nvme0n1", "update_partition: Calling partprobe on created device /dev/nvme0n1", "command_check_call: Running command: /usr/bin/udevadm settle --timeout=600", "command: Running command: /usr/bin/flock -s /dev/nvme0n1 /usr/sbin/partprobe /dev/nvme0n1", "command_check_call: Running command: /usr/bin/udevadm settle --timeout=600", "get_dm_uuid: get_dm_uuid /dev/nvme0n1 uuid path is /sys/dev/block/259:0/dm/uuid", "get_dm_uuid: get_dm_uuid /dev/nvme0n1 uuid path is /sys/dev/block/259:0/dm/uuid", "get_dm_uuid: get_dm_uuid /dev/nvme0n1p1 uuid path is /sys/dev/block/259:3/dm/uuid", "prepare_device: Journal is GPT partition /dev/disk/by-partuuid/26726772-a4e4-48ac-bce2-b6fc1bb6529c", "prepare_device: Journal is GPT partition /dev/disk/by-partuuid/26726772-a4e4-48ac-bce2-b6fc1bb6529c", "get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid", "set_data_partition: Creating osd partition on /dev/sdb", "get_dm_uuid: get_dm_uuid /dev/sdb uuid path is 
/sys/dev/block/8:16/dm/uuid", "ptype_tobe_for_name: name = data", "get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid", "create_partition: Creating data partition num 1 size 0 on /dev/sdb", "command_check_call: Running command: /usr/sbin/sgdisk --largest-new=1 --change-name=1:ceph data --partition-guid=1:6d2a74b6-041c-4a5c-b9da-6b11092c25aa --typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be --mbrtogpt -- /dev/sdb", "update_partition: Calling partprobe on created device /dev/sdb", "command_check_call: Running command: /usr/bin/udevadm settle --timeout=600", "command: Running command: /usr/bin/flock -s /dev/sdb /usr/sbin/partprobe /dev/sdb", "command_check_call: Running command: /usr/bin/udevadm settle --timeout=600", "get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid", "get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid", "get_dm_uuid: get_dm_uuid /dev/sdb1 uuid path is /sys/dev/block/8:17/dm/uuid", "populate_data_path_device: Creating xfs fs on /dev/sdb1", "command_check_call: Running command: /usr/sbin/mkfs -t xfs -f -i size=2048 -f -- /dev/sdb1", "mkfs.xfs: cannot open /dev/sdb1: Device or resource busy", "Traceback (most recent call last):", "  File \"/usr/sbin/ceph-disk\", line 9, in <module>", "    load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()", "  File \"/usr/lib/python2.7/site-packages/ceph_disk/main.py\", line 5343, in run", "    main(sys.argv[1:])", "  File \"/usr/lib/python2.7/site-packages/ceph_disk/main.py\", line 5294, in main", "    args.func(args)", "  File \"/usr/lib/python2.7/site-packages/ceph_disk/main.py\", line 1896, in main", "    Prepare.factory(args).prepare()", "  File \"/usr/lib/python2.7/site-packages/ceph_disk/main.py\", line 1885, in prepare", "    self.prepare_locked()", "  File \"/usr/lib/python2.7/site-packages/ceph_disk/main.py\", line 1916, in prepare_locked", "    self.data.prepare(self.journal)", "  File 
\"/usr/lib/python2.7/site-packages/ceph_disk/main.py\", line 2583, in prepare", "    self.prepare_device(*to_prepare_list)", "  File \"/usr/lib/python2.7/site-packages/ceph_disk/main.py\", line 2747, in prepare_device", "    self.populate_data_path_device(*to_prepare_list)", "  File \"/usr/lib/python2.7/site-packages/ceph_disk/main.py\", line 2702, in populate_data_path_device", "    raise Error(e)", "ceph_disk.main.Error: Error: Command '['/usr/sbin/mkfs', '-t', u'xfs', u'-f', u'-i', u'size=2048', '-f', '--', '/dev/sdb1']' returned non-zero exit status 1"], "stdout": "VERBOSE: activating bash debugging mode.\n2018-02-28 02:45:48  /entrypoint.sh: static: does not generate config\nHEALTH_ERR 20544 pgs are stuck inactive for more than 300 seconds; 20544 pgs stuck inactive; 20544 pgs stuck unclean; no osds\nCreating new GPT entries.\nThe operation has completed successfully.\nThe operation has completed successfully.", "stdout_lines": ["VERBOSE: activating bash debugging mode.", "2018-02-28 02:45:48  /entrypoint.sh: static: does not generate config", "HEALTH_ERR 20544 pgs are stuck inactive for more than 300 seconds; 20544 pgs stuck inactive; 20544 pgs stuck unclean; no osds", "Creating new GPT entries.", "The operation has completed successfully.", "The operation has completed successfully."]}

Comment 6 Joe Talerico 2018-02-28 12:44:59 UTC
@Giulio - Another point I failed to make is that this exact configuration worked fine when deployed with:

parameter_defaults:
  DnsServers: ["10.16.36.29","10.11.5.19"]

  ControllerCount: 3
  CephStorageCount: 3
  R620ComputeCount:  1
  R630ComputeCount: 0
  6018RComputeCount: 1
  R930ComputeCount: 1
  1029pComputeCount: 0
  1029uComputeCount: 1
  1028rComputeCount: 1
  R730ComputeCount: 1
  ComputeCount: 0

Comment 7 Giulio Fidente 2018-02-28 16:33:12 UTC
(In reply to Joe Talerico from comment #6)
> @Giulio - Another point I failed to make, is that this exact configuration
> worked fine when deployed with :
> 
> parameter_defaults:
>   DnsServers: ["10.16.36.29","10.11.5.19"]
> 
>   ControllerCount: 3
>   CephStorageCount: 3
>   R620ComputeCount:  1
>   R630ComputeCount: 0
>   6018RComputeCount: 1
>   R930ComputeCount: 1
>   1029pComputeCount: 0
>   1029uComputeCount: 1
>   1028rComputeCount: 1
>   R730ComputeCount: 1
>   ComputeCount: 0

Thanks Joe, good point.

I think this might have been fixed by: https://github.com/openstack/tripleo-heat-templates/commit/7b762a6a0c6a2931d3a11eecaad246a16f66f4e0

Can you check if you have that change in the templates and try applying it manually in case you did not?

Comment 8 Joe Talerico 2018-02-28 21:40:48 UTC
This seems to have helped, Giulio!


2018-02-28 21:14:56Z [overcloud.AllNodesDeploySteps.ObjectStoragePostConfig]: CREATE_COMPLETE  state changed
2018-02-28 21:14:56Z [overcloud.AllNodesDeploySteps.ComputePostConfig]: CREATE_COMPLETE  state changed
2018-02-28 21:14:56Z [overcloud.AllNodesDeploySteps.R620ComputePostConfig]: CREATE_COMPLETE  state changed
2018-02-28 21:14:56Z [overcloud.AllNodesDeploySteps.6018rComputePostConfig]: CREATE_COMPLETE  state changed
2018-02-28 21:14:56Z [overcloud.AllNodesDeploySteps]: CREATE_COMPLETE  Stack CREATE completed successfully
2018-02-28 21:14:57Z [overcloud.AllNodesDeploySteps]: CREATE_COMPLETE  state changed
2018-02-28 21:14:57Z [overcloud]: CREATE_COMPLETE  Stack CREATE completed successfully

 Stack overcloud CREATE_COMPLETE 

Overcloud Endpoint: http://172.168.30.19:5000/v2.0
Overcloud Deployed

real    165m31.403s
user    0m8.251s
sys     0m0.932s
Wed Feb 28 21:15:21 UTC 2018

Comment 11 Yogev Rabl 2018-04-12 16:21:30 UTC
verified on openstack-tripleo-heat-templates-8.0.2-0.20180327213843.f25e2d8.el7ost.noarch

Comment 13 errata-xmlrpc 2018-06-27 13:45:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086

