Deployment with ceph-container 2.4-4 (downstream jewel, targeted for the OSP12 release), using 3 Mons + 3 ceph-storage nodes each with 12 HDDs and 3 SSDs as journals, fails during the task "ceph-osd : prepare ceph "filestore" containerized osd disk(s) non-collocated" when using ceph-ansible 3.0.12 or 3.0.13; with 3.0.11 this error is not seen. Reproduced 3 times, and also verified once in a virtual environment by someone else.

Log excerpt:

2017-11-17 08:41:50,142 p=17254 u=mistral | failed: [192.168.1.21] (item=[{'_ansible_parsed': True, 'stderr_lines': [u'Error: /dev/sdl: unrecognised disk label'], '_ansible_item_result': True, u'end': u'2017-11-17 13:41:37.687371', '_ansible_no_log': False, u'stdout': u'', u'cmd': u"parted --script /dev/sdl print | egrep -sq '^ 1.*ceph'", u'msg': u'non-zero return code', u'rc': 1, u'start': u'2017-11-17 13:41:37.681665', u'delta': u'0:00:00.005706', 'item': u'/dev/sdl', u'changed': False, u'invocation': {u'module_args': {u'creates': None, u'executable': None, u'_uses_shell': True, u'_raw_params': u"parted --script /dev/sdl print | egrep -sq '^ 1.*ceph'", u'removes': None, u'warn': True, u'chdir': None, u'stdin': None}}, 'stdout_lines': [], 'failed_when_result': False, u'stderr': u'Error: /dev/sdl: unrecognised disk label', u'failed': False}, u'/dev/sdl', u'/dev/sdo']) => {"changed": true, "cmd": "docker run --net=host --pid=host --privileged=true --name=ceph-osd-prepare-overcloud-computehci-1-sdl -v /etc/ceph:/etc/ceph -v /var/lib/ceph/:/var/lib/ceph/ -v /dev:/dev -v /etc/localtime:/etc/localtime:ro -e DEBUG=verbose -e CLUSTER=ceph -e CEPH_DAEMON=OSD_CEPH_DISK_PREPARE -e OSD_DEVICE=/dev/sdl -e OSD_JOURNAL=/dev/sdo -e OSD_BLUESTORE=0 -e OSD_FILESTORE=1 -e OSD_DMCRYPT=0 -e OSD_JOURNAL_SIZE=5120 docker-registry.engineering.redhat.com/ceph/rhceph-2-rhel7:latest", "delta": "0:00:00.486229", "end": "2017-11-17 13:41:50.118883", "failed": true, "item": [{"_ansible_item_result": true, "_ansible_no_log": false, "_ansible_parsed": true, "changed": false, "cmd": "parted --script 
/dev/sdl print | egrep -sq '^ 1.*ceph'", "delta": "0:00:00.005706", "end": "2017-11-17 13:41:37.687371", "failed": false, "failed_when_result": false, "invocation": {"module_args": {"_raw_params": "parted --script /dev/sdl print | egrep -sq '^ 1.*ceph'", "_uses_shell": true, "chdir": null, "creates": null, "executable": null, "removes": null, "stdin": null, "warn": true}}, "item": "/dev/sdl", "msg": "non-zero return code", "rc": 1, "start": "2017-11-17 13:41:37.681665", "stderr": "Error: /dev/sdl: unrecognised disk label", "stderr_lines": ["Error: /dev/sdl: unrecognised disk label"], "stdout": "", "stdout_lines": []}, "/dev/sdl", "/dev/sdo"], "msg": "non-zero return code", "rc": 1, "start": "2017-11-17 13:41:49.632654", "stderr": "+ case \"$KV_TYPE\" in\n+ source /config.static.sh\n++ set -e\n++ to_lowercase OSD_CEPH_DISK_PREPARE\n++ echo osd_ceph_disk_prepare\n+ CEPH_DAEMON=osd_ceph_disk_prepare\n+ create_mandatory_directories\n+ for keyring in '$OSD_BOOTSTRAP_KEYRING' '$MDS_BOOTSTRAP_KEYRING' '$RGW_BOOTSTRAP_KEYRING'\n++ dirname /var/lib/ceph/bootstrap-osd/ceph.keyring\n+ mkdir -p /var/lib/ceph/bootstrap-osd\n+ for keyring in '$OSD_BOOTSTRAP_KEYRING' '$MDS_BOOTSTRAP_KEYRING' '$RGW_BOOTSTRAP_KEYRING'\n++ dirname /var/lib/ceph/bootstrap-mds/ceph.keyring\n+ mkdir -p /var/lib/ceph/bootstrap-mds\n+ for keyring in '$OSD_BOOTSTRAP_KEYRING' '$MDS_BOOTSTRAP_KEYRING' '$RGW_BOOTSTRAP_KEYRING'\n++ dirname /var/lib/ceph/bootstrap-rgw/ceph.keyring\n+ mkdir -p /var/lib/ceph/bootstrap-rgw\n+ for directory in mon osd mds radosgw tmp mgr\n+ mkdir -p /var/lib/ceph/mon\n+ for directory in mon osd mds radosgw tmp mgr\n+ mkdir -p /var/lib/ceph/osd\n+ for directory in mon osd mds radosgw tmp mgr\n+ mkdir -p /var/lib/ceph/mds\n+ for directory in mon osd mds radosgw tmp mgr\n+ mkdir -p /var/lib/ceph/radosgw\n+ for directory in mon osd mds radosgw tmp mgr\n+ mkdir -p /var/lib/ceph/tmp\n+ for directory in mon osd mds radosgw tmp mgr\n+ mkdir -p /var/lib/ceph/mgr\n+ mkdir -p 
/var/lib/ceph/mon/ceph-overcloud-computehci-1\n+ mkdir -p /var/run/ceph\n+ mkdir -p /var/lib/ceph/radosgw/overcloud-computehci-1\n+ mkdir -p /var/lib/ceph/mds/ceph-mds-overcloud-computehci-1\n+ mkdir -p /var/lib/ceph/mgr/ceph-\n+ chown -R ceph. /var/run/ceph/ /var/lib/ceph/bootstrap-mds /var/lib/ceph/bootstrap-osd /var/lib/ceph/bootstrap-rbd /var/lib/ceph/bootstrap-rgw /var/lib/ceph/mds /var/lib/ceph/mgr /var/lib/ceph/mon /var/lib/ceph/osd /var/lib/ceph/radosgw /var/lib/ceph/tmp\n+ case \"$CEPH_DAEMON\" in\n+ source start_osd.sh\n++ set -e\n++ is_redhat\n++ get_package_manager\n++ is_available rpm\n++ command -v rpm\n++ OS_VENDOR=redhat\n++ [[ redhat == \\r\\e\\d\\h\\a\\t ]]\n++ source /etc/sysconfig/ceph\n+++ TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728\n+++ CEPH_AUTO_RESTART_ON_UPGRADE=no\n+ OSD_TYPE=prepare\n+ start_osd\n+ get_config\n+ log 'static: does not generate config'\n+ '[' -z 'static: does not generate config' ']'\n++ date '+%F %T'\n+ TIMESTAMP='2017-11-17 13:41:49'\n+ echo '2017-11-17 13:41:49 /entrypoint.sh: static: does not generate config'\n+ return 0\n+ check_config\n+ [[ ! -e /etc/ceph/ceph.conf ]]\n+ '[' 0 -eq 1 ']'\n+ case \"$OSD_TYPE\" in\n+ source osd_disk_prepare.sh\n++ set -e\n+ osd_disk_prepare\n+ [[ -z /dev/sdl ]]\n+ [[ ! -e /dev/sdl ]]\n+ '[' '!' -e /var/lib/ceph/bootstrap-osd/ceph.keyring ']'\n+ timeout 10 ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring health\n+ parted --script /dev/sdl print\n+ [[ 0 -eq 1 ]]\n+ log 'Regarding parted, device /dev/sdl is inconsistent/broken/weird.'\n+ '[' -z 'Regarding parted, device /dev/sdl is inconsistent/broken/weird.' ']'\n++ date '+%F %T'\n+ TIMESTAMP='2017-11-17 13:41:50'\n+ echo '2017-11-17 13:41:50 /entrypoint.sh: Regarding parted, device /dev/sdl is inconsistent/broken/weird.'\n+ return 0\n+ log 'It would be too dangerous to destroy it without any notification.'\n+ '[' -z 'It would be too dangerous to destroy it without any notification.' 
']'\n++ date '+%F %T'\n+ TIMESTAMP='2017-11-17 13:41:50'\n+ echo '2017-11-17 13:41:50 /entrypoint.sh: It would be too dangerous to destroy it without any notification.'\n+ return 0\n+ log 'Please set OSD_FORCE_ZAP to '\\''1'\\'' if you really want to zap this disk.'\n+ '[' -z 'Please set OSD_FORCE_ZAP to '\\''1'\\'' if you really want to zap this disk.' ']'\n++ date '+%F %T'\n+ TIMESTAMP='2017-11-17 13:41:50'\n+ echo '2017-11-17 13:41:50 /entrypoint.sh: Please set OSD_FORCE_ZAP to '\\''1'\\'' if you really want to zap this disk.'\n+ return 0\n+ exit 1", "stderr_lines": ["+ case \"$KV_TYPE\" in", "+ source /config.static.sh", "++ set -e", "++ to_lowercase OSD_CEPH_DISK_PREPARE", "++ echo osd_ceph_disk_prepare", "+ CEPH_DAEMON=osd_ceph_disk_prepare", "+ create_mandatory_directories", "+ for keyring in '$OSD_BOOTSTRAP_KEYRING' '$MDS_BOOTSTRAP_KEYRING' '$RGW_BOOTSTRAP_KEYRING'", "++ dirname /var/lib/ceph/bootstrap-osd/ceph.keyring", "+ mkdir -p /var/lib/ceph/bootstrap-osd", "+ for keyring in '$OSD_BOOTSTRAP_KEYRING' '$MDS_BOOTSTRAP_KEYRING' '$RGW_BOOTSTRAP_KEYRING'", "++ dirname /var/lib/ceph/bootstrap-mds/ceph.keyring", "+ mkdir -p /var/lib/ceph/bootstrap-mds", "+ for keyring in '$OSD_BOOTSTRAP_KEYRING' '$MDS_BOOTSTRAP_KEYRING' '$RGW_BOOTSTRAP_KEYRING'", "++ dirname /var/lib/ceph/bootstrap-rgw/ceph.keyring", "+ mkdir -p /var/lib/ceph/bootstrap-rgw", "+ for directory in mon osd mds radosgw tmp mgr", "+ mkdir -p /var/lib/ceph/mon", "+ for directory in mon osd mds radosgw tmp mgr", "+ mkdir -p /var/lib/ceph/osd", "+ for directory in mon osd mds radosgw tmp mgr", "+ mkdir -p /var/lib/ceph/mds", "+ for directory in mon osd mds radosgw tmp mgr", "+ mkdir -p /var/lib/ceph/radosgw", "+ for directory in mon osd mds radosgw tmp mgr", "+ mkdir -p /var/lib/ceph/tmp", "+ for directory in mon osd mds radosgw tmp mgr", "+ mkdir -p /var/lib/ceph/mgr", "+ mkdir -p /var/lib/ceph/mon/ceph-overcloud-computehci-1", "+ mkdir -p /var/run/ceph", "+ mkdir -p 
/var/lib/ceph/radosgw/overcloud-computehci-1", "+ mkdir -p /var/lib/ceph/mds/ceph-mds-overcloud-computehci-1", "+ mkdir -p /var/lib/ceph/mgr/ceph-", "+ chown -R ceph. /var/run/ceph/ /var/lib/ceph/bootstrap-mds /var/lib/ceph/bootstrap-osd /var/lib/ceph/bootstrap-rbd /var/lib/ceph/bootstrap-rgw /var/lib/ceph/mds /var/lib/ceph/mgr /var/lib/ceph/mon /var/lib/ceph/osd /var/lib/ceph/radosgw /var/lib/ceph/tmp", "+ case \"$CEPH_DAEMON\" in", "+ source start_osd.sh", "++ set -e", "++ is_redhat", "++ get_package_manager", "++ is_available rpm", "++ command -v rpm", "++ OS_VENDOR=redhat", "++ [[ redhat == \\r\\e\\d\\h\\a\\t ]]", "++ source /etc/sysconfig/ceph", "+++ TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728", "+++ CEPH_AUTO_RESTART_ON_UPGRADE=no", "+ OSD_TYPE=prepare", "+ start_osd", "+ get_config", "+ log 'static: does not generate config'", "+ '[' -z 'static: does not generate config' ']'", "++ date '+%F %T'", "+ TIMESTAMP='2017-11-17 13:41:49'", "+ echo '2017-11-17 13:41:49 /entrypoint.sh: static: does not generate config'", "+ return 0", "+ check_config", "+ [[ ! -e /etc/ceph/ceph.conf ]]", "+ '[' 0 -eq 1 ']'", "+ case \"$OSD_TYPE\" in", "+ source osd_disk_prepare.sh", "++ set -e", "+ osd_disk_prepare", "+ [[ -z /dev/sdl ]]", "+ [[ ! -e /dev/sdl ]]", "+ '[' '!' -e /var/lib/ceph/bootstrap-osd/ceph.keyring ']'", "+ timeout 10 ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring health", "+ parted --script /dev/sdl print", "+ [[ 0 -eq 1 ]]", "+ log 'Regarding parted, device /dev/sdl is inconsistent/broken/weird.'", "+ '[' -z 'Regarding parted, device /dev/sdl is inconsistent/broken/weird.' 
']'", "++ date '+%F %T'", "+ TIMESTAMP='2017-11-17 13:41:50'", "+ echo '2017-11-17 13:41:50 /entrypoint.sh: Regarding parted, device /dev/sdl is inconsistent/broken/weird.'", "+ return 0", "+ log 'It would be too dangerous to destroy it without any notification.'", "+ '[' -z 'It would be too dangerous to destroy it without any notification.' ']'", "++ date '+%F %T'", "+ TIMESTAMP='2017-11-17 13:41:50'", "+ echo '2017-11-17 13:41:50 /entrypoint.sh: It would be too dangerous to destroy it without any notification.'", "+ return 0", "+ log 'Please set OSD_FORCE_ZAP to '\\''1'\\'' if you really want to zap this disk.'", "+ '[' -z 'Please set OSD_FORCE_ZAP to '\\''1'\\'' if you really want to zap this disk.' ']'", "++ date '+%F %T'", "+ TIMESTAMP='2017-11-17 13:41:50'", "+ echo '2017-11-17 13:41:50 /entrypoint.sh: Please set OSD_FORCE_ZAP to '\\''1'\\'' if you really want to zap this disk.'", "+ return 0", "+ exit 1"], "stdout": "VERBOSE: activating bash debugging mode.\n2017-11-17 13:41:49 /entrypoint.sh: static: does not generate config\nHEALTH_ERR no osds\n2017-11-17 13:41:50 /entrypoint.sh: Regarding parted, device /dev/sdl is inconsistent/broken/weird.\n2017-11-17 13:41:50 /entrypoint.sh: It would be too dangerous to destroy it without any notification.\n2017-11-17 13:41:50 /entrypoint.sh: Please set OSD_FORCE_ZAP to '1' if you really want to zap this disk.", "stdout_lines": ["VERBOSE: activating bash debugging mode.", "2017-11-17 13:41:49 /entrypoint.sh: static: does not generate config", "HEALTH_ERR no osds", "2017-11-17 13:41:50 /entrypoint.sh: Regarding parted, device /dev/sdl is inconsistent/broken/weird.", "2017-11-17 13:41:50 /entrypoint.sh: It would be too dangerous to destroy it without any notification.", "2017-11-17 13:41:50 /entrypoint.sh: Please set OSD_FORCE_ZAP to '1' if you really want to zap this disk."]}
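The trace above shows the container's prepare script refusing the disk: `parted` reports `Error: /dev/sdl: unrecognised disk label`, no existing `ceph` partition is found, and `OSD_FORCE_ZAP` is unset, so the entrypoint exits 1. The gate can be sketched as a small shell function (the function name and structure here are illustrative; the real logic lives in the container's `osd_disk_prepare.sh`):

```shell
# Illustrative sketch of the entrypoint's decision, not the real ceph-container code.
# $1 = output of `parted --script <dev> print`, $2 = value of OSD_FORCE_ZAP.
decide() {
  parted_out=$1; force_zap=$2
  if printf '%s\n' "$parted_out" | grep -Eq '^ 1 .*ceph'; then
    # Partition 1 already carries a ceph label: disk was prepared earlier.
    echo already-prepared
  elif printf '%s\n' "$parted_out" | grep -q 'unrecognised disk label'; then
    # No GPT/MBR label at all (e.g. right after ironic cleaning).
    if [ "$force_zap" = "1" ]; then
      echo zap-and-prepare   # OSD_FORCE_ZAP=1 lets ceph-disk recreate the label
    else
      echo refuse            # the failure seen in this bug
    fi
  else
    echo prepare             # labelled but empty disk
  fi
}

decide "Error: /dev/sdl: unrecognised disk label" 0   # prints "refuse"
decide "Error: /dev/sdl: unrecognised disk label" 1   # prints "zap-and-prepare"
```

This is why the workaround below (forcing `OSD_FORCE_ZAP=1`) unblocks the deployment: it flips the no-label case from "refuse" to "zap and prepare".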
The following commit list shows the changes between v3.0.11 (whose history goes up to Nov 13, 2017) and v3.0.12: https://github.com/ceph/ceph-ansible/commits/v3.0.12 Which OSD-related change might be the cause of this?
Created attachment 1354183 [details] ceph-ansible run showing symptoms of bz 1514460
WORKAROUND: In ceph-ansible, deploy with:

ceph_osd_docker_prepare_env: "-e OSD_FORCE_ZAP=1"

If you're using OSP12, this means deploying with:

parameter_defaults:
  CephAnsibleExtraConfig:
    ceph_osd_docker_prepare_env: "-e OSD_FORCE_ZAP=1"
The workaround in comment #8 might make you wonder why we should tell "ceph-disk prepare" to zap when we're already using ironic cleaning, which runs "sgdisk -Z" during cleaning [1] prior to deployment (as is OSP best practice [2] when using Ceph). The difference is that ironic cleans the disk but does not create a GPT label (as it should be: ironic's job when cleaning is to remove labels), while asking ceph-disk to zap by passing OSD_FORCE_ZAP=1 to the container does create the GPT label. A change in luminous's ceph-disk made it no longer require a pre-existing GPT label on the disk; it creates one if it has to. Jewel's ceph-disk, however, still requires the GPT label to be present. ceph-ansible was updated to stop creating the label because luminous's ceph-disk made this unnecessary, but that broke jewel and caused this bug. The fix in the attached ceph-ansible PR is to ensure we have the GPT label [3]. [1] https://bugs.launchpad.net/ironic-lib/+bug/1690458 [2] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/11/html/red_hat_ceph_storage_for_the_overcloud/creation#Formatting_Ceph_Storage_Nodes_Disks_to_GPT [3] https://github.com/ceph/ceph-ansible/pull/2197
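The shape of such a fix on the ceph-ansible side can be sketched as a pair of Ansible tasks (task names, the `devices` variable, and the exact conditions here are illustrative, not copied from PR 2197):

```yaml
# Illustrative sketch only: give disks that ironic cleaning left label-less
# the GPT label that jewel's ceph-disk expects, without touching disks that
# already have a label (and possibly data).
- name: check for an existing disk label
  command: parted --script {{ item }} print
  with_items: "{{ devices }}"
  register: parted_results
  failed_when: false
  changed_when: false

- name: create a gpt label only where none exists
  command: parted --script {{ item.item }} mklabel gpt
  with_items: "{{ parted_results.results }}"
  when: "'unrecognised disk label' in item.stderr"
```

Guarding `mklabel gpt` behind the "unrecognised disk label" check matters: labelling only clean disks keeps the operation safe on nodes with existing OSD data.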
The upstream fix for this and its backport have merged. The fixed version is tagged as ceph-ansible 3.0.14. Next step is to get a package for testing from kdreyer.
https://github.com/ceph/ceph-ansible/tree/v3.0.14
Testing w/ ceph-ansible-3.0.14-1.el7cp.noarch.rpm
Verified fixed in ceph-ansible-3.0.14-1.el7cp.noarch.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:3387