Bug 1458512
| Field | Value |
|---|---|
| Summary | [ceph-ansible] [ceph-container]: OSD activation failing when cluster name has numbers |
| Product | [Red Hat Storage] Red Hat Ceph Storage |
| Reporter | Vasishta <vashastr> |
| Component | Container |
| Assignee | Guillaume Abrioux <gabrioux> |
| Status | CLOSED ERRATA |
| QA Contact | Vasishta <vashastr> |
| Severity | medium |
| Docs Contact | Erin Donnelly <edonnell> |
| Priority | medium |
| Version | 2.3 |
| CC | adeza, anharris, dang, edonnell, flucifre, gabrioux, gmeno, hchen, hnallurv, jim.curtis, kdreyer, pprakash, seb, tserlin |
| Target Milestone | rc |
| Target Release | 3.0 |
| Hardware | Unspecified |
| OS | Unspecified |
| Fixed In Version | rhceph:ceph-3.0-rhel-7-docker-candidate-71465-20170804220045 |
| Doc Type | Bug Fix |
Doc Text:
.OSD activation no longer fails when running the `osd_disk_activate.sh` script in the Ceph container when a cluster name contains numbers
Previously, the `osd_disk_activate.sh` script in the Ceph container image treated any digits in the cluster name as part of the OSD ID. As a consequence, OSD activation failed because the script looked for a keyring at a path built from an OSD ID that did not exist. The underlying issue has been fixed, and OSD activation no longer fails when the name of a cluster in a container contains numbers.
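A minimal sketch of the failure mode described above (the directory layout here mirrors the logs later in this report, but the variable names and the exact path construction are illustrative, not taken from the actual container script):

```shell
# Cluster name containing digits; the OSD data directory is ${CLUSTER}-${OSD_ID}.
CLUSTER="2_3"
OSD_ID="1"                 # the real OSD id
GOOD="/var/lib/ceph/osd/${CLUSTER}-${OSD_ID}/keyring"

# If digit-matching against the mount path also swallows the cluster-name
# digits, the derived id -- and therefore the keyring path -- is wrong:
BAD_ID="2"                 # e.g. the first digit run pulled out of "2_3-1"
BAD="/var/lib/ceph/osd/${CLUSTER}-${BAD_ID}/keyring"

echo "expected: $GOOD"     # /var/lib/ceph/osd/2_3-1/keyring
echo "actual:   $BAD"      # /var/lib/ceph/osd/2_3-2/keyring (does not exist)
```

This matches the "unable to find a keyring on /var/lib/ceph/osd/2_3-2//keyring" errors shown in the attached logs.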
| Field | Value |
|---|---|
| Story Points | --- |
| Last Closed | 2017-12-05 23:18:20 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| Category | --- |
| oVirt Team | --- |
| Cloudforms Team | --- |
| Bug Blocks | 1437916, 1494421 |
Description (Vasishta, 2017-06-03 18:06:12 UTC)
Created attachment 1284699 [details]
ansible-playbook log and osds.yml, all.yml file snippets

Faced this issue while working on BZ 1452316. Though the ansible-playbook run completed successfully, the OSDs could not be activated. The attachment contains the ansible-playbook log and snippets of the osds.yml and all.yml files.

```
[ubuntu@magna088 ~]$ cat /usr/share/ceph-ansible/group_vars/osds.yml | egrep -v ^# | grep -v ^$
---
dummy:
copy_admin_key: true
devices:
  - /dev/sdb
  - /dev/sdc
osd_containerized_deployment: true
ceph_osd_docker_prepare_env: -e CLUSTER={{ cluster }} -e OSD_JOURNAL_SIZE={{ journal_size }} -e OSD_FORCE_ZAP=1 -e OSD_DMCRYPT=1
ceph_osd_docker_devices: "{{ devices }}"
ceph_osd_docker_extra_env: -e CLUSTER={{ cluster }} -e CEPH_DAEMON=OSD_CEPH_DISK_ACTIVATE -e OSD_JOURNAL_SIZE={{ journal_size }} -e OSD_DMCRYPT=1
```

Created attachment 1284700 [details]
Contains log snippets of two different OSDs

Found two issues for two different OSDs in the OSD logs:
1)
```
Jun 03 16:49:11 magna003 ceph-osd-run.sh[24497]: command_check_call: Running command: /bin/umount -l -- /var/lib/ceph/tmp/mnt.m1sLEc
Jun 03 16:49:11 magna003 ceph-osd-run.sh[24497]: no valid command found; 10 closest matches:
```

2)
```
Jun 03 17:00:45 magna013 ceph-osd-run.sh[9768]: command_check_call: Running command: /bin/mount -o noatime,inode64 -- /dev/mapper/704e3599-8ae9-40c5-b86c-24c36b5c15e8 /var/lib/ceph/osd/2_3-1
Jun 03 17:00:45 magna013 ceph-osd-run.sh[9768]: command_check_call: Running command: /bin/umount -l -- /var/lib/ceph/tmp/mnt.fJ_tix
Jun 03 17:00:45 magna013 ceph-osd-run.sh[9768]: df: '/var/lib/ceph/osd/2_3-2/': No such file or directory
Jun 03 17:00:45 magna013 ceph-osd-run.sh[9768]: 2017-06-03 17:00:45.260147 7f15f1661700 -1 auth: unable to find a keyring on /var/lib/ceph/osd/2_3-2//keyring: (2) No such file or directory
```
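The mismatched '2_3-1' versus '2_3-2' paths in the log above come from a digit-only match against the mount path (the root-cause analysis appears later in this bug). A quick reproduction, using a mount path modeled on the log:

```shell
# Extract "the OSD id" from a mount path for cluster "2_3", OSD 1,
# the way the activation script does: match any run of digits.
echo "/var/lib/ceph/osd/2_3-1" | grep -oh '[0-9]*'
# Every digit run matches, so this prints three lines -- 2, 3, 1 --
# instead of just the real OSD id, 1.
```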
Please refer to the attachment for a larger log snippet.

Sorry for not mentioning in the description that the cluster has a custom name, '2_3' in this case.
Regards,
Vasishta
Created attachment 1284765 [details]
osd log snippet - dedicated journal
Hi,
OSD activation is also failing in the dedicated journal scenario, with similar log messages.

These lines appear in all of the OSD logs (i.e., on all nodes):
```
Jun 04 16:27:33 magna106 ceph-osd-run.sh[15897]: df: '/var/lib/ceph/osd/7_3_2_3-7/': No such file or directory
Jun 04 16:27:33 magna106 ceph-osd-run.sh[15897]: 2017-06-04 16:27:33.883293 7f73b17ba700 -1 auth: unable to find a keyring on /var/lib/ceph/osd/7_3_2_3-7//keyring: (2) No such file or directory
Jun 04 16:27:33 magna106 ceph-osd-run.sh[15897]: 2017-06-04 16:27:33.883303 7f73b17ba700 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication
Jun 04 16:27:33 magna106 ceph-osd-run.sh[15897]: 2017-06-04 16:27:33.883305 7f73b17ba700 0 librados: osd.7 initialization error (2) No such file or directory
```
Please refer to the attachment for a larger log snippet. The file also contains the conf file and the all.yml and osds.yml files used for ceph-ansible.
Regards,
Vasishta
Alfredo please triage this.

release note for 2.3, but keep working... this is relevant to 3.0

Alfredo, please disregard the request to triage -- we're kicking this out of 2.3.

This bug affects the osd_disk_activate.sh script when the cluster name includes numbers. When osd_disk_activate.sh runs, it looks up the OSD ID by executing this command:

```
OSD_ID=$(grep "${MOUNTED_PART}" /proc/mounts | awk '{print $2}' | grep -oh '[0-9]*')
```

(see: https://github.com/ceph/ceph-docker/blob/master/ceph-releases/jewel/ubuntu/14.04/daemon/osd_scenarios/osd_disk_activate.sh#L43)

In the case where the cluster name includes numbers, the last grep is wrong because '[0-9]*' matches any run of digits, not just the OSD ID:

```
++ grep /dev/mapper/34c336ea-3594-4c1d-b5f1-9cdf18a8ba6c /proc/mounts
++ awk '{print $2}'
++ grep -oh '[0-9]*'
+ OSD_ID='23
```

For instance:

```
[root@ceph-osd0 /]# grep /dev/mapper/34c336ea-3594-4c1d-b5f1-9cdf18a8ba6c /proc/mounts
/dev/mapper/34c336ea-3594-4c1d-b5f1-9cdf18a8ba6c /var/lib/ceph/osd/23-0 xfs rw,seclabel,noatime,attr2,inode64,noquota 0 0
[root@ceph-osd0 /]# grep /dev/mapper/34c336ea-3594-4c1d-b5f1-9cdf18a8ba6c /proc/mounts | awk '{print $2}'
/var/lib/ceph/osd/23-0
[root@ceph-osd0 /]# grep /dev/mapper/34c336ea-3594-4c1d-b5f1-9cdf18a8ba6c /proc/mounts | awk '{print $2}' | grep -oh '[0-9]*'
23
0
[root@ceph-osd0 /]#
```

We should only get '0'.

Fix: https://github.com/ceph/ceph-docker/pull/662

Guillaume, would you mind setting the "doc text" field for this BZ? It needs to go into the 2.3 Release Notes. Thanks! Erin

Thank you Guillaume for the doc text info! I have updated it a bit -- would you mind taking a look and letting me know if it looks ok?

(In reply to Erin Donnelly from comment #13)
> Thank you Guillaume for the doc text info! I have updated it a bit -- would
> you mind taking a look and letting me know if it looks ok?

Looks good to me.

Yes, just release note the bug; no doc addition is necessary.
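The faulty extraction and one possible way to anchor the match can be sketched as follows. The `sed` anchoring shown here is illustrative only; see the linked pull request for the actual upstream fix:

```shell
# Mount path from the trace above: cluster "23", OSD id 0.
MOUNT_PATH="/var/lib/ceph/osd/23-0"

# Buggy: '[0-9]*' returns every digit run; the first hit is the
# cluster-name digits "23", not the OSD id.
BUGGY_ID=$(echo "$MOUNT_PATH" | grep -oh '[0-9]*' | head -n1)

# Illustrative fix: keep only the digits after the final '-',
# where the OSD id actually lives in the directory name.
FIXED_ID=$(echo "$MOUNT_PATH" | sed 's/.*-\([0-9][0-9]*\)$/\1/')

echo "buggy: $BUGGY_ID"   # 23
echo "fixed: $FIXED_ID"   # 0
```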
merged upstream, backport in progress

Changing the summary to a more appropriate version, as OSD activation fails whether OSDs are encrypted or not, and whether a collocated or a dedicated journal device is used.

Regards,
Vasishta

Tried using ceph-3.0-rhel-7-docker-candidate-31370-20171003232256; working fine. Moving BZ to VERIFIED state.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3388