Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
This project is now read-only. Starting Monday, February 2, please use https://ibm-ceph.atlassian.net/ for all bug tracking.

Bug 1605930

Summary: osd failed to upgrade with "Error: No cluster conf found in /etc/ceph"
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Tiffany Nguyen <tunguyen>
Component: Ceph-Ansible
Assignee: Sébastien Han <shan>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: ceph-qe-bugs <ceph-qe-bugs>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.1
CC: aschoen, ceph-eng-bugs, gmeno, nthomas, sankarshan, seb, tunguyen, vakulkar
Target Milestone: rc
Flags: vakulkar: automate_bug?
Target Release: 3.1
Hardware: Unspecified
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-08-07 18:58:59 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
* ansible log
* all.yml
* osds.yml
* hosts file
* ansible log

Description Tiffany Nguyen 2018-07-20 17:39:15 UTC
Created attachment 1466874 [details]
ansible log

Description of problem:
When running rolling_update.yml, ceph-disk failed to activate the OSD with the error below:

failed: [c06-h09-6048r.rdu.openstack.engineering.redhat.com] (item=/dev/sdb) => {"changed": false, "cmd": ["ceph-disk", "activate", "/dev/sdb1"], "delta": "0:00:00.144141", "end": "2018-07-20 17:18:44.826203", "item": "/dev/sdb", "msg": "non-zero return code", "rc": 1, "start": "2018-07-20 17:18:44.682062", "stderr": "mount_activate: Failed to activate\nceph-disk: Error: No cluster conf found in /etc/ceph with fsid 9071b1aa-c5ea-451c-b1d0-06b2298c1901", "stderr_lines": ["mount_activate: Failed to activate", "ceph-disk: Error: No cluster conf found in /etc/ceph with fsid 9071b1aa-c5ea-451c-b1d0-06b2298c1901"], "stdout": "", "stdout_lines": []}
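The error means ceph-disk compared the fsid recorded on the OSD data partition against the fsid in /etc/ceph/ceph.conf and found no matching conf. A minimal sketch of checking the conf side is below; the conf contents are a stand-in written to a temp file so the snippet is self-contained, and the expected fsid is taken from the error above. In practice you would point `conf` at the real /etc/ceph/ceph.conf.

```shell
# Sketch: verify that ceph.conf carries the fsid ceph-disk is looking for.
expected_fsid=9071b1aa-c5ea-451c-b1d0-06b2298c1901   # from the ceph-disk error

# Stand-in for /etc/ceph/ceph.conf so the snippet runs standalone.
conf=$(mktemp)
cat > "$conf" <<'EOF'
[global]
fsid = 9071b1aa-c5ea-451c-b1d0-06b2298c1901
mon_host = 10.0.0.1
EOF

# Pull the fsid value out of the conf (split on " = ").
conf_fsid=$(awk -F' *= *' '/^fsid/ {print $2}' "$conf")

if [ "$conf_fsid" = "$expected_fsid" ]; then
    echo "fsid matches"
else
    echo "fsid mismatch: ceph.conf has $conf_fsid"
fi
```

If the real ceph.conf is missing, empty, or carries a different fsid after the upgrade touched it, ceph-disk activation fails exactly as shown in the log above.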

Version-Release number of selected component (if applicable):
Upgrade from 2.5 --> 3.1
 * 2.5 (10.2.10-17.el7cp)
 * 3.1 (build http://download.eng.bos.redhat.com/composes/auto/ceph-3.1-rhel-7/RHCEPH-3.1-RHEL-7-20180712.ci.2/) 

How reproducible:
* Cluster filled to about 30% of capacity
* Running the 2.5 -> 3.1 rolling_update.yml with I/O running in parallel

Steps to Reproduce:
1. Run a Ceph 2.5 cluster filled to 30% of capacity
2. Start I/O using the COSBench tool
3. Run rolling_update.yml to upgrade to the 3.1 build
4. Monitor the ansible log: the upgrade errors out and the cluster fails to upgrade

Comment 3 Tiffany Nguyen 2018-07-20 17:49:15 UTC
Created attachment 1466978 [details]
all.yml

Comment 4 Tiffany Nguyen 2018-07-20 17:49:40 UTC
Created attachment 1466984 [details]
osds.yml

Comment 5 Tiffany Nguyen 2018-07-20 17:50:01 UTC
Created attachment 1466988 [details]
hosts file

Comment 6 Tiffany Nguyen 2018-07-20 17:57:57 UTC
fsid info:
[root@c07-h29-6018r ~]# ceph fsid
9071b1aa-c5ea-451c-b1d0-06b2298c1901

Comment 7 Tiffany Nguyen 2018-07-20 23:52:50 UTC
Created attachment 1469625 [details]
ansible log

Re-ran rolling_update.yml; attaching the new ansible log. The upgrade is still failing, with PGs stuck degraded:
 cluster:
    id:     9071b1aa-c5ea-451c-b1d0-06b2298c1901
    health: HEALTH_WARN
            1012 pgs degraded
            6 pgs recovering
            1008 pgs recovery_wait
            1012 pgs stuck degraded
            1014 pgs stuck unclean
            recovery 1225343/184368312 objects degraded (0.665%)
            noout,noscrub,nodeep-scrub flag(s) set
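With recovery still in flight, the playbook's health check keeps failing. One common approach (a sketch, not the ceph-ansible procedure itself) is to wait until the degraded count drains to zero before re-running; the count can be parsed from `ceph -s` output. The status text below is an inline copy of the output above so the snippet runs standalone; in practice you would pipe `ceph -s` directly.

```shell
# Sketch: extract the degraded-PG count from saved cluster status text.
status=$(cat <<'EOF'
    health: HEALTH_WARN
            1012 pgs degraded
            6 pgs recovering
            1008 pgs recovery_wait
EOF
)

# The degraded line looks like "1012 pgs degraded"; take the leading count.
degraded=$(echo "$status" | awk '/pgs degraded/ {print $1}')
echo "degraded pgs: $degraded"
# Recovery is done once this reaches 0 and health returns to HEALTH_OK.
```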

Comment 8 seb 2018-07-25 13:42:29 UTC
What does your ceph.conf say about this fsid?