Previously, when upgrading a containerized cluster to Red Hat Ceph Storage 3, the ceph-ansible utility failed to upgrade encrypted OSD nodes. As a consequence, such OSDs could not come up and journald logs included the following error message:
"Error response from daemon: No such container: expose_partitions_<disk>"
This bug has been fixed by modifying the underlying source code, and ceph-ansible now upgrades containerized encrypted OSDs as expected.
Additional information -
Because the cluster had only three OSD nodes, the playbook failed after retrying 40 times on the "waiting for clean pgs..." task, since all OSDs on one node were down.
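For reference, a minimal sketch of how to check whether the PGs are clean, which is roughly what that task polls for. The cluster name "humpty" is taken from the all.yml below; the ceph-mon-<hostname> container name pattern is an assumption for a containerized ceph-ansible deployment and may need adjusting:

  # Run on a monitor node; replace <hostname> with the mon host's short name (assumed container naming).
  docker exec ceph-mon-<hostname> ceph --cluster humpty -s
  docker exec ceph-mon-<hostname> ceph --cluster humpty pg stat   # expect all PGs to be active+clean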
(In reply to Harish NV Rao from comment #6)
> @Drew, Can you please add the summary of the decision made in yesterday's
> program call regarding this bug?
@Drew, a gentle reminder. Please note that this BZ still has a target release of 3.0.
Hi,
Upgrading from rhceph-3-rhel7 to
ceph-3.0-rhel-7-docker-candidate-38019-20180222163657 works fine using ceph-ansible-3.0.26-1.el7cp.noarch.
While upgrading a cluster whose OSDs have collocated journals, I hit the issue
reported in Bug 1548357 and followed the workaround mentioned in
https://bugzilla.redhat.com/show_bug.cgi?id=1548357#c4 (list the mgr's name at
the top of the mons group in the inventory file); see the sketch below.
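A minimal inventory sketch of that workaround; the hostnames are hypothetical placeholders, not taken from this report:

  # Ansible inventory: the host carrying the mgr is listed first in [mons].
  [mgrs]
  node1

  [mons]
  node1    # mgr's name at the top of the mons group
  node2
  node3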
Moving to VERIFIED state.
Regards,
Vasishta Shatsry
AQE, Ceph
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2018:0473
Created attachment 1360383 [details]
File contains contents of OSD journald log snippet after enabling verbose

Description of problem:
Rolling update of a containerized cluster from 2.4 to 3.0 failed on dmcrypt OSDs; the OSDs failed searching for the container expose_partitions_<disk>.

Version-Release number of selected component (if applicable):
ceph-ansible-3.0.14-1.el7cp.noarch
Container image - rhceph:3-2

Steps to Reproduce:
1. Initialize a ceph 2.4 cluster.
2. Update ceph-ansible to 3.x and follow the documentation to update the cluster to 3.0.

Actual results:
OSDs fail to come up, reporting expose_partitions_sdd.

Expected results:
OSDs must be updated successfully.

Additional info:
OSD configuration as given in the inventory file -

<node> ceph_osd_docker_prepare_env="-e CLUSTER={{ cluster }} -e OSD_JOURNAL_SIZE={{ journal_size }} -e OSD_FORCE_ZAP=1 -e OSD_DMCRYPT=1" ceph_osd_docker_extra_env="-e CLUSTER={{ cluster }} -e CEPH_DAEMON=OSD_CEPH_DISK_ACTIVATE -e OSD_JOURNAL_SIZE={{ journal_size }} -e OSD_DMCRYPT=1" devices="['/dev/sdb','/dev/sdc','/dev/sdd']"

Contents of all.yml -

$ grep -Ev '(^#|^$)' group_vars/all.yml
---
dummy:
ceph_docker_image_tag: "3-2"
fetch_directory: ~/ceph-ansible-keys
cluster: humpty
monitor_interface: "eno1"
radosgw_interface: "eno1"
public_network: 10.8.128.0/21
docker: true
ceph_docker_image: "rhceph"
mon_containerized_deployment: true
ceph_mon_docker_interface: "eno1"
ceph_mon_docker_subnet: "{{ public_network }}"
ceph_docker_registry: "brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888" #registry.access.redhat.com
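A hedged sketch of how the failure can be confirmed on an affected OSD node; the ceph-osd@<device> unit name pattern and the device sdd are assumptions based on the configuration above, not commands quoted from the report:

  # Inspect the journald log of the OSD service for the failing device
  # (unit name pattern assumed for a containerized ceph-ansible deployment).
  journalctl -u ceph-osd@sdd --no-pager | grep expose_partitions

  # Confirm that no expose_partitions container exists for the disk.
  docker ps -a --format '{{.Names}}' | grep expose_partitions || echo "no expose_partitions container found"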