Bug 1518788 - [ceph-ansible] [ceph-container] : rolling update of dmcrypt OSDs failed searching for container expose_partitions_<disk>
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Container
Version: 3.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z1
Target Release: 3.0
Assignee: leseb
QA Contact: Vasishta
Docs Contact: Aron Gunn
URL:
Whiteboard:
Depends On:
Blocks: 1494421
 
Reported: 2017-11-29 14:59 UTC by Vasishta
Modified: 2018-03-08 15:46 UTC
CC: 16 users

Fixed In Version: rhceph:ceph-3.0-rhel-7-docker-candidate-53483-20180117211610
Doc Type: Bug Fix
Doc Text:
Previously, when upgrading a containerized cluster to Red Hat Ceph Storage 3, the ceph-ansible utility failed to upgrade encrypted OSD nodes. As a consequence, such OSDs could not come up, and the journald logs included the following error message: "Error response from daemon: No such container: expose_partitions_<disk>". With this update, the underlying source code has been fixed, and ceph-ansible upgrades containerized encrypted OSDs as expected.
Clone Of:
Environment:
Last Closed: 2018-03-08 15:46:22 UTC
Target Upstream Version:


Attachments (Terms of Use)
File contains contents of OSD journald log snippet after enabling verbose (123.84 KB, text/plain)
2017-11-29 14:59 UTC, Vasishta


Links
System ID Priority Status Summary Last Updated
Github ceph ceph-container pull 854 None None None 2017-11-29 19:13:01 UTC
Red Hat Product Errata RHBA-2018:0473 normal SHIPPED_LIVE updated rhceph-3.0-rhel7 container image 2018-03-08 20:45:28 UTC

Description Vasishta 2017-11-29 14:59:45 UTC
Created attachment 1360383 [details]
File contains contents of OSD journald log snippet after enabling verbose

Description of problem:
Rolling update of a containerized cluster from 2.4 to 3.0 failed on dmcrypt OSDs, which searched for a container named expose_partitions_<disk>.

Version-Release number of selected component (if applicable):
ceph-ansible-3.0.14-1.el7cp.noarch
Container image - rhceph:3-2

Steps to Reproduce:
1. Initialize a containerized Ceph 2.4 cluster.
2. Update ceph-ansible to 3.x and follow the documentation to upgrade the cluster to 3.0.


Actual results:
OSDs fail to come up with the error "No such container: expose_partitions_sdd".
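The failure signature can be checked for in the OSD node's journald output. A minimal sketch follows; the sample log line below is illustrative (only the error string itself comes from this report), and in practice you would grep a saved `journalctl` dump from the affected node:

```shell
# Write a sample journald excerpt containing the failure signature
# (illustrative line; on a real node, save `journalctl` output instead).
cat > /tmp/osd-journal-sample.log <<'EOF'
docker: Error response from daemon: No such container: expose_partitions_sdd.
EOF

# Extract the failure signature, including the affected disk name.
grep -o 'No such container: expose_partitions_[a-z]*' /tmp/osd-journal-sample.log
```

The trailing `[a-z]*` captures which disk's partition-exposure container was missing (here `sdd`).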

Expected results:
OSDs must be updated successfully.

Additional info:

OSD configurations as mentioned in inventory file -

<node> ceph_osd_docker_prepare_env="-e CLUSTER={{ cluster }} -e OSD_JOURNAL_SIZE={{ journal_size }} -e OSD_FORCE_ZAP=1 -e OSD_DMCRYPT=1" ceph_osd_docker_extra_env="-e CLUSTER={{ cluster }} -e CEPH_DAEMON=OSD_CEPH_DISK_ACTIVATE -e OSD_JOURNAL_SIZE={{ journal_size }} -e OSD_DMCRYPT=1" devices="['/dev/sdb','/dev/sdc','/dev/sdd']"


Contents of all.yml -
$ grep -Ev '(^#|^$)' group_vars/all.yml
---
dummy:
ceph_docker_image_tag: "3-2"
fetch_directory: ~/ceph-ansible-keys
cluster: humpty
monitor_interface: "eno1"
radosgw_interface: "eno1"
public_network: 10.8.128.0/21
docker: true
ceph_docker_image: "rhceph"
mon_containerized_deployment: true
ceph_mon_docker_interface: "eno1"
ceph_mon_docker_subnet: "{{ public_network }}"
ceph_docker_registry: "brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888" #registry.access.redhat.com

Comment 4 Vasishta 2017-11-29 15:46:55 UTC
Additional information:

As there were only three OSD nodes, the playbook failed after retrying 40 times on the task "waiting for clean pgs..." because all OSDs were down on one node.

Comment 5 leseb 2017-11-29 19:13:01 UTC
That's a nasty bug with no workaround at the moment :/
Only the patch will fix the issue.

Comment 6 Harish NV Rao 2017-11-30 09:27:32 UTC
@Drew, Can you please add the summary of the decision made in yesterday's program call regarding this bug?

Comment 7 Harish NV Rao 2017-12-01 06:31:45 UTC
(In reply to Harish NV Rao from comment #6)
> @Drew, Can you please add the summary of the decision made in yesterday's
> program call regarding this bug?

@Drew, a gentle reminder. Please note that this BZ still has a target release of 3.0.

Comment 8 Drew Harris 2017-12-01 16:01:11 UTC
We will release note this for 3.0 and target it for the next async. I'm tracking this bug manually for now until we have a new target location for it.

Comment 10 leseb 2017-12-06 17:01:25 UTC
lgtm

Comment 15 Vasishta 2018-02-23 15:33:36 UTC
Hi,

Working fine using ceph-ansible-3.0.26-1.el7cp.noarch to upgrade to
ceph-3.0-rhel-7-docker-candidate-38019-20180222163657 from rhceph-3-rhel7.

While upgrading a cluster having OSDs with collocated journals, I faced the issue reported in Bug 1548357 and followed the workaround mentioned in
https://bugzilla.redhat.com/show_bug.cgi?id=1548357#c4 (mention the mgr's name at
the top of the mons group in the inventory file).
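The workaround referenced above amounts to an inventory ordering like the following sketch (host names are placeholders, not from this report):

```ini
[mgrs]
node-a        # hypothetical host running the mgr daemon

[mons]
node-a        # the mgr's host listed first in the mons group (the workaround)
node-b
node-c
```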

Moving to VERIFIED state.

Regards,
Vasishta Shatsry
AQE, Ceph

Comment 18 errata-xmlrpc 2018-03-08 15:46:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0473

