Created attachment 1360383 [details]
File contains contents of OSD journald log snippet after enabling verbose

Description of problem:
Rolling update of a containerized cluster from 2.4 to 3.0 failed on dmcrypt OSDs, which fail while searching for the container expose_partitions_<disk>.

Version-Release number of selected component (if applicable):
ceph-ansible-3.0.14-1.el7cp.noarch
Container image - rhceph:3-2

Steps to Reproduce:
1. Initialize a ceph 2.4 cluster.
2. Update ceph-ansible to 3.x and follow the documentation to update the cluster to 3.0.

Actual results:
OSDs fail to come up, reporting expose_partitions_sdd.

Expected results:
OSDs must get updated successfully.

Additional info:
OSD configuration as specified in the inventory file -

<node> ceph_osd_docker_prepare_env="-e CLUSTER={{ cluster }} -e OSD_JOURNAL_SIZE={{ journal_size }} -e OSD_FORCE_ZAP=1 -e OSD_DMCRYPT=1" ceph_osd_docker_extra_env="-e CLUSTER={{ cluster }} -e CEPH_DAEMON=OSD_CEPH_DISK_ACTIVATE -e OSD_JOURNAL_SIZE={{ journal_size }} -e OSD_DMCRYPT=1" devices="['/dev/sdb','/dev/sdc','/dev/sdd']"

Contents of all.yml -

$ grep -Ev '(^#|^$)' group_vars/all.yml
---
dummy:
ceph_docker_image_tag: "3-2"
fetch_directory: ~/ceph-ansible-keys
cluster: humpty
monitor_interface: "eno1"
radosgw_interface: "eno1"
public_network: 10.8.128.0/21
docker: true
ceph_docker_image: "rhceph"
mon_containerized_deployment: true
ceph_mon_docker_interface: "eno1"
ceph_mon_docker_subnet: "{{ public_network }}"
ceph_docker_registry: "brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888" #registry.access.redhat.com
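For reference, step 2 presumably uses ceph-ansible's rolling_update playbook. A minimal sketch of the invocation, assuming the standard 3.x layout and the upstream confirmation variable (the exact path, inventory file name, and options may differ from the documented procedure):

$ cd /usr/share/ceph-ansible
$ cp infrastructure-playbooks/rolling_update.yml .
$ ansible-playbook -i hosts rolling_update.yml -e ireallymeanit=yes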
Additional information - As there were only three OSD nodes, the playbook failed after retrying 40 times on the task "waiting for clean pgs...", because all OSDs were down on one node.
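For context, that task only polls cluster health until the PGs are active+clean. A manual check from a mon node in this containerized setup would look roughly like the following (container name assumed to follow ceph-ansible's ceph-mon-<hostname> convention; cluster name taken from all.yml above):

$ docker exec ceph-mon-$(hostname -s) ceph --cluster humpty -s
$ docker exec ceph-mon-$(hostname -s) ceph --cluster humpty pg stat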
That's a nasty bug with no workaround at the moment :/ Only the patch will fix the issue.
@Drew, Can you please add the summary of the decision made in yesterday's program call regarding this bug?
(In reply to Harish NV Rao from comment #6)
> @Drew, Can you please add the summary of the decision made in yesterday's
> program call regarding this bug?

@Drew, a gentle reminder. Please note that this BZ still has the target release set to 3.0.
We will release note this for 3.0 and target it for the next async. I'm tracking this bug manually for now until we have a new target location for it.
lgtm
Hi,

The upgrade works fine using ceph-ansible-3.0.26-1.el7cp.noarch to move from rhceph-3-rhel7 to ceph-3.0-rhel-7-docker-candidate-38019-20180222163657.

While upgrading a cluster having OSDs with collocated journals, I hit the issue reported in Bug 1548357 and followed the workaround mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1548357#c4 (mention the mgr's name at the top of the mons group in the inventory file).

Moving to VERIFIED state.

Regards,
Vasishta Shatsry
AQE, Ceph
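For anyone reproducing the verification, that workaround boils down to listing the node that also carries the mgr daemon first in the [mons] group of the inventory. A sketch with hypothetical hostnames:

[mons]
# mon1 also runs the mgr daemon and is listed first, per the workaround
mon1
mon2
mon3

[mgrs]
mon1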
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0473