Bug 1577846 - After latest environment update all ceph-disk@dev-sdXX.service are in failed state
Summary: After latest environment update all ceph-disk@dev-sdXX.service are in failed state
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Ceph-Ansible
Version: 2.5
Hardware: All
OS: All
Target Milestone: z4
Target Release: 3.0
Assignee: leseb
QA Contact: Yogev Rabl
Depends On:
Blocks: 1578730 1581579 1583767
Reported: 2018-05-14 09:32 UTC by Alex Stupnikov
Modified: 2019-01-15 15:18 UTC

Fixed In Version: RHEL: ceph-ansible-3.0.35-1.el7cp Ubuntu: ceph-ansible_3.0.35-2redhat1
Doc Type: Bug Fix
Doc Text:
.Update to the `ceph-disk` Unit Files
Previously, the transition to containerized Ceph left some "ceph-disk" unit files. The files were harmless, but appeared as failing, which could be distressing to the operator. With this update, executing the "switch-from-non-containerized-to-containerized-ceph-daemons.yml" playbook disables the "ceph-disk" unit files too.
Clone Of:
: 1581579 (view as bug list)
Last Closed: 2018-07-11 18:11:10 UTC

Attachments
journalctl logs and ceph-osd status (4.38 KB, text/plain)
2018-05-14 09:32 UTC, Alex Stupnikov

System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2018:2177 None None None 2018-07-11 18:12:03 UTC
Github ceph ceph-ansible pull 2595 None None None 2018-05-16 15:39:58 UTC

Description Alex Stupnikov 2018-05-14 09:32:38 UTC
Created attachment 1436089 [details]
journalctl logs and ceph-osd status

Description of problem:

The customer asked us to investigate an issue with their RHOSP 12 + Ceph environment: all ceph-disk@dev-sdXX.service units are in a failed state.

I tried to troubleshoot the issue for one specific Ceph disk, sdv (the picture is the same for the other disks). To keep this description short, an extract from the journalctl logs and the ``systemctl status --all`` output is attached.
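
For reference, a minimal sketch of the kind of checks involved; the unit names below follow the sdv example above and are illustrative, not copied from the attachment:

    # List the failed ceph-disk unit instances on the node.
    systemctl list-units --failed 'ceph-disk@*'

    # Inspect one instance and its journal, e.g. the sdv disk mentioned above.
    systemctl status ceph-disk@dev-sdv.service
    journalctl -u ceph-disk@dev-sdv.service --no-pager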

It looks like the following ceph-ansible v3.0.27 task masked all ceph-osd@N.service units and left the ceph-disk systemd units broken:

    - name: stop non-containerized ceph osd(s)
      systemd:
        name: "{{ item }}"
        state: stopped
        enabled: no
        masked: yes
      with_items: "{{ running_osds.stdout_lines | default([]) }}"
      when: running_osds != []

I may be wrong about the clue above, but the customer is still struggling, so please find additional information about the customer's environment in comment #1.

The customer says the Ceph environment itself is running fine, but they are worried about the failed systemd units. Please feel free to adjust the severity if this problem is purely cosmetic.

Comment 4 leseb 2018-05-16 12:28:29 UTC
I don't think this is a real issue; we simply don't change the ceph-disk unit files. We should probably disable them too. However, this does not affect the ceph-osd@XXX.service units, and the cluster should be fine.
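
For illustration only (this is not the actual ceph-ansible change, which is tracked in the pull request linked above), clearing a leftover unit by hand would look roughly like the following, assuming the OSDs already run in containers and using the sdv instance as an example name:

    # Hypothetical manual cleanup of a leftover ceph-disk instance.
    systemctl disable ceph-disk@dev-sdv.service       # keep it from being started again
    systemctl reset-failed ceph-disk@dev-sdv.service  # clear the "failed" state in systemd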

Comment 8 Guillaume Abrioux 2018-05-22 11:40:04 UTC
fix will be in v3.1.0rc4 and v3.0.35

Comment 9 Alex Stupnikov 2018-05-22 11:45:04 UTC
Guillaume, will Red Hat ship those ceph-ansible versions with RHCS 2.5?

BR, Alex

Comment 14 Guillaume Abrioux 2018-05-29 13:23:30 UTC
fixed in v3.1.0rc4

Comment 26 leseb 2018-06-08 02:12:48 UTC
Edu, as per comment https://bugzilla.redhat.com/show_bug.cgi?id=1577846#c10, the fix is in 2.5z1. I'm not sure how I can assist further.
Please let me know.

Comment 28 leseb 2018-06-08 08:01:09 UTC
I'm not sure how I can help here; the only thing I can tell you is that the patch is present in v3.0.35 and above.

For getting the fix out faster, or for release dates, please ask Ken.

Comment 31 leseb 2018-06-12 06:28:57 UTC

Comment 32 Yogev Rabl 2018-06-26 12:06:28 UTC
Verified on rc9

Comment 34 errata-xmlrpc 2018-07-11 18:11:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

