Bug 1490716

Summary: [Regression] OSDs not starting after a reboot in RHCS 2.2
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Edu Alcaniz <ealcaniz>
Component: Ceph-Disk
Assignee: Kefu Chai <kchai>
Status: CLOSED DUPLICATE
QA Contact: ceph-qe-bugs <ceph-qe-bugs>
Severity: medium
Priority: unspecified
Version: 2.3
CC: ealcaniz, icolle, saime, vumrao
Target Milestone: rc
Target Release: 2.5
Hardware: x86_64
OS: Linux
Last Closed: 2017-10-13 14:49:26 UTC
Type: Bug

Description Edu Alcaniz 2017-09-12 06:22:51 UTC
Description of problem:
This appears to be a possible regression of BZ 1391197
(https://bugzilla.redhat.com/show_bug.cgi?id=1391197).


Version-Release number of selected component (if applicable):

$ grep ceph installed-rpms 
ceph-base-10.2.7-28.el7cp.x86_64                            Wed Sep  6 18:25:45 2017
ceph-common-10.2.7-28.el7cp.x86_64                          Wed Sep  6 17:49:15 2017
ceph-mon-10.2.7-28.el7cp.x86_64                             Wed Sep  6 18:26:29 2017
ceph-osd-10.2.7-28.el7cp.x86_64                             Wed Sep  6 18:26:31 2017
ceph-selinux-10.2.7-28.el7cp.x86_64                         Wed Sep  6 17:49:15 2017
libcephfs1-10.2.7-28.el7cp.x86_64                           Wed Sep  6 17:48:30 2017
puppet-ceph-2.3.0-5.el7ost.noarch                           Wed Sep  6 17:49:11 2017
python-cephfs-10.2.7-28.el7cp.x86_64                        Wed Sep  6 17:49:04 2017

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:
$ less systemctl_list-units_--failed

  UNIT                              LOAD   ACTIVE SUB    DESCRIPTION
* ceph-disk        loaded failed failed Ceph disk activation: /dev/sdb1
* ceph-disk        loaded failed failed Ceph disk activation: /dev/sdc1
* ceph-disk        loaded failed failed Ceph disk activation: /dev/sdc2
* ceph-disk        loaded failed failed Ceph disk activation: /dev/sdc3
* ceph-disk        loaded failed failed Ceph disk activation: /dev/sdd1
* ceph-disk        loaded failed failed Ceph disk activation: /dev/sdd2
* ceph-disk        loaded failed failed Ceph disk activation: /dev/sdd3
* ceph-disk        loaded failed failed Ceph disk activation: /dev/sde1
* ceph-disk        loaded failed failed Ceph disk activation: /dev/sdf1
* ceph-disk        loaded failed failed Ceph disk activation: /dev/sdg1
* ceph-disk        loaded failed failed Ceph disk activation: /dev/sdh1
* ceph-disk        loaded failed failed Ceph disk activation: /dev/sdi1
* ceph-disk        loaded failed failed Ceph disk activation: /dev/sdj1
* ceph-disk        loaded failed failed Ceph disk activation: /dev/sdk1
* ceph-disk        loaded failed failed Ceph disk activation: /dev/sdl1
* ceph-disk        loaded failed failed Ceph disk activation: /dev/sdm1
* dhcp-interface   loaded failed failed DHCP interface br/bond1
* dhcp-interface   loaded failed failed DHCP interface br/bond2
* dhcp-interface loaded failed failed DHCP interface ovs/system
LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.

19 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.

$ less systemctl_list-units_--all

* ceph-disk    loaded    failed    failed    Ceph disk activation: /dev/sdb1
* ceph-disk    loaded    failed    failed    Ceph disk activation: /dev/sdc1
* ceph-disk    loaded    failed    failed    Ceph disk activation: /dev/sdc2
* ceph-disk    loaded    failed    failed    Ceph disk activation: /dev/sdc3
* ceph-disk    loaded    failed    failed    Ceph disk activation: /dev/sdd1
* ceph-disk    loaded    failed    failed    Ceph disk activation: /dev/sdd2
* ceph-disk    loaded    failed    failed    Ceph disk activation: /dev/sdd3
* ceph-disk    loaded    failed    failed    Ceph disk activation: /dev/sde1
* ceph-disk    loaded    failed    failed    Ceph disk activation: /dev/sdf1
* ceph-disk    loaded    failed    failed    Ceph disk activation: /dev/sdg1
* ceph-disk    loaded    failed    failed    Ceph disk activation: /dev/sdh1
* ceph-disk    loaded    failed    failed    Ceph disk activation: /dev/sdi1
* ceph-disk    loaded    failed    failed    Ceph disk activation: /dev/sdj1
* ceph-disk    loaded    failed    failed    Ceph disk activation: /dev/sdk1
* ceph-disk    loaded    failed    failed    Ceph disk activation: /dev/sdl1
* ceph-disk    loaded    failed    failed    Ceph disk activation: /dev/sdm1
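
The failed units above are the per-partition ceph-disk activation units. A minimal sketch for confirming why one of them failed and retrying the activation by hand follows; the instance name ceph-disk@dev-sdb1.service is an assumption (the listing above truncates the unit names), so copy the exact name from "systemctl list-units --failed" on the affected node:

# Show the failure reason systemd recorded for one of the activation units
# (instance name assumed; take the exact name from the failed-units listing).
systemctl status ceph-disk@dev-sdb1.service
journalctl -b -u ceph-disk@dev-sdb1.service

# Re-run, as root, the same activation the unit performs at boot for one partition ...
ceph-disk --verbose trigger --sync /dev/sdb1

# ... or re-activate every prepared OSD partition on the node in one pass.
ceph-disk activate-all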

Expected results:


Additional info:
Increasing the flock timeout to 900 seconds, as described in https://access.redhat.com/solutions/3164991, has fixed the issue.
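
For reference, one way to apply that workaround persistently is a systemd drop-in for the ceph-disk@ template unit. The snippet below is only a sketch: it assumes the stock jewel-era unit wraps "ceph-disk trigger --sync" in "timeout ... flock ..." with a /var/lock/ceph-disk lock file, so verify the ExecStart line in the installed /usr/lib/systemd/system/ceph-disk@.service before copying it and change only the timeout value:

# /etc/systemd/system/ceph-disk@.service.d/timeout.conf  (illustrative drop-in path)
[Service]
# Clear the inherited ExecStart, then repeat it with the activation timeout
# raised to 900 seconds; everything after "timeout 900" is assumed to mirror
# the shipped unit and must match what is actually installed.
ExecStart=
ExecStart=/bin/sh -c 'timeout 900 flock /var/lock/ceph-disk /usr/sbin/ceph-disk --verbose --log-stdout trigger --sync %f'

After adding the drop-in, run "systemctl daemon-reload" so the override is picked up before the next reboot.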

Comment 16 Vikhyat Umrao 2017-10-13 14:49:26 UTC

*** This bug has been marked as a duplicate of bug 1458007 ***