Bug 1472409
| Summary: | ceph: not all OSDs are up when ceph node starts | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Alexander Chuzhoy <sasha> |
| Component: | Ceph-Disk | Assignee: | Loic Dachary <ldachary> |
| Status: | CLOSED ERRATA | QA Contact: | ceph-qe-bugs <ceph-qe-bugs> |
| Severity: | urgent | Docs Contact: | Bara Ancincova <bancinco> |
| Priority: | unspecified | | |
| Version: | 1.3.3 | CC: | arkady_kanevsky, audra_cooper, bniver, dcain, federico, gael_rehault, gfidente, goneri, icolle, jdurgin, johfulto, John_walsh, kdreyer, kurt_hey, ldachary, lhh, mburns, morazi, nlevine, Paul_Dardeau, rajini.karthik, randy_perryman, sasha, seb, smerrow, srevivo, tserlin, vakulkar |
| Target Milestone: | rc | Keywords: | ZStream |
| Target Release: | 2.4 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | RHEL: ceph-10.2.7-37.el7cp; Ubuntu: ceph_10.2.7-36redhat1 | Doc Type: | Bug Fix |
| Doc Text: | **OSDs now wait up to three hours for other OSDs to complete their initialization sequence.** At boot time, an OSD daemon could fail to start if it waited more than five minutes for another OSD to complete its initialization sequence. As a consequence, such OSDs had to be started manually. With this update, OSDs wait for up to three hours. As a result, OSDs no longer fail to start when the initialization sequence of other OSDs takes too long. | Story Points: | --- |
| Clone Of: | | | |
| | 1532775 (view as bug list) | Environment: | |
| Last Closed: | 2017-10-17 18:12:51 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1335596, 1356451, 1473436, 1479701 | | |
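A side note on the numbers in the doc text (hedged, not part of the original report): the five-minute window matches the 300-second default of the CEPH_DISK_TIMEOUT variable discussed in comment 16, and three hours is 10800 seconds; the exact value shipped in the fixed packages is not stated in this bug.

```sh
# Hedged arithmetic only; the actual shipped timeout value is not given in this bug.
echo $((5 * 60))       # 300   seconds -> the old five-minute wait
echo $((3 * 60 * 60))  # 10800 seconds -> the new three-hour wait
```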
Description
Alexander Chuzhoy
2017-07-18 16:19:49 UTC
The issue reproduced on all 3 Ceph nodes; exactly 3 OSDs were down after rebooting each node:
3/24 in osds are down
[heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph -s
cluster 1289fdf6-6b11-11e7-b06e-5254002376d6
health HEALTH_WARN
808 pgs degraded
808 pgs stuck degraded
808 pgs stuck unclean
808 pgs stuck undersized
808 pgs undersized
recovery 11/57 objects degraded (19.298%)
3/24 in osds are down
noout,norebalance flag(s) set
monmap e2: 3 mons at {overcloud-controller-0=192.168.170.128:6789/0,overcloud-controller-1=192.168.170.124:6789/0,overcloud-controller-2=192.168.170.122:6789/0}
election epoch 32, quorum 0,1,2 overcloud-controller-2,overcloud-controller-1,overcloud-controller-0
osdmap e201: 24 osds: 21 up, 24 in; 808 remapped pgs
flags noout,norebalance,require_jewel_osds
pgmap v20349: 2240 pgs, 6 pools, 45659 kB data, 19 objects
1273 MB used, 22331 GB / 22333 GB avail
11/57 objects degraded (19.298%)
1432 active+clean
808 active+undersized+degraded
[root@overcloud-cephstorage-1 ~]# ceph -s
cluster 1289fdf6-6b11-11e7-b06e-5254002376d6
health HEALTH_WARN
823 pgs degraded
823 pgs stuck degraded
823 pgs stuck unclean
823 pgs stuck undersized
823 pgs undersized
recovery 6/57 objects degraded (10.526%)
3/24 in osds are down
noout,norebalance flag(s) set
monmap e2: 3 mons at {overcloud-controller-0=192.168.170.128:6789/0,overcloud-controller-1=192.168.170.124:6789/0,overcloud-controller-2=192.168.170.122:6789/0}
election epoch 32, quorum 0,1,2 overcloud-controller-2,overcloud-controller-1,overcloud-controller-0
osdmap e227: 24 osds: 21 up, 24 in; 823 remapped pgs
flags noout,norebalance,require_jewel_osds
pgmap v20481: 2240 pgs, 6 pools, 45659 kB data, 19 objects
1341 MB used, 22331 GB / 22333 GB avail
6/57 objects degraded (10.526%)
1417 active+clean
823 active+undersized+degraded
[heat-admin@overcloud-cephstorage-2 ~]$ sudo ceph status
cluster 1289fdf6-6b11-11e7-b06e-5254002376d6
health HEALTH_WARN
844 pgs degraded
844 pgs stuck degraded
844 pgs stuck unclean
844 pgs stuck undersized
844 pgs undersized
recovery 10/57 objects degraded (17.544%)
3/24 in osds are down
noout,norebalance flag(s) set
monmap e2: 3 mons at {overcloud-controller-0=192.168.170.128:6789/0,overcloud-controller-1=192.168.170.124:6789/0,overcloud-controller-2=192.168.170.122:6789/0}
election epoch 32, quorum 0,1,2 overcloud-controller-2,overcloud-controller-1,overcloud-controller-0
osdmap e253: 24 osds: 21 up, 24 in; 844 remapped pgs
flags noout,norebalance,require_jewel_osds
pgmap v20615: 2240 pgs, 6 pools, 45659 kB data, 19 objects
1361 MB used, 22331 GB / 22333 GB avail
10/57 objects degraded (17.544%)
1396 active+clean
844 active+undersized+degraded
It could be that after a while the OSDs come up.

Sasha, are you proposing that we wait, check the status, run systemctl start on the ceph-disk units that are not up yet, check the results, and then finish?

Hi Arkady,
I was hoping that the OSDs would come up if we waited longer (something I thought I saw on one machine), but when I tried to prove that, I verified that they don't (I waited for more than 1 hour):
[root@overcloud-cephstorage-0 ~]# uptime
22:13:49 up 1:06, 1 user, load average: 0.03, 0.03, 0.05
[root@overcloud-cephstorage-0 ~]# ceph -s
cluster 9d071b3c-6d0d-11e7-91c2-525400141c5e
health HEALTH_WARN
612 pgs degraded
612 pgs stuck degraded
612 pgs stuck unclean
612 pgs stuck undersized
612 pgs undersized
recovery 6/57 objects degraded (10.526%)
2/24 in osds are down
noout,norebalance flag(s) set
monmap e2: 3 mons at {overcloud-controller-0=192.168.170.128:6789/0,overcloud-controller-1=192.168.170.123:6789/0,overcloud-controller-2=192.168.170.126:6789/0}
election epoch 34, quorum 0,1,2 overcloud-controller-1,overcloud-controller-2,overcloud-controller-0
osdmap e208: 24 osds: 22 up, 24 in; 612 remapped pgs
flags noout,norebalance,require_jewel_osds
pgmap v13321: 2368 pgs, 6 pools, 45659 kB data, 19 objects
1313 MB used, 22331 GB / 22333 GB avail
6/57 objects degraded (10.526%)
1756 active+clean
612 active+undersized+degraded
So then I ran:
[root@overcloud-cephstorage-0 ~]# for i in `systemctl|awk '/ceph-disk/ {print $2}'`; do echo $i; systemctl start $i; done
ceph-disk
ceph-disk
ceph-disk
ceph-disk
ceph-disk
ceph-disk
ceph-disk
ceph-disk
ceph-disk
ceph-disk
ceph-disk
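For readability, the same workaround can be written as the short script below. This is a sketch based on the loop above, not a transcript from the bug; it assumes the instantiated units follow the ceph-disk@<device>.service naming of the ceph-disk@.service template referenced later in this bug.

```sh
# Start every instantiated ceph-disk trigger unit, then re-check the OSD state.
# Assumption: unit names match the ceph-disk@.service template
# (e.g. ceph-disk@dev-sdb1.service); the exact instance names are not shown above.
for unit in $(systemctl list-units --all --no-legend | grep -o 'ceph-disk@[^[:space:]]*'); do
    echo "starting $unit"
    sudo systemctl start "$unit"
done

# All OSDs should then report as up, as in the status output below.
sudo ceph osd stat
```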
Checking the status again, all OSDs are up:
[root@overcloud-cephstorage-0 ~]# ceph -s
cluster 9d071b3c-6d0d-11e7-91c2-525400141c5e
health HEALTH_WARN
65 pgs peering
65 pgs stuck unclean
noout,norebalance flag(s) set
monmap e2: 3 mons at {overcloud-controller-0=192.168.170.128:6789/0,overcloud-controller-1=192.168.170.123:6789/0,overcloud-controller-2=192.168.170.126:6789/0}
election epoch 34, quorum 0,1,2 overcloud-controller-1,overcloud-controller-2,overcloud-controller-0
osdmap e214: 24 osds: 24 up, 24 in
flags noout,norebalance,require_jewel_osds
pgmap v13335: 2368 pgs, 6 pools, 45659 kB data, 19 objects
1320 MB used, 22331 GB / 22333 GB avail
2303 active+clean
65 peering
So comment #3 can be disregarded.
Dup of: https://bugzilla.redhat.com/show_bug.cgi?id=1457231

Not a puppet-ceph bug. Unfortunately, as Alfredo mentioned, this is well known. This is taken care of in ceph-disk, so I suspect we can close this and leave it in Ceph itself.

Ian may have already tracked this down, but it appears to have been fixed in (not before) 2.3 per https://github.com/ceph/ceph/pull/12147/files. @Loic, is there any plan to backport this into 1.3.X?

I don't know that there are plans to do that.

The problem has already been reproduced twice this week with a regular deployment of OSP11 (RH7-RHOS-11.0 2017-08-22.2). FYI, ceph --version on a ceph node shows "ceph version 10.2.7-28.el7cp (216cda64fd9a9b43c4b0c2f8c402d36753ee35f7)". Federico, can you escalate it? Thanks

Hi Wayne,
Engineering believes the fix is likely http://tracker.ceph.com/issues/18007. That is merged upstream but not yet available downstream. A manual fix [1], until it is available downstream, is to set the following variable in systemd/ceph-disk@.service (the default is 300):
Environment=CEPH_DISK_TIMEOUT=10000
[1] https://github.com/ceph/ceph/pull/17133/files
Can you give it a try?
Sean

Hi Loic,
Can you please elaborate on the workaround? I shared the one from comment 16 and they came back with the following: "I want to try out your suggestion, but the instructions and the links you provide are not specific enough. I don't know where the file(s) I should change reside. Can you point to more specifics?"
Thanks,
Sean

Loic, Sean, I was able to test this simple workaround (having found the target files) and it appears to work fine in a single-node reboot scenario. I am testing a reboot-all-ceph-nodes (ipmi-soft) scenario now. Will let you know. Seems hopeful.

Re: #16 - Rebooting all ceph nodes at once with this workaround installed also resulted in all OSDs successfully returning to "up" status.

Just as an FYI, this also occurs on OSP10 using unlocked bits.

Loic, could we get a backport of the fix?

@tserlin this is done at 5e20864e136ea532431b05de24f0e78f59b63c41

Do we have a patch for RHEL?

Loic, I was going to see if there is any tuning guidance we should add to the documents with regard to disk counts or sizes and how they might interact with an appropriate timeout value.

@Mike I think the timeout does not need tuning; it is large enough.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2017:2903

Hi, we have set the timeout as noted in comment 16 and rebooted the Ceph nodes, but there is still one OSD down on each storage node. I'm running an OSP9 upgraded to OSP10, and the ceph version is ceph version 10.2.7-48.el7cp (cf7751bcd460c757e596d3ee2991884e13c37b96).
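For anyone applying the workaround from comment 16, below is a minimal sketch of setting the variable through a systemd drop-in rather than editing the packaged unit file. The drop-in path and file name are assumptions; only the Environment line itself comes from this bug.

```sh
# Create a drop-in override for the ceph-disk template unit (path and file name assumed).
sudo mkdir -p /etc/systemd/system/ceph-disk@.service.d
cat << 'EOF' | sudo tee /etc/systemd/system/ceph-disk@.service.d/timeout.conf
[Service]
Environment=CEPH_DISK_TIMEOUT=10000
EOF

# Reload systemd so the override is picked up on the next boot.
sudo systemctl daemon-reload
```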