Bug 1332083
| Field | Value |
|---|---|
| Summary | log [ERR] : OSD full dropping all updates 99% full followed by FAILED assert(0 == "unexpected error") : ENOSPC handling not implemented |
| Product | [Red Hat Storage] Red Hat Ceph Storage |
| Component | RADOS |
| Version | 1.2.3 |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Hardware | x86_64 |
| OS | Linux |
| Target Milestone | rc |
| Target Release | 3.0 |
| Reporter | Vikhyat Umrao <vumrao> |
| Assignee | David Zafman <dzafman> |
| QA Contact | David Zafman <dzafman> |
| Docs Contact | Bara Ancincova <bancinco> |
| CC | bengland, ceph-eng-bugs, dzafman, hnallurv, jbuchta, jdurgin, kchai, kdreyer, tpetr, vumrao |
| Fixed In Version | RHEL: ceph-12.1.4-1.el7cp; Ubuntu: ceph_12.1.4-2redhat1xenial |
| Doc Type | Bug Fix |
| Type | Bug |
| Clones | 1420417 |
| Bug Blocks | 1420417, 1494421 |
| Last Closed | 2017-12-05 23:29:38 UTC |

Doc Text:

.Improvements in handling of full OSDs

When an OSD disk became so full that the OSD could not function, the OSD terminated unexpectedly with a confusing assert message. With this update:

* The error message has been improved.
* By default, no more than 25% of OSDs are automatically marked as `out` (illustrated in the sketch below).
* The `statfs` calculations in the FileStore and BlueStore back ends have been improved to better reflect actual disk usage.

As a result, OSDs are less likely to become full, and if they do become full, a more informative error message is added to the log.
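The Doc Text describes changed defaults rather than new commands. A minimal sketch of how to inspect the relevant values, assuming a Luminous-based (ceph 12.x) cluster matching the Fixed In Version above; `mon_osd_min_in_ratio` is the upstream option behind the 25% default, and `mon.$(hostname -s)` is a placeholder for a local monitor id:

```sh
# Run on a monitor host. The ratio of OSDs that must stay "in";
# a value of 0.75 means at most 25% of OSDs are ever marked out
# automatically:
ceph daemon mon.$(hostname -s) config get mon_osd_min_in_ratio

# The full/backfillfull/nearfull ratios recorded in the OSD map:
ceph osd dump | grep ratio
```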
Description
Vikhyat Umrao
2016-05-02 07:18:16 UTC
The piece to verify here is that recovery does not overfill the OSDs. To reproduce, set the full threshold (the `mon osd full ratio` option) on the monitor to a low value, for example 0.1 (10%), fill the cluster up close to that point, and then mark one of the OSDs out. The other OSDs will start recovering. Once an OSD reaches 10% full, it is marked as full in the OSD map, and subsequent recovery operations should stall in the `recovery_toofull` state. Increasing the full ratio back to the default of 0.95 should then let recovery complete. (A command-level sketch of these steps follows at the end of this report.)

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3387
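A minimal sketch of the reproduction steps above, assuming a Luminous (ceph 12.x) cluster where `ceph osd set-full-ratio` is available; the OSD id `0` and the pool name `rbd` are placeholders:

```sh
# Lower the cluster-wide full threshold to 10%:
ceph osd set-full-ratio 0.1

# Fill the cluster close to that point; --no-cleanup keeps the benchmark
# objects so the written space stays allocated:
rados bench -p rbd 60 write --no-cleanup

# Mark one OSD out to trigger recovery onto the remaining OSDs:
ceph osd out 0

# Watch for PGs stalling in the recovery_toofull state:
ceph pg stat
ceph health detail

# Restore the default full ratio; recovery should now complete:
ceph osd set-full-ratio 0.95
```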