Description of problem:
We have had several cases opened with support where a customer's site lost power and the Ceph cluster was unable to recover because of LevelDB corruption on the monitors. The cause was either monitor nodes whose failure domains were not properly configured, or a write-back cache in use without battery backup. We could update section "2.3.7. Additional Considerations" in the Hardware Guide to recommend separate power feeds to customers' racks to prevent such issues, or to note battery backup as an alternative.

Version-Release number of selected component (if applicable):
1.3.x
KCS https://access.redhat.com/solutions/2518281 is a WIP for recovering from this issue.
This bug requires no QE verification. We want the new hardware guidelines to be reviewed by Kyle Bader if he is not the originator already.
The problem isn't that customers failed to put monitors in different failure domains. The problem is that customers are putting the monitor stores on volatile media. Monitor stores should be on SSDs that ensure data integrity during power loss (hint: use Intel DC series). If the stores are on spinning media, then the disk write cache should be disabled, and any RAID controller should be either battery-backed or protected by a supercap.
Use SSDs for monitor stores

The monitor store can generate a significant amount of IO, making SSDs an ideal choice of storage media. To ensure data integrity during power loss, all caches in the data path need to either be disabled or safeguarded by hardware mechanisms like battery backup units or super capacitors coupled with non-volatile stores.
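
As a rough aid for auditing the caches mentioned above, the following Python sketch lists block devices for which the Linux kernel reports a write-back cache. It assumes a Linux host that exposes the `queue/write_cache` and `queue/rotational` attributes under `/sys/block`; the function name `find_write_back_devices` is hypothetical and not part of Ceph. The check only reflects what the kernel reports: it cannot tell whether a drive has power-loss protection, and it does not see caches on a RAID controller, so those still need to be verified separately.

```python
from pathlib import Path


def find_write_back_devices(sysfs_block="/sys/block"):
    """Return sorted (device, is_rotational) pairs for devices reporting a write-back cache.

    Assumption: the kernel exposes queue/write_cache ("write back" or
    "write through") and queue/rotational ("1" or "0") for each device.
    """
    root = Path(sysfs_block)
    if not root.is_dir():
        # Not a Linux host, or sysfs is not mounted at this path.
        return []

    results = []
    for dev in sorted(root.iterdir()):
        cache_file = dev / "queue" / "write_cache"
        if not cache_file.is_file():
            continue
        if cache_file.read_text().strip() != "write back":
            continue
        rot_file = dev / "queue" / "rotational"
        rotational = rot_file.is_file() and rot_file.read_text().strip() == "1"
        results.append((dev.name, rotational))
    return results


if __name__ == "__main__":
    for name, rotational in find_write_back_devices():
        kind = "spinning disk" if rotational else "non-rotational device"
        print(f"{name}: write-back cache reported ({kind})")
```

Any device listed here that backs a monitor store is a candidate for closer inspection: either the cache is protected by hardware, or it should be disabled.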