Bug 592863
| Summary: | GFS: data lost after hard restart of cluster node | | |
|---|---|---|---|
| Product: | [Retired] Red Hat Cluster Suite | Reporter: | Krzysztof Kopec <uniks> |
| Component: | gfs | Assignee: | Robert Peterson <rpeterso> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Cluster QE <mspqa-list> |
| Severity: | medium | Docs Contact: | |
| Priority: | low | | |
| Version: | 4 | CC: | edamato, swhiteho |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | i386 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2010-07-12 13:34:29 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Krzysztof Kopec 2010-05-17 08:55:07 UTC
Using GFS doesn't prevent data loss in and of itself. In theory the journals should prevent metadata loss to a large degree, but even that isn't perfect. I'll explain why, and how to minimize data loss.

When processes write to the GFS file system, the metadata is kept in the journals, and that metadata is synced to disk at the end of every transaction (write operations, renames, creates, etc.). Files and directories that have the "jdata" attribute will also have their data kept in the GFS journal as well as their metadata, and that provides more assurance of data integrity when nodes fail (the journal is simply replayed and the transactions are re-written in place). However, if the files or directories are not marked "jdata", only the metadata (file disk inodes, directories, block numbers, bit allocation information, etc.) is journaled. So then it all comes down to whether the data landed on disk before the system went down, which brings me to my next topic: hardware issues.

Many modern storage devices have a large memory cache that remembers written blocks. Those devices tell the file system, such as GFS, that blocks are "written" as soon as the data has landed in the cache, but not actually on the storage media, e.g. disk. If the storage loses power, it will lose its write cache before it has a chance to write the actual data blocks to disk. Some storage devices are better than others at maintaining data integrity, even when power is lost. An OEM hard drive exported through iSCSI or gnbd will lose power suddenly when the host system loses power. On the other hand, most SAN storage will have separate power, so pulling the plug on the nodes won't affect their data integrity; but if the whole data center or building loses power, the SAN goes down at the same time, and then it's down to tricks they play with battery backups and such.

So there are several things you can do to ensure data integrity:

1. Keep your GFS software up to date by running newer kernels.
2. Use good quality (expensive) storage devices that have separate power and/or internal data integrity mechanisms.
3. Use storage devices that are powered separately from the nodes.
4. Use a UPS and/or battery backup system to ensure the storage doesn't lose power, even if the nodes do.
5. Set the "journaled data" or "jdata" bits on the most critical data. Obviously, that comes with a performance penalty.
6. Use RAID and similar concepts for storage redundancy.
7. Always make backups.

I hope this helps. I'll leave this bug record open for a few days in case you have questions or specific concerns about GFS doing something wrong. I'm setting the NEEDINFO flag in the meantime.

We don't have enough information to solve this problem. Feel free to reopen the bug record if more information is provided. Hopefully the information in comment #1 was sufficient to solve the issue.
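As an illustration of the point above about whether data actually landed on disk before a node went down, here is a minimal C sketch (the GFS mount path and record contents are placeholders, not taken from this bug) of an application writing a record and then calling fsync() so the kernel pushes the data and the file's metadata toward the device before the write is treated as durable. As the comment notes, this still depends on the storage device honoring cache flushes; fsync() only closes the window on the host side.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* Hypothetical file on a GFS mount; adjust to taste. */
    const char *path = "/mnt/gfs/critical.dat";
    const char buf[] = "important record\n";

    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) {
        perror("open");
        return EXIT_FAILURE;
    }

    if (write(fd, buf, strlen(buf)) != (ssize_t)strlen(buf)) {
        perror("write");
        close(fd);
        return EXIT_FAILURE;
    }

    /* Ask the kernel to flush both the data and the inode metadata to the
     * storage device before we consider the record durable. */
    if (fsync(fd) != 0) {
        perror("fsync");
        close(fd);
        return EXIT_FAILURE;
    }

    close(fd);
    return EXIT_SUCCESS;
}
```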