Bug 1103792
Summary: | Unmounting an RHEL7 XFS filesystem on an offlined drive hangs with the message "metadata I/O error". | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Vimal Kumar <vikumar> |
Component: | kernel | Assignee: | Eric Sandeen <esandeen> |
kernel sub component: | XFS | QA Contact: | Zorro Lang <zlang> |
Status: | CLOSED DUPLICATE | Docs Contact: | |
Severity: | medium | ||
Priority: | high | CC: | dwysocha, eguan, esandeen, jkachuck, rsussman, swhiteho |
Version: | 7.1 | Keywords: | TestCaseProvided |
Target Milestone: | rc | ||
Target Release: | 7.3 | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2016-07-21 23:29:19 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1113511, 1203710, 1295577, 1313485 |
Description
Vimal Kumar
2014-06-02 14:45:33 UTC
Moving to kernel; this isn't an xfsprogs (userspace) problem.

The retry is intentional, as far as I know; this could be a fibrechannel failover, which might take considerable time. How can XFS know how long is "too long," and give up, leaving a corrupted filesystem? So that makes the "fail" option less clear cut.

As for your second expected result, there is no way for an unmount to finish "properly" if there is no disk to write to.

-Eric

We'll reconsider the approach during the RHEL7.1 timeframe, but XFS by design intentionally keeps retrying. It would be nice to have some graceful way to recover, though.

For what it's worth, this should work to recover from the situation:

    # xfs_io -x -c shutdown /mount/point

to shut down the filesystem. Then it can be unmounted, and the messages will stop.

There are a few recent upstream changes that might help this behavior, thanks bfoster for reminding me of these on the list upstream:
> roughly commits 5e4b538 through d4a97a0 or so
d4a97a0 xfs: add missing bmap cancel calls in error paths
146e54b xfs: add helper to conditionally remove items from the AIL
f307080 xfs: fix btree cursor error cleanups
0ae120f xfs: clean up root inode properly on mount failure
a3f2001 xfs: checksum log record ext headers based on record size
fc0d165 xfs: fix broken icreate log item cancellation
78d57e4 xfs: icreate log item recovery and cancellation tracepoints
f0b2efa xfs: don't leave EFIs on AIL on mount failure
e32a1d1 xfs: use EFI refcount consistently in log recovery
6bc43af xfs: ensure EFD trans aborts on log recovery extent free failure
8d99fe9 xfs: fix efi/efd error handling to avoid fs shutdown hangs
d43ac29 xfs: return committed status from xfs_trans_roll()
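
The shutdown-then-unmount workaround described earlier in this thread could be wrapped in a small helper. This is only a sketch under stated assumptions: the function name `xfs_shutdown_and_umount` and the `RUN` dry-run hook are hypothetical and not part of xfsprogs; the only real commands used are `xfs_io -x -c shutdown` (taken from the comment above) and `umount`.

```shell
#!/bin/sh
# Sketch only: shut down a hung XFS filesystem, then unmount it.
# RUN is a hypothetical hook: set RUN=echo to print the commands
# instead of executing them.
RUN=${RUN:-}

xfs_shutdown_and_umount() {
    mnt=$1
    if [ -z "$mnt" ]; then
        echo "usage: xfs_shutdown_and_umount <mountpoint>" >&2
        return 2
    fi
    # Force the filesystem into the shut-down state (command from the comment above).
    $RUN xfs_io -x -c shutdown "$mnt" || return 1
    # Once the filesystem is shut down, attempt the unmount.
    $RUN umount "$mnt"
}
```

Running it as root with `xfs_shutdown_and_umount /mount/point` issues both commands; with `RUN=echo` set beforehand it only prints what it would run, which may be safer to try first.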
This is resolved by the configurable error handling patch, which also sets xfs to "fail at unmount" by default. This behavior is present in kernel-3.10.0-428.el7 and newer.

*** This bug has been marked as a duplicate of bug 1267042 ***

To be more clear, the new default behavior is to terminate any outstanding, failing IOs when an unmount command is issued.
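
To check whether a given system has the "fail at unmount" behavior enabled, something like the following might help. It is a sketch that assumes the knob is exposed at `/sys/fs/xfs/<dev>/error/fail_at_unmount`, as the upstream kernel XFS documentation describes; whether a particular RHEL 7 kernel uses exactly this path is an assumption, not something stated in this bug. The function name and the `XFS_SYSFS` override are hypothetical.

```shell
#!/bin/sh
# Assumption: the error-handling knobs live under /sys/fs/xfs/<dev>/error/,
# as in the upstream kernel XFS documentation. XFS_SYSFS is a hypothetical
# override so the lookup can be pointed somewhere else.
XFS_SYSFS=${XFS_SYSFS:-/sys/fs/xfs}

# Print the fail_at_unmount setting for a block device name such as "sdb1".
xfs_fail_at_unmount() {
    knob="$XFS_SYSFS/$1/error/fail_at_unmount"
    if [ ! -r "$knob" ]; then
        echo "no fail_at_unmount knob for '$1' (older kernel, or not a mounted XFS device?)" >&2
        return 1
    fi
    cat "$knob"
}
```

For example, `xfs_fail_at_unmount sdb1` would be expected to print `1` when failing IOs are terminated at unmount, matching the default described above.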