Bug 1564186
Summary: | XFS may block endlessly in xlog_wait() on IO error | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Zdenek Kabelac <zkabelac> | |
Component: | kernel | Assignee: | Carlos Maiolino <cmaiolin> | |
kernel sub component: | XFS | QA Contact: | Zorro Lang <zlang> | |
Status: | CLOSED ERRATA | Docs Contact: | ||
Severity: | unspecified | |||
Priority: | urgent | CC: | brdeoliv, dchinner, dhoward, dwysocha, xzhou | |
Version: | 7.5 | Keywords: | ZStream | |
Target Milestone: | rc | |||
Target Release: | --- | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | kernel-3.10.0-879.el7 | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1611212 1657142 (view as bug list) | Environment: | ||
Last Closed: | 2018-10-30 08:54:57 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1657142 |
Description
Zdenek Kabelac
2018-04-05 15:27:42 UTC
Waiting in xlog_wait() implies that the log IO has not been completed. i.e. that whatever error occurred in the DM layer has not been correctly completed with an error, and XFS is still waiting for it. Before we go on a wild goose chase in XFS, can you first confirm that DM is completing all outstanding IOs correctly (e.g. via blktrace) and that XFS isn't actually waiting on a log IO completion that has got lost? -Dave. I've already analyzed this case with Carlos - so he will explain more details for XFS part of problem. DM is in faulty stage - switching to read-only mode - the ext4 handles same case easily. XFS gets stuck in mentioned function waiting on an action that cannot happen. So far we have not prepared more 'easier' test since it's nontrivial to hit table switch in the very same moment. Basic trable is - thin-pool is switch to 'out-of-metadata' read-only mode and when the time is 'right' the action for reporting IO error happens to early (at least that's what I remember from Carlos explanation). Also - there is already fix made by him - but having better test case that does not rely on faulty thin-pool metadata logic is still missing (and likely bug in DM target will get fixed soon - so not usable for further testing) Also feel free to use reproducer from bug 1563697. The only mandatory condition is - the PV in use must be on very fast device (nearly zero latency - I'd suggest ramdisk based loop device - as it's very tricky race condition.) (In reply to Dave Chinner from comment #3) > Waiting in xlog_wait() implies that the log IO has not been completed. i.e. > that whatever error occurred in the DM layer has not been correctly > completed with an error, and XFS is still waiting for it. > > Before we go on a wild goose chase in XFS, can you first confirm that DM is > completing all outstanding IOs correctly (e.g. via blktrace) and that XFS > isn't actually waiting on a log IO completion that has got lost? > > -Dave. Hi Dave, I'm still not 100% sure this might be a bug in XFS, mainly due the fact it is so far, only reproducible due a bug in device-mapper. There are more data than the stacktrace mentioned above though. The stacktrace comes from a fstrim process stuck on the log force. The IO has not been completed, yes, it failed. The IO error is detected by XFS in xlog_iodone(), during the io completion of the iclog submitted by xfs_log_force, during the fstrim process. When xfs_log_force_umount() is called, and it runs the xlog callbacks via xlog_state_do_callbacks() to wake up all the threads waiting on iclog->ic_force_wait, it does skip the thread waiting on the fstrim process, well, at least I thought it was skipping it (more about it below). At this point XFS has been shut down, and the waiting fstrim has never been woken up. I was tracking down why the fstrim thread was being skipped during the shutdown process, but I figured it out it's not actually skipping. xlog_state_do_callbacks() get stuck while trying to run the iclog's callback. I'm actually tracking down why it's getting stuck. Hoping to have some updates soon Hi carlos, Can you post the notes/details of the analysis to the bz? That way everyone can follow along with the analysis you've done, otherwise we're completely in the dark about what the problem you're seeing is and can't actually help at all short of duplicating everything you're already doing... Cheers, Dave. Hi Dave, (In reply to Dave Chinner from comment #6) > Hi carlos, > > Can you post the notes/details of the analysis to the bz? That way everyone > can follow along with the analysis you've done, otherwise we're completely > in the dark about what the problem you're seeing is and can't actually help > at all short of duplicating everything you're already doing... Yes, my apologies to have not updated it sooner, I was trying to get more data and have more details about it, but so far, what I tracked is: - xlog_iodone runs for the xlog where the IO error occurred, and it falls into the shutdown case, calling xfs_do_force_shutdown, which will end up callign xfs_log_force_umount(). In xfs_log_force_umount(), xlog_state_ioerror() is called and all iclogs are properly marked with XLOG_STATE_IOERROR. Then we fall into xlog_state_do_callback, which is supposed to wake up the processes sleeping in iclog->ic_force_wait of each iclog. Most of the iclogs clean up are done properly, but every time, the last iclog on the list get stuck. Its callback (xlog_cil_committed) is called from xlog_state_do_callback in: cb->cb_func(cb->cb_arg, aborted), which never returns back to xlog_state_do_callback(), so, it can't wake up the threads sleeping in ic_force_wait. xlog_cil_committed calls into xfs_trans_committed_bulk(), which will then unpin all the log items. in the if (aborted) conditional, it calls into the iop_unpin() operation of the log items. The last cil item which had been marked with XLOG_STATE_IOERROR above, will call its associated log item iop_unpin, which is xfs_buf_item_unpin() and get stuck there. Into xfs_buf_item_unpin, we try to lock the buffer into: } else if (freed && remove) { xfs_buf_lock() . . } And it stops trying to acquire the buffer's semaphore into xfs_buf_lock() down(&bp->b_sema); That's basically where I am now. The buffer semaphore is held somewhere else (where I still don't know), preventing the force umount to continue and wake up the fstrim threas sleeping on the ic_force_wait. I'll update once I have more information. Cheers Carlos, If we are hanging in xfs_buf_item_unpin() trying to lock a buffer, that implies we are holding the buffer locked while calling xfs_log_force(XFS_LOG_SYNC). XFS_LOG_SYNC should not be used whenever we hold metadata locked.... IOWs, I suspect the bug is in xfs_trim_extents(), where it reads and locks the AGF then does a synchronous log force. Moving the sync log force to before locking the AGF should avoid the hang..... Cheers, Dave. we need acks here. Patch(es) available on kernel-3.10.0-879.el7 Use scsi_debug to test this bug: # modprobe scsi_debug dev_size_mb=2048 lbpu=1 lbpws=1 1) Reproduced this bug on kernel 878.el7, which has both bug 1563697 and this bug: # ./check generic/500 FSTYP -- xfs (non-debug) PLATFORM -- Linux/x86_64 ibm-x3550m3-05 3.10.0-878.el7.x86_64 MKFS_OPTIONS -- -f -bsize=4096 /dev/sde MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/sde /mnt/scratch generic/500 ^C^C 2) Tested on kernel 920.el7 which hasn't fixed bug 1563697, but fixed this bug: # ./check generic/500 FSTYP -- xfs (non-debug) PLATFORM -- Linux/x86_64 ibm-x3550m3-05 3.10.0-920.el7.x86_64.debug MKFS_OPTIONS -- -f -bsize=4096 /dev/sde MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/sde /mnt/scratch generic/500 [failed, exit status 1]- output mismatch (see /root/git/xfstests-dev/results//generic/500.out.bad) --- tests/generic/500.out 2018-07-25 00:00:54.313201624 -0400 +++ /root/git/xfstests-dev/results//generic/500.out.bad 2018-08-17 09:09:44.766198368 -0400 @@ -1,2 +1,26 @@ QA output created by 500 -Silence is golden +fstrim: /mnt/scratch: FITRIM ioctl failed: Input/output error +fstrim: cannot open /mnt/scratch: Input/output error +fstrim: cannot open /mnt/scratch: Input/output error +fstrim: cannot open /mnt/scratch: Input/output error +fstrim: cannot open /mnt/scratch: Input/output error ... (Run 'diff -u tests/generic/500.out /root/git/xfstests-dev/results//generic/500.out.bad' to see the entire diff) Ran: generic/500 Failures: generic/500 Failed 1 of 1 tests We can see XFS isn't blocked there even hit bug 1563697. So this bug has been fixed. 3) Finally, all test passed on kernel 933.el7: # ./check generic/500 FSTYP -- xfs (non-debug) PLATFORM -- Linux/x86_64 ibm-x3550m3-05 3.10.0-933.el7.x86_64 MKFS_OPTIONS -- -f -bsize=4096 /dev/sde MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/sde /mnt/scratch generic/500 11s Ran: generic/500 Passed all 1 tests So Verify this bug now. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:3083 |