Bug 754828

Summary: WARNING: at fs/fs-writeback.c:968 __mark_inode_dirty Errors after FC Port failover
Product: Red Hat Enterprise Linux 6
Reporter: Gerardo Arceri <gea>
Component: kernel
Assignee: Red Hat Kernel Manager <kernel-mgr>
Status: CLOSED DUPLICATE
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high
Priority: medium
Version: 6.1
CC: dchinner, esandeen, james.young, jwest, lczerner, levy_jerome, peter.sjoberg, rwheeler, syeghiay, yaliu
Target Milestone: rc
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2012-01-05 12:45:04 UTC
Attachments: SOSReport generated by abrtd.

Description Gerardo Arceri 2011-11-17 20:22:55 UTC
Created attachment 534312 [details]
SOSReport generated by abrtd.

Description of problem:
We are getting "WARNING: at fs/fs-writeback.c:968 __mark_inode_dirty" errors after a FC HBA Path fails and recovers.
We are using EMC PowerPath 5.6

Version-Release number of selected component (if applicable):
2.6.32-131.0.15.el6.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Mount a multipathed SAN filesystem and leave something writing to it
2. Fail one of the paths by disconnecting the cable
3. Reconnect the cable after a short while
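
The steps above might be scripted roughly as follows (a hypothetical sketch: the device names, mount point, and the sysfs path-offline trick are placeholders standing in for physically pulling the FC cable, and this cannot run without the SAN hardware):

```sh
# Hypothetical reproduction sketch; mpatha1, sdb and /mnt/san are placeholders.
mount /dev/mapper/mpatha1 /mnt/san
dd if=/dev/zero of=/mnt/san/testfile bs=1M count=10000 &

# Simulate a path failure instead of unplugging the cable.
echo offline > /sys/block/sdb/device/state
sleep 30
echo running > /sys/block/sdb/device/state

# Watch for the warning.
grep "__mark_inode_dirty" /var/log/messages
```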
  
Actual results:
Syslog shows 5 of these errors, and abrtd logs the crash into /var/spool/abrt.

Expected results:
No errors should have been displayed.

Additional info:
This server is using EMC PowerPath 5.6 to access SAN LUNs coming from an EMC VMAX frame.
It has to be something in the in-kernel filesystem code, as the errors do not happen if you write "raw" to one of the LUNs.

Server is a HP Proliant BL620c G7 fitted with QLA-2462 HBAs.

I'm attaching the sosreport as generated by abrtd

Comment 2 Jerry Levy 2011-11-18 13:28:41 UTC
Issue occurs with both ext3 and ext4, but not when doing a dd to the power devices. Also tried noatime (as we're hit just after touch_atime).

Comment 3 Ric Wheeler 2012-01-03 15:30:37 UTC
Are you seeing this in /var/log/messages? What I did see there were "slow" warnings (probably during an IO that was hung while PowerPath waited for failover?).

How long is that timeout?

Thanks!

Comment 4 Gerardo Arceri 2012-01-03 15:38:06 UTC
Yes, that was seen in /var/log/messages.
In no case have we lost any data, but the messages kind of freaked us out, and they would surely freak out our eyes-on-glass people.

Comment 5 Jerry Levy 2012-01-03 15:39:25 UTC
Timeouts were set to defaults. The question is why it would only happen on filesystem access and not on a dd... admittedly, the testing was a bit of a corner case but still the behavior is of concern.

Comment 6 Eric Sandeen 2012-01-03 19:26:16 UTC
So the message came from:

                        if (bdi_cap_writeback_dirty(bdi) &&
                            !test_bit(BDI_registered, &bdi->state)) {
                                WARN_ON(1);
                                printk(KERN_ERR "bdi-%s not registered\n",
                                                                bdi->name);
                        }

which was specifically added by:

commit 500b067c5e6ceea49cf280a02597b1169320e08c
Author: Jens Axboe <jens.axboe>
Date:   Wed Sep 9 09:10:25 2009 +0200

    writeback: check for registered bdi in flusher add and inode dirty
    
    Also a debugging aid. We want to catch dirty inodes being added to
    backing devices that don't do writeback.
    
    Signed-off-by: Jens Axboe <jens.axboe>


and after the warning and the stack dump we got:

Nov 10 14:34:41 schhyt16 kernel: bdi-block not registered

and we went down this path to get here:

Call Trace: 
[<ffffffff81067137>] ? warn_slowpath_common+0x87/0xc0
[<ffffffff8106718a>] ? warn_slowpath_null+0x1a/0x20
[<ffffffff8119b678>] ? __mark_inode_dirty+0x108/0x160
[<ffffffff8118c03d>] ? touch_atime+0x12d/0x170
[<ffffffff8110ee60>] ? generic_file_aio_read+0x380/0x700
[<ffffffff8117255a>] ? do_sync_read+0xfa/0x140
[<ffffffff8108e160>] ? autoremove_wake_function+0x0/0x40
[<ffffffff810cea0d>] ? audit_filter_rules+0x2d/0xa10
[<ffffffff812051a6>] ? security_file_permission+0x16/0x20
[<ffffffff81172f85>] ? vfs_read+0xb5/0x1a0
[<ffffffff810d1b62>] ? audit_syscall_entry+0x272/0x2a0
[<ffffffff811730c1>] ? sys_read+0x51/0x90
[<ffffffff8100b172>] ? system_call_fastpath+0x16/0x1b


Anyway, it seems like we got here with an unregistered bdi, somehow.
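
The quoted check can be modeled in a few lines of Python (a hypothetical sketch, not kernel code; the `BDI` class and `mark_inode_dirty` function here are illustrative stand-ins for the kernel structures):

```python
class BDI:
    """Minimal stand-in for the kernel's backing_dev_info structure."""
    def __init__(self, name, registered, cap_writeback=True):
        self.name = name
        self.registered = registered
        self.cap_writeback = cap_writeback

def mark_inode_dirty(bdi, log):
    # Mirrors the check quoted above from fs/fs-writeback.c: a
    # writeback-capable bdi that is not registered (or was torn down
    # behind the filesystem's back) triggers the warning.
    if bdi.cap_writeback and not bdi.registered:
        log.append("bdi-%s not registered" % bdi.name)

log = []
mark_inode_dirty(BDI("block", registered=False), log)
print(log[0])  # bdi-block not registered
```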

Comment 7 Eric Sandeen 2012-01-03 19:29:49 UTC
It seems quite possible that this is a race in powerpath.

Comment 8 Jerry Levy 2012-01-03 19:35:40 UTC
None of the functions in the path are PowerPath's, and PowerPath doesn't handle filesystem functions differently than direct block I/O; why would the problem only occur during filesystem stress and not on dd operations, and why would an unregistered BDI block error only show up on the former? I don't want to beat a dead horse, but I'd be much more comfortable accepting that this is a PowerPath problem if it occurred on direct writes as well; PowerPath doesn't care about inodes or atime operations per se.

Comment 9 Dave Chinner 2012-01-03 21:06:11 UTC
(In reply to comment #8)
> None of the functions in the path are PowerPath's, and PowerPath doesn't handle
> filesystem functions differently than direct block I/O; why would the problem
> only occur during filesystem stress and not on dd operations, and why would an
> unregistered BDI block error only show up on the former? I don't want to beat a
> dead horse, but I'd be much more comfortable accepting that this is a PowerPath
> problem if it occurred on direct writes as well;

Direct block IO only dirties one inode (the block device inode), and only then on the first write. So there's basically no window for a race to occur on such a test.

Filesystem stress dirties many different inodes, all the time, so there's plenty of scope for a bdi switch behind the back of the filesystem to be tripped over.

> PowerPath doesn't care about
> inodes or atime operations per se.

Sure, but filesystems care about BDIs always being valid and so PowerPath needs to be careful about switching them around if that is what it is doing....
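
Dave's probability argument can be sketched with a toy simulation (purely illustrative numbers, not a model of PowerPath internals): direct block I/O dirties the block-device inode once, while filesystem stress dirties inodes continuously, so a bdi switch at a random instant is far more likely to collide with the latter.

```python
import random

random.seed(42)
TICKS = 1_000_000  # simulated time slices

# Direct block I/O: the block device inode is dirtied once, on the first write.
direct_dirty = {0}

# Filesystem stress: many inodes dirtied all the time (here, ~10% of ticks).
fs_dirty = {t for t in range(TICKS) if random.random() < 0.10}

# A bdi switch at one random instant races only if it lands on a tick
# where an inode is being marked dirty.
switch = random.randrange(1, TICKS)

print("direct I/O hit:", switch in direct_dirty)   # False: essentially no window
print("fs stress window:", len(fs_dirty) / TICKS)  # roughly 0.10
```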

Comment 10 Jerry Levy 2012-01-05 12:45:04 UTC
I'm setting this as a duplicate of BZ 655845, a known kernel bug which was fixed in 6.1.

*** This bug has been marked as a duplicate of bug 655845 ***