Bug 772874

Summary:

cifs: multiple process stuck waiting for page lock

Product:

Red Hat Enterprise Linux 6

Reporter:

Stefan Walter <walteste>

Component:

kernel

Assignee:

Sachin Prabhu <sprabhu>

Status:

CLOSED ERRATA

QA Contact:

Jian Li <jiali>

Severity:

high

Priority:

urgent

Version:

6.2

CC:

baumanmo, dhoward, dhowells, eguan, jiali, jlayton, jwest, kzhang, mark.whidby, nfs-maint, nmurray, pbandark, rdassen, rwheeler, sprabhu

Keywords:

Regression, ZStream

Hardware:

x86_64

OS:

Linux

Fixed In Version:

kernel-2.6.32-232.el6

Doc Type:

Bug Fix

Doc Text:

In the Common Internet File System (CIFS), the oplock break jobs and async callback handlers both use the SLOW-WORK workqueue, which has a finite pool of threads. Previously, oplock break jobs could occupy all of the running threads while waiting for a page lock, which prevented the callback required to release that page lock from running. This update moves the oplock break jobs into a separate workqueue, VERY-SLOW-WORK, allowing the callbacks to complete successfully and preventing the deadlock.

Clones:

1020716

Last Closed:

2012-06-20 08:13:34 UTC

Bug Blocks:

789373, 1020716

Attachments:

Description	Flags
patch -- convert oplock break job to very_slow_work	none

Description Stefan Walter 2012-01-10 08:12:40 UTC

Description of problem:

We use CIFS for home directories at our site. After the upgrade to 6.2 we
observe that sometimes the login process stops before the desktop is loaded
or that an application suddenly freezes. When this happens a 'sync'
command will never return and a machine cannot be cleanly shut down. 

After a while we see the following appear in dmesg:

Jan  9 08:46:06 muster kernel: INFO: task kslowd000:2219 blocked for more than 120 seconds.
Jan  9 08:46:06 muster kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan  9 08:46:06 muster kernel: kslowd000     D 0000000000000009     0  2219      2 0x00000080
Jan  9 08:46:06 muster kernel: ffff8806122a7b90 0000000000000046 0000000000000000 0000000000000000
Jan  9 08:46:06 muster kernel: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Jan  9 08:46:06 muster kernel: ffff8805faac3038 ffff8806122a7fd8 000000000000f4e8 ffff8805faac3038
Jan  9 08:46:06 muster kernel: Call Trace:
Jan  9 08:46:06 muster kernel: [<ffffffff81110930>] ? sync_page+0x0/0x50
Jan  9 08:46:06 muster kernel: [<ffffffff814ed2a3>] io_schedule+0x73/0xc0
Jan  9 08:46:06 muster kernel: [<ffffffff8111096d>] sync_page+0x3d/0x50
Jan  9 08:46:06 muster kernel: [<ffffffff814edb0a>] __wait_on_bit_lock+0x5a/0xc0
Jan  9 08:46:06 muster kernel: [<ffffffff81110907>] __lock_page+0x67/0x70
Jan  9 08:46:06 muster kernel: [<ffffffff81090a50>] ? wake_bit_function+0x0/0x50
Jan  9 08:46:06 muster kernel: [<ffffffffa1034dbf>] cifs_writepages+0x63f/0x670 [cifs]
Jan  9 08:46:06 muster kernel: [<ffffffff81126171>] do_writepages+0x21/0x40
Jan  9 08:46:06 muster kernel: [<ffffffff8111108b>] __filemap_fdatawrite_range+0x5b/0x60
Jan  9 08:46:06 muster kernel: [<ffffffff8111158f>] filemap_fdatawrite+0x1f/0x30
Jan  9 08:46:06 muster kernel: [<ffffffffa1031d51>] cifs_oplock_break+0xe1/0x1c0 [cifs]
Jan  9 08:46:06 muster kernel: [<ffffffff81104673>] slow_work_execute+0x233/0x310
Jan  9 08:46:06 muster kernel: [<ffffffff811048a7>] slow_work_thread+0x157/0x360
Jan  9 08:46:06 muster kernel: [<ffffffff81090a10>] ? autoremove_wake_function+0x0/0x40
Jan  9 08:46:06 muster kernel: [<ffffffff81104750>] ? slow_work_thread+0x0/0x360
Jan  9 08:46:06 muster kernel: [<ffffffff810906a6>] kthread+0x96/0xa0
Jan  9 08:46:06 muster kernel: [<ffffffff8100c14a>] child_rip+0xa/0x20
Jan  9 08:46:06 muster kernel: [<ffffffff81090610>] ? kthread+0x0/0xa0
Jan  9 08:46:06 muster kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
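
When triaging a report like the one above, it can help to pull the blocked task and its reliable stack frames out of the log mechanically. The following is a minimal sketch in Python, assuming the log lines have the same shape as the excerpt above; the function name `parse_hung_tasks` and both regular expressions are illustrative and not part of any Red Hat tool. Frames marked with `?` are skipped because the kernel flags them as unreliable.

```python
import re

# Header line written by the kernel's hung task detector.
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
# One call-trace frame, e.g. "[<ffffffff81110907>] __lock_page+0x67/0x70".
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\] (?P<unreliable>\? )?(?P<func>[\w.]+)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(?P<module>\w+)\])?"
)


def parse_hung_tasks(lines):
    """Return one dict per hung task, each with its reliable stack frames."""
    reports = []
    current = None
    for line in lines:
        header = HUNG_RE.search(line)
        if header:
            current = {
                "comm": header.group("comm"),
                "pid": int(header.group("pid")),
                "secs": int(header.group("secs")),
                "frames": [],
            }
            reports.append(current)
            continue
        frame = FRAME_RE.search(line)
        # Keep only frames that are not prefixed with "? ".
        if frame and current is not None and not frame.group("unreliable"):
            current["frames"].append((frame.group("func"), frame.group("module")))
    return reports


# A few lines copied from the report above.
sample = [
    "Jan  9 08:46:06 muster kernel: INFO: task kslowd000:2219 blocked for more than 120 seconds.",
    "Jan  9 08:46:06 muster kernel: [<ffffffff81110930>] ? sync_page+0x0/0x50",
    "Jan  9 08:46:06 muster kernel: [<ffffffff81110907>] __lock_page+0x67/0x70",
    "Jan  9 08:46:06 muster kernel: [<ffffffffa1034dbf>] cifs_writepages+0x63f/0x670 [cifs]",
    "Jan  9 08:46:06 muster kernel: [<ffffffffa1031d51>] cifs_oplock_break+0xe1/0x1c0 [cifs]",
]
reports = parse_hung_tasks(sample)
```

For the sample lines, the single report names kslowd000 (pid 2219) and shows __lock_page being reached from cifs_writepages, which in turn was called from cifs_oplock_break.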

With wireshark we can see the last thing that the CIFS module does before
further I/O with the server stops is to receive two 'Locking AndX' requests:

    558 6.042417    129.132.10.2          192.168.50.100        SMB      NT Create AndX Request, FID: 0x0060, Path: \walteste\infk\Linux\.gconfd\saved_state
    559 6.043207    192.168.50.100        129.132.10.2          SMB      NT Create AndX Response, FID: 0x0060
    ...
    663 6.053846    129.132.10.2          192.168.50.100        SMB      NT Create AndX Request, FID: 0x0061, Path: \walteste\infk\Linux\.gconfd\saved_state
    664 6.054755    192.168.50.100        129.132.10.2          SMB      NT Create AndX Response, FID: 0x0061
    ...
    668 6.055824    192.168.50.100        129.132.10.2          SMB      Locking AndX Request, FID: 0x0061
    669 6.055833    192.168.50.100        129.132.10.2          SMB      Locking AndX Request, FID: 0x0060

We compared this to a wireshark log of a successful login on a 6.1 system. The
same two requests are also received there and a response is sent.
  
Version-Release number of selected component (if applicable):

kernel-2.6.32-220.2.1.el6.x86_64

How reproducible:

Always with a kernel >= 2.6.32-220. The problem did not exist with any 6.1
kernels.

Steps to Reproduce:

1. Configure a system to use CIFS home directories.
2. Log in to GNOME

Actual results:

Login or applications often hang.

Expected results:

Login and applications should not hang.

Additional info:

A workaround that we found is to use 'directio' as a mount option, which
apparently avoids triggering the bug.

Comment 2 Ric Wheeler 2012-01-10 09:09:09 UTC

Hi Stefan,

Can you please open a support ticket via your Red Hat official channels? That helps us a lot as we gather data and try to look at issues.

Thanks for the report!

Ric

Comment 3 Jeff Layton 2012-01-10 11:51:03 UTC

Looks like cifs_writepages is stuck trying to lock a page. That likely means that something else is holding the page lock on that page. The question is what's holding it...

What would probably be most helpful is a full dump of the task status (via sysrq-t).
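
For reference, a sysrq-t dump can be requested by writing the character `t` to /proc/sysrq-trigger as root; the task list then lands in the kernel log. A minimal sketch in Python, where the helper name `request_task_dump` is made up for illustration and the path is a parameter only so the function can be tried against an ordinary file:

```python
def request_task_dump(trigger_path="/proc/sysrq-trigger"):
    """Ask the kernel to dump the state of all tasks (sysrq-t).

    Needs root on a real system. The output goes to the kernel log
    (dmesg, /var/log/messages), not to the caller.
    """
    with open(trigger_path, "w") as trigger:
        trigger.write("t")
```

Calling `request_task_dump()` while the login is hung and then saving /var/log/messages should capture the stacks of every blocked task.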

Comment 4 Stefan Walter 2012-01-10 13:57:19 UTC

I have opened case 00584861 at support.redhat.com and attached the
content of /var/log/messages after a sysrq-t when login hangs.

Comment 5 Jeff Layton 2012-01-10 15:44:12 UTC

Thanks. I don't see any process that's obviously holding the page lock and not releasing it. The most likely culprit here is the changes that added async writepages capability to cifs, but I've looked over that code and don't see any way that we could end up with pages being locked after cifs_writepages does its send. Perhaps this is just exposing another existing bug? Would it be possible to get a vmcore from this host? With that I might be able to tell more about the page(s) that everything is waiting on.

Comment 7 Stefan Walter 2012-01-11 07:57:29 UTC

> Would it be possible
> to get a vmcore from this host? With that I might be able to tell more about
> the page(s) that everything is waiting on.

I have posted download links for two vmcore.bz2 files to case 00584861.
The files are too large to attach.

Comment 9 Pratik Pravin Bandarkar 2012-01-11 15:14:13 UTC

vmcore file: vmcore.bz2

Queued
Requestor : pbandark
Corefile(s) : vmcore.bz2
How do I check status?:
Simply get on irc in a channel with tambot (normally #gss on rhirc.redhat.com) and type the following:
tambot: cas_status 20120111100830

Comment 14 Sachin Prabhu 2012-01-23 12:23:56 UTC

The problem described in this case is a regression caused by changes introduced by the patch


* Tue Jul 12 2011 Kyle McMartin <kmcmarti> [2.6.32-168.el6]
- [fs] cifs: convert async write callback to slow_work (Jeff Layton) [708000]

We are considering various options to avoid the deadlock caused by this patch. If you do encounter this issue, please use an older version of the kernel which doesn't contain the patch above until we have a fix for this issue.

Comment 15 David Howells 2012-01-23 16:25:28 UTC

The problem is that you have (a) a single pool of threads with a fixed finite limit on it and (b) tasks of two types with a dependency.  You are always at risk of deadlocking the pool by having the running threads all taken up with the dependent type of tasks and no running dependee tasks.

This is true even if the number of threads currently in the pool can be increased - provided there's a hard ceiling.

This cannot be solved without making a second pool whereby each type of task is segregated into its own pool.

That said, if the dependent task is marked as SLOW_WORK_VERY_SLOW it should provide this effect.  The slow-work facility guarantees to keep at least one thread nominally earmarked for ordinary slow work free from very-slow-work tasks, even though it will usually let ordinary threads process very-slow-work tasks if there's nothing better to do.
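
The argument above can be illustrated with a small deterministic model. This is only a sketch in Python under simplifying assumptions, not the kernel's slow-work implementation: every "oplock_break" task is assumed to block its thread until a "write_callback" task has run, a "write_callback" never blocks, and `reserved_for_ordinary` stands in for the threads that slow-work keeps free from very-slow-work tasks. The function and task names are hypothetical.

```python
def simulate(queue, threads, reserved_for_ordinary=0):
    """Model a fixed-size pool running two task types with a dependency.

    Returns "completed" if every task can finish, otherwise "deadlock".
    """
    pending = list(queue)
    blocked = 0            # threads stuck in oplock_break, waiting for the callback
    callback_done = False
    very_slow_limit = threads - reserved_for_ordinary

    while pending:
        progressed = False
        for i, task in enumerate(pending):
            if blocked >= threads:
                break      # no idle thread is left to pick anything up
            if task == "write_callback":
                # The dependee runs, which releases every blocked dependent.
                callback_done = True
                blocked = 0
            elif callback_done:
                pass       # nothing left to wait for, the task just completes
            elif blocked < very_slow_limit:
                blocked += 1   # the task takes a thread and blocks on it
            else:
                continue   # reserved threads refuse this task, try the next one
            pending.pop(i)
            progressed = True
            break
        if not progressed:
            return "deadlock"
    return "deadlock" if blocked and not callback_done else "completed"


queue = ["oplock_break", "oplock_break", "write_callback"]
single_pool = simulate(queue, threads=2)
with_reserved = simulate(queue, threads=2, reserved_for_ordinary=1)
```

With two threads and no reservation, both threads are taken by the dependent tasks and the callback never gets a thread, so `single_pool` is "deadlock". Keeping one thread reserved lets the callback run, so `with_reserved` is "completed".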

Comment 16 Sachin Prabhu 2012-01-23 16:44:12 UTC

Summary: 

We see a deadlock caused by the following threads.

1) cifs_writepages is waiting for the page writeback bit on a page to be cleared. It does this while holding the page lock.
2) There are 2 threads in the slow-work queue which are waiting for the page lock held in 1. These occupy all running threads in the slow-work mechanism, which blocks other slow-work tasks from running.
3) A task meant to clear the page writeback bit required by 1 is enqueued as a slow-work task behind the tasks in 2.

This leads to a deadlock.
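
The three points above form a cycle in a wait-for graph. The sketch below, in Python, writes that graph down and finds the cycle with a depth-first search; the node names are informal labels chosen for this illustration, not kernel symbols, and the "fixed" graph assumes the callback no longer has to wait for a thread held by an oplock break job.

```python
def find_cycle(waits_for):
    """Return one cycle in a wait-for graph as a list of nodes, or None."""
    visiting, done = [], set()

    def visit(node):
        if node in done:
            return None
        if node in visiting:
            # Found a back edge: report the loop, closing it on the same node.
            return visiting[visiting.index(node):] + [node]
        visiting.append(node)
        for nxt in waits_for.get(node, ()):
            cycle = visit(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in waits_for:
        cycle = visit(start)
        if cycle:
            return cycle
    return None


waits_for = {
    "cifs_writepages": ["writeback_callback"],   # 1) waits for the writeback bit
    "writeback_callback": ["slow_work_thread"],  # 3) queued, needs a free thread
    "slow_work_thread": ["oplock_break_jobs"],   # 2) all threads run these jobs
    "oplock_break_jobs": ["cifs_writepages"],    # 2) they wait for the page lock
}
cycle = find_cycle(waits_for)

# Assumed effect of giving the callback a thread of its own.
fixed = dict(waits_for, writeback_callback=[])
no_cycle = find_cycle(fixed)
```

In the original graph every participant waits on the next one and the chain returns to cifs_writepages. Once the callback no longer depends on a thread occupied by an oplock break job, the search finds no cycle.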

The problem was introduced by the patch

commit 8dded88b7831e98dfc65ff15b6f53c1365117545
Author: Jeff Layton <jlayton>
Date:   Wed Jul 6 12:45:06 2011 -0400

    [fs] cifs: convert async write callback to slow_work
    
   
    RHEL6 doesn't have concurrency managed workqueues. Convert the async
    write callback code to use slow_work instead.
    
    Signed-off-by: Jeff Layton <jlayton>
    Signed-off-by: Kyle McMartin <kmcmarti>

which went into version 2.6.32-168.el6, i.e. RHEL 6.2. This patch was part of a series of patches which introduced asynchronous page writeback for CIFS to improve performance.

The upstream version of the patch relies on the concurrency-managed workqueue mechanism available upstream. Since this mechanism is not available in RHEL 6.2, we decided to instead use the slow-work mechanism, which can execute the enqueued tasks concurrently. However, in certain conditions such as the one reported here, this doesn't work.

When slow-work detects that a new thread is required, it enqueues the task to create a new thread onto the same slow-work queue. However, this can only be executed once at least one of the currently executing threads becomes free. Since the 2 tasks currently running in the slow-work queue block all other tasks on the slow-work queue, we encounter this deadlock.

Comment 17 Jeff Layton 2012-01-23 17:55:47 UTC

Created attachment 557020
patch -- convert oplock break job to very_slow_work

(In reply to comment #15)

> That said, if the dependent task is marked as SLOW_WORK_VERY_SLOW it should
> provide this effect.  The slow-work facility guarantees to keep at least one
> thread nominally earmarked for ordinary slow work free from very-slow-work
> tasks, even though it will usually let ordinary threads process very-slow-work
> tasks if there's nothing better to do.

Ok, so converting the oplock break job to very_slow_work should do the right thing. Something like this patch, perhaps?

Comment 18 Stefan Walter 2012-01-24 14:53:05 UTC

In the light of the latest local root exploit fixed in kernel-2.6.32-131.4.1
we cannot stick with an old kernel any more (we use CIFS home directories
in public student labs).

I have rebuilt the latest kernel with the patch from Jeff's last comment. We
tested it and it seems to work reliably. We have now deployed this kernel in
our student labs.

Comment 20 RHEL Program Management 2012-01-24 15:29:43 UTC

This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 23 Jian Li 2012-02-09 03:09:09 UTC

Hi, could you give me some tips about how to reproduce this bug? thanks

Comment 24 Sachin Prabhu 2012-02-09 12:45:03 UTC

Jian,

This was only reproduced on the user end. They have tested the new patch and haven't seen this issue reproduced.

Sachin Prabhu

Comment 25 Sachin Prabhu 2012-02-09 15:57:56 UTC

Stefan,

Would you be able to test the official fix for this kernel once it is available?

Sachin Prabhu

Comment 26 Jian Li 2012-02-10 01:27:23 UTC

(In reply to comment #24)
> Jian,
> 
> This was only reproduced on the user end. They have tested the new patch and
> haven't seen this issue reproduced.
> 
> Sachin Prabhu

thanks sachin, qa_ack+

Comment 27 Stefan Walter 2012-02-10 07:11:26 UTC

Sure, I can easily test any kernel you give me. 

BTW, the 2.6.32-220.4.1 kernel we built ourselves with the one change from
comment 17 has worked fine ever since we deployed it on all student lab
machines.

Comment 29 Aristeu Rozanski 2012-02-15 20:11:05 UTC

Patch(es) available on kernel-2.6.32-232.el6

Comment 33 Tomas Capek 2012-04-18 12:12:50 UTC

    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
In the Common Internet File System (CIFS), the oplock break jobs and async callback handlers both use the SLOW-WORK workqueue, which has a finite pool of threads. Previously, oplock break jobs could occupy all of the running threads while waiting for a page lock, which prevented the callback required to release that page lock from running. This update moves the oplock break jobs into a separate workqueue, VERY-SLOW-WORK, allowing the callbacks to complete successfully and preventing the deadlock.

Comment 35 errata-xmlrpc 2012-06-20 08:13:34 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0862.html