Bug 733733 - Task hangs in uninterruptible sleep
Summary: Task hangs in uninterruptible sleep
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 15
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-08-26 16:21 UTC by Slawomir Czarko
Modified: 2011-09-06 06:38 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-09-06 06:38:45 UTC


Attachments:

Description Slawomir Czarko 2011-08-26 16:21:13 UTC
Description of problem:

I'm trying tup (http://gittup.org/tup/) as a replacement for make. tup uses FUSE internally (I'm not sure whether that's relevant).

When running tup with the -jN option, where N is greater than 1, a task occasionally gets stuck in uninterruptible sleep. Sometimes it's g++, sometimes it's as, and sometimes it's tup itself.

To get a kernel call trace, I rebuilt the stock Fedora kernel with CONFIG_DETECT_HUNG_TASK enabled:

CONFIG_DETECT_HUNG_TASK=y
CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=120
CONFIG_BOOTPARAM_HUNG_TASK_PANIC_VALUE=0
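To confirm that a given kernel was actually built with these settings, its config file can be checked. The following is a minimal sketch, assuming the config is in the usual .config text format (KEY=VALUE lines, with "# CONFIG_FOO is not set" for disabled options); the function names are hypothetical helpers, not part of any kernel tool.

```python
import re
from typing import Dict, Optional

# The three hung-task options listed in this report.
HUNG_TASK_OPTIONS = (
    "CONFIG_DETECT_HUNG_TASK",
    "CONFIG_DEFAULT_HUNG_TASK_TIMEOUT",
    "CONFIG_BOOTPARAM_HUNG_TASK_PANIC_VALUE",
)

NOT_SET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kconfig(text: str) -> Dict[str, str]:
    """Map each CONFIG_* option in a kernel .config to its value.

    Options recorded as '# CONFIG_FOO is not set' map to 'n'.
    """
    options: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        m = NOT_SET_RE.match(line)
        if m:
            options[m.group(1)] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            key, value = line.split("=", 1)
            options[key] = value
    return options


def hung_task_settings(text: str) -> Dict[str, Optional[str]]:
    """Pick out the hung-task options; None means the option is absent."""
    options = parse_kconfig(text)
    return {name: options.get(name) for name in HUNG_TASK_OPTIONS}
```

On Fedora the config of the running kernel is typically installed as /boot/config-<release>, so reading that file (with the release string from platform.release()) and passing its contents to hung_task_settings() should show the values above. The timeout itself can also be changed at runtime through /proc/sys/kernel/hung_task_timeout_secs, as the dmesg output below mentions.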

Everything else is the same as in the latest Fedora 15 PAE kernel (2.6.40.3-0).


Version-Release number of selected component (if applicable):

2.6.40.3-0

How reproducible:

This happens about 20% of the time.

Steps to Reproduce:
1. Set up a project with tup
2. Run: tup upd -j6

Actual results:

From time to time one of the tasks involved in the compilation hangs.
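Since the hang only shows up in some runs, a small driver that repeats the build until one run overstays a deadline can help catch a stuck task in the act. This is a minimal sketch under stated assumptions: the command, run count, and timeout are parameters, and the optional setup hook (for example touching a source file so tup has something to rebuild) is a hypothetical addition, not part of tup.

```python
import subprocess
from typing import Optional, Sequence


def run_until_hang(
    cmd: Sequence[str],
    runs: int,
    timeout: float,
    setup: Optional[Sequence[str]] = None,
) -> Optional[int]:
    """Run `cmd` up to `runs` times and return the PID of the first run
    that is still alive after `timeout` seconds, or None if all finish.

    A run that overstays the deadline is deliberately left running so the
    stuck process tree can be inspected (ps, /proc/<pid>/stack).
    """
    for _ in range(runs):
        if setup is not None:
            # Hypothetical hook to make the next build do real work again,
            # e.g. touching a source file so tup has something to rebuild.
            subprocess.run(setup, check=True)
        proc = subprocess.Popen(cmd)
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            return proc.pid
    return None
```

Run from the project directory, something like run_until_hang(["tup", "upd", "-j6"], runs=20, timeout=300, setup=["touch", "src/example.cpp"]) would stop at the first stuck build; the source path there is only a placeholder. The returned PID is that of the top-level tup process; the task that is actually blocked may be one of its children (g++, as), so ps is still needed to find it.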


Expected results:

All tasks complete.

Additional info:

This is the dmesg output after enabling CONFIG_DETECT_HUNG_TASK:

[  600.787076] INFO: task cc1plus:11371 blocked for more than 120 seconds.
[  600.787079] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  600.787081] cc1plus         D c07fcb2b     0 11371      1 0x00000084
[  600.787084]  cb465cf8 00000086 cb465c78 c07fcb2b cb465c98 c043ca38 00000046 c0b32780
[  600.787089]  dfbac000 c0b32780 0200eb96 00000055 c968a800 00000000 c94b4bc0 cb465ce4
[  600.787093]  cb465cb4 c0450c2f c94b4bc0 cb465ccc c04511eb cb465cc4 cb465ccc cb465cd0
[  600.787097] Call Trace:
[  600.787104]  [<c07fcb2b>] ? _raw_spin_unlock_irqrestore+0x13/0x15
[  600.787107]  [<c043ca38>] ? try_to_wake_up+0x15f/0x169
[  600.787111]  [<c0450c2f>] ? recalc_sigpending+0x42/0x65
[  600.787113]  [<c04511eb>] ? __set_task_blocked+0x6a/0x70
[  600.787116]  [<c046d738>] ? arch_local_irq_save+0x12/0x17
[  600.787118]  [<c07fcb2b>] ? _raw_spin_unlock_irqrestore+0x13/0x15
[  600.787121]  [<c045b795>] ? prepare_to_wait+0x53/0x5a
[  600.787128]  [<ef325588>] fuse_request_send+0x164/0x1f5 [fuse]
[  600.787130]  [<c045b7fd>] ? remove_wait_queue+0x2c/0x2c
[  600.787136]  [<ef328fe9>] fuse_lookup_name+0xe1/0x1a0 [fuse]
[  600.787144]  [<c05a7f8b>] ? avc_has_perm_flags+0x62/0x70
[  600.787154]  [<ef3290ee>] fuse_lookup+0x46/0x14b [fuse]
[  600.787158]  [<c05a8f88>] ? inode_has_perm+0x3f/0x46
[  600.787164]  [<c04fbfe6>] d_alloc_and_lookup+0x34/0x52
[  600.787168]  [<c04fd032>] walk_component+0x1be/0x313
[  600.787173]  [<c04fdbfb>] do_last+0x101/0x52b
[  600.787177]  [<c04fe9fc>] path_openat+0xa5/0x28d
[  600.787181]  [<c04fec0f>] do_filp_open+0x2b/0x6c
[  600.787186]  [<c05eaa13>] ? strncpy_from_user+0x34/0x4e
[  600.787191]  [<c0507110>] ? alloc_fd+0x53/0xbf
[  600.787195]  [<c04f3fbc>] do_sys_open+0x5f/0xe5
[  600.787199]  [<c04f4068>] sys_open+0x26/0x2c
[  600.787204]  [<c08028df>] sysenter_do_call+0x12/0x28
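When a build produces several of these reports, it can help to pull out just the header lines. A minimal sketch, assuming the header format shown above ("INFO: task <comm>:<pid> blocked for more than <N> seconds"); find_hung_tasks is a hypothetical helper name.

```python
import re
from typing import List, Tuple

# Header line printed by the hung-task detector, for example:
# "[  600.787076] INFO: task cc1plus:11371 blocked for more than 120 seconds."
# The non-greedy comm group still works if the command name contains ':'.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text: str) -> List[Tuple[str, int, int]]:
    """Return (command, pid, timeout_seconds) for each hung-task report."""
    reports = []
    for line in log_text.splitlines():
        m = HUNG_TASK_RE.search(line)
        if m:
            reports.append((m.group("comm"), int(m.group("pid")), int(m.group("secs"))))
    return reports
```

Feeding it the text of dmesg yields the PIDs whose /proc/<pid>/stack is worth capturing while the tasks are still stuck.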

Comment 1 Slawomir Czarko 2011-08-26 16:27:47 UTC
Here's the stack of the stuck task:

cat /proc/11371/stack
[<ef325588>] fuse_request_send+0x164/0x1f5 [fuse]
[<ef328fe9>] fuse_lookup_name+0xe1/0x1a0 [fuse]
[<ef3290ee>] fuse_lookup+0x46/0x14b [fuse]
[<c04fbfe6>] d_alloc_and_lookup+0x34/0x52
[<c04fd032>] walk_component+0x1be/0x313
[<c04fdbfb>] do_last+0x101/0x52b
[<c04fe9fc>] path_openat+0xa5/0x28d
[<c04fec0f>] do_filp_open+0x2b/0x6c
[<c04f3fbc>] do_sys_open+0x5f/0xe5
[<c04f4068>] sys_open+0x26/0x2c
[<c08028df>] sysenter_do_call+0x12/0x28
[<ffffffff>] 0xffffffff
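The innermost frame shows where the task is waiting: fuse_request_send in the fuse module, i.e. the open() path lookup is waiting for the FUSE server (here tup) to answer a request. A minimal sketch for checking that automatically, assuming the "[<addr>] function+0xoff/0xsize [module]" line format shown above; parse_stack and blocked_in are hypothetical helpers.

```python
import re
from typing import List, Optional, Tuple

# One symbolized frame, e.g. "[<ef325588>] fuse_request_send+0x164/0x1f5 [fuse]".
# Frames marked "? " in dmesg traces (possibly stale stack entries) do not match.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_stack(text: str) -> List[Tuple[str, Optional[str]]]:
    """Return (function, module) pairs, innermost frame first.

    Lines that are not symbolized frames, such as the trailing
    "[<ffffffff>] 0xffffffff" terminator, are skipped. module is None
    for functions built into the kernel image.
    """
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append((m.group("func"), m.group("module")))
    return frames


def blocked_in(frames: List[Tuple[str, Optional[str]]], module: str) -> bool:
    """True if the innermost frame belongs to `module`."""
    return bool(frames) and frames[0][1] == module
```

Because the "? " entries are skipped, the same parser applied to the dmesg call trace above keeps only the frames that are not marked as uncertain.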

Comment 2 Slawomir Czarko 2011-08-26 18:30:21 UTC
I have another system running Fedora 9 where this problem doesn't occur.

The kernel version there is 2.6.27.25-78.2.56.fc9.i686.PAE.

Comment 3 Slawomir Czarko 2011-08-27 09:09:30 UTC
The hung task is initially in state S+ (interruptible sleep, in the foreground process group).

After pressing Ctrl-C or trying to kill it, it changes to state D (uninterruptible sleep).
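The state letters reported by ps come from the third field of /proc/<pid>/stat: S is interruptible sleep and D is uninterruptible sleep, while the "+" suffix is added by ps itself for processes in the terminal's foreground process group. A minimal sketch for listing tasks in a given state, assuming Linux /proc; parse_stat and tasks_in_state are hypothetical helpers. The command name is wrapped in parentheses and may contain spaces or ")", so the line is split on the last ")".

```python
import os
from typing import List, Tuple


def parse_stat(stat_line: str) -> Tuple[int, str, str]:
    """Return (pid, comm, state) from one /proc/<pid>/stat line."""
    head, _, tail = stat_line.rpartition(")")
    pid_str, _, comm = head.partition(" (")
    state = tail.split()[0]
    return int(pid_str), comm, state


def tasks_in_state(states: str = "D", proc_root: str = "/proc") -> List[Tuple[int, str, str]]:
    """List (pid, comm, state) for every process whose state letter is in `states`."""
    found = []
    try:
        entries = os.listdir(proc_root)
    except FileNotFoundError:
        return found
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "sys"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state in states:
            found.append((pid, comm, state))
    return sorted(found)
```

Calling tasks_in_state("SD") before and after sending Ctrl-C should show the S to D transition described above for the stuck task.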

Comment 4 Slawomir Czarko 2011-09-06 06:38:45 UTC
After further investigation, this looks like an application or FUSE bug.

