Bug 585898 - delete_module function causes kernel hung task messages for busy modules
Summary: delete_module function causes kernel hung task messages for busy modules
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.5
Hardware: All
OS: Linux
low
medium
Target Milestone: rc
: ---
Assignee: Red Hat Kernel Manager
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2010-04-26 11:14 UTC by Caspar Zhang
Modified: 2013-07-03 07:24 UTC (History)
2 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-04-26 16:39:05 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Caspar Zhang 2010-04-26 11:14:04 UTC
Description of problem:

when execute a simple delete_module program like (the ext3 module is in-use):

main()
{
delete_module("ext3");
}

The program often hang and after several minutes, kernel oops message given:

INFO: task a.out:4172 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
a.out         D ffffffff80151248     0  4172      1                4002 (NOTLB)
 ffff810063b09ef8 0000000000000086 000000048008e16d ffffffff804a16f0
 ffffffff804a16f8 0000000000000007 ffff81007e8770c0 ffff81007fe487a0
 0000029cb2d0be13 00000000000049e9 ffff81007e8772a8 000000007e8770c0
Call Trace:
 [<ffffffff800a77c1>] sys_delete_module+0x15c/0x1c5
 [<ffffffff8005e28d>] tracesys+0xd5/0xe0

INFO: task a.out:4172 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
a.out         D ffffffff80151248     0  4172      1                4002 (NOTLB)
 ffff810063b09ef8 0000000000000086 000000048008e16d ffffffff804a16f0
 ffffffff804a16f8 0000000000000007 ffff81007e8770c0 ffff81007fe487a0
 0000029cb2d0be13 00000000000049e9 ffff81007e8772a8 000000007e8770c0
Call Trace:
 [<ffffffff800a77c1>] sys_delete_module+0x15c/0x1c5
 [<ffffffff8005e28d>] tracesys+0xd5/0xe0

Version-Release number of selected component (if applicable):
2.6.18-194.el5

How reproducible:
>80%

Steps to Reproduce:
1. see above
2.
3.
  
Actual results:
kernel oops

Expected results:
no oops and program can finish execution

Additional info:

strace ./a.out:

execve("./a.out", ["./a.out"], [/* 23 vars */]) = 0
brk(0)                                  = 0x10bd4000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2ac6b41bb000
uname({sys="Linux", node="dell-p690-01.rhts.eng.bos.redhat.com", ...}) = 0
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=81122, ...}) = 0
mmap(NULL, 81122, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2ac6b41bc000
close(3)                                = 0
open("/lib64/libc.so.6", O_RDONLY)      = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\220\332\341\3022\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1717800, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2ac6b41d0000
mmap(0x32c2e00000, 3498328, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x32c2e00000
mprotect(0x32c2f4e000, 2093056, PROT_NONE) = 0
mmap(0x32c314d000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x14d000) = 0x32c314d000
mmap(0x32c3152000, 16728, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x32c3152000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2ac6b41d1000
arch_prctl(ARCH_SET_FS, 0x2ac6b41d1210) = 0
mprotect(0x32c314d000, 16384, PROT_READ) = 0
mprotect(0x32c2c1b000, 4096, PROT_READ) = 0
munmap(0x2ac6b41bc000, 81122)           = 0
delete_module("ext3", O_RDONLY|O_ASYNC|O_LARGEFILE|O_NOFOLLOW|O_NOATIME|0xf5700028

Comment 1 Eric Sandeen 2010-04-26 16:12:24 UTC
This is not a crash, or an oops, it is a hung task.

Comment 2 Eric Sandeen 2010-04-26 16:30:54 UTC
If you compile your test looking for warnings, you will get:

# gcc -Wall -o test test.c
stupid.c:2: warning: return type defaults to ‘int’
stupid.c: In function ‘main’:
stupid.c:3: warning: implicit declaration of function ‘delete_module’
stupid.c:4: warning: control reaches end of non-void function

Note the implicit declaration of delete_module.

I think that you are missing a flags argument, which could cause the call to return -EBUSY rather than hang on a busy module.

I don't think this is a kernel bug.

-Eric

Comment 3 Eric Sandeen 2010-04-26 16:39:05 UTC
With the right combination of (random, unspecified) flags, this is the same as doing "rmmod -w":

       -w --wait
              Normally,  rmmod will refuse to unload modules which are in use.
              With this option, rmmod will isolate the module, and wait  until
              the module is no longer used.  Noone new will be able to use the
              module, but it’s up to you to make sure the current users  even-
              tually  finish  with it.  See lsmod(8)) for information on usage
              counts.


Note You need to log in before you can comment on or make changes to this bug.