Bug 585898

Summary: delete_module function causes kernel hung task messages for busy modules
Product: Red Hat Enterprise Linux 5 Reporter: Caspar Zhang <czhang>
Component: kernelAssignee: Red Hat Kernel Manager <kernel-mgr>
Status: CLOSED NOTABUG QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium Docs Contact:
Priority: low    
Version: 5.5CC: esandeen, qcai
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-04-26 16:39:05 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Caspar Zhang 2010-04-26 11:14:04 UTC
Description of problem:

when execute a simple delete_module program like (the ext3 module is in-use):

main()
{
delete_module("ext3");
}

The program often hang and after several minutes, kernel oops message given:

INFO: task a.out:4172 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
a.out         D ffffffff80151248     0  4172      1                4002 (NOTLB)
 ffff810063b09ef8 0000000000000086 000000048008e16d ffffffff804a16f0
 ffffffff804a16f8 0000000000000007 ffff81007e8770c0 ffff81007fe487a0
 0000029cb2d0be13 00000000000049e9 ffff81007e8772a8 000000007e8770c0
Call Trace:
 [<ffffffff800a77c1>] sys_delete_module+0x15c/0x1c5
 [<ffffffff8005e28d>] tracesys+0xd5/0xe0

INFO: task a.out:4172 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
a.out         D ffffffff80151248     0  4172      1                4002 (NOTLB)
 ffff810063b09ef8 0000000000000086 000000048008e16d ffffffff804a16f0
 ffffffff804a16f8 0000000000000007 ffff81007e8770c0 ffff81007fe487a0
 0000029cb2d0be13 00000000000049e9 ffff81007e8772a8 000000007e8770c0
Call Trace:
 [<ffffffff800a77c1>] sys_delete_module+0x15c/0x1c5
 [<ffffffff8005e28d>] tracesys+0xd5/0xe0

Version-Release number of selected component (if applicable):
2.6.18-194.el5

How reproducible:
>80%

Steps to Reproduce:
1. see above
2.
3.
  
Actual results:
kernel oops

Expected results:
no oops and program can finish execution

Additional info:

strace ./a.out:

execve("./a.out", ["./a.out"], [/* 23 vars */]) = 0
brk(0)                                  = 0x10bd4000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2ac6b41bb000
uname({sys="Linux", node="dell-p690-01.rhts.eng.bos.redhat.com", ...}) = 0
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=81122, ...}) = 0
mmap(NULL, 81122, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2ac6b41bc000
close(3)                                = 0
open("/lib64/libc.so.6", O_RDONLY)      = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\220\332\341\3022\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1717800, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2ac6b41d0000
mmap(0x32c2e00000, 3498328, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x32c2e00000
mprotect(0x32c2f4e000, 2093056, PROT_NONE) = 0
mmap(0x32c314d000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x14d000) = 0x32c314d000
mmap(0x32c3152000, 16728, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x32c3152000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2ac6b41d1000
arch_prctl(ARCH_SET_FS, 0x2ac6b41d1210) = 0
mprotect(0x32c314d000, 16384, PROT_READ) = 0
mprotect(0x32c2c1b000, 4096, PROT_READ) = 0
munmap(0x2ac6b41bc000, 81122)           = 0
delete_module("ext3", O_RDONLY|O_ASYNC|O_LARGEFILE|O_NOFOLLOW|O_NOATIME|0xf5700028

Comment 1 Eric Sandeen 2010-04-26 16:12:24 UTC
This is not a crash, or an oops, it is a hung task.

Comment 2 Eric Sandeen 2010-04-26 16:30:54 UTC
If you compile your test looking for warnings, you will get:

# gcc -Wall -o test test.c
stupid.c:2: warning: return type defaults to ‘int’
stupid.c: In function ‘main’:
stupid.c:3: warning: implicit declaration of function ‘delete_module’
stupid.c:4: warning: control reaches end of non-void function

Note the implicit declaration of delete_module.

I think that you are missing a flags argument, which could cause the call to return -EBUSY rather than hang on a busy module.

I don't think this is a kernel bug.

-Eric

Comment 3 Eric Sandeen 2010-04-26 16:39:05 UTC
With the right combination of (random, unspecified) flags, this is the same as doing "rmmod -w":

       -w --wait
              Normally,  rmmod will refuse to unload modules which are in use.
              With this option, rmmod will isolate the module, and wait  until
              the module is no longer used.  Noone new will be able to use the
              module, but it’s up to you to make sure the current users  even-
              tually  finish  with it.  See lsmod(8)) for information on usage
              counts.