Bug 162027 - reiserfs hangs system
reiserfs hangs system
Status: CLOSED UPSTREAM
Product: Fedora
Classification: Fedora
Component: kernel (Show other bugs)
4
i386 Linux
medium Severity high
: ---
: ---
Assigned To: Dave Jones
Brian Brock
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2005-06-29 05:49 EDT by Tom Eicher
Modified: 2015-01-04 17:20 EST (History)
2 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2005-07-05 15:36:05 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
trace.txt (43.37 KB, text/plain)
2005-07-05 05:19 EDT, Tom Eicher
no flags Details

  None (edit)
Description Tom Eicher 2005-06-29 05:49:17 EDT
Description of problem:
After a while of normal operation, the system arrives in a state with one or
more unterminatable processes. Even root cannot 'kill -9' those processes.
Already running processes, x apps etc continue as usual for a while, but will
eventually block, supposedly when they make a certain system call.

Everytime this occurred, I had at least one 'id' process in the set of
uninterruptable processes - I am not sure whether id is the culprit however. At
the most recent occurance of the problem, I had the following processes running
uninterruptibly under my user id after killing X:

gtar
cp (in the middle of unpacking a zip-file)
java (from sun-jdk, running eclipse)
id

Once in this state, the system has to be killed using the power switch. Pressing
Ctrl+Alt+Del will issue the usual console message, but not proceed any further.

Once in this state, 'ps -fe' will hang uninterruptibly, even as root. It will
print out a number of process lines, but then hang.

Version-Release number of selected component (if applicable):
coreutils-5.2.1-48
kernel-2.6.11-1.1369_FC4

How reproducible:
No steps, but happened 3 times in the past 24 hours.

Severity is high for me, but no steps.
Comment 1 Tom Eicher 2005-07-04 09:34:27 EDT
Upping severity as this means lost data and happens frequently.

The hang of processes seem not directly related to 'id' - I was able to
reproduce multiple times since the first occurrences, sometimes with an 'id'
process, sometimes without. Also, 'ps' did not always get blocked.

When running beagled (on a modified kernel with inotify support, see
http://beaglewiki.org/Main_Page) and entering a query into 'best', I can
reproduce the problem within 30secs.

Another set of processes that become uninterruptible:

root      2874     1  0 Jun29 ?        00:00:00 login -- tei
tei      32368     1  0 12:39 ?        00:01:40 [java] <defunct>
tei        717 32368  0 12:49 ?        00:00:59 [java] <defunct>
tei       3085     1  0 15:52 ?        00:00:00
/usr/lib/firefox-1.0.4/firefox-bin -UILocale en-US
tei       3216     1  0 15:53 ?        00:00:00 id -gn
tei       3293  2874  0 15:55 tty2     00:00:00 -bash
tei       3332     1  0 15:55 tty2     00:00:00 id -gn
root      3388  3352  0 15:56 tty3     00:00:00 su tei
tei       3389  3388  0 15:56 tty3     00:00:00 bash
tei       3391     1  0 15:56 tty3     00:00:00 id -gn
Comment 2 Tom Eicher 2005-07-05 05:17:11 EDT
Changing summary as I have found evidence that reiserfs is involved.

Once again in the system state that this PR is about, I got stack dumps using
Ctrl+ScrollLock. The set of uninterruptible user processes this time was:

  PID TTY      STAT   TIME COMMAND
 4141 ?        D      0:00 id -gn
 4132 ?        D      0:00 /bin/sh -c /home/tei/bin/childProcesses.sh 4011
 4012 ?        D      0:04 /opt/jdks/jdk1.5.0_03/bin/java -Xms128m -Xmx256m
/opt/eclipse/installs/I20050627-gtk/eclipse/startup.jar [...]
 2798 ?        Zsl    0:15 [nautilus] <defunct>

The matching stack traces follow below (will attach full log). Each of them is
waiting inside some reiserfs function. My system uses a mixture of reiserfs and
ext3 partitions:

/dev/hdc3 on / type ext3 (rw,user_xattr)
/dev/hdc1 on /boot type ext3 (rw)
/dev/hdc5 on /home type reiserfs (rw,user_xattr)
/dev/hdc6 on /opt type reiserfs (rw,user_xattr)


-------- stack dump obtained by pressing Ctrl+ScrollLock (excerpt) ---------

id            D 0000003D  2408  4141   4185                     (NOTLB)
f5b50d0c 00200082 0009a980 0000003d 000001fe 001de5fc d5a8d0cc e4ad8000 
       f6737c30 00000000 c5061640 000f4476 00000000 e4ad8000 c5061640 000f4476 
       00000002 e4ad8000 00000000 00000000 f8a27000 f5b50000 f8a27098 000a0fe0 
Call Trace:
 [<f8cd8174>] queue_log_writer+0x54/0x6d [reiserfs]
 [<c011be01>] default_wake_function+0x0/0xc
 [<f8cd835e>] do_journal_begin_r+0x11f/0x2f0 [reiserfs]
 [<f8cd86e7>] journal_begin+0x6c/0xff [reiserfs]
 [<f8cc64f2>] reiserfs_dirty_inode+0x40/0x72 [reiserfs]
 [<c01f84c8>] selinux_inode_follow_link+0x58/0x64
 [<c01abf08>] __mark_inode_dirty+0x28/0x2e8
 [<c01281d4>] current_fs_time+0x4e/0x69
 [<c01a0444>] update_atime+0x45/0x72
 [<c018e287>] __link_path_walk+0x2ce/0x12db
 [<c015391d>] filemap_nopage+0x2c9/0x379
 [<c018f2d7>] link_path_walk+0x43/0xdb
 [<c017a935>] get_unused_fd+0x79/0x1d2
 [<c018d4f7>] getname+0x87/0xc5
 [<c018f582>] path_lookup+0x79/0x152
 [<c018fd2d>] open_namei+0x90/0x61b
 [<c018f582>] path_lookup+0x79/0x152
 [<c017a6dd>] filp_open+0x27/0x46
 [<c017a935>] get_unused_fd+0x79/0x1d2
 [<c018d4f7>] getname+0x87/0xc5
 [<c017ad8f>] sys_open+0x35/0xad
 [<c0103a51>] syscall_call+0x7/0xb

sh            D 0000003D  2828  4132      1          4389  4012 (NOTLB)
f5ae4d0c 00200082 00090001 0000003d 000001fe 001de5fc dc4f68b4 dbb94000 
       c03cb944 f5ae4d84 653b3780 000f4472 001e8480 dbb94000 653b3780 000f4472 
       653b3780 000f4472 01406f40 00000000 f8a27000 f5ae4000 f8a27098 000a0fe0 
Call Trace:
 [<f8cd8174>] queue_log_writer+0x54/0x6d [reiserfs]
 [<c011be01>] default_wake_function+0x0/0xc
 [<f8cd835e>] do_journal_begin_r+0x11f/0x2f0 [reiserfs]
 [<f8cd86e7>] journal_begin+0x6c/0xff [reiserfs]
 [<f8cc64f2>] reiserfs_dirty_inode+0x40/0x72 [reiserfs]
 [<c01f84c8>] selinux_inode_follow_link+0x58/0x64
 [<c01abf08>] __mark_inode_dirty+0x28/0x2e8
 [<c01281d4>] current_fs_time+0x4e/0x69
 [<c01a0444>] update_atime+0x45/0x72
 [<c018e287>] __link_path_walk+0x2ce/0x12db
 [<c015391d>] filemap_nopage+0x2c9/0x379
 [<c018f2d7>] link_path_walk+0x43/0xdb
 [<c017a935>] get_unused_fd+0x79/0x1d2
 [<c018d4f7>] getname+0x87/0xc5
 [<c018f582>] path_lookup+0x79/0x152
 [<c018fd2d>] open_namei+0x90/0x61b
 [<c018f582>] path_lookup+0x79/0x152
 [<c017a6dd>] filp_open+0x27/0x46
 [<c017a935>] get_unused_fd+0x79/0x1d2
 [<c018d4f7>] getname+0x87/0xc5
 [<c017ad8f>] sys_open+0x35/0xad
 [<c0103a51>] syscall_call+0x7/0xb

java          D 0000000B  1176  4012      1          4132  3764 (NOTLB)
f5b8ac40 00200082 f5b8ace0 0000000b 00000000 00000000 f5b8ad0c f5920400 
       00000001 00000002 d913e480 000f4428 00000000 e4831aa0 d913e480 000f4428 
       00200086 00000000 92ce65c0 00000000 f5920288 ddd3d050 e4831aa0 c038df47 
Call Trace:
 [<c0373b1f>] rwsem_down_read_failed+0xaf/0x2b8
 [<f8ccec7c>] pathrelse+0x26/0x3b [reiserfs]
 [<f8cbbeea>] init_inode+0x348/0x4c8 [reiserfs]
 [<f8cde685>] .text.lock.xattr+0xd5/0x250 [reiserfs]
 [<c01f6546>] inode_doinit_with_dentry+0x410/0x66c
 [<c019ae29>] d_splice_alias+0x13c/0x2c8
 [<f8cbc754>] reiserfs_iget+0x52/0x86 [reiserfs]
 [<f8cbc5d0>] reiserfs_init_locked_inode+0x0/0xe [reiserfs]
 [<f8cb7d2f>] reiserfs_lookup+0xd9/0x108 [reiserfs]
 [<c018db67>] real_lookup+0xb8/0xd7
 [<c018df9f>] do_lookup+0x7c/0x96
 [<c018e8e3>] __link_path_walk+0x92a/0x12db
 [<c018f2d7>] link_path_walk+0x43/0xdb
 [<c018f582>] path_lookup+0x79/0x152
 [<c018f7a1>] __user_walk+0x21/0x31
 [<c018841e>] vfs_stat+0x14/0x3a
 [<c01889ad>] sys_stat64+0xf/0x28
 [<c0214f56>] copy_to_user+0x42/0x60
 [<c01277ed>] sys_gettimeofday+0x22/0x58
 [<c0103a51>] syscall_call+0x7/0xb

nautilus      X B7EEA708  1580  2798      1          2830  2417 (L-TLB)
f4213f54 00200046 c011e0b4 b7eea708 00000001 c01240ff 00000000 00000000 
       00000000 c1b29980 9095c4c0 000f44ba 00000000 f3ab6550 9095c4c0 000f44ba 
       9095c4c0 000f44ba cbda0e80 00000002 f3ab6550 f7d5bde0 00000000 00000100 
Call Trace:
 [<c011e0b4>] mm_release+0x7c/0x83
 [<c01240ff>] exit_notify+0x33/0x9e6
 [<c0124d91>] do_exit+0x2df/0x500
 [<c03719e5>] schedule+0x55/0x7b3
 [<c0145338>] do_futex+0x75/0x7c
 [<c012507c>] do_group_exit+0x9e/0x1f5
 [<c0103a51>] syscall_call+0x7/0xb
nautilus      D 00000000  1264  2830      1          2831  2798 (NOTLB)
f09cdd98 00200082 001e2243 00000000 00000000 002cffff 000106ec 00000000 
       00000000 00000001 d5ae2440 000f4428 0001b207 f09c9550 d5ae2440 000f4428 
       d5ae2440 000f4428 16694e00 00000000 f5920288 dc97cb90 f09c9550 c038df47 
Call Trace:
 [<c0373b1f>] rwsem_down_read_failed+0xaf/0x2b8
 [<f8cbd408>] reiserfs_new_inode+0x5b0/0x738 [reiserfs]
 [<f8cde685>] .text.lock.xattr+0xd5/0x250 [reiserfs]
 [<c01f6546>] inode_doinit_with_dentry+0x410/0x66c
 [<c019a4f5>] d_instantiate+0xb0/0x1c2
 [<f8cb84bc>] reiserfs_create+0x1db/0x1f4 [reiserfs]
 [<c018f9e9>] vfs_create+0xd9/0x139
 [<c0190203>] open_namei+0x566/0x61b
 [<c017a6dd>] filp_open+0x27/0x46
 [<c018d4f7>] getname+0x87/0xc5
 [<c017ad8f>] sys_open+0x35/0xad
 [<c0103a51>] syscall_call+0x7/0xb
nautilus      D 00000468  1280  2831      1          3527  2830 (NOTLB)
f09cfc80 00200082 00200246 00000468 00000030 f5920000 0000f529 00000034 
       00000000 f8ccf785 d913e480 000f4428 00000000 f09c9000 d913e480 000f4428 
       00000001 00000000 b1251f00 00000000 f8a27000 f09cf000 f8a27098 f5920400 
Call Trace:
 [<f8ccf785>] search_by_key+0x70f/0xf32 [reiserfs]
 [<f8cd8174>] queue_log_writer+0x54/0x6d [reiserfs]
 [<c011be01>] default_wake_function+0x0/0xc
 [<f8cd8fbe>] check_journal_end+0x13d/0x281 [reiserfs]
 [<f8cd9685>] do_journal_end+0xef/0x9cf [reiserfs]
 [<f8cb9d50>] reiserfs_delete_inode+0x0/0xe5 [reiserfs]
 [<f8cb9d50>] reiserfs_delete_inode+0x0/0xe5 [reiserfs]
 [<f8cb9d6e>] reiserfs_delete_inode+0x1e/0xe5 [reiserfs]
 [<f8cd8292>] do_journal_begin_r+0x53/0x2f0 [reiserfs]
 [<c0197aa6>] dput+0x12c/0x6c1
 [<c019d49e>] destroy_inode+0x3f/0x4e
 [<f8cd84cb>] do_journal_begin_r+0x28c/0x2f0 [reiserfs]
 [<f8cdcd05>] __xattr_readdir+0x27c/0x3d2 [reiserfs]
 [<f8cd86e7>] journal_begin+0x6c/0xff [reiserfs]
 [<f8cb8a0a>] reiserfs_rmdir+0x38/0x2a3 [reiserfs]
 [<c0371cad>] schedule+0x31d/0x7b3
 [<c0197aa6>] dput+0x12c/0x6c1
 [<c01909e1>] dentry_unhash+0x1df/0x31d
 [<c018d696>] permission+0x49/0xb5
 [<c0190d0f>] vfs_rmdir+0x1f0/0x209
 [<f8cdd7b5>] reiserfs_delete_xattrs+0x157/0x1ec [reiserfs]
 [<f8cb9d50>] reiserfs_delete_inode+0x0/0xe5 [reiserfs]
 [<f8cb9d9d>] reiserfs_delete_inode+0x4d/0xe5 [reiserfs]
 [<f8cb9d50>] reiserfs_delete_inode+0x0/0xe5 [reiserfs]
 [<c019fdaf>] generic_delete_inode+0xb0/0x340
 [<c0197aa6>] dput+0x12c/0x6c1
 [<c01a03a5>] iput+0x35/0x68
 [<c0190f88>] vfs_unlink+0x196/0x1fd
 [<c01a03cc>] iput+0x5c/0x68
 [<c01910cb>] sys_unlink+0xdc/0x119
 [<c0103a51>] syscall_call+0x7/0xb
Comment 3 Tom Eicher 2005-07-05 05:19:08 EDT
Created attachment 116347 [details]
trace.txt

Full process dump obtained when the said four processes were hung.
Comment 4 Tom Eicher 2005-07-05 09:52:09 EDT
confirm that moving everything to ext3 / not having any reiserfs mounts solves
the problem.
Comment 5 Dave Jones 2005-07-05 15:36:05 EDT
reiserfs isn't supported by Fedora, your best bet to get this resolved is to
file a report at http://bugme.osdl.org

Fedora will then inherit any fixes that occur upstream in reiserfs.
Comment 6 Tom Eicher 2005-07-05 15:59:57 EDT
(In reply to comment #5)
> file a report at http://bugme.osdl.org

http://bugme.osdl.org/show_bug.cgi?id=4850

Note You need to log in before you can comment on or make changes to this bug.