Created attachment 556212 [details]
strace of vi hanging in an unlink() call

Description of problem:
I have an encrypted ext2 volume which I have been using under F15 for storing passwords and other private data. It's a 10 MB regular file. To work with it, I loop-mount the file, open the loop device with cryptsetup luksOpen, and mount the crypt device. Since updating to F16, when I try to open a file on this device with "vi", the process hangs in uninterruptible sleep forever. The file system can't be unmounted any more and the system can't shut down cleanly.

Version-Release number of selected component (if applicable):
3.1.9-1.fc16.x86_64

How reproducible:
So far, every time I tried (3x)

Steps to Reproduce:
1. Mount the encrypted volume (I am using the options "noatime,nodiratime,sync").
2. Open some file on the volume with "vi".

Actual results:
vi never opens the file and hangs forever, not killable with kill -9.

Expected results:
The file is opened cleanly.

Additional info:
Strace attached. Before mounting this volume, I ran an e2fsck that found a few minor problems in the file system (caused by the previous occurrence of this problem), which e2fsck was able to fix:

/dev/mapper/vol2 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Extended attribute block 306 has reference count 246, should be 245.  Fix<y>? yes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong for group #0 (7098, counted=7113).  Fix<y>? yes
Free blocks count wrong (7098, counted=7113).  Fix<y>? yes
Inode bitmap differences:  -256 -(258--259)  Fix<y>? yes
Free inodes count wrong for group #0 (1788, counted=1792).  Fix<y>? yes
Free inodes count wrong (1788, counted=1792).  Fix<y>? yes

/dev/mapper/vol2: ***** FILE SYSTEM WAS MODIFIED *****
/dev/mapper/vol2: 256/2048 files (0.8% non-contiguous), 1079/8192 blocks

From /proc/mounts:
/dev/mapper/vol2 /mnt/vol2 ext2 rw,sync,seclabel,noatime,nodiratime,user_xattr,acl,barrier=1 0 0
Created attachment 556213 [details] dmesg with output of sysrq-l, sysrq-w, sysrq-t
Just for clarification: this never happened to me with F15, but it has happened every time so far on F16.
This has nothing to do with loop mounts or encryption, only with the "sync" mount option. I just reproduced it with a normal LVM volume mounted with "-o sync". It doesn't happen on the same volume if "-o sync" is left out.

/dev/mapper/vg-S /mnt/james2 ext2 rw,sync,seclabel,relatime,user_xattr,acl,barrier=1 0 0
Here is a tiny reproducer in Python. It works on any sync-mounted ext2 fs.

from os import open as osopen, rename, unlink, close, O_RDWR, O_CREAT

f1 = osopen("f1", O_RDWR | O_CREAT)
close(f1)
unlink("f1")    # this works

f2 = osopen("f2", O_RDWR | O_CREAT)
f1 = osopen("f1", O_RDWR | O_CREAT)
close(f1)
unlink("f1")    # this hangs forever
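For reference, the same sequence of syscalls as a C program. This is only a sketch of the pattern the Python script exercises; I have not run it against the affected kernel. Run it from a directory on a sync-mounted ext2 file system.

/* Untested C translation of the Python reproducer above, for illustration. */
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int f1, f2;

    f1 = open("f1", O_RDWR | O_CREAT, 0600);
    close(f1);
    unlink("f1");                      /* this works */

    f2 = open("f2", O_RDWR | O_CREAT, 0600);
    f1 = open("f1", O_RDWR | O_CREAT, 0600);
    close(f1);
    unlink("f1");                      /* this is where the hang occurs */

    close(f2);
    return 0;
}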
In F16 (and newer) we're using the ext4 module to mount ext2/ext3 filesystems. I wonder if that has anything to do with this behavior. Eric?
This is what the stack of the blocked process looks like:

[ 1515.121557] python          D ffff8801200ae080     0  3295   3034 0x00000000
[ 1515.121561]  ffff8800b0a87b58 0000000000000086 ffff880100000000 0000000000000000
[ 1515.121565]  ffff8801200adcc0 ffff8800b0a87fd8 ffff8800b0a87fd8 ffff8800b0a87fd8
[ 1515.121570]  ffffffff81a0d020 ffff8801200adcc0 ffff8800b0a87b58 000000018109824d
[ 1515.121573] Call Trace:
[ 1515.121577]  [<ffffffff811a07b0>] ? __wait_on_buffer+0x30/0x30
[ 1515.121579]  [<ffffffff815d343f>] schedule+0x3f/0x60
[ 1515.121582]  [<ffffffff815d34ef>] io_schedule+0x8f/0xd0
[ 1515.121585]  [<ffffffff811a07be>] sleep_on_buffer+0xe/0x20
[ 1515.121588]  [<ffffffff815d3bba>] __wait_on_bit_lock+0x5a/0xc0
[ 1515.121591]  [<ffffffff811a07b0>] ? __wait_on_buffer+0x30/0x30
[ 1515.121594]  [<ffffffff815d3c9c>] out_of_line_wait_on_bit_lock+0x7c/0x90
[ 1515.121597]  [<ffffffff8108e860>] ? autoremove_wake_function+0x50/0x50
[ 1515.121601]  [<ffffffff811a0806>] __lock_buffer+0x36/0x40
[ 1515.121604]  [<ffffffff811a1a85>] __sync_dirty_buffer+0xc5/0xd0
[ 1515.121607]  [<ffffffff811a1aa3>] sync_dirty_buffer+0x13/0x20
[ 1515.121610]  [<ffffffff81217370>] __ext4_handle_dirty_metadata+0xb0/0x120
[ 1515.121613]  [<ffffffff8108e810>] ? remove_wait_queue+0x50/0x50
[ 1515.121616]  [<ffffffff81225e84>] ext4_xattr_release_block+0xd4/0x190
[ 1515.121619]  [<ffffffff81227c2e>] ext4_xattr_delete_inode+0xbe/0x100
[ 1515.121622]  [<ffffffff811ec441>] ext4_free_inode+0x71/0x590
[ 1515.121625]  [<ffffffff811f2bc7>] ? ext4_mark_inode_dirty+0x87/0x210
[ 1515.121628]  [<ffffffff811f4dd6>] ? ext4_evict_inode+0x276/0x430
[ 1515.121631]  [<ffffffff811f4eb3>] ext4_evict_inode+0x353/0x430
[ 1515.121634]  [<ffffffff8118bd9a>] evict+0x9a/0x1a0
[ 1515.121636]  [<ffffffff8118bfa3>] iput+0x103/0x200
[ 1515.121639]  [<ffffffff811815d3>] do_unlinkat+0x153/0x1c0
[ 1515.121642]  [<ffffffff8108239a>] ? sys_rt_sigaction+0xba/0xd0
[ 1515.121645]  [<ffffffff81182a86>] sys_unlink+0x16/0x20
[ 1515.121648]  [<ffffffff815dd1c2>] system_call_fastpath+0x16/0x1b
(In reply to comment #5)
> In F16 (and newer) we're using the ext4 module to mount ext2/ext3 filesystems.
> I wonder if that has anything to do with this behavior.

The same test works fine on an ext4-formatted partition.
The problem *only* occurs when mounting an *ext2-formatted block device*. It doesn't happen with ext4- or ext3-formatted devices, even if I use "mount -t ext2" (which is only legal for ext3-formatted devices, anyway).

[root@cooper ~]# mke2fs -t ext4 /dev/vg/Q
[root@cooper ~]# mount -t ext4 -o sync /dev/vg/Q /mnt/q
[root@cooper ~]# (cd /mnt/q; python /tmp/test.py)        -> OK
[root@cooper ~]# umount /mnt/q
[root@cooper ~]# mount -t ext3 -o sync /dev/vg/Q /mnt/q  -> mount fails
[root@cooper ~]# mount -t ext2 -o sync /dev/vg/Q /mnt/q  -> mount fails

[root@cooper ~]# mke2fs -t ext3 /dev/vg/Q
[root@cooper ~]# mount -t ext3 -o sync /dev/vg/Q /mnt/q
[root@cooper ~]# (cd /mnt/q; python /tmp/test.py)        -> OK
[root@cooper ~]# umount /mnt/q
[root@cooper ~]# mount -t ext2 -o sync /dev/vg/Q /mnt/q
[root@cooper ~]# (cd /mnt/q; python /tmp/test.py)        -> OK
[root@cooper ~]# umount /mnt/q

[root@cooper ~]# mke2fs /dev/vg/Q
[root@cooper ~]# mount -t ext2 -o sync /dev/vg/Q /mnt/q
[root@cooper ~]# (cd /mnt/q; python /tmp/test.py)        -> HANGS
If you do your test with the filesystem mounted with an explicit SELinux context, does the problem go away? Something like:

mount -o context=system_u:object_r:nfs_t:s0 -o sync ....

It's hanging in the xattr sync path, and presumably that's SELinux stuff going on. Weird that it's in io_schedule, though...
It also happens with the following options:

/dev/mapper/vg-R /mnt/r ext2 rw,sync,context=system_u:object_r:nfs_t:s0,relatime,user_xattr,acl,barrier=1 0 0

However, I found that the reproducer in comment 4 is not always sufficient to trigger the problem. "vi" is always sufficient, though.
*Please forget what I wrote in comment #8.* My reproducer was obviously not reliable. With "vi" as the reproducer, I am seeing this problem on an ext4 partition mounted with -o sync as well:

/dev/mapper/vg-T /mnt/t ext4 rw,sync,seclabel,relatime,user_xattr,acl,barrier=1,data=ordered 0 0

As I already said in the previous comment, mounting with a fixed SELinux context doesn't help, either.
Created attachment 558402 [details]
suggested patch

The problem also occurs with upstream 3.3.0-rc1 when the ext4 module is used for an ext2 file system. I just sent this patch to the linux-ext4 mailing list. It solves the problem for me, but I'm certainly not knowledgeable enough to judge if it's correct.
To avoid confusion, I should add that my statement in comment #11 was wrong. I wish I could delete that comment. I have done more extensive testing and seen that the problem occurs only when ext2 filesystems are mounted with "-o sync" using the ext4 driver, as I correctly noted in comment #8. The options "nouser_xattr,noacl" do not change anything. Anyway, please review the patch from comment #12.
Martin, thanks for sending the patch, apologies for not digging into this sooner myself. I'll take a look. -Eric
[mass update] kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository. Please retest with this update.
I think this problem has not been solved yet, has it?

I've encountered the same or a similar problem with RHEL 5.5, kernel 2.6.18-194.el5, and I think I've also found the cause of the problem... maybe. Could you verify whether my hypothesis is true or not? All code quoted below is from 2.6.18-194.el5.

Here is one of the stack traces in my case:

------------------------------------------------------------------------
PID: 10146  TASK: ffff81043ff21860  CPU: 11  COMMAND: "run.sh"
 #0 [ffff8106dd6ed988] schedule at ffffffff80063f96
 #1 [ffff8106dd6eda60] io_schedule at ffffffff800647ea
 #2 [ffff8106dd6eda80] sync_page at ffffffff80028b1a
 #3 [ffff8106dd6eda90] __wait_on_bit_lock at ffffffff8006492e
 #4 [ffff8106dd6edac0] __lock_page at ffffffff8003ff92
 #5 [ffff8106dd6edb10] truncate_inode_pages_range at ffffffff8002b86a
 #6 [ffff8106dd6edbe0] vmtruncate at ffffffff800ced98
 #7 [ffff8106dd6edc00] ll_setattr_raw at ffffffff889a1038
 #8 [ffff8106dd6edea0] do_filp_open at ffffffff800274fb
 #9 [ffff8106dd6edf50] do_sys_open at ffffffff80019e1e
#10 [ffff8106dd6edf80] system_call at ffffffff8005e116
    RIP: 00000036d2ec58f0  RSP: 00007fffcf8e43f8  RFLAGS: 00000246
    RAX: 0000000000000002  RBX: ffffffff8005e116  RCX: 0000000000000000
    RDX: 00000000000001b6  RSI: 0000000000000241  RDI: 000000000fb23080
    RBP: 0000000000000000   R8: 0000000000000020   R9: 0000000000000001
    R10: 0000000000000000  R11: 0000000000000246  R12: 000000000fb23080
    R13: 0000000000000241  R14: 00007fffcf8e4620  R15: 000000000fb24650
    ORIG_RAX: 0000000000000002  CS: 0033  SS: 002b
------------------------------------------------------------------------

__lock_page() calls __wait_on_bit_lock() with sync_page() as the action, which is defined in the same file, mm/filemap.c. So __wait_on_bit_lock() ends up calling sync_page() in its while loop and, I thought, obviously expects action() -- sync_page() in this case -- to return a non-zero value. But, as you can see in the code below, sync_page() never returns a non-zero value in this case.

------------------------------------------------------------------------
void fastcall __lock_page(struct page *page)
{
        DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);

        __wait_on_bit_lock(page_waitqueue(page), &wait, sync_page,
                        TASK_UNINTERRUPTIBLE);
}
EXPORT_SYMBOL(__lock_page);
------------------------------------------------------------------------
int __sched fastcall
__wait_on_bit_lock(wait_queue_head_t *wq, struct wait_bit_queue *q,
                        int (*action)(void *), unsigned mode)
{
        int ret = 0;

        do {
                prepare_to_wait_exclusive(wq, &q->wait, mode);
                if (test_bit(q->key.bit_nr, q->key.flags)) {
                        if ((ret = (*action)(q->key.flags)))
                                break;
                }
        } while (test_and_set_bit(q->key.bit_nr, q->key.flags));
        finish_wait(wq, &q->wait);
        return ret;
}
EXPORT_SYMBOL(__wait_on_bit_lock);
------------------------------------------------------------------------
static int sync_page(void *word)
{
        struct address_space *mapping;
        struct page *page;

        page = container_of((unsigned long *)word, struct page, flags);

        smp_mb();
        mapping = page_mapping(page);
        if (mapping && mapping->a_ops && mapping->a_ops->sync_page)
                mapping->a_ops->sync_page(page);
        io_schedule();
        return 0;
}
------------------------------------------------------------------------

I think the other problems like mine, or with similar stack traces, are caused by the same issue; I mean, q->key.bit_nr in __wait_on_bit() and __wait_on_bit_lock() is already set, and action() never returns a non-zero value.
Anyway, I know this site is not for support, but I would be glad if you could give me an answer or some opinion on my hypothesis.
Nozaki-san,

> I think this problem has not been solved yet, has it?

I will respond to this separately later.

> I've encountered the same or a similar problem with RHEL 5.5, kernel
> 2.6.18-194.el5, and I think I've also found the cause of the problem...
> maybe. Could you verify whether my hypothesis is true or not?

This bugzilla is against the Fedora kernel, which is much newer than the RHEL5 kernel you are referring to. You should open a new bug.

> So __wait_on_bit_lock() ends up calling sync_page() in its while loop and,
> I thought, obviously expects action() -- sync_page() in this case -- to
> return a non-zero value. But, as you can see in the code below, sync_page()
> never returns a non-zero value in this case.

That doesn't matter, because the condition that __wait_on_bit_lock() waits for is actually "test_and_set_bit(q->key.bit_nr, q->key.flags)", not the result of action(). The non-zero result of action() exists only to signal an error condition. The fact that sync_page() never returns non-zero simply means that errors can't occur on this code path.

If the kernel gets stuck with the stack trace you have shown, it means that the page you're looking at stays locked (PG_locked bit set) forever. This may or may not indicate a locking problem in the kernel.
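To illustrate the point, here is a minimal single-threaded userspace model of the loop in __wait_on_bit_lock() -- just a sketch with invented helper names, not the kernel code -- showing that the loop exits when the test-and-set of the bit finally succeeds, while a zero return from action() only means "no error, keep waiting":

/* Toy model of the __wait_on_bit_lock() loop.  page_locked stands in for
 * the PG_locked bit; sync_page_model() is an action() that always returns 0,
 * exactly like sync_page().  The loop still terminates, because the exit
 * condition is the test-and-set, not the return value of action(). */
#include <stdio.h>

static int page_locked = 1;        /* bit initially held by another task */
static int iterations_left = 3;    /* pretend the holder releases it later */

static int sync_page_model(void)
{
        if (--iterations_left == 0)
                page_locked = 0;   /* the other task unlocks the page */
        return 0;                  /* "no error", just like sync_page() */
}

static int try_take_bit(void)
{
        /* models test_and_set_bit(): returns the old value, sets the bit */
        int old = page_locked;
        page_locked = 1;
        return old;
}

int main(void)
{
        int ret = 0;

        do {
                if (page_locked) {             /* models test_bit() */
                        ret = sync_page_model();
                        if (ret)
                                break;         /* only an error exits here */
                }
        } while (try_take_bit());              /* exit when we win the bit */

        printf("got the lock, ret=%d\n", ret);
        return 0;
}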
This bug is indeed not fixed in FC16 kernel 3.3.2-6.fc16
The patch from comment #12, with Eric's additions, is scheduled for inclusion in kernel 3.4. http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=c1bb05a657fb3d8c6179a4ef7980261fae4521d7
Hi, Martin. I'm glad to receive your reply.

> This bugzilla is against the Fedora kernel, which is much newer than the RHEL5
> kernel you are referring to. You should open a new bug.

I'm sorry. I'll definitely do that when I find a bug.

> That doesn't matter, because the condition that __wait_on_bit_lock() waits for
> is actually "test_and_set_bit(q->key.bit_nr, q->key.flags)", not the result of
> action(). The non-zero result of action() exists only to signal an error
> condition. The fact that sync_page() never returns non-zero simply means that
> errors can't occur on this code path.

Oh, I got it. What you said really makes sense; I never thought of the error-signalling aspect. Now I've started suspecting another possibility and have found a second suspect in the Lustre bugzilla. I'm not sure it gives you a hint for this case, but I'll leave the link to the article here anyway. It seems that a similar bug was caused by a kmalloc() flag. (I haven't actually applied this patch to my custom Lustre yet.)

https://bugzilla.lustre.org/show_bug.cgi?id=21983

So I would recommend investigating whether the kmalloc() flags are right or not. Thank you for your cooperation!
Looks like we should have submitted this to -stable. I'll do that.
Patch should be present in 3.3.7, which I think just got built for F16. Please retest when that hits the repos! Thanks, -Eric
I'm going to close this out for now. 3.3.7 was released a while ago, and we've been on 3.4 for a while now too. If this problem still occurs in Fedora with the latest 3.4 or 3.5 kernel, please reopen.