Bug 1033965
Summary: | kernel-3.13.0-0.rc1 kernels "break" sync - sync() fails to return, hangs, ... | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Tom London <selinux> |
Component: | kernel | Assignee: | Eric Paris <eparis> |
Status: | CLOSED RAWHIDE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | rawhide | CC: | aavati, eparis, esandeen, gansalmon, ikent, itamar, jonathan, kernel-maint, madhu.chinakonda, robatino, rwheeler, selinux |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2013-12-16 13:17:45 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Tom London
2013-11-24 19:55:59 UTC
See this in journal: Nov 24 11:55:40 tlondon.localhost.org kernel: INFO: task sync:2388 blocked for more than 120 seconds. Nov 24 11:55:40 tlondon.localhost.org kernel: Not tainted 3.13.0-0.rc1.git1.1.fc21.x86_64 #1 Nov 24 11:55:40 tlondon.localhost.org kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Nov 24 11:55:40 tlondon.localhost.org kernel: sync D ffff88008d8fcd70 4280 2388 2385 0x00000081 Nov 24 11:55:40 tlondon.localhost.org kernel: ffff88005df95e48 0000000000000046 ffff88005dd633a0 ffff88005df95fd8 Nov 24 11:55:40 tlondon.localhost.org kernel: 00000000001d5200 00000000001d5200 ffff88005dd633a0 ffff88005dd633a0 Nov 24 11:55:40 tlondon.localhost.org kernel: ffff8800a599d6d0 fffffffeffffffff ffff8800a599d6d8 ffffffff81230890 Nov 24 11:55:40 tlondon.localhost.org kernel: Call Trace: Nov 24 11:55:40 tlondon.localhost.org kernel: [<ffffffff81230890>] ? generic_write_sync+0x60/0x60 Nov 24 11:55:40 tlondon.localhost.org kernel: [<ffffffff81758089>] schedule+0x29/0x70 Nov 24 11:55:40 tlondon.localhost.org kernel: [<ffffffff8175c115>] rwsem_down_read_failed+0xc5/0x120 Nov 24 11:55:40 tlondon.localhost.org kernel: [<ffffffff8138e544>] call_rwsem_down_read_failed+0x14/0x30 Nov 24 11:55:40 tlondon.localhost.org kernel: [<ffffffff8175ba83>] ? down_read+0x83/0xa0 Nov 24 11:55:40 tlondon.localhost.org kernel: [<ffffffff811fd69c>] ? iterate_supers+0x9c/0x110 Nov 24 11:55:40 tlondon.localhost.org kernel: [<ffffffff811fd69c>] iterate_supers+0x9c/0x110 Nov 24 11:55:40 tlondon.localhost.org kernel: [<ffffffff81230b42>] sys_sync+0x42/0xa0 Nov 24 11:55:40 tlondon.localhost.org kernel: [<ffffffff81766f50>] tracesys+0xdd/0xe2 Nov 24 11:55:40 tlondon.localhost.org kernel: 1 lock held by sync/2388: Nov 24 11:55:40 tlondon.localhost.org kernel: #0: (&type->s_umount_key#48){.+.+..}, at: [<ffffffff811fd69c>] iterate_supers+0x9c/0x110 What kind of filesystems are in use here? [tbl@tlondon ~]$ mount | grep '^/' /dev/mapper/vg_tlondon-lv_root on / type ext4 (rw,relatime,seclabel,data=ordered) /dev/sda1 on /boot type ext4 (rw,relatime,seclabel,data=ordered) /dev/sdb1 on /run/media/tbl/LIVE type vfat (rw,nosuid,nodev,relatime,uid=1000,gid=1000,fmask=0022,dmask=0077,codepage=437,iocharset=ascii,shortname=mixed,showexec,utf8,flush,errors=remount-ro) [tbl@tlondon ~]$ So, guessing ext4. The SD card is not overtly touched... Removed SD card and rebooted (to remove any possible confusion). 'sync' still hangs with just ext4 file systems: [tbl@tlondon ~]$ mount | grep '^/' /dev/mapper/vg_tlondon-lv_root on / type ext4 (rw,relatime,seclabel,data=ordered) /dev/sda1 on /boot type ext4 (rw,relatime,seclabel,data=ordered) [tbl@tlondon ~]$ strace sync execve("/usr/bin/sync", ["sync"], [/* 58 vars */]) = 0 brk(0) = 0x1b24000 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f747735f000 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=166458, ...}) = 0 mmap(NULL, 166458, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7477336000 close(3) = 0 open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\37\342\v?\0\0\0"..., 832) = 832 fstat(3, {st_mode=S_IFREG|0755, st_size=2161176, ...}) = 0 mmap(0x3f0be00000, 3965504, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3f0be00000 mprotect(0x3f0bfbe000, 2097152, PROT_NONE) = 0 mmap(0x3f0c1be000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1be000) = 0x3f0c1be000 mmap(0x3f0c1c4000, 16960, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3f0c1c4000 close(3) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f7477335000 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f7477333000 arch_prctl(ARCH_SET_FS, 0x7f7477333740) = 0 mprotect(0x605000, 4096, PROT_READ) = 0 mprotect(0x3f0c1be000, 16384, PROT_READ) = 0 mprotect(0x3f0bc21000, 4096, PROT_READ) = 0 munmap(0x7f7477336000, 166458) = 0 brk(0) = 0x1b24000 brk(0x1b45000) = 0x1b45000 brk(0) = 0x1b45000 open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=106353888, ...}) = 0 mmap(NULL, 106353888, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7470dc5000 close(3) = 0 sync( Looks like down_read(&sb->s_umount); in iterate_supers failed, and we're waiting for it. Not sure who would have it... ext4 doesn't manipulate this lock directly... not sure what to make of this yet. Still hangs with kernel-3.13.0-0.rc1.git2.1.fc21.x86_64 Still hanging with kernel-3.13.0-0.rc2.git0.1.fc21.x86_64. Last working kernel for me: kernel-3.13.0-0.rc0.git10.1.fc21.x86_64 [tbl@tlondon ~]$ strace sync execve("/usr/bin/sync", ["sync"], [/* 58 vars */]) = 0 brk(0) = 0x2089000 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5797caa000 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=166679, ...}) = 0 mmap(NULL, 166679, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f5797c81000 close(3) = 0 open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\37\242\3=\0\0\0"..., 832) = 832 fstat(3, {st_mode=S_IFREG|0755, st_size=2161008, ...}) = 0 mmap(0x3d03a00000, 3965504, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3d03a00000 mprotect(0x3d03bbe000, 2097152, PROT_NONE) = 0 mmap(0x3d03dbe000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1be000) = 0x3d03dbe000 mmap(0x3d03dc4000, 16960, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3d03dc4000 close(3) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5797c80000 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5797c7e000 arch_prctl(ARCH_SET_FS, 0x7f5797c7e740) = 0 mprotect(0x605000, 4096, PROT_READ) = 0 mprotect(0x3d03dbe000, 16384, PROT_READ) = 0 mprotect(0x3d03821000, 4096, PROT_READ) = 0 munmap(0x7f5797c81000, 166679) = 0 brk(0) = 0x2089000 brk(0x20aa000) = 0x20aa000 brk(0) = 0x20aa000 open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=106366160, ...}) = 0 mmap(NULL, 106366160, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f579170d000 close(3) = 0 sync( More I can do/help? I'm not able to reproduce this with 3.13.0-rc1 upstream on my Fedora 18 system so a git bisect can't be done. It doesn't look related to user space due to the strace. The only other way we might be able get some leads on this is to inspect a crash dump. Ian (In reply to Ian Kent from comment #10) > I'm not able to reproduce this with 3.13.0-rc1 upstream on my > Fedora 18 system so a git bisect can't be done. It also doesn't occur on a newly installed and updated f19 vm. Problem continues for me with kernel-3.13.0-0.rc2.git2.1.fc21.x86_64: [tbl@tlondon ~]$ strace sync execve("/usr/bin/sync", ["sync"], [/* 58 vars */]) = 0 brk(0) = 0x10db000 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6b50392000 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=166679, ...}) = 0 mmap(NULL, 166679, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f6b50369000 close(3) = 0 open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\37\242\3=\0\0\0"..., 832) = 832 fstat(3, {st_mode=S_IFREG|0755, st_size=2161008, ...}) = 0 mmap(0x3d03a00000, 3965504, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3d03a00000 mprotect(0x3d03bbe000, 2097152, PROT_NONE) = 0 mmap(0x3d03dbe000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1be000) = 0x3d03dbe000 mmap(0x3d03dc4000, 16960, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3d03dc4000 close(3) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6b50368000 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6b50366000 arch_prctl(ARCH_SET_FS, 0x7f6b50366740) = 0 mprotect(0x605000, 4096, PROT_READ) = 0 mprotect(0x3d03dbe000, 16384, PROT_READ) = 0 mprotect(0x3d03821000, 4096, PROT_READ) = 0 munmap(0x7f6b50369000, 166679) = 0 brk(0) = 0x10db000 brk(0x10fc000) = 0x10fc000 brk(0) = 0x10fc000 open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=106366160, ...}) = 0 mmap(NULL, 106366160, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f6b49df5000 close(3) = 0 sync( Works for me with latest nodebug kernel kernel-3.12.2-3.fc21.x86_64: [tbl@tlondon ~]$ strace sync execve("/usr/bin/sync", ["sync"], [/* 58 vars */]) = 0 brk(0) = 0x159c000 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0a5adfb000 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=166679, ...}) = 0 mmap(NULL, 166679, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f0a5add2000 close(3) = 0 open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\37\242\3=\0\0\0"..., 832) = 832 fstat(3, {st_mode=S_IFREG|0755, st_size=2161008, ...}) = 0 mmap(0x3d03a00000, 3965504, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3d03a00000 mprotect(0x3d03bbe000, 2097152, PROT_NONE) = 0 mmap(0x3d03dbe000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1be000) = 0x3d03dbe000 mmap(0x3d03dc4000, 16960, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3d03dc4000 close(3) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0a5add1000 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0a5adcf000 arch_prctl(ARCH_SET_FS, 0x7f0a5adcf740) = 0 mprotect(0x605000, 4096, PROT_READ) = 0 mprotect(0x3d03dbe000, 16384, PROT_READ) = 0 mprotect(0x3d03821000, 4096, PROT_READ) = 0 munmap(0x7f0a5add2000, 166679) = 0 brk(0) = 0x159c000 brk(0x15bd000) = 0x15bd000 brk(0) = 0x15bd000 open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=106366160, ...}) = 0 mmap(NULL, 106366160, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f0a5485e000 close(3) = 0 sync() = 0 close(1) = 0 close(2) = 0 exit_group(0) = ? +++ exited with 0 +++ [tbl@tlondon ~]$ Broken for me when using a 3.13 kernel (including 3.13.0-0.rc2.git2.1.fc21.x86_64). Worked with 3.12. Still broken with kernel-3.13.0-0.rc2.git3.1.fc21.x86_64: Dec 06 06:47:40 tlondon.localhost.org kernel: INFO: task sync:3308 blocked for more than 120 seconds. Dec 06 06:47:41 tlondon.localhost.org kernel: Not tainted 3.13.0-0.rc2.git3.1.fc21.x86_64 #1 Dec 06 06:47:41 tlondon.localhost.org kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Dec 06 06:47:41 tlondon.localhost.org kernel: sync D ffffffff81c134c0 4272 3308 3305 0x00000081 Dec 06 06:47:41 tlondon.localhost.org kernel: ffff88000bc7fe48 0000000000000046 ffff880113280000 ffff88000bc7ffd8 Dec 06 06:47:41 tlondon.localhost.org kernel: 00000000001d5200 00000000001d5200 ffff880113280000 ffff880113280000 Dec 06 06:47:41 tlondon.localhost.org kernel: ffff88009c5d4588 fffffffeffffffff ffff88009c5d4590 ffffffff81230f30 Dec 06 06:47:41 tlondon.localhost.org kernel: Call Trace: Dec 06 06:47:41 tlondon.localhost.org kernel: [<ffffffff81230f30>] ? generic_write_sync+0x60/0x60 Dec 06 06:47:41 tlondon.localhost.org kernel: [<ffffffff81758d49>] schedule+0x29/0x70 Dec 06 06:47:41 tlondon.localhost.org kernel: [<ffffffff8175ce55>] rwsem_down_read_failed+0xc5/0x120 Dec 06 06:47:41 tlondon.localhost.org kernel: [<ffffffff8138ebd4>] call_rwsem_down_read_failed+0x14/0x30 Dec 06 06:47:41 tlondon.localhost.org kernel: [<ffffffff8175c7c3>] ? down_read+0x83/0xa0 Dec 06 06:47:41 tlondon.localhost.org kernel: [<ffffffff811fdddc>] ? iterate_supers+0x9c/0x110 Dec 06 06:47:41 tlondon.localhost.org kernel: [<ffffffff811fdddc>] iterate_supers+0x9c/0x110 Dec 06 06:47:41 tlondon.localhost.org kernel: [<ffffffff812311e2>] sys_sync+0x42/0xa0 Dec 06 06:47:41 tlondon.localhost.org kernel: [<ffffffff81767c50>] tracesys+0xdd/0xe2 Dec 06 06:47:41 tlondon.localhost.org kernel: 1 lock held by sync/3308: Dec 06 06:47:41 tlondon.localhost.org kernel: #0: (&type->s_umount_key#48){.+.+..}, at: [<ffffffff811fdddc>] iterate_supers+0x9c/0x110 [tbl@tlondon ~]$ strace sync execve("/usr/bin/sync", ["sync"], [/* 58 vars */]) = 0 brk(0) = 0x2573000 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6515b80000 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=166473, ...}) = 0 mmap(NULL, 166473, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f6515b57000 close(3) = 0 open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\37\302\2674\0\0\0"..., 832) = 832 fstat(3, {st_mode=S_IFREG|0755, st_size=2156912, ...}) = 0 mmap(0x34b7c00000, 3961408, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x34b7c00000 mprotect(0x34b7dbe000, 2093056, PROT_NONE) = 0 mmap(0x34b7fbd000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1bd000) = 0x34b7fbd000 mmap(0x34b7fc3000, 16960, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x34b7fc3000 close(3) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6515b56000 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6515b54000 arch_prctl(ARCH_SET_FS, 0x7f6515b54740) = 0 mprotect(0x605000, 4096, PROT_READ) = 0 mprotect(0x34b7fbd000, 16384, PROT_READ) = 0 mprotect(0x34b7a21000, 4096, PROT_READ) = 0 munmap(0x7f6515b57000, 166473) = 0 brk(0) = 0x2573000 brk(0x2594000) = 0x2594000 brk(0) = 0x2594000 open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=106366160, ...}) = 0 mmap(NULL, 106366160, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f650f5e3000 close(3) = 0 sync( Same hang with kernel-3.13.0-0.rc2.git6.1.fc21.x86_64 I've found no hint of this issue on the LKML mailing list. Seems to be Fedora only. (In reply to Andre Robatino from comment #17) > I've found no hint of this issue on the LKML mailing list. Seems to be > Fedora only. It not being on LKML doesn't really mean anything. It could just mean it's difficult to hit unless you have the right variables in play. I hit this once with 3.13.0-0.rc1.git3, but haven't been able to hit it again. I'll poke at it more tomorrow. Tom, so far we know 3.12 doesn't have this issue for you. Since you seem to be able to hit this fairly easily, could you go through the rc0.gitX builds in koji and try them one at a time to see which kernel first introduces the issue for you? http://koji.fedoraproject.org/koji/packageinfo?packageID=8 Also, if there are things you're doing before running sync, that would be helpful to know. Or do you get the sync hang if you run sync immediately after boot? The last "working for me" kernel is kernel-3.13.0-0.rc0.git10.1.fc21.x86_64 (I'll retest this just to be sure). The first "doesn't work for me kernel" is (as reported above) kernel-3.13.0-0.rc1.git0.1.fc21.x86_64. Also, the 3.12 nodebug kernels work for me: I'm currently booting kernel-3.12.2-3.fc21.x86_64. I tried to bisect this on my own (using wiki https://fedoraproject.org/wiki/User:Ignatenkobrain/Kernel/Bisection as a cookbook). Unfortunately, after the first step, the builds fail: ~/build/BUILD/kernel-3.11.fc21/linux-3.12.0-0.rc2.git999.1.3ea7a560.fc21.x86_64 /var/tmp/rpm-tmp.yTd1pO: line 295: pushd: tools/thermal/tmon/: No such file or directory error: Bad exit status from /var/tmp/rpm-tmp.yTd1pO (%build) RPM build errors: Bad exit status from /var/tmp/rpm-tmp.yTd1pO (%build) Child return code was: 1 I tried skipping once, but the error remained. (I must be doing something wrong....). This is how I set it up: git bisect bad v3.13-rc1 git bisect good v3.12-10928-g527d151 ~/kernel-package/kernel-package.py mock -r fedora-rawhide-x86_64 --rebuild sources/*.src.rpm --resultdir sources/rpms [BTW, I discovered a few things attempting this. 1) If 'sync' hangs, then so does mock, and 2) kernel builds take a really long time, especially on my Thinkpad X200 .... ;-)] From the reports on this ticket, I think I've tried most of the kernels after kernel-3.13.0-0.rc1.git0.1.fc21.x86_64. Would it be helpful for me to be comprehensive? Regarding how I setup/run sync: I boot to gnome, bring up a terminal window, and run either 'sync' or 'strace sync'. Appears to hang if I do this immediately, or if I do some "other stuff" and then try to sync. Yup, works with kernel-3.13.0-0.rc0.git10.1.fc21.x86_64 [tbl@tlondon ~]$ uname -a Linux tlondon.localhost.org 3.13.0-0.rc0.git10.1.fc21.x86_64 #1 SMP Thu Nov 21 01:03:12 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux [tbl@tlondon ~]$ strace sync execve("/usr/bin/sync", ["sync"], [/* 58 vars */]) = 0 brk(0) = 0xad6000 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7faaf4a2a000 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=166473, ...}) = 0 mmap(NULL, 166473, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7faaf4a01000 close(3) = 0 open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\37\302\2674\0\0\0"..., 832) = 832 fstat(3, {st_mode=S_IFREG|0755, st_size=2156912, ...}) = 0 mmap(0x34b7c00000, 3961408, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x34b7c00000 mprotect(0x34b7dbe000, 2093056, PROT_NONE) = 0 mmap(0x34b7fbd000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1bd000) = 0x34b7fbd000 mmap(0x34b7fc3000, 16960, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x34b7fc3000 close(3) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7faaf4a00000 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7faaf49fe000 arch_prctl(ARCH_SET_FS, 0x7faaf49fe740) = 0 mprotect(0x605000, 4096, PROT_READ) = 0 mprotect(0x34b7fbd000, 16384, PROT_READ) = 0 mprotect(0x34b7a21000, 4096, PROT_READ) = 0 munmap(0x7faaf4a01000, 166473) = 0 brk(0) = 0xad6000 brk(0xaf7000) = 0xaf7000 brk(0) = 0xaf7000 open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=106366160, ...}) = 0 mmap(NULL, 106366160, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7faaee48d000 close(3) = 0 sync() = 0 close(1) = 0 close(2) = 0 exit_group(0) = ? +++ exited with 0 +++ [tbl@tlondon ~]$ Tom, can you install the kernel here: http://jwboyer.fedorapeople.org/pub/bug1033965/ and see if it recreates? It's between rc0-git10 and rc1. At a minimum it will help narrow down the range where this broke. kernel-3.13.0-0.rc0.git11.1.fc21.x86_64.rpm still hangs for me: Dec 09 06:45:13 tlondon.localhost.org kernel: INFO: task sync:2300 blocked for more than 120 seconds. Dec 09 06:45:13 tlondon.localhost.org kernel: Not tainted 3.13.0-0.rc0.git11.1.fc21.x86_64 #1 Dec 09 06:45:13 tlondon.localhost.org kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Dec 09 06:45:13 tlondon.localhost.org kernel: sync D ffff880134d499d0 4280 2300 2297 0x00000081 Dec 09 06:45:13 tlondon.localhost.org kernel: ffff880062973e48 0000000000000046 ffff880083444d70 ffff880062973fd8 Dec 09 06:45:13 tlondon.localhost.org kernel: 00000000001d5200 00000000001d5200 ffff880083444d70 ffff880083444d70 Dec 09 06:45:13 tlondon.localhost.org kernel: ffff8800989fd6d0 fffffffeffffffff ffff8800989fd6d8 ffffffff81231060 Dec 09 06:45:13 tlondon.localhost.org kernel: Call Trace: Dec 09 06:45:13 tlondon.localhost.org kernel: [<ffffffff81231060>] ? generic_write_sync+0x60/0x60 Dec 09 06:45:13 tlondon.localhost.org kernel: [<ffffffff81758f19>] schedule+0x29/0x70 Dec 09 06:45:13 tlondon.localhost.org kernel: [<ffffffff8175cf85>] rwsem_down_read_failed+0xc5/0x120 Dec 09 06:45:13 tlondon.localhost.org kernel: [<ffffffff8138f014>] call_rwsem_down_read_failed+0x14/0x30 Dec 09 06:45:13 tlondon.localhost.org kernel: [<ffffffff8175c8f3>] ? down_read+0x83/0xa0 Dec 09 06:45:13 tlondon.localhost.org kernel: [<ffffffff811fddfc>] ? iterate_supers+0x9c/0x110 Dec 09 06:45:13 tlondon.localhost.org kernel: [<ffffffff811fddfc>] iterate_supers+0x9c/0x110 Dec 09 06:45:13 tlondon.localhost.org kernel: [<ffffffff81231312>] sys_sync+0x42/0xa0 Dec 09 06:45:13 tlondon.localhost.org kernel: [<ffffffff81767dd0>] tracesys+0xdd/0xe2 Dec 09 06:45:13 tlondon.localhost.org kernel: 1 lock held by sync/2300: Dec 09 06:45:13 tlondon.localhost.org kernel: #0: (&type->s_umount_key#47){.+.+..}, at: [<ffffffff811fddfc>] iterate_supers+0x9c/0x110 Same problem with kernel-3.13.0-0.rc3.git0.2.fc21.x86_64. If I go directly to a VT and do a sync there, before logging into GNOME, it works. After logging into GNOME, it fails. Wow.... Nice....I didn't catch that. I installed kernel-3.13.0-0.rc3.git0.2.fc21.x86_64 and rebooted (permissive) to run level 3. Logged in as root and ran 'strace sync'. It completed. I repeated. It completed. I then ran 'telinit 5', logged in to gnome/gdm, created terminal window, and ran 'strace sync' as root. It hung as above. Dazed, confused, ... I can recreate this by logging into gnome as well. It also recreates on the raw upstream git commit corresponding to the Fedora git11 snapshot, so it isn't Fedora specific. Doing a git bisect at the moment. The only relevant thing I've noticed when sync hangs: Showing all locks held in the system: [ 243.534376] 2 locks held by fusermount/1415: [ 243.534377] #0: (&type->s_umount_key#46/1){+.+.+.}, at: [<ffffffff811fd4b7>] sget+0x237/0x460 [ 243.534383] #1: (&sbsec->lock){+.+.+.}, at: [<ffffffff8131c15c>] selinux_set_mnt_opts+0x7c/0x680 [ 243.534402] 1 lock held by sync/2078: [ 243.534403] #0: (&type->s_umount_key#52){.+.+..}, at: [<ffffffff811fddfc>] iterate_supers+0x9c/0x110 [ 243.534408] 3 locks held by bash/2229: [ 243.534409] #0: (sb_writers#4){.+.+.+}, at: [<ffffffff811fa243>] vfs_write+0x173/0x1f0 [ 243.534413] #1: (sysrq_key_table_lock){......}, at: [<ffffffff8146aac4>] __handle_sysrq+0x24/0x1a0 [ 243.534417] #2: (tasklist_lock){.+.?..}, at: [<ffffffff810ccf73>] debug_show_all_locks+0x43/0x2a0 @Tom: I didn't have to boot in runlevel 3. I just booted in runlevel 5, as usual, then logged into a VT instead of GNOME, and ran sync, which worked. After logging into GNOME, it doesn't work (in either GNOME itself or the VT). I bisected this down to the commit below. Eric, Anand, thoughts? commit 102aefdda4d8275ce7d7100bc16c88c74272b260 Author: Anand Avati <avati> Date: Tue Apr 16 18:56:19 2013 -0400 selinux: consider filesystem subtype in policies Not considering sub filesystem has the following limitation. Support for SELinux in FUSE is dependent on the particular userspace filesystem, which is identified by the subtype. For e.g, GlusterFS, a FUSE based filesystem supports SELinux (by mounting and processing FUSE requests in different threads, avoiding the mount time deadlock), whereas other FUSE based filesystems (identified by a different subtype) have the mount time deadlock. By considering the subtype of the filesytem in the SELinux policies, allows us to specify a filesystem subtype, in the following way: fs_use_xattr fuse.glusterfs gen_context(system_u:object_r:fs_t,s0); This way not all FUSE filesystems are put in the same bucket and subjected to the limitations of the other subtypes. Signed-off-by: Anand Avati <avati> Signed-off-by: Eric Paris <eparis> The backtraces for the two processes involved in the locks are here: [ 152.923866] fusermount S ffff88031cee5c00 4176 1420 1401 0x00000080 [ 152.923869] ffff880309367c10 0000000000000046 00000000001d5140 ffff880309367fd8 [ 152.923873] ffff880309367fd8 00000000001d5140 ffff88030954ae00 ffff8803090d67b0 [ 152.923876] ffff880309367c50 ffff8803090d69b0 0000000000000000 ffff88030954ae00 [ 152.923880] Call Trace: [ 152.923883] [<ffffffff81705959>] schedule+0x29/0x70 [ 152.923888] [<ffffffffa0661d15>] __fuse_get_req+0x185/0x270 [fuse] [ 152.923892] [<ffffffff81096850>] ? wake_up_bit+0x30/0x30 [ 152.923896] [<ffffffffa0661e10>] fuse_get_req+0x10/0x20 [fuse] [ 152.923900] [<ffffffffa06651df>] fuse_getxattr+0x4f/0x160 [fuse] [ 152.923903] [<ffffffff812eb4b5>] sb_finish_set_opts+0x215/0x340 [ 152.923906] [<ffffffff812eb841>] selinux_set_mnt_opts+0x261/0x610 [ 152.923909] [<ffffffff812e5e2a>] ? selinux_parse_opts_str+0x1ba/0x2a0 [ 152.923912] [<ffffffff812ebc77>] selinux_sb_kern_mount+0x87/0x150 [ 152.923916] [<ffffffff812e0cf6>] security_sb_kern_mount+0x16/0x20 [ 152.923919] [<ffffffff811da20a>] mount_fs+0x8a/0x1b0 [ 152.923922] [<ffffffff811f7aa3>] vfs_kern_mount+0x63/0xf0 [ 152.923925] [<ffffffff811fa36e>] do_mount+0x23e/0xa20 [ 152.923928] [<ffffffff81166964>] ? __get_free_pages+0x14/0x50 [ 152.923931] [<ffffffff811f9fb6>] ? copy_mount_options+0x36/0x170 [ 152.923934] [<ffffffff811fabd3>] SyS_mount+0x83/0xc0 [ 152.923937] [<ffffffff81711499>] system_call_fastpath+0x16/0x1b [ 152.936687] sync D ffff88031cee1700 4176 2023 1987 0x00000080 [ 152.936691] ffff8802fc507e48 0000000000000046 00000000001d5140 ffff8802fc507fd8 [ 152.936694] ffff8802fc507fd8 00000000001d5140 ffff8802da9e9700 ffff8802da9e9700 [ 152.936697] ffff8803090d11b0 fffffffeffffffff ffff8803090d11b8 ffffffff81209340 [ 152.936701] Call Trace: [ 152.936704] [<ffffffff81209340>] ? generic_write_sync+0x70/0x70 [ 152.936708] [<ffffffff81705959>] schedule+0x29/0x70 [ 152.936710] [<ffffffff81706f5d>] rwsem_down_read_failed+0xbd/0x120 [ 152.936714] [<ffffffff81357de4>] call_rwsem_down_read_failed+0x14/0x30 [ 152.936717] [<ffffffff81704b53>] ? down_read+0x83/0xa0 [ 152.936721] [<ffffffff811d9b5c>] ? iterate_supers+0x9c/0x110 [ 152.936724] [<ffffffff811d9b5c>] iterate_supers+0x9c/0x110 [ 152.936727] [<ffffffff812095c5>] sys_sync+0x35/0x90 [ 152.936730] [<ffffffff81711499>] system_call_fastpath+0x16/0x1b Showing all locks held in the system: [ 152.938636] 2 locks held by fusermount/1420: [ 152.938637] #0: (&type->s_umount_key#44/1){+.+.+.}, at: [<ffffffff811d919a>] sget+0x2ca/0x660 [ 152.938646] #1: (&sbsec->lock){+.+.+.}, at: [<ffffffff812eb654>] selinux_set_mnt_opts+0x74/0x610 [ 152.938668] 1 lock held by sync/2023: [ 152.938669] #0: (&type->s_umount_key#50){.+.+..}, at: [<ffffffff811d9b5c>] iterate_supers+0x9c/0x110 I saw a tentative patch from eparis, so tag, you're it. ;) I reverted the commit that causes the hang. It should show up with kernel-3.13.0-0.rc3.git1.2.fc21 in tomorrow's rawhide. I'll leave the bug in modified until we figure out what the upstream 3.13 solution will be. Appears fixed in kernel-3.13.0-0.rc3.git1.2.fc21.x86_64. Sync and shutdown work. I was not able to do logout or shutdown graphically from a GNOME login session, but it does work after killing X with Ctrl-Alt-Backspace and then using "Power off" from the gdm screen. Confirmed: works for me as well. [Didn't have Andre's gdm/gnome shutdown issue.] Tried again, and it worked normally without the GNOME shutdown issue. I've seen the same issue occasionally before, but not often, so only mentioned it on the off chance it might have been related. jwb, thanks for digging through that bisect! Upstream went with the revert for now as well. Closing this out. |