Bug 1033965

Summary: kernel-3.13.0-0.rc1 kernels "break" sync - sync() fails to return, hangs, ...
Product: [Fedora] Fedora Reporter: Tom London <selinux>
Component: kernelAssignee: Eric Paris <eparis>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: unspecified    
Version: rawhideCC: aavati, eparis, esandeen, gansalmon, ikent, itamar, jonathan, kernel-maint, madhu.chinakonda, robatino, rwheeler, selinux
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-12-16 13:17:45 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Tom London 2013-11-24 19:55:59 UTC
Description of problem:
sync/sync() hang/fail-to-return with both kernel-3.13.0-0.rc1.git0.1.fc21.x86_64 and kernel-3.13.0-0.rc1.git1.1.fc21.x86_64,

Believe this is causing shutdown to hang/fail to complete.

Here is the output of 'strace sync':

[tbl@tlondon ~]$ strace sync
execve("/usr/bin/sync", ["sync"], [/* 58 vars */]) = 0
brk(0)                                  = 0xcf4000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4763bde000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=166458, ...}) = 0
mmap(NULL, 166458, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f4763bb5000
close(3)                                = 0
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\37\342\v?\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2161176, ...}) = 0
mmap(0x3f0be00000, 3965504, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3f0be00000
mprotect(0x3f0bfbe000, 2097152, PROT_NONE) = 0
mmap(0x3f0c1be000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1be000) = 0x3f0c1be000
mmap(0x3f0c1c4000, 16960, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3f0c1c4000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4763bb4000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4763bb2000
arch_prctl(ARCH_SET_FS, 0x7f4763bb2740) = 0
mprotect(0x605000, 4096, PROT_READ)     = 0
mprotect(0x3f0c1be000, 16384, PROT_READ) = 0
mprotect(0x3f0bc21000, 4096, PROT_READ) = 0
munmap(0x7f4763bb5000, 166458)          = 0
brk(0)                                  = 0xcf4000
brk(0xd15000)                           = 0xd15000
brk(0)                                  = 0xd15000
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=106353888, ...}) = 0
mmap(NULL, 106353888, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f475d644000
close(3)                                = 0
sync(     <----- Hangs here "forever"




Version-Release number of selected component (if applicable):
kernel-3.13.0-0.rc1.git0.1.fc21.x86_64
kernel-3.13.0-0.rc1.git1.1.fc21.x86_64

How reproducible:
Every time

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Tom London 2013-11-24 20:00:16 UTC
See this in journal:

Nov 24 11:55:40 tlondon.localhost.org kernel: INFO: task sync:2388 blocked for more than 120 seconds.
Nov 24 11:55:40 tlondon.localhost.org kernel:       Not tainted 3.13.0-0.rc1.git1.1.fc21.x86_64 #1
Nov 24 11:55:40 tlondon.localhost.org kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 24 11:55:40 tlondon.localhost.org kernel: sync            D ffff88008d8fcd70  4280  2388   2385 0x00000081
Nov 24 11:55:40 tlondon.localhost.org kernel:  ffff88005df95e48 0000000000000046 ffff88005dd633a0 ffff88005df95fd8
Nov 24 11:55:40 tlondon.localhost.org kernel:  00000000001d5200 00000000001d5200 ffff88005dd633a0 ffff88005dd633a0
Nov 24 11:55:40 tlondon.localhost.org kernel:  ffff8800a599d6d0 fffffffeffffffff ffff8800a599d6d8 ffffffff81230890
Nov 24 11:55:40 tlondon.localhost.org kernel: Call Trace:
Nov 24 11:55:40 tlondon.localhost.org kernel:  [<ffffffff81230890>] ? generic_write_sync+0x60/0x60
Nov 24 11:55:40 tlondon.localhost.org kernel:  [<ffffffff81758089>] schedule+0x29/0x70
Nov 24 11:55:40 tlondon.localhost.org kernel:  [<ffffffff8175c115>] rwsem_down_read_failed+0xc5/0x120
Nov 24 11:55:40 tlondon.localhost.org kernel:  [<ffffffff8138e544>] call_rwsem_down_read_failed+0x14/0x30
Nov 24 11:55:40 tlondon.localhost.org kernel:  [<ffffffff8175ba83>] ? down_read+0x83/0xa0
Nov 24 11:55:40 tlondon.localhost.org kernel:  [<ffffffff811fd69c>] ? iterate_supers+0x9c/0x110
Nov 24 11:55:40 tlondon.localhost.org kernel:  [<ffffffff811fd69c>] iterate_supers+0x9c/0x110
Nov 24 11:55:40 tlondon.localhost.org kernel:  [<ffffffff81230b42>] sys_sync+0x42/0xa0
Nov 24 11:55:40 tlondon.localhost.org kernel:  [<ffffffff81766f50>] tracesys+0xdd/0xe2
Nov 24 11:55:40 tlondon.localhost.org kernel: 1 lock held by sync/2388:
Nov 24 11:55:40 tlondon.localhost.org kernel:  #0:  (&type->s_umount_key#48){.+.+..}, at: [<ffffffff811fd69c>] iterate_supers+0x9c/0x110

Comment 2 Josh Boyer 2013-11-25 13:47:16 UTC
What kind of filesystems are in use here?

Comment 3 Tom London 2013-11-25 14:24:41 UTC
[tbl@tlondon ~]$ mount | grep '^/'
/dev/mapper/vg_tlondon-lv_root on / type ext4 (rw,relatime,seclabel,data=ordered)
/dev/sda1 on /boot type ext4 (rw,relatime,seclabel,data=ordered)
/dev/sdb1 on /run/media/tbl/LIVE type vfat (rw,nosuid,nodev,relatime,uid=1000,gid=1000,fmask=0022,dmask=0077,codepage=437,iocharset=ascii,shortname=mixed,showexec,utf8,flush,errors=remount-ro)
[tbl@tlondon ~]$ 

So, guessing ext4.

The SD card is not overtly touched...

Comment 4 Tom London 2013-11-25 14:56:50 UTC
Removed SD card and rebooted (to remove any possible confusion).

'sync' still hangs with just ext4 file systems:

[tbl@tlondon ~]$ mount | grep '^/'
/dev/mapper/vg_tlondon-lv_root on / type ext4 (rw,relatime,seclabel,data=ordered)
/dev/sda1 on /boot type ext4 (rw,relatime,seclabel,data=ordered)
[tbl@tlondon ~]$ strace sync
execve("/usr/bin/sync", ["sync"], [/* 58 vars */]) = 0
brk(0)                                  = 0x1b24000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f747735f000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=166458, ...}) = 0
mmap(NULL, 166458, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7477336000
close(3)                                = 0
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\37\342\v?\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2161176, ...}) = 0
mmap(0x3f0be00000, 3965504, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3f0be00000
mprotect(0x3f0bfbe000, 2097152, PROT_NONE) = 0
mmap(0x3f0c1be000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1be000) = 0x3f0c1be000
mmap(0x3f0c1c4000, 16960, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3f0c1c4000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f7477335000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f7477333000
arch_prctl(ARCH_SET_FS, 0x7f7477333740) = 0
mprotect(0x605000, 4096, PROT_READ)     = 0
mprotect(0x3f0c1be000, 16384, PROT_READ) = 0
mprotect(0x3f0bc21000, 4096, PROT_READ) = 0
munmap(0x7f7477336000, 166458)          = 0
brk(0)                                  = 0x1b24000
brk(0x1b45000)                          = 0x1b45000
brk(0)                                  = 0x1b45000
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=106353888, ...}) = 0
mmap(NULL, 106353888, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7470dc5000
close(3)                                = 0
sync(

Comment 5 Eric Sandeen 2013-11-25 17:37:28 UTC
Looks like

                 down_read(&sb->s_umount);

in iterate_supers failed, and we're waiting for it.  Not sure who would have it...

ext4 doesn't manipulate this lock directly... not sure what to make of this yet.

Comment 8 Tom London 2013-11-26 15:40:50 UTC
Still hangs with kernel-3.13.0-0.rc1.git2.1.fc21.x86_64

Comment 9 Tom London 2013-11-30 17:50:48 UTC
Still hanging with kernel-3.13.0-0.rc2.git0.1.fc21.x86_64.

Last working kernel for me: kernel-3.13.0-0.rc0.git10.1.fc21.x86_64

[tbl@tlondon ~]$ strace sync
execve("/usr/bin/sync", ["sync"], [/* 58 vars */]) = 0
brk(0)                                  = 0x2089000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5797caa000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=166679, ...}) = 0
mmap(NULL, 166679, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f5797c81000
close(3)                                = 0
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\37\242\3=\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2161008, ...}) = 0
mmap(0x3d03a00000, 3965504, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3d03a00000
mprotect(0x3d03bbe000, 2097152, PROT_NONE) = 0
mmap(0x3d03dbe000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1be000) = 0x3d03dbe000
mmap(0x3d03dc4000, 16960, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3d03dc4000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5797c80000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5797c7e000
arch_prctl(ARCH_SET_FS, 0x7f5797c7e740) = 0
mprotect(0x605000, 4096, PROT_READ)     = 0
mprotect(0x3d03dbe000, 16384, PROT_READ) = 0
mprotect(0x3d03821000, 4096, PROT_READ) = 0
munmap(0x7f5797c81000, 166679)          = 0
brk(0)                                  = 0x2089000
brk(0x20aa000)                          = 0x20aa000
brk(0)                                  = 0x20aa000
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=106366160, ...}) = 0
mmap(NULL, 106366160, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f579170d000
close(3)                                = 0
sync(

More I can do/help?

Comment 10 Ian Kent 2013-12-02 10:14:46 UTC
I'm not able to reproduce this with 3.13.0-rc1 upstream on my
Fedora 18 system so a git bisect can't be done.

It doesn't look related to user space due to the strace.

The only other way we might be able get some leads on
this is to inspect a crash dump.

Ian

Comment 11 Ian Kent 2013-12-02 12:14:50 UTC
(In reply to Ian Kent from comment #10)
> I'm not able to reproduce this with 3.13.0-rc1 upstream on my
> Fedora 18 system so a git bisect can't be done.

It also doesn't occur on a newly installed and updated f19
vm.

Comment 12 Tom London 2013-12-03 14:53:32 UTC
Problem continues for me with kernel-3.13.0-0.rc2.git2.1.fc21.x86_64:

[tbl@tlondon ~]$ strace sync
execve("/usr/bin/sync", ["sync"], [/* 58 vars */]) = 0
brk(0)                                  = 0x10db000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6b50392000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=166679, ...}) = 0
mmap(NULL, 166679, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f6b50369000
close(3)                                = 0
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\37\242\3=\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2161008, ...}) = 0
mmap(0x3d03a00000, 3965504, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3d03a00000
mprotect(0x3d03bbe000, 2097152, PROT_NONE) = 0
mmap(0x3d03dbe000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1be000) = 0x3d03dbe000
mmap(0x3d03dc4000, 16960, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3d03dc4000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6b50368000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6b50366000
arch_prctl(ARCH_SET_FS, 0x7f6b50366740) = 0
mprotect(0x605000, 4096, PROT_READ)     = 0
mprotect(0x3d03dbe000, 16384, PROT_READ) = 0
mprotect(0x3d03821000, 4096, PROT_READ) = 0
munmap(0x7f6b50369000, 166679)          = 0
brk(0)                                  = 0x10db000
brk(0x10fc000)                          = 0x10fc000
brk(0)                                  = 0x10fc000
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=106366160, ...}) = 0
mmap(NULL, 106366160, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f6b49df5000
close(3)                                = 0
sync(

Comment 13 Tom London 2013-12-03 15:17:04 UTC
Works for me with latest nodebug kernel kernel-3.12.2-3.fc21.x86_64:

[tbl@tlondon ~]$ strace sync
execve("/usr/bin/sync", ["sync"], [/* 58 vars */]) = 0
brk(0)                                  = 0x159c000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0a5adfb000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=166679, ...}) = 0
mmap(NULL, 166679, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f0a5add2000
close(3)                                = 0
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\37\242\3=\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2161008, ...}) = 0
mmap(0x3d03a00000, 3965504, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3d03a00000
mprotect(0x3d03bbe000, 2097152, PROT_NONE) = 0
mmap(0x3d03dbe000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1be000) = 0x3d03dbe000
mmap(0x3d03dc4000, 16960, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3d03dc4000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0a5add1000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0a5adcf000
arch_prctl(ARCH_SET_FS, 0x7f0a5adcf740) = 0
mprotect(0x605000, 4096, PROT_READ)     = 0
mprotect(0x3d03dbe000, 16384, PROT_READ) = 0
mprotect(0x3d03821000, 4096, PROT_READ) = 0
munmap(0x7f0a5add2000, 166679)          = 0
brk(0)                                  = 0x159c000
brk(0x15bd000)                          = 0x15bd000
brk(0)                                  = 0x15bd000
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=106366160, ...}) = 0
mmap(NULL, 106366160, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f0a5485e000
close(3)                                = 0
sync()                                  = 0
close(1)                                = 0
close(2)                                = 0
exit_group(0)                           = ?
+++ exited with 0 +++
[tbl@tlondon ~]$

Comment 14 Andre Robatino 2013-12-05 13:12:04 UTC
Broken for me when using a 3.13 kernel (including 3.13.0-0.rc2.git2.1.fc21.x86_64). Worked with 3.12.

Comment 15 Tom London 2013-12-06 14:49:52 UTC
Still broken with kernel-3.13.0-0.rc2.git3.1.fc21.x86_64:

Dec 06 06:47:40 tlondon.localhost.org kernel: INFO: task sync:3308 blocked for more than 120 seconds.
Dec 06 06:47:41 tlondon.localhost.org kernel:       Not tainted 3.13.0-0.rc2.git3.1.fc21.x86_64 #1
Dec 06 06:47:41 tlondon.localhost.org kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 06 06:47:41 tlondon.localhost.org kernel: sync            D ffffffff81c134c0  4272  3308   3305 0x00000081
Dec 06 06:47:41 tlondon.localhost.org kernel:  ffff88000bc7fe48 0000000000000046 ffff880113280000 ffff88000bc7ffd8
Dec 06 06:47:41 tlondon.localhost.org kernel:  00000000001d5200 00000000001d5200 ffff880113280000 ffff880113280000
Dec 06 06:47:41 tlondon.localhost.org kernel:  ffff88009c5d4588 fffffffeffffffff ffff88009c5d4590 ffffffff81230f30
Dec 06 06:47:41 tlondon.localhost.org kernel: Call Trace:
Dec 06 06:47:41 tlondon.localhost.org kernel:  [<ffffffff81230f30>] ? generic_write_sync+0x60/0x60
Dec 06 06:47:41 tlondon.localhost.org kernel:  [<ffffffff81758d49>] schedule+0x29/0x70
Dec 06 06:47:41 tlondon.localhost.org kernel:  [<ffffffff8175ce55>] rwsem_down_read_failed+0xc5/0x120
Dec 06 06:47:41 tlondon.localhost.org kernel:  [<ffffffff8138ebd4>] call_rwsem_down_read_failed+0x14/0x30
Dec 06 06:47:41 tlondon.localhost.org kernel:  [<ffffffff8175c7c3>] ? down_read+0x83/0xa0
Dec 06 06:47:41 tlondon.localhost.org kernel:  [<ffffffff811fdddc>] ? iterate_supers+0x9c/0x110
Dec 06 06:47:41 tlondon.localhost.org kernel:  [<ffffffff811fdddc>] iterate_supers+0x9c/0x110
Dec 06 06:47:41 tlondon.localhost.org kernel:  [<ffffffff812311e2>] sys_sync+0x42/0xa0
Dec 06 06:47:41 tlondon.localhost.org kernel:  [<ffffffff81767c50>] tracesys+0xdd/0xe2
Dec 06 06:47:41 tlondon.localhost.org kernel: 1 lock held by sync/3308:
Dec 06 06:47:41 tlondon.localhost.org kernel:  #0:  (&type->s_umount_key#48){.+.+..}, at: [<ffffffff811fdddc>] iterate_supers+0x9c/0x110



[tbl@tlondon ~]$ strace sync
execve("/usr/bin/sync", ["sync"], [/* 58 vars */]) = 0
brk(0)                                  = 0x2573000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6515b80000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=166473, ...}) = 0
mmap(NULL, 166473, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f6515b57000
close(3)                                = 0
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\37\302\2674\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2156912, ...}) = 0
mmap(0x34b7c00000, 3961408, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x34b7c00000
mprotect(0x34b7dbe000, 2093056, PROT_NONE) = 0
mmap(0x34b7fbd000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1bd000) = 0x34b7fbd000
mmap(0x34b7fc3000, 16960, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x34b7fc3000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6515b56000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6515b54000
arch_prctl(ARCH_SET_FS, 0x7f6515b54740) = 0
mprotect(0x605000, 4096, PROT_READ)     = 0
mprotect(0x34b7fbd000, 16384, PROT_READ) = 0
mprotect(0x34b7a21000, 4096, PROT_READ) = 0
munmap(0x7f6515b57000, 166473)          = 0
brk(0)                                  = 0x2573000
brk(0x2594000)                          = 0x2594000
brk(0)                                  = 0x2594000
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=106366160, ...}) = 0
mmap(NULL, 106366160, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f650f5e3000
close(3)                                = 0
sync(

Comment 16 Tom London 2013-12-07 17:36:54 UTC
Same hang with kernel-3.13.0-0.rc2.git6.1.fc21.x86_64

Comment 17 Andre Robatino 2013-12-08 13:14:05 UTC
I've found no hint of this issue on the LKML mailing list. Seems to be Fedora only.

Comment 18 Josh Boyer 2013-12-08 16:47:16 UTC
(In reply to Andre Robatino from comment #17)
> I've found no hint of this issue on the LKML mailing list. Seems to be
> Fedora only.

It not being on LKML doesn't really mean anything.  It could just mean it's difficult to hit unless you have the right variables in play.

I hit this once with 3.13.0-0.rc1.git3, but haven't been able to hit it again.  I'll poke at it more tomorrow.

Tom, so far we know 3.12 doesn't have this issue for you.  Since you seem to be able to hit this fairly easily, could you go through the rc0.gitX builds in koji and try them one at a time to see which kernel first introduces the issue for you?

http://koji.fedoraproject.org/koji/packageinfo?packageID=8

Also, if there are things you're doing before running sync, that would be helpful to know.  Or do you get the sync hang if you run sync immediately after boot?

Comment 19 Tom London 2013-12-08 20:31:16 UTC
The last "working for me" kernel is kernel-3.13.0-0.rc0.git10.1.fc21.x86_64 (I'll retest this just to be sure).

The first "doesn't work for me kernel" is (as reported above) kernel-3.13.0-0.rc1.git0.1.fc21.x86_64.

Also, the 3.12 nodebug kernels work for me: I'm currently booting kernel-3.12.2-3.fc21.x86_64.

I tried to bisect this on my own (using wiki https://fedoraproject.org/wiki/User:Ignatenkobrain/Kernel/Bisection as a cookbook). Unfortunately, after the first step, the builds fail:

~/build/BUILD/kernel-3.11.fc21/linux-3.12.0-0.rc2.git999.1.3ea7a560.fc21.x86_64
/var/tmp/rpm-tmp.yTd1pO: line 295: pushd: tools/thermal/tmon/: No such file or directory
error: Bad exit status from /var/tmp/rpm-tmp.yTd1pO (%build)
RPM build errors:
    Bad exit status from /var/tmp/rpm-tmp.yTd1pO (%build)
Child return code was: 1

I tried skipping once, but the error remained. (I must be doing something wrong....).

This is how I set it up:

    git bisect bad v3.13-rc1
    git bisect good  v3.12-10928-g527d151
    ~/kernel-package/kernel-package.py 
    mock -r fedora-rawhide-x86_64 --rebuild sources/*.src.rpm --resultdir sources/rpms

[BTW, I discovered a few things attempting this. 1) If 'sync' hangs, then so does mock, and 2) kernel builds take a really long time, especially on my Thinkpad X200 .... ;-)]

From the reports on this ticket, I think I've tried most of the kernels after kernel-3.13.0-0.rc1.git0.1.fc21.x86_64. Would it be helpful for me to be comprehensive?

Regarding how I setup/run sync: I boot to gnome, bring up a terminal window, and run either 'sync' or 'strace sync'. Appears to hang if I do this immediately, or if I do some "other stuff" and then try to sync.

Comment 20 Tom London 2013-12-08 20:49:08 UTC
Yup, works with kernel-3.13.0-0.rc0.git10.1.fc21.x86_64

[tbl@tlondon ~]$ uname -a
Linux tlondon.localhost.org 3.13.0-0.rc0.git10.1.fc21.x86_64 #1 SMP Thu Nov 21 01:03:12 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
[tbl@tlondon ~]$ strace sync
execve("/usr/bin/sync", ["sync"], [/* 58 vars */]) = 0
brk(0)                                  = 0xad6000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7faaf4a2a000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=166473, ...}) = 0
mmap(NULL, 166473, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7faaf4a01000
close(3)                                = 0
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\37\302\2674\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2156912, ...}) = 0
mmap(0x34b7c00000, 3961408, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x34b7c00000
mprotect(0x34b7dbe000, 2093056, PROT_NONE) = 0
mmap(0x34b7fbd000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1bd000) = 0x34b7fbd000
mmap(0x34b7fc3000, 16960, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x34b7fc3000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7faaf4a00000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7faaf49fe000
arch_prctl(ARCH_SET_FS, 0x7faaf49fe740) = 0
mprotect(0x605000, 4096, PROT_READ)     = 0
mprotect(0x34b7fbd000, 16384, PROT_READ) = 0
mprotect(0x34b7a21000, 4096, PROT_READ) = 0
munmap(0x7faaf4a01000, 166473)          = 0
brk(0)                                  = 0xad6000
brk(0xaf7000)                           = 0xaf7000
brk(0)                                  = 0xaf7000
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=106366160, ...}) = 0
mmap(NULL, 106366160, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7faaee48d000
close(3)                                = 0
sync()                                  = 0
close(1)                                = 0
close(2)                                = 0
exit_group(0)                           = ?
+++ exited with 0 +++
[tbl@tlondon ~]$

Comment 21 Josh Boyer 2013-12-09 14:32:32 UTC
Tom, can you install the kernel here:

http://jwboyer.fedorapeople.org/pub/bug1033965/

and see if it recreates?  It's between rc0-git10 and rc1.  At a minimum it will help narrow down the range where this broke.

Comment 22 Tom London 2013-12-09 14:47:34 UTC
kernel-3.13.0-0.rc0.git11.1.fc21.x86_64.rpm still hangs for me:

Dec 09 06:45:13 tlondon.localhost.org kernel: INFO: task sync:2300 blocked for more than 120 seconds.
Dec 09 06:45:13 tlondon.localhost.org kernel:       Not tainted 3.13.0-0.rc0.git11.1.fc21.x86_64 #1
Dec 09 06:45:13 tlondon.localhost.org kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 09 06:45:13 tlondon.localhost.org kernel: sync            D ffff880134d499d0  4280  2300   2297 0x00000081
Dec 09 06:45:13 tlondon.localhost.org kernel:  ffff880062973e48 0000000000000046 ffff880083444d70 ffff880062973fd8
Dec 09 06:45:13 tlondon.localhost.org kernel:  00000000001d5200 00000000001d5200 ffff880083444d70 ffff880083444d70
Dec 09 06:45:13 tlondon.localhost.org kernel:  ffff8800989fd6d0 fffffffeffffffff ffff8800989fd6d8 ffffffff81231060
Dec 09 06:45:13 tlondon.localhost.org kernel: Call Trace:
Dec 09 06:45:13 tlondon.localhost.org kernel:  [<ffffffff81231060>] ? generic_write_sync+0x60/0x60
Dec 09 06:45:13 tlondon.localhost.org kernel:  [<ffffffff81758f19>] schedule+0x29/0x70
Dec 09 06:45:13 tlondon.localhost.org kernel:  [<ffffffff8175cf85>] rwsem_down_read_failed+0xc5/0x120
Dec 09 06:45:13 tlondon.localhost.org kernel:  [<ffffffff8138f014>] call_rwsem_down_read_failed+0x14/0x30
Dec 09 06:45:13 tlondon.localhost.org kernel:  [<ffffffff8175c8f3>] ? down_read+0x83/0xa0
Dec 09 06:45:13 tlondon.localhost.org kernel:  [<ffffffff811fddfc>] ? iterate_supers+0x9c/0x110
Dec 09 06:45:13 tlondon.localhost.org kernel:  [<ffffffff811fddfc>] iterate_supers+0x9c/0x110
Dec 09 06:45:13 tlondon.localhost.org kernel:  [<ffffffff81231312>] sys_sync+0x42/0xa0
Dec 09 06:45:13 tlondon.localhost.org kernel:  [<ffffffff81767dd0>] tracesys+0xdd/0xe2
Dec 09 06:45:13 tlondon.localhost.org kernel: 1 lock held by sync/2300:
Dec 09 06:45:13 tlondon.localhost.org kernel:  #0:  (&type->s_umount_key#47){.+.+..}, at: [<ffffffff811fddfc>] iterate_supers+0x9c/0x110

Comment 23 Andre Robatino 2013-12-10 13:03:52 UTC
Same problem with kernel-3.13.0-0.rc3.git0.2.fc21.x86_64. If I go directly to a VT and do a sync there, before logging into GNOME, it works. After logging into GNOME, it fails.

Comment 24 Tom London 2013-12-10 15:24:04 UTC
Wow.... Nice....I didn't catch that.

I installed kernel-3.13.0-0.rc3.git0.2.fc21.x86_64 and rebooted (permissive) to run level 3.

Logged in as root and ran 'strace sync'. It completed. I repeated. It completed.

I then ran 'telinit 5', logged in to gnome/gdm, created terminal window, and ran 'strace sync' as root. It hung as above.

Dazed, confused, ...

Comment 25 Josh Boyer 2013-12-10 15:44:16 UTC
I can recreate this by logging into gnome as well.  It also recreates on the raw upstream git commit corresponding to the Fedora git11 snapshot, so it isn't Fedora specific.  Doing a git bisect at the moment.

The only relevant thing I've noticed when sync hangs:

Showing all locks held in the system:
[  243.534376] 2 locks held by fusermount/1415:
[  243.534377]  #0:  (&type->s_umount_key#46/1){+.+.+.}, at: [<ffffffff811fd4b7>] sget+0x237/0x460
[  243.534383]  #1:  (&sbsec->lock){+.+.+.}, at: [<ffffffff8131c15c>] selinux_set_mnt_opts+0x7c/0x680
[  243.534402] 1 lock held by sync/2078:
[  243.534403]  #0:  (&type->s_umount_key#52){.+.+..}, at: [<ffffffff811fddfc>] iterate_supers+0x9c/0x110
[  243.534408] 3 locks held by bash/2229:
[  243.534409]  #0:  (sb_writers#4){.+.+.+}, at: [<ffffffff811fa243>] vfs_write+0x173/0x1f0
[  243.534413]  #1:  (sysrq_key_table_lock){......}, at: [<ffffffff8146aac4>] __handle_sysrq+0x24/0x1a0
[  243.534417]  #2:  (tasklist_lock){.+.?..}, at: [<ffffffff810ccf73>] debug_show_all_locks+0x43/0x2a0

Comment 26 Andre Robatino 2013-12-10 15:51:42 UTC
@Tom: I didn't have to boot in runlevel 3. I just booted in runlevel 5, as usual, then logged into a VT instead of GNOME, and ran sync, which worked. After logging into GNOME, it doesn't work (in either GNOME itself or the VT).

Comment 27 Josh Boyer 2013-12-10 19:53:26 UTC
I bisected this down to the commit below.  Eric, Anand, thoughts?

commit 102aefdda4d8275ce7d7100bc16c88c74272b260
Author: Anand Avati <avati>
Date:   Tue Apr 16 18:56:19 2013 -0400

    selinux: consider filesystem subtype in policies
    
    Not considering sub filesystem has the following limitation. Support
    for SELinux in FUSE is dependent on the particular userspace
    filesystem, which is identified by the subtype. For e.g, GlusterFS,
    a FUSE based filesystem supports SELinux (by mounting and processing
    FUSE requests in different threads, avoiding the mount time
    deadlock), whereas other FUSE based filesystems (identified by a
    different subtype) have the mount time deadlock.
    
    By considering the subtype of the filesytem in the SELinux policies,
    allows us to specify a filesystem subtype, in the following way:
    
    fs_use_xattr fuse.glusterfs gen_context(system_u:object_r:fs_t,s0);
    
    This way not all FUSE filesystems are put in the same bucket and
    subjected to the limitations of the other subtypes.
    
    Signed-off-by: Anand Avati <avati>
    Signed-off-by: Eric Paris <eparis>

The backtraces for the two processes involved in the locks are here:

[  152.923866] fusermount      S ffff88031cee5c00  4176  1420   1401 0x00000080
[  152.923869]  ffff880309367c10 0000000000000046 00000000001d5140 ffff880309367fd8
[  152.923873]  ffff880309367fd8 00000000001d5140 ffff88030954ae00 ffff8803090d67b0
[  152.923876]  ffff880309367c50 ffff8803090d69b0 0000000000000000 ffff88030954ae00
[  152.923880] Call Trace:
[  152.923883]  [<ffffffff81705959>] schedule+0x29/0x70
[  152.923888]  [<ffffffffa0661d15>] __fuse_get_req+0x185/0x270 [fuse]
[  152.923892]  [<ffffffff81096850>] ? wake_up_bit+0x30/0x30
[  152.923896]  [<ffffffffa0661e10>] fuse_get_req+0x10/0x20 [fuse]
[  152.923900]  [<ffffffffa06651df>] fuse_getxattr+0x4f/0x160 [fuse]
[  152.923903]  [<ffffffff812eb4b5>] sb_finish_set_opts+0x215/0x340
[  152.923906]  [<ffffffff812eb841>] selinux_set_mnt_opts+0x261/0x610
[  152.923909]  [<ffffffff812e5e2a>] ? selinux_parse_opts_str+0x1ba/0x2a0
[  152.923912]  [<ffffffff812ebc77>] selinux_sb_kern_mount+0x87/0x150
[  152.923916]  [<ffffffff812e0cf6>] security_sb_kern_mount+0x16/0x20
[  152.923919]  [<ffffffff811da20a>] mount_fs+0x8a/0x1b0
[  152.923922]  [<ffffffff811f7aa3>] vfs_kern_mount+0x63/0xf0
[  152.923925]  [<ffffffff811fa36e>] do_mount+0x23e/0xa20
[  152.923928]  [<ffffffff81166964>] ? __get_free_pages+0x14/0x50
[  152.923931]  [<ffffffff811f9fb6>] ? copy_mount_options+0x36/0x170
[  152.923934]  [<ffffffff811fabd3>] SyS_mount+0x83/0xc0
[  152.923937]  [<ffffffff81711499>] system_call_fastpath+0x16/0x1b

[  152.936687] sync            D ffff88031cee1700  4176  2023   1987 0x00000080
[  152.936691]  ffff8802fc507e48 0000000000000046 00000000001d5140 ffff8802fc507fd8
[  152.936694]  ffff8802fc507fd8 00000000001d5140 ffff8802da9e9700 ffff8802da9e9700
[  152.936697]  ffff8803090d11b0 fffffffeffffffff ffff8803090d11b8 ffffffff81209340
[  152.936701] Call Trace:
[  152.936704]  [<ffffffff81209340>] ? generic_write_sync+0x70/0x70
[  152.936708]  [<ffffffff81705959>] schedule+0x29/0x70
[  152.936710]  [<ffffffff81706f5d>] rwsem_down_read_failed+0xbd/0x120
[  152.936714]  [<ffffffff81357de4>] call_rwsem_down_read_failed+0x14/0x30
[  152.936717]  [<ffffffff81704b53>] ? down_read+0x83/0xa0
[  152.936721]  [<ffffffff811d9b5c>] ? iterate_supers+0x9c/0x110
[  152.936724]  [<ffffffff811d9b5c>] iterate_supers+0x9c/0x110
[  152.936727]  [<ffffffff812095c5>] sys_sync+0x35/0x90
[  152.936730]  [<ffffffff81711499>] system_call_fastpath+0x16/0x1b


Showing all locks held in the system:
[  152.938636] 2 locks held by fusermount/1420:
[  152.938637]  #0:  (&type->s_umount_key#44/1){+.+.+.}, at: [<ffffffff811d919a>] sget+0x2ca/0x660
[  152.938646]  #1:  (&sbsec->lock){+.+.+.}, at: [<ffffffff812eb654>] selinux_set_mnt_opts+0x74/0x610
[  152.938668] 1 lock held by sync/2023:
[  152.938669]  #0:  (&type->s_umount_key#50){.+.+..}, at: [<ffffffff811d9b5c>] iterate_supers+0x9c/0x110

Comment 28 Eric Sandeen 2013-12-10 21:56:37 UTC
I saw a tentative patch from eparis, so tag, you're it.  ;)

Comment 29 Josh Boyer 2013-12-10 23:30:34 UTC
I reverted the commit that causes the hang.  It should show up with kernel-3.13.0-0.rc3.git1.2.fc21 in tomorrow's rawhide.

I'll leave the bug in modified until we figure out what the upstream 3.13 solution will be.

Comment 30 Andre Robatino 2013-12-11 12:57:12 UTC
Appears fixed in kernel-3.13.0-0.rc3.git1.2.fc21.x86_64. Sync and shutdown work. I was not able to do logout or shutdown graphically from a GNOME login session, but it does work after killing X with Ctrl-Alt-Backspace and then using "Power off" from the gdm screen.

Comment 31 Tom London 2013-12-11 14:53:19 UTC
Confirmed: works for me as well. [Didn't have Andre's gdm/gnome shutdown issue.]

Comment 32 Andre Robatino 2013-12-11 22:49:57 UTC
Tried again, and it worked normally without the GNOME shutdown issue. I've seen the same issue occasionally before, but not often, so only mentioned it on the off chance it might have been related.

Comment 33 Eric Sandeen 2013-12-11 22:52:24 UTC
jwb, thanks for digging through that bisect!

Comment 34 Josh Boyer 2013-12-16 13:17:45 UTC
Upstream went with the revert for now as well.  Closing this out.