Description of problem:
--------------------------
On a 1x3 volume (with profiling enabled and shd-max-threads set to 32) I was running a Perl script that does the following: open a file and keep writing to it with overlapping writes in an infinite loop.

While this was going on, I rebooted n1 (i.e. client-c0) (killing a brick has the same effect). After that I saw several abnormalities:
1) Running ls -l from another mount took a very long time, even to display just 10 files.
2) gluster v heal info did not respond even after 5 minutes (only one file needed to be healed).
3) The fuse client process (glusterfs) from which the Perl script was run had consumed 5.5 GB of memory.
4) The file sizes on the backend were different on each of the 1x3 bricks.

Version-Release number of selected component (if applicable):
===================
3.8.4-54.12

How reproducible:
--------------
1/1

Steps to Reproduce:
1. Create a 1x3 volume and mount it on a fuse client.
2. Run the attached Perl script against an input file where I/O must happen (infinite loop of overlapping writes), e.g. script.pl <some_filename>.
3. Now reboot a brick node (or kill a brick process).
Created attachment 1451270 [details] run as "
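The attached reproducer itself is not reproduced in this comment. Purely to illustrate the kind of workload described above, a script along these lines could be used; this is a hypothetical sketch, not the actual attachment, and the buffer size, overlap step and wrap-around point are assumptions:

#!/usr/bin/perl
# Hypothetical reproducer sketch (not the attached script): open a file on the
# fuse mount and keep issuing overlapping writes in an infinite loop.
use strict;
use warnings;
use Fcntl qw(SEEK_SET);
use IO::Handle;

my $file = shift @ARGV or die "Usage: $0 <some_filename>\n";

open(my $fh, '+>', $file) or die "open $file: $!\n";
$fh->autoflush(1);

my $buf    = 'A' x 65536;   # 64 KB payload (assumed size)
my $offset = 0;

while (1) {
    # Each write overlaps the second half of the previous one.
    seek($fh, $offset, SEEK_SET) or die "seek: $!\n";
    print {$fh} $buf            or die "write: $!\n";
    $offset += length($buf) / 2;
    $offset  = 0 if $offset > 100 * 1024 * 1024;   # wrap around at ~100 MB (assumed)
}

It would be run against a file on the fuse mount (e.g. "perl script.pl /mnt/cross3/testfile", mount path hypothetical) while one brick is rebooted or killed.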
FYI, there were some kernel hung-task messages on the fuse client (possibly during the ls, but I am not sure):

[Thu Jun 14 13:03:49 2018] INFO: task glusterfs:7383 blocked for more than 120 seconds.
[Thu Jun 14 13:03:49 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Jun 14 13:03:49 2018] glusterfs       D ffffa0c833ee5ee0     0  7383      1 0x00000080
[Thu Jun 14 13:03:49 2018] Call Trace:
[Thu Jun 14 13:03:49 2018]  [<ffffffffb50db128>] ? enqueue_task_fair+0x208/0x6c0
[Thu Jun 14 13:03:49 2018]  [<ffffffffb50d22a5>] ? sched_clock_cpu+0x85/0xc0
[Thu Jun 14 13:03:49 2018]  [<ffffffffb5711f00>] ? bit_wait+0x50/0x50
[Thu Jun 14 13:03:49 2018]  [<ffffffffb5713f79>] schedule+0x29/0x70
[Thu Jun 14 13:03:49 2018]  [<ffffffffb57118e9>] schedule_timeout+0x239/0x2c0
[Thu Jun 14 13:03:49 2018]  [<ffffffffb5711f00>] ? bit_wait+0x50/0x50
[Thu Jun 14 13:03:49 2018]  [<ffffffffb571348d>] io_schedule_timeout+0xad/0x130
[Thu Jun 14 13:03:49 2018]  [<ffffffffb5713528>] io_schedule+0x18/0x20
[Thu Jun 14 13:03:49 2018]  [<ffffffffb5711f11>] bit_wait_io+0x11/0x50
[Thu Jun 14 13:03:49 2018]  [<ffffffffb5711ac1>] __wait_on_bit_lock+0x61/0xc0
[Thu Jun 14 13:03:49 2018]  [<ffffffffb5193634>] __lock_page+0x74/0x90
[Thu Jun 14 13:03:49 2018]  [<ffffffffb50bc210>] ? wake_bit_function+0x40/0x40
[Thu Jun 14 13:03:49 2018]  [<ffffffffb51a441e>] invalidate_inode_pages2_range+0x43e/0x460
[Thu Jun 14 13:03:49 2018]  [<ffffffffc0ac4f76>] fuse_reverse_inval_inode+0x66/0x90 [fuse]
[Thu Jun 14 13:03:49 2018]  [<ffffffffc0ab8a18>] fuse_notify+0x168/0x810 [fuse]
[Thu Jun 14 13:03:49 2018]  [<ffffffffc0ab7826>] ? fuse_copy_do+0x46/0x90 [fuse]
[Thu Jun 14 13:03:49 2018]  [<ffffffffc0aba0f7>] fuse_dev_do_write+0x157/0x3f0 [fuse]
[Thu Jun 14 13:03:49 2018]  [<ffffffffc0aba729>] fuse_dev_write+0x79/0xa0 [fuse]
[Thu Jun 14 13:03:49 2018]  [<ffffffffb521a533>] do_sync_write+0x93/0xe0
[Thu Jun 14 13:03:49 2018]  [<ffffffffb521b010>] vfs_write+0xc0/0x1f0
[Thu Jun 14 13:03:49 2018]  [<ffffffffb572077b>] ? system_call_after_swapgs+0xc8/0x160
[Thu Jun 14 13:03:49 2018]  [<ffffffffb521be3f>] SyS_write+0x7f/0xf0
[Thu Jun 14 13:03:49 2018]  [<ffffffffb572077b>] ? system_call_after_swapgs+0xc8/0x160
[Thu Jun 14 13:03:49 2018]  [<ffffffffb572082f>] system_call_fastpath+0x1c/0x21
[Thu Jun 14 13:03:49 2018]  [<ffffffffb572077b>] ? system_call_after_swapgs+0xc8/0x160

The identical hung-task trace for glusterfs:7383 was logged again every 120 seconds thereafter, at 13:05:49, 13:07:49, 13:09:49, 13:11:49, 13:13:49, 13:15:49, 13:17:49, 13:19:49 and 13:21:49.

[root@rhs-client26 ~]# date
Thu Jun 14 16:33:45 IST 2018
Volume Name: cross3
Type: Replicate
Volume ID: 432b19c6-ad2e-47bf-b937-89aa10db2214
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: rhs-arch-srv1.lab.eng.blr.redhat.com:/gluster/brick3/cross3
Brick2: rhs-arch-srv2.lab.eng.blr.redhat.com:/gluster/brick3/cross3
Brick3: rhs-arch-srv4.lab.eng.blr.redhat.com:/gluster/brick3/cross3
Options Reconfigured:
cluster.post-op-delay-secs: 1
cluster.eager-lock: on
cluster.shd-max-threads: 32
transport.address-family: inet
nfs.disable: on
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on

[root@rhs-arch-srv2 ~]# gluster v status cross3
Status of volume: cross3
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick rhs-arch-srv1.lab.eng.blr.redhat.com:
/gluster/brick3/cross3                      49152     0          Y       4438
Brick rhs-arch-srv2.lab.eng.blr.redhat.com:
/gluster/brick3/cross3                      49155     0          Y       28799
Brick rhs-arch-srv4.lab.eng.blr.redhat.com:
/gluster/brick3/cross3                      49154     0          Y       20078
Self-heal Daemon on localhost               N/A       N/A        Y       28820
Self-heal Daemon on rhs-arch-srv1.lab.eng.b
lr.redhat.com                               N/A       N/A        Y       25013
Self-heal Daemon on rhs-arch-srv4.lab.eng.b
lr.redhat.com                               N/A       N/A        Y       17605

Task Status of Volume cross3
------------------------------------------------------------------------------
There are no active volume tasks

[root@rhs-arch-srv2 ~]#
glusterfsd resource details:

[root@rhs-arch-srv1 01PERL]# top -n 1 -b|egrep "gluster|RES"
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2695 root      20   0  672776  10680   4392 S   0.0  0.1   0:03.40 glusterd
 4438 root      20   0 3190444   1.2g   4792 S   0.0  7.8  89:34.41 glusterfsd
25013 root      20   0 1371120  20960   3388 S   0.0  0.1   0:11.96 glusterfs

[root@rhs-arch-srv2 28799]# top -n 1 -b|egrep "gluster|RES"
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
28799 root      20   0 4245780   2.1g   4832 S   0.0 13.4   1323:35 glusterfsd
28820 root      20   0 1354728  12204   3424 S   0.0  0.1   0:10.14 glusterfs
29969 root      20   0  672776  13872   4524 S   0.0  0.1   0:09.39 glusterd

[root@rhs-arch-srv4 ~]# top -n 1 -b|egrep "gluster|RES"
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
17605 root      20   0 1362924  11388   3424 S   0.0  0.1  31:27.77 glusterfs
19792 root      20   0  672776  11544   4460 S   0.0  0.1   0:09.61 glusterd
20078 root      20   0 2450856 330316   4944 S   0.0  2.0 959:07.62 glusterfsd

Restarted the volume, and heal info is as below:

[root@rhs-arch-srv2 ~]# gluster v heal cross3 info
Brick rhs-arch-srv1.lab.eng.blr.redhat.com:/gluster/brick3/cross3
Status: Connected
Number of entries: 0

Brick rhs-arch-srv2.lab.eng.blr.redhat.com:/gluster/brick3/cross3
<gfid:519f0014-a343-477d-ada7-9a89bdac0e2b>
Status: Connected
Number of entries: 1

Brick rhs-arch-srv4.lab.eng.blr.redhat.com:/gluster/brick3/cross3
Status: Connected
Number of entries: 0
I am able to recreate it with the script from Nag. I am finding locks that are not released at all, even after 10 minutes (most likely these are stale locks). I need to figure out why these locks are not being released: they cause I/O hangs, which in turn pile up blocked locks and make every operation take a very long time. I will update once I find the root cause.
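For reference, one way to look for locks that stay blocked like this is to take a brick statedump and count the lock entries in the dump files. The helper below is only an illustrative sketch: the statedump directory, the dump file naming, and the ACTIVE/BLOCKED markers on the inodelk lines are assumptions and can vary between gluster versions.

#!/usr/bin/perl
# Illustrative helper (not part of this report): trigger a statedump for the
# volume's bricks and count active vs. blocked inode locks in the dump files.
use strict;
use warnings;

my $volname       = shift @ARGV || 'cross3';
my $statedump_dir = '/var/run/gluster';   # assumed default statedump path

system('gluster', 'volume', 'statedump', $volname) == 0
    or die "gluster volume statedump $volname failed\n";
sleep 2;   # give the brick processes a moment to write the dump files

for my $dump (glob "$statedump_dir/*.dump.*") {
    open(my $fh, '<', $dump) or next;
    my ($active, $blocked) = (0, 0);
    while (my $line = <$fh>) {
        $active++  if $line =~ /inodelk.*ACTIVE/;
        $blocked++ if $line =~ /inodelk.*BLOCKED/;
    }
    close $fh;
    printf "%s: %d active, %d blocked inode locks\n", $dump, $active, $blocked;
}

A steadily growing count of BLOCKED entries while the write loop is running would point at the same pile-up of lock requests described above.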
Observations so far:
1) I tried to find whether there are any stale-lock issues, but if I kill the perl process everything comes back to normal.
2) I went through both the old and new code bases once more and compared them; I did not find any issues.
3) I was also able to recreate the issue with eager-lock off.
4) I minimized the steps to recreate to: a) run I/O with the script Nag gave, b) run gluster volume heal <volname> info.
5) I did a lot of debugging to see if there are any problems in the locking code, but did not find any, even in the no-eager-lock case.
6) As a last resort I tried the heal-info case from step 4 on rhgs-3.3.1, which does not have my fixes. I was able to recreate the stuck locks and the long lock acquisition times with eager-lock off even on that stable release.

So at the moment I am of the opinion that the application Nag wrote creates a lot of locks in the system, which slows down the entire transaction and leads to this problem. I will spend some more time tomorrow trying to recreate the issue on rhgs-3.3.1 with eager-lock on, with more debug logging, to see whether this problem has always existed. Good script, Nag :-).
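To make the minimized reproduction in observation 4 easier to watch, a small wrapper can time each heal-info invocation while the write loop is running. This is an illustrative sketch only; the volume name and the 60-second warning threshold are assumptions.

#!/usr/bin/perl
# Illustrative sketch: repeatedly time 'gluster volume heal <vol> info' while
# the overlapping-write script is running, to catch the "not responding" case.
use strict;
use warnings;
use Time::HiRes qw(time);

my $volname = shift @ARGV || 'cross3';

while (1) {
    my $start  = time();
    my $output = `gluster volume heal $volname info 2>&1`;
    my $took   = time() - $start;
    my ($entries) = $output =~ /Number of entries:\s*(\d+)/;
    printf "heal info took %.1fs (first brick entries: %s)\n",
           $took, defined $entries ? $entries : 'unknown';
    warn "heal info is taking suspiciously long (> 60s)\n" if $took > 60;
    sleep 30;
}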
The following patch fixed the issue for me, but it can affect I/O performance while self-heal is in progress. Considering that this is a day-1 issue, that the application's I/O pattern is what triggers it, and that we have not seen this kind of workload in the five years this feature has existed, at the moment I am thinking of fixing the issue in the locks xlator data structures, which is a very big change and cannot be done for 3.3.1 or 3.4.0. If we see a similar issue with any of the existing workloads such as block/RHHI, then I am open to sending the fix for this issue:

diff --git a/xlators/cluster/afr/src/afr-transaction.c b/xlators/cluster/afr/src/afr-transaction.c
index 89d03c015..1ecdac78a 100644
--- a/xlators/cluster/afr/src/afr-transaction.c
+++ b/xlators/cluster/afr/src/afr-transaction.c
@@ -2176,9 +2176,7 @@ __afr_eager_lock_handle (afr_local_t *local, gf_boolean_t *take_lock,
         afr_local_t *owner_local = NULL;
         xlator_t *this = local->transaction.frame->this;
 
-        if (local->fd && !afr_are_multiple_fds_opened (local, this)) {
-                local->transaction.eager_lock_on = _gf_true;
-        }
+        local->transaction.eager_lock_on = _gf_true;
 
         lock = &local->inode_ctx->lock[local->transaction.type];
         if (__need_previous_lock_unlocked (local)) {
@@ -2197,8 +2195,6 @@ __afr_eager_lock_handle (afr_local_t *local, gf_boolean_t *take_lock,
                                 lock->delay_timer = NULL;
                         }
                 }
-                if (!local->transaction.eager_lock_on)
-                        goto out;
         }
 
         if (lock->release) {
(In reply to Pranith Kumar K from comment #8)
> The following patch fixed the issue for me, but it can affect I/O
> performance while self-heal is in progress. Considering that this is a
> day-1 issue, that the application's I/O pattern is what triggers it, and
> that we have not seen this kind of workload in the five years this feature
> has existed, at the moment I am thinking of fixing the issue in the locks
> xlator data structures, which is a very big change and cannot be done for
> 3.3.1 or 3.4.0. If we see a similar issue with any of the existing
> workloads such as block/RHHI, then I am open to sending the fix for this
> issue.

Nag, I am sending out the patch on master. Will it be possible to add a glusto case for this?

Pranith.
I'm not sure this bug needs to continue with severity 'urgent', though.

Pranith - what would it take for us to get this bug fixed upstream? I see you have proposed a patch which is already being merged upstream. So if we were to pull this into one of the batch updates, what else would we need to consider apart from performance?
(In reply to Atin Mukherjee from comment #12)
> I'm not sure this bug needs to continue with severity 'urgent', though.
> 
> Pranith - what would it take for us to get this bug fixed upstream? I see
> you have proposed a patch which is already being merged upstream. So if we
> were to pull this into one of the batch updates, what else would we need
> to consider apart from performance?

I believe the patch that fixes this issue already went in as part of the change that removed eager-lock's dependency on the fd count, which was needed for the performance improvement.

Downstream patch: https://code.engineering.redhat.com/gerrit/150700
Downstream Bug: 1630688

What should be done with this bug in this scenario?
(In reply to Pranith Kumar K from comment #13)
> (In reply to Atin Mukherjee from comment #12)
> > I'm not sure this bug needs to continue with severity 'urgent', though.
> > 
> > Pranith - what would it take for us to get this bug fixed upstream? I see
> > you have proposed a patch which is already being merged upstream. So if
> > we were to pull this into one of the batch updates, what else would we
> > need to consider apart from performance?
> 
> I believe the patch that fixes this issue already went in as part of the
> change that removed eager-lock's dependency on the fd count, which was
> needed for the performance improvement.
> 
> Downstream patch: https://code.engineering.redhat.com/gerrit/150700
> Downstream Bug: 1630688
> 
> What should be done with this bug in this scenario?

If you think this bug has been fixed with the above patch, you should move it to ON_QA (after getting the necessary acks).
Sunil,

Could you please help here?

Pranith
Verified the fix as per the test plan. All abnormalities seen in comment #0 were fixed. Setting the bug to verified.