Description of problem:
--------------------------
On a 1x3 volume (with profiling enabled and shd-max-threads set to 32) I was running a Perl script that does the following: open a file and keep writing to it with overlapping writes in an infinite loop.

While this was going on, I rebooted n1 (i.e. client-c0) (killing a brick has the same effect). After that I saw several abnormalities:
1) Running ls -l from another mount took a very long time, even to display just 10 files.
2) gluster v heal info did not respond even after 5 minutes (only one file needed to be healed).
3) The fuse client process (glusterfs) from which the Perl script was run had consumed 5.5 GB of memory.
4) The file sizes on the backend were different on each of the 1x3 bricks.

Version-Release number of selected component (if applicable):
===================
3.8.4-54.12

How reproducible:
--------------
1/1

Steps to Reproduce:
1. Create a 1x3 volume and mount it on a fuse client.
2. Run the attached Perl script against an input file where I/O must happen (infinite loop of overlapping writes), e.g. script.pl <some_filename>.
3. Now reboot a brick node (or kill a brick process).
Created attachment 1451270 [details] run as "
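The attached reproducer itself is not reproduced in this comment. Purely to illustrate the kind of workload described above, a script along these lines could be used; this is a hypothetical sketch, not the actual attachment, and the buffer size, overlap step and wrap-around point are assumptions:

#!/usr/bin/perl
# Hypothetical reproducer sketch (not the attached script): open a file on the
# fuse mount and keep issuing overlapping writes in an infinite loop.
use strict;
use warnings;
use Fcntl qw(SEEK_SET);
use IO::Handle;

my $file = shift @ARGV or die "Usage: $0 <some_filename>\n";

open(my $fh, '+>', $file) or die "open $file: $!\n";
$fh->autoflush(1);

my $buf    = 'A' x 65536;   # 64 KB payload (assumed size)
my $offset = 0;

while (1) {
    # Each write overlaps the second half of the previous one.
    seek($fh, $offset, SEEK_SET) or die "seek: $!\n";
    print {$fh} $buf            or die "write: $!\n";
    $offset += length($buf) / 2;
    $offset  = 0 if $offset > 100 * 1024 * 1024;   # wrap around at ~100 MB (assumed)
}

It would be run against a file on the fuse mount (e.g. "perl script.pl /mnt/cross3/testfile", mount path hypothetical) while one brick is rebooted or killed.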
FYI, there were some kernel hung-task messages on the fuse client (possibly during the ls, but I am not sure):

[Thu Jun 14 13:03:49 2018] INFO: task glusterfs:7383 blocked for more than 120 seconds.
[Thu Jun 14 13:03:49 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Jun 14 13:03:49 2018] glusterfs       D ffffa0c833ee5ee0     0  7383      1 0x00000080
[Thu Jun 14 13:03:49 2018] Call Trace:
[Thu Jun 14 13:03:49 2018]  [<ffffffffb50db128>] ? enqueue_task_fair+0x208/0x6c0
[Thu Jun 14 13:03:49 2018]  [<ffffffffb50d22a5>] ? sched_clock_cpu+0x85/0xc0
[Thu Jun 14 13:03:49 2018]  [<ffffffffb5711f00>] ? bit_wait+0x50/0x50
[Thu Jun 14 13:03:49 2018]  [<ffffffffb5713f79>] schedule+0x29/0x70
[Thu Jun 14 13:03:49 2018]  [<ffffffffb57118e9>] schedule_timeout+0x239/0x2c0
[Thu Jun 14 13:03:49 2018]  [<ffffffffb5711f00>] ? bit_wait+0x50/0x50
[Thu Jun 14 13:03:49 2018]  [<ffffffffb571348d>] io_schedule_timeout+0xad/0x130
[Thu Jun 14 13:03:49 2018]  [<ffffffffb5713528>] io_schedule+0x18/0x20
[Thu Jun 14 13:03:49 2018]  [<ffffffffb5711f11>] bit_wait_io+0x11/0x50
[Thu Jun 14 13:03:49 2018]  [<ffffffffb5711ac1>] __wait_on_bit_lock+0x61/0xc0
[Thu Jun 14 13:03:49 2018]  [<ffffffffb5193634>] __lock_page+0x74/0x90
[Thu Jun 14 13:03:49 2018]  [<ffffffffb50bc210>] ? wake_bit_function+0x40/0x40
[Thu Jun 14 13:03:49 2018]  [<ffffffffb51a441e>] invalidate_inode_pages2_range+0x43e/0x460
[Thu Jun 14 13:03:49 2018]  [<ffffffffc0ac4f76>] fuse_reverse_inval_inode+0x66/0x90 [fuse]
[Thu Jun 14 13:03:49 2018]  [<ffffffffc0ab8a18>] fuse_notify+0x168/0x810 [fuse]
[Thu Jun 14 13:03:49 2018]  [<ffffffffc0ab7826>] ? fuse_copy_do+0x46/0x90 [fuse]
[Thu Jun 14 13:03:49 2018]  [<ffffffffc0aba0f7>] fuse_dev_do_write+0x157/0x3f0 [fuse]
[Thu Jun 14 13:03:49 2018]  [<ffffffffc0aba729>] fuse_dev_write+0x79/0xa0 [fuse]
[Thu Jun 14 13:03:49 2018]  [<ffffffffb521a533>] do_sync_write+0x93/0xe0
[Thu Jun 14 13:03:49 2018]  [<ffffffffb521b010>] vfs_write+0xc0/0x1f0
[Thu Jun 14 13:03:49 2018]  [<ffffffffb572077b>] ? system_call_after_swapgs+0xc8/0x160
[Thu Jun 14 13:03:49 2018]  [<ffffffffb521be3f>] SyS_write+0x7f/0xf0
[Thu Jun 14 13:03:49 2018]  [<ffffffffb572077b>] ? system_call_after_swapgs+0xc8/0x160
[Thu Jun 14 13:03:49 2018]  [<ffffffffb572082f>] system_call_fastpath+0x1c/0x21
[Thu Jun 14 13:03:49 2018]  [<ffffffffb572077b>] ? system_call_after_swapgs+0xc8/0x160

The identical hung-task trace for glusterfs:7383 was logged again every 120 seconds thereafter, at 13:05:49, 13:07:49, 13:09:49, 13:11:49, 13:13:49, 13:15:49, 13:17:49, 13:19:49 and 13:21:49.

[root@rhs-client26 ~]# date
Thu Jun 14 16:33:45 IST 2018
Volume Name: cross3
Type: Replicate
Volume ID: 432b19c6-ad2e-47bf-b937-89aa10db2214
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: rhs-arch-srv1.lab.eng.blr.redhat.com:/gluster/brick3/cross3
Brick2: rhs-arch-srv2.lab.eng.blr.redhat.com:/gluster/brick3/cross3
Brick3: rhs-arch-srv4.lab.eng.blr.redhat.com:/gluster/brick3/cross3
Options Reconfigured:
cluster.post-op-delay-secs: 1
cluster.eager-lock: on
cluster.shd-max-threads: 32
transport.address-family: inet
nfs.disable: on
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on

[root@rhs-arch-srv2 ~]# gluster v status cross3
Status of volume: cross3
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick rhs-arch-srv1.lab.eng.blr.redhat.com:
/gluster/brick3/cross3                      49152     0          Y       4438
Brick rhs-arch-srv2.lab.eng.blr.redhat.com:
/gluster/brick3/cross3                      49155     0          Y       28799
Brick rhs-arch-srv4.lab.eng.blr.redhat.com:
/gluster/brick3/cross3                      49154     0          Y       20078
Self-heal Daemon on localhost               N/A       N/A        Y       28820
Self-heal Daemon on rhs-arch-srv1.lab.eng.b
lr.redhat.com                               N/A       N/A        Y       25013
Self-heal Daemon on rhs-arch-srv4.lab.eng.b
lr.redhat.com                               N/A       N/A        Y       17605

Task Status of Volume cross3
------------------------------------------------------------------------------
There are no active volume tasks

[root@rhs-arch-srv2 ~]#
glusterfsd resource details:

[root@rhs-arch-srv1 01PERL]# top -n 1 -b|egrep "gluster|RES"
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2695 root      20   0  672776  10680   4392 S   0.0  0.1   0:03.40 glusterd
 4438 root      20   0 3190444   1.2g   4792 S   0.0  7.8  89:34.41 glusterfsd
25013 root      20   0 1371120  20960   3388 S   0.0  0.1   0:11.96 glusterfs

[root@rhs-arch-srv2 28799]# top -n 1 -b|egrep "gluster|RES"
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
28799 root      20   0 4245780   2.1g   4832 S   0.0 13.4   1323:35 glusterfsd
28820 root      20   0 1354728  12204   3424 S   0.0  0.1   0:10.14 glusterfs
29969 root      20   0  672776  13872   4524 S   0.0  0.1   0:09.39 glusterd

[root@rhs-arch-srv4 ~]# top -n 1 -b|egrep "gluster|RES"
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
17605 root      20   0 1362924  11388   3424 S   0.0  0.1  31:27.77 glusterfs
19792 root      20   0  672776  11544   4460 S   0.0  0.1   0:09.61 glusterd
20078 root      20   0 2450856 330316   4944 S   0.0  2.0 959:07.62 glusterfsd

Restarted the volume, and heal info is as below:

[root@rhs-arch-srv2 ~]# gluster v heal cross3 info
Brick rhs-arch-srv1.lab.eng.blr.redhat.com:/gluster/brick3/cross3
Status: Connected
Number of entries: 0

Brick rhs-arch-srv2.lab.eng.blr.redhat.com:/gluster/brick3/cross3
<gfid:519f0014-a343-477d-ada7-9a89bdac0e2b>
Status: Connected
Number of entries: 1

Brick rhs-arch-srv4.lab.eng.blr.redhat.com:/gluster/brick3/cross3
Status: Connected
Number of entries: 0
I am able to recreate it with the script from Nag. I am finding locks that are not released at all, even after 10 minutes (most likely these are stale locks). I need to figure out why these locks are not being released: they cause I/O hangs, which in turn pile up blocked locks and make every operation take a very long time. I will update once I find the root cause.
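For reference, one way to look for locks that stay blocked like this is to take a brick statedump and count the lock entries in the dump files. The helper below is only an illustrative sketch: the statedump directory, the dump file naming, and the ACTIVE/BLOCKED markers on the inodelk lines are assumptions and can vary between gluster versions.

#!/usr/bin/perl
# Illustrative helper (not part of this report): trigger a statedump for the
# volume's bricks and count active vs. blocked inode locks in the dump files.
use strict;
use warnings;

my $volname       = shift @ARGV || 'cross3';
my $statedump_dir = '/var/run/gluster';   # assumed default statedump path

system('gluster', 'volume', 'statedump', $volname) == 0
    or die "gluster volume statedump $volname failed\n";
sleep 2;   # give the brick processes a moment to write the dump files

for my $dump (glob "$statedump_dir/*.dump.*") {
    open(my $fh, '<', $dump) or next;
    my ($active, $blocked) = (0, 0);
    while (my $line = <$fh>) {
        $active++  if $line =~ /inodelk.*ACTIVE/;
        $blocked++ if $line =~ /inodelk.*BLOCKED/;
    }
    close $fh;
    printf "%s: %d active, %d blocked inode locks\n", $dump, $active, $blocked;
}

A steadily growing count of BLOCKED entries while the write loop is running would point at the same pile-up of lock requests described above.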
Observations so far:
1) I tried to find whether there are any stale-lock issues, but if I kill the perl process everything comes back to normal.
2) I went through both the old and new code bases once more and compared them; I did not find any issues.
3) I was also able to recreate the issue with eager-lock off.
4) I minimized the steps to recreate to: a) run I/O with the script Nag gave, b) run gluster volume heal <volname> info.
5) I did a lot of debugging to see if there are any problems in the locking code, but did not find any, even in the no-eager-lock case.
6) As a last resort I tried the heal-info case from step 4 on rhgs-3.3.1, which does not have my fixes. I was able to recreate the stuck locks and the long lock acquisition times with eager-lock off even on that stable release.

So at the moment I am of the opinion that the application Nag wrote creates a lot of locks in the system, which slows down the entire transaction and leads to this problem. I will spend some more time tomorrow trying to recreate the issue on rhgs-3.3.1 with eager-lock on, with more debug logging, to see whether this problem has always existed. Good script, Nag :-).
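To make the minimized reproduction in observation 4 easier to watch, a small wrapper can time each heal-info invocation while the write loop is running. This is an illustrative sketch only; the volume name and the 60-second warning threshold are assumptions.

#!/usr/bin/perl
# Illustrative sketch: repeatedly time 'gluster volume heal <vol> info' while
# the overlapping-write script is running, to catch the "not responding" case.
use strict;
use warnings;
use Time::HiRes qw(time);

my $volname = shift @ARGV || 'cross3';

while (1) {
    my $start  = time();
    my $output = `gluster volume heal $volname info 2>&1`;
    my $took   = time() - $start;
    my ($entries) = $output =~ /Number of entries:\s*(\d+)/;
    printf "heal info took %.1fs (first brick entries: %s)\n",
           $took, defined $entries ? $entries : 'unknown';
    warn "heal info is taking suspiciously long (> 60s)\n" if $took > 60;
    sleep 30;
}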
The following patch fixed the issue for me, but it can affect I/O performance while self-heal is in progress. Considering that this is a day-1 issue, that the application's I/O pattern is what triggers it, and that we have not seen this kind of workload in the five years this feature has existed, at the moment I am thinking of fixing the issue in the locks xlator data structures, which is a very big change and cannot be done for 3.3.1 or 3.4.0. If we see a similar issue with any of the existing workloads such as block/RHHI, then I am open to sending the fix for this issue:

diff --git a/xlators/cluster/afr/src/afr-transaction.c b/xlators/cluster/afr/src/afr-transaction.c
index 89d03c015..1ecdac78a 100644
--- a/xlators/cluster/afr/src/afr-transaction.c
+++ b/xlators/cluster/afr/src/afr-transaction.c
@@ -2176,9 +2176,7 @@ __afr_eager_lock_handle (afr_local_t *local, gf_boolean_t *take_lock,
         afr_local_t *owner_local = NULL;
         xlator_t *this = local->transaction.frame->this;
 
-        if (local->fd && !afr_are_multiple_fds_opened (local, this)) {
-                local->transaction.eager_lock_on = _gf_true;
-        }
+        local->transaction.eager_lock_on = _gf_true;
 
         lock = &local->inode_ctx->lock[local->transaction.type];
         if (__need_previous_lock_unlocked (local)) {
@@ -2197,8 +2195,6 @@ __afr_eager_lock_handle (afr_local_t *local, gf_boolean_t *take_lock,
                                 lock->delay_timer = NULL;
                         }
                 }
-                if (!local->transaction.eager_lock_on)
-                        goto out;
         }
 
         if (lock->release) {
(In reply to Pranith Kumar K from comment #8)
> The following patch fixed the issue for me, but it can affect I/O
> performance while self-heal is in progress. Considering that this is a
> day-1 issue, that the application's I/O pattern is what triggers it, and
> that we have not seen this kind of workload in the five years this feature
> has existed, at the moment I am thinking of fixing the issue in the locks
> xlator data structures, which is a very big change and cannot be done for
> 3.3.1 or 3.4.0. If we see a similar issue with any of the existing
> workloads such as block/RHHI, then I am open to sending the fix for this
> issue.

Nag, I am sending out the patch on master. Will it be possible to add a glusto case for this?

Pranith.
I'm not sure this bug needs to continue with severity 'urgent', though.

Pranith - what would it take for us to get this bug fixed upstream? I see you have proposed a patch which is already being merged upstream. So if we were to pull this into one of the batch updates, what else would we need to consider apart from performance?
(In reply to Atin Mukherjee from comment #12)
> I'm not sure this bug needs to continue with severity 'urgent', though.
> 
> Pranith - what would it take for us to get this bug fixed upstream? I see
> you have proposed a patch which is already being merged upstream. So if we
> were to pull this into one of the batch updates, what else would we need
> to consider apart from performance?

I believe the patch that fixes this issue already went in as part of the change that removed eager-lock's dependency on the fd count, which was needed for the performance improvement.

Downstream patch: https://code.engineering.redhat.com/gerrit/150700
Downstream Bug: 1630688

What should be done with this bug in this scenario?
(In reply to Pranith Kumar K from comment #13)
> (In reply to Atin Mukherjee from comment #12)
> > I'm not sure this bug needs to continue with severity 'urgent', though.
> > 
> > Pranith - what would it take for us to get this bug fixed upstream? I see
> > you have proposed a patch which is already being merged upstream. So if
> > we were to pull this into one of the batch updates, what else would we
> > need to consider apart from performance?
> 
> I believe the patch that fixes this issue already went in as part of the
> change that removed eager-lock's dependency on the fd count, which was
> needed for the performance improvement.
> 
> Downstream patch: https://code.engineering.redhat.com/gerrit/150700
> Downstream Bug: 1630688
> 
> What should be done with this bug in this scenario?

If you think this bug has been fixed with the above patch, you should move it to ON_QA (after getting the necessary acks).
Sunil,

Could you please help here?

Pranith
Verified the fix as per the test plan. All abnormalities seen in comment #0 were fixed. Setting the bug to verified.