Bug 1578703 - gluster volume status inode getting timed out after 30 minutes with no output/error
Status: NEW
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: core
Version: 3.4
Hardware: x86_64   OS: Linux
Priority: low   Severity: high
Assigned To: Bhumika Goyal
QA Contact: Rahul Hinduja
Keywords: EasyFix, ZStream
Depends On: 1580315
Blocks: 1503143
 
Reported: 2018-05-16 04:32 EDT by Vinayak Papnoi
Modified: 2018-10-18 13:33 EDT
CC List: 7 users

See Also:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
Cause: A large number of inodes can cause the itable to be locked for a long time during an inode status dump.
Consequence: The lock is not released for several minutes, causing performance issues on clients and a command timeout on the inode status dump.
Workaround (if any): The lru list is not essential for debuggability, so reduce the number of lru inodes for the interval during which "inode status" is taken:
# set to a small value
gluster v set v1 inode-lru-limit 256
# take the inode dump
gluster v status v1 inode
# set back to the previous value
gluster v set v1 inode-lru-limit 16384
Result: The issue can be avoided, and an inode dump of the active inodes can be obtained.
Story Points: ---
Clone Of:
: 1580315
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Vinayak Papnoi 2018-05-16 04:32:48 EDT
Description of problem:
=======================

The command "gluster volume status inode" times out without any error message or output.

[root@dhcp43-18 ~]# time gluster v status alpha inode --timeout=86400

real    30m1.513s
user    0m0.311s
sys     0m0.374s
[root@dhcp43-18 ~]#


Snippet of glusterd log:

[2018-05-10 09:07:39.375476] E [MSGID: 106116] [glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on 10.70.43.4. Please check log file for details.
[2018-05-10 09:07:39.842184] E [MSGID: 106116] [glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on 10.70.43.9. Please check log file for details.
[2018-05-10 09:07:39.842512] E [MSGID: 106152] [glusterd-syncop.c:1641:gd_unlock_op_phase] 0-management: Failed to unlock on some peer(s)
[2018-05-10 09:07:39.842735] W [glusterd-locks.c:845:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe1379) [0x7f7d16d41379] -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe09ca) [0x7f7d16d409ca] -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe8935) [0x7f7d16d48935] ) 0-management: Lock for vol alpha not held


Version-Release number of selected component (if applicable):
=============================================================

glusterfs-3.12.2-8.el7rhgs.x86_64


How reproducible:
=================

2/2


Steps to Reproduce:
===================

1. Create a volume
2. FUSE mount the volume
3. Untar 1 or 2 kernel tarballs from the mount
4. Perform "time gluster volume status volname inode --timeout=86400"


Actual results:
===============

The command times out after 30 minutes.


Expected results:
=================

The command should succeed with the proper output. If the command fails, an appropriate error message must be displayed.


Additional info:
Comment 2 Sanoj Unnikrishnan 2018-05-16 04:55:34 EDT
The issue was seen while untarring a kernel tarball.
A kernel tarball untar creates approximately 60k inodes, and hence the lru list gets filled.

In the function inode_table_dump_to_dict, dumping the 16384 inodes on the lru list under the itable lock can take a few minutes; we should not hold the lock for that long.
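
Roughly, the pattern in question looks like the sketch below (a simplified illustration only, not the exact code; the arguments to inode_dump_to_dict are elided):

/* Simplified sketch of the current behaviour: every active and lru
   inode is dumped while itable->lock is held, so with ~16384 lru
   inodes the lock stays held for minutes. */
pthread_mutex_lock (&itable->lock);
{
        list_for_each_entry (inode, &itable->active, list) {
                inode_dump_to_dict (inode);
        }
        list_for_each_entry (inode, &itable->lru, list) {
                inode_dump_to_dict (inode);   /* up to lru-limit (16384) entries */
        }
}
pthread_mutex_unlock (&itable->lock);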

Workaround:
Change inode-lru-limit to a lower value before using "gluster v status <volname> inode" and reset it after the command is run.


Possible solution:

Dumping the lru and purge inodes does not seem very useful from a debuggability perspective, so we can avoid it.

The active-list inodes can be listed by iterating the active list while holding an inode ref, without the itable lock.
Comment 3 Sanoj Unnikrishnan 2018-05-17 03:26:03 EDT
Amar/Raghavendra,

Do we need to print the lru and purge lists in "gluster v status inode"?
Has it ever been useful? Can we do away with it?

The issue is that we cannot take a ref and iterate these lists, as that would activate the inodes.


For iterating the active list, we can do the following:

/* Take a ref on all inodes in the active list under the itable lock.
   start_inode and end_inode will point to the first and last inodes
   of the list. */

void
inode_active_list_ref (inode_table_t *itable, inode_t **start_inode,
                       inode_t **end_inode)
{
        inode_t *inode = NULL;

        pthread_mutex_lock (&itable->lock);
        {
                list_for_each_entry (inode, &itable->active, list) {
                        __inode_ref (inode);
                }
                *start_inode = list_first_entry (&itable->active, inode_t, list);
                *end_inode = list_last_entry (&itable->active, inode_t, list);
        }
        pthread_mutex_unlock (&itable->lock);
}


/* Caller: dump each active inode outside the itable lock. */
inode_active_list_ref (itable, &start_inode, &end_inode);
tmp_inode = start_inode;
while (1) {
        /* This is a costly operation, hence do it outside the itable lock. */
        inode_dump_to_dict (tmp_inode);
        if (tmp_inode == end_inode)
                break;
        tmp_inode = list_entry (tmp_inode->list.next, inode_t, list);
}
inode_active_list_unref (itable, &start_inode, &end_inode);
Comment 6 Sanoj Unnikrishnan 2018-05-17 05:15:59 EDT
Resetting the needinfo that I accidentally cancelled.
Comment 7 Amar Tumballi 2018-05-17 05:41:08 EDT
I am OK with not having the details of the purge list. Only the active list is good enough. But for the purge and lru lists, let's print the size.
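
Something along these lines would do it (an illustrative sketch only; the lru_size/purge_size counter fields and the dict key names are assumptions here, not the final patch):

/* Record only the list sizes for lru and purge under the itable
   lock, instead of dumping their entries. */
pthread_mutex_lock (&itable->lock);
{
        lru_size   = itable->lru_size;
        purge_size = itable->purge_size;
}
pthread_mutex_unlock (&itable->lock);

dict_set_uint32 (dict, "conn0.lru_size", lru_size);
dict_set_uint32 (dict, "conn0.purge_size", purge_size);

/* Active inodes are still dumped individually, using the
   take-ref-then-dump-outside-the-lock approach from comment 3. */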
