Description of problem:
=======================
The command "gluster volume status <volname> inode" times out without any error message or output.

[root@dhcp43-18 ~]# time gluster v status alpha inode --timeout=86400

real    30m1.513s
user    0m0.311s
sys     0m0.374s
[root@dhcp43-18 ~]#

Snippet of glusterd log:

[2018-05-10 09:07:39.375476] E [MSGID: 106116] [glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on 10.70.43.4. Please check log file for details.
[2018-05-10 09:07:39.842184] E [MSGID: 106116] [glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on 10.70.43.9. Please check log file for details.
[2018-05-10 09:07:39.842512] E [MSGID: 106152] [glusterd-syncop.c:1641:gd_unlock_op_phase] 0-management: Failed to unlock on some peer(s)
[2018-05-10 09:07:39.842735] W [glusterd-locks.c:845:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe1379) [0x7f7d16d41379] -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe09ca) [0x7f7d16d409ca] -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe8935) [0x7f7d16d48935] ) 0-management: Lock for vol alpha not held

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.12.2-8.el7rhgs.x86_64

How reproducible:
=================
2/2

Steps to Reproduce:
===================
1. Create a volume.
2. FUSE mount the volume.
3. Untar one or two kernel tarballs on the mount.
4. Run "time gluster volume status <volname> inode --timeout=86400".

Actual results:
===============
The command times out after 30 minutes.

Expected results:
=================
The command should succeed with the proper output. If the command fails, an appropriate error message must be displayed.

Additional info:
The issue was seen while untarring a kernel tarball. A kernel untar creates approximately 60k inodes, so the lru list gets filled. In inode_table_dump_to_dict(), dumping the 16384 inodes sitting on the lru list while holding the itable lock can take a few minutes; we should not hold the lock for that long.

Workaround: lower inode-lru-limit before running "gluster v status <volname> inode" and reset it after the command completes. A possible command sequence is shown below.

Possible solution: dumping the lru and purge inodes does not seem very useful from a debuggability perspective, so we can avoid them. The active-list inodes can be listed by iterating the active list while holding an inode ref, without holding the itable lock.
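For reference, a possible workaround sequence on a volume named "alpha", assuming the lru limit in question is the one controlled by the network.inode-lru-limit volume option (the value 1024 is only an example):

    # shrink the server-side inode lru list so the dump has less to walk
    gluster volume set alpha network.inode-lru-limit 1024

    # run the status command that was timing out
    time gluster volume status alpha inode --timeout=86400

    # restore the option to its default once done
    gluster volume reset alpha network.inode-lru-limit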
Amar/Raghavendra,

Do we need to print the lru and purge lists in "gluster v status <volname> inode"? Has it ever been useful, or can we do away with it? The issue is that we cannot take a ref and iterate those lists, since taking a ref would activate the inodes. For iterating the active list, we could do something like the following (pseudocode):

    /* Take a ref on all inodes in the active list. start_inode and
     * end_inode will point to the first and last inodes of the list. */
    void
    inode_active_list_ref (inode_table_t *itable, inode_t **start_inode,
                           inode_t **end_inode)
    {
            inode_t *inode = NULL;

            pthread_mutex_lock (&itable->lock);
            {
                    list_for_each_entry (inode, &itable->active, list) {
                            __inode_ref (inode);
                    }
                    *start_inode = list_entry (itable->active.next, inode_t, list);
                    *end_inode = list_entry (itable->active.prev, inode_t, list);
            }
            pthread_mutex_unlock (&itable->lock);
    }

    /* Caller: dump the active inodes without holding the itable lock. */
    inode_active_list_ref (itable, &start_inode, &end_inode);

    tmp_inode = start_inode;
    while (tmp_inode) {
            /* This is a costly operation, hence do it outside the itable lock. */
            inode_dump_to_dict (tmp_inode);

            if (tmp_inode == end_inode)
                    break;
            tmp_inode = list_entry (tmp_inode->list.next, inode_t, list);
    }

    inode_active_list_unref (itable, start_inode, end_inode);
Resetting the needinfo that I accidentally cancelled.
I am OK with not having the details of the purge list; dumping only the active list is good. But for the purge list and the lru list, let's still print their sizes. A sketch of what that could look like follows.
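A minimal sketch of that idea, assuming the active_size/lru_size/purge_size counters on inode_table_t and dict_set_uint32() from libglusterfs; the helper name and dict keys below are illustrative, not the actual change:

    /* Illustrative sketch only -- the helper and key names are assumptions,
     * not the merged patch. */
    static int
    inode_table_dump_sizes_to_dict (inode_table_t *itable, char *prefix,
                                    dict_t *dict)
    {
            char key[64] = {0};

            /* Under the itable lock, record only the list sizes: O(1) work. */
            pthread_mutex_lock (&itable->lock);
            {
                    snprintf (key, sizeof (key), "%s.itable.active_size", prefix);
                    dict_set_uint32 (dict, key, itable->active_size);

                    snprintf (key, sizeof (key), "%s.itable.lru_size", prefix);
                    dict_set_uint32 (dict, key, itable->lru_size);

                    snprintf (key, sizeof (key), "%s.itable.purge_size", prefix);
                    dict_set_uint32 (dict, key, itable->purge_size);
            }
            pthread_mutex_unlock (&itable->lock);

            /* Per-inode details would then be dumped for the active list only,
             * iterating it with refs held (see the inode_active_list_ref() idea
             * above) so the expensive dict work happens outside the lock. */
            return 0;
    }

That keeps the time spent under the itable lock constant, regardless of how full the lru list is.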
https://review.gluster.org/22347/
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:3249