Bug 1761365 - libgfapi: glfs_init() gets stuck in an infinite loop in pthread_spin_lock()
Summary: libgfapi: glfs_init() gets stuck in an infinite loop in pthread_spin_lock()
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: GlusterFS
Classification: Community
Component: libgfapi
Version: 7
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: unspecified
Target Milestone: ---
Assignee: Soumya Koduri
QA Contact: bugs@gluster.org
URL:
Whiteboard:
Duplicates: 1761366
Depends On:
Blocks: 1643231
 
Reported: 2019-10-14 09:19 UTC by Xiubo Li
Modified: 2020-03-12 12:48 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-12 12:48:40 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Xiubo Li 2019-10-14 09:19:54 UTC
Description of problem:

I am testing gfapi through gluster-block/tcmu-runner and hit a problem: the tcmu-runner process spins at almost 100% CPU and gets stuck when creating a gluster-block device:

The gluster-block command is:
[root@localhost tcmu-runner]# gluster-block create repvol/block ha 2 prealloc full 10.70.39.238,10.70.39.231 1G

[root@localhost tcmu-runner]# top
top - 14:14:50 up  1:07,  2 users,  load average: 2.06, 1.89, 1.17
Tasks: 116 total,   2 running, 114 sleeping,   0 stopped,   0 zombie
%Cpu(s): 50.0 us,  3.1 sy,  0.0 ni, 46.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   1990.4 total,    853.9 free,    270.9 used,    865.6 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.   1560.8 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                
 8020 root      20   0 2412664  35916  21792 R  93.8   1.8  12:39.00 tcmu-runner                                                                                            
    1 root      20   0  108892  15540   9472 S   0.0   0.8   0:02.44 systemd                                                                                                
    2 root      20   0       0      0      0 S   0.0   0.0   0:00.01 kthreadd                                                                                               
    3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp             


[root@localhost tcmu-runner]# perf top -p 8020
Samples: 7K of event 'cpu-clock:pppH', 4000 Hz, Event count (approx.): 1838750000 lost: 0/0 drop: 0/0                                                                        
Overhead  Shared Object       Symbol                                                                                                                                         
  99.95%  libpthread-2.29.so  [.] pthread_spin_lock
   0.01%  [kernel]            [k] __ip_queue_xmit
   0.01%  [kernel]            [k] __softirqentry_text_start
   0.01%  [kernel]            [k] _raw_spin_unlock_irqrestore
   0.01%  [kernel]            [k] run_rebalance_domains


[root@localhost tcmu-runner]# pstack 8020
Thread 17 (Thread 0x7f709a7fc700 (LWP 11351)):
...
Thread 1 (Thread 0x7f7128527880 (LWP 8020)):
#0  0x00007f7128d4f2b5 in pthread_spin_lock () at /lib64/libpthread.so.0
#1  0x00007f7126ba6eba in mem_get () at /lib64/libglusterfs.so.0
#2  0x00007f7126ba6fdd in mem_get0 () at /lib64/libglusterfs.so.0
#3  0x00007f7126b6e004 in get_new_dict_full () at /lib64/libglusterfs.so.0
#4  0x00007f7126b6f9f0 in dict_new () at /lib64/libglusterfs.so.0
#5  0x00007f7126ce9e38 in glfs_init_common () at /lib64/libgfapi.so.0
#6  0x00007f7126cea030 in glfs_init () at /lib64/libgfapi.so.0
#7  0x00007f7126d20332 in tcmu_glfs_unlock () at /usr/lib64/tcmu-runner/handler_glfs.so
[root@localhost tcmu-runner]# 


The main thread is looping indefinitely inside libglusterfs.so, spinning in pthread_spin_lock() called from mem_get().



Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Install glusterfs-7.0 packages from https://download.gluster.org/pub/gluster/glusterfs/qa-releases/7.0rc3/Fedora/fedora-30/x86_64/ and build tcmu-runner/gluster-block from source.

2. Enable and start the glusterd/tcmu-runner/gluster-blockd services.

3. Create one replicate volume:
# gluster vol create repvol replica 2 10.70.39.238:/data/repvol 10.70.39.231:/data/repvol force
 
# gluster vol set repvol group gluster-block

# gluster vol start repvol

# gluster volume set repvol locks.mandatory-locking forced

# gluster volume set repvol enforce-mandatory-lock on

# gluster volume set repvol performance.client-io-threads off

4. Create the gluster-block device:

#  gluster-block create repvol/block ha 2 prealloc full 10.70.39.238,10.70.39.231 1G

5. The command gets stuck in step 4.


Actual results:


Expected results:


Additional info:

Comment 1 Xiubo Li 2019-10-14 09:22:46 UTC
*** Bug 1761366 has been marked as a duplicate of this bug. ***

Comment 5 Soumya Koduri 2020-02-28 10:01:41 UTC
Thanks, BRs. If you could write a libgfapi program that consistently reproduces this issue, that would be helpful for further debugging.

Comment 6 Xiubo Li 2020-02-28 10:10:24 UTC
(In reply to Soumya Koduri from comment #5)
> Thanks, BRs. If you could write a libgfapi program that consistently
> reproduces this issue, that would be helpful for further debugging.

I will try it again with the latest upstream code and will let you know.
Thanks,
BRs

Comment 7 Xiubo Li 2020-03-01 01:50:31 UTC
glusterfs is from the latest upstream; I built the RPMs myself:

# rpm -qa|grep gluster
glusterfs-libs-8dev-0.444.git5425060.el7.centos.x86_64
glusterfs-cli-8dev-0.444.git5425060.el7.centos.x86_64
glusterfs-cloudsync-plugins-8dev-0.444.git5425060.el7.centos.x86_64
python2-gluster-6.0-20.el7rhgs.x86_64
glusterfs-api-8dev-0.444.git5425060.el7.centos.x86_64
glusterfs-events-8dev-0.444.git5425060.el7.centos.x86_64
glusterfs-client-xlators-8dev-0.444.git5425060.el7.centos.x86_64
glusterfs-extra-xlators-8dev-0.444.git5425060.el7.centos.x86_64
glusterfs-thin-arbiter-8dev-0.444.git5425060.el7.centos.x86_64
glusterfs-regression-tests-8dev-0.444.git5425060.el7.centos.x86_64
python3-gluster-8dev-0.444.git5425060.el7.centos.x86_64
glusterfs-devel-8dev-0.444.git5425060.el7.centos.x86_64
glusterfs-resource-agents-8dev-0.444.git5425060.el7.centos.noarch
glusterfs-api-devel-8dev-0.444.git5425060.el7.centos.x86_64
glusterfs-fuse-8dev-0.444.git5425060.el7.centos.x86_64
glusterfs-geo-replication-8dev-0.444.git5425060.el7.centos.x86_64
glusterfs-8dev-0.444.git5425060.el7.centos.x86_64
glusterfs-server-8dev-0.444.git5425060.el7.centos.x86_64
glusterfs-debuginfo-8dev-0.444.git5425060.el7.centos.x86_64

tcmu-runner and gluster-block are also from the latest upstream. I have run the test for more than 24 hours and could not reproduce the issue.

Thanks,
BRs

Comment 8 Worker Ant 2020-03-12 12:48:40 UTC
This bug has been moved to https://github.com/gluster/glusterfs/issues/956 and will be tracked there from now on. Visit the GitHub issue for further details.

