Bug 987126 - core: trying to create 64K dirs, all bricks go down with crash
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterfs
Version: 2.1
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Assigned To: krishnan parthasarathi
QA Contact: Saurabh
Keywords: TestBlocker
Depends On: 986100
Blocks:
Reported: 2013-07-22 14:27 EDT by Saurabh
Modified: 2016-01-19 01:15 EST
CC List: 9 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-09-23 18:29:50 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Saurabh 2013-07-22 14:27:37 EDT
Description of problem:
While creating 64,000 directories, all the brick processes went down with core dumps.

Version-Release number of selected component (if applicable):
[root@nfs1 ~]# rpm -qa | grep glusterfs
glusterfs-3.4.0.12rhs.beta4-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.12rhs.beta4-1.el6rhs.x86_64
glusterfs-server-3.4.0.12rhs.beta4-1.el6rhs.x86_64


How reproducible:
Hit on the first attempt; this was the first time the issue was seen.

Steps to Reproduce:
1. Create a volume and start it.
2. Enable quota on the volume.
3. Set a limit on the root of the volume.
4. Mount the volume over NFS.
5. Start creating 64K directories:
[root@rhsauto030 nfs-test]# for i in `seq 1 64000`; do mkdir dir$i; echo $i; done
6. Meanwhile, run gluster volume quota <vol-name> limit-usage /dir$i 100MB in a for loop (a full sketch of the steps follows below).

I started step 6 after the first 10K directories had already been created; at the time the issue was seen, around 8.5K directories had quota limits set.
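For reference, a minimal end-to-end sketch of the reproduction. This is a sketch only: the hostnames, brick paths, mount point, and the 100GB root limit below are illustrative placeholders, not the exact values from this setup.

# on a server: create and start a 2x2 distributed-replicate volume,
# enable quota, and set a limit on the volume root (steps 1-3)
gluster volume create quota-dist-rep replica 2 \
        server1:/rhs/bricks/quota-d1r1 server2:/rhs/bricks/quota-d1r2 \
        server1:/rhs/bricks/quota-d2r1 server2:/rhs/bricks/quota-d2r2
gluster volume start quota-dist-rep
gluster volume quota quota-dist-rep enable
gluster volume quota quota-dist-rep limit-usage / 100GB

# on a client: NFS-mount the volume and run the mkdir loop (steps 4-5)
mount -t nfs -o vers=3 server1:/quota-dist-rep /mnt/nfs-test
cd /mnt/nfs-test
for i in `seq 1 64000`; do mkdir dir$i; echo $i; done

# in parallel, on a server: the per-directory limit loop (step 6)
for i in `seq 1 64000`; do
        gluster volume quota quota-dist-rep limit-usage /dir$i 100MB
done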


Actual results:

from nfs mount point,

54544
mkdir: cannot create directory `dir54545': Input/output error
54545
mkdir: cannot create directory `dir54546': Input/output error
54546
mkdir: cannot create directory `dir54547': Input/output error
54547
mkdir: cannot create directory `dir54548': Input/output error
54548
mkdir: cannot create directory `dir54549': Input/output error
54549
mkdir: cannot create directory `dir54550': Input/output error
54550
mkdir: cannot create directory `dir54551': Input/output error


Status of volume: quota-dist-rep
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick 10.70.37.180:/rhs/bricks/quota-d1r1		49172	Y	23303
Brick 10.70.37.80:/rhs/bricks/quota-d1r2		N/A	N	18615
Brick 10.70.37.216:/rhs/bricks/quota-d2r1		N/A	N	5388
Brick 10.70.37.139:/rhs/bricks/quota-d2r2		N/A	N	17100
Brick 10.70.37.180:/rhs/bricks/quota-d3r1		49173	Y	23314
Brick 10.70.37.80:/rhs/bricks/quota-d3r2		N/A	N	18623
Brick 10.70.37.216:/rhs/bricks/quota-d4r1		N/A	N	5394
Brick 10.70.37.139:/rhs/bricks/quota-d4r2		N/A	N	17111
Brick 10.70.37.180:/rhs/bricks/quota-d5r1		49174	Y	23325
Brick 10.70.37.80:/rhs/bricks/quota-d5r2		N/A	N	18627
Brick 10.70.37.216:/rhs/bricks/quota-d6r1		N/A	N	5402
Brick 10.70.37.139:/rhs/bricks/quota-d6r2		N/A	N	17122
NFS Server on localhost					2049	Y	25673
Self-heal Daemon on localhost				N/A	Y	25680
NFS Server on 10.70.37.216				2049	Y	6885
Self-heal Daemon on 10.70.37.216			N/A	Y	6892
NFS Server on 10.70.37.80				2049	Y	20101
Self-heal Daemon on 10.70.37.80				N/A	Y	20114
NFS Server on 10.70.37.139				2049	Y	18714
Self-heal Daemon on 10.70.37.139			N/A	Y	18721
 
           Task                                      ID         Status
           ----                                      --         ------
      Rebalance    9e281276-6e32-43d6-8028-d06c80dc3b18              3

[root@nfs1 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_nfs1-lv_root
                       50G   38G  9.4G  80% /
tmpfs                 4.0G     0  4.0G   0% /dev/shm
/dev/vda1             485M   32M  428M   7% /boot
/dev/mapper/vg_nfs1-lv_home
                      500G  5.6G  494G   2% /rhs/bricks
[root@nfs1 ~]# less /var/log/glusterfs/nfs.log
[root@nfs1 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          7999       7483        516          0          3        257
-/+ buffers/cache:       7221        777
Swap:         7999       4185       3814


(gdb) bt
#0  gf_print_trace (signum=11, ctx=0x65342d353962302d) at common-utils.c:519
#1  <signal handler called>
#2  gf_print_trace (signum=7, ctx=0x65342d353962302d) at common-utils.c:519
#3  <signal handler called>
#4  mgmt_pmap_signin_cbk (req=<value optimized out>, iov=<value optimized out>, count=<value optimized out>, myframe=0x7f7c8738890c)
    at glusterfsd-mgmt.c:2117
#5  0x0000003f6c60cf35 in rpc_clnt_handle_reply (clnt=0x1bc07d0, pollin=0x5d4c6fb0) at rpc-clnt.c:771
#6  0x0000003f6c60df06 in rpc_clnt_notify (trans=<value optimized out>, mydata=0x1bc0800, event=<value optimized out>, data=<value optimized out>)
    at rpc-clnt.c:891
#7  0x0000003f6c609838 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>)
    at rpc-transport.c:499
#8  0x00007f7c84d5dbd6 in socket_event_poll_in (this=0x1bc5380) at socket.c:2119
#9  0x00007f7c84d5f4ed in socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x1bc5380, poll_in=1, poll_out=0, 
    poll_err=0) at socket.c:2231
#10 0x0000003f6be5d4f7 in event_dispatch_epoll_handler (event_pool=0x1ba9eb0) at event-epoll.c:384
#11 event_dispatch_epoll (event_pool=0x1ba9eb0) at event-epoll.c:445
#12 0x00000000004067c6 in main (argc=19, argv=0x7fffa1983328) at glusterfsd.c:1962
(gdb) 
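One observation on the backtrace (an editor's note, not part of the original report): the ctx argument 0x65342d353962302d is entirely printable ASCII. Decoded as a little-endian 64-bit value it is the byte string "-0b95-4e", which looks like a fragment of a UUID; a pointer overwritten with string data is consistent with a buffer overrun. A quick way to decode it:

# prints "-0b95-4e" (python 2 syntax, as found on RHEL 6)
python -c "import struct; print struct.pack('<Q', 0x65342d353962302d)"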



Expected results:
Directory creation shouldn't result in a crash. This is not even a performance test, just a simple, sequential creation of a few thousand directories.

Additional info:
Comment 5 Kaushal 2013-07-23 04:58:41 EDT
This is caused by the buffer used to store the volfile in memory being just 128K in size. This was just fixed upstream as part of bug 986100; will backport the patch downstream.
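A quick way to see how close a configuration gets to that limit is to measure the generated volfiles on a server. The path below assumes the default glusterd working directory (/var/lib/glusterd) and uses the volume name from this report:

# list volfile sizes; anything near 131072 bytes (128K) would have
# overflowed the old fixed-size buffer
wc -c /var/lib/glusterd/vols/quota-dist-rep/*.vol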
Comment 8 Amar Tumballi 2013-08-17 12:19:36 EDT
This is an issue with 'Quota', which is not yet in; the previous version that was in was not sufficient. Keeping it on the 'fingers crossed' list for the blocker.
Comment 12 Scott Haines 2013-09-23 18:29:50 EDT
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html
