Red Hat Bugzilla – Bug 987126
core: trying to create 64K dirs, all bricks go down with a crash
Last modified: 2016-01-19 01:15:27 EST
Description of problem:
While creating 64,000 directories, all the brick processes went down with a core dump.
Version-Release number of selected component (if applicable):
[root@nfs1 ~]# rpm -qa | grep glusterfs
Hence, this is the first time the issue was seen.
Steps to Reproduce:
1. Create a volume and start it.
2. Enable quota on the volume.
3. Set a quota limit on the root of the volume.
4. Mount the volume over NFS.
5. Start creating 64K dirs:
[root@rhsauto030 nfs-test]# for i in `seq 1 64000`; do mkdir dir$i; echo $i; done
6. Meanwhile, run gluster volume quota <vol-name> limit-usage /dir$i 100MB in a for loop (see the sketch below).
I started step 6 after the first 10K dirs were already created; at the time the issue was seen, around 8.5K dirs had quota limits set.
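For reference, a minimal sketch of the step-6 loop (the volume name quota-dist-rep is taken from the status output below; the directory range is illustrative):

for i in `seq 1 64000`; do
    gluster volume quota quota-dist-rep limit-usage /dir$i 100MB
done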
From the NFS mount point:
mkdir: cannot create directory `dir54545': Input/output error
mkdir: cannot create directory `dir54546': Input/output error
mkdir: cannot create directory `dir54547': Input/output error
mkdir: cannot create directory `dir54548': Input/output error
mkdir: cannot create directory `dir54549': Input/output error
mkdir: cannot create directory `dir54550': Input/output error
mkdir: cannot create directory `dir54551': Input/output error
Status of volume: quota-dist-rep
Gluster process                              Port    Online  Pid
Brick 10.70.37.180:/rhs/bricks/quota-d1r1    49172   Y       23303
Brick 10.70.37.80:/rhs/bricks/quota-d1r2     N/A     N       18615
Brick 10.70.37.216:/rhs/bricks/quota-d2r1    N/A     N       5388
Brick 10.70.37.139:/rhs/bricks/quota-d2r2    N/A     N       17100
Brick 10.70.37.180:/rhs/bricks/quota-d3r1    49173   Y       23314
Brick 10.70.37.80:/rhs/bricks/quota-d3r2     N/A     N       18623
Brick 10.70.37.216:/rhs/bricks/quota-d4r1    N/A     N       5394
Brick 10.70.37.139:/rhs/bricks/quota-d4r2    N/A     N       17111
Brick 10.70.37.180:/rhs/bricks/quota-d5r1    49174   Y       23325
Brick 10.70.37.80:/rhs/bricks/quota-d5r2     N/A     N       18627
Brick 10.70.37.216:/rhs/bricks/quota-d6r1    N/A     N       5402
Brick 10.70.37.139:/rhs/bricks/quota-d6r2    N/A     N       17122
NFS Server on localhost                      2049    Y       25673
Self-heal Daemon on localhost                N/A     Y       25680
NFS Server on 10.70.37.216                   2049    Y       6885
Self-heal Daemon on 10.70.37.216             N/A     Y       6892
NFS Server on 10.70.37.80                    2049    Y       20101
Self-heal Daemon on 10.70.37.80              N/A     Y       20114
NFS Server on 10.70.37.139                   2049    Y       18714
Self-heal Daemon on 10.70.37.139             N/A     Y       18721

Task         ID                                       Status
----         --                                       ------
Rebalance    9e281276-6e32-43d6-8028-d06c80dc3b18     3
[root@nfs1 ~]# df -h
Filesystem       Size   Used  Avail  Use%  Mounted on
                 50G    38G   9.4G   80%   /
tmpfs            4.0G   0     4.0G   0%    /dev/shm
/dev/vda1        485M   32M   428M   7%    /boot
                 500G   5.6G  494G   2%    /rhs/bricks
[root@nfs1 ~]# less /var/log/glusterfs/nfs.log
[root@nfs1 ~]# free -m
total used free shared buffers cached
Mem: 7999 7483 516 0 3 257
-/+ buffers/cache: 7221 777
Swap: 7999 4185 3814
#0 gf_print_trace (signum=11, ctx=0x65342d353962302d) at common-utils.c:519
#1 <signal handler called>
#2 gf_print_trace (signum=7, ctx=0x65342d353962302d) at common-utils.c:519
#3 <signal handler called>
#4 mgmt_pmap_signin_cbk (req=<value optimized out>, iov=<value optimized out>, count=<value optimized out>, myframe=0x7f7c8738890c)
#5 0x0000003f6c60cf35 in rpc_clnt_handle_reply (clnt=0x1bc07d0, pollin=0x5d4c6fb0) at rpc-clnt.c:771
#6 0x0000003f6c60df06 in rpc_clnt_notify (trans=<value optimized out>, mydata=0x1bc0800, event=<value optimized out>, data=<value optimized out>)
#7 0x0000003f6c609838 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>)
#8 0x00007f7c84d5dbd6 in socket_event_poll_in (this=0x1bc5380) at socket.c:2119
#9 0x00007f7c84d5f4ed in socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x1bc5380, poll_in=1, poll_out=0,
poll_err=0) at socket.c:2231
#10 0x0000003f6be5d4f7 in event_dispatch_epoll_handler (event_pool=0x1ba9eb0) at event-epoll.c:384
#11 event_dispatch_epoll (event_pool=0x1ba9eb0) at event-epoll.c:445
#12 0x00000000004067c6 in main (argc=19, argv=0x7fffa1983328) at glusterfsd.c:1962
Directory creation shouldn't end in a crash.
This is not even a performance test, just a simple, sequential way of creating a few thousand directories.
This is caused by the buffer used to store the volfile in memory being only 128K in size. This was just fixed upstream as part of bug-986100. The patch will be backported downstream.
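As an illustration only (this is not the actual upstream patch, and read_whole_file below is a hypothetical helper), the failure mode can be sketched in C: with tens of thousands of quota limits the in-memory volfile outgrows a fixed 128K buffer, whereas a buffer grown on demand keeps up.

/* Illustrative sketch only -- not the actual GlusterFS code or patch.
 * Idea: a fixed 128K buffer is overrun once the volfile grows past it
 * (e.g. as thousands of quota limits add options); growing the buffer
 * with realloc() on demand avoids that. */
#include <stdio.h>
#include <stdlib.h>

static char *read_whole_file(FILE *fp, size_t *out_len)
{
    size_t cap = 128 * 1024;            /* the old fixed limit */
    size_t len = 0;
    char  *buf = malloc(cap);

    if (!buf)
        return NULL;

    for (;;) {
        size_t n = fread(buf + len, 1, cap - len, fp);
        len += n;
        if (n == 0)
            break;                      /* EOF or read error */
        if (len == cap) {               /* buffer full: grow instead of overflowing */
            char *tmp = realloc(buf, cap * 2);
            if (!tmp) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    *out_len = len;
    return buf;                         /* caller frees */
}

int main(int argc, char **argv)
{
    if (argc < 2)
        return 1;
    FILE *fp = fopen(argv[1], "rb");
    if (!fp)
        return 1;
    size_t len = 0;
    char *volfile = read_whole_file(fp, &len);
    fclose(fp);
    printf("read %zu bytes\n", len);
    free(volfile);
    return 0;
}

Again, this only illustrates why a fixed-size buffer fails once the volfile exceeds 128K; the real fix went in upstream under bug-986100.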
This is an issue with 'Quota', which is not in yet. The previous version that was in was not sufficient. Keeping it on the 'fingers crossed' list for the blocker.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.