Bug 763105 (GLUSTER-1373) - fileop mkdir fails on 4x3 dist-repl gnfs mount
Summary: fileop mkdir fails on 4x3 dist-repl gnfs mount
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-1373
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.1-alpha
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Assignee: Pavan Vilas Sondur
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-08-16 09:47 UTC by Lakshmipathi G
Modified: 2015-12-01 16:45 UTC
CC List: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: RTP
Mount Type: nfs
Documentation: ---
CRM:
Verified Versions:


Attachments
gnfs log file (101.06 KB, application/octet-stream)
2010-08-16 06:47 UTC, Lakshmipathi G

Description Lakshmipathi G 2010-08-16 09:47:47 UTC
fileop fails when run against an nfs_beta_rc10 mount (4x3 distributed-replicate) with all performance translators enabled, but the failure did not crash the server.

# /opt/qa/tools-32bit/fileop -f 30 -t

Fileop:  Working in ., File size is 1,  Output is in Ops/sec. (A=Avg, B=Best, W=Worst)
Mkdir failed

Comment 1 Lakshmipathi G 2010-08-16 10:23:43 UTC
Tested again with the same setup (nfs_beta_rc10, 4x3 distributed-replicate, all performance translators enabled); this time it passes.

Comment 2 Shehjar Tikoo 2010-09-04 05:49:29 UTC
With 3.1qa11, it still fails, with glusterfsd crashing in replicate.

fileop -d /mnt/nfs-master-4dist-3repl/fileoptest/ -f  10000
Fileop:  Working in /mnt/nfs-master-4dist-3repl/fileoptest, File size is 1,  Output is in Ops/sec. (A=Avg, B=Best, W=Worst)
 .       mkdir   chdir   rmdir  create    open    read   write   close    stat  access   chmod readdir  link    unlink  delete  Total_files


Mkdir failed
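The force factor differs sharply between runs: the original report uses -f 30, while this run uses -f 10000. Assuming the semantics in fileop's usage text ("-f X: X^3 files will be created and removed"), the sketch below computes that count with 64-bit arithmetic. fileop_file_count is an illustrative name, not part of fileop.

```c
#include <assert.h>
#include <stdint.h>

/* Files created and removed by fileop for force factor x, assuming the
 * usage-text semantics "-f X: X^3 files will be created and removed".
 * 64-bit arithmetic is used because 10000^3 = 10^12 does not fit in a
 * 32-bit int. */
uint64_t fileop_file_count(uint64_t x)
{
    /* 2642245 is the largest x whose cube still fits in uint64_t. */
    assert(x <= 2642245u);
    return x * x * x;
}
```

Under that assumption, -f 30 means 27,000 files and -f 10000 means 10^12, roughly 37 million times as many operations.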

The crash trace:
Core was generated by `/home/shehjart/glusterfsd-master/sbin/glusterfs -f /home/shehjart/volfiles/nfs-'.
Program terminated with signal 11, Segmentation fault.
#0  0x00000032d680b722 in pthread_spin_lock () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00000032d680b722 in pthread_spin_lock () from /lib64/libpthread.so.0
#1  0x00002b28015ee20b in fd_unref (fd=0x2aaaae909ad0) at fd.c:467
#2  0x00002aaaaad15686 in afr_local_cleanup (local=0x2aaae0ed16c8, this=0x1a17c548) at afr-common.c:353
#3  0x00002aaaaacee23c in afr_fstat_cbk (frame=0x2b28024c1cb8, cookie=<value optimized out>, this=<value optimized out>, op_ret=0, 
    op_errno=0, buf=0x7fffc9d8cf10) at afr-inode-read.c:351
#4  0x00002aaaaaacac1a in client3_1_fstat_cbk (req=<value optimized out>, iov=<value optimized out>, count=<value optimized out>, 
    myframe=0x2b28024c15b8) at client3_1-fops.c:1042
#5  0x00002b280182bc95 in rpc_clnt_handle_reply (clnt=<value optimized out>, pollin=<value optimized out>) at rpc-clnt.c:734
#6  0x00002b280182be78 in rpc_clnt_notify (trans=<value optimized out>, mydata=0x1a19dca8, event=<value optimized out>, data=0x1a169190)
    at rpc-clnt.c:844
#7  0x00002b28018272cc in rpc_transport_notify (this=0xaaaaaab2, event=RPC_TRANSPORT_ACCEPT, data=0x1a169190) at rpc-transport.c:1124
#8  0x00002aaaaed21c2f in socket_event_poll_in (this=0x1a19de88) at socket.c:1561
#9  0x00002aaaaed21dc0 in socket_event_handler (fd=<value optimized out>, idx=10, data=0x1a19de88, poll_in=1, poll_out=0, poll_err=0)
    at socket.c:1675
#10 0x00002b28015efd77 in event_dispatch_epoll_handler (event_pool=0x1a16ab08) at event.c:812
#11 event_dispatch_epoll (event_pool=0x1a16ab08) at event.c:876
#12 0x000000000040470d in main (argc=8, argv=0x7fffc9d8d7c8) at glusterfsd.c:1398

Comment 3 Vijay Bellur 2010-09-18 03:28:10 UTC
Can this be tested with qa26 please?

Comment 4 Shehjar Tikoo 2010-09-18 04:45:14 UTC
(In reply to comment #3)
> Can this be tested with qa26 please?

I've given a small test plan to Prithu and am working with him to re-run the NFS tests with the gfid changes. This is part of that.

Comment 5 Shehjar Tikoo 2010-09-20 04:16:08 UTC
fileop's error reporting is poor: it does not say what the error actually was, so "Mkdir failed" tells me nothing.

I tried it, and with exactly the same command, mkdir failed for me because the OOM killer killed the Gluster NFS process. This points to a known memory leak, filed as bug 762991.

That most likely caused fileop to receive an EIO on timeout, since we must have mounted with the soft,intr mount options.

Command line used:
fileop -d /mnt/nfs-4dist-master -f  10000

Retrying without those options to see whether mkdir still fails.
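Because fileop only prints "Mkdir failed", the underlying errno is lost. A minimal sketch of a mkdir loop that reports it, assuming a hypothetical dir_N naming scheme (not fileop's own layout); make_dirs_verbose and remove_dirs are illustrative names.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Creates `count` directories named base/dir_0 .. base/dir_{count-1}.
 * On the first failure, prints the path and strerror(errno) and returns
 * -errno; returns 0 if every mkdir succeeds. */
int make_dirs_verbose(const char *base, int count)
{
    char path[4096];

    assert(base != NULL && count >= 0);
    for (int i = 0; i < count; i++) {
        int n = snprintf(path, sizeof(path), "%s/dir_%d", base, i);
        if (n < 0 || (size_t)n >= sizeof(path))
            return -ENAMETOOLONG;
        if (mkdir(path, 0755) != 0) {
            int err = errno; /* save errno before stdio can change it */
            fprintf(stderr, "mkdir %s failed: %s (errno %d)\n",
                    path, strerror(err), err);
            /* On a soft NFS mount, a request that times out is
             * typically reported as EIO ("Input/output error"). */
            return -err;
        }
    }
    return 0;
}

/* Removes the directories created above; returns 0 on success or -errno
 * from the first rmdir that fails. */
int remove_dirs(const char *base, int count)
{
    char path[4096];

    for (int i = 0; i < count; i++) {
        int n = snprintf(path, sizeof(path), "%s/dir_%d", base, i);
        if (n < 0 || (size_t)n >= sizeof(path))
            return -ENAMETOOLONG;
        if (rmdir(path) != 0)
            return -errno;
    }
    return 0;
}
```

Pointed at the NFS mount, a failure caused by the server process being killed would then show up as an explicit errno (typically EIO with soft) instead of a bare "Mkdir failed".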

Comment 6 Vijay Bellur 2010-09-29 14:57:14 UTC
Moving this to 3.1.1, as the 3-replica tests can be done after 3.1.0.

Comment 7 Shehjar Tikoo 2010-09-30 02:00:20 UTC
Closing. All the tests I am running, and having Prithu run, are on 4x3 dist-repl, and we haven't seen this problem at all lately. See the previous comments for why.

