| Summary: | fileop mkdir fails on 4x3 dist-repl gnfs mount | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Lakshmipathi G <lakshmipathi> | ||||
| Component: | replicate | Assignee: | Pavan Vilas Sondur <pavan> | ||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |||||
| Severity: | high | Docs Contact: | |||||
| Priority: | low | ||||||
| Version: | 3.1-alpha | CC: | gluster-bugs, shehjart, vijay | ||||
| Target Milestone: | --- | ||||||
| Target Release: | --- | ||||||
| Hardware: | All | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | Type: | --- | |||||
| Regression: | RTP | Mount Type: | nfs | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Attachments: |
Description
Lakshmipathi G
2010-08-16 09:47:47 UTC
Tested again with the same setup, nfs_beta_rc10 (4x3 distributed replicate) with all performance translators enabled; this time it passes. With 3.1qa11 it still fails, with glusterfsd crashing in replicate.
fileop -d /mnt/nfs-master-4dist-3repl/fileoptest/ -f 10000
Fileop: Working in /mnt/nfs-master-4dist-3repl/fileoptest, File size is 1, Output is in Ops/sec. (A=Avg, B=Best, W=Worst)
. mkdir chdir rmdir create open read write close stat access chmod readdir link unlink delete Total_files
Mkdir failed
The crash trace:
Core was generated by `/home/shehjart/glusterfsd-master/sbin/glusterfs -f /home/shehjart/volfiles/nfs-'.
Program terminated with signal 11, Segmentation fault.
#0 0x00000032d680b722 in pthread_spin_lock () from /lib64/libpthread.so.0
(gdb) bt
#0 0x00000032d680b722 in pthread_spin_lock () from /lib64/libpthread.so.0
#1 0x00002b28015ee20b in fd_unref (fd=0x2aaaae909ad0) at fd.c:467
#2 0x00002aaaaad15686 in afr_local_cleanup (local=0x2aaae0ed16c8, this=0x1a17c548) at afr-common.c:353
#3 0x00002aaaaacee23c in afr_fstat_cbk (frame=0x2b28024c1cb8, cookie=<value optimized out>, this=<value optimized out>, op_ret=0,
op_errno=0, buf=0x7fffc9d8cf10) at afr-inode-read.c:351
#4 0x00002aaaaaacac1a in client3_1_fstat_cbk (req=<value optimized out>, iov=<value optimized out>, count=<value optimized out>,
myframe=0x2b28024c15b8) at client3_1-fops.c:1042
#5 0x00002b280182bc95 in rpc_clnt_handle_reply (clnt=<value optimized out>, pollin=<value optimized out>) at rpc-clnt.c:734
#6 0x00002b280182be78 in rpc_clnt_notify (trans=<value optimized out>, mydata=0x1a19dca8, event=<value optimized out>, data=0x1a169190)
at rpc-clnt.c:844
#7 0x00002b28018272cc in rpc_transport_notify (this=0xaaaaaab2, event=RPC_TRANSPORT_ACCEPT, data=0x1a169190) at rpc-transport.c:1124
#8 0x00002aaaaed21c2f in socket_event_poll_in (this=0x1a19de88) at socket.c:1561
#9 0x00002aaaaed21dc0 in socket_event_handler (fd=<value optimized out>, idx=10, data=0x1a19de88, poll_in=1, poll_out=0, poll_err=0)
at socket.c:1675
#10 0x00002b28015efd77 in event_dispatch_epoll_handler (event_pool=0x1a16ab08) at event.c:812
#11 event_dispatch_epoll (event_pool=0x1a16ab08) at event.c:876
#12 0x000000000040470d in main (argc=8, argv=0x7fffc9d8d7c8) at glusterfsd.c:1398
Can this be tested with qa26 please?

(In reply to comment #3)
> Can this be tested with qa26 please?
I've given a small test plan to Prithu. I am working with him to re-run the NFS tests with the gfid changes. This is part of that.

fileop has a really bad error reporting mechanism, so it doesn't actually say what the error was. "Mkdir failed" doesn't tell me anything. I tried it, and the reason mkdir failed for me with exactly the same command is that the OOM killer killed the gluster NFS process. This points to a known memory leak filed as bug 762991. That must be resulting in fileop receiving an EIO on timeout, because we must have mounted with soft,intr as mount options.
Command line used:
fileop -d /mnt/nfs-4dist-master -f 10000
Retrying without those options to see if the mkdir fails without the timeout options.

Moving this to 3.1.1, as 3-replica tests can be done post 3.1.0.

Closing. All the tests that I am running and having Prithu run are 4x3 dist-repl. We haven't seen this problem at all lately. See previous comments to know why.
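
One of the comments above notes that fileop prints only "Mkdir failed" and hides the underlying error. As a rough illustration of what a more informative check could look like, here is a minimal Python sketch that attempts a mkdir and reports the errno name and message. It is not part of fileop or GlusterFS; the helper name `try_mkdir` is hypothetical, and a temporary directory stands in for the NFS mount point.

```python
import errno
import os
import tempfile


def try_mkdir(path):
    """Attempt mkdir and return (ok, errno_name, message) instead of a bare failure."""
    try:
        os.mkdir(path)
        return True, None, None
    except OSError as exc:
        # errno.errorcode maps the number to its symbolic name, e.g. EIO or EEXIST.
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        return False, name, os.strerror(exc.errno)


if __name__ == "__main__":
    base = tempfile.mkdtemp()  # stand-in for the NFS mount point (assumption)
    target = os.path.join(base, "fileoptest")
    print(try_mkdir(target))  # first call creates the directory
    print(try_mkdir(target))  # second call fails and reports why
```

Pointed at a directory on the NFS mount, a helper like this would distinguish an EIO caused by a soft-mount timeout from other failures, which the bare "Mkdir failed" message cannot do.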