Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 763105 (GLUSTER-1373)

Summary: fileop mkdir fails on 4x3 dist-repl gnfs mount
Product: [Community] GlusterFS
Reporter: Lakshmipathi G <lakshmipathi>
Component: replicate
Assignee: Pavan Vilas Sondur <pavan>
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: low
Version: 3.1-alpha
CC: gluster-bugs, shehjart, vijay
Hardware: All
OS: Linux
Doc Type: Bug Fix
Regression: RTP
Mount Type: nfs

Attachments:

Description	Flags
gnfs log file	none

Description Lakshmipathi G 2010-08-16 09:47:47 UTC

Running fileop with nfs_beta_rc10 (4x3 distributed replicate) fails with all performance translators enabled, but this did not crash the server.

# /opt/qa/tools-32bit/fileop -f 30 -t

Fileop:  Working in ., File size is 1,  Output is in Ops/sec. (A=Avg, B=Best, W=Worst)
Mkdir failed

Comment 1 Lakshmipathi G 2010-08-16 10:23:43 UTC

Tested again with the same setup, nfs_beta_rc10 (4x3 distributed replicate) with all
performance translators enabled. This time it passes.

Comment 2 Shehjar Tikoo 2010-09-04 05:49:29 UTC

With 3.1qa11, it still fails, with glusterfsd crashing in replicate.

fileop -d /mnt/nfs-master-4dist-3repl/fileoptest/ -f  10000
Fileop:  Working in /mnt/nfs-master-4dist-3repl/fileoptest, File size is 1,  Output is in Ops/sec. (A=Avg, B=Best, W=Worst)
 .       mkdir   chdir   rmdir  create    open    read   write   close    stat  access   chmod readdir  link    unlink  delete  Total_files


Mkdir failed

The crash trace:
Core was generated by `/home/shehjart/glusterfsd-master/sbin/glusterfs -f /home/shehjart/volfiles/nfs-'.
Program terminated with signal 11, Segmentation fault.
#0  0x00000032d680b722 in pthread_spin_lock () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00000032d680b722 in pthread_spin_lock () from /lib64/libpthread.so.0
#1  0x00002b28015ee20b in fd_unref (fd=0x2aaaae909ad0) at fd.c:467
#2  0x00002aaaaad15686 in afr_local_cleanup (local=0x2aaae0ed16c8, this=0x1a17c548) at afr-common.c:353
#3  0x00002aaaaacee23c in afr_fstat_cbk (frame=0x2b28024c1cb8, cookie=<value optimized out>, this=<value optimized out>, op_ret=0, 
    op_errno=0, buf=0x7fffc9d8cf10) at afr-inode-read.c:351
#4  0x00002aaaaaacac1a in client3_1_fstat_cbk (req=<value optimized out>, iov=<value optimized out>, count=<value optimized out>, 
    myframe=0x2b28024c15b8) at client3_1-fops.c:1042
#5  0x00002b280182bc95 in rpc_clnt_handle_reply (clnt=<value optimized out>, pollin=<value optimized out>) at rpc-clnt.c:734
#6  0x00002b280182be78 in rpc_clnt_notify (trans=<value optimized out>, mydata=0x1a19dca8, event=<value optimized out>, data=0x1a169190)
    at rpc-clnt.c:844
#7  0x00002b28018272cc in rpc_transport_notify (this=0xaaaaaab2, event=RPC_TRANSPORT_ACCEPT, data=0x1a169190) at rpc-transport.c:1124
#8  0x00002aaaaed21c2f in socket_event_poll_in (this=0x1a19de88) at socket.c:1561
#9  0x00002aaaaed21dc0 in socket_event_handler (fd=<value optimized out>, idx=10, data=0x1a19de88, poll_in=1, poll_out=0, poll_err=0)
    at socket.c:1675
#10 0x00002b28015efd77 in event_dispatch_epoll_handler (event_pool=0x1a16ab08) at event.c:812
#11 event_dispatch_epoll (event_pool=0x1a16ab08) at event.c:876
#12 0x000000000040470d in main (argc=8, argv=0x7fffc9d8d7c8) at glusterfsd.c:1398

Comment 3 Vijay Bellur 2010-09-18 03:28:10 UTC

Can this be tested with qa26 please?

Comment 4 Shehjar Tikoo 2010-09-18 04:45:14 UTC

(In reply to comment #3)
> Can this be tested with qa26 please?

I've given a small test plan to Prithu and am working with him to re-run the NFS tests with the gfid changes. This is part of that.

Comment 5 Shehjar Tikoo 2010-09-20 04:16:08 UTC

fileop has a really bad error reporting mechanism, so it doesn't actually say what the error was. "Mkdir failed" doesn't tell me anything.
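For comparison, here is a minimal sketch of the kind of diagnostic that would have helped here: report the errno name and its description when mkdir fails. This is not fileop's code. The helper name `describe_mkdir_failure` is hypothetical and only illustrates the idea.

```python
import errno
import os


def describe_mkdir_failure(path):
    """Try to create `path`; return None on success or a diagnostic string."""
    try:
        os.mkdir(path)
    except OSError as exc:
        # errno.errorcode maps the numeric errno to its symbolic name,
        # so an I/O error on a timed-out NFS mount would show up as "EIO".
        name = errno.errorcode.get(exc.errno, "UNKNOWN")
        return "mkdir(%r) failed: %s (%s)" % (path, name, os.strerror(exc.errno))
    return None
```

With output like this, an EIO from the NFS client would be distinguishable from, say, EEXIST or ENOSPC without having to dig through the server logs.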

I tried it, and the reason mkdir failed for me with exactly the same command is that the OOM killer killed the gluster nfs process. This points to a known memory leak filed as bug 762991.

That presumably results in fileop receiving an EIO on timeout, because we must have mounted with soft,intr as mount options.

Command line used:
fileop -d /mnt/nfs-4dist-master -f  10000

Retrying without those mount options to see whether the mkdir still fails.
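To confirm which options a mount is actually using before and after the retry, something like the sketch below can read them from /proc/mounts-style text. The function names are hypothetical, and the sample line in the usage note is made up for illustration. It is not taken from the test machine.

```python
def nfs_mount_options(mounts_text, mount_point):
    """Return the option list for `mount_point`, or None if it is not listed."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # Each line is: device mount_point fstype options dump pass
        if len(fields) >= 4 and fields[1] == mount_point:
            return fields[3].split(",")
    return None


def is_soft_mount(options):
    """True if the option list contains the NFS 'soft' option."""
    return "soft" in options
```

Usage, assuming a hypothetical entry such as `server1:/vol /mnt/nfs-4dist-master nfs rw,soft,intr,vers=3 0 0`: pass the contents of /proc/mounts and the mount point to `nfs_mount_options`, then check `is_soft_mount` on the result. On a soft mount the NFS client gives up after its retries time out and returns an error to the application, while a hard mount keeps retrying.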

Comment 6 Vijay Bellur 2010-09-29 14:57:14 UTC

Moving this to 3.1.1, as 3-replica tests can be done post 3.1.0.

Comment 7 Shehjar Tikoo 2010-09-30 02:00:20 UTC

Closing. All the tests that I am running and having Prithu run are 4x3 dist-repl. We haven't seen this problem at all lately. See the previous comments for why.