Bug 761913 (GLUSTER-181)

Summary: fileop reports mkdir failed on NFS mount over unfs3booster
Product: [Community] GlusterFS
Reporter: Shehjar Tikoo <shehjart>
Component: distribute
Assignee: Anand Avati <aavati>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: urgent
Version: 2.0.5
CC: chrisw, gluster-bugs
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Attachments:
Mkdir failure log from a fileop run over unfs3booster NFS mount (flags: none)

Description Shehjar Tikoo 2009-07-31 09:20:06 UTC
Created attachment 50
Tar file to reproduce bug

Comment 1 Shehjar Tikoo 2009-07-31 12:19:10 UTC
A recent fileop run using the command line below reported that the mkdir syscall failed.

[root@client12 mount]# /root/shehjart/iozone3_326/src/current/fileop 1 -f 1 -s 1M -b -w -e -t
Mkdir failed

This is not reliably reproducible. When it occurred, I had restarted the unfs3booster server just before starting the test, expecting things to work without problems.

A snippet from the log file (see attachment) is below:

[2009-07-31 00:00:41] N [trace.c:1341:trace_mkdir] tr: 19: (path=/fileop_L1_0, ino=0, mode=493)

[2009-07-31 00:00:41] N [trace.c:1341:trace_mkdir] tr-below-wb: 19: (path=/fileop_L1_0, ino=0, mode=493)

[2009-07-31 00:00:41] N [trace.c:1695:trace_statfs] tr-brick1: 19: (loc {path=/, ino=0})

[2009-07-31 00:00:41] N [trace.c:1695:trace_statfs] tr-brick2: 19: (loc {path=/, ino=0})

[2009-07-31 00:00:41] N [trace.c:1695:trace_statfs] tr-brick5: 19: (loc {path=/, ino=0})

[2009-07-31 00:00:41] N [trace.c:1695:trace_statfs] tr-brick4: 19: (loc {path=/, ino=0})

[2009-07-31 00:00:41] D [dht-layout.c:101:dht_layout_search] dist-repl: no subvolume for hash (value) = 893457940

[2009-07-31 00:00:41] D [dht-helper.c:228:dht_subvol_get_hashed] dist-repl: could not find subvolume for path=/fileop_L1_0

[2009-07-31 00:00:41] D [dht-common.c:3003:dht_mkdir] dist-repl: hashed subvol not found for /fileop_L1_0

[2009-07-31 00:00:41] N [trace.c:617:trace_mkdir_cbk] tr-below-wb: 19: (op_ret=-1, op_errno=22, ino=0

[2009-07-31 00:00:41] N [trace.c:617:trace_mkdir_cbk] tr: 19: (op_ret=-1, op_errno=22, ino=0
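
To make the log easier to read: mode=493 is octal 0755, and op_errno=22 is EINVAL on Linux. The sketch below is a simplified model of a hash-range layout lookup, not the actual dht_layout_search code. The even split of a 32-bit hash space, the subvolume names, and all function names are assumptions made for illustration. It shows how a layout that is missing the range of one subvolume leaves some hashes with no owner, which would match the "no subvolume for hash" message followed by the mkdir failure.

```python
import errno

HASH_SPACE = 2**32  # assumption: hashes live in a 32-bit space


def build_layout(subvols):
    """Split the hash space evenly across the given subvolume names."""
    step = HASH_SPACE // len(subvols)
    layout = []
    for i, name in enumerate(subvols):
        start = i * step
        # The last range absorbs any remainder of the hash space.
        stop = HASH_SPACE - 1 if i == len(subvols) - 1 else start + step - 1
        layout.append((start, stop, name))
    return layout


def layout_search(layout, hash_value):
    """Return the subvolume whose range covers hash_value, or None."""
    for start, stop, name in layout:
        if start <= hash_value <= stop:
            return name
    return None


def mkdir(layout, hash_value):
    """Return (op_ret, op_errno) in the style of the trace lines."""
    if layout_search(layout, hash_value) is None:
        return -1, errno.EINVAL
    return 0, 0


# Hypothetical subvolume names; the real volume layout is not in this report.
full = build_layout(["subvol-0", "subvol-1", "subvol-2", "subvol-3"])

# Model a subvolume that was not ready at init time by dropping its range.
partial = [entry for entry in full if entry[2] != "subvol-0"]

hash_value = 893457940  # the hash value from the log above
print(layout_search(full, hash_value))
print(mkdir(partial, hash_value))
print(oct(493))
```

In this model the failure depends on which range is missing and where the path hashes, which would be consistent with the problem not being reliably reproducible.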

Comment 2 Shehjar Tikoo 2009-11-12 03:00:55 UTC
We haven't seen exactly the same problem again. However, similar errors were seen when the initialization phase in libglusterfsclient finished before all the subvolumes of a distribute volume were up and ready for use.

We have worked around that temporarily by adding a short sleep, which seems to work fine for now.
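
For illustration only, here is a minimal sketch of a bounded readiness wait, a common alternative to a fixed sleep. It is not the change made in libglusterfsclient. The is_up callback, the subvolume names, and the timeout values are all hypothetical.

```python
import time


def wait_for_subvols(is_up, subvols, timeout=5.0, interval=0.1):
    """Poll until every subvolume reports up, or until timeout expires.

    is_up is a hypothetical callback taking a subvolume name and
    returning True once that subvolume is ready for use.
    """
    deadline = time.monotonic() + timeout
    while True:
        pending = [name for name in subvols if not is_up(name)]
        if not pending:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Simulated state: number of polls left before each subvolume reports up.
polls_left = {"subvol-0": 2, "subvol-1": 0}


def simulated_is_up(name):
    if polls_left[name] > 0:
        polls_left[name] -= 1
        return False
    return True


ready = wait_for_subvols(simulated_is_up, list(polls_left), timeout=1.0, interval=0.01)
print(ready)
```

Unlike a fixed sleep, a wait of this kind returns as soon as all subvolumes are up and reports a timeout explicitly instead of letting the first operations fail.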