Bug 764191 (GLUSTER-2459)
Field | Value
---|---
Summary | Some files are inaccessible until root reads them
Product | [Community] GlusterFS
Reporter | Need Real Name <landman>
Component | nfs
Assignee | Shehjar Tikoo <shehjart>
Status | CLOSED CURRENTRELEASE
Severity | medium
Priority | low
Version | 3.1.2
CC | gluster-bugs, landman
Target Milestone | ---
Target Release | ---
Hardware | x86_64
OS | Linux
Doc Type | Bug Fix
Story Points | ---
Type | ---
Regression | RTP
Mount Type | nfs
Documentation | DP
Category | ---
oVirt Team | ---
Cloudforms Team | ---
Description
Need Real Name
2011-02-23 20:46:26 UTC
On a RHEL5.4 client

    [landman@blackbird ~]$ cat /etc/redhat-release
    CentOS release 5.4 (Final)
    [landman@blackbird ~]$ cat /gevol/assets/.config/volumes.xml
    cat: /gevol/assets/.config/volumes.xml: Operation not permitted

Then as root on the same machine

    [root@blackbird ~]# cat /gevol/assets/.config/volumes.xml
    <?xml version="1.0" encoding="UTF-8" standalone="no" ?>
    <VolumeDefList>
      <volumedefs>
        <item key="assetroot">
          <netpath>/gevol/assets</netpath>
          <host>xxx</host>
          <localpath>/gevol/assets</localpath>
          <reserveSpace>100000000</reserveSpace>
          <isTmp>0</isTmp>
        </item>
        <item key="opt">
          <netpath>/opt/google/share/tutorials</netpath>
          <host>xxx</host>
          <localpath>/opt/google/share/tutorials</localpath>
          <reserveSpace>100000000</reserveSpace>
          <isTmp>0</isTmp>
        </item>
        <item key="src">
          <netpath>/gevol/src</netpath>
          <host>xxx</host>
          <localpath>/gevol/src</localpath>
          <reserveSpace>100000000</reserveSpace>
          <isTmp>0</isTmp>
        </item>
      </volumedefs>
    </VolumeDefList>

then the same client in the same window that failed moments before

    [landman@blackbird ~]$ cat /gevol/assets/.config/volumes.xml
    <?xml version="1.0" encoding="UTF-8" standalone="no" ?>
    <VolumeDefList>
      <volumedefs>
        <item key="assetroot">
          <netpath>/gevol/assets</netpath>
          <host>xxx</host>
          <localpath>/gevol/assets</localpath>
          <reserveSpace>100000000</reserveSpace>
          <isTmp>0</isTmp>
        </item>
        <item key="opt">
          <netpath>/opt/google/share/tutorials</netpath>
          <host>xxx</host>
          <localpath>/opt/google/share/tutorials</localpath>
          <reserveSpace>100000000</reserveSpace>
          <isTmp>0</isTmp>
        </item>
        <item key="src">
          <netpath>/gevol/src</netpath>
          <host>xxx</host>
          <localpath>/gevol/src</localpath>
          <reserveSpace>100000000</reserveSpace>
          <isTmp>0</isTmp>
        </item>
      </volumedefs>
    </VolumeDefList>

The volume is as follows:

    [root@manager ~]# gluster volume info fusion
    Volume Name: fusion
    Type: Distribute
    Status: Started
    Number of Bricks: 6
    Transport-type: tcp
    Bricks:
    Brick1: dv4-1:/data/brick-md1/fusion
    Brick2: dv4-2:/data/brick-md1/fusion
    Brick3: dv4-3:/data/brick-md1/fusion
    Brick4: dv4-1:/data/brick-md2/fusion
    Brick5: dv4-2:/data/brick-md2/fusion
    Brick6: dv4-3:/data/brick-md2/fusion
    Options Reconfigured:
    performance.cache-refresh-timeout: 0
    performance.stat-prefetch: 0
    auth.allow: *

mount options on the clients are

    xxx:/fusion/blackbird /gevol nfs rw,nosuid,nodev,intr,hard,noacl,nolock,noac 0 0

So this appears to be an attribute caching problem. Since I can recreate this user on the manager node, mount the directory, and have no issues whatsoever in accessing any of the files.

This may or may not be related to a kernel client side bug in the NFS client. Given the age of the kernel we aren't sure. We will test with a newer kernel.

Is there anything we can do in terms of a near term workaround? They need to use NFS.

(In reply to comment #0)
> xxx:/fusion/blackbird /gevol nfs rw,nosuid,nodev,intr,hard,noacl,nolock,noac 0 0
>
> So this appears to be an attribute caching problem. Since I can recreate this
> user on the manager node, mount the directory, and have no issues whatsoever in
> accessing any of the files.

Why do you think it is related to attribute caching? The noac mount option disables attribute caching.

I am trying to reproduce it. Please provide the ls -l output for this file.

on the affected machines:

    [landman@blackbird ~]$ ls -l /gevol/assets/.config/*
    -rwxr-xr-x 1 pollreid 52030 249 Feb 4 04:51 /gevol/assets/.config/CombinedTerrain.taskrule
    -rw-rw-r-- 1 landman wedge 0 Feb 23 13:30 /gevol/assets/.config/garbage
    -rwxr-xr-x 1 pollreid 52030 247 Feb 4 04:51 /gevol/assets/.config/MapLayerLevel.taskrule
    -rw-r--r-- 1 gefusionuser gegroup 290 Dec 21 13:44 /gevol/assets/.config/misc.xml
    -rwxr-xr-x 1 pollreid 52030 220 Feb 4 04:51 /gevol/assets/.config/PacketLevel.taskrule
    -rw-r--r-- 1 gefusionuser gegroup 828 Dec 21 13:44 /gevol/assets/.config/volumes.xml
    [landman@blackbird ~]$ cat /gevol/assets/.config/garbage
    cat: /gevol/assets/.config/garbage: Input/output error

yet from a mount that doesn't exhibit this problem

    [landman@manager ~]$ ls -l /gevol/assets/.config/*
    -rwxr-xr-x 1 52030 52030 249 Feb 4 04:51 /gevol/assets/.config/CombinedTerrain.taskrule
    -rw-rw-r-- 1 landman wedge 0 Feb 23 13:30 /gevol/assets/.config/garbage
    -rwxr-xr-x 1 52030 52030 247 Feb 4 04:51 /gevol/assets/.config/MapLayerLevel.taskrule
    -rw-r--r-- 1 312 315 290 Dec 21 13:44 /gevol/assets/.config/misc.xml
    -rwxr-xr-x 1 52030 52030 220 Feb 4 04:51 /gevol/assets/.config/PacketLevel.taskrule
    -rw-r--r-- 1 312 315 828 Dec 21 13:44 /gevol/assets/.config/volumes.xml
    [landman@manager ~]$ cat /gevol/assets/.config/garbage
    [landman@manager ~]$

I agree that the noac should disable attribute caching on the client. This appears to be a server side attribute caching issue. Customer has noted that it more often occurs when the files in question aren't on the same computer as the NFS export being mounted.

Thanks. Here is what I need now:

1. Before doing the cat again on the affected system, set the log-level for the NFS server to TRACE.
2. Run: dmesg -c >/dev/null;
3. Run: echo 65535 > /proc/sys/sunrpc/nfs_debug
4. Run the cat command.
5. Run: dmesg > /tmp/nfs-client.log.
6. If it fails again with IO error, please attach here the nfs.log file from the glusterd logs directory and /tmp/nfs-client.log

Created attachment 440
nfs-client.log.gz : uncompress with "gzip -d nfs-client.log.gz"
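
The client-side capture steps requested above (clear the kernel ring buffer, enable NFS client debugging, reproduce the failing read, save dmesg) can be collected into one script. This is only a sketch and was not posted in the bug: the function names `run` and `capture_nfs_client_log`, the `DRY_RUN` switch, and the final step that turns debugging back off are assumptions added for illustration. Raising the NFS server log level to TRACE still has to be done separately on the server.

```shell
#!/bin/sh
# Sketch of the client-side capture steps from this report.
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 and
# run as root on the affected client to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, else run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        sh -c "$*"
    fi
}

# Hypothetical wrapper around the steps listed in the comment.
capture_nfs_client_log() {
    target="$1"                       # file whose read fails
    out="${2:-/tmp/nfs-client.log}"   # where the kernel messages are saved
    run "dmesg -c > /dev/null"                      # discard old kernel messages
    run "echo 65535 > /proc/sys/sunrpc/nfs_debug"   # enable NFS client debugging
    run "cat '$target'"                             # reproduce the failing read
    run "dmesg > '$out'"                            # save the debug output
    run "echo 0 > /proc/sys/sunrpc/nfs_debug"       # assumption: not in the original steps
}
```

Calling `capture_nfs_client_log /gevol/assets/.config/garbage` with the default settings prints the five commands without touching the system, which makes it easy to review them before running with `DRY_RUN=0`.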
Created attachment 441
logs from /var/log/gluster on the server

Files attached as per instructions. Output from cat was this:

    [landman@compute-0-2 ~]$ cat /gevol/assets/.config/*
    <?xml version="1.0" encoding="UTF-8" standalone="no" ?>
    <TaskRule>
      <taskname>CombinedTerrain</taskname>
      <inputConstraints/>
      <outputConstraints/>
      <cpuConstraint>
        <minNumCPU>4</minNumCPU>
        <maxNumCPU>4</maxNumCPU>
      </cpuConstraint>
    </TaskRule>
    cat: /gevol/assets/.config/garbage: Input/output error
    cat: /gevol/assets/.config/MapLayerLevel.taskrule: Operation not permitted
    <?xml version="1.0" encoding="UTF-8" standalone="no" ?>
    <MiscConfigStorage>
      <NFSVisibilityDelay>300</NFSVisibilityDelay>
      <AssetCacheSize>32000</AssetCacheSize>
      <VersionCacheSize>32000</VersionCacheSize>
      <GenerateProductPreviews>1</GenerateProductPreviews>
    </MiscConfigStorage>
    cat: /gevol/assets/.config/PacketLevel.taskrule: Operation not permitted
    cat: /gevol/assets/.config/volumes.xml: Operation not permitted

adding me into cc list

Hi

The problem operation is:

    [landman@blackbird ~]$ cat /gevol/assets/.config/garbage
    cat: /gevol/assets/.config/garbage: Input/output error

It shows up in the nfs.log as:

    [2011-02-25 11:41:17.863139] D [nfs3-helpers.c:2389:nfs3_log_rw_call] nfs-nfsv3: XID: c2523400, READ: args: FH: hashcount 4, exportid 6522e543-ea10-4684-8312-1fda37dbdb2f, gfid 7e25e922-6869-4336-9600-27d6354b25bd, offset: 0, count: 4096
    [2011-02-25 11:41:17.863150] T [nfs3.c:1791:nfs3_read] nfs-nfsv3: FH to Volume: fusion
    [2011-02-25 11:41:17.863159] T [nfs3-helpers.c:3098:nfs3_fh_resolve_inode] nfs-nfsv3: FH needs inode resolution
    [2011-02-25 11:41:17.863167] T [nfs3-helpers.c:2523:nfs3_fh_resolve_inode_done] nfs-nfsv3: FH inode resolved
    [2011-02-25 11:41:17.863177] T [nfs3-helpers.c:2238:nfs3_file_open_and_resume] nfs-nfsv3: Opening: /blackbird/assets/.config/garbage
    [2011-02-25 11:41:17.863185] T [nfs3-helpers.c:2218:nfs3_fdcache_getfd] nfs-nfsv3: fd found in state: 2
    [2011-02-25 11:41:17.863193] T [nfs3-helpers.c:1926:__nfs3_fdcache_update_entry] nfs-nfsv3: Updating fd: 0x7f9b602db024
    [2011-02-25 11:41:17.863209] T [nfs.c:412:nfs_user_create] nfs: uid: 52033, gid 311, gids: 1
    [2011-02-25 11:41:17.863218] T [nfs.c:420:nfs_user_create] nfs: gid: 311
    [2011-02-25 11:41:17.863225] T [nfs-fops.c:133:nfs_create_frame] nfs: uid: 52033, gid 311, gids: 1
    [2011-02-25 11:41:17.863233] T [nfs-fops.c:135:nfs_create_frame] nfs: gid: 311
    [2011-02-25 11:41:17.863246] T [write-behind.c:442:wb_sync] fusion-write-behind: no vectors are to besynced
    [2011-02-25 11:41:17.863262] T [rpc-clnt.c:1295:rpc_clnt_record] : Auth Info: pid: 0, uid: 52033, gid: 311, owner: 260
    [2011-02-25 11:41:17.863272] T [rpc-clnt.c:1195:rpc_clnt_record_build_header] rpc-clnt: Request fraglen 152, payload: 24, rpc hdr: 128
    [2011-02-25 11:41:17.863301] T [rpc-clnt.c:1499:rpc_clnt_submit] rpc-clnt: submitted request (XID: 0x296x Program: GlusterFS 3.1, ProgVers: 310, Proc: 25) to rpc-transport (fusion-client-1)
    [2011-02-25 11:41:17.863631] T [rpc-clnt.c:631:rpc_clnt_reply_init] rpc-clnt: recieved rpc message (RPC XID: 0x296x Program: GlusterFS 3.1, ProgVers: 310, Proc: 25) from rpc-transport (fusion-client-1)
    [2011-02-25 11:41:17.863666] T [write-behind.c:442:wb_sync] fusion-write-behind: no vectors are to besynced
    [2011-02-25 11:41:17.863695] D [nfs3-helpers.c:2431:nfs3_log_read_res] nfs-nfsv3: XID: c2523400, READ: NFS: 0(Call completed successfully.), POSIX: -1(Unknown error 18446744073709551615), count: 0, is_eof: 0

Which means that a 0-length read is returned without the EOF flag set.

For a potential work-around, please try disabling io-cache and quick-read. We fixed a bug in each of them after 3.1.2.

(In reply to comment #11)
> For a potential work-around, please try disabling io-cache and quick-read. We
> fixed a bug in each of them after 3.1.2.

Joe, please try with the work-around above and let us know, thanks.

Closing. Please re-open if the work-around didn't work. Thanks.
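
The suggested work-around (disabling io-cache and quick-read) could be applied along the following lines. This is a hedged sketch, not a command sequence given in the bug: the option names `performance.io-cache` and `performance.quick-read`, the value `off`, the `GLUSTER` variable, and the function name `disable_read_caches` are all assumptions, so check them against the documentation for the installed release before use. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: turn off the io-cache and quick-read translators on a volume.
# Dry run by default; set GLUSTER=gluster to actually apply the change.
GLUSTER="${GLUSTER:-echo gluster}"

# Hypothetical helper; the option names below are assumed, not taken
# from the bug report.
disable_read_caches() {
    volume="$1"
    for opt in performance.io-cache performance.quick-read; do
        # $GLUSTER is left unquoted on purpose so "echo gluster" splits
        # into a command and its first argument.
        $GLUSTER volume set "$volume" "$opt" off || return 1
    done
}
```

For the volume in this report the call would be `disable_read_caches fusion`. After applying the change, the failing `cat` can be repeated as the non-root user to see whether the error still occurs.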