| Summary: | kernel untar fails during add-brick | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Lakshmipathi G <lakshmipathi> |
| Component: | distribute | Assignee: | Shehjar Tikoo <shehjart> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | low | | |
| Version: | 3.1-alpha | CC: | gluster-bugs |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | --- |
| Regression: | RTP | Mount Type: | nfs |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description
Amar Tumballi
2010-09-28 05:48:01 UTC
Created attachment 318 [details]
cutting of the /etc/group file, for the groups involved in the problem
On the NFS client mount point, untar the kernel and add 2 more bricks to the existing 2 DHT bricks. The untar fails with the following error:
-------
linux-2.6.35/arch/arm/mach-mv78xx0/include/mach/
linux-2.6.35/arch/arm/mach-mv78xx0/include/mach/bridge-regs.h
linux-2.6.35/arch/arm/mach-mv78xx0/include/mach/debug-macro.S
linux-2.6.35/arch/arm/mach-mv78xx0/include/mach/entry-macro.S
linux-2.6.35/arch/arm/mach-mv78xx0/include/mach/gpio.h
linux-2.6.35/arch/arm/mach-mv78xx0/include/mach/hardware.h
linux-2.6.35/arch/arm/mach-mv78xx0/include/mach/io.h
linux-2.6.35/arch/arm/mach-mv78xx0/include/mach/irqs.h
linux-2.6.35/arch/arm/mach-mv78xx0/include/mach/memory.h
linux-2.6.35/arch/arm/mach-mv78xx0/include/mach/mv78xx0.h
linux-2.6.35/arch/arm/mach-mv78xx0/include/mach/system.h
linux-2.6.35/arch/arm/mach-mv78xx0/include/mach/timex.h
linux-2.6.35/arch/arm/mach-mv78xx0/include/mach/uncompress.h
linux-2.6.35/arch/arm/mach-mv78xx0/include/mach/vmalloc.h
linux-2.6.35/arch/arm/mach-mv78xx0/irq.c
tar: linux-2.6.35.tar: Cannot read: Stale NFS file handle
tar: linux-2.6.35.tar: Cannot read: Stale NFS file handle
tar: linux-2.6.35.tar: Cannot read: Stale NFS file handle
tar: linux-2.6.35.tar: Cannot read: Stale NFS file handle
tar: linux-2.6.35.tar: Cannot read: Stale NFS file handle
tar: linux-2.6.35.tar: Cannot read: Stale NFS file handle
tar: linux-2.6.35.tar: Cannot read: Stale NFS file handle
tar: linux-2.6.35.tar: Cannot read: Stale NFS file handle
tar: linux-2.6.35.tar: Cannot read: Stale NFS file handle
tar: linux-2.6.35.tar: Cannot read: Stale NFS file handle
tar: linux-2.6.35.tar: Cannot read: Stale NFS file handle
tar: linux-2.6.35.tar: Cannot read: Stale NFS file handle
tar: Too many errors, quitting
tar: Error is not recoverable: exiting now
-------
Initially ls gave a "Stale NFS" message, but after changing to another directory and returning, it started working.
----------
[root@ip-10-212-117-143 client5]# pwd
/mnt/client5
[root@ip-10-212-117-143 client5]# ls
ls: cannot open directory .: Stale NFS file handle
[root@ip-10-212-117-143 client5]# ls -ltr
ls: cannot open directory .: Stale NFS file handle
[root@ip-10-212-117-143 client5]# cd ..
[root@ip-10-212-117-143 mnt]# cd client1
[root@ip-10-212-117-143 client1]# ls
NFS.SH run24289
[root@ip-10-212-117-143 client1]# cd ../client5
[root@ip-10-212-117-143 client5]# ls
linux-2.6.35 linux-2.6.35.tar
---------------
Created attachment 320 [details]
blah. this one should work better; wrong version of diff last time.
(In reply to comment #3)
> Created an attachment (id=320) [details]
> nfs-log-adding single brick

Adding a single brick to the existing 2-DHT setup also has this issue.
---
linux-2.6.35/Documentation/filesystems/ecryptfs.txt
linux-2.6.35/Documentation/filesystems/exofs.txt
linux-2.6.35/Documentation/filesystems/ext2.txt
tar: linux-2.6.35.tar: Cannot read: Stale NFS file handle
tar: linux-2.6.35.tar: Cannot read: Stale NFS file handle
tar: linux-2.6.35.tar: Cannot read: Stale NFS file handle
tar: linux-2.6.35.tar: Cannot read: Stale NFS file handle
tar: linux-2.6.35.tar: Cannot read: Stale NFS file handle
tar: linux-2.6.35.tar: Cannot read: Stale NFS file handle
tar: linux-2.6.35.tar: Cannot read: Stale NFS file handle
tar: linux-2.6.35.tar: Cannot read: Stale NFS file handle
tar: linux-2.6.35.tar: Cannot read: Stale NFS file handle
tar: linux-2.6.35.tar: Cannot read: Stale NFS file handle
tar: linux-2.6.35.tar: Cannot read: Stale NFS file handle
tar: linux-2.6.35.tar: Cannot read: Stale NFS file handle
tar: Too many errors, quitting
tar: Error is not recoverable: exiting now
[root@ip-10-245-210-193 mnt]# ls
--
Created attachment 324

Tested with qa38; untar still fails.

The NFS server needs to special-case the transport endpoint errors, because otherwise the server ends up returning an EIO, which may be causing the errors we're seeing here.
[2010-10-01 17:25:13.672120] T [rpc-clnt.c:1216:rpc_clnt_record] : Auth Info: pid: 0, uid: 0, gid: 0, owner: 1287
[2010-10-01 17:25:13.672139] T [rpc-clnt.c:1116:rpc_clnt_record_build_header] rpc-clnt: Request fraglen 156, payload: 28, rpc hdr: 128
[2010-10-01 17:25:13.672361] T [rpc-clnt.c:1388:rpc_clnt_submit] rpc-clnt: submitted request (XID: 0x4c0 Program: GlusterFS 3.1, ProgVers: 310, Proc: 16) to rpc-transport (new-client-1)
[2010-10-01 17:25:13.676783] D [glusterfsd-mgmt.c:650:glusterfs_mgmt_pmap_signout] fsd-mgmt: portmapper signout arguments not given
[2010-10-01 17:25:13.676815] I [glusterfsd.c:668:cleanup_and_exit] glusterfsd: shutting down
[2010-10-01 17:25:13.676831] D [nfs.c:845:fini] nfs: NFS service going down
[2010-10-01 17:25:13.677073] D [rpcsvc.c:2771:nfs_rpcsvc_program_unregister] nfsrpc: Program unregistered: MOUNT3, Num: 100005, Ver: 3, Port: 38465
[2010-10-01 17:25:13.677200] D [rpcsvc.c:2771:nfs_rpcsvc_program_unregister] nfsrpc: Program unregistered: MOUNT1, Num: 100005, Ver: 1, Port: 38466
[2010-10-01 17:25:13.677332] D [rpcsvc.c:2771:nfs_rpcsvc_program_unregister] nfsrpc: Program unregistered: NFS3, Num: 100003, Ver: 3, Port: 38467
[2010-10-01 17:25:13.677356] I [io-stats.c:1680:fini] new: io-stats translator unloaded
[2010-10-01 17:25:13.677883] T [socket.c:2569:fini] new-client-1: transport 0xce2648 destroyed
[2010-10-01 17:25:13.677911] D [rpc-clnt.c:489:rpc_clnt_connection_cleanup] rpc-clnt: cleaning up state in transport object 0xce2648
[2010-10-01 17:25:13.677945] E [rpc-clnt.c:338:saved_frames_unwind] rpc-clnt: forced unwinding frame type(GlusterFS 3.1) op(FSYNC(16)) called at 2010-10-01 17:25:13.672359
[2010-10-01 17:25:13.677968] D [dht-common.c:1480:dht_fsync_cbk] new-dht: subvolume new-client-1 returned -1 (Transport endpoint is not connected)
[2010-10-01 17:25:13.677998] T [write-behind.c:442:wb_sync] new-write-behind: no vectors are to besynced
[2010-10-01 17:25:13.678028] D [nfs3-helpers.c:2446:nfs3_log_commit_res] nfs-nfsv3: XID: 5cc1172e, COMMIT: NFS: 10006(Error occurred on the server or IO Error), POSIX: 107(Transport endpoint is not connected), wverf: 1285934079

NFS receives this error because the error arrives before any child-down notification is received for this volume.

Applied this patch to qa39:
http://dev.gluster.com/~shehjart/0001-nfs-nfs3-Disable-subvolume-on-ENOTCONN.patch
Untar fails when adding 2 bricks. The NFS trace log can be found at /share/tickets/1724

>
> untar fails when adding 2 bricks. nfs trace log can be found at
> /share/tickets/1724
That was the wrong NFS server log. The correct NFS server log, along with a tcpdump, is now at /share/tickets/1724/logs
Upgrading to critical.

Trace inspection through Wireshark shows that a file is receiving two different inode numbers from the NFS server. The dump file showing the differing inode numbers for the same file handle is at dev:/share/tickets/1724/logs/dump3.bin

The trace starts with a lookup request at number 11 with its reply at 12. The file handle returned is 0x450a00c4 and the fileid returned is 67141636. Much later, at the getattr request at number 10481 with its reply at 10483, we receive a fileid of 100712454 for the same file handle. This is what causes the read request failure for the Linux tar file on the same mountpoint.

It's a bug in nfs3-helpers.c:nfs3_stat_to_fattr3
fa.fileid = buf->ia_ino;
ia_ino needs to be filled using the gfid, which is not being done right now.
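One way to picture the fix described above is to derive the NFS fileid from the file's gfid rather than from a per-brick inode number, so that the value does not change when bricks are added. The following is only a minimal sketch under stated assumptions, not the code from the actual patches: the type `sketch_gfid_t`, the function `sketch_gfid_to_ino`, and the choice of XOR-folding the two 8-byte halves of the gfid are all hypothetical and chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the 16-byte gfid (a UUID) of a file. */
typedef struct {
    uint8_t bytes[16];
} sketch_gfid_t;

/*
 * Fold a 16-byte gfid into a 64-bit inode number by XOR-ing its
 * upper and lower 8-byte halves (both read as big-endian integers).
 * The folding scheme is an assumption made for this sketch; the point
 * is only that the result depends on the gfid alone, not on the brick.
 */
uint64_t sketch_gfid_to_ino(const sketch_gfid_t *gfid)
{
    uint64_t hi = 0;
    uint64_t lo = 0;

    for (size_t i = 0; i < 8; i++) {
        hi = (hi << 8) | gfid->bytes[i];
        lo = (lo << 8) | gfid->bytes[i + 8];
    }

    return hi ^ lo;
}
```

Because the result depends only on the gfid, a lookup and a later getattr on the same file would report the same fileid regardless of which brick answered, which is the property the trace showed to be violated.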
PATCH: http://patches.gluster.com/patch/5247 in master (nfs,nfs3: Disable subvolume on ENOTCONN)
PATCH: http://patches.gluster.com/patch/5248 in master (nfs3: Convert gfid into inode number)
PATCH: http://patches.gluster.com/patch/5275 in master (nfs3: Convert gfid to ino only for non-root)
PATCH: http://patches.gluster.com/patch/5337 in master (nfs: Revert downed-subvolume changes)
PATCH: http://patches.gluster.com/patch/5338 in master (nfs3: Fix gfid to ino conversion)