Bug 852575

Summary: NFS/DHT: rename fails under load
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Vidya Sakar <vinaraya>
Component: glusterdAssignee: Amar Tumballi <amarts>
Status: CLOSED ERRATA QA Contact: Sachidananda Urs <surs>
Severity: medium Docs Contact:
Priority: medium    
Version: 2.0CC: amarts, gluster-bugs, racpatel, rfortier, rhs-bugs, sac, saujain, sdharane, vbellur, vraman
Target Milestone: ---Flags: sac: needinfo+
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.4.0qa6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 802671
: 858484 (view as bug list) Environment:
Last Closed: 2013-09-23 22:39:07 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 802671    
Bug Blocks: 858484    
Attachments:
Description Flags
rename program none

Description Vidya Sakar 2012-08-29 01:02:40 UTC
+++ This bug was initially created as a clone of Bug #802671 +++

rename fails with errors:

rename: rename(//home-0/rename.dir.4015, //home-0/rename.dir.2.4015): No such file or directory
rename: stat(//home-0/rename.dir.4015): No such file or directory
rename: rmdir(//home-0/rename.dir.4014): No such file or directory
rename: rmdir(//home-0/rename.dir.4013): No such file or directory
rename: rmdir(//home-0/rename.dir.4016): No such file or directory
rename: rmdir(//home-0/rename.dir.4015): No such file or directory

though rmdir, stat, and rename are done synchronously.

Attaching script to reproduce the issue. Please find attached logs as well.

--- Additional comment from sac on 2012-03-13 05:09:20 EDT ---

Created attachment 569608 [details]
Logs for rename failure

--- Additional comment from sac on 2012-03-13 05:13:48 EDT ---

Created attachment 569611 [details]
Program to reproduce

--- Additional comment from sac on 2012-03-13 12:40:03 EDT ---

dmesg:

Call Trace:
 [<ffffffff814dc4be>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff8117f701>] ? path_put+0x31/0x40
 [<ffffffff814dc35b>] mutex_lock+0x2b/0x50
 [<ffffffff8117ee63>] lock_rename+0xd3/0xe0
 [<ffffffff81183363>] sys_renameat+0x113/0x260
 [<ffffffff810500b0>] ? __dequeue_entity+0x30/0x50
 [<ffffffff810097ac>] ? __switch_to+0x1ac/0x320
 [<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20
 [<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20
 [<ffffffff811834cb>] sys_rename+0x1b/0x20
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
INFO: task rename:32401 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rename        D 0000000000000000     0 32401  32246 0x00000000
 ffff880078b47cf8 0000000000000082 0000000000000000 0000000000000000
 ffff880078b47c88 00000000a3393d40 ffff880078b47c78 000000012a4d6516
 ffff88007a067078 ffff880078b47fd8 000000000000f598 ffff88007a067078
Call Trace:
 [<ffffffff814dc4be>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff8117f701>] ? path_put+0x31/0x40
 [<ffffffff814dc35b>] mutex_lock+0x2b/0x50
 [<ffffffff8117ee63>] lock_rename+0xd3/0xe0
 [<ffffffff81183363>] sys_renameat+0x113/0x260
 [<ffffffff8120e314>] ? inode_has_perm+0x54/0xa0
 [<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20
 [<ffffffff811834cb>] sys_rename+0x1b/0x20
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b

--- Additional comment from sac on 2012-03-13 12:42:51 EDT ---

Fails with I/O error. Log snippet:

DHT:

[2012-03-13 02:32:45.280183] I [dht-common.c:600:dht_revalidate_cbk] 0-nfs-test-1-dht: mismatching layouts for /rename.dir.2.30389
[2012-03-13 02:32:45.280208] I [dht-layout.c:674:dht_layout_dir_mismatch] 0-nfs-test-1-dht: /rename.dir.2.30389 - disk layout missing
[2012-03-13 02:32:45.280219] I [dht-common.c:600:dht_revalidate_cbk] 0-nfs-test-1-dht: mismatching layouts for /rename.dir.2.30389
[2012-03-13 02:32:45.280264] I [dht-layout.c:674:dht_layout_dir_mismatch] 0-nfs-test-1-dht: /rename.dir.2.30389 - disk layout missing
[2012-03-13 02:32:45.280345] I [dht-common.c:600:dht_revalidate_cbk] 0-nfs-test-1-dht: mismatching layouts for /rename.dir.2.30389
[2012-03-13 02:32:45.280373] I [dht-layout.c:674:dht_layout_dir_mismatch] 0-nfs-test-1-dht: /rename.dir.2.30389 - disk layout missing
[2012-03-13 02:32:45.280384] I [dht-common.c:600:dht_revalidate_cbk] 0-nfs-test-1-dht: mismatching layouts for /rename.dir.2.30389
[2012-03-13 02:32:46.077326] I [dht-layout.c:674:dht_layout_dir_mismatch] 0-nfs-test-1-dht: /rename.dir.2.30389 - disk layout missing
[2012-03-13 02:32:46.077397] I [dht-common.c:600:dht_revalidate_cbk] 0-nfs-test-1-dht: mismatching layouts for /rename.dir.2.30389
[2012-03-13 02:32:46.077430] I [dht-layout.c:674:dht_layout_dir_mismatch] 0-nfs-test-1-dht: /rename.dir.2.30389 - disk layout missing
[2012-03-13 02:32:46.077441] I [dht-common.c:600:dht_revalidate_cbk] 0-nfs-test-1-dht: mismatching layouts for /rename.dir.2.30389
[2012-03-13 02:32:46.077461] I [dht-layout.c:674:dht_layout_dir_mismatch] 0-nfs-test-1-dht: /rename.dir.2.30389 - disk layout missing
[2012-03-13 02:32:46.077473] I [dht-common.c:600:dht_revalidate_cbk] 0-nfs-test-1-dht: mismatching layouts for /rename.dir.2.30389

=====================


[2012-03-13 12:33:48.935239] W [nfs3.c:3524:nfs3svc_rmdir_cbk] 0-nfs: 4fa512be: /rename.dir.2371 => -1 (Transport endpoint is not connected)
[2012-03-13 12:33:48.935269] W [nfs3-helpers.c:3392:nfs3_log_common_res] 0-nfs-nfsv3: XID: 4fa512be, RMDIR: NFS: 5(I/O error), POSIX: 107(Transport endpoint is not connected)
[2012-03-13 12:33:48.935454] W [client3_1-fops.c:2663:client3_1_stat] 0-nfs-test-1-client-1: failed to send the fop Transport endpoint is not connected
[2012-03-13 12:33:48.935545] W [client3_1-fops.c:2663:client3_1_stat] 0-nfs-test-1-client-1: failed to send the fop Transport endpoint is not connected
[2012-03-13 12:33:48.935634] W [client3_1-fops.c:2598:client3_1_lookup] 0-nfs-test-1-client-1: failed to send the fop: Transport endpoint is not connected
[2012-03-13 12:33:48.935717] W [client3_1-fops.c:2663:client3_1_stat] 0-nfs-test-1-client-1: failed to send the fop Transport endpoint is not connected
[2012-03-13 12:33:48.936464] W [client3_1-fops.c:3757:client3_1_opendir] 0-nfs-test-1-client-1: failed to send the fop: Transport endpoint is not connected
[2012-03-13 12:33:48.937219] W [nfs3.c:3524:nfs3svc_rmdir_cbk] 0-nfs: 54a512be: /rename.dir.2378 => -1 (Transport endpoint is not connected)
[2012-03-13 12:33:48.937248] W [nfs3-helpers.c:3392:nfs3_log_common_res] 0-nfs-nfsv3: XID: 54a512be, RMDIR: NFS: 5(I/O error), POSIX: 107(Transport endpoint is not connected)
[2012-03-13 12:33:48.937522] W [client3_1-fops.c:2663:client3_1_stat] 0-nfs-test-1-client-1: failed to send the fop Transport endpoint is not connected
[2012-03-13 12:33:48.937638] W [client3_1-fops.c:2663:client3_1_stat] 0-nfs-test-1-client-1: failed to send the fop Transport endpoint is not connected
[2012-03-13 12:33:48.937710] W [client3_1-fops.c:2663:client3_1_stat] 0-nfs-test-1-client-1: failed to send the fop Transport endpoint is not connected
[2012-03-13 12:33:48.937834] W [client3_1-fops.c:2598:client3_1_lookup] 0-nfs-test-1-client-1: failed to send the fop: Transport endpoint is not connected
[2012-03-13 12:33:48.938613] W [client3_1-fops.c:3757:client3_1_opendir] 0-nfs-test-1-client-1: failed to send the fop: Transport endpoint is not connected
[2012-03-13 12:33:48.939300] W [nfs3.c:3524:nfs3svc_rmdir_cbk] 0-nfs: 59a512be: /rename.dir.2374 => -1 (Transport endpoint is not connected)
[2012-03-13 12:33:48.939323] W [nfs3-helpers.c:3392:nfs3_log_common_res] 0-nfs-nfsv3: XID: 59a512be, RMDIR: NFS: 5(I/O error), POSIX: 107(Transport endpoint is not connected)

--- Additional comment from sac on 2012-03-15 03:13:03 EDT ---

Application errors:

a.out: rename: Device or resource busy
a.out: rename: File exists
a.out: fopen: Remote I/O error
a.out: rename: File exists
a.out: rename: Device or resource busy

==============================
Logs:


[2012-03-14 05:43:15.423293] I [dht-common.c:956:dht_lookup_everywhere_cbk] 0-nfs-test-1-dht: deleting stale linkfile /nfsrename.30 on nfs-test-1-client-1
[2012-03-14 05:43:31.235064] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-1: remote operation failed: File exists. Path: /nfsrename.17
[2012-03-14 05:43:49.817049] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-3: remote operation failed: File exists. Path: /nfsrename.13
[2012-03-14 05:44:05.672628] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-3: remote operation failed: File exists. Path: /nfsrename.6
[2012-03-14 05:44:10.342640] I [dht-common.c:956:dht_lookup_everywhere_cbk] 0-nfs-test-1-dht: deleting stale linkfile /nfsrename.42 on nfs-test-1-client-0
[2012-03-14 05:44:17.704338] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-2: remote operation failed: File exists. Path: /nfsrename.40
[2012-03-14 05:44:31.873421] I [dht-common.c:956:dht_lookup_everywhere_cbk] 0-nfs-test-1-dht: deleting stale linkfile /nfsrename.37 on nfs-test-1-client-3
[2012-03-14 05:44:42.682963] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-2: remote operation failed: File exists. Path: /nfsrename.33
[2012-03-14 05:45:00.765599] I [dht-common.c:956:dht_lookup_everywhere_cbk] 0-nfs-test-1-dht: deleting stale linkfile /nfsrename.4 on nfs-test-1-client-2
[2012-03-14 05:45:21.322249] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-3: remote operation failed: File exists. Path: /nfsrename.11
[2012-03-14 05:45:32.088259] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-0: remote operation failed: File exists. Path: /nfsrename.8
[2012-03-14 05:45:44.789369] I [dht-common.c:956:dht_lookup_everywhere_cbk] 0-nfs-test-1-dht: deleting stale linkfile /nfsrename.30 on nfs-test-1-client-1
[2012-03-14 05:45:50.327663] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-1: remote operation failed: File exists. Path: /nfsrename.31
[2012-03-14 05:46:25.356643] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-2: remote operation failed: File exists. Path: /nfsrename.1
[2012-03-14 05:46:32.370776] W [client3_1-fops.c:2180:client3_1_lookup_cbk] 0-nfs-test-1-client-1: remote operation failed: Success. Path: /nfsrename.17
[2012-03-14 05:46:40.739286] W [client3_1-fops.c:511:client3_1_unlink_cbk] 0-nfs-test-1-client-0: remote operation failed: No such file or directory
[2012-03-14 05:46:40.739328] W [dht-rename.c:340:dht_rename_unlink_cbk] 0-nfs-test-1-dht: /nfsrename.nGG7a0: unlink on nfs-test-1-client-0 failed (No such file or directory)
[2012-03-14 05:47:37.471179] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-1: remote operation failed: File exists. Path: /nfsrename.38
[2012-03-14 05:47:41.392221] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-1: remote operation failed: File exists. Path: /nfsrename.48
[2012-03-14 05:48:12.094273] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-2: remote operation failed: File exists. Path: /nfsrename.40
[2012-03-14 05:48:23.639715] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-2: remote operation failed: File exists. Path: /nfsrename.16
[2012-03-14 05:49:11.593940] I [dht-common.c:956:dht_lookup_everywhere_cbk] 0-nfs-test-1-dht: deleting stale linkfile /nfsrename.28 on nfs-test-1-client-2
[2012-03-14 05:49:38.661165] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-0: remote operation failed: File exists. Path: /nfsrename.35
[2012-03-14 05:50:00.017499] I [dht-common.c:956:dht_lookup_everywhere_cbk] 0-nfs-test-1-dht: deleting stale linkfile /nfsrename.33 on nfs-test-1-client-2
[2012-03-14 05:50:07.370571] I [dht-common.c:956:dht_lookup_everywhere_cbk] 0-nfs-test-1-dht: deleting stale linkfile /nfsrename.21 on nfs-test-1-client-1
[2012-03-14 05:50:37.808788] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-3: remote operation failed: File exists. Path: /nfsrename.9
[2012-03-14 05:50:57.532082] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-0: remote operation failed: File exists. Path: /nfsrename.12
[2012-03-14 05:51:19.719767] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-1: remote operation failed: File exists. Path: /nfsrename.21
[2012-03-14 05:51:30.566440] I [dht-common.c:956:dht_lookup_everywhere_cbk] 0-nfs-test-1-dht: deleting stale linkfile /nfsrename.19 on nfs-test-1-client-3
[2012-03-14 05:51:34.585923] W [client3_1-fops.c:2180:client3_1_lookup_cbk] 0-nfs-test-1-client-3: remote operation failed: Success. Path: /nfsrename.18
[2012-03-14 05:51:34.586472] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-3: remote operation failed: File exists. Path: /nfsrename.18
[2012-03-14 05:51:46.509069] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-0: remote operation failed: File exists. Path: /nfsrename.12
[2012-03-14 05:51:57.722177] I [dht-common.c:956:dht_lookup_everywhere_cbk] 0-nfs-test-1-dht: deleting stale linkfile /nfsrename.20 on nfs-test-1-client-1
[2012-03-14 05:52:04.270368] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-2: remote operation failed: File exists. Path: /nfsrename.36
[2012-03-14 05:53:04.852505] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-0: remote operation failed: File exists. Path: /nfsrename.10

--- Additional comment from vbellur on 2012-05-18 09:11:36 EDT ---

Pushing it to post 3.3.0

Comment 2 Krishna Srinivas 2012-10-22 10:24:41 UTC
"dht: mismatching layouts" errors seen> re-assigning bug to distribute.

Comment 3 Amar Tumballi 2012-10-23 11:06:23 UTC
Ran the provided script for some time, and not able to reproduce the issue. will keep it open till we have a brew build, and then close the bug for ON_QA to verify again.

Comment 4 Amar Tumballi 2012-10-31 10:10:39 UTC
Sac, not happening on upstream codebase... hence please try to confirm the behavior again.

Comment 7 Amar Tumballi 2012-12-21 07:41:30 UTC
now that downstream (brew build) of 3.4.0 (ie, RHS2.1 source) is available, marking it ON_QA

Comment 8 Sachidananda Urs 2013-01-15 13:05:09 UTC
Created attachment 678768 [details]
rename program

Comment 9 Sachidananda Urs 2013-01-15 13:05:24 UTC
Tested by running multiple instances of the attached program. rename tests now pass, can close the bug.

Comment 12 Scott Haines 2013-09-23 22:39:07 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html

Comment 13 Scott Haines 2013-09-23 22:43:41 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html