This service will be undergoing maintenance at 00:00 UTC, 2016-08-01. It is expected to last about 1 hours
Bug 802671 - NFS/DHT: rename fails under load
NFS/DHT: rename fails under load
Status: CLOSED WORKSFORME
Product: GlusterFS
Classification: Community
Component: nfs (Show other bugs)
pre-release
Unspecified Unspecified
medium Severity medium
: ---
: ---
Assigned To: Rajesh
: Triaged
Depends On:
Blocks: 852575 858484
  Show dependency treegraph
 
Reported: 2012-03-13 05:03 EDT by Sachidananda Urs
Modified: 2013-07-04 18:44 EDT (History)
3 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 852575 (view as bug list)
Environment:
Last Closed: 2013-06-19 02:18:03 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments (Terms of Use)
Logs for rename failure (604.57 KB, application/x-bzip)
2012-03-13 05:09 EDT, Sachidananda Urs
no flags Details
Program to reproduce (3.84 KB, application/x-shellscript)
2012-03-13 05:13 EDT, Sachidananda Urs
no flags Details

  None (edit)
Description Sachidananda Urs 2012-03-13 05:03:48 EDT
rename fails with errors:

rename: rename(//home-0/rename.dir.4015, //home-0/rename.dir.2.4015): No such file or directory
rename: stat(//home-0/rename.dir.4015): No such file or directory
rename: rmdir(//home-0/rename.dir.4014): No such file or directory
rename: rmdir(//home-0/rename.dir.4013): No such file or directory
rename: rmdir(//home-0/rename.dir.4016): No such file or directory
rename: rmdir(//home-0/rename.dir.4015): No such file or directory

though rmdir, stat, and rename are done synchronously.

Attaching script to reproduce the issue. Please find attached logs as well.
Comment 1 Sachidananda Urs 2012-03-13 05:09:20 EDT
Created attachment 569608 [details]
Logs for rename failure
Comment 2 Sachidananda Urs 2012-03-13 05:13:48 EDT
Created attachment 569611 [details]
Program to reproduce
Comment 3 Sachidananda Urs 2012-03-13 12:40:03 EDT
dmesg:

Call Trace:
 [<ffffffff814dc4be>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff8117f701>] ? path_put+0x31/0x40
 [<ffffffff814dc35b>] mutex_lock+0x2b/0x50
 [<ffffffff8117ee63>] lock_rename+0xd3/0xe0
 [<ffffffff81183363>] sys_renameat+0x113/0x260
 [<ffffffff810500b0>] ? __dequeue_entity+0x30/0x50
 [<ffffffff810097ac>] ? __switch_to+0x1ac/0x320
 [<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20
 [<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20
 [<ffffffff811834cb>] sys_rename+0x1b/0x20
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
INFO: task rename:32401 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rename        D 0000000000000000     0 32401  32246 0x00000000
 ffff880078b47cf8 0000000000000082 0000000000000000 0000000000000000
 ffff880078b47c88 00000000a3393d40 ffff880078b47c78 000000012a4d6516
 ffff88007a067078 ffff880078b47fd8 000000000000f598 ffff88007a067078
Call Trace:
 [<ffffffff814dc4be>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff8117f701>] ? path_put+0x31/0x40
 [<ffffffff814dc35b>] mutex_lock+0x2b/0x50
 [<ffffffff8117ee63>] lock_rename+0xd3/0xe0
 [<ffffffff81183363>] sys_renameat+0x113/0x260
 [<ffffffff8120e314>] ? inode_has_perm+0x54/0xa0
 [<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20
 [<ffffffff811834cb>] sys_rename+0x1b/0x20
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
Comment 4 Sachidananda Urs 2012-03-13 12:42:51 EDT
Fails with I/O error. Log snippet:

DHT:

[2012-03-13 02:32:45.280183] I [dht-common.c:600:dht_revalidate_cbk] 0-nfs-test-1-dht: mismatching layouts for /rename.dir.2.30389
[2012-03-13 02:32:45.280208] I [dht-layout.c:674:dht_layout_dir_mismatch] 0-nfs-test-1-dht: /rename.dir.2.30389 - disk layout missing
[2012-03-13 02:32:45.280219] I [dht-common.c:600:dht_revalidate_cbk] 0-nfs-test-1-dht: mismatching layouts for /rename.dir.2.30389
[2012-03-13 02:32:45.280264] I [dht-layout.c:674:dht_layout_dir_mismatch] 0-nfs-test-1-dht: /rename.dir.2.30389 - disk layout missing
[2012-03-13 02:32:45.280345] I [dht-common.c:600:dht_revalidate_cbk] 0-nfs-test-1-dht: mismatching layouts for /rename.dir.2.30389
[2012-03-13 02:32:45.280373] I [dht-layout.c:674:dht_layout_dir_mismatch] 0-nfs-test-1-dht: /rename.dir.2.30389 - disk layout missing
[2012-03-13 02:32:45.280384] I [dht-common.c:600:dht_revalidate_cbk] 0-nfs-test-1-dht: mismatching layouts for /rename.dir.2.30389
[2012-03-13 02:32:46.077326] I [dht-layout.c:674:dht_layout_dir_mismatch] 0-nfs-test-1-dht: /rename.dir.2.30389 - disk layout missing
[2012-03-13 02:32:46.077397] I [dht-common.c:600:dht_revalidate_cbk] 0-nfs-test-1-dht: mismatching layouts for /rename.dir.2.30389
[2012-03-13 02:32:46.077430] I [dht-layout.c:674:dht_layout_dir_mismatch] 0-nfs-test-1-dht: /rename.dir.2.30389 - disk layout missing
[2012-03-13 02:32:46.077441] I [dht-common.c:600:dht_revalidate_cbk] 0-nfs-test-1-dht: mismatching layouts for /rename.dir.2.30389
[2012-03-13 02:32:46.077461] I [dht-layout.c:674:dht_layout_dir_mismatch] 0-nfs-test-1-dht: /rename.dir.2.30389 - disk layout missing
[2012-03-13 02:32:46.077473] I [dht-common.c:600:dht_revalidate_cbk] 0-nfs-test-1-dht: mismatching layouts for /rename.dir.2.30389

=====================


[2012-03-13 12:33:48.935239] W [nfs3.c:3524:nfs3svc_rmdir_cbk] 0-nfs: 4fa512be: /rename.dir.2371 => -1 (Transport endpoint is not connected)
[2012-03-13 12:33:48.935269] W [nfs3-helpers.c:3392:nfs3_log_common_res] 0-nfs-nfsv3: XID: 4fa512be, RMDIR: NFS: 5(I/O error), POSIX: 107(Transport endpoint is not connected)
[2012-03-13 12:33:48.935454] W [client3_1-fops.c:2663:client3_1_stat] 0-nfs-test-1-client-1: failed to send the fop Transport endpoint is not connected
[2012-03-13 12:33:48.935545] W [client3_1-fops.c:2663:client3_1_stat] 0-nfs-test-1-client-1: failed to send the fop Transport endpoint is not connected
[2012-03-13 12:33:48.935634] W [client3_1-fops.c:2598:client3_1_lookup] 0-nfs-test-1-client-1: failed to send the fop: Transport endpoint is not connected
[2012-03-13 12:33:48.935717] W [client3_1-fops.c:2663:client3_1_stat] 0-nfs-test-1-client-1: failed to send the fop Transport endpoint is not connected
[2012-03-13 12:33:48.936464] W [client3_1-fops.c:3757:client3_1_opendir] 0-nfs-test-1-client-1: failed to send the fop: Transport endpoint is not connected
[2012-03-13 12:33:48.937219] W [nfs3.c:3524:nfs3svc_rmdir_cbk] 0-nfs: 54a512be: /rename.dir.2378 => -1 (Transport endpoint is not connected)
[2012-03-13 12:33:48.937248] W [nfs3-helpers.c:3392:nfs3_log_common_res] 0-nfs-nfsv3: XID: 54a512be, RMDIR: NFS: 5(I/O error), POSIX: 107(Transport endpoint is not connected)
[2012-03-13 12:33:48.937522] W [client3_1-fops.c:2663:client3_1_stat] 0-nfs-test-1-client-1: failed to send the fop Transport endpoint is not connected
[2012-03-13 12:33:48.937638] W [client3_1-fops.c:2663:client3_1_stat] 0-nfs-test-1-client-1: failed to send the fop Transport endpoint is not connected
[2012-03-13 12:33:48.937710] W [client3_1-fops.c:2663:client3_1_stat] 0-nfs-test-1-client-1: failed to send the fop Transport endpoint is not connected
[2012-03-13 12:33:48.937834] W [client3_1-fops.c:2598:client3_1_lookup] 0-nfs-test-1-client-1: failed to send the fop: Transport endpoint is not connected
[2012-03-13 12:33:48.938613] W [client3_1-fops.c:3757:client3_1_opendir] 0-nfs-test-1-client-1: failed to send the fop: Transport endpoint is not connected
[2012-03-13 12:33:48.939300] W [nfs3.c:3524:nfs3svc_rmdir_cbk] 0-nfs: 59a512be: /rename.dir.2374 => -1 (Transport endpoint is not connected)
[2012-03-13 12:33:48.939323] W [nfs3-helpers.c:3392:nfs3_log_common_res] 0-nfs-nfsv3: XID: 59a512be, RMDIR: NFS: 5(I/O error), POSIX: 107(Transport endpoint is not connected)
Comment 5 Sachidananda Urs 2012-03-15 03:13:03 EDT
Application errors:

a.out: rename: Device or resource busy
a.out: rename: File exists
a.out: fopen: Remote I/O error
a.out: rename: File exists
a.out: rename: Device or resource busy

==============================
Logs:


[2012-03-14 05:43:15.423293] I [dht-common.c:956:dht_lookup_everywhere_cbk] 0-nfs-test-1-dht: deleting stale linkfile /nfsrename.30 on nfs-test-1-client-1
[2012-03-14 05:43:31.235064] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-1: remote operation failed: File exists. Path: /nfsrename.17
[2012-03-14 05:43:49.817049] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-3: remote operation failed: File exists. Path: /nfsrename.13
[2012-03-14 05:44:05.672628] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-3: remote operation failed: File exists. Path: /nfsrename.6
[2012-03-14 05:44:10.342640] I [dht-common.c:956:dht_lookup_everywhere_cbk] 0-nfs-test-1-dht: deleting stale linkfile /nfsrename.42 on nfs-test-1-client-0
[2012-03-14 05:44:17.704338] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-2: remote operation failed: File exists. Path: /nfsrename.40
[2012-03-14 05:44:31.873421] I [dht-common.c:956:dht_lookup_everywhere_cbk] 0-nfs-test-1-dht: deleting stale linkfile /nfsrename.37 on nfs-test-1-client-3
[2012-03-14 05:44:42.682963] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-2: remote operation failed: File exists. Path: /nfsrename.33
[2012-03-14 05:45:00.765599] I [dht-common.c:956:dht_lookup_everywhere_cbk] 0-nfs-test-1-dht: deleting stale linkfile /nfsrename.4 on nfs-test-1-client-2
[2012-03-14 05:45:21.322249] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-3: remote operation failed: File exists. Path: /nfsrename.11
[2012-03-14 05:45:32.088259] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-0: remote operation failed: File exists. Path: /nfsrename.8
[2012-03-14 05:45:44.789369] I [dht-common.c:956:dht_lookup_everywhere_cbk] 0-nfs-test-1-dht: deleting stale linkfile /nfsrename.30 on nfs-test-1-client-1
[2012-03-14 05:45:50.327663] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-1: remote operation failed: File exists. Path: /nfsrename.31
[2012-03-14 05:46:25.356643] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-2: remote operation failed: File exists. Path: /nfsrename.1
[2012-03-14 05:46:32.370776] W [client3_1-fops.c:2180:client3_1_lookup_cbk] 0-nfs-test-1-client-1: remote operation failed: Success. Path: /nfsrename.17
[2012-03-14 05:46:40.739286] W [client3_1-fops.c:511:client3_1_unlink_cbk] 0-nfs-test-1-client-0: remote operation failed: No such file or directory
[2012-03-14 05:46:40.739328] W [dht-rename.c:340:dht_rename_unlink_cbk] 0-nfs-test-1-dht: /nfsrename.nGG7a0: unlink on nfs-test-1-client-0 failed (No such file or directory)
[2012-03-14 05:47:37.471179] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-1: remote operation failed: File exists. Path: /nfsrename.38
[2012-03-14 05:47:41.392221] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-1: remote operation failed: File exists. Path: /nfsrename.48
[2012-03-14 05:48:12.094273] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-2: remote operation failed: File exists. Path: /nfsrename.40
[2012-03-14 05:48:23.639715] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-2: remote operation failed: File exists. Path: /nfsrename.16
[2012-03-14 05:49:11.593940] I [dht-common.c:956:dht_lookup_everywhere_cbk] 0-nfs-test-1-dht: deleting stale linkfile /nfsrename.28 on nfs-test-1-client-2
[2012-03-14 05:49:38.661165] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-0: remote operation failed: File exists. Path: /nfsrename.35
[2012-03-14 05:50:00.017499] I [dht-common.c:956:dht_lookup_everywhere_cbk] 0-nfs-test-1-dht: deleting stale linkfile /nfsrename.33 on nfs-test-1-client-2
[2012-03-14 05:50:07.370571] I [dht-common.c:956:dht_lookup_everywhere_cbk] 0-nfs-test-1-dht: deleting stale linkfile /nfsrename.21 on nfs-test-1-client-1
[2012-03-14 05:50:37.808788] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-3: remote operation failed: File exists. Path: /nfsrename.9
[2012-03-14 05:50:57.532082] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-0: remote operation failed: File exists. Path: /nfsrename.12
[2012-03-14 05:51:19.719767] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-1: remote operation failed: File exists. Path: /nfsrename.21
[2012-03-14 05:51:30.566440] I [dht-common.c:956:dht_lookup_everywhere_cbk] 0-nfs-test-1-dht: deleting stale linkfile /nfsrename.19 on nfs-test-1-client-3
[2012-03-14 05:51:34.585923] W [client3_1-fops.c:2180:client3_1_lookup_cbk] 0-nfs-test-1-client-3: remote operation failed: Success. Path: /nfsrename.18
[2012-03-14 05:51:34.586472] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-3: remote operation failed: File exists. Path: /nfsrename.18
[2012-03-14 05:51:46.509069] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-0: remote operation failed: File exists. Path: /nfsrename.12
[2012-03-14 05:51:57.722177] I [dht-common.c:956:dht_lookup_everywhere_cbk] 0-nfs-test-1-dht: deleting stale linkfile /nfsrename.20 on nfs-test-1-client-1
[2012-03-14 05:52:04.270368] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-2: remote operation failed: File exists. Path: /nfsrename.36
[2012-03-14 05:53:04.852505] W [client3_1-fops.c:238:client3_1_mknod_cbk] 0-nfs-test-1-client-0: remote operation failed: File exists. Path: /nfsrename.10
Comment 6 Vijay Bellur 2012-05-18 09:11:36 EDT
Pushing it to post 3.3.0
Comment 7 Rajesh 2013-06-19 02:18:03 EDT
/mnt/m1/test# ~/x/rename/rename.sh /mnt/m1/test
/mnt/m1/test# 

I tried to reproduce the issue on the latest HEAD, and after this script ran for about a day, there were no errors found in the application or logs. Please reopen if this issue is seen again.

Note You need to log in before you can comment on or make changes to this bug.