+++ This bug was initially created as a clone of Bug #804592 +++

Description of problem:
Renames fail with ENOENT while a graph change is in progress. This was a single-export volume.

Version-Release number of selected component (if applicable):
3.3.0qa29

How reproducible:
Consistently

Steps to Reproduce:
1. while true; do echo 'sdsdsd' > dot; mv dot dot2; rm -rf *; done
2. while true; do gluster volume set test2 performance.write-behind off; sleep 1; gluster volume set test2 performance.write-behind on; sleep 1; done

Actual results:
mv: cannot move `dot' to `dot2': No such file or directory

Expected results:
Renames should continue without errors.

Additional info:
Client log:

[2012-03-19 16:31:06.223804] E [fuse-bridge.c:1511:fuse_rename_resume] 0-glusterfs-fuse: RENAME 45223 00000000-0000-0000-0000-000000000000/dot -> 00000000-0000-0000-0000-000000000000/dot2 src resolution failed
[2012-03-19 16:31:06.229473] E [fuse-bridge.c:1329:fuse_unlink_resume] 0-glusterfs-fuse: UNLINK 1 (00000000-0000-0000-0000-000000000000/dot) resolution failed
[2012-03-19 16:31:06.235730] E [fuse-bridge.c:1511:fuse_rename_resume] 0-glusterfs-fuse: RENAME 45240 00000000-0000-0000-0000-000000000000/dot -> 00000000-0000-0000-0000-000000000000/dot2 src resolution failed
[2012-03-19 16:31:06.243696] E [fuse-bridge.c:1329:fuse_unlink_resume] 0-glusterfs-fuse: UNLINK 1 (00000000-0000-0000-0000-000000000000/dot) resolution failed
[2012-03-19 16:31:06.249440] E [fuse-bridge.c:1511:fuse_rename_resume] 0-glusterfs-fuse: RENAME 45257 00000000-0000-0000-0000-000000000000/dot -> 00000000-0000-0000-0000-000000000000/dot2 src resolution failed
[2012-03-19 16:31:06.252887] E [fuse-bridge.c:1329:fuse_unlink_resume] 0-glusterfs-fuse: UNLINK 1 (00000000-0000-0000-0000-000000000000/dot) resolution failed
[2012-03-19 16:31:06.259771] E [fuse-bridge.c:1511:fuse_rename_resume] 0-glusterfs-fuse: RENAME 45274 00000000-0000-0000-0000-000000000000/dot -> 00000000-0000-0000-0000-000000000000/dot2 src resolution failed
[2012-03-19 16:31:06.262661] E [fuse-bridge.c:1329:fuse_unlink_resume] 0-glusterfs-fuse: UNLINK 1 (00000000-0000-0000-0000-000000000000/dot) resolution failed
[2012-03-19 16:31:06.271101] E [fuse-bridge.c:1511:fuse_rename_resume] 0-glusterfs-fuse: RENAME 45291 00000000-0000-0000-0000-000000000000/dot -> 00000000-0000-0000-0000-000000000000/dot2 src resolution failed
[2012-03-19 16:31:06.273614] E [fuse-bridge.c:1329:fuse_unlink_resume] 0-glusterfs-fuse: UNLINK 1 (00000000-0000-0000-0000-000000000000/dot) resolution failed
[2012-03-19 16:31:06.277884] E [fuse-bridge.c:1511:fuse_rename_resume] 0-glusterfs-fuse: RENAME 45308 00000000-0000-0000-0000-000000000000/dot -> 00000000-0000-0000-0000-000000000000/dot2 src resolution failed
[2012-03-19 16:31:06.280381] E [fuse-bridge.c:1329:fuse_unlink_resume] 0-glusterfs-fuse: UNLINK 1 (00000000-0000-0000-0000-000000000000/dot) resolution failed

--- Additional comment from amarts on 2012-03-25 03:14:16 EDT ---

Check if it's already fixed.

--- Additional comment from ashetty on 2012-03-26 01:00:36 EDT ---

This issue still exists on the mainline.

--- Additional comment from rgowdapp on 2012-04-02 23:35:05 EDT ---

Patch has been sent for review at http://review.gluster.com/#change,3007

--- Additional comment from ashetty on 2012-04-23 03:07:12 EDT ---

With http://review.gluster.com/#change,3181 and http://review.gluster.com/#change,3181, this issue is fixed.

--- Additional comment from ashetty on 2012-04-23 03:08:09 EDT ---

With http://review.gluster.com/#change,3007 and http://review.gluster.com/#change,3181 I meant.

--- Additional comment from aavati on 2012-05-08 18:34:57 EDT ---

CHANGE: http://review.gluster.com/3007 (fuse-resolve: consider cases where an entry should be resolved even when parent belongs to active itable.) merged in master by Anand Avati (avati)

--- Additional comment from aavati on 2012-05-15 20:09:15 EDT ---

CHANGE: http://review.gluster.com/3181 (fuse-resolve: Attempt fd-migration in resolver, if migration was never attempted.) merged in master by Anand Avati (avati)

--- Additional comment from vbellur on 2012-05-18 09:12:23 EDT ---

Addressing this post 3.3.0.
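The rename loop from step 1 of the reproduction can be made bounded and pointed at an arbitrary directory. This is a sketch, not the exact command from the report: the loop count is arbitrary, `rm -rf *` is narrowed to the two files for safety, and the idea is to run it against the FUSE mount of the affected volume (the report does not give a mount path, so `DIR` is left as a variable) while the `volume set` toggle from step 2 churns the graph:

```shell
#!/bin/sh
# Bounded variant of reproduction step 1: rename churn in DIR.
# On an affected build, pointing DIR at the glusterfs mount while
# the write-behind on/off loop runs makes `mv` fail with ENOENT.
DIR="${DIR:-$(mktemp -d)}"
cd "$DIR" || exit 1
i=0
while [ "$i" -lt 100 ]; do
    echo 'sdsdsd' > dot || exit 1
    mv dot dot2 || exit 1      # this is the call that failed with ENOENT
    rm -f dot dot2
    i=$((i + 1))
done
echo "completed $i rename cycles in $DIR"
```

On a plain local filesystem (the default `mktemp -d` scratch directory) the loop completes cleanly; the failure only reproduces under a concurrent graph switch.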
With proper fd migration in place, this should now be fixed upstream; we are not seeing the issue.
We still saw renames failing on glusterfs-3.3.0.5rhs-40.el6rhs.x86_64. The client sosreport is here: http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/848326/sosreport-rhsvm02.848326-20130121144432-b8bb.tar.xz
We had set the 'fixed in version' field to 3.4.0qa5, and the bug is also targeted for rhs-2.1.0, so please confirm against the correct binary.
We saw an OOM kill for the same test case with glusterfs-3.4.0.17rhs-1.el6rhs.x86_64:

Out of memory: Kill process 2948 (glusterfs) score 933 or sacrifice child
Killed process 2948, UID 0, (glusterfs) total-vm:7634196kB, anon-rss:7225540kB, file-rss:1360kB
vdsm invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
vdsm cpuset=/ mems_allowed=0
Pid: 1919, comm: vdsm Not tainted 2.6.32-358.14.1.el6.x86_64 #1
Call Trace:
 [<ffffffff810cb561>] ? cpuset_print_task_mems_allowed+0x91/0xb0
 [<ffffffff8111cd80>] ? dump_header+0x90/0x1b0
 [<ffffffff8111d202>] ? oom_kill_process+0x82/0x2a0
 [<ffffffff8111d141>] ? select_bad_process+0xe1/0x120
 [<ffffffff8111d640>] ? out_of_memory+0x220/0x3c0
 [<ffffffff8112c2ec>] ? __alloc_pages_nodemask+0x8ac/0x8d0
 [<ffffffff811609ea>] ? alloc_pages_current+0xaa/0x110
 [<ffffffff8111a167>] ? __page_cache_alloc+0x87/0x90
 [<ffffffff81119b4e>] ? find_get_page+0x1e/0xa0
 [<ffffffff8111b127>] ? filemap_fault+0x1a7/0x500
 [<ffffffff81007ca2>] ? check_events+0x12/0x20
 [<ffffffff810074fd>] ? xen_force_evtchn_callback+0xd/0x10
 [<ffffffff81143124>] ? __do_fault+0x54/0x530
 [<ffffffff81007c8f>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff815106bc>] ? _spin_unlock_irqrestore+0x1c/0x20
 [<ffffffff811436f7>] ? handle_pte_fault+0xf7/0xb50
 [<ffffffff810074fd>] ? xen_force_evtchn_callback+0xd/0x10
 [<ffffffff81007ca2>] ? check_events+0x12/0x20
 [<ffffffff81007c8f>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff81004a49>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
 [<ffffffff8114438a>] ? handle_mm_fault+0x23a/0x310
 [<ffffffff810474e9>] ? __do_page_fault+0x139/0x480
 [<ffffffff81007c8f>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff815106bc>] ? _spin_unlock_irqrestore+0x1c/0x20
 [<ffffffff811c78d6>] ? ep_poll+0x306/0x330
 [<ffffffff81063330>] ? default_wake_function+0x0/0x20
 [<ffffffff815137be>] ? do_page_fault+0x3e/0xa0
 [<ffffffff81510b75>] ? page_fault+0x25/0x30
Mem-Info:
Node 0 DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
CPU 1: hi: 0, btch: 1 usd: 0
Node 0 DMA32 per-cpu:
CPU 0: hi: 186, btch: 31 usd: 32
CPU 1: hi: 186, btch: 31 usd: 51
Node 0 Normal per-cpu:
CPU 0: hi: 186, btch: 31 usd: 31
CPU 1: hi: 186, btch: 31 usd: 164
active_anon:1831753 inactive_anon:1 isolated_anon:0
 active_file:12 inactive_file:699 isolated_file:0
 unevictable:10968 dirty:1 writeback:2 unstable:0
 free:6454 slab_reclaimable:2094 slab_unreclaimable:5661
 mapped:1267 shmem:35 pagetables:5030 bounce:0
Node 0 DMA free:576kB min:0kB low:0kB high:0kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:572kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 4024 7559 7559
Node 0 DMA32 free:20068kB min:5920kB low:7400kB high:8880kB active_anon:3815240kB inactive_anon:0kB active_file:32kB inactive_file:44kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:4120800kB mlocked:0kB dirty:4kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:56kB slab_unreclaimable:112kB kernel_stack:8kB pagetables:7492kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:118 all_unreclaimable? yes
lowmem_reserve[]: 0 0 3535 3535
Node 0 Normal free:1081988kB min:5200kB low:6500kB high:7800kB active_anon:2432940kB inactive_anon:4kB active_file:4kB inactive_file:4300kB unevictable:43872kB isolated(anon):0kB isolated(file):0kB present:3619840kB mlocked:43872kB dirty:44kB writeback:0kB mapped:5500kB shmem:140kB slab_reclaimable:8300kB slab_unreclaimable:22532kB kernel_stack:1568kB pagetables:12584kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 2*4kB 1*8kB 1*16kB 1*32kB 0*64kB 2*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 576kB
Node 0 DMA32: 277*4kB 128*8kB 77*16kB 63*32kB 57*64kB 47*128kB 34*256kB 32*512kB 10*1024kB 3*2048kB 2*4096kB = 64708kB
Node 0 Normal: 3023*4kB 2015*8kB 1523*16kB 1177*32kB 938*64kB 785*128kB 739*256kB 608*512kB 229*1024kB 45*2048kB 1*4096kB = 1081988kB
2220 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap = 0kB
Total swap = 0kB
1966079 pages RAM
87605 pages reserved
11357 pages shared
1560180 pages non-shared
[ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
[  343]   0   343    2726   200 1 -17 -1000 udevd
[  828]   0   828    2279   153 1   0     0 dhclient
[  866]   0   866   63855   341 1   0     0 rsyslogd
[  888]  32   888    4743   193 0   0     0 rpcbind
[  908]  29   908    5836   292 0   0     0 rpc.statd
[  940]   0   940    6290   133 1   0     0 rpc.idmapd
[  963]   0   963   64671  2750 0   0     0 glusterd
[ 1091]  81  1091    5350   154 0   0     0 dbus-daemon
[ 1121]  68  1121    6231   331 1   0     0 hald
[ 1122]   0  1122    4526   185 1   0     0 hald-runner
[ 1187]   0  1187   16029   260 0 -17 -1000 sshd
[ 1197]  38  1197    7540   315 0   0     0 ntpd
[ 1217]   0  1217   21652   510 0   0     0 sendmail
[ 1227]  51  1227   19540   445 1   0     0 sendmail
[ 1242]   0  1242   27042   135 0   0     0 ksmtuned
[ 1374]   0  1374   41995  1046 0 -17 -1000 multipathd
[ 1413]   0  1413    3387   829 1   0     0 wdmd
[ 1435] 179  1435   65809  4379 1   0     0 sanlock
[ 1436]   0  1436    5769    71 0   0     0 sanlock-helper
[ 1469]   0  1469    7236  5699 1   0   -17 iscsiuio
[ 1474]   0  1474    1217   127 0   0     0 iscsid
[ 1475]   0  1475    1342   830 1   0   -17 iscsid
[ 1488]   0  1488  232150  1545 0   0     0 libvirtd
[ 1764]   0  1764    2725   188 0 -17 -1000 udevd
[ 1765]   0  1765    2725   175 0 -17 -1000 udevd
[ 1806]  36  1806    2300   116 0   0     0 respawn
[ 1809]  36  1809  363823  5323 1   0     0 vdsm
[ 1816]   0  1816   29302   292 1   0     0 crond
[ 1833]   0  1833   25971   127 0   0     0 rhsmcertd
[ 1866]   0  1866    1014   141 1   0     0 mingetty
[ 1868]   0  1868    1014   141 1   0     0 mingetty
[ 1871]   0  1871    1014   142 1   0     0 mingetty
[ 1873]   0  1873    1014   142 1   0     0 mingetty
[ 1875]   0  1875    1014   142 1   0     0 mingetty
[ 1877]   0  1877    1014   141 1   0     0 mingetty
[ 1889]   0  1889   19105   373 0   0     0 sudo
[ 1890]   0  1890  151115  4201 0   0     0 python
[ 2089]   0  2089   30134   689 0   0     0 screen
[ 2090]   0  2090   27075   300 0   0     0 bash
[ 2098]   0  2098   27075   299 1   0     0 bash
[ 2117]   0  2117   14950   448 0   0     0 ssh
[ 2118]   0  2118   27075   298 0   0     0 bash
[ 2131]   0  2131   14950   416 0   0     0 ssh
[ 2132]   0  2132   27075   298 1   0     0 bash
[ 2139]   0  2139   14950   416 0   0     0 ssh
[ 2140]   0  2140   27075   299 1   0     0 bash
[ 2153]   0  2153   14950   415 0   0     0 ssh
[ 2749]   0  2749   78077  7601 0   0     0 glusterfs
[ 2824]   0  2824   23947   470 0   0     0 sshd
[ 2828]   0  2828   27075   289 0   0     0 bash
[ 2846]   0  2846   29677   231 1   0     0 screen
[ 2959]   0  2959   27315   553 0   0     0 bash
[17593]   0 17593   25225   134 0   0     0 sleep
[20223]   0 20223   28404   187 0   0     0 mv
Out of memory: Kill process 1435 (sanlock) score 2 or sacrifice child
Killed process 1436, UID 0, (sanlock-helper) total-vm:23076kB, anon-rss:168kB, file-rss:116kB
Dev ack to 3.0 RHS BZs
Is the current issue the OOM kill, or fds not being migrated properly? If it is the former, it is a known issue as of now, since memory consumed by old graphs (and the inodes, fds, etc. in those old graphs) is not freed. However, the application should not see any errors while doing I/O on fds that were opened prior to the graph switch. If it is just the OOM kill, the bug can be closed by documenting it as a known issue.
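The triage question above hinges on whether the client's resident memory keeps growing across graph switches. One way to check from outside the process is to sample the glusterfs client's RSS between `volume set` toggles. This is a minimal sketch, not part of the reported test case; it reads the Linux-specific VmRSS field from /proc, and the sample count and interval are arbitrary choices:

```python
import time

def rss_kb(pid):
    """Return the resident set size (VmRSS) of `pid` in kB, read from /proc."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # kernel reports the value in kB
    return 0

def sample_rss(pid, samples=5, interval=1.0):
    """Sample RSS `samples` times, `interval` seconds apart.

    A steadily rising series while the write-behind on/off loop runs is
    consistent with old graphs (and their inode/fd tables) never being freed.
    """
    readings = []
    for _ in range(samples):
        readings.append(rss_kb(pid))
        time.sleep(interval)
    return readings
```

Pointing this at the glusterfs client PID during the reproduction loops would distinguish a genuine leak (monotonic growth per toggle) from a plateau.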
The product version of Red Hat Storage on which this issue was reported has reached End Of Life (EOL) [1], hence this bug report is being closed. If the issue is still observed on a current version of Red Hat Storage, please file a new bug report on the current version. [1] https://rhn.redhat.com/errata/RHSA-2014-0821.html
The needinfo request(s) on this closed bug have been removed, as they had been unresolved for 1000 days.