Client log:
[2010-09-15 10:49:56.504216] E [rpc-clnt.c:196:call_bail] sep15_replica-client-0: bailing out frame type(GlusterFS 3.1) op(LOOKUP(27)) xid = 23279807 sent = 2010-09-15 10:19:54. timeout = 1800
[2010-09-15 10:53:06.695340] E [rpc-clnt.c:196:call_bail] sep15_replica-client-0: bailing out frame type(GlusterFS 3.1) op(LOOKUP(27)) xid = 23279809 sent = 2010-09-15 10:23:04. timeout = 1800
[2010-09-15 10:53:06.695472] E [rpc-clnt.c:196:call_bail] sep15_replica-client-0: bailing out frame type(GlusterFS 3.1) op(LOOKUP(27)) xid = 23279808 sent = 2010-09-15 10:23:04. timeout = 1800
[2010-09-15 10:53:06.696597] D [afr-common.c:562:afr_lookup_collect_xattr] sep15_replica-replicate-1: entry self-heal is pending for /clients/client8/~dmtmp/ACCESS.
[2010-09-15 11:00:37.150115] E [rpc-clnt.c:196:call_bail] sep15_replica-client-0: bailing out frame type(GlusterFS 3.1) op(ENTRYLK(31)) xid = 23279810 sent = 2010-09-15 10:30:35. timeout = 1800
[2010-09-15 11:00:37.150303] D [afr-lk-common.c:989:afr_lock_blocking] sep15_replica-replicate-0: we're done locking
[2010-09-15 11:00:37.150341] D [afr-transaction.c:935:afr_post_blocking_entrylk_cbk] sep15_replica-replicate-0: Blocking entrylks done. Proceeding to FOP
[2010-09-15 11:00:37.151531] D [dht-rename.c:274:dht_rename_unlink_cbk] sep15_replica-dht: unlink on sep15_replica-replicate-0 failed (No such file or directory)
[2010-09-15 11:00:37.151556] W [fuse-bridge.c:1297:fuse_rename_cbk] glusterfs-fuse: 20692474: /clients/client6/~dmtmp/EXCEL/0BBC0000 -> /clients/client6/~dmtmp/EXCEL/RESULTS.XLS => -1 (Transport endpoint is not connected)
[2010-09-15 11:00:37.151754] D [afr-lk-common.c:410:transaction_lk_op] sep15_replica-replicate-0: lk op is for a transaction
[2010-09-15 11:05:17.431195] E [rpc-clnt.c:196:call_bail] sep15_replica-client-0: bailing out frame type(GlusterFS 3.1) op(FSYNC(16)) xid = 23279811 sent = 2010-09-15 10:35:15. timeout = 1800
[2010-09-15 11:06:37.511926] E [rpc-clnt.c:196:call_bail] sep15_replica-client-0: bailing out frame type(GlusterFS 3.1) op(SETATTR(38)) xid = 23279812 sent = 2010-09-15 10:36:35. timeout = 1800
[2010-09-15 11:06:37.511989] D [afr-self-heal-data.c:107:afr_sh_data_flush_cbk] sep15_replica-replicate-0: flush or setattr failed on /clients/client5/~dmtmp/PM/PMD383.TMP on subvolume sep15_replica-client-0: Transport endpoint is not connected
[2010-09-15 11:06:37.512014] I [afr-self-heal-common.c:1583:afr_self_heal_completion_cbk] sep15_replica-replicate-0: background meta-data data entry self-heal completed on /clients/client5/~dmtmp/PM/PMD383.TMP
[2010-09-15 11:19:58.317274] E [rpc-clnt.c:196:call_bail] sep15_replica-client-0: bailing out frame type(GlusterFS 3.1) op(LOOKUP(27)) xid = 23279813 sent = 2010-09-15 10:49:56. timeout = 1800
[2010-09-15 11:23:08.511380] E [rpc-clnt.c:196:call_bail] sep15_replica-client-0: bailing out frame type(GlusterFS 3.1) op(LOOKUP(27)) xid = 23279815 sent = 2010-09-15 10:53:06. timeout = 1800
[2010-09-15 11:23:08.511520] E [rpc-clnt.c:196:call_bail] sep15_replica-client-0: bailing out frame type(GlusterFS 3.1) op(LOOKUP(27)) xid = 23279814 sent = 2010-09-15 10:53:06. timeout = 1800
[2010-09-15 11:23:08.513528] D [afr-common.c:568:afr_lookup_collect_xattr] sep15_replica-replicate-1: data self-heal is pending for /clients/client8/~dmtmp/ACCESS/LABELS.PRN.
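To check how long each request waited before call_bail fired, the entries in the client log can be parsed and the "sent" time subtracted from the log timestamp. The following is only a rough sketch: the regular expression is inferred from the lines quoted in this report, not from the GlusterFS source, and the names BAIL_RE, parse_bails and SAMPLE are hypothetical.

```python
import re
from datetime import datetime

# Pattern inferred from the call_bail lines quoted in this report
# (an assumption, not taken from the GlusterFS source).
BAIL_RE = re.compile(
    r"\[(?P<logged>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+\] E "
    r"\[rpc-clnt\.c:\d+:call_bail\] (?P<client>\S+): bailing out frame "
    r".*?op\((?P<op>\w+)\(\d+\)\) xid = (?P<xid>\d+) "
    r"sent = (?P<sent>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\s+"
    r"timeout = (?P<timeout>\d+)"
)

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_bails(text):
    """Return one dict per call_bail entry, with the seconds it waited."""
    rows = []
    for m in BAIL_RE.finditer(text):
        logged = datetime.strptime(m.group("logged"), TIME_FORMAT)
        sent = datetime.strptime(m.group("sent"), TIME_FORMAT)
        rows.append({
            "client": m.group("client"),
            "op": m.group("op"),
            "xid": int(m.group("xid")),
            "timeout": int(m.group("timeout")),
            # Whole seconds between sending the request and bailing out.
            "waited": int((logged - sent).total_seconds()),
        })
    return rows


# Two entries copied from the client log in this report.
SAMPLE = (
    "[2010-09-15 10:49:56.504216] E [rpc-clnt.c:196:call_bail] "
    "sep15_replica-client-0: bailing out frame type(GlusterFS 3.1) "
    "op(LOOKUP(27)) xid = 23279807 sent = 2010-09-15 10:19:54. timeout = 1800 "
    "[2010-09-15 11:00:37.150115] E [rpc-clnt.c:196:call_bail] "
    "sep15_replica-client-0: bailing out frame type(GlusterFS 3.1) "
    "op(ENTRYLK(31)) xid = 23279810 sent = 2010-09-15 10:30:35. timeout = 1800"
)

if __name__ == "__main__":
    for row in parse_bails(SAMPLE):
        print(row["op"], row["xid"], row["waited"], row["timeout"])
```

For the two sample entries the wait comes out to 1802 seconds against the logged timeout of 1800, and every bailed frame is on sep15_replica-client-0.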
This was on a distribute+replicate setup and dbench was set to run for 6hrs.

10 146265 0.12 MB/sec execute 10575 sec latency 8983676.996 ms
10 146265 0.12 MB/sec execute 10576 sec latency 8984678.020 ms
10 146265 0.12 MB/sec execute 10577 sec latency 8985679.040 ms
10 146265 0.12 MB/sec execute 10578 sec latency 8986680.061 ms
10 146265 0.12 MB/sec execute 10579 sec latency 8987681.082 ms
10 146265 0.12 MB/sec execute 10580 sec latency 8988682.105 ms
10 146265 0.12 MB/sec execute 10581 sec latency 8989683.126 ms
10 146265 0.12 MB/sec execute 10582 sec latency 8990684.148 ms
10 146265 0.12 MB/sec execute 10583 sec latency 8991685.170 ms
10 146265 0.12 MB/sec execute 10584 sec latency 8992686.192 ms
10 146265 0.12 MB/sec execute 10585 sec latency 8993687.213 ms
10 146265 0.12 MB/sec execute 10586 sec latency 8994688.235 ms
10 146265 0.12 MB/sec execute 10587 sec latency 8995689.257 ms
10 146265 0.12 MB/sec execute 10588 sec latency 8996690.279 ms
10 146265 0.12 MB/sec execute 10589 sec latency 8997691.300 ms
10 146265 0.12 MB/sec execute 10590 sec latency 8998692.322 ms
10 146265 0.12 MB/sec execute 10591 sec latency 8999693.344 ms
10 146265 0.12 MB/sec execute 10592 sec latency 9000694.366 ms
10 146265 0.12 MB/sec execute 10593 sec latency 9001695.387 ms
10 146265 0.12 MB/sec execute 10594 sec latency 9002696.409 ms
10 146265 0.12 MB/sec execute 10595 sec latency 9003697.430 ms
10 146265 0.12 MB/sec execute 10596 sec latency 9004698.453 ms
10 146265 0.12 MB/sec execute 10597 sec latency 9005699.474 ms
10 146265 0.12 MB/sec execute 10598 sec latency 9006700.496 ms
10 146265 0.12 MB/sec execute 10599 sec latency 9007701.517 ms
10 146265 0.12 MB/sec execute 10600 sec latency 9008702.540 ms
10 146265 0.12 MB/sec execute 10601 sec latency 9009703.561 ms
[144300] rename ./clients/client6/~dmtmp/EXCEL/0BBC0000 ./clients/client6/~dmtmp/EXCEL/RESULTS.XLS failed (Transport endpoint is not connected) - expected NT_STATUS_OK
ERROR: child 6 failed at line 144300
Child failed with status 1
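The dbench progress lines above show the second column stuck at 146265 while the latency grows by about 1001 ms for every elapsed second, which is what a single blocked operation would look like. A small sketch for detecting that pattern follows. It assumes the columns are client count, operation counter, throughput, elapsed seconds and latency, and that the latency column reflects how long the longest outstanding operation has been waiting; both are assumptions read off the output, and PROGRESS_RE, parse_progress, looks_stalled and stall_start are hypothetical names.

```python
import re

# Column layout assumed from the dbench output quoted in this report.
PROGRESS_RE = re.compile(
    r"(?P<procs>\d+)\s+(?P<ops>\d+)\s+(?P<mbps>\d+\.\d+) MB/sec\s+"
    r"execute\s+(?P<elapsed>\d+) sec\s+latency\s+(?P<latency>\d+\.\d+) ms"
)


def parse_progress(text):
    """Extract (ops, elapsed seconds, latency in ms) from progress lines."""
    return [
        {
            "ops": int(m.group("ops")),
            "elapsed": int(m.group("elapsed")),
            "latency_ms": float(m.group("latency")),
        }
        for m in PROGRESS_RE.finditer(text)
    ]


def looks_stalled(samples, tolerance_ms=50.0):
    """True if the op counter never moves and latency tracks wall time."""
    if len(samples) < 2:
        return False
    for prev, cur in zip(samples, samples[1:]):
        if cur["ops"] != prev["ops"]:
            return False
        dt = cur["elapsed"] - prev["elapsed"]
        growth = cur["latency_ms"] - prev["latency_ms"]
        # Latency should grow by roughly 1000 ms per elapsed second.
        if dt <= 0 or abs(growth - 1000.0 * dt) > tolerance_ms * dt:
            return False
    return True


def stall_start(sample):
    """Estimate (in run seconds) when the blocked operation was issued."""
    return sample["elapsed"] - sample["latency_ms"] / 1000.0


# Three lines copied from the dbench output in this report.
SAMPLE = (
    "10 146265 0.12 MB/sec execute 10575 sec latency 8983676.996 ms\n"
    "10 146265 0.12 MB/sec execute 10576 sec latency 8984678.020 ms\n"
    "10 146265 0.12 MB/sec execute 10577 sec latency 8985679.040 ms\n"
)

if __name__ == "__main__":
    samples = parse_progress(SAMPLE)
    print(looks_stalled(samples), round(stall_start(samples[0]), 1))
```

Under those assumptions, the sample lines suggest the blocked operation was issued roughly 1591 seconds into the run.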
RDMA is not a blocker, but it is critical.
On a two-node replicate setup, dbench ran successfully. We should also run the tests on a distributed replicate setup.
The tests were run on git commit id eaf0618e47b4e575180a9cbdbeda6ff5.
dbench was also successful on the distributed replicate setup. Hence closing this bug.