Bug 763347 (GLUSTER-1615) - dbench fails on rdma
Summary: dbench fails on rdma
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-1615
Product: GlusterFS
Classification: Community
Component: rdma
Version: mainline
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Assignee: Raghavendra G
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-09-16 04:45 UTC by Anush Shetty
Modified: 2015-12-01 16:45 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: fuse
Documentation: DNR
CRM:
Verified Versions:
Embargoed:



Description Anush Shetty 2010-09-16 01:49:13 UTC
Client log:

[2010-09-15 10:49:56.504216] E [rpc-clnt.c:196:call_bail] sep15_replica-client-0: bailing out frame type(GlusterFS 3.1) op(LOOKUP(27)) xid = 23279807 sent = 2010-09-15 10:19:54. timeout = 1800
[2010-09-15 10:53:06.695340] E [rpc-clnt.c:196:call_bail] sep15_replica-client-0: bailing out frame type(GlusterFS 3.1) op(LOOKUP(27)) xid = 23279809 sent = 2010-09-15 10:23:04. timeout = 1800
[2010-09-15 10:53:06.695472] E [rpc-clnt.c:196:call_bail] sep15_replica-client-0: bailing out frame type(GlusterFS 3.1) op(LOOKUP(27)) xid = 23279808 sent = 2010-09-15 10:23:04. timeout = 1800
[2010-09-15 10:53:06.696597] D [afr-common.c:562:afr_lookup_collect_xattr] sep15_replica-replicate-1: entry self-heal is pending for /clients/client8/~dmtmp/ACCESS.
[2010-09-15 11:00:37.150115] E [rpc-clnt.c:196:call_bail] sep15_replica-client-0: bailing out frame type(GlusterFS 3.1) op(ENTRYLK(31)) xid = 23279810 sent = 2010-09-15 10:30:35. timeout = 1800
[2010-09-15 11:00:37.150303] D [afr-lk-common.c:989:afr_lock_blocking] sep15_replica-replicate-0: we're done locking
[2010-09-15 11:00:37.150341] D [afr-transaction.c:935:afr_post_blocking_entrylk_cbk] sep15_replica-replicate-0: Blocking entrylks done. Proceeding to FOP
[2010-09-15 11:00:37.151531] D [dht-rename.c:274:dht_rename_unlink_cbk] sep15_replica-dht: unlink on sep15_replica-replicate-0 failed (No such file or directory)
[2010-09-15 11:00:37.151556] W [fuse-bridge.c:1297:fuse_rename_cbk] glusterfs-fuse: 20692474: /clients/client6/~dmtmp/EXCEL/0BBC0000 -> /clients/client6/~dmtmp/EXCEL/RESULTS.XLS => -1 (Transport endpoint is not connected)
[2010-09-15 11:00:37.151754] D [afr-lk-common.c:410:transaction_lk_op] sep15_replica-replicate-0: lk op is for a transaction
[2010-09-15 11:05:17.431195] E [rpc-clnt.c:196:call_bail] sep15_replica-client-0: bailing out frame type(GlusterFS 3.1) op(FSYNC(16)) xid = 23279811 sent = 2010-09-15 10:35:15. timeout = 1800
[2010-09-15 11:06:37.511926] E [rpc-clnt.c:196:call_bail] sep15_replica-client-0: bailing out frame type(GlusterFS 3.1) op(SETATTR(38)) xid = 23279812 sent = 2010-09-15 10:36:35. timeout = 1800
[2010-09-15 11:06:37.511989] D [afr-self-heal-data.c:107:afr_sh_data_flush_cbk] sep15_replica-replicate-0: flush or setattr failed on /clients/client5/~dmtmp/PM/PMD383.TMP on subvolume sep15_replica-client-0: Transport endpoint is not connected
[2010-09-15 11:06:37.512014] I [afr-self-heal-common.c:1583:afr_self_heal_completion_cbk] sep15_replica-replicate-0: background  meta-data data entry self-heal completed on /clients/client5/~dmtmp/PM/PMD383.TMP
[2010-09-15 11:19:58.317274] E [rpc-clnt.c:196:call_bail] sep15_replica-client-0: bailing out frame type(GlusterFS 3.1) op(LOOKUP(27)) xid = 23279813 sent = 2010-09-15 10:49:56. timeout = 1800
[2010-09-15 11:23:08.511380] E [rpc-clnt.c:196:call_bail] sep15_replica-client-0: bailing out frame type(GlusterFS 3.1) op(LOOKUP(27)) xid = 23279815 sent = 2010-09-15 10:53:06. timeout = 1800
[2010-09-15 11:23:08.511520] E [rpc-clnt.c:196:call_bail] sep15_replica-client-0: bailing out frame type(GlusterFS 3.1) op(LOOKUP(27)) xid = 23279814 sent = 2010-09-15 10:53:06. timeout = 1800
[2010-09-15 11:23:08.513528] D [afr-common.c:568:afr_lookup_collect_xattr] sep15_replica-replicate-1: data self-heal is pending for /clients/client8/~dmtmp/ACCESS/LABELS.PRN.
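
A minimal sketch for checking the call_bail entries above: it parses one such line and compares the gap between the "sent" time and the log timestamp against the reported timeout. The function name parse_bail and the returned field names are hypothetical, chosen only for illustration, and the regular expression assumes the line layout shown in this log.

```python
import re
from datetime import datetime

# Pattern for the call_bail lines as they appear in the client log above.
BAIL_RE = re.compile(
    r"\[(?P<logged>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+\] E "
    r"\[rpc-clnt\.c:\d+:call_bail\] (?P<client>\S+): .*?"
    r"op\((?P<op>\w+)\(\d+\)\) xid = (?P<xid>\d+) "
    r"sent =\s*(?P<sent>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\. "
    r"timeout = (?P<timeout>\d+)"
)

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_bail(line):
    """Return details of a call_bail line, or None if the line does not match."""
    m = BAIL_RE.search(line)
    if m is None:
        return None
    logged = datetime.strptime(m["logged"], TIME_FORMAT)
    sent = datetime.strptime(m["sent"], TIME_FORMAT)
    return {
        "client": m["client"],
        "op": m["op"],
        "xid": int(m["xid"]),
        "timeout": int(m["timeout"]),
        # Whole seconds between sending the request and bailing out.
        "elapsed": int((logged - sent).total_seconds()),
    }


if __name__ == "__main__":
    sample = (
        "[2010-09-15 11:05:17.431195] E [rpc-clnt.c:196:call_bail] "
        "sep15_replica-client-0: bailing out frame type(GlusterFS 3.1) "
        "op(FSYNC(16)) xid = 23279811 sent = 2010-09-15 10:35:15. timeout = 1800"
    )
    print(parse_bail(sample))
```

Applied to the FSYNC entry, the elapsed time comes out at 1802 seconds against a timeout of 1800, which suggests each request to sep15_replica-client-0 waited out the full timeout without a reply.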

Comment 1 Anush Shetty 2010-09-16 04:45:09 UTC
This was on a distribute+replicate setup, and dbench was set to run for 6 hours.

  10    146265     0.12 MB/sec  execute 10575 sec  latency 8983676.996 ms
  10    146265     0.12 MB/sec  execute 10576 sec  latency 8984678.020 ms
  10    146265     0.12 MB/sec  execute 10577 sec  latency 8985679.040 ms
  10    146265     0.12 MB/sec  execute 10578 sec  latency 8986680.061 ms
  10    146265     0.12 MB/sec  execute 10579 sec  latency 8987681.082 ms
  10    146265     0.12 MB/sec  execute 10580 sec  latency 8988682.105 ms
  10    146265     0.12 MB/sec  execute 10581 sec  latency 8989683.126 ms
  10    146265     0.12 MB/sec  execute 10582 sec  latency 8990684.148 ms
  10    146265     0.12 MB/sec  execute 10583 sec  latency 8991685.170 ms
  10    146265     0.12 MB/sec  execute 10584 sec  latency 8992686.192 ms
  10    146265     0.12 MB/sec  execute 10585 sec  latency 8993687.213 ms
  10    146265     0.12 MB/sec  execute 10586 sec  latency 8994688.235 ms
  10    146265     0.12 MB/sec  execute 10587 sec  latency 8995689.257 ms
  10    146265     0.12 MB/sec  execute 10588 sec  latency 8996690.279 ms
  10    146265     0.12 MB/sec  execute 10589 sec  latency 8997691.300 ms
  10    146265     0.12 MB/sec  execute 10590 sec  latency 8998692.322 ms
  10    146265     0.12 MB/sec  execute 10591 sec  latency 8999693.344 ms
  10    146265     0.12 MB/sec  execute 10592 sec  latency 9000694.366 ms
  10    146265     0.12 MB/sec  execute 10593 sec  latency 9001695.387 ms
  10    146265     0.12 MB/sec  execute 10594 sec  latency 9002696.409 ms
  10    146265     0.12 MB/sec  execute 10595 sec  latency 9003697.430 ms
  10    146265     0.12 MB/sec  execute 10596 sec  latency 9004698.453 ms
  10    146265     0.12 MB/sec  execute 10597 sec  latency 9005699.474 ms
  10    146265     0.12 MB/sec  execute 10598 sec  latency 9006700.496 ms
  10    146265     0.12 MB/sec  execute 10599 sec  latency 9007701.517 ms
  10    146265     0.12 MB/sec  execute 10600 sec  latency 9008702.540 ms
  10    146265     0.12 MB/sec  execute 10601 sec  latency 9009703.561 ms
[144300] rename ./clients/client6/~dmtmp/EXCEL/0BBC0000 ./clients/client6/~dmtmp/EXCEL/RESULTS.XLS failed (Transport endpoint is not connected) - expected NT_STATUS_OK
ERROR: child 6 failed at line 144300
Child failed with status 1
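
A rough sketch for estimating when the stall began, based on the dbench progress lines above. It assumes the latency column reflects how long the slowest outstanding operation has been waiting, so that subtracting it from the elapsed "execute" time approximates the point at which that operation was issued. The function name stall_start_seconds is hypothetical.

```python
import re

# Progress line layout as printed in the dbench output above:
# <procs> <count> <throughput> MB/sec  execute <seconds> sec  latency <ms> ms
PROGRESS_RE = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+([\d.]+) MB/sec\s+"
    r"execute\s+(\d+) sec\s+latency\s+([\d.]+) ms"
)


def stall_start_seconds(line):
    """Estimate the run time (in seconds) at which the stalled operation began.

    Returns None if the line is not a dbench progress line.
    """
    m = PROGRESS_RE.match(line)
    if m is None:
        return None
    execute_sec = int(m.group(4))
    latency_ms = float(m.group(5))
    return execute_sec - latency_ms / 1000.0


if __name__ == "__main__":
    sample = "  10    146265     0.12 MB/sec  execute 10601 sec  latency 9009703.561 ms"
    print(round(stall_start_seconds(sample), 1))
```

Under that assumption, the last progress line puts the start of the stall at roughly 1591 seconds into the run, and the unchanged count of 146265 across all the lines is consistent with no further operations completing after that point.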

Comment 2 Amar Tumballi 2010-09-21 03:50:17 UTC
RDMA is not a blocker, but it is critical.

Comment 3 Raghavendra G 2010-11-11 02:55:40 UTC
On a two-node replicate setup, dbench ran successfully. We should also run the tests on a distributed replicate setup.

Comment 4 Raghavendra G 2010-11-11 02:57:57 UTC
The tests were run on git commit id eaf0618e47b4e575180a9cbdbeda6ff5.

Comment 5 Raghavendra G 2010-11-14 02:27:28 UTC
dbench also ran successfully on a distributed replicate setup. Hence closing this bug.

