Bug 854629 - [Red Hat SSA-3.2.4] when one of the replicate pair goes down and comes back up dbench fails
Status: CLOSED WONTFIX
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterfs
Version: 2.0
Hardware: x86_64 Linux
Priority: medium   Severity: medium
Target Milestone: ---
Target Release: ---
Assigned To: Amar Tumballi
QA Contact: M S Vishwanath Bhat
Keywords: ZStream
Depends On: GLUSTER-3782
Blocks:
 
Reported: 2012-09-05 09:27 EDT by Vidya Sakar
Modified: 2016-05-31 21:56 EDT
CC List: 9 users

See Also:
Fixed In Version: glusterfs-3.4.0.33rhs-1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: GLUSTER-3782
Environment:
Last Closed: 2013-10-03 05:26:43 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Vidya Sakar 2012-09-05 09:27:36 EDT
+++ This bug was initially created as a clone of Bug #765514 +++

Created attachment 718 [details]
Bad XF86Config generated by installer.

--- Additional comment from msvbhat@redhat.com on 2011-11-04 03:11:42 EDT ---

Created a pure replicate volume with rdma transport type, mounted it via FUSE, and started running dbench for 1000 seconds with 20 clients. After some time I took down one of the bricks. When the brick came back online, dbench failed in unlink with the following message.

  20     21179     6.03 MB/sec  execute 160 sec  latency 1102.911 ms
  20     21242     6.06 MB/sec  execute 161 sec  latency 952.563 ms
  20     21305     6.09 MB/sec  execute 162 sec  latency 346.382 ms
[21398] open ./clients/client12/~dmtmp/COREL/GRAPH1.CDR succeeded for handle 0
  20     21357     6.10 MB/sec  execute 163 sec  latency 414.598 ms
[21152] open ./clients/client17/~dmtmp/ACCESS/LABELS.PRN succeeded for handle 19548
[21447] open ./clients/client3/~dmtmp/COREL/GRAPH2.CDR succeeded for handle 21063
[21325] open ./clients/client13/~dmtmp/COREL/GRAPH1.CDR succeeded for handle 21063
  20     21370     6.07 MB/sec  execute 164 sec  latency 1099.871 ms
[21139] open ./clients/client6/~dmtmp/ACCESS/SALES.PRN succeeded for handle 21340
[21152] open ./clients/client6/~dmtmp/ACCESS/LABELS.PRN succeeded for handle 19548
[21436] open ./clients/client14/~dmtmp/COREL/GRAPH2.CDR succeeded for handle 21063
[21325] open ./clients/client7/~dmtmp/COREL/GRAPH1.CDR succeeded for handle 21063
  20     21405     6.07 MB/sec  execute 165 sec  latency 712.598 ms
[21648] open ./clients/client10/~dmtmp/PM/BDES1.PRN succeeded for handle 12627
[21436] open ./clients/client13/~dmtmp/COREL/GRAPH2.CDR succeeded for handle 21063
[21648] open ./clients/client16/~dmtmp/PM/BDES1.PRN succeeded for handle 12627
[21648] open ./clients/client8/~dmtmp/PM/BDES1.PRN succeeded for handle 12627
[21648] open ./clients/client4/~dmtmp/PM/BDES1.PRN succeeded for handle 12627
  20     21474     6.12 MB/sec  execute 166 sec  latency 850.422 ms
[21436] open ./clients/client7/~dmtmp/COREL/GRAPH2.CDR succeeded for handle 21063
  20     21561     6.13 MB/sec  execute 167 sec  latency 205.801 ms
  20     21644     6.14 MB/sec  execute 168 sec  latency 312.451 ms
  20     21717     6.12 MB/sec  execute 169 sec  latency 94.683 ms
  20     21770     6.09 MB/sec  execute 170 sec  latency 178.992 ms
  20     21819     6.07 MB/sec  execute 171 sec  latency 171.873 ms
  20     21860     6.04 MB/sec  execute 172 sec  latency 338.510 ms
  20     21861     6.00 MB/sec  execute 173 sec  latency 1338.601 ms
  20     21872     5.97 MB/sec  execute 174 sec  latency 1463.730 ms
  20     21886     5.94 MB/sec  execute 175 sec  latency 1369.267 ms
[21886] unlink ./clients/client12/~dmtmp/COREL/GRAPH1.CDR failed (No such file or directory) - expected NT_STATUS_OK
ERROR: child 12 failed at line 21886
Child failed with status 1


The last time I tried, dbench failed as soon as one of the replicate bricks went down. Now it fails when the brick comes back online.

I have attached the client log.
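
For reference, the setup described above corresponds roughly to the commands below. This is only a sketch: the volume name (repvol), the host names (server1/server2), the brick paths and the mount point are placeholders that do not come from this report, and the report does not say exactly how the brick was taken down; killing the brick's glusterfsd process and later running "volume start force" is one common way to simulate it.

  # Create and start a 2-way replicate volume using the rdma transport
  gluster volume create repvol replica 2 transport rdma \
      server1:/bricks/brick1 server2:/bricks/brick1
  gluster volume start repvol

  # FUSE-mount the volume on the client and run dbench with 20 clients
  # for 1000 seconds against the mount point
  mkdir -p /mnt/repvol
  mount -t glusterfs server1:/repvol /mnt/repvol
  dbench -D /mnt/repvol -t 1000 20

  # While dbench is running, simulate the brick failure on one of the
  # servers by killing the glusterfsd process serving its brick ...
  kill -KILL $(pgrep -f 'glusterfsd.*bricks/brick1')

  # ... and later bring that brick back online
  gluster volume start repvol force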

--- Additional comment from pkarampu@redhat.com on 2012-02-24 03:44:48 EST ---

Is this happening on the 3.2 branch?

--- Additional comment from msvbhat@redhat.com on 2012-02-24 04:05:07 EST ---

It happened on RHSSA 3.2.4. I haven't checked recently whether it still happens.

--- Additional comment from vbellur@redhat.com on 2012-03-29 07:41:40 EDT ---

Can you please check if the problem persists on 3.3 now?
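
One way to re-check this on 3.3 (sketched with the same placeholder volume name as above) is to repeat the dbench run, bring the brick back, and then see whether the files that fail the unlink are still pending self-heal; the commands below are available from glusterfs 3.3 onwards.

  # After the killed brick is back online (volume name is a placeholder)
  gluster volume status repvol
  gluster volume heal repvol info
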
Comment 4 Sudhir D 2013-07-30 01:48:20 EDT
Removed 2.1 as rdma is not slated for this release.
Comment 5 RHEL Product and Program Management 2013-10-03 05:26:43 EDT
Quality Engineering Management has reviewed and declined this request.
You may appeal this decision by reopening this request.
