Bug 854629

Summary: [Red Hat SSA-3.2.4] when one of the replicate pair goes down and comes back up dbench fails
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Vidya Sakar <vinaraya>
Component: glusterfs
Assignee: Amar Tumballi <amarts>
Status: CLOSED WONTFIX
QA Contact: M S Vishwanath Bhat <vbhat>
Severity: medium
Docs Contact:
Priority: medium
Version: 2.0
CC: gluster-bugs, mzywusko, rfortier, rhs-bugs, sdharane, vbellur, vbhat, vijay, vraman
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: glusterfs-3.4.0.33rhs-1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: GLUSTER-3782
Environment:
Last Closed: 2013-10-03 09:26:43 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 765514
Bug Blocks:

Description Vidya Sakar 2012-09-05 13:27:36 UTC
+++ This bug was initially created as a clone of Bug #765514 +++

Created attachment 718 [details]
Client log from the fuse mount.

--- Additional comment from msvbhat on 2011-11-04 03:11:42 EDT ---

Created a pure replicate volume with the rdma transport type, mounted it via fuse, and started running dbench for 1000 secs with 20 clients. After some time I took down one of the bricks. When the brick came back online, dbench failed in unlink with the following message (see the reproduction sketch at the end of this comment):

  20     21179     6.03 MB/sec  execute 160 sec  latency 1102.911 ms
  20     21242     6.06 MB/sec  execute 161 sec  latency 952.563 ms
  20     21305     6.09 MB/sec  execute 162 sec  latency 346.382 ms
[21398] open ./clients/client12/~dmtmp/COREL/GRAPH1.CDR succeeded for handle 0
  20     21357     6.10 MB/sec  execute 163 sec  latency 414.598 ms
[21152] open ./clients/client17/~dmtmp/ACCESS/LABELS.PRN succeeded for handle 19548
[21447] open ./clients/client3/~dmtmp/COREL/GRAPH2.CDR succeeded for handle 21063
[21325] open ./clients/client13/~dmtmp/COREL/GRAPH1.CDR succeeded for handle 21063
  20     21370     6.07 MB/sec  execute 164 sec  latency 1099.871 ms
[21139] open ./clients/client6/~dmtmp/ACCESS/SALES.PRN succeeded for handle 21340
[21152] open ./clients/client6/~dmtmp/ACCESS/LABELS.PRN succeeded for handle 19548
[21436] open ./clients/client14/~dmtmp/COREL/GRAPH2.CDR succeeded for handle 21063
[21325] open ./clients/client7/~dmtmp/COREL/GRAPH1.CDR succeeded for handle 21063
  20     21405     6.07 MB/sec  execute 165 sec  latency 712.598 ms
[21648] open ./clients/client10/~dmtmp/PM/BDES1.PRN succeeded for handle 12627
[21436] open ./clients/client13/~dmtmp/COREL/GRAPH2.CDR succeeded for handle 21063
[21648] open ./clients/client16/~dmtmp/PM/BDES1.PRN succeeded for handle 12627
[21648] open ./clients/client8/~dmtmp/PM/BDES1.PRN succeeded for handle 12627
[21648] open ./clients/client4/~dmtmp/PM/BDES1.PRN succeeded for handle 12627
  20     21474     6.12 MB/sec  execute 166 sec  latency 850.422 ms
[21436] open ./clients/client7/~dmtmp/COREL/GRAPH2.CDR succeeded for handle 21063
  20     21561     6.13 MB/sec  execute 167 sec  latency 205.801 ms
  20     21644     6.14 MB/sec  execute 168 sec  latency 312.451 ms
  20     21717     6.12 MB/sec  execute 169 sec  latency 94.683 ms
  20     21770     6.09 MB/sec  execute 170 sec  latency 178.992 ms
  20     21819     6.07 MB/sec  execute 171 sec  latency 171.873 ms
  20     21860     6.04 MB/sec  execute 172 sec  latency 338.510 ms
  20     21861     6.00 MB/sec  execute 173 sec  latency 1338.601 ms
  20     21872     5.97 MB/sec  execute 174 sec  latency 1463.730 ms
  20     21886     5.94 MB/sec  execute 175 sec  latency 1369.267 ms
[21886] unlink ./clients/client12/~dmtmp/COREL/GRAPH1.CDR failed (No such file or directory) - expected NT_STATUS_OK
ERROR: child 12 failed at line 21886
Child failed with status 1


The last time I tried, dbench failed as soon as the replicate brick went down. Now it fails when the brick comes back online.

I have attached the client log.
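
A minimal reproduction sketch of the steps above. This is not taken from the original report: the server names (server1, server2), brick path (/bricks/brick1), volume name (repvol), mount point, and brick pid are placeholders, and the exact mount and restart options may vary between releases.

  # On the servers: create and start a 2-way replicate volume over rdma
  gluster volume create repvol replica 2 transport rdma \
      server1:/bricks/brick1 server2:/bricks/brick1
  gluster volume start repvol

  # On the client: fuse-mount the volume and run dbench for 1000 secs with 20 clients
  mount -t glusterfs -o transport=rdma server1:/repvol /mnt/repvol
  dbench -D /mnt/repvol -t 1000 20

  # While dbench is running, take one brick down by killing its glusterfsd
  # process (placeholder pid), then bring it back online
  kill <glusterfsd-pid-of-server2-brick>
  gluster volume start repvol force   # respawns the downed brick (or restart glusterd on that node)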

--- Additional comment from pkarampu on 2012-02-24 03:44:48 EST ---

Is it happening on 3.2 branch?

--- Additional comment from msvbhat on 2012-02-24 04:05:07 EST ---

It happened on RHSSA 3.2.4. I haven't checked recently whether it still happens.

--- Additional comment from vbellur on 2012-03-29 07:41:40 EDT ---

Can you please check if the problem persists on 3.3 now?

Comment 4 Sudhir D 2013-07-30 05:48:20 UTC
Removed 2.1, as rdma is not slated for this release.

Comment 5 RHEL Program Management 2013-10-03 09:26:43 UTC
Quality Engineering Management has reviewed and declined this request.
You may appeal this decision by reopening this request.