Bug 1439250 - [Parallel Readdir] : Reads fail during dbench
Summary: [Parallel Readdir] : Reads fail during dbench
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterfs
Version: rhgs-3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.3.0
Assignee: Poornima G
QA Contact: Ambarish
URL:
Whiteboard:
Depends On:
Blocks: 1417151
 
Reported: 2017-04-05 13:55 UTC by Ambarish
Modified: 2017-09-21 04:37 UTC

Fixed In Version: glusterfs-3.8.4-22
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-09-21 04:37:54 UTC
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2017:2774 | 0 | normal | SHIPPED_LIVE | glusterfs bug fix and enhancement update | 2017-09-21 08:16:29 UTC

Description Ambarish 2017-04-05 13:55:29 UTC
Description of problem:
-----------------------

EC 4+2 volume, parallel readdir enabled, FUSE mounts on 6 clients.

dbench fails on all of my clients with the following errors:

Running for 300 seconds with load '/opt/qa/tools/client.txt' and minimum warmup 60 secs
7 of 10 processes prepared for launch   0 sec
10 of 10 processes prepared for launch   0 sec
releasing clients
  10       205    80.51 MB/sec  warmup   1 sec  latency 60.611 ms
  10       430    75.82 MB/sec  warmup   2 sec  latency 71.439 ms
  10       655    70.36 MB/sec  warmup   3 sec  latency 73.005 ms
  10       828    53.81 MB/sec  warmup   4 sec  latency 148.088 ms
  10       968    44.04 MB/sec  warmup   5 sec  latency 105.164 ms
[1137] read failed on handle 10106 (No such file or directory)
  10      1107    38.21 MB/sec  warmup   6 sec  latency 267.752 ms
[1234] read failed on handle 10126 (No such file or directory)
[1137] read failed on handle 10106 (No such file or directory)
[1137] read failed on handle 10106 (No such file or directory)
[1137] read failed on handle 10106 (No such file or directory)
[1268] open ./clients/client8/~dmtmp/WORD/~WRL0004.TMP failed for handle 10133 (No such file or directory)
(1269) ERROR: handle 10133 was not found
Child failed with status 1
[root@gqac015 /]# 
[root@gqac015 /]# 
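
To compare clients, the failure lines in output like the above can be counted from a saved log. This is only a sketch: it assumes the dbench output was redirected to a file, and the function name `count_dbench_failures` is hypothetical, not part of dbench or the QA tooling.

```shell
#!/bin/sh
# Count dbench lines that report a failed read or open.
# Matches lines of the form "[NNNN] read failed on handle ..." and
# "[NNNN] open <path> failed for handle ...", as seen in the output above.
count_dbench_failures() {
    # $1: path to a saved dbench log (hypothetical location, chosen by the caller).
    # grep -c prints the number of matching lines; it exits non-zero when
    # nothing matches, so "|| true" keeps the function safe under "set -e".
    grep -c -E '^\[[0-9]+\] (read failed on handle|open .* failed for handle)' "$1" || true
}
```

Usage would be along the lines of `count_dbench_failures /tmp/dbench-client8.log`, where the path is whatever file the run was captured to.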



Version-Release number of selected component (if applicable):
-------------------------------------------------------------

3.8.4-20

How reproducible:
-----------------

100%


Actual results:
----------------

dbench fails

Expected results:
-----------------

dbench should pass.

Additional info:
----------------

[root@gqas013 ~]# gluster v info
 
Volume Name: butcher
Type: Distributed-Disperse
Volume ID: 55902003-7ea9-4f58-987d-63c6c759a385
Status: Started
Snapshot Count: 0
Number of Bricks: 12 x (4 + 2) = 72
Transport-type: tcp
Bricks:
Brick1: gqas013.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick2: gqas005.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick3: gqas006.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick4: gqas008.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick5: gqas014.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick6: gqas015.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick7: gqas013.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick8: gqas005.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick9: gqas006.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick10: gqas008.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick11: gqas014.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick12: gqas015.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick13: gqas013.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick14: gqas005.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick15: gqas006.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick16: gqas008.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick17: gqas014.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick18: gqas015.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick19: gqas013.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick20: gqas005.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick21: gqas006.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick22: gqas008.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick23: gqas014.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick24: gqas015.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick25: gqas013.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick26: gqas005.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick27: gqas006.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick28: gqas008.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick29: gqas014.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick30: gqas015.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick31: gqas013.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick32: gqas005.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick33: gqas006.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick34: gqas008.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick35: gqas014.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick36: gqas015.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick37: gqas013.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick38: gqas005.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick39: gqas006.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick40: gqas008.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick41: gqas014.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick42: gqas015.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick43: gqas013.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick44: gqas005.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick45: gqas006.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick46: gqas008.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick47: gqas014.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick48: gqas015.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick49: gqas013.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick50: gqas005.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick51: gqas006.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick52: gqas008.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick53: gqas014.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick54: gqas015.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick55: gqas013.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick56: gqas005.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick57: gqas006.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick58: gqas008.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick59: gqas014.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick60: gqas015.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick61: gqas013.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick62: gqas005.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick63: gqas006.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick64: gqas008.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick65: gqas014.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick66: gqas015.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick67: gqas013.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick68: gqas005.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick69: gqas006.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick70: gqas008.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick71: gqas014.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick72: gqas015.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Options Reconfigured:
performance.quick-read: on
performance.io-cache: on
performance.parallel-readdir: on
transport.address-family: inet
nfs.disable: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 50000
cluster.lookup-optimize: on
server.event-threads: 4
client.event-threads: 4
[root@gqas013 ~]#
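
For reference, the "Number of Bricks" line above is the number of disperse subvolumes multiplied by the bricks in each (data plus redundancy). A small sketch of that arithmetic; the variable names are illustrative only:

```shell
#!/bin/sh
# Values taken from "Number of Bricks: 12 x (4 + 2) = 72" in the volume info above.
subvols=12      # distributed disperse subvolumes
data=4          # data bricks per subvolume
redundancy=2    # redundancy bricks per subvolume

total=$(( subvols * (data + redundancy) ))
echo "total bricks: $total"
```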

Comment 2 Ambarish 2017-04-05 13:56:02 UTC
I set parallel readdir to off and got a clean run.
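
The exact commands used for this workaround are not recorded in the report. A sketch of how the option would typically be turned off, assuming the volume name `butcher` from the volume info above and the standard `gluster volume set` syntax; the command is built as a string and echoed so it can be reviewed before being run on a server node:

```shell
#!/bin/sh
# Volume name taken from the "gluster v info" output in this report.
VOLUME="butcher"

# Assumed workaround command: switch off the parallel-readdir option.
disable_cmd="gluster volume set ${VOLUME} performance.parallel-readdir off"
echo "$disable_cmd"

# After review, run it on one of the server nodes and confirm the new value
# (left commented out here, since it needs a live cluster):
# $disable_cmd
# gluster volume get "${VOLUME}" performance.parallel-readdir
```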

Comment 5 Poornima G 2017-04-26 10:06:51 UTC
Can you please recheck this test case with the latest build, since quite a few fixes in this area have gone in?

Comment 6 Ambarish 2017-05-04 06:07:52 UTC
Multiple iterations of the test passed on 3.8.4-24:


<snip>

===========================TESTS RUNNING===========================
Changing to the specified mountpoint
/A/run24200
executing dbench
start:11:28:10



real	6m1.564s
user	0m2.778s
sys	0m10.788s
end:11:34:12
removed clients
1
Total 1 tests were successful
Switching over to the previous working directory
Removing /A/run24200/
[root@dhcp35-126 ~]# 
[root@dhcp35-126 ~]# 



AND

===========================TESTS RUNNING===========================
Changing to the specified mountpoint
/B/run12998
executing dbench
start:11:27:37



real	6m1.372s
user	0m2.826s
sys	0m11.331s
end:11:33:39
removed clients
1
Total 1 tests were successful
Switching over to the previous working directory
Removing /B/run12998/
[root@dhcp35-103 ~]#

Comment 10 Ambarish 2017-05-06 04:12:04 UTC
Verified on 3.8.4-24.

Comment 12 errata-xmlrpc 2017-09-21 04:37:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774

