Bug 956977 - nfs: dbench with 100 dirs fails
Summary: nfs: dbench with 100 dirs fails
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterd
Version: 2.1
Hardware: x86_64
OS: Linux
medium
high
Target Milestone: ---
: RHGS 2.1.2
Assignee: santosh pradhan
QA Contact: Saurabh
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2013-04-26 06:39 UTC by Saurabh
Modified: 2016-01-19 06:11 UTC (History)
10 users (show)

Fixed In Version: glusterfs-3.4.0.52rhs-1.el6rhs
Doc Type: Bug Fix
Doc Text:
Previously, when ACL was ON, the GETACL call would invalidate the FSID causing a dbench operation with 100 threads to FAIL. With the fix, the GETACL call responds with a valid FSID and it works as expected.
Clone Of:
Environment:
Last Closed: 2014-02-25 07:28:49 UTC
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2014:0208 0 normal SHIPPED_LIVE Red Hat Storage 2.1 enhancement and bug fix update #2 2014-02-25 12:20:30 UTC

Description Saurabh 2013-04-26 06:39:09 UTC
Description of problem:

dbench on nfs mount with 100 sync-dirs fails

Version-Release number of selected component (if applicable):
server,
glusterfs-server-3.4.0.1rhs-1.el6rhs.x86_64

client,
Red Hat Enterprise Linux Server release 6.4 (Santiago)

How reproducible:
always

Steps to Reproduce:
1. create a dist-rep volume; start volume
2. nfs mount
3. execute,
 time /opt/qa/tools/dbench -t 300 -c /opt/qa/tools/dbench-4.0/client.txt -s -S 100


Actual results:
[root@rhsauto020 dir1]# time /opt/qa/tools/dbench -t 300 -c /opt/qa/tools/dbench-4.0/client.txt -s -S 100
dbench version 4.00 - Copyright Andrew Tridgell 1999-2004

Running for 300 seconds with load '/opt/qa/tools/dbench-4.0/client.txt' and minimum warmup 60 secs
99 of 100 processes prepared for launch   0 sec
100 of 100 processes prepared for launch   0 sec
releasing clients
   2         2     1.63 MB/sec  warmup   1 sec  latency 966.954 ms
   6         3     2.09 MB/sec  warmup   2 sec  latency 1965.621 ms
   8         5     3.27 MB/sec  warmup   3 sec  latency 2589.622 ms
  15         7     4.58 MB/sec  warmup   4 sec  latency 3467.830 ms
  19         9     5.43 MB/sec  warmup   5 sec  latency 4468.089 ms
[3] open ./clients/client92 failed for handle 16385 (Stale file handle)
(4) ERROR: handle 16385 was not found
Child failed with status 1

real    0m5.919s
user    0m0.007s
sys     0m0.014s


Expected results:
it should pass, similar to sync-dir with "10".

Additional info:

Comment 2 Rajesh 2013-06-21 11:10:13 UTC
When I worked on this issue, I found that the NFS ACL seems to trigger this behaviour. mounting with "noacl" option, dbench runs fine with 100 clients.
Upon further investigation, the client only does getacl calls if there are no acls involved. None of the getacl calls fail. Need further investigation to root cause this issue.

Comment 3 Scott Haines 2013-07-24 07:10:56 UTC
Per discussion w/ Vivek, works without ACL being on.  Not for Big Bend as ACL is off by default.  Targeting for Corbett.

Comment 5 Scott Haines 2013-09-23 23:18:58 UTC
Retargeting for 2.1.z U2 (Corbett) release.

Comment 6 santosh pradhan 2013-12-06 17:39:13 UTC
Not much progress on this.

Comment 8 santosh pradhan 2013-12-14 13:18:56 UTC
This turns out to be a cache inconsistency issue. When dbench forks around 100 threads and they do I/O in the same mount point, it causes the cache inconsistency in the client cache. The issue is seen while ACL is on. The issue does not happen while attr caching is OFF (the client mounts with -o noac option) consistently, though happens sometimes. 

Had a discussion with Amar, Vijay, Rajesh and Anand. md-cache xlator needs to be tried with NFS.

Comment 9 Niels de Vos 2013-12-14 19:39:34 UTC
Debugging from the NFS-client should be a little easier with these patches:
- http://thread.gmane.org/gmane.linux.nfs/60602
- https://bugs.wireshark.org/bugzilla/show_bug.cgi?id=9557

1. rpcdebug should now show non-negative fileids, which we can relate to inode
   numbers and matching values in tcpdumps.
2. Wireshark can dissect the file-handles and display the exportid and gfid of
   the file. This makes it easier to match with the logs and xattrs on the
   Gluster side.

Comment 10 Amar Tumballi 2013-12-16 04:44:54 UTC
> Had a discussion with Amar, Vijay, Rajesh and Anand. md-cache xlator needs to 
> be tried with NFS.

Also try with a simplistic volume file for nfs server...

---------
volume posix
...

volume locks
....
 subvolumes posix
...

volume acl
....
 subvolumes locks
...

volume nfs
....
 subvolumes acl
...

-------------------

This should be able to reduce the scope of the problem...

Comment 11 santosh pradhan 2013-12-17 09:12:15 UTC
With the FIX (one liner), dbench works :) 

[root@santosh-new-4 glnfs]# dbench -t 100 -c /usr/share/dbench/client.txt -s -S 100
dbench version 4.00 - Copyright Andrew Tridgell 1999-2004

Running for 100 seconds with load '/usr/share/dbench/client.txt' and minimum warmup 20 secs
97 of 100 processes prepared for launch   0 sec
100 of 100 processes prepared for launch   0 sec
releasing clients
   5         4     2.14 MB/sec  warmup   1 sec  latency 943.834 ms
   6         4     2.05 MB/sec  warmup   2 sec  latency 1944.099 ms
   9         5     2.35 MB/sec  warmup   3 sec  latency 2944.345 ms
  12         6     2.89 MB/sec  warmup   4 sec  latency 3944.598 ms
  12         7     3.17 MB/sec  warmup   5 sec  latency 4944.898 ms
  13         8     3.31 MB/sec  warmup   6 sec  latency 5945.175 ms
  13         8     3.47 MB/sec  warmup   7 sec  latency 6945.456 ms
  13         9     3.68 MB/sec  warmup   8 sec  latency 7945.694 ms
  16        11     3.72 MB/sec  warmup   9 sec  latency 8945.970 ms
  16        12     3.85 MB/sec  warmup  10 sec  latency 9946.206 ms
  19        13     4.02 MB/sec  warmup  11 sec  latency 10946.396 ms
  22        14     4.10 MB/sec  warmup  12 sec  latency 11946.668 ms
  23        15     4.20 MB/sec  warmup  13 sec  latency 12946.913 ms
  23        16     4.32 MB/sec  warmup  14 sec  latency 13947.194 ms
  23        17     4.40 MB/sec  warmup  15 sec  latency 14947.377 ms
  23        18     4.45 MB/sec  warmup  16 sec  latency 15947.615 ms
  24        19     4.50 MB/sec  warmup  17 sec  latency 16947.917 ms
  26        21     4.57 MB/sec  warmup  18 sec  latency 17945.881 ms
  27        22     4.61 MB/sec  warmup  19 sec  latency 18946.216 ms
  29        23     4.67 MB/sec  warmup  20 sec  latency 19946.452 ms
  29        24     4.67 MB/sec  warmup  21 sec  latency 20951.320 ms
  30        26     4.75 MB/sec  warmup  22 sec  latency 21951.590 ms
  30        27     4.78 MB/sec  warmup  23 sec  latency 22951.791 ms
  30        28     4.79 MB/sec  warmup  24 sec  latency 23952.079 ms
  30        29     4.81 MB/sec  warmup  25 sec  latency 24952.294 ms
  30        31     4.84 MB/sec  warmup  26 sec  latency 25952.572 ms
  30        32     4.85 MB/sec  warmup  27 sec  latency 26952.807 ms
  31        33     4.89 MB/sec  warmup  28 sec  latency 27953.068 ms
  32        34     4.89 MB/sec  warmup  29 sec  latency 28953.359 ms
  32        36     4.89 MB/sec  warmup  30 sec  latency 29953.638 ms
  33        37     4.93 MB/sec  warmup  31 sec  latency 30953.887 ms
  33        38     4.87 MB/sec  warmup  32 sec  latency 31954.179 ms
  33        38     4.76 MB/sec  warmup  33 sec  latency 32954.437 ms
  33        38     4.67 MB/sec  warmup  34 sec  latency 33954.714 ms
  34        39     4.55 MB/sec  warmup  35 sec  latency 34954.956 ms
  34        40     4.55 MB/sec  warmup  36 sec  latency 35955.198 ms
  35        41     4.58 MB/sec  warmup  37 sec  latency 36955.422 ms
  36        43     4.60 MB/sec  warmup  38 sec  latency 37943.523 ms
  37        44     4.60 MB/sec  warmup  39 sec  latency 38943.766 ms
  41        46     4.61 MB/sec  warmup  40 sec  latency 39944.044 ms
  42        47     4.64 MB/sec  warmup  41 sec  latency 40944.285 ms
  45        49     4.67 MB/sec  warmup  42 sec  latency 41944.593 ms
  46        50     4.68 MB/sec  warmup  43 sec  latency 42944.812 ms
  47        52     4.70 MB/sec  warmup  44 sec  latency 43945.077 ms
  49        53     4.73 MB/sec  warmup  45 sec  latency 44945.281 ms
  52        55     4.73 MB/sec  warmup  46 sec  latency 45945.514 ms
  55        56     4.73 MB/sec  warmup  47 sec  latency 46945.739 ms
  57        58     4.75 MB/sec  warmup  48 sec  latency 47943.350 ms
  59        59     4.76 MB/sec  warmup  49 sec  latency 48943.710 ms
  61        61     4.78 MB/sec  warmup  50 sec  latency 49943.972 ms
  63        62     4.80 MB/sec  warmup  51 sec  latency 50944.207 ms
  65        63     4.81 MB/sec  warmup  52 sec  latency 51944.459 ms
  66        65     4.84 MB/sec  warmup  53 sec  latency 52944.747 ms
  67        66     4.84 MB/sec  warmup  54 sec  latency 53944.987 ms
  69        68     4.85 MB/sec  warmup  55 sec  latency 54945.253 ms
  70        69     4.87 MB/sec  warmup  56 sec  latency 55945.468 ms
  72        71     4.88 MB/sec  warmup  57 sec  latency 56945.826 ms
  73        72     4.89 MB/sec  warmup  58 sec  latency 57946.099 ms
  74        73     4.91 MB/sec  warmup  59 sec  latency 58946.339 ms
  75        75     4.91 MB/sec  warmup  60 sec  latency 59946.562 ms
  77        76     4.93 MB/sec  warmup  61 sec  latency 60946.858 ms
  78        77     4.94 MB/sec  warmup  62 sec  latency 61947.008 ms
  78        79     4.95 MB/sec  warmup  63 sec  latency 62942.688 ms
  80        80     4.98 MB/sec  warmup  64 sec  latency 63942.903 ms
  80        81     4.93 MB/sec  warmup  65 sec  latency 64943.118 ms
  80        81     4.87 MB/sec  warmup  66 sec  latency 65943.310 ms
  80        81     4.82 MB/sec  warmup  67 sec  latency 66943.521 ms
  80        81     4.77 MB/sec  warmup  68 sec  latency 67943.837 ms
  81        83     4.75 MB/sec  warmup  69 sec  latency 68944.162 ms
  82        84     4.77 MB/sec  warmup  70 sec  latency 69944.404 ms
  84        85     4.77 MB/sec  warmup  71 sec  latency 70944.667 ms
  85        87     4.79 MB/sec  warmup  72 sec  latency 71892.717 ms
  87        88     4.82 MB/sec  warmup  73 sec  latency 72893.035 ms
  87        90     4.83 MB/sec  warmup  74 sec  latency 73882.992 ms
  89        91     4.85 MB/sec  warmup  75 sec  latency 74841.458 ms
  90        93     4.86 MB/sec  warmup  76 sec  latency 75821.931 ms
  91        94     4.88 MB/sec  warmup  77 sec  latency 3773.437 ms
  91        95     4.90 MB/sec  warmup  78 sec  latency 3807.140 ms
  94        97     4.91 MB/sec  warmup  79 sec  latency 3964.437 ms
  95        98     4.92 MB/sec  warmup  80 sec  latency 3398.932 ms
  97        99     4.93 MB/sec  warmup  81 sec  latency 3031.466 ms
  98       101     4.94 MB/sec  warmup  82 sec  latency 3399.057 ms
 100       103     5.36 MB/sec  execute   1 sec  latency 3281.744 ms
 100       105     5.56 MB/sec  execute   2 sec  latency 3411.132 ms
 100       106     5.54 MB/sec  execute   3 sec  latency 3624.913 ms
 100       107     5.59 MB/sec  execute   4 sec  latency 2417.211 ms
 100       108     5.60 MB/sec  execute   5 sec  latency 2143.010 ms
 100       109     5.62 MB/sec  execute   6 sec  latency 2433.499 ms
 100       110     5.54 MB/sec  execute   7 sec  latency 3433.794 ms
 100       112     5.56 MB/sec  execute   8 sec  latency 3560.078 ms
 100       113     5.59 MB/sec  execute   9 sec  latency 3795.081 ms
 100       114     5.56 MB/sec  execute  10 sec  latency 3094.726 ms
 100       115     5.51 MB/sec  execute  11 sec  latency 2665.034 ms
 100       117     5.56 MB/sec  execute  12 sec  latency 2915.161 ms
 100       118     5.59 MB/sec  execute  13 sec  latency 2550.801 ms
 100       119     5.60 MB/sec  execute  14 sec  latency 2783.397 ms
 100       120     5.48 MB/sec  execute  15 sec  latency 2934.023 ms
 100       120     5.19 MB/sec  execute  16 sec  latency 3934.336 ms
 100       121     4.96 MB/sec  execute  17 sec  latency 4679.548 ms
 100       122     4.89 MB/sec  execute  18 sec  latency 4422.592 ms
 100       122     4.73 MB/sec  execute  19 sec  latency 5422.896 ms
 100       123     4.56 MB/sec  execute  20 sec  latency 6107.067 ms
 100       124     4.62 MB/sec  execute  21 sec  latency 6563.886 ms
 100       125     4.71 MB/sec  execute  22 sec  latency 6914.216 ms
 100       126     4.76 MB/sec  execute  23 sec  latency 5189.257 ms
 100       128     4.80 MB/sec  execute  24 sec  latency 2029.798 ms
 100       129     4.80 MB/sec  execute  25 sec  latency 2459.675 ms
 100       130     4.82 MB/sec  execute  26 sec  latency 3134.820 ms
 100       132     4.85 MB/sec  execute  27 sec  latency 3289.884 ms
 100       133     4.88 MB/sec  execute  28 sec  latency 2680.891 ms
 100       134     4.90 MB/sec  execute  29 sec  latency 3525.660 ms
 100       135     4.91 MB/sec  execute  30 sec  latency 3677.829 ms
 100       136     4.93 MB/sec  execute  31 sec  latency 3858.502 ms
 100       138     4.95 MB/sec  execute  32 sec  latency 3340.983 ms
 100       139     4.95 MB/sec  execute  33 sec  latency 3048.761 ms
 100       141     4.96 MB/sec  execute  34 sec  latency 3221.823 ms
 100       142     4.98 MB/sec  execute  35 sec  latency 3047.699 ms
 100       143     5.01 MB/sec  execute  36 sec  latency 3194.896 ms
 100       145     5.04 MB/sec  execute  37 sec  latency 3438.738 ms
 100       146     5.05 MB/sec  execute  38 sec  latency 4106.409 ms
 100       148     5.07 MB/sec  execute  39 sec  latency 3331.490 ms
 100       149     5.07 MB/sec  execute  40 sec  latency 2386.135 ms
 100       150     5.08 MB/sec  execute  41 sec  latency 2487.593 ms
 100       152     5.10 MB/sec  execute  42 sec  latency 3018.656 ms
 100       153     5.11 MB/sec  execute  43 sec  latency 3273.131 ms
 100       155     5.12 MB/sec  execute  44 sec  latency 3254.048 ms
 100       156     5.10 MB/sec  execute  45 sec  latency 2798.498 ms
 100       157     5.11 MB/sec  execute  46 sec  latency 3299.004 ms
 100       159     5.14 MB/sec  execute  47 sec  latency 2989.024 ms
 100       160     5.13 MB/sec  execute  48 sec  latency 3087.011 ms
 100       162     5.12 MB/sec  execute  49 sec  latency 2863.351 ms
 100       163     5.08 MB/sec  execute  50 sec  latency 2771.840 ms
 100       163     5.00 MB/sec  execute  51 sec  latency 3499.051 ms
 100       163     4.92 MB/sec  execute  52 sec  latency 4081.201 ms
 100       164     4.86 MB/sec  execute  53 sec  latency 5081.434 ms
 100       164     4.83 MB/sec  execute  54 sec  latency 5878.798 ms
 100       166     4.87 MB/sec  execute  55 sec  latency 5406.963 ms
 100       168     4.89 MB/sec  execute  56 sec  latency 2013.503 ms
 100       170     4.89 MB/sec  execute  57 sec  latency 2094.650 ms
 100       171     4.91 MB/sec  execute  58 sec  latency 2142.667 ms
 100       173     4.94 MB/sec  execute  59 sec  latency 2037.227 ms
 100       174     4.96 MB/sec  execute  60 sec  latency 2637.869 ms
 100       177     4.98 MB/sec  execute  61 sec  latency 2496.679 ms
 100       178     4.99 MB/sec  execute  62 sec  latency 1758.958 ms
 100       180     5.01 MB/sec  execute  63 sec  latency 2221.876 ms
 100       182     5.04 MB/sec  execute  64 sec  latency 2227.705 ms
 100       184     5.06 MB/sec  execute  65 sec  latency 2155.579 ms
 100       186     5.09 MB/sec  execute  66 sec  latency 1973.997 ms
 100       188     5.10 MB/sec  execute  67 sec  latency 1849.199 ms
 100       189     5.12 MB/sec  execute  68 sec  latency 2303.635 ms
 100       191     5.13 MB/sec  execute  69 sec  latency 2261.626 ms
 100       193     5.12 MB/sec  execute  70 sec  latency 1784.288 ms
 100       195     5.14 MB/sec  execute  71 sec  latency 2003.458 ms
 100       197     5.17 MB/sec  execute  72 sec  latency 2333.693 ms
 100       199     5.18 MB/sec  execute  73 sec  latency 2022.409 ms
 100       201     5.20 MB/sec  execute  74 sec  latency 1840.424 ms
 100       203     5.21 MB/sec  execute  75 sec  latency 1757.040 ms
 100       205     5.22 MB/sec  execute  76 sec  latency 1563.506 ms
 100       207     5.24 MB/sec  execute  77 sec  latency 1728.521 ms
 100       209     5.26 MB/sec  execute  78 sec  latency 1982.083 ms
 100       211     5.29 MB/sec  execute  79 sec  latency 2136.528 ms
 100       214     5.29 MB/sec  execute  80 sec  latency 2342.328 ms
 100       216     5.32 MB/sec  execute  81 sec  latency 1591.293 ms
 100       218     5.33 MB/sec  execute  82 sec  latency 1973.839 ms
 100       221     5.34 MB/sec  execute  83 sec  latency 1713.779 ms
 100       221     5.30 MB/sec  execute  84 sec  latency 2001.131 ms
 100       221     5.26 MB/sec  execute  85 sec  latency 2947.984 ms
 100       222     5.21 MB/sec  execute  86 sec  latency 3866.077 ms
 100       223     5.18 MB/sec  execute  87 sec  latency 4443.496 ms
 100       224     5.16 MB/sec  execute  88 sec  latency 4758.341 ms
 100       225     5.15 MB/sec  execute  89 sec  latency 4807.897 ms
 100       226     5.11 MB/sec  execute  90 sec  latency 4500.059 ms
 100       226     5.06 MB/sec  execute  91 sec  latency 3408.592 ms
 100       227     5.03 MB/sec  execute  92 sec  latency 4044.082 ms
 100       229     5.04 MB/sec  execute  93 sec  latency 3970.390 ms
 100       231     5.05 MB/sec  execute  94 sec  latency 1987.122 ms
 100       233     5.06 MB/sec  execute  95 sec  latency 2100.590 ms
 100       235     5.05 MB/sec  execute  96 sec  latency 1786.842 ms
 100       236     5.08 MB/sec  execute  97 sec  latency 1562.301 ms
 100       239     5.09 MB/sec  execute  98 sec  latency 1522.411 ms
 100       241     5.11 MB/sec  execute  99 sec  latency 1608.492 ms
 100  cleanup 100 sec
  93  cleanup 101 sec
   0  cleanup 102 sec

 Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 NTCreateX       1753  1569.071  6914.205
 Close           1731     1.870   829.754
 Qfileinfo       1396     0.107   138.307
 WriteX          9313   786.827  3864.266

Throughput 5.11019 MB/sec (sync open) (sync dirs)  100 clients  100 procs  max_latency=6914.216 ms


Fix is posted upstream: http://review.gluster.org/#/c/6523/

Comment 12 santosh pradhan 2013-12-17 16:55:23 UTC
Code is posted downstream for review:

https://code.engineering.redhat.com/gerrit/#/c/17592/

Comment 13 santosh pradhan 2013-12-17 17:09:04 UTC
The root cause is same as that of BZ 965400.

Comment 14 Saurabh 2013-12-24 06:53:19 UTC
verified on glusterfs-3.4.0.52rhs-1.el6rhs

Comment 15 santosh pradhan 2014-01-28 17:07:12 UTC
The doc text looks good to me.

Comment 17 errata-xmlrpc 2014-02-25 07:28:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0208.html


Note You need to log in before you can comment on or make changes to this bug.