Description of problem: dbench on nfs mount with 100 sync-dirs fails Version-Release number of selected component (if applicable): server, glusterfs-server-3.4.0.1rhs-1.el6rhs.x86_64 client, Red Hat Enterprise Linux Server release 6.4 (Santiago) How reproducible: always Steps to Reproduce: 1. create a dist-rep volume; start volume 2. nfs mount 3. execute, time /opt/qa/tools/dbench -t 300 -c /opt/qa/tools/dbench-4.0/client.txt -s -S 100 Actual results: [root@rhsauto020 dir1]# time /opt/qa/tools/dbench -t 300 -c /opt/qa/tools/dbench-4.0/client.txt -s -S 100 dbench version 4.00 - Copyright Andrew Tridgell 1999-2004 Running for 300 seconds with load '/opt/qa/tools/dbench-4.0/client.txt' and minimum warmup 60 secs 99 of 100 processes prepared for launch 0 sec 100 of 100 processes prepared for launch 0 sec releasing clients 2 2 1.63 MB/sec warmup 1 sec latency 966.954 ms 6 3 2.09 MB/sec warmup 2 sec latency 1965.621 ms 8 5 3.27 MB/sec warmup 3 sec latency 2589.622 ms 15 7 4.58 MB/sec warmup 4 sec latency 3467.830 ms 19 9 5.43 MB/sec warmup 5 sec latency 4468.089 ms [3] open ./clients/client92 failed for handle 16385 (Stale file handle) (4) ERROR: handle 16385 was not found Child failed with status 1 real 0m5.919s user 0m0.007s sys 0m0.014s Expected results: it should pass, similar to sync-dir with "10". Additional info:
When I worked on this issue, I found that the NFS ACL seems to trigger this behaviour. mounting with "noacl" option, dbench runs fine with 100 clients. Upon further investigation, the client only does getacl calls if there are no acls involved. None of the getacl calls fail. Need further investigation to root cause this issue.
Per discussion w/ Vivek, works without ACL being on. Not for Big Bend as ACL is off by default. Targeting for Corbett.
Retargeting for 2.1.z U2 (Corbett) release.
Not much progress on this.
This turns out to be a cache inconsistency issue. When dbench forks around 100 threads and they do I/O in the same mount point, it causes the cache inconsistency in the client cache. The issue is seen while ACL is on. The issue does not happen while attr caching is OFF (the client mounts with -o noac option) consistently, though happens sometimes. Had a discussion with Amar, Vijay, Rajesh and Anand. md-cache xlator needs to be tried with NFS.
Debugging from the NFS-client should be a little easier with these patches: - http://thread.gmane.org/gmane.linux.nfs/60602 - https://bugs.wireshark.org/bugzilla/show_bug.cgi?id=9557 1. rpcdebug should now show non-negative fileids, which we can relate to inode numbers and matching values in tcpdumps. 2. Wireshark can dissect the file-handles and display the exportid and gfid of the file. This makes it easier to match with the logs and xattrs on the Gluster side.
> Had a discussion with Amar, Vijay, Rajesh and Anand. md-cache xlator needs to > be tried with NFS. Also try with a simplistic volume file for nfs server... --------- volume posix ... volume locks .... subvolumes posix ... volume acl .... subvolumes locks ... volume nfs .... subvolumes acl ... ------------------- This should be able to reduce the scope of the problem...
With the FIX (one liner), dbench works :) [root@santosh-new-4 glnfs]# dbench -t 100 -c /usr/share/dbench/client.txt -s -S 100 dbench version 4.00 - Copyright Andrew Tridgell 1999-2004 Running for 100 seconds with load '/usr/share/dbench/client.txt' and minimum warmup 20 secs 97 of 100 processes prepared for launch 0 sec 100 of 100 processes prepared for launch 0 sec releasing clients 5 4 2.14 MB/sec warmup 1 sec latency 943.834 ms 6 4 2.05 MB/sec warmup 2 sec latency 1944.099 ms 9 5 2.35 MB/sec warmup 3 sec latency 2944.345 ms 12 6 2.89 MB/sec warmup 4 sec latency 3944.598 ms 12 7 3.17 MB/sec warmup 5 sec latency 4944.898 ms 13 8 3.31 MB/sec warmup 6 sec latency 5945.175 ms 13 8 3.47 MB/sec warmup 7 sec latency 6945.456 ms 13 9 3.68 MB/sec warmup 8 sec latency 7945.694 ms 16 11 3.72 MB/sec warmup 9 sec latency 8945.970 ms 16 12 3.85 MB/sec warmup 10 sec latency 9946.206 ms 19 13 4.02 MB/sec warmup 11 sec latency 10946.396 ms 22 14 4.10 MB/sec warmup 12 sec latency 11946.668 ms 23 15 4.20 MB/sec warmup 13 sec latency 12946.913 ms 23 16 4.32 MB/sec warmup 14 sec latency 13947.194 ms 23 17 4.40 MB/sec warmup 15 sec latency 14947.377 ms 23 18 4.45 MB/sec warmup 16 sec latency 15947.615 ms 24 19 4.50 MB/sec warmup 17 sec latency 16947.917 ms 26 21 4.57 MB/sec warmup 18 sec latency 17945.881 ms 27 22 4.61 MB/sec warmup 19 sec latency 18946.216 ms 29 23 4.67 MB/sec warmup 20 sec latency 19946.452 ms 29 24 4.67 MB/sec warmup 21 sec latency 20951.320 ms 30 26 4.75 MB/sec warmup 22 sec latency 21951.590 ms 30 27 4.78 MB/sec warmup 23 sec latency 22951.791 ms 30 28 4.79 MB/sec warmup 24 sec latency 23952.079 ms 30 29 4.81 MB/sec warmup 25 sec latency 24952.294 ms 30 31 4.84 MB/sec warmup 26 sec latency 25952.572 ms 30 32 4.85 MB/sec warmup 27 sec latency 26952.807 ms 31 33 4.89 MB/sec warmup 28 sec latency 27953.068 ms 32 34 4.89 MB/sec warmup 29 sec latency 28953.359 ms 32 36 4.89 MB/sec warmup 30 sec latency 29953.638 ms 33 37 4.93 MB/sec warmup 31 sec latency 30953.887 ms 33 38 4.87 MB/sec warmup 32 sec latency 31954.179 ms 33 38 4.76 MB/sec warmup 33 sec latency 32954.437 ms 33 38 4.67 MB/sec warmup 34 sec latency 33954.714 ms 34 39 4.55 MB/sec warmup 35 sec latency 34954.956 ms 34 40 4.55 MB/sec warmup 36 sec latency 35955.198 ms 35 41 4.58 MB/sec warmup 37 sec latency 36955.422 ms 36 43 4.60 MB/sec warmup 38 sec latency 37943.523 ms 37 44 4.60 MB/sec warmup 39 sec latency 38943.766 ms 41 46 4.61 MB/sec warmup 40 sec latency 39944.044 ms 42 47 4.64 MB/sec warmup 41 sec latency 40944.285 ms 45 49 4.67 MB/sec warmup 42 sec latency 41944.593 ms 46 50 4.68 MB/sec warmup 43 sec latency 42944.812 ms 47 52 4.70 MB/sec warmup 44 sec latency 43945.077 ms 49 53 4.73 MB/sec warmup 45 sec latency 44945.281 ms 52 55 4.73 MB/sec warmup 46 sec latency 45945.514 ms 55 56 4.73 MB/sec warmup 47 sec latency 46945.739 ms 57 58 4.75 MB/sec warmup 48 sec latency 47943.350 ms 59 59 4.76 MB/sec warmup 49 sec latency 48943.710 ms 61 61 4.78 MB/sec warmup 50 sec latency 49943.972 ms 63 62 4.80 MB/sec warmup 51 sec latency 50944.207 ms 65 63 4.81 MB/sec warmup 52 sec latency 51944.459 ms 66 65 4.84 MB/sec warmup 53 sec latency 52944.747 ms 67 66 4.84 MB/sec warmup 54 sec latency 53944.987 ms 69 68 4.85 MB/sec warmup 55 sec latency 54945.253 ms 70 69 4.87 MB/sec warmup 56 sec latency 55945.468 ms 72 71 4.88 MB/sec warmup 57 sec latency 56945.826 ms 73 72 4.89 MB/sec warmup 58 sec latency 57946.099 ms 74 73 4.91 MB/sec warmup 59 sec latency 58946.339 ms 75 75 4.91 MB/sec warmup 60 sec latency 59946.562 ms 77 76 4.93 MB/sec warmup 61 sec latency 60946.858 ms 78 77 4.94 MB/sec warmup 62 sec latency 61947.008 ms 78 79 4.95 MB/sec warmup 63 sec latency 62942.688 ms 80 80 4.98 MB/sec warmup 64 sec latency 63942.903 ms 80 81 4.93 MB/sec warmup 65 sec latency 64943.118 ms 80 81 4.87 MB/sec warmup 66 sec latency 65943.310 ms 80 81 4.82 MB/sec warmup 67 sec latency 66943.521 ms 80 81 4.77 MB/sec warmup 68 sec latency 67943.837 ms 81 83 4.75 MB/sec warmup 69 sec latency 68944.162 ms 82 84 4.77 MB/sec warmup 70 sec latency 69944.404 ms 84 85 4.77 MB/sec warmup 71 sec latency 70944.667 ms 85 87 4.79 MB/sec warmup 72 sec latency 71892.717 ms 87 88 4.82 MB/sec warmup 73 sec latency 72893.035 ms 87 90 4.83 MB/sec warmup 74 sec latency 73882.992 ms 89 91 4.85 MB/sec warmup 75 sec latency 74841.458 ms 90 93 4.86 MB/sec warmup 76 sec latency 75821.931 ms 91 94 4.88 MB/sec warmup 77 sec latency 3773.437 ms 91 95 4.90 MB/sec warmup 78 sec latency 3807.140 ms 94 97 4.91 MB/sec warmup 79 sec latency 3964.437 ms 95 98 4.92 MB/sec warmup 80 sec latency 3398.932 ms 97 99 4.93 MB/sec warmup 81 sec latency 3031.466 ms 98 101 4.94 MB/sec warmup 82 sec latency 3399.057 ms 100 103 5.36 MB/sec execute 1 sec latency 3281.744 ms 100 105 5.56 MB/sec execute 2 sec latency 3411.132 ms 100 106 5.54 MB/sec execute 3 sec latency 3624.913 ms 100 107 5.59 MB/sec execute 4 sec latency 2417.211 ms 100 108 5.60 MB/sec execute 5 sec latency 2143.010 ms 100 109 5.62 MB/sec execute 6 sec latency 2433.499 ms 100 110 5.54 MB/sec execute 7 sec latency 3433.794 ms 100 112 5.56 MB/sec execute 8 sec latency 3560.078 ms 100 113 5.59 MB/sec execute 9 sec latency 3795.081 ms 100 114 5.56 MB/sec execute 10 sec latency 3094.726 ms 100 115 5.51 MB/sec execute 11 sec latency 2665.034 ms 100 117 5.56 MB/sec execute 12 sec latency 2915.161 ms 100 118 5.59 MB/sec execute 13 sec latency 2550.801 ms 100 119 5.60 MB/sec execute 14 sec latency 2783.397 ms 100 120 5.48 MB/sec execute 15 sec latency 2934.023 ms 100 120 5.19 MB/sec execute 16 sec latency 3934.336 ms 100 121 4.96 MB/sec execute 17 sec latency 4679.548 ms 100 122 4.89 MB/sec execute 18 sec latency 4422.592 ms 100 122 4.73 MB/sec execute 19 sec latency 5422.896 ms 100 123 4.56 MB/sec execute 20 sec latency 6107.067 ms 100 124 4.62 MB/sec execute 21 sec latency 6563.886 ms 100 125 4.71 MB/sec execute 22 sec latency 6914.216 ms 100 126 4.76 MB/sec execute 23 sec latency 5189.257 ms 100 128 4.80 MB/sec execute 24 sec latency 2029.798 ms 100 129 4.80 MB/sec execute 25 sec latency 2459.675 ms 100 130 4.82 MB/sec execute 26 sec latency 3134.820 ms 100 132 4.85 MB/sec execute 27 sec latency 3289.884 ms 100 133 4.88 MB/sec execute 28 sec latency 2680.891 ms 100 134 4.90 MB/sec execute 29 sec latency 3525.660 ms 100 135 4.91 MB/sec execute 30 sec latency 3677.829 ms 100 136 4.93 MB/sec execute 31 sec latency 3858.502 ms 100 138 4.95 MB/sec execute 32 sec latency 3340.983 ms 100 139 4.95 MB/sec execute 33 sec latency 3048.761 ms 100 141 4.96 MB/sec execute 34 sec latency 3221.823 ms 100 142 4.98 MB/sec execute 35 sec latency 3047.699 ms 100 143 5.01 MB/sec execute 36 sec latency 3194.896 ms 100 145 5.04 MB/sec execute 37 sec latency 3438.738 ms 100 146 5.05 MB/sec execute 38 sec latency 4106.409 ms 100 148 5.07 MB/sec execute 39 sec latency 3331.490 ms 100 149 5.07 MB/sec execute 40 sec latency 2386.135 ms 100 150 5.08 MB/sec execute 41 sec latency 2487.593 ms 100 152 5.10 MB/sec execute 42 sec latency 3018.656 ms 100 153 5.11 MB/sec execute 43 sec latency 3273.131 ms 100 155 5.12 MB/sec execute 44 sec latency 3254.048 ms 100 156 5.10 MB/sec execute 45 sec latency 2798.498 ms 100 157 5.11 MB/sec execute 46 sec latency 3299.004 ms 100 159 5.14 MB/sec execute 47 sec latency 2989.024 ms 100 160 5.13 MB/sec execute 48 sec latency 3087.011 ms 100 162 5.12 MB/sec execute 49 sec latency 2863.351 ms 100 163 5.08 MB/sec execute 50 sec latency 2771.840 ms 100 163 5.00 MB/sec execute 51 sec latency 3499.051 ms 100 163 4.92 MB/sec execute 52 sec latency 4081.201 ms 100 164 4.86 MB/sec execute 53 sec latency 5081.434 ms 100 164 4.83 MB/sec execute 54 sec latency 5878.798 ms 100 166 4.87 MB/sec execute 55 sec latency 5406.963 ms 100 168 4.89 MB/sec execute 56 sec latency 2013.503 ms 100 170 4.89 MB/sec execute 57 sec latency 2094.650 ms 100 171 4.91 MB/sec execute 58 sec latency 2142.667 ms 100 173 4.94 MB/sec execute 59 sec latency 2037.227 ms 100 174 4.96 MB/sec execute 60 sec latency 2637.869 ms 100 177 4.98 MB/sec execute 61 sec latency 2496.679 ms 100 178 4.99 MB/sec execute 62 sec latency 1758.958 ms 100 180 5.01 MB/sec execute 63 sec latency 2221.876 ms 100 182 5.04 MB/sec execute 64 sec latency 2227.705 ms 100 184 5.06 MB/sec execute 65 sec latency 2155.579 ms 100 186 5.09 MB/sec execute 66 sec latency 1973.997 ms 100 188 5.10 MB/sec execute 67 sec latency 1849.199 ms 100 189 5.12 MB/sec execute 68 sec latency 2303.635 ms 100 191 5.13 MB/sec execute 69 sec latency 2261.626 ms 100 193 5.12 MB/sec execute 70 sec latency 1784.288 ms 100 195 5.14 MB/sec execute 71 sec latency 2003.458 ms 100 197 5.17 MB/sec execute 72 sec latency 2333.693 ms 100 199 5.18 MB/sec execute 73 sec latency 2022.409 ms 100 201 5.20 MB/sec execute 74 sec latency 1840.424 ms 100 203 5.21 MB/sec execute 75 sec latency 1757.040 ms 100 205 5.22 MB/sec execute 76 sec latency 1563.506 ms 100 207 5.24 MB/sec execute 77 sec latency 1728.521 ms 100 209 5.26 MB/sec execute 78 sec latency 1982.083 ms 100 211 5.29 MB/sec execute 79 sec latency 2136.528 ms 100 214 5.29 MB/sec execute 80 sec latency 2342.328 ms 100 216 5.32 MB/sec execute 81 sec latency 1591.293 ms 100 218 5.33 MB/sec execute 82 sec latency 1973.839 ms 100 221 5.34 MB/sec execute 83 sec latency 1713.779 ms 100 221 5.30 MB/sec execute 84 sec latency 2001.131 ms 100 221 5.26 MB/sec execute 85 sec latency 2947.984 ms 100 222 5.21 MB/sec execute 86 sec latency 3866.077 ms 100 223 5.18 MB/sec execute 87 sec latency 4443.496 ms 100 224 5.16 MB/sec execute 88 sec latency 4758.341 ms 100 225 5.15 MB/sec execute 89 sec latency 4807.897 ms 100 226 5.11 MB/sec execute 90 sec latency 4500.059 ms 100 226 5.06 MB/sec execute 91 sec latency 3408.592 ms 100 227 5.03 MB/sec execute 92 sec latency 4044.082 ms 100 229 5.04 MB/sec execute 93 sec latency 3970.390 ms 100 231 5.05 MB/sec execute 94 sec latency 1987.122 ms 100 233 5.06 MB/sec execute 95 sec latency 2100.590 ms 100 235 5.05 MB/sec execute 96 sec latency 1786.842 ms 100 236 5.08 MB/sec execute 97 sec latency 1562.301 ms 100 239 5.09 MB/sec execute 98 sec latency 1522.411 ms 100 241 5.11 MB/sec execute 99 sec latency 1608.492 ms 100 cleanup 100 sec 93 cleanup 101 sec 0 cleanup 102 sec Operation Count AvgLat MaxLat ---------------------------------------- NTCreateX 1753 1569.071 6914.205 Close 1731 1.870 829.754 Qfileinfo 1396 0.107 138.307 WriteX 9313 786.827 3864.266 Throughput 5.11019 MB/sec (sync open) (sync dirs) 100 clients 100 procs max_latency=6914.216 ms Fix is posted upstream: http://review.gluster.org/#/c/6523/
Code is posted downstream for review: https://code.engineering.redhat.com/gerrit/#/c/17592/
The root cause is same as that of BZ 965400.
verified on glusterfs-3.4.0.52rhs-1.el6rhs
The doc text looks good to me.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2014-0208.html