Description of problem:
We see a large file performance regression between 3.4.0.57rhs-1.el6rhs and 3.4.0.58rhs-1.el6rhs.

Version-Release number of selected component (if applicable):
3.4.0.58rhs-1.el6rhs

How reproducible:
Consistently

Steps to Reproduce:
1. Create a 2x2 Distributed-Replicate volume and mount 2 fuse clients.
2. Run iozone in clustered mode with the following options:
   -w -c -e -i 0 -+n -r 64k -s 10g -t 8
3. Run the regression script (attached):
   python is-regression-v2.py throughput 95 10 10 baseline_sample test_sample

Actual results:
Regression in large file writes

Expected results:

Additional info:

# ./calc_avg 69 71
run69 - glusterfs - 3.4.0.57rhs-1.el6rhs - IOZONE - [-w -c -e -i 0 -+n -r 64k -s 10g -t 8] - distrep - (quota off, gsync off)
run71 - glusterfs - 3.4.0.58rhs-1.el6rhs - IOZONE - [-w -c -e -i 0 -+n -r 64k -s 10g -t 8] - distrep - (quota off, gsync off)

Operations                 RUN69    RUN71
-------------------------  -------  -------
write                      113783   97224
read                       180242   175290

======= Throughput write =========
decision parameters:
  sample type = throughput
  confidence threshold = 95.00 %
  max. pct. deviation = 10.00 %
  regression threshold = 10.00 %
sample stats for baseline:
  min = 112456.210000
  max = 115952.230000
  mean = 113783.530000
  sd = 1893.800198
  pct.dev. = 1.66 %
sample stats for current:
  min = 97048.210000
  max = 97358.390000
  mean = 97224.083333
  sd = 159.212904
  pct.dev. = 0.16 %
CHANGE -14.55 percent
magnitude of change is at least 12.89%
/usr/lib64/python2.6/site-packages/scipy/stats/stats.py:420: DeprecationWarning: scipy.stats.mean is deprecated; please update your code to use numpy.mean.
Please note that:
    - numpy.mean axis argument defaults to None, not 0
    - numpy.mean has a ddof argument to replace bias in a more general manner.
      scipy.stats.mean(a, bias=True) can be replaced by numpy.mean(x, axis=0, ddof=1).
  axis=0, ddof=1).""", DeprecationWarning)
t-test t-statistic = 15.091865 probability = 0.000112
t-test says that mean of two sample sets differs with probability 99.99%
probability that sample sets have same mean = 0.0001
declaring a performance regression test FAILURE because of lower throughput
RESULT:10

======= Throughput read =========
decision parameters:
  sample type = throughput
  confidence threshold = 95.00 %
  max. pct. deviation = 10.00 %
  regression threshold = 10.00 %
sample stats for baseline:
  min = 179548.590000
  max = 181565.140000
  mean = 180242.366667
  sd = 1146.013124
  pct.dev. = 0.64 %
sample stats for current:
  min = 174850.550000
  max = 176092.530000
  mean = 175290.820000
  sd = 695.419108
  pct.dev. = 0.40 %
CHANGE -2.75 percent
magnitude of change is at least 2.11%
/usr/lib64/python2.6/site-packages/scipy/stats/stats.py:420: DeprecationWarning: scipy.stats.mean is deprecated; please update your code to use numpy.mean.
Please note that:
    - numpy.mean axis argument defaults to None, not 0
    - numpy.mean has a ddof argument to replace bias in a more general manner.
      scipy.stats.mean(a, bias=True) can be replaced by numpy.mean(x, axis=0, ddof=1).
  axis=0, ddof=1).""", DeprecationWarning)
t-test t-statistic = 6.397835 probability = 0.003065
t-test says that mean of two sample sets differs with probability 99.69%
probability that sample sets have same mean = 0.0031
RESULT:0
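For readers who want to check the figures in the output above, here is a minimal Python sketch of the arithmetic. It is not the attached is-regression-v2.py script. The function names are hypothetical, and the "magnitude of change is at least" rule (absolute change minus the baseline pct.dev.) and the pass/fail rule are assumptions chosen because they reproduce the numbers and outcomes reported here. The t-test probability is taken from the report as an input rather than recomputed.

```python
def pct_deviation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return 100.0 * sd / mean


def pct_change(baseline_mean, current_mean):
    """Relative change of the current mean versus the baseline, in percent."""
    return 100.0 * (current_mean - baseline_mean) / baseline_mean


def min_magnitude(baseline_mean, baseline_sd, current_mean):
    """Assumed rule: |change| minus the baseline pct.dev.

    This matches the 'magnitude of change is at least' lines in the
    report, but the real script may compute it differently.
    """
    change = abs(pct_change(baseline_mean, current_mean))
    return change - pct_deviation(baseline_sd, baseline_mean)


def is_regression(baseline_mean, baseline_sd, current_mean, p_same_mean,
                  confidence_pct=95.0, regression_pct=10.0):
    """Assumed decision rule: fail only if throughput dropped, the means
    differ with at least the required confidence, and the minimum
    magnitude of the drop exceeds the regression threshold."""
    significant = (1.0 - p_same_mean) * 100.0 >= confidence_pct
    lower = current_mean < baseline_mean
    big_enough = min_magnitude(baseline_mean, baseline_sd, current_mean) > regression_pct
    return significant and lower and big_enough


# Summary statistics and t-test probabilities copied from the report.
write_fail = is_regression(113783.530000, 1893.800198, 97224.083333, 0.000112)
read_fail = is_regression(180242.366667, 1146.013124, 175290.820000, 0.003065)

print("write: change %.2f%%, regression=%s"
      % (pct_change(113783.530000, 97224.083333), write_fail))
print("read:  change %.2f%%, regression=%s"
      % (pct_change(180242.366667, 175290.820000), read_fail))
```

Under these assumptions the write test fails while the read test passes, because the read drop is statistically significant but stays below the 10% regression threshold.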
Created attachment 857751: Regression script
Between builds 57 and 58 we fixed 4 bugs: 977492, 1026787, 829734 and 1056204. None of these are in the I/O path. I discussed this with Anush; we are planning to re-run these tests.
Because of the I/O throttling fix (BZ 977492), which went into 3.4.0.58rhs-1.el6, performance might be impacted. Your workload was not very big (-s 10g -t 8, i.e. 8 threads each writing 10g, 80g in total).

To get the same performance as the previous build (3.4.0.57rhs-1.el6, the one you compared against), you can set the following parameters to turn I/O throttling OFF:

    gluster volume set <volname> nfs.outstanding-rpc-limit 0
    gluster volume set <volname> server.outstanding-rpc-limit 0

Then restart the I/O workload; ideally you should get the same performance. This should not be a bug.

Thanks,
Santosh
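If the two settings above need to be applied from a test harness, a small Python wrapper along these lines might help. This is only a sketch: the helper names and the volume name "testvol" are hypothetical, and only the two option names and the value 0 come from the comment above. By default it prints the commands instead of running them.

```python
import subprocess

# Option names taken from the comment above; a value of 0 turns throttling off.
THROTTLE_OPTIONS = ("nfs.outstanding-rpc-limit", "server.outstanding-rpc-limit")


def throttle_off_commands(volname):
    """Build the 'gluster volume set' argument lists for one volume."""
    return [["gluster", "volume", "set", volname, option, "0"]
            for option in THROTTLE_OPTIONS]


def total_written_gb(threads, size_gb_per_thread):
    """Total data written by the iozone run (-t threads, -s size per thread)."""
    return threads * size_gb_per_thread


def apply_throttle_off(volname, dry_run=True):
    """Print the commands, or run them when dry_run is False."""
    for cmd in throttle_off_commands(volname):
        if dry_run:
            print(" ".join(cmd))
        else:
            # Requires the gluster CLI on a node in the trusted pool.
            subprocess.check_call(cmd)


# "testvol" is a placeholder volume name, not one from this report.
apply_throttle_off("testvol")
print("workload size: %d GB" % total_written_gb(8, 10))
```

After applying the settings for real (dry_run=False), the iozone workload should be restarted before comparing against the run69 baseline.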
No update in a month. If there is no remaining issue, should this be closed?
Closing this for now; please reopen if it recurs.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days