Bug 983595 - IOZone hangs or is very slow over glusterfs.
Summary: IOZone hangs or is very slow over glusterfs.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterfs
Version: 2.0
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
: ---
Assignee: Nagaprasad Sathyanarayana
QA Contact: Ben Turner
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2013-07-11 14:17 UTC by Ben Turner
Modified: 2016-02-18 00:19 UTC (History)
7 users (show)

Fixed In Version: glusterfs-3.4.0.15rhs-1.el6rhs
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-09-23 22:35:43 UTC
Embargoed:


Attachments (Terms of Use)
IOzone output (14.46 KB, text/plain)
2013-07-31 19:51 UTC, Ben Turner
no flags Details
Good iozone run from 2.1 bits (17.95 KB, text/plain)
2013-07-31 20:07 UTC, Ben Turner
no flags Details

Description Ben Turner 2013-07-11 14:17:47 UTC
Description of problem:

In running client certifications I saw into IOZone taking much longer than normal to complete.  I logged into the client running it and killed the process but it just ended up hanging in D state.  IOZone runs normally take under an hour.  I tested this twice on glusterfs-3.3.0.11rhs-1.el6rhs.x86_64 both times it was either very slow or hung.  Here is the IOZone output:

	Iozone: Performance Test of File I/O
	        Version $Revision: 3.394 $
		Compiled for 64 bit mode.
		Build: linux 

	Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
	             Al Slater, Scott Rhine, Mike Wisner, Ken Goss
	             Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
	             Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
	             Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
	             Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
	             Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer.
	             Ben England.

	Run began: Wed Jul 10 13:03:38 2013

	Auto Mode
	Command line used: iozone -a
	Output is in Kbytes/sec
	Time Resolution = 0.000001 seconds.
	Processor cache size set to 1024 Kbytes.
	Processor cache line size set to 32 bytes.
	File stride size set to 17 * record size.
                                                            random  random    bkwd   record   stride                                   
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
              64       4   26361   85114    44629    50630    5997   39097    3840   131405    49767    77947    99686   26392    31934
              64       8   52626  104947    59761    65780    8266  102581    6436    96810    37361   149229   158014   27175    34027
              64      16   76005  101113    64007    65844   20139  155003   13562   127777    47545   235303   125978   25226    30445
              64      32  135861  183884  4564786  4564786   63992  214781   30145   272300    49999   273409   342477   52165    67145
              64      64  242312  371894  5389653  4897948   64701  361867   45911   688639    53958   338589   274809   52074    58818
             128       4  107571  122494    38496    38073    5527   46028    2485    81482    18409    98929   103149   25447    27574
             128       8   78343  206424    37069    40558    9112   80856    4876   176034    55223   155939   166223   25102    27563
             128      16   88888  153268    81162    78722    9975  175573    8960   261275    52827   236111   169316   34659    39901
             128      32  142074  219148    65473    76143   28003  212302   16982   349797    69979   300464   232330   35725    37197
             128      64  288360  348661  5122535  5122535   79704  338761   34224   384356    53758   520321   561694   67541    85166
             128     128  419822  540217  4934216  5122535   80192  895074   59919   747933    62442   592699   561694   57163    77764
             256       4   81866  109077    55100    59352    6372   44060    2539   114437    17591   108702   104573   38671    43971
             256       8  119506  193056    55907    60462   10294   83743    4937   174144    26907   189613   184149   36724    41444
             256      16  116574  284751    87789    91923   18255  190758    9763   306537   126489   212294   289589   57607    61081
             256      32  187134  406442    83821    80328   20604  350117   18671   412849   101508   308032   326980   55183    59107
             256      64  254211  438296   138532   143975   52296  316010   35319   437582   112968   417018   483056   67689    78000
             256     128  427307  562520  5569035   231719   93435  530031   66842   526652    78531   826078  1019886   82393    96094
             256     256  502969  748875  4840911  4929815   91986  718796   81246   733527    70212   755195   782721   73924    92621
             512       4  100510  127456    72479    53691    7454   46043    2579   145121    16596   125299   119564   50097    44214
             512       8  134987  198679    63256    75829   11071   82513    5211   199695    31596   179079   208477   59444    46188
             512      16  166229  313327    95810   100369   17834  156959   10153   236381    33582   240943   270164   46532    74364
             512      32  355026  334404    66397    76157   30765  301705   19073   474917   229412   280101   300986   69171    67290
             512      64  268777  344766   123554   126891   41380  365977   37813   537299   234753   380098   399695   88033    95716
             512     128  443712  552932   157627   149267   69773  614654   73797   832420   208984   777002   774759   98252    99555
             512     256  509509  802555   235397   232315   99036  700463   87670   840566    93671  1155924  1380303   97858    97357
             512     512  576535 1089094  4871723  4871723   99532  781811   98198   762924    87339   745186   911184   86238   103539
            1024       4  110272  122034    88673    79830    8961   41996    2546   101315    16116   132696    97330   51832    56558
            1024       8  173234  208343    56512    57853   13361   94918    5220   171669    27966   198456   209788   76275    60919
            1024      16  248482  301172    77858   107754   16631  151456   10248   287311    40809   301701   313918   86538    61454
            1024      32  288198  388762   109306   106380   30308  324569   19880   442697    51722   377453   396880   90164    65355
            1024      64  348199  547631   118342   123746   47293  358039   38400   537825   478504   373027   555277   97460   104158
            1024     128  443841  554131   153294   131804   77364  673635   73553   931702   435736   612010   900637  105437   108096
            1024     256  528104  708646   153661   220879   73611  768219   88010   939446   183680   975063  1035229  104108   109029
            1024     512  582462  788964   240146   229704  107323  720051  101025   885961   178519  1175799  1307525  103361   107417
            1024    1024  652650  954266  4715569  4531485  108626  799390  106835   933322   100509  1045053   834653  103748   108338
            2048       4   97407   94139    58601   103230    6941   51567    2605        7    16306    88643    88915   46698    62511
            2048       8  107516   95705    56193    64380   15653   83533    5113       15    25944    97250    93485   61953    53485
            2048      16   99799  112254    95634    94256   23819  101643   10157       31    43875    94365   100752   62922    55273
            2048      32  109883  114618    50611    66973   30638  105796   19892       65    55747    97621   100367   60806    88378
            2048      64  103112  108228   115889   118333   46672  103169   39776      133    96011   114476   106765  106543   108635
            2048     128  113664  118986   122503   121708   81179  120053   74714      289   979936   131146   134268  109344   110846
            2048     256  117707  120385   130429   141846   82064  118925   93652      408   346357   142340   137173  110000   111521
            2048     512  113286  117565   153271   152620   74069  118423   72927   113355   346762   251103     1949  107495   110708
            2048    1024  118539  118361   229986   235994  110524  120954  107410   114203   208150  1263447  1688337  109882   110869
            2048    2048  113349  119542  4490751  8830061  112023  106456  109203   106505   108994   105382   105496  110184   111273
            4096       4   73595   69774    74848    78287    6993   28789    2452        5    15998    63979    65968   63255    75499
            4096       8   69487   71658    69366    68039   12667   66694    5271       10    26781    72044    72393   71658    62965
            4096      16   73889   73299    71177    64846   23021   73046   10090       20    38092    72714    72426   62371    70622
            4096      32   74717   75384    69901    75416   32019   74301   20681       42    52328    73046    72891   66345    68203
            4096      64   74520   75413   115604   116944   48070   75431   38744       85    87169    74721    75330  109136   111368
            4096     128   73073   77377   118841   118149   84233   77459   75198      174   105477    80542    80474  111998   112971
            4096     256   77404   77235   122517   121814   77784   77533   90289      191  1044127    84129    83168  111355   113264
            4096     512   77587   77201   131235   131357   69436   77009   63143     2157   636131    93479    92843  112160   112471
            4096    1024   77699   77079   153886   153772   93770   77331   87910    77485   388581   116721   115653  112163   112564
            4096    2048   77519   76712   229840   230516  112868   77581  109674    77530   216970   210041   216480  111983   112570
            4096    4096   77650   77243  3941416  3918939  112744   77453  111389    77590   112093    77205    77558  111757   112909
            8192       4   63794   63054    61645    80808    7237   38014    2480        4    16085    62301    63397   74053    63684
            8192       8   64010   63916    75515    68273   11871   40424    5078
iozone: interrupted

exiting iozone


real	128m41.474s
user	0m0.179s
sys	0m2.370s

Normally when I run the kill command on IOZone/Bonnie/<whatever IO generator> I am able to kill the process successfully but this time it just hung in D state and I had to bounce the system to unmount the filesystem:

umount: /gluster-mount: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))

Version-Release number of selected component (if applicable):

glusterfs-3.3.0.11rhs-1.el6rhs.x86_64

How reproducible:

I hit this twice in automated testing, both times I tested it on:

Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 48eec3da-075b-4260-a001-9963ae0bcf52
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: storage-qe11.lab.eng.rdu2.redhat.com:/bricks/testvol_brick0
Brick2: storage-qe12.lab.eng.rdu2.redhat.com:/bricks/testvol_brick1
Brick3: storage-qe13.lab.eng.rdu2.redhat.com:/bricks/testvol_brick2
Brick4: storage-qe14.lab.eng.rdu2.redhat.com:/bricks/testvol_brick3
Brick5: storage-qe11.lab.eng.rdu2.redhat.com:/bricks/testvol_brick4
Brick6: storage-qe12.lab.eng.rdu2.redhat.com:/bricks/testvol_brick5
Brick7: storage-qe13.lab.eng.rdu2.redhat.com:/bricks/testvol_brick6
Brick8: storage-qe14.lab.eng.rdu2.redhat.com:/bricks/testvol_brick7
Brick9: storage-qe11.lab.eng.rdu2.redhat.com:/bricks/testvol_brick8
Brick10: storage-qe12.lab.eng.rdu2.redhat.com:/bricks/testvol_brick9
Brick11: storage-qe13.lab.eng.rdu2.redhat.com:/bricks/testvol_brick10
Brick12: storage-qe14.lab.eng.rdu2.redhat.com:/bricks/testvol_brick11

Steps to Reproduce:
1.  Create 6x2 volume
2.  Mount -t glusterfs on 6.4 client
3.  Run iozone -a

Actual results:

The process either hung or was very slow, attempts to kill the IOZone process resulted in it hanging in D state.

Expected results:

IOZone on these systems/this config usually completes in under an hour.

Additional info:

I ran into this in a batch of automated tests, I am going back through manually to repro and collect additional data.  I'll update the BZ shortly with the info.

Comment 2 santosh pradhan 2013-07-11 15:49:55 UTC
Hi Ben,
Just to understand the issue little better:

The iozone is slow with glusterfs-3.3.0.11rhs-1.el6rhs.x86_64 but compared to which version? It may help to find the code change/FIX which might be responsible to make it slower.

Thanks,
Santosh

Comment 3 Ben Turner 2013-07-11 17:21:49 UTC
I reran iozone -a on a 6x2 glusterfs mount with profiling and here is what I saw:

[root@storage-qe11 ~]# gluster volume profile testvol info
Brick: storage-qe11.lab.eng.rdu2.redhat.com:/bricks/testvol_brick8
------------------------------------------------------------------
Cumulative Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              3  RELEASEDIR
      5.37      34.58 us      13.00 us      57.00 us             12    GETXATTR
      9.84      31.67 us      10.00 us     166.00 us             24     READDIR
     21.79      39.14 us      14.00 us      72.00 us             43      STATFS
     62.99      50.15 us       6.00 us      83.00 us             97      LOOKUP
 
    Duration: 7617 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Interval 0 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              3  RELEASEDIR
      5.37      34.58 us      13.00 us      57.00 us             12    GETXATTR
      9.84      31.67 us      10.00 us     166.00 us             24     READDIR
     21.79      39.14 us      14.00 us      72.00 us             43      STATFS
     62.99      50.15 us       6.00 us      83.00 us             97      LOOKUP
 
    Duration: 7617 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Brick: storage-qe11.lab.eng.rdu2.redhat.com:/bricks/testvol_brick4
------------------------------------------------------------------
Cumulative Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              3  RELEASEDIR
      4.74      31.33 us      16.00 us      44.00 us             12    GETXATTR
     11.08      36.58 us      11.00 us     124.00 us             24     READDIR
     20.54      37.86 us      13.00 us      91.00 us             43      STATFS
     63.63      51.99 us       6.00 us      96.00 us             97      LOOKUP
 
    Duration: 7617 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Interval 0 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              3  RELEASEDIR
      4.74      31.33 us      16.00 us      44.00 us             12    GETXATTR
     11.08      36.58 us      11.00 us     124.00 us             24     READDIR
     20.54      37.86 us      13.00 us      91.00 us             43      STATFS
     63.63      51.99 us       6.00 us      96.00 us             97      LOOKUP
 
    Duration: 7617 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Brick: storage-qe11.lab.eng.rdu2.redhat.com:/bricks/testvol_brick0
------------------------------------------------------------------
Cumulative Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              3  RELEASEDIR
      4.92      38.33 us      27.00 us      46.00 us             12    GETXATTR
     12.75      49.71 us      11.00 us     174.00 us             24     READDIR
     18.22      39.65 us      20.00 us      52.00 us             43      STATFS
     64.11      59.40 us       5.00 us      89.00 us            101      LOOKUP
 
    Duration: 7617 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Interval 0 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              3  RELEASEDIR
      4.92      38.33 us      27.00 us      46.00 us             12    GETXATTR
     12.75      49.71 us      11.00 us     174.00 us             24     READDIR
     18.22      39.65 us      20.00 us      52.00 us             43      STATFS
     64.11      59.40 us       5.00 us      89.00 us            101      LOOKUP
 
    Duration: 7617 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Brick: storage-qe12.lab.eng.rdu2.redhat.com:/bricks/testvol_brick1
------------------------------------------------------------------
Cumulative Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              3  RELEASEDIR
      2.32      42.25 us      22.00 us      75.00 us             12    GETXATTR
      8.72      44.37 us      17.00 us      86.00 us             43      STATFS
     24.35      53.28 us      22.00 us     110.00 us            100      LOOKUP
     64.61      39.93 us      10.00 us      58.00 us            354        STAT
 
    Duration: 7616 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Interval 0 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              3  RELEASEDIR
      2.32      42.25 us      22.00 us      75.00 us             12    GETXATTR
      8.72      44.37 us      17.00 us      86.00 us             43      STATFS
     24.35      53.28 us      22.00 us     110.00 us            100      LOOKUP
     64.61      39.93 us      10.00 us      58.00 us            354        STAT
 
    Duration: 7616 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Brick: storage-qe12.lab.eng.rdu2.redhat.com:/bricks/testvol_brick5
------------------------------------------------------------------
Cumulative Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              3  RELEASEDIR
      2.16      33.17 us      15.00 us      42.00 us             12    GETXATTR
      8.45      36.28 us      18.00 us      70.00 us             43      STATFS
     28.16      54.17 us      26.00 us      88.00 us             96      LOOKUP
     61.23      31.94 us       9.00 us      53.00 us            354        STAT
 
    Duration: 7616 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Interval 0 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              3  RELEASEDIR
      2.16      33.17 us      15.00 us      42.00 us             12    GETXATTR
      8.45      36.28 us      18.00 us      70.00 us             43      STATFS
     28.16      54.17 us      26.00 us      88.00 us             96      LOOKUP
     61.23      31.94 us       9.00 us      53.00 us            354        STAT
 
    Duration: 7616 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Brick: storage-qe12.lab.eng.rdu2.redhat.com:/bricks/testvol_brick9
------------------------------------------------------------------
Cumulative Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              3  RELEASEDIR
      2.19      31.75 us      14.00 us      39.00 us             12    GETXATTR
      8.78      35.53 us      18.00 us      78.00 us             43      STATFS
     27.80      50.42 us      23.00 us      80.00 us             96      LOOKUP
     61.23      30.11 us      11.00 us      83.00 us            354        STAT
 
    Duration: 7616 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Interval 0 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              3  RELEASEDIR
      2.19      31.75 us      14.00 us      39.00 us             12    GETXATTR
      8.78      35.53 us      18.00 us      78.00 us             43      STATFS
     27.80      50.42 us      23.00 us      80.00 us             96      LOOKUP
     61.23      30.11 us      11.00 us      83.00 us            354        STAT
 
    Duration: 7616 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Brick: storage-qe13.lab.eng.rdu2.redhat.com:/bricks/testvol_brick10
-------------------------------------------------------------------
Cumulative Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              3  RELEASEDIR
      3.86      21.00 us      13.00 us      39.00 us             12    GETXATTR
     24.23      36.74 us      16.00 us      69.00 us             43      STATFS
     71.91      48.85 us      14.00 us      78.00 us             96      LOOKUP
 
    Duration: 7616 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Interval 0 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              3  RELEASEDIR
      3.86      21.00 us      13.00 us      39.00 us             12    GETXATTR
     24.23      36.74 us      16.00 us      69.00 us             43      STATFS
     71.91      48.85 us      14.00 us      78.00 us             96      LOOKUP
 
    Duration: 7616 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Brick: storage-qe13.lab.eng.rdu2.redhat.com:/bricks/testvol_brick6
------------------------------------------------------------------
Cumulative Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              3  RELEASEDIR
      5.36      29.42 us      17.00 us      58.00 us             12    GETXATTR
     25.11      38.47 us      10.00 us      76.00 us             43      STATFS
     69.53      47.70 us      14.00 us      89.00 us             96      LOOKUP
 
    Duration: 7616 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Interval 0 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              3  RELEASEDIR
      5.36      29.42 us      17.00 us      58.00 us             12    GETXATTR
     25.11      38.47 us      10.00 us      76.00 us             43      STATFS
     69.53      47.70 us      14.00 us      89.00 us             96      LOOKUP
 
    Duration: 7616 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Brick: storage-qe13.lab.eng.rdu2.redhat.com:/bricks/testvol_brick2
------------------------------------------------------------------
Cumulative Stats:
   Block Size:               4096b+                8192b+               16384b+ 
 No. of Reads:                    0                     0                     0 
No. of Writes:                 8167                  3914                  1043 
 
   Block Size:              32768b+               65536b+              131072b+ 
 No. of Reads:                    0                    35                 14062 
No. of Writes:                  558                   408                  3447 
 
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us            115      FORGET
      0.00       0.00 us       0.00 us       0.00 us           1036     RELEASE
      0.00       0.00 us       0.00 us       0.00 us              3  RELEASEDIR
      0.00      46.00 us      13.00 us      90.00 us              3       FSTAT
      0.00      35.04 us      17.00 us      59.00 us             24    GETXATTR
      0.00      39.21 us      12.00 us      67.00 us             24     READDIR
      0.00      28.14 us      14.00 us      62.00 us             43      STATFS
      0.00      57.62 us      16.00 us      95.00 us             39   FTRUNCATE
      0.00      80.20 us      29.00 us     120.00 us             79      UNLINK
      0.00      56.33 us       6.00 us      98.00 us            201      LOOKUP
      0.00      35.37 us       7.00 us      88.00 us            354        STAT
      0.00      40.73 us       7.00 us      89.00 us            314     ENTRYLK
      0.00     351.22 us      92.00 us   11571.00 us             78      CREATE
      0.00      45.37 us       6.00 us      87.00 us            702       FLUSH
      0.00      53.08 us      13.00 us     104.00 us            624        OPEN
      0.01      68.56 us       5.00 us   32609.00 us          14429        READ
      0.02     221.94 us      14.00 us   32621.00 us          15042    FXATTROP
      0.14    1360.69 us       7.00 us   56915.00 us          16677       WRITE
     46.49  186513.20 us       3.00 us 2169347.00 us          40626    FINODELK
     53.34 1020648.32 us     171.00 us 2294523.00 us           8518       FSYNC
 
    Duration: 7616 seconds
   Data Read: 1845428224 bytes
Data Written: 584482816 bytes
 
Interval 0 Stats:
   Block Size:               4096b+                8192b+               16384b+ 
 No. of Reads:                    0                     0                     0 
No. of Writes:                 8167                  3914                  1043 
 
   Block Size:              32768b+               65536b+              131072b+ 
 No. of Reads:                    0                    35                 14062 
No. of Writes:                  558                   408                  3447 
 
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us            115      FORGET
      0.00       0.00 us       0.00 us       0.00 us           1036     RELEASE
      0.00       0.00 us       0.00 us       0.00 us              3  RELEASEDIR
      0.00      46.00 us      13.00 us      90.00 us              3       FSTAT
      0.00      35.04 us      17.00 us      59.00 us             24    GETXATTR
      0.00      39.21 us      12.00 us      67.00 us             24     READDIR
      0.00      28.14 us      14.00 us      62.00 us             43      STATFS
      0.00      57.62 us      16.00 us      95.00 us             39   FTRUNCATE
      0.00      80.20 us      29.00 us     120.00 us             79      UNLINK
      0.00      56.33 us       6.00 us      98.00 us            201      LOOKUP
      0.00      35.37 us       7.00 us      88.00 us            354        STAT
      0.00      40.73 us       7.00 us      89.00 us            314     ENTRYLK
      0.00     351.22 us      92.00 us   11571.00 us             78      CREATE
      0.00      45.37 us       6.00 us      87.00 us            702       FLUSH
      0.00      53.08 us      13.00 us     104.00 us            624        OPEN
      0.01      68.56 us       5.00 us   32609.00 us          14429        READ
      0.02     221.94 us      14.00 us   32621.00 us          15042    FXATTROP
      0.14    1360.69 us       7.00 us   56915.00 us          16677       WRITE
     46.49  186513.20 us       3.00 us 2169347.00 us          40626    FINODELK
     53.34 1020648.32 us     171.00 us 2294523.00 us           8518       FSYNC
 
    Duration: 7616 seconds
   Data Read: 1845428224 bytes
Data Written: 584482816 bytes
 
Brick: storage-qe14.lab.eng.rdu2.redhat.com:/bricks/testvol_brick7
------------------------------------------------------------------
Cumulative Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              3  RELEASEDIR
      2.01      36.75 us      17.00 us      60.00 us             12    GETXATTR
      3.72      34.00 us      11.00 us      91.00 us             24     READDIR
      8.64      44.07 us      12.00 us      74.00 us             43      STATFS
     22.27      50.35 us       5.00 us      77.00 us             97      LOOKUP
     63.35      39.25 us      10.00 us      74.00 us            354        STAT
 
    Duration: 7617 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Interval 0 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              3  RELEASEDIR
      2.01      36.75 us      17.00 us      60.00 us             12    GETXATTR
      3.72      34.00 us      11.00 us      91.00 us             24     READDIR
      8.64      44.07 us      12.00 us      74.00 us             43      STATFS
     22.27      50.35 us       5.00 us      77.00 us             97      LOOKUP
     63.35      39.25 us      10.00 us      74.00 us            354        STAT
 
    Duration: 7617 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Brick: storage-qe14.lab.eng.rdu2.redhat.com:/bricks/testvol_brick11
-------------------------------------------------------------------
Cumulative Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              3  RELEASEDIR
      1.64      26.92 us      16.00 us      40.00 us             12    GETXATTR
      4.60      37.75 us      11.00 us     133.00 us             24     READDIR
      7.89      36.14 us      14.00 us      75.00 us             43      STATFS
     22.60      45.87 us       6.00 us      73.00 us             97      LOOKUP
     63.26      35.18 us       8.00 us      63.00 us            354        STAT
 
    Duration: 7617 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Interval 0 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              3  RELEASEDIR
      1.64      26.92 us      16.00 us      40.00 us             12    GETXATTR
      4.60      37.75 us      11.00 us     133.00 us             24     READDIR
      7.89      36.14 us      14.00 us      75.00 us             43      STATFS
     22.60      45.87 us       6.00 us      73.00 us             97      LOOKUP
     63.26      35.18 us       8.00 us      63.00 us            354        STAT
 
    Duration: 7617 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Brick: storage-qe14.lab.eng.rdu2.redhat.com:/bricks/testvol_brick3
------------------------------------------------------------------
Cumulative Stats:
   Block Size:               4096b+                8192b+               16384b+ 
 No. of Reads:                    0                     0                     0 
No. of Writes:                 8167                  3914                  1043 
 
   Block Size:              32768b+               65536b+              131072b+ 
 No. of Reads:                    0                     0                     0 
No. of Writes:                  558                   408                  3447 
 
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us            115      FORGET
      0.00       0.00 us       0.00 us       0.00 us           1036     RELEASE
      0.00       0.00 us       0.00 us       0.00 us              3  RELEASEDIR
      0.00      34.96 us      12.00 us      65.00 us             24     READDIR
      0.00      35.42 us      17.00 us      54.00 us             24    GETXATTR
      0.00      29.40 us      16.00 us      51.00 us             43      STATFS
      0.00      45.49 us      17.00 us      75.00 us             39   FTRUNCATE
      0.00      74.65 us      44.00 us      98.00 us             79      UNLINK
      0.00      50.58 us       4.00 us      90.00 us            201      LOOKUP
      0.00      35.47 us       5.00 us      75.00 us            314     ENTRYLK
      0.00      34.04 us       7.00 us      83.00 us            702       FLUSH
      0.00     354.10 us      79.00 us    9744.00 us             78      CREATE
      0.00      45.94 us      13.00 us      83.00 us            624        OPEN
      0.01      67.36 us       7.00 us   16270.00 us          16677       WRITE
      0.01      31.30 us       3.00 us   31286.00 us          40624    FINODELK
      0.03     171.17 us      15.00 us   33384.00 us          15130    FXATTROP
     99.94 1013579.63 us     169.00 us 2272143.00 us           8518       FSYNC
 
    Duration: 7617 seconds
   Data Read: 0 bytes
Data Written: 584482816 bytes
 
Interval 0 Stats:
   Block Size:               4096b+                8192b+               16384b+ 
 No. of Reads:                    0                     0                     0 
No. of Writes:                 8167                  3914                  1043 
 
   Block Size:              32768b+               65536b+              131072b+ 
 No. of Reads:                    0                     0                     0 
No. of Writes:                  558                   408                  3447 
 
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us            115      FORGET
      0.00       0.00 us       0.00 us       0.00 us           1036     RELEASE
      0.00       0.00 us       0.00 us       0.00 us              3  RELEASEDIR
      0.00      34.96 us      12.00 us      65.00 us             24     READDIR
      0.00      35.42 us      17.00 us      54.00 us             24    GETXATTR
      0.00      29.40 us      16.00 us      51.00 us             43      STATFS
      0.00      45.49 us      17.00 us      75.00 us             39   FTRUNCATE
      0.00      74.65 us      44.00 us      98.00 us             79      UNLINK
      0.00      50.58 us       4.00 us      90.00 us            201      LOOKUP
      0.00      35.47 us       5.00 us      75.00 us            314     ENTRYLK
      0.00      34.04 us       7.00 us      83.00 us            702       FLUSH
      0.00     354.10 us      79.00 us    9744.00 us             78      CREATE
      0.00      45.94 us      13.00 us      83.00 us            624        OPEN
      0.01      67.36 us       7.00 us   16270.00 us          16677       WRITE
      0.01      31.30 us       3.00 us   31286.00 us          40624    FINODELK
      0.03     171.17 us      15.00 us   33384.00 us          15130    FXATTROP
     99.94 1013579.63 us     169.00 us 2272143.00 us           8518       FSYNC
 
    Duration: 7617 seconds
   Data Read: 0 bytes
Data Written: 584482816 bytes

Note the latency on the fsync calls.

Comment 6 Amar Tumballi 2013-07-22 10:33:35 UTC
Noticed that this issue poped up in RHS2.0 U5 bits. Should we treat this blocker for Big Bend?

Comment 7 Ben England 2013-07-30 11:17:52 UTC
other people have complained about iozone -a with new RHS 2.0U5 release, it is because of extra fsyncs that were added to prevent data corruption.  The AFR protocol is introducing a level of synchronization on disk that is rather extreme.  If I understand it correctly, every 4-KB write is mirrored on disk.  Eager lock might help if it allowed fsyncs to be deferred -- I hope RHS 2.0U5 still has deferred postop patch.

Comment 8 Ben Turner 2013-07-31 14:25:33 UTC
@Amar - On the 2.1 bits iozone -a completes in about 44 minutes:

iozone test complete.

real	44m13.372s
user	0m2.969s
sys	1m35.001s

When I tested the u4 bits on the same physical HW it took:

executing iozone
start:08:32:36

real	38m31.960s
user	0m3.690s
sys	1m37.292s
end:09:11:08 

On the u5 bits I killed the task after 375 minutes:

iozone: interrupted

exiting iozone


real	375m51.173s
user	0m0.816s
sys	0m18.000s

This appears to be u5 specific as the 2.1 bits perform pretty close to the u4 bits.

Comment 9 Ben Turner 2013-07-31 19:51:34 UTC
Created attachment 781332 [details]
IOzone output

Slow iozone run.

Comment 11 Ben Turner 2013-07-31 20:07:22 UTC
Created attachment 781346 [details]
Good iozone run from 2.1 bits

Comment 12 Amar Tumballi 2013-08-01 10:37:33 UTC
considering comment #10, should we nack this bug for 'rhs-2.1.0' flag?

Comment 13 Nagaprasad Sathyanarayana 2013-08-05 08:22:59 UTC
The same issue is also fixed in Big Bend.  glusterfs-3.4.0.15rhs-1.el6rhs has those changes. Can you please try it?

Comment 16 Ben England 2013-08-06 22:08:22 UTC
re cmt 11: the iozone run here looks like it runs at line speed, but I don't understand why it does.  Were any volume parameters set?  Also, write behind translator is on by default is it not?   This would convert 4-KB writes into 128 KB writes at protocol level.  If not, is this big bend with eager lock on as default?  If so, then that might explain it.   A gluster volume profile output would show if either or both were happening.  With Anshi RHS 2.0U5 gluster volume profile output in comment 3, it appeared eager-lock was not on.

Comment 17 Nagaprasad Sathyanarayana 2013-08-16 04:31:10 UTC
Below mail from Ben England says that issue is not with Big Bend.  Hence marking this bug only for 2.0.z.


The bz is about RHS 2.0.  the comment 11 contained data from an iozone run with RHS 2.1.   I was trying to help people figure out why it was so slow in RHS 2.0 but not in RHS 2.1  and suggested a couple of possible explanations - no write-behind translator and no eager-lock.  Without these things, RHS 2.0U5 glusterfsd could be doing an fsync on the file as often as every 4-KB.   

There is a patch from Avati to gluster 3.4 that eliminates fsyncs on files that are append-only, but this patch cannot be backported to RHS 2.0U5 because it has a dependency on RHEL6.4 XFS ability to maintain inode size correctly in event of a crash.  So for RHS 2.0 you can tune the system to lessen the problem, but you might not be able to make RHS 2.0U5 write performance regression go away completely.   This should be confirmed in the context of a 1-GbE network link, which I don't typically test with.

Comment 18 Nagaprasad Sathyanarayana 2013-08-16 05:05:54 UTC
Ben England has confirmed that he does not see the reported issue in 2.1.  If you want to re-verify, please do so.  Otherwise, please mark it to Verified.

Thanks

Comment 19 Ben Turner 2013-08-19 15:10:33 UTC
I think that this needs to be put in release notes.  I am marking verified on the idea that this bug is resolved in 2.1 and will pursue getting this into release notes in a new BZ/other avenues.

Comment 20 Scott Haines 2013-09-23 22:35:43 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html


Note You need to log in before you can comment on or make changes to this bug.