Bug 764601 (GLUSTER-2869) - Threads waiting on file open
Summary: Threads waiting on file open
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-2869
Product: GlusterFS
Classification: Community
Component: core
Version: 3.1.3
Hardware: x86_64
OS: Linux
Priority: high
Severity: medium
Target Milestone: ---
Assignee: GlusterFS Bugs list
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-05-02 18:00 UTC by mohitanchlia
Modified: 2014-09-30 12:18 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-09-30 12:18:42 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Attachments
A bash script that does multithreaded file creation (1.58 KB, text/plain), attached 2011-09-30 04:12 UTC by tcp

Description mohitanchlia 2011-05-02 18:00:03 UTC
Using the Gluster native client, running tests in which multiple threads (50) create files in parallel to measure performance. It puzzled me that the writes are very fast, yet we still get only 20 MB/s throughput, whereas "dd" to the disks with bs=128K gives around 130 MB/s on a 4 x 2 configuration.

So I looked deeper and generated a stack trace. In the stack trace I see that most of the threads are waiting on file open/creation. I then started recording latency, which shows that while the writes themselves are fast, most of the time is spent in file open/creation. It ranges from 60 ms to 8 seconds. Is this FUSE? Is this Gluster? I don't know, but it sounds like a problem.

"pool-1-thread-25" prio=10 tid=0x00002aaab4085800 nid=0x14ae runnable [0x000000004417f000]
   java.lang.Thread.State: RUNNABLE
        at java.io.FileOutputStream.open(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:179)
"pool-1-thread-24" prio=10 tid=0x00002aaab4083800 nid=0x14ad runnable [0x000000004407e000]
   java.lang.Thread.State: RUNNABLE
        at java.io.FileOutputStream.open(Native Method)


Latency:

Sample latency in ms to open/create a file:

8124
7385
7338
7335
7334
7333
7193
7176
7175
7162
7161
7148
7142
7114
7087
7083
7012
7008
6637
6608
6549
6525
6524
6522
6522
6315
6310
6274
6201
6196
6063
6012
5971
5970
5950
5905
5887
5884
5883
5879
5847
5845
5840
5830
5811
5777
5774
5773
5762
5743
5695
5515
5505
5497
5487
5483
5453
5404
5399
5341
--More--
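To make the reproduction scenario concrete, here is a minimal bash sketch of the kind of workload described above: 50 parallel writers, each recording the create latency of every file. This is not the reporter's Java client; the mount point, file counts, and 128K writes are placeholders chosen for illustration.

#!/bin/bash
# Hedged sketch of the workload described above: $THREADS writers creating
# files in parallel and recording the create latency of each one.
MOUNT=/mnt/gluster/test          # placeholder for the GlusterFS mount
THREADS=50
FILES_PER_THREAD=200

mkdir -p "$MOUNT"

worker() {
    local id=$1 i f start end
    for i in $(seq 1 "$FILES_PER_THREAD"); do
        f="$MOUNT/t${id}_f${i}"
        start=$(date +%s%N)
        : > "$f"                                  # create/open the file
        end=$(date +%s%N)
        echo $(( (end - start) / 1000000 ))       # create latency in ms
        dd if=/dev/zero of="$f" bs=128K count=2 conv=notrunc 2>/dev/null
    done
}

for t in $(seq 1 "$THREADS"); do
    worker "$t" > "create_lat_${t}.txt" &
done
wait

Sorting the per-thread latency files (sort -n -r create_lat_*.txt | head) gives a list directly comparable to the one above.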

Comment 1 Anand Avati 2011-05-03 01:57:54 UTC
Are the 50 threads working on a single glusterfs/fuse mount point?

Comment 2 mohitanchlia 2011-05-03 13:21:37 UTC
(In reply to comment #1)
> Are the 50 threads working on a single glusterfs/fuse mount point?

Yes. I have also tried spreading the clients, 25 threads each, across 2 different hosts and 2 different mount points on the same volume. Still the same results.

Comment 3 mohitanchlia 2011-05-06 13:10:51 UTC
(In reply to comment #2)
> (In reply to comment #1)
> > Are the 50 threads working on a single glusterfs/fuse mount point?
> Yes. I have tried spreading clients 25 threads each on 2 different hosts and 2
> different mount points but same volume. Still same results

I can also give you the entire client application that I use, if that will make it easier to reproduce.

Comment 4 mohitanchlia 2011-05-24 14:28:44 UTC
(In reply to comment #3)
> (In reply to comment #2)
> > (In reply to comment #1)
> > > Are the 50 threads working on a single glusterfs/fuse mount point?
> > Yes. I have tried spreading clients 25 threads each on 2 different hosts and 2
> > different mount points but same volume. Still same results
> I can also give you the entire client application that I use if it will make it
> easy to reproduce

Haven't seen any updates. I got a chance to talk to Avati on IRC and tried io-threads on the client side, which didn't help. Avati alluded to too much context switching. How do we fix this issue?
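For context, "io-threads on the client side" means loading the io-threads translator into the client volume graph. In later GlusterFS releases this is exposed as a volume option; whether the exact option below exists on 3.1.x is an assumption, and in any case the reporter notes it did not help here.

# Assumed syntax from later GlusterFS releases; may not apply to 3.1.x.
gluster volume set <VOLNAME> performance.client-io-threads on
# Depending on the release, clients may pick up the regenerated graph
# automatically or may need to remount.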

Comment 5 tcp 2011-05-25 03:24:39 UTC
Please provide the application code. Also, give the compilation instructions.

I need more information on your setup:

* What is your machine configuration in terms of CPU make/speed, and memory?
(I need this information for both the client and the brick machines)
* What is your network fabric? (1GigE? 10GigE? IB?)

Since you say that you can reproduce it consistently, can you run your program as suggested below?

On the client:
# strace -F -f -c java <your java class file>

On the bricks, before you start your program, get the pid of the glusterfs process and run:

# strace -f -c -p <glusterfs.pid>
[terminate strace once your program on the client has terminated]

and provide the strace output?

In addition, can you give me the output of the following on the client? 
# vmstat 1

Note: vmstat will run endlessly. So you will need to run it in another terminal before you start your program and terminate it after your program ends.

Pavan
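A hedged sketch of how the client-side steps above could be run from one terminal; the output file names and the backgrounding of vmstat are my additions, and YourTestClass is a placeholder for the reporter's Java client.

#!/bin/bash
# Client side: capture vmstat for the duration of the run, plus an strace -c
# summary of the Java test itself.
vmstat 1 > vmstat.client.out &
VMSTAT_PID=$!

# Blocks until the test finishes; the syscall summary lands in the -o file.
strace -F -f -c -o strace.client.java.out java YourTestClass

kill "$VMSTAT_PID"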

Comment 6 mohitanchlia 2011-06-03 20:01:00 UTC
(In reply to comment #5)

Is this the glusterfs or the glusterfsd process? Would strace of both not be useful? I thought glusterfs helps mount the volumes and glusterfsd does the actual work on the brick.

Comment 7 tcp 2011-06-04 02:20:45 UTC
On the bricks, I meant <glusterfsd.pid>. If you are using NFS, then it would be <glusterfs.pid>.

One more addition on the client: please also provide the strace output (with the same flags as for the brick) for the client glusterfs process.
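Putting the two comments together, a hedged sketch for a brick host; it assumes a single glusterfsd process per host and that pgrep is available, neither of which is stated in the thread.

#!/bin/bash
# On a brick host: attach to glusterfsd for the duration of the client test.
BRICK_PID=$(pgrep -x glusterfsd | head -1)     # assumes one brick process here
strace -f -c -p "$BRICK_PID" -o strace.brick.out &
STRACE_PID=$!

# ... run the Java test on the client now ...
read -r -p "Press Enter once the client program has finished"

kill -INT "$STRACE_PID"    # SIGINT detaches strace and flushes the -c summary

The same two strace lines, with glusterfs in place of glusterfsd, cover the client-side FUSE process asked for above.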

Comment 8 mohitanchlia 2011-06-07 20:17:22 UTC
(In reply to comment #7)
> On the bricks, I meant <glusterfsd.pid>. If you are using NFS, then it would be
> <glusterfs.pid>
> One more addition on the client. Please provide the strace output (with the
> same flags as the ones for the brick) for the client glusterfs process as well.

For some reason I can't get strace output from the Java client. I have this so far:

configuration:

1 GigE, 12 core, 98 GB RAM.

client:

[root@dsdb1 gluster]# more strace.client.110607.out
Process 26934 attached with 4 threads - interrupt to quit
Process 26934 detached
Process 26935 detached
Process 26937 detached
Process 26939 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 69.09  201.986677      226951       890           nanosleep
 28.70   83.894846           9   8980557           readv
  1.78    5.211715           1   4089579     14825 writev
  0.32    0.944466           1    824349    101405 futex
  0.05    0.153399           0    644674           epoll_wait
  0.02    0.069988       69988         1           restart_syscall
  0.01    0.043690           0     97423           read
  0.01    0.036499         159       230           munmap
  0.00    0.003112           0     17070           epoll_ctl
  0.00    0.002187           1      4308           mprotect
  0.00    0.000949           3       344           madvise
  0.00    0.000127           1       228           mmap
  0.00    0.000116           0       822           brk
------ ----------- ----------- --------- --------- ----------------
100.00  292.347771              14660475    116230 total


server brick:


[root@dsdb2 gluster]# more strace.brick.110607.out
Process 27175 attached with 7 threads - interrupt to quit
Process 23995 attached (waiting for parent)
Process 23995 resumed (parent 27175 ready)
Process 23927 detached
Process 23995 detached
Process 23926 detached
Process 27175 detached
Process 27176 detached
Process 27178 detached
Process 27185 detached
Process 23558 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 76.34  305.858111         185   1654159    241935 futex
 19.25   77.124746       95926       804           nanosleep
  3.76   15.067313          24    631034           epoll_wait
  0.23    0.921791          15     62135           pwrite
  0.13    0.539514           0   2452419    101003 readv
  0.07    0.261733          12     21605           open
  0.05    0.213151           0    530031           writev
  0.04    0.170313           0    424248    147016 lstat
  0.04    0.150028           1    134844           fsetxattr
  0.03    0.133101           0    633763    281044 lgetxattr
  0.01    0.057955           0    280191     42158 fgetxattr
  0.01    0.049992       49992         1           restart_syscall
  0.01    0.041818           0    305093           clock_gettime
  0.01    0.022012           0    145875           fstat
  0.00    0.008942           0     21077           lsetxattr
  0.00    0.008443           0     21620           close
  0.00    0.007305           0     21077           chown
  0.00    0.003636           1      2640           read
  0.00    0.002724           0     10032           stat
  0.00    0.000991           2       528           munmap
  0.00    0.000256           2       125           mprotect
  0.00    0.000189           0       528           statfs
  0.00    0.000138           0      1056           lseek
  0.00    0.000056           0       529           mmap
  0.00    0.000000           0         6           brk
  0.00    0.000000           0         1           clone
  0.00    0.000000           0         1           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00  400.644258               7355422    813156 total

vmstat from the server (columns: r b swpd free buff cache si so bi bo in cs us sy id wa st):

 cat vmstat110607.out|grep [0-9]|tr -s " "|sort -k15 -n|head -40
 0 1 0 256256 884944 93792128 0 0 0 5668 4153 98052 1 2 93 4 0
 0 1 0 261244 884196 93789424 0 0 0 11880 4576 95809 1 2 93 4 0
 1 1 0 244996 892584 93808416 0 0 0 7060 4403 104565 1 2 93 4 0
 1 2 0 263612 885672 93786088 0 0 0 43968 3666 92071 1 2 93 4 0
 2 1 0 252460 891748 93802208 0 0 0 17632 4248 100031 1 2 93 4 0
 0 1 0 241916 884920 93807304 0 0 0 43388 4471 111109 1 2 94 3 0
 0 1 0 250684 889464 93801312 0 0 0 52924 3598 88938 1 1 94 4 0
 0 1 0 254976 886424 93795712 0 0 0 5768 4196 94476 1 2 94 3 0
 0 1 0 257360 888656 93794080 0 0 0 46740 4238 101809 1 2 94 4 0
 0 1 0 260700 890248 93791784 0 0 0 49140 3992 96760 1 2 94 4 0
 0 1 0 260864 887992 93790720 0 0 0 46264 3965 101462 1 2 94 3 0
 0 2 0 259712 887192 93791168 0 0 0 16920 4097 97856 1 2 94 4 0
 1 1 0 244384 890964 93808336 0 0 0 19632 3755 84599 1 1 94 4 0
 1 1 0 247644 887184 93802192 0 0 0 70816 3959 100446 1 2 94 3 0
 1 1 0 256684 885688 93794016 0 0 0 6980 4132 90356 1 2 94 4 0
 1 1 0 263612 886420 93786816 0 0 0 46252 3765 90134 1 2 94 3 0
 2 1 0 246236 888000 93802736 0 0 0 9240 5062 109899 1 2 94 4 0
 0 0 0 260216 882704 93786368 0 0 0 10956 4403 7588 0 0 95 4 0
 0 1 0 241576 889476 93810456 0 0 0 6772 4168 89451 1 2 95 3 0
 0 1 0 248352 884176 93800896 0 0 0 27600 3505 87239 1 2 95 2 0
 0 1 0 258128 883548 93790272 0 0 0 65868 3738 96707 1 2 95 2 0
 0 1 0 262820 882880 93782624 0 0 1228 1160 3994 7619 1 0 95 3 0
 0 0 0 242152 884572 93806736 0 0 0 43692 3715 98202 1 2 96 1 0
 0 0 0 242544 892704 93811872 0 0 0 13804 3873 101296 1 2 96 1 0
 0 0 0 243636 885036 93805064 0 0 0 46868 3734 96862 1 2 96 1 0
 0 0 0 245940 888296 93804592 0 0 0 46952 3649 92151 1 2 96 1 0
 0 0 0 246396 888400 93803856 0 0 0 45336 3831 95486 1 2 96 1 0
 0 0 0 250704 885872 93797928 0 0 0 37816 4033 107104 1 2 96 1 0
 0 0 0 250912 887308 93800992 0 0 0 48716 3952 104777 1 2 96 1 0
 0 0 0 251184 889336 93802032 0 0 0 45524 4134 106283 1 2 96 1 0
 0 0 0 251376 887416 93797856 0 0 0 45644 4244 111249 1 2 96 1 0
 0 0 0 255876 883664 93793392 0 0 0 41332 3961 101064 1 2 96 1 0
 0 0 0 256448 890136 93795456 0 0 0 43628 3869 96257 1 2 96 1 0
 0 0 0 257012 889220 93796688 0 0 0 48528 4070 105122 1 1 96 2 0
 0 0 0 258560 885352 93791856 0 0 0 42500 3888 95288 1 2 96 1 0
 0 0 0 262552 883356 93786104 0 0 0 43984 3770 97145 1 2 96 1 0
 0 0 0 263480 890480 93789760 0 0 0 28388 3770 97069 1 2 96 1 0
 0 1 0 251232 888672 93799232 0 0 0 4512 3622 79740 0 1 96 2 0
 0 2 0 243948 894416 93808048 0 0 0 47984 3941 102996 1 2 96 1 0
 0 2 0 251240 893552 93802032 0 0 0 44980 4427 118441 1 2 96 1 0

sar -u from the client (columns: %user %nice %system %iowait %steal %idle):

02:40:01 PM       all      1.50      0.01      0.95      0.72      0.00     96.81
02:50:01 PM       all      0.01      0.00      0.00      0.02      0.00     99.96
03:00:01 PM       all      0.01      0.00      0.00      0.02      0.00     99.97
03:10:01 PM       all      0.01      0.00      0.00      0.02      0.00     99.97
03:20:01 PM       all      0.01      0.00      0.00      0.02      0.00     99.97
03:30:01 PM       all      0.01      0.00      0.00      0.02      0.00     99.97
03:40:01 PM       all      0.01      0.01      0.00      0.02      0.00     99.95
03:50:01 PM       all      0.01      0.00      0.00      0.02      0.00     99.97
04:00:01 PM       all      2.39      0.00      3.79      0.31      0.00     93.52
04:10:01 PM       all      0.39      0.00      0.61      0.08      0.00     98.92
Average:          all      0.08      0.00      0.08      0.04      0.00     99.80
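As a side note on Avati's context-switching remark in comment 4: the cs column of the vmstat capture above (field 12 in these rows) hovers around 90,000-110,000 switches per second. A small sketch to summarise it, assuming the same vmstat110607.out file name:

# Average and peak context switches/sec from the vmstat capture (cs = field 12).
grep '[0-9]' vmstat110607.out | awk '$12 ~ /^[0-9]+$/ { sum += $12; n++; if ($12 > max) max = $12 }
    END { if (n) printf "avg cs/s = %d   max cs/s = %d\n", sum / n, max }'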

Comment 9 mohitanchlia 2011-06-20 17:24:02 UTC
(In reply to comment #8)

Has anyone been able to look at this data?

Comment 10 mohitanchlia 2011-07-27 17:25:03 UTC
any updates?

Comment 11 tcp 2011-07-28 03:13:46 UTC
(In reply to comment #10)
> any updates?

Will take a look at it and will respond in a couple of days.

Comment 12 tcp 2011-08-01 05:39:14 UTC
(In reply to comment #10)
> any updates?

I thought you had provided your program and instructions on how to compile/execute it. I don't see that in the bug updates. Can you please provide it? It will help recreate the problem faster.

Pavan

Comment 13 mohitanchlia 2011-08-23 15:27:35 UTC
(In reply to comment #12)
> (In reply to comment #10)
> > any updates?
> I thought you had provided your program and instructions on how to
> compile/execute it. I don't see that in the bug updates. Can you please provide
> the same. It will help recreating the problem faster.
> Pavan

It's easy to reproduce. Just write files between 70 KB and 400 KB, with random sizes, using multiple threads. I don't think there's anything special in the code; it just creates files.

Comment 14 tcp 2011-09-30 04:12:15 UTC
Created attachment 681: a bash script that does multithreaded file creation.

Comment 15 tcp 2011-09-30 04:14:36 UTC
(In reply to comment #13)

Hi Mohit,

Please find attached the script that I used to simulate your problem. I did not find serious issues with it. I strace'd the script to see what kind of latencies open() was suffering. I did not find that to be too large. Let me know if you hit issues on your system with the same script.

I ran it as below:
# strace -f -tt -T -o log -e trace=open ./parallel_create 50 10000

It creates 50 threads which cumulatively write 10000 files.

Measuring open latencies:
grep -o " <[0-9]\.[0-9]*>$" log | sed -e 's/^ <//' -e 's/>$//' | sort -n -r | less

The maximum latency that I hit was about 0.6 seconds. I'll investigate to see if there are any problems here.

Pavan
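Building on the grep/sed pipeline above, a sketch that reduces the same strace log to a few summary numbers; it assumes the "-o log" file name from the strace command above and the " <seconds>" suffix produced by -T.

# Max, 99th percentile and median open() latency (seconds) from the -T log.
grep -o ' <[0-9]*\.[0-9]*>$' log | tr -d ' <>' | sort -n | awk '
    { v[NR] = $1 }
    END {
        if (NR == 0) exit
        p99 = int(NR * 0.99); if (p99 < 1) p99 = 1
        p50 = int(NR * 0.50); if (p50 < 1) p50 = 1
        printf "max=%s  p99=%s  p50=%s\n", v[NR], v[p99], v[p50]
    }'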

Comment 16 mohitanchlia 2011-09-30 10:58:38 UTC
(In reply to comment #15)

0.6 is HUGE. 600 ms for a file open is HUGE, and I have never seen that before. I'm not sure why you would think it's OK. So does this mean you are able to see the issue?

Comment 17 tcp 2011-09-30 11:54:00 UTC
(In reply to comment #16)

Well, you could call that HUGE, but let me reveal other aspects. When I issued 200 threads using this script, I got a delay of 1+ seconds; the client side was getting overloaded. Clearly, with more and more threads I can cause more and more delay. Using /dev/urandom created a heavy load on the CPU, and two CPUs (out of 4 on my client system) were almost 0% idle. With this data, I am not sure this is entirely a GlusterFS problem. It will help if you can confirm you are seeing similar latencies on your setup with this script. That will bring us to common ground.

Another question for you: when you say you are just writing the files, what is your source of data? In my case it is /dev/urandom, and I see this has some effect on the system.

As mentioned earlier, I will look into the matter.
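One way to separate the /dev/urandom cost from the GlusterFS cost on a rerun would be to generate the random payload once, locally, and copy from that file inside the writer loop. This is a suggestion of mine, not something tried in the thread; MOUNT and the loop count are placeholders.

#!/bin/bash
# Pay the /dev/urandom cost once, outside the timed path.
MOUNT=/mnt/gluster/test
dd if=/dev/urandom of=/tmp/seed.dat bs=1K count=400 2>/dev/null

for i in $(seq 1 100); do
    size_kb=$(( (RANDOM % 331) + 70 ))      # 70K-400K, as in comment 13
    dd if=/tmp/seed.dat of="$MOUNT/seedtest_$i" bs=1K count="$size_kb" 2>/dev/null
done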

Comment 18 mohitanchlia 2011-09-30 13:28:32 UTC
(In reply to comment #17)

I used only 50 threads and the client CPU was mostly idle. As you can see from my traces, it is doing a lot of context switching, and most of the client process threads are waiting on file open, as you can see from my first post. I am just writing to a native GlusterFS mount point. If a file open takes 60 ms, I'd say that's a big issue and not really an acceptable response time for a file open.

Comment 19 Amar Tumballi 2011-10-13 07:32:00 UTC
Avati,

Assigning it to you for now.

Comment 20 Kaleb KEITHLEY 2014-09-30 12:18:42 UTC
Very old version, apparently no longer an issue in current versions.

