| Field | Value |
|---|---|
| Summary | Threads waiting on file open |
| Product | [Community] GlusterFS |
| Component | core |
| Version | 3.1.3 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED CURRENTRELEASE |
| Severity | medium |
| Priority | high |
| Reporter | mohitanchlia |
| Assignee | GlusterFS Bugs list <gluster-bugs> |
| CC | gluster-bugs, jdarcy, kkeithle, vijay |
| Doc Type | Bug Fix |
| Last Closed | 2014-09-30 12:18:42 UTC |
Are the 50 threads working on a single glusterfs/fuse mount point?

(In reply to comment #1)
> Are the 50 threads working on a single glusterfs/fuse mount point?

Yes. I have also tried spreading the clients as 25 threads each on 2 different hosts and 2 different mount points, but on the same volume. Still the same results.

(In reply to comment #2)
I can also give you the entire client application that I use, if it will make it easier to reproduce.

(In reply to comment #3)
I haven't seen any updates. I got a chance to talk to Avati on IRC and tried io-threads on the client side, which didn't help. Avati alluded to too much context switching. How do we fix this issue?

Please provide the application code, along with compilation instructions. I also need more information on your setup:

* What is your machine configuration in terms of CPU make/speed and memory? (I need this information for both the client and the brick machines.)
* What is your network fabric? (1GigE? 10GigE? IB?)

Since you say that you can reproduce it consistently, can you run your program as suggested below?

On the client:

# strace -F -f -c java <your java class file>

On the bricks, before you start your program, get the pid of the glusterfs process and run:

# strace -f -c -p <glusterfs.pid>

[terminate strace once your program on the client has terminated]

and provide the strace output?

In addition, can you give me the output of the following on the client?

# vmstat 1

Note: vmstat will run endlessly, so run it in another terminal before you start your program and terminate it after your program ends.

Pavan

(In reply to comment #5)
Is this the glusterfs or the glusterfsd process? Isn't an strace of both going to be useful? I thought glusterfs helps mount the volumes and glusterfsd does the actual work on the brick.

On the bricks, I meant <glusterfsd.pid>. If you are using NFS, then it would be <glusterfs.pid>.

One more addition on the client: please provide the strace output (with the same flags as the ones for the brick) for the client glusterfs process as well.
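For reference, the capture steps above can be folded into a small helper — a sketch only; the pgrep-based PID lookup, the output file names, and running vmstat on the same host as strace are assumptions, not part of the original request:

```bash
#!/bin/bash
# Sketch of the capture steps requested above: attach strace to the brick's
# glusterfsd (or, for NFS, to glusterfs) and sample vmstat until interrupted.
PROC=${1:-glusterfsd}                      # process name to trace
PID=$(pgrep -o "$PROC") || { echo "no $PROC process found" >&2; exit 1; }

vmstat 1 > "vmstat.$PROC.out" &            # background sampler
VMSTAT_PID=$!

# -c writes a per-syscall summary into the -o file when strace detaches
# (press Ctrl-C once the client workload has finished).
strace -f -c -p "$PID" -o "strace.$PROC.out"

kill "$VMSTAT_PID" 2>/dev/null
```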
(In reply to comment #7)
For some reason I can't get strace output from the java client. Here is what I have so far.

Configuration: 1 GigE, 12 cores, 98 GB RAM.

Client:

[root@dsdb1 gluster]# more strace.client.110607.out
Process 26934 attached with 4 threads - interrupt to quit
Process 26934 detached
Process 26935 detached
Process 26937 detached
Process 26939 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 69.09  201.986677      226951       890           nanosleep
 28.70   83.894846           9   8980557           readv
  1.78    5.211715           1   4089579     14825 writev
  0.32    0.944466           1    824349    101405 futex
  0.05    0.153399           0    644674           epoll_wait
  0.02    0.069988       69988         1           restart_syscall
  0.01    0.043690           0     97423           read
  0.01    0.036499         159       230           munmap
  0.00    0.003112           0     17070           epoll_ctl
  0.00    0.002187           1      4308           mprotect
  0.00    0.000949           3       344           madvise
  0.00    0.000127           1       228           mmap
  0.00    0.000116           0       822           brk
------ ----------- ----------- --------- --------- ----------------
100.00  292.347771              14660475    116230 total

Server brick:

[root@dsdb2 gluster]# more strace.brick.110607.out
Process 27175 attached with 7 threads - interrupt to quit
Process 23995 attached (waiting for parent)
Process 23995 resumed (parent 27175 ready)
Process 23927 detached
Process 23995 detached
Process 23926 detached
Process 27175 detached
Process 27176 detached
Process 27178 detached
Process 27185 detached
Process 23558 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 76.34  305.858111         185   1654159    241935 futex
 19.25   77.124746       95926       804           nanosleep
  3.76   15.067313          24    631034           epoll_wait
  0.23    0.921791          15     62135           pwrite
  0.13    0.539514           0   2452419    101003 readv
  0.07    0.261733          12     21605           open
  0.05    0.213151           0    530031           writev
  0.04    0.170313           0    424248    147016 lstat
  0.04    0.150028           1    134844           fsetxattr
  0.03    0.133101           0    633763    281044 lgetxattr
  0.01    0.057955           0    280191     42158 fgetxattr
  0.01    0.049992       49992         1           restart_syscall
  0.01    0.041818           0    305093           clock_gettime
  0.01    0.022012           0    145875           fstat
  0.00    0.008942           0     21077           lsetxattr
  0.00    0.008443           0     21620           close
  0.00    0.007305           0     21077           chown
  0.00    0.003636           1      2640           read
  0.00    0.002724           0     10032           stat
  0.00    0.000991           2       528           munmap
  0.00    0.000256           2       125           mprotect
  0.00    0.000189           0       528           statfs
  0.00    0.000138           0      1056           lseek
  0.00    0.000056           0       529           mmap
  0.00    0.000000           0         6           brk
  0.00    0.000000           0         1           clone
  0.00    0.000000           0         1           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00  400.644258             7355422    813156 total

vmstat from the server:

cat vmstat110607.out|grep [0-9]|tr -s " "|sort -k15 -n|head -40
0 1 0 256256 884944 93792128 0 0 0 5668 4153 98052 1 2 93 4 0
0 1 0 261244 884196 93789424 0 0 0 11880 4576 95809 1 2 93 4 0
1 1 0 244996 892584 93808416 0 0 0 7060 4403 104565 1 2 93 4 0
1 2 0 263612 885672 93786088 0 0 0 43968 3666 92071 1 2 93 4 0
2 1 0 252460 891748 93802208 0 0 0 17632 4248 100031 1 2 93 4 0
0 1 0 241916 884920 93807304 0 0 0 43388 4471 111109 1 2 94 3 0
0 1 0 250684 889464 93801312 0 0 0 52924 3598 88938 1 1 94 4 0
0 1 0 254976 886424 93795712 0 0 0 5768 4196 94476 1 2 94 3 0
0 1 0 257360 888656 93794080 0 0 0 46740 4238 101809 1 2 94 4 0
0 1 0 260700 890248 93791784 0 0 0 49140 3992 96760 1 2 94 4 0
0 1 0 260864 887992 93790720 0 0 0 46264 3965 101462 1 2 94 3 0
0 2 0 259712 887192 93791168 0 0 0 16920 4097 97856 1 2 94 4 0
1 1 0 244384 890964 93808336 0 0 0 19632 3755 84599 1 1 94 4 0
1 1 0 247644 887184 93802192 0 0 0 70816 3959 100446 1 2 94 3 0
1 1 0 256684 885688 93794016 0 0 0 6980 4132 90356 1 2 94 4 0
1 1 0 263612 886420 93786816 0 0 0 46252 3765 90134 1 2 94 3 0
2 1 0 246236 888000 93802736 0 0 0 9240 5062 109899 1 2 94 4 0
0 0 0 260216 882704 93786368 0 0 0 10956 4403 7588 0 0 95 4 0
0 1 0 241576 889476 93810456 0 0 0 6772 4168 89451 1 2 95 3 0
0 1 0 248352 884176 93800896 0 0 0 27600 3505 87239 1 2 95 2 0
0 1 0 258128 883548 93790272 0 0 0 65868 3738 96707 1 2 95 2 0
0 1 0 262820 882880 93782624 0 0 1228 1160 3994 7619 1 0 95 3 0
0 0 0 242152 884572 93806736 0 0 0 43692 3715 98202 1 2 96 1 0
0 0 0 242544 892704 93811872 0 0 0 13804 3873 101296 1 2 96 1 0
0 0 0 243636 885036 93805064 0 0 0 46868 3734 96862 1 2 96 1 0
0 0 0 245940 888296 93804592 0 0 0 46952 3649 92151 1 2 96 1 0
0 0 0 246396 888400 93803856 0 0 0 45336 3831 95486 1 2 96 1 0
0 0 0 250704 885872 93797928 0 0 0 37816 4033 107104 1 2 96 1 0
0 0 0 250912 887308 93800992 0 0 0 48716 3952 104777 1 2 96 1 0
0 0 0 251184 889336 93802032 0 0 0 45524 4134 106283 1 2 96 1 0
0 0 0 251376 887416 93797856 0 0 0 45644 4244 111249 1 2 96 1 0
0 0 0 255876 883664 93793392 0 0 0 41332 3961 101064 1 2 96 1 0
0 0 0 256448 890136 93795456 0 0 0 43628 3869 96257 1 2 96 1 0
0 0 0 257012 889220 93796688 0 0 0 48528 4070 105122 1 1 96 2 0
0 0 0 258560 885352 93791856 0 0 0 42500 3888 95288 1 2 96 1 0
0 0 0 262552 883356 93786104 0 0 0 43984 3770 97145 1 2 96 1 0
0 0 0 263480 890480 93789760 0 0 0 28388 3770 97069 1 2 96 1 0
0 1 0 251232 888672 93799232 0 0 0 4512 3622 79740 0 1 96 2 0
0 2 0 243948 894416 93808048 0 0 0 47984 3941 102996 1 2 96 1 0
0 2 0 251240 893552 93802032 0 0 0 44980 4427 118441 1 2 96 1 0

sar -u from the client:

02:40:01 PM  all  1.50  0.01  0.95  0.72  0.00  96.81
02:50:01 PM  all  0.01  0.00  0.00  0.02  0.00  99.96
03:00:01 PM  all  0.01  0.00  0.00  0.02  0.00  99.97
03:10:01 PM  all  0.01  0.00  0.00  0.02  0.00  99.97
03:20:01 PM  all  0.01  0.00  0.00  0.02  0.00  99.97
03:30:01 PM  all  0.01  0.00  0.00  0.02  0.00  99.97
03:40:01 PM  all  0.01  0.01  0.00  0.02  0.00  99.95
03:50:01 PM  all  0.01  0.00  0.00  0.02  0.00  99.97
04:00:01 PM  all  2.39  0.00  3.79  0.31  0.00  93.52
04:10:01 PM  all  0.39  0.00  0.61  0.08  0.00  98.92
Average:     all  0.08  0.00  0.08  0.04  0.00  99.80
(In reply to comment #8)
Has anyone been able to look at this data?

any updates?

(In reply to comment #10)
> any updates?

Will take a look at it and will respond in a couple of days.

(In reply to comment #10)
> any updates?

I thought you had provided your program and instructions on how to compile/execute it. I don't see that in the bug updates. Can you please provide the same? It will help in recreating the problem faster.

Pavan

(In reply to comment #12)
It's easy to reproduce. Just write files between 70K and 400K randomly, using multiple threads. I don't think it's anything special in the code. It just creates files.
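As a point of reference, a minimal shell sketch of the kind of reproducer described here (this is not the attached script; the mount point, worker count, and file count are placeholders):

```bash
#!/bin/bash
# Parallel small-file creator: WORKERS background jobs, FILES files each,
# sizes drawn from roughly 70K-400K.
MOUNT=${1:-/mnt/gluster}     # placeholder: glusterfs FUSE mount point
WORKERS=${2:-50}
FILES=${3:-200}

worker() {
    local id=$1 i kb
    for ((i = 0; i < FILES; i++)); do
        kb=$((70 + RANDOM % 331))        # 70..400 KiB
        dd if=/dev/zero of="$MOUNT/w${id}_f${i}" bs=1K count="$kb" 2>/dev/null
    done
}

for ((w = 0; w < WORKERS; w++)); do
    worker "$w" &
done
wait
```

Wrapping such a run in an strace -f -tt -T -e trace=open invocation records the per-open latencies in question.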
Created attachment 681

(In reply to comment #13)
Hi Mohit,

Please find attached the script that I used to simulate your problem. I did not find serious issues with it. I straced the script to see what kind of latencies open() was suffering, and I did not find them to be too large. Let me know if you hit issues on your system with the same script.

I ran it as below:

# strace -f -tt -T -o log -e trace=open ./parallel_create 50 10000

It creates 50 threads which cumulatively write 10000 files.

Measuring open latencies:

grep -o " <[0-9]\.[0-9]*>$" log | sed -e 's/^ <//' -e 's/>$//' | sort -n -r | less

The maximum latency that I hit was about 0.6 seconds. I'll investigate to see if there are any problems here.

Pavan
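Assuming the same strace -T log format, a variant of that extraction which also counts how many open() calls crossed a latency threshold (the 0.1 s cutoff is arbitrary):

```bash
# Pull the trailing "<seconds>" field that strace -T appends to each open(),
# show the ten worst latencies, and count how many exceeded 100 ms.
grep -o ' <[0-9]*\.[0-9]*>$' log | tr -d ' <>' |
    sort -rn |
    awk 'NR <= 10 { print "slow open:", $1, "s" }
         $1 > 0.1 { over++ }
         END      { print "opens over 100ms:", over + 0 }'
```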
(In reply to comment #15)
0.6 seconds is HUGE. 600 ms for a file open is huge, and I have never seen that before. Not sure why you would think it's OK. So it means you are able to see the issue?

(In reply to comment #16)
Well, you could call that huge, but let me reveal other aspects. When I issued 200 threads using this script, I got a delay of 1+ seconds; the client side was getting overloaded. Clearly, with more and more threads I can cause more and more delay. Using /dev/urandom created a heavy load on the CPU, and two CPUs (out of 4 on my client system) were at almost 0% idle. With this data, I am not sure this is entirely a GlusterFS problem. It will help if you can confirm you are seeing similar latencies on your setup with this script; that will bring us to a common ground.

Another question for you: when you say you are just writing the files, what is your source of data? In my case it is /dev/urandom, and I see this has some effect on the system.

As mentioned earlier, I will look into the matter.
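One quick way to separate the cost of generating the data from the cost of writing it through GlusterFS is to compare the two sources directly on the client; the sizes below are arbitrary:

```bash
# Is the data source itself the bottleneck?  /dev/urandom is CPU-bound,
# /dev/zero is essentially free -- no GlusterFS involved in either case.
dd if=/dev/urandom of=/dev/null bs=128K count=1024   # ~128 MB of pseudo-random data
dd if=/dev/zero    of=/dev/null bs=128K count=1024   # same volume, no PRNG cost
```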
(In reply to comment #17)
I used only 50 threads and the client CPU was mostly idle. As you can see from my traces, it's doing a lot of context switching, and most of the client process threads are waiting on file open, as you can see from my first post. I am just writing to a native glusterfs mount point. If a file open takes 60 ms, I'd say that is a big issue and not really an acceptable response time.

Avati, assigning it to you for now.

Very old version; apparently no longer an issue in current versions.
Description:

I am using the gluster native client and running tests in which multiple threads (50) create files in parallel to test performance. It puzzled me that the writes themselves are very fast, yet I still get only 20 MB/s throughput, whereas "dd" against the disks with bs=128K gives around 130 MB/s on a 4 x 2 configuration. So I looked deeper and generated a stack trace. In the stack trace I see that most of the threads are waiting on file open/creation. I then started recording the latency, which shows that while the writes are fast, most of the time is spent in file open/creation: it ranges from 60 ms to 8 seconds. Is this FUSE? Is this gluster? I don't know, but it sounds like a problem.

"pool-1-thread-25" prio=10 tid=0x00002aaab4085800 nid=0x14ae runnable [0x000000004417f000]
   java.lang.Thread.State: RUNNABLE
        at java.io.FileOutputStream.open(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:179)

"pool-1-thread-24" prio=10 tid=0x00002aaab4083800 nid=0x14ad runnable [0x000000004407e000]
   java.lang.Thread.State: RUNNABLE
        at java.io.FileOutputStream.open(Native Method)

Latency: sample latencies in ms to open/create a file:

8124 7385 7338 7335 7334 7333 7193 7176 7175 7162
7161 7148 7142 7114 7087 7083 7012 7008 6637 6608
6549 6525 6524 6522 6522 6315 6310 6274 6201 6196
6063 6012 5971 5970 5950 5905 5887 5884 5883 5879
5847 5845 5840 5830 5811 5777 5774 5773 5762 5743
5695 5515 5505 5497 5487 5483 5453 5404 5399 5341
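A rough shell-level way to see the same split the stack traces point at — create/open cost versus write cost — directly against the FUSE mount; the mount path is a placeholder:

```bash
# Separate the cost of creating/opening a file from the cost of writing its data.
MOUNT=/mnt/gluster                  # placeholder: the glusterfs FUSE mount point
F="$MOUNT/latency_probe.$$"

time touch "$F"                                               # create/open only
time dd if=/dev/zero of="$F" bs=128K count=1024 2>/dev/null   # 128 MB of writes
rm -f "$F"
```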