Bug 688232
| Summary: | problem with nfs ls latency during high IO | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Judith Flo <jflo> |
| Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
| Status: | CLOSED WONTFIX | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 14 | CC: | edgar.hoch, gansalmon, harshula, itamar, jonathan, kernel-maint, madhu.chinakonda |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2012-08-16 13:47:35 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Judith Flo 2011-03-16 15:51:54 UTC

Comment 1

RHEL 6.0 was released with the RH kernel 2.6.32-71, and it *does* contain the commit acdc53b2 "NFS: Replace __nfs_write_mapping with sync_inode()" backported to 2.6.32. So I doubt that the reported bad Fedora performance is due to that commit.

Comment 2

(In reply to comment #1)
> RHEL 6.0 released with RH kernel 2.6.32-71 and it *does* contain the
> commit acdc53b2 "NFS: Replace __nfs_write_mapping with sync_inode()"
> backported to 2.6.32. So, I doubt that the reported bad Fedora
> performance is due to that commit.

I have no problem with RHEL 6, only with Fedora. RHEL 6 does contain the change, but apparently Fedora 14 (kernel 2.6.35) does not. The code of the inode.c function is definitely different between the two kernels; I don't know whether this is an issue, or whether the code is simply different but does the same thing. The behavior of the ls command in both distributions against the same server is completely different, that's for sure.

Comment 3

(In reply to comment #2)
> RHEL 6 does contain the change, but apparently fedora 14 (kernel 2.6.35) does
> not.
> The code of the inode.c function is definitely different between the two
> kernels ...

Both Fedora 14 and RHEL 6.0 have commit acdc53b2 "NFS: Replace __nfs_write_mapping with sync_inode()":

kernel-2.6.35.11-83.fc14.src.rpm:
------------------------------------------
495 int nfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
496 {
497         struct inode *inode = dentry->d_inode;
498         int need_atime = NFS_I(inode)->cache_validity & NFS_INO_INVALID_ATIME;
499         int err;
500
501         /* Flush out writes to the server in order to update c/mtime. */
502         if (S_ISREG(inode->i_mode)) {
503                 err = filemap_write_and_wait(inode->i_mapping);
504                 if (err)
505                         goto out;
506         }
------------------------------------------

2.6.32-71.el6:
------------------------------------------
494 int nfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
495 {
496         struct inode *inode = dentry->d_inode;
497         int need_atime = NFS_I(inode)->cache_validity & NFS_INO_INVALID_ATIME;
498         int err;
499
500         /* Flush out writes to the server in order to update c/mtime. */
501         if (S_ISREG(inode->i_mode)) {
502                 err = filemap_write_and_wait(inode->i_mapping);
503                 if (err)
504                         goto out;
505         }
------------------------------------------

Comment 4

Can you run strace on the ls -l whilst reproducing the problem and attach the strace here? i.e.

$ strace -T -tt -f -v -q -s 4096 -o <STRACE.log> ls -l /nfs_server/file36G

The size of the file only needs to be slightly larger than the amount of RAM on the system. How much RAM do you have on the system?

Start the tcpdump capture just before you do the ls -l and stop it (Ctrl-C) just after the ls -l completes. e.g.

tcpdump -n -s 0 -i <INTERFACE> -w <OUTPUT.log> host <NFS-SERVER-IP>

The file is probably going to be very large, so you might have to put it somewhere I can download it from.

Comment 5

(In reply to comment #4)
> Can you run strace on the ls -l whilst reproducing the problem and attach the
> strace here? i.e. $ strace -T -tt -f -v -q -s 4096 -o <STRACE.log> ls -l
> /nfs_server/file36G
> The size of the file only needs to be slightly larger than the amount of RAM on
> the system. How much RAM do you have on the system?

This client has 64 GB of RAM (the NFS server has 48 GB); that's why I picked that file size. With a 12 GB file the times are not that big and ls doesn't get stuck for so long, but the truth is that most of the files this NFS server will be dealing with will be big, so 36 GB is a "reasonable" example.
> > Start the tcpdump capture just before you do the ls -l and stop it (Ctrl-C)
> > just after the ls -l completes.
> > e.g. tcpdump -n -s 0 -i <INTERFACE> -w <OUTPUT.log> host <NFS-SERVER-IP>
> > The file is probably going to be very large, so you might have to put it
> > somewhere I can download it from.

Command lines issued:

tcpdump -n -s 0 -i bond0 -w /home/judith/tcpdump.log host xxxxxxx.43
strace -T -tt -f -v -q -s 4096 -o /home/judith/strace.log ls -l /ibrix_sas/110317/prova.tar

You can download the file from here [1]; please let me know when you have it so I can remove it. The size of the tgz file, including tcpdump.log and strace.log, is 12 GB. If you want, I can also provide the same information without the vm.dirty_ratio value changed; it was 20 and is now 10 (times for ls increase from 1-2 to 3-4.5...).

Thanks

[1] http://gattaca.imppc.org/userspace/judith/logs_test_ls_nfs.tgz

Comment 6

The lstat() is stalled for ~100 seconds:

-----------------------------------
10080 12:39:04.194023 lstat("/ibrix_sas/110317/prova.tar", {st_dev=makedev(0, 22), st_ino=383974370, st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=32768, st_blocks=21558200, st_size=12082348032, st_atime=2011/03/17-12:38:57, st_mtime=2011/03/17-12:40:47, st_ctime=2011/03/17-12:40:47}) = 0 <103.767545>
-----------------------------------

There were no GETATTRs in the tcpdump. I would expect to have seen at least one.

If you have 64 GiB of RAM, 10% of that is still 6.4 GiB. That's a lot of dirty pages to push over a network to the NFS server. Try setting /proc/sys/vm/dirty_bytes to 500000000 (500 MB). See if that improves the ls -l latency.
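The arithmetic and the suggested tuning above can be summarised in a few shell commands (a sketch added for illustration, not part of the original report; writing the sysctl requires root):

```shell
# With dirty_ratio=10 on a 64 GiB machine, this many bytes of dirty
# pages may accumulate before writeback is forced:
echo $((64 * 1024 * 1024 * 1024 * 10 / 100))   # 6871947673, ~6.4 GiB

# The suggested ~500 MB cap instead (run as root; note that setting
# vm.dirty_bytes implicitly disables vm.dirty_ratio):
#   sysctl -w vm.dirty_bytes=500000000
# or equivalently:
#   echo 500000000 > /proc/sys/vm/dirty_bytes
```

To make such a change persistent across reboots, the usual place for it is a `vm.dirty_bytes = 500000000` line in /etc/sysctl.conf.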
Comment 7

(In reply to comment #6)
> The lstat() is stalled for ~ 100 seconds:
> -----------------------------------
> 10080 12:39:04.194023 lstat("/ibrix_sas/110317/prova.tar", {st_dev=makedev(0,
> 22), st_ino=383974370, st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0,
> st_blksize=32768, st_blocks=21558200, st_size=12082348032,
> st_atime=2011/03/17-12:38:57, st_mtime=2011/03/17-12:40:47,
> st_ctime=2011/03/17-12:40:47}) = 0 <103.767545>
> -----------------------------------
> There were no GETATTRs in the tcpdump. I would expect to have seen at least
> one.
>
> If you have 64GiBs of RAM, 10% of that is still 6.4GiBs. That's a lot dirty
> pages to push over a network to the NFS server. Try setting
> /proc/sys/vm/dirty_bytes to 500000000 (500MB). See if that improves the ls -l
> latency.

Hello, thanks a lot for your suggestion, and forgive my late answer. That change definitely changed the behavior of the ls command; now it runs smoothly. The point now is that the copy takes more time, is this expected? I need to read more about the vm.dirty* parameters to understand the implications of these changes on the system.

For a machine with 8 GB of RAM, what values should I use for dirty_ratio and dirty_bytes? Proportional to the new ones, or should I keep the defaults? The amount of waiting time for ls is smaller, but if the whole ls -l issue was about these values, why do I have latencies for ls when using this 8 GB machine and I don't have them with

Comment 8

(In reply to comment #7)
> That change definitely changed the behavior of the ls command, now it runs
> smoothly.
> The point now is that the copy takes more time, is this expected?

Yes. When you copy the file to the NFS server, the data is copied into RAM first. By reducing /proc/sys/vm/dirty_bytes, you reduce the amount of RAM available to the copy, hence triggering the dirty pages to be flushed to the NFS server earlier.
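This copy-into-RAM behaviour can be demonstrated with a small self-contained sketch (added for illustration; the file here is tiny and local, so both timings are near zero, whereas on a multi-GB file over NFS the second timing is much larger because it includes flushing the dirty pages to the server):

```shell
# Compare "time cp" with "time cp && sync" on a throwaway file.
tmp=$(mktemp -d)
head -c 1048576 /dev/zero > "$tmp/src"          # 1 MiB stand-in file

time cp "$tmp/src" "$tmp/dst"                   # stops once data is in the page cache
time sh -c "cp '$tmp/src' '$tmp/dst2' && sync"  # stops only after dirty pages are flushed

cmp -s "$tmp/src" "$tmp/dst2" && echo "copy complete and flushed"
rm -r "$tmp"
```

Timing the cp together with an explicit sync is one way to measure the true end-to-end transfer time rather than the time to fill the page cache.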
When cp closes the file, that does not guarantee that the dirty pages have been flushed over the network to the NFS server. So although the copy may have looked like it finished earlier in your first test, it is likely that more dirty pages were still sitting in RAM compared to the second test.

> For a machine with 8Gb of ram, what are the values that I should use for the
> dirty_ratio and the dirty_bytes?

You would need to tune that by doing a few experiments. There is no magic solution; it is a compromise between write latency vs. stat latency.

Comment 9

I'm not sure whether I'm right, but I would expect the NFS client code to look in the VM cache for information about the file first, before asking the NFS server about it. If it doesn't, then this is a source of systematically wrong information about NFS files. No matter how low you set dirty_ratio, only zero guarantees correct file information, which would mean not using the VM cache for NFS files at all. Are my considerations right?

Comment 10

(In reply to comment #9) Please see the Technical Notes of Bug 469848 - [RHEL5.2] nfs_getattr() hangs during heavy write workloads - for a complete explanation.

Comment 11

This message is a notice that Fedora 14 is now at end of life. Fedora has stopped maintaining and issuing updates for Fedora 14. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At this time, all open bugs with a Fedora 'version' of '14' have been closed as WONTFIX.

(Please note: our normal process is to give advance warning of this occurring, but we forgot to do that. A thousand apologies.)

Package Maintainer: if you wish for this bug to remain open because you plan to fix it in a currently maintained version, feel free to reopen this bug and simply change the 'version' to a later Fedora version.

Bug Reporter: thank you for reporting this issue, and we are sorry that we were unable to fix it before Fedora 14 reached end of life.
If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to click on "Clone This Bug" (top right of this page) and open it against that version of Fedora.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping