Bug 529948
Summary: | rpm kills the machine with fsync | ||||||
---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Pierre Ossman <pierre-bugzilla> | ||||
Component: | rpm | Assignee: | Panu Matilainen <pmatilai> | ||||
Status: | CLOSED WONTFIX | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||
Severity: | medium | Docs Contact: | |||||
Priority: | low | ||||||
Version: | 12 | CC: | alice_knoll_drouin, ffesti, jnovy, n3npq, pmatilai | ||||
Target Milestone: | --- | ||||||
Target Release: | --- | ||||||
Hardware: | All | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2010-12-04 07:25:46 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
|
Description
Pierre Ossman
2009-10-20 20:33:35 UTC
Add nofsync to your rpmdb configuration. grep fsync /usr/lib/rpm/macros to see how to do that. This bug appears to have been reported against 'rawhide' during the Fedora 12 development cycle. Changing version to '12'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping *** Bug 600716 has been marked as a duplicate of this bug. *** Created attachment 425355 [details]
sudo yum update trace on USB-Flash Drive
Attached is the trace during updating fedora packages on the USB-Flash Drive. Downloading packages time is normal. Updating each package is very slow. Cleanup each package is very slow.
The total time to update and cleanup the forty packages on the USB-Flash Drive was thirty-five minutes
As a comparison, updating and cleanup similar forty packages onto the Hard Disk took just eleven minutes. It is apparent that the same update process is at least three time longer when updating packages on the USB-Flash Drive.
It was also observed that other functions like starting a file browser, browsing on Firefox, entering a command line, saving an edited file and others had a slow response and may freeze for more than ten seconds when USB-Flash Drive packages updates is in progress.
Please note that updating Fedora 12 packages on USB-Flash Drive was very very very slow, to the point that it is not acceptable. Updating and cleanup forty packages in Fedora 12 on USB-Flash Drive may take over five hours.
Noone disputes what you are seeing on a USB flash drive. What isn't clear is how to fix the issue because the root cause cannot be seen in a yum trace. All that can be confirmed from your observations is that indeed, USB operations are S-L-O-O-W, and noone is disputing that at all. There are two "likely" causes of slow, each with different "cures": 1) redundant database operations sync'd to disk 2) file system operations. What is needed is some way to guage which is the S-L-O-W-E-R issue. Something like a strace (which WILL be quite large and slow, run on a single, not 40, package upgrade) is needed with timestamps on the system calls. It will be possible to tell from context whether its database reads/writes or file system operations that are the root cause. Note that there's not much that can be done by RPM to "cure" 2) file system operations. An RPM installer MUST read and write content. There is a chance that stat(2) is the culprit, and there's a chance that some stat(2) operations can be avoided. But there's no way to tell until the S-L-O-O-W syscalls are identified in context. If fdatasync is the issue (as hinted in the original comment) that might be avoided in RPM. But I/O can only be delayed, not avoided, when writing content onto the file system. So all that can be done is to avoid fdatasync, which will mean that I/O will likely proceed after both yum (and rpmlib) have stopped executing. Again only a strace can tell the root cause. I'm just guessing based on the available info. I totally agree that finding the exact root cause is the only way to resolve the problem. What is needed is to identify, isolate and determine the root cause. Maybe a new idea to find the problem is needed. 1. Injecting a code segment to test the read/write cycle. 2. An established bug trace platform like backtrace or abrt. ------------------------------ I have read from somewhere in wiki that the USB Flash Drive functions, especially the read/write, is slow because the Linux kernel did not reclaim the redundant disk/storage space. ------------------------------ Fedora on USB-Flash Drive is an extremely useful feature. 1. Anyone can boot up Fedora on most machine (i386, x86_64, PowerPC) located anywhere in the world by just carrying USB-Flash Drives. 2. USB-Flash Drives are useful for testing as any Fedora version can be installed and updated and can be booted up on most machine without disturbing the original OS like Windows Vista or Unix. 3. USB-Flash Drives are useful to prepare Fedora to run on the up and coming Solid-state drives. ------------------------------ The following has been observed for USB-Flash Drives. 1. Installing Fedora 11, Fedora 12 and Fedora 13 took more than five hours. 2. Updating Fedora 11 and Fedora 12 is very very very slow. Updating 100 packages many take more than 12 hours. There is an improvement in Fedora 13. It is still slow and it takes three times longer compare to updating onto Hard Disk. 3. Opening a File Browser, entering a command line, scrolling up and down a Firefox web page, and others were very slow. The operations may not response or may freeze for more than one minute to the point that it is not usable. There is a slight improvement in Fedora 13. ------------------------------ The slow response on USB-Flash Drives may be all interconnected. Could it be the display drivers, the buses, the interfaces (USB 2.0), unclaimed resources during the read/write cycles, etc? Perhaps, a global effort of the whole Fedora/Linux team is need to get to the root cause. ------------------------------ A very specific testing procedure is helpful. The following is an example of the specific instructions from a Red Hat developer. "Can you try Fedora 12 yum --enablerepo=updates-testing upgrade xorg-x11-drv-ati plymouth libdrm mesa-libGL xorg-x11-server-Xorg?" @Jeff Johnson, I will execute the test procedures, even if it takes more than 12 hours, after you give the specific instructions. ------------------------------ FYI: rpm (but not yum) is heavily insturmented. Try adding --stats to a package install. Then try similar on USB drives. ============================================================ # Installing onto Hard Disk $ sudo rpm -i --stats thunderbird-3.0.4-3.fc13.x86_64.rpm total: 1 0.000000 MB 32.474503 secs check: 1 0.000000 MB 0.076866 secs order: 1 0.000000 MB 0.000038 secs fingerprint: 3 0.000867 MB 0.044960 secs install: 1 0.000000 MB 26.645644 secs scriptlets: 1 0.000000 MB 8.570231 secs uncompress: 6542 53.586176 MB 5.690441 secs digest: 3575 54.550768 MB 0.693563 secs dbadd: 1 0.000000 MB 7.159725 secs dbget: 2833 31.119184 MB 8.599861 secs dbput: 1892 12.432032 MB 0.132041 secs ============================================================ # Installing onto USB-Flash Drive $ sudo rpm -i --stats thunderbird-3.0.4-3.fc13.i686.rpm total: 1 0.000000 MB 63.744582 secs check: 1 0.000000 MB 0.977819 secs order: 1 0.000000 MB 0.000039 secs fingerprint: 3 0.000867 MB 0.006459 secs install: 1 0.000000 MB 55.877518 secs scriptlets: 1 0.000000 MB 1.652308 secs uncompress: 6403 52.446984 MB 28.479775 secs digest: 3537 53.401696 MB 1.217264 secs dbadd: 1 0.000000 MB 0.215415 secs dbget: 2845 33.292856 MB 0.227921 secs dbput: 1908 13.815620 MB 0.041771 secs ============================================================ Thank you for the numbers. Here's what I can see (which "calibrates" my faster/slower expectations): 1) flash drives are ~2x slower than hard disks (see the "total" numbers) 2) its _NOT_ rpmdb syncing or redundancy that is to blame (see dbadd/dbget/dbput entries. Your anomalous results are due to "warm cache" != "cold cache" and other peripheral issues, the numbers are approximately equal, as they should be, depending on where you have chosen to place the rpmdb). 3) installs (where most of the syscall issues are located) are ~2x slower. OTOH, "uncompress" (which is likely summed into install, don't get tricked) is more than 4x slower. That is already useful info to direct attention to where a fix might be attempted. The "fix" that needs to be done is likely this: RPM goes to some lengths to ensure that installs (on hard disks) are high performing. Specific to the "install" data reported, RPM attempts to decompress files directly onto the file system using am mmap(2) allocation buffer for 1 decompression operation for the _ENTIRE_ file, and also has some addition speedup's to trigger "write behind" I/O onto a hard disk using madvise(2) MADV_DONTNEED. I suspect that what is the root cause of S-L-O-O-W RPM is that the hard disk optimizations don't "work" on flash file systems because mmap(2) and madvise(2) almost certainly have a dramatically different implementation. Lucklly lib/fsm.c already is insturmenetd to just write buffers to the file system, the "good old way" if/when mmap(2) isn't available or otherwise fails to "work". The "fix" should _NOT_ be to just rip out the mmap(2) optimazations which re/were significant. (don't worry if all of the above is meaningless. my role here is to analyze your data and suggest the "fix" for RPM.) The update process for Fedora 11 and Fedora 12 was observed to be very very very slow. The following has been done for comparison. ============================================================ ### Installing Fedora 12 package onto USB-Flash Drive $ sudo rpm -i --stats thunderbird-3.0.4-2.fc12.i686.rpm total: 1 0.000000 MB 252.003117 secs check: 1 0.000000 MB 0.197098 secs order: 1 0.000000 MB 0.000095 secs fingerprint: 3 0.000866 MB 0.024173 secs install: 1 0.000000 MB 247.055594 secs scriptlets: 1 0.000000 MB 8.049387 secs uncompress: 6327 51.825564 MB 13.795798 secs digest: 3516 52.968118 MB 1.114657 secs dbadd: 1 0.000000 MB 142.680015 secs dbget: 2908 31.777704 MB 4.100888 secs dbput: 1984 13.532812 MB 0.080660 secs ============================================================ ### Installing Fedora 11 package onto USB-Flash Drive $ sudo rpm -i --stats thunderbird-3.0.4-2.fc11.i586.rpm total: 1 0.000000 MB 205.405740 secs check: 1 0.000000 MB 0.386119 secs order: 1 0.000000 MB 0.000075 secs fingerprint: 3 0.000866 MB 0.030944 secs install: 1 0.000000 MB 197.630207 secs scriptlets: 1 0.000000 MB 11.477078 secs uncompress: 6419 52.578096 MB 0.779961 secs digest: 3532 53.660407 MB 1.192857 secs dbadd: 1 0.000000 MB 146.210606 secs dbget: 2880 28.799172 MB 30.965261 secs dbput: 1973 12.247520 MB 32.985469 secs ============================================================ These numbers cannot be compared to a hard disk install and show significant time accessing an rpmdb. What is different in how you measured? Note that varying the package used as input is unlikely to have any significance (there are changes in the file digest used from MD5 -> SHA256 which is predictably slower). What's likely different is that you used different rpm versions. Yes there are changes in how/when an rpmdb is accessed between F13 and F12 and F11. But if you want a fix for RPM going forward, you need to start identifying (at least) what rpm version is in use and control for where the rpmdb is located so that the root cause can be identified. See comment #5 for the two general areas that may be related. There are tunables for improving rpmdb performance (see comment #1 which disables synchronous writes, there's another bug report for tuning rpmdb read performance whioch is quite a bit mor complicated). ============================================================ I had F11 installed on a 8GB USB-Flash Drive, F12 installed on the another 16GB USB-Flash Drive and F13 installed in the next 16GB USB-Flash Drive. F11, F12, F13 was booted up as each completely different Fedora version on the same USB port of the same x86_64 Dell Server from three different HP v135w USB-Flash Drives. $ sudo rpm -i --stats thunderbird-3.0.4-3.fc13.x86_64.rpm >>>>>>>>>> Measured on Fedora-13-x86_64-DVD booted on the Dell PowerEdge 1800 Server Hard Disk. $ sudo rpm -i --stats thunderbird-3.0.4-3.fc13.i686.rpm >>>>>>>>>> Measured on Fedora-13-i386-DVD booted from a USB-Flash Drive attached on the same Dell PowerEdge 1800 USB port. $ sudo rpm -i --stats thunderbird-3.0.4-2.fc12.i686.rpm >>>>>>>>>> Measured on Fedora-12-i386-DVD booted from another USB-Flash Drive attached on the same Dell PowerEdge 1800 USB port. $ sudo rpm -i --stats thunderbird-3.0.4-2.fc11.i586.rpm >>>>>>>>>> Measured on Fedora-11-i386-DVD booted from another USB-Flash Drive attached on the same Dell PowerEdge 1800 USB port. ============================================================ $ grep %__dbi_rebuild /usr/lib/rpm/macros %__dbi_rebuild nofsync !log !txn !cdb grep operation on F13 Hard Disk, F13 USB-Flash Drive, F12 USB-Flash Drive and F11 USB-Flash Drive. It is assumed that the fsync function has already been disabled by default in all the Fedora versions under test as all the four grep operations return the '%__dbi_rebuild nofsync !log !txn !cdb' from the /usr/lib/rpm/macros file. ============================================================ F11 and F12 were measured to give more --stats into the problem, that is, to compare the code changes between F11, F12 and F13 on this specific rpm problem. F11 and F12 have many other USB-Flash Drive problems. Solving one may create another. It may be more appropriate to target all USB-Flash Drive problems from F13 onwards. ============================================================ Is is possible for me to do an strace on my machine? If you give the exact instructions to run strace, I will do it. ============================================================ This message is a reminder that Fedora 12 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 12. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '12'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 12's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 12 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping Fedora 12 changed to end-of-life (EOL) status on 2010-12-02. Fedora 12 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed. |