Bug 529948

Summary: rpm kills the machine with fsync
Product: [Fedora] Fedora Reporter: Pierre Ossman <pierre-bugzilla>
Component: rpmAssignee: Panu Matilainen <pmatilai>
Status: CLOSED WONTFIX QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: low    
Version: 12CC: alice_knoll_drouin, ffesti, jnovy, n3npq, pmatilai
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-12-04 07:25:46 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
sudo yum update trace on USB-Flash Drive none

Description Pierre Ossman 2009-10-20 20:33:35 UTC
I upgraded one of my machines from F10 to Rawhide, and during this I notice that rpm tends to be extremely excessive in its use of fsync. The machine uses a fairly average flash disk, and on this it took over 12 hours to remove a couple of hundred packages. Doing an strace reveals that rpm keeps calling fdatasync() constantly, something that completely kills the performance.

Comment 1 Jeff Johnson 2009-10-21 01:28:13 UTC
Add nofsync to your rpmdb configuration.

grep fsync /usr/lib/rpm/macros to see how to do that.

Comment 2 Bug Zapper 2009-11-16 13:55:40 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 12 development cycle.
Changing version to '12'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 3 Panu Matilainen 2010-06-07 06:33:49 UTC
*** Bug 600716 has been marked as a duplicate of this bug. ***

Comment 4 Alice K 2010-06-19 16:51:44 UTC
Created attachment 425355 [details]
sudo yum update trace on USB-Flash Drive


Attached is the trace during updating fedora packages on the USB-Flash Drive.  Downloading packages time is normal.  Updating each package is very slow.  Cleanup each package is very slow.

The total time to update and cleanup the forty packages on the USB-Flash Drive was thirty-five minutes

As a comparison, updating and cleanup similar forty packages onto the Hard Disk took just eleven minutes.  It is apparent that the same update process is at least three time longer when updating packages on the USB-Flash Drive.

It was also observed that other functions like starting a file browser, browsing on Firefox, entering a command line, saving an edited file and others had a slow response and may freeze for more than ten seconds when USB-Flash Drive packages updates is in progress.

Please note that updating Fedora 12 packages on USB-Flash Drive was very very very slow, to the point that it is not acceptable.  Updating and cleanup forty packages in Fedora 12 on USB-Flash Drive may take over five hours.

Comment 5 Jeff Johnson 2010-06-19 17:09:11 UTC
Noone disputes what you are seeing on a USB flash drive.

What isn't clear is how to fix the issue because the root cause
cannot be seen in a yum trace. All that can be confirmed
from your observations is that indeed, USB operations
are S-L-O-O-W, and noone is disputing that at all.

There are two "likely" causes of slow, each with different "cures":

1) redundant database operations sync'd to disk

2) file system operations.

What is needed is some way to guage which is the S-L-O-W-E-R
issue.

Something like a strace (which WILL be quite large and slow, run on 
a single, not 40, package upgrade) is needed with timestamps on the
system calls. It will be possible to tell from context whether its
database reads/writes or file system operations that are the root cause.

Note that there's not much that can be done by RPM to "cure" 2) file system
operations. An RPM installer MUST read and write content. There is a chance
that stat(2) is the culprit, and there's a chance that some stat(2) operations
can be avoided.

But there's no way to tell until the S-L-O-O-W syscalls are identified in context.

Comment 6 Jeff Johnson 2010-06-19 17:13:22 UTC
If fdatasync is the issue (as hinted in the original comment) that might
be avoided in RPM. But I/O can only be delayed, not avoided, when writing
content onto the file system. So all that can be done is to avoid
fdatasync, which will mean that I/O will likely proceed after
both yum (and rpmlib) have stopped executing.

Again only a strace can tell the root cause. I'm just guessing based on the
available info.

Comment 7 Alice K 2010-06-20 06:41:57 UTC
I totally agree that finding the exact root cause is the only way to resolve the problem.

What is needed is to identify, isolate and determine the root cause.  Maybe a new idea to find the problem is needed.

1.  Injecting a code segment to test the read/write cycle.

2.  An established bug trace platform like backtrace or abrt.

------------------------------

I have read from somewhere in wiki that the USB Flash Drive functions, especially the read/write, is slow because the Linux kernel did not reclaim the redundant disk/storage space.

------------------------------

Fedora on USB-Flash Drive is an extremely useful feature.

1.  Anyone can boot up Fedora on most machine (i386, x86_64, PowerPC) located anywhere in the world by just carrying USB-Flash Drives.

2.  USB-Flash Drives are useful for testing as any Fedora version can be installed and updated and can be booted up on most machine without disturbing the original OS like Windows Vista or Unix.

3.  USB-Flash Drives are useful to prepare Fedora to run on the up and coming Solid-state drives.

------------------------------

The following has been observed for USB-Flash Drives.

1.  Installing Fedora 11, Fedora 12 and Fedora 13 took more than five hours.

2.  Updating Fedora 11 and Fedora 12 is very very very slow.  Updating 100 packages many take more than 12 hours.  There is an improvement in Fedora 13.  It is still slow and it takes three times longer compare to updating onto Hard Disk.

3.  Opening a File Browser, entering a command line, scrolling up and down a Firefox web page, and others were very slow.  The operations may not response or may freeze for more than one minute to the point that it is not usable.  There is a slight improvement in Fedora 13.

------------------------------

The slow response on USB-Flash Drives may be all interconnected.  Could it be the display drivers, the buses, the interfaces (USB 2.0), unclaimed resources during the read/write cycles, etc?  Perhaps, a global effort of the whole Fedora/Linux team is need to get to the root cause.

------------------------------

A very specific testing procedure is helpful.  The following is an example of the specific instructions from a Red Hat developer.

"Can you try Fedora 12 yum --enablerepo=updates-testing upgrade xorg-x11-drv-ati plymouth libdrm mesa-libGL xorg-x11-server-Xorg?"

@Jeff Johnson,

I will execute the test procedures, even if it takes more than 12 hours, after you give the specific instructions.

------------------------------

Comment 8 Jeff Johnson 2010-06-20 11:50:24 UTC
FYI: rpm (but not yum) is heavily insturmented.

Try adding --stats to a package install.

Then try similar on USB drives.

Comment 9 Alice K 2010-06-21 17:44:42 UTC
============================================================

# Installing onto Hard Disk


$ sudo rpm -i --stats thunderbird-3.0.4-3.fc13.x86_64.rpm
   total:             1      0.000000 MB     32.474503 secs
   check:             1      0.000000 MB      0.076866 secs
   order:             1      0.000000 MB      0.000038 secs
   fingerprint:       3      0.000867 MB      0.044960 secs
   install:           1      0.000000 MB     26.645644 secs
   scriptlets:        1      0.000000 MB      8.570231 secs
   uncompress:     6542     53.586176 MB      5.690441 secs
   digest:         3575     54.550768 MB      0.693563 secs
   dbadd:             1      0.000000 MB      7.159725 secs
   dbget:          2833     31.119184 MB      8.599861 secs
   dbput:          1892     12.432032 MB      0.132041 secs


============================================================

# Installing onto USB-Flash Drive


$ sudo rpm -i --stats thunderbird-3.0.4-3.fc13.i686.rpm
   total:             1      0.000000 MB     63.744582 secs
   check:             1      0.000000 MB      0.977819 secs
   order:             1      0.000000 MB      0.000039 secs
   fingerprint:       3      0.000867 MB      0.006459 secs
   install:           1      0.000000 MB     55.877518 secs
   scriptlets:        1      0.000000 MB      1.652308 secs
   uncompress:     6403     52.446984 MB     28.479775 secs
   digest:         3537     53.401696 MB      1.217264 secs
   dbadd:             1      0.000000 MB      0.215415 secs
   dbget:          2845     33.292856 MB      0.227921 secs
   dbput:          1908     13.815620 MB      0.041771 secs


============================================================

Comment 10 Jeff Johnson 2010-06-21 17:56:07 UTC
Thank you for the numbers.

Here's what I can see (which "calibrates" my faster/slower expectations):

1) flash drives are ~2x slower than hard disks (see the "total" numbers)

2) its _NOT_ rpmdb syncing or redundancy that is to blame (see dbadd/dbget/dbput
entries. Your anomalous results are due to "warm cache" != "cold cache" and other
peripheral issues, the numbers are approximately equal, as they should be,
depending on where you have chosen to place the rpmdb).

3) installs (where most of the syscall issues are located) are ~2x slower.
OTOH, "uncompress" (which is likely summed into install, don't get tricked)
is more than 4x slower.

That is already useful info to direct attention to where a fix
might be attempted.

Comment 11 Jeff Johnson 2010-06-21 18:07:26 UTC
The "fix" that needs to be done is likely this:

RPM goes to some lengths to ensure that installs (on hard disks)
are high performing.

Specific to the "install" data reported, RPM attempts to decompress
files directly onto the file system using am mmap(2) allocation
buffer for 1 decompression operation for the _ENTIRE_ file,
and also has some addition speedup's to trigger "write behind"
I/O onto a hard disk using madvise(2) MADV_DONTNEED.

I suspect that what is the root cause of S-L-O-O-W RPM is that
the hard disk optimizations don't "work" on flash file systems
because mmap(2) and madvise(2) almost certainly have a dramatically
different implementation.

Lucklly lib/fsm.c already is insturmenetd to just write buffers to
the file system, the "good old way" if/when mmap(2) isn't available
or otherwise fails to "work". The "fix" should _NOT_ be to just
rip out the mmap(2) optimazations which re/were significant.

(don't worry if all of the above is meaningless. my role here is to analyze your
data and suggest the "fix" for RPM.)

Comment 12 Alice K 2010-06-22 11:29:29 UTC
The update process for Fedora 11 and Fedora 12 was observed to be very very very slow.  The following has been done for comparison.


============================================================

### Installing Fedora 12 package onto USB-Flash Drive


$ sudo rpm -i --stats thunderbird-3.0.4-2.fc12.i686.rpm
   total:             1      0.000000 MB    252.003117 secs
   check:             1      0.000000 MB      0.197098 secs
   order:             1      0.000000 MB      0.000095 secs
   fingerprint:       3      0.000866 MB      0.024173 secs
   install:           1      0.000000 MB    247.055594 secs
   scriptlets:        1      0.000000 MB      8.049387 secs
   uncompress:     6327     51.825564 MB     13.795798 secs
   digest:         3516     52.968118 MB      1.114657 secs
   dbadd:             1      0.000000 MB    142.680015 secs
   dbget:          2908     31.777704 MB      4.100888 secs
   dbput:          1984     13.532812 MB      0.080660 secs


============================================================

### Installing Fedora 11 package onto USB-Flash Drive


$ sudo rpm -i --stats thunderbird-3.0.4-2.fc11.i586.rpm
   total:             1      0.000000 MB    205.405740 secs
   check:             1      0.000000 MB      0.386119 secs
   order:             1      0.000000 MB      0.000075 secs
   fingerprint:       3      0.000866 MB      0.030944 secs
   install:           1      0.000000 MB    197.630207 secs
   scriptlets:        1      0.000000 MB     11.477078 secs
   uncompress:     6419     52.578096 MB      0.779961 secs
   digest:         3532     53.660407 MB      1.192857 secs
   dbadd:             1      0.000000 MB    146.210606 secs
   dbget:          2880     28.799172 MB     30.965261 secs
   dbput:          1973     12.247520 MB     32.985469 secs


============================================================

Comment 13 Jeff Johnson 2010-06-22 12:55:24 UTC
These numbers cannot be compared to a hard disk install
and show significant time accessing an rpmdb.

What is different in how you measured?

Note that varying the package used as input is unlikely
to have any significance (there are changes in the file digest
used from MD5 -> SHA256 which is predictably slower).

Comment 14 Jeff Johnson 2010-06-22 13:36:09 UTC
What's likely different is that you used different rpm versions. Yes there
are changes in how/when an rpmdb is accessed between F13 and F12 and F11.

But if you want a fix for RPM going forward, you need to start identifying (at least)
what rpm version is in use and control for where the rpmdb is located so
that the root cause can be identified.

See comment #5 for the two general areas that may be related. There are
tunables for improving rpmdb performance (see comment #1 which disables
synchronous writes, there's another bug report for tuning rpmdb read performance
whioch is quite a bit mor complicated).

Comment 15 Alice K 2010-06-23 08:08:10 UTC
============================================================    

I had F11 installed on a 8GB USB-Flash Drive, F12 installed on the another 16GB USB-Flash Drive and F13 installed in the next 16GB USB-Flash Drive.  F11, F12, F13 was booted up as each completely different Fedora version on the same USB port of the same x86_64 Dell Server from three different HP v135w USB-Flash Drives.


$ sudo rpm -i --stats thunderbird-3.0.4-3.fc13.x86_64.rpm
>>>>>>>>>> Measured on Fedora-13-x86_64-DVD booted on the Dell PowerEdge 1800 Server Hard Disk.


$ sudo rpm -i --stats thunderbird-3.0.4-3.fc13.i686.rpm
>>>>>>>>>> Measured on Fedora-13-i386-DVD booted from a USB-Flash Drive attached on the same Dell PowerEdge 1800 USB port.


$ sudo rpm -i --stats thunderbird-3.0.4-2.fc12.i686.rpm
>>>>>>>>>> Measured on Fedora-12-i386-DVD booted from another USB-Flash Drive attached on the same Dell PowerEdge 1800 USB port.


$ sudo rpm -i --stats thunderbird-3.0.4-2.fc11.i586.rpm
>>>>>>>>>> Measured on Fedora-11-i386-DVD booted from another USB-Flash Drive attached on the same Dell PowerEdge 1800 USB port.


============================================================    

$ grep %__dbi_rebuild /usr/lib/rpm/macros
%__dbi_rebuild			nofsync !log !txn !cdb


grep operation on F13 Hard Disk, F13 USB-Flash Drive, F12 USB-Flash Drive and F11 USB-Flash Drive.  It is assumed that the fsync function has already been disabled by default in all the Fedora versions under test as all the four grep operations return the '%__dbi_rebuild  nofsync !log !txn !cdb' from the /usr/lib/rpm/macros file.


============================================================    

F11 and F12 were measured to give more --stats into the problem, that is, to compare the code changes between F11, F12 and F13 on this specific rpm problem.  F11 and F12 have many other USB-Flash Drive problems.  Solving one may create another.  It may be more appropriate to target all USB-Flash Drive problems from F13 onwards.


============================================================    

Is is possible for me to do an strace on my machine?

If you give the exact instructions to run strace, I will do it.


============================================================

Comment 16 Bug Zapper 2010-11-04 09:20:09 UTC
This message is a reminder that Fedora 12 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 12.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '12'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 12's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 12 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 17 Bug Zapper 2010-12-04 07:25:46 UTC
Fedora 12 changed to end-of-life (EOL) status on 2010-12-02. Fedora 12 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.