Bug 783480

Summary: RFE optimize write updates
Product: [Fedora] Fedora
Reporter: Zdenek Kabelac <zkabelac>
Component: rpm
Assignee: Fedora Packaging Toolset Team <packaging-team>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: rawhide
CC: aros, dmach, ffesti, jpokorny, pmatilai, rvokal
Keywords: FutureFeature
Hardware: Unspecified
OS: Unspecified
Doc Type: Enhancement
Last Closed: 2017-08-22 11:56:45 UTC

Description Zdenek Kabelac 2012-01-20 15:19:02 UTC
Description of problem:

Since many users have SSD drives and mass updates seem to generate gigabytes of writes where only a fraction of the data really changes, it might be an interesting feature to have an option to extract files to a different disk (i.e. ramdisk, /dev/shm or whatever) and update only those files in the system whose md5 hash differs. For files with a matching md5, hardlinks could be used instead. If there is not enough space to extract the package, then it may fall back to regular extraction to the system.

It could also be a nice benefit for users of rotational drives, where fewer file updates would be generated, thus speeding up the whole update process - though that's just my assumption and only a real-life test might prove something here.
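
A minimal sketch of the idea above, not of how rpm implements anything: a file extracted to a staging area (e.g. a ramdisk) replaces the installed file only when the digests differ. The helper names `file_digest` and `install_if_changed` are hypothetical, sha256 is just the default chosen here, and the hardlink and out-of-space fallback parts of the proposal are left out.

```python
import hashlib
import os
import shutil
import tempfile


def file_digest(path, algo="sha256", chunk_size=1 << 20):
    """Return the hex digest of the file at `path`, reading it in chunks."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def install_if_changed(staged_path, target_path):
    """Copy staged_path over target_path only when the content differs.

    Hypothetical helper: returns True if a write happened, False if the
    existing target was left untouched.
    """
    if os.path.isfile(target_path):
        # This read of the installed file is the extra I/O the scheme costs.
        if file_digest(staged_path) == file_digest(target_path):
            return False
    # Write a temporary file next to the target, then rename it into place.
    target_dir = os.path.dirname(os.path.abspath(target_path))
    fd, tmp_path = tempfile.mkstemp(dir=target_dir)
    os.close(fd)
    shutil.copyfile(staged_path, tmp_path)
    os.replace(tmp_path, target_path)
    return True
```

Note that every skipped write is paid for with a full read of the already installed file, so whether this is a net win depends on how expensive writes are relative to reads on the given storage.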

Version-Release number of selected component (if applicable):
rpm-4.9.1.2-11.fc17.x86_64

Comment 1 Panu Matilainen 2012-01-23 08:44:13 UTC
Something like this has been suggested before... Avoiding writes on unchanged files would require rpm to calculate the digests (sha256 in Fedora, not md5, btw) of all already installed files (often involving prelink undo) to see whether they should be replaced or not. This would generate an enormous amount of read I/O where none is currently needed.

It might be beneficial for systems where writes are very expensive though.

Comment 2 Zdenek Kabelac 2012-01-23 09:56:11 UTC
Read ops are pretty cheap on SSD, so I don't see that as a problem
and would prefer it to pointless rewrites of SSD blocks with the same data
(since the number of block rewrites has its limits).

Also, aren't there already some hash sums stored for each file from a package?
(At least I thought that's what rpm -Va is doing: it checks for all
files whether the sum ('sha256') still matches.)

BTW, here is the timing on my system with 2433 packages, a 4-year-old Lenovo T61 (2.2GHz, 4GB) with some debug kernel options enabled:

time rpm -Va 

real	5m0.893s
user	2m8.382s
sys	1m27.364s

Meanwhile, today's rawhide upgrade of 1628 packages on my machine
(Upgrade 1628, Remove 2, Skip (deps) 25, Download 1.3GB)
has already taken over an hour and is still unfinished.
(BTW, it would probably be nice if yum removed each downloaded package file
immediately after its successful installation, not after the whole upgrade is
completed, since 1.3G takes quite a lot of RAM when downloaded to a ramdisk.)

I'm also willing to take the risk of using just the sums stored in the DB,
thus not doing a live system check of all installed files.
For most packages I don't really care that much. Maybe
I'd like to be able to select a list of 'core' packages for which
the hash check is done, and use the DB-stored sums for the rest.

Comment 3 Panu Matilainen 2012-01-23 10:24:49 UTC
Relying on the stored digests is not an option. If you don't see why, then think some more.
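
One possible reading of this objection, sketched under assumptions: the digest recorded at install time describes what was written back then, not what is on disk now. If a file has been modified or damaged since, and the new package ships the same content as the old one, trusting the database would skip the write and leave the bad file in place. The function names and the `trust_database` switch below are hypothetical and only illustrate the comparison.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex sha256 digest of an in-memory byte string."""
    return hashlib.sha256(data).hexdigest()


def can_skip_write(stored_digest, on_disk_bytes, new_bytes, trust_database):
    """Decide whether writing the new file content may be skipped.

    stored_digest   digest recorded in the package database at install time
    on_disk_bytes   what the installed file actually contains right now
    new_bytes       content shipped by the package being installed
    trust_database  hypothetical switch: compare against the stored digest
                    instead of re-reading the installed file
    """
    new_digest = sha256_hex(new_bytes)
    if trust_database:
        current_digest = stored_digest
    else:
        current_digest = sha256_hex(on_disk_bytes)
    return current_digest == new_digest


# The package content is unchanged between versions, but the installed
# copy was damaged after installation.
original = b"v1"
stored = sha256_hex(original)
damaged = b"v1 but damaged"

# Trusting the database skips the write, so the damaged file would stay.
skip_with_db = can_skip_write(stored, damaged, original, trust_database=True)
# Checking the live file detects the difference and forces the write.
skip_with_live_check = can_skip_write(stored, damaged, original, trust_database=False)
```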

Comment 4 Fedora Admin XMLRPC Client 2012-04-13 23:07:40 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 5 Fedora Admin XMLRPC Client 2012-04-13 23:11:05 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 6 Panu Matilainen 2013-09-09 13:20:51 UTC
*** Bug 530284 has been marked as a duplicate of this bug. ***

Comment 7 Artem S. Tashkinov 2013-09-09 14:27:51 UTC
Yeah, a 4-year-old feature request marked as the duplicate; why not vice versa?

Comment 8 Jan Pokorný [poki] 2017-07-08 09:56:32 UTC
(Artem, I very much prefer temporal ordering to be preserved as well, but
sadly people occasionally, sometimes accidentally, don't share this view;
we have to deal with it.)

Thanks for making this idea happen (suggestion originators and Pavlína):
https://github.com/rpm-software-management/rpm/commit/29c48e14de414ca512e404a2d773c3fcb3578040

In the Fedora context, will there be any further integration,
e.g. distro-driven switch of "_minimize_writes" macro where suitable
(per SSD drive detection)?
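
As a rough sketch of what such detection could look like, assuming a Linux system: the kernel exposes a per-device flag in /sys/block/<device>/queue/rotational ("1" for rotational drives, "0" otherwise). The function names, the policy, and the mapping to 0/1 values for "_minimize_writes" are assumptions made for illustration, not something rpm or Fedora is known to do.

```python
from pathlib import Path


def is_rotational(device, sysfs_root="/sys/block"):
    """Read <sysfs_root>/<device>/queue/rotational.

    Returns True for "1", False for any other value, and None if the
    flag cannot be read (unknown device, non-Linux system, ...).
    """
    flag = Path(sysfs_root) / device / "queue" / "rotational"
    try:
        return flag.read_text().strip() == "1"
    except OSError:
        return None


def suggested_minimize_writes(device, sysfs_root="/sys/block"):
    """Hypothetical policy: suggest 1 only for a known non-rotational device.

    Anything rotational or undetectable keeps the assumed default of 0.
    """
    return 1 if is_rotational(device, sysfs_root) is False else 0
```

For example, `suggested_minimize_writes("sda")` would consult /sys/block/sda/queue/rotational. Mapping the filesystem that holds a given install path back to its block device is a separate problem and is not covered here.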

Comment 9 Panu Matilainen 2017-08-22 11:56:45 UTC
So, this is now in rawhide/F27 as of rpm >= 4.13.90.

Autodetection would be nice to have some day in the future, but it is not going to happen in 4.14.

Comment 10 Artem S. Tashkinov 2019-05-31 09:58:19 UTC
CC'ing Daniel Mach 'cause it looks like he's into it.

Comment 11 Artem S. Tashkinov 2019-05-31 09:58:58 UTC
And why was this bug closed again?