Bug 504579

Summary:	[RFE]: re-use hashes for files with the same device and inode numbers
Product:	[Fedora] Fedora	Reporter:	Daniel Mach <dmach>
Component:	coreutils	Assignee:	Ondrej Vasik <ovasik>
Status:	CLOSED WONTFIX	QA Contact:	Fedora Extras Quality Assurance <extras-qa>
Severity:	low	Docs Contact:
Priority:	low
Version:	rawhide	CC:	kdudka, ovasik, twaugh
Target Milestone:	---
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2012-12-11 12:24:39 UTC	Type:	---

Description Daniel Mach 2009-06-08 10:50:40 UTC

When md5sum is run on a bunch of files, some of them might be hardlinked, and I don't see any reason to compute the hash of the same content again.

It should be possible to implement hash caching without significant code changes:
- create a new function which handles a (dev, ino) -> hash cache
- call it instead of the original function
- on a cache hit, return the cached hash
- otherwise call the original function to compute the hash, store it in the cache and return it

Comment 1 Ondrej Vasik 2009-06-08 14:31:21 UTC

It's quite easy to handle this with a short wrapper shell script: get the device and inode of each file (ls, find, stat, whatever), sort the list by device and inode, and call md5sum/shaXXXsum only when the device and inode differ from those of the previous file. Because the list is sorted, hardlinks to the same content end up next to each other, so you only need to remember the last file.

At the moment the md5sum/shaXXXsum utilities do not call stat(2), just fopen(3), and AFAIK the file descriptor structure carries no information about the file's device/inode. So IMHO adding a cache, a stat call and dynamic memory allocation would make the compact code of md5sum.c much harder to read for quite a small benefit to common users. I'll check the upstream opinion before working on it...
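
For illustration, the sort-and-compare idea behind such a wrapper might look like the sketch below. It is written in Python rather than as the shell script mentioned above, and the function names `md5_of_file` and `md5_skipping_hardlinks` are hypothetical.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 16):
    """Hash the file contents in chunks and return the hex digest."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_skipping_hardlinks(paths):
    """Return {path: digest}, hashing each (device, inode) pair only once."""
    keyed = []
    for path in paths:
        st = os.stat(path)
        keyed.append(((st.st_dev, st.st_ino), path))
    # After sorting, hardlinks to the same inode are adjacent.
    keyed.sort()

    results = {}
    last_key, last_digest = None, None
    for key, path in keyed:
        if key != last_key:
            # New inode: hash it and remember only this last result.
            last_key, last_digest = key, md5_of_file(path)
        results[path] = last_digest
    return results
```

Only the most recent (device, inode) pair and its digest are kept, so no cache structure is needed beyond the sorted list itself.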