504579 – [RFE]: re-use hashes for files with the same device and inode numbers

Summary: [RFE]: re-use hashes for files with the same device and inode numbers

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	coreutils
Sub Component:
Version:	rawhide
Hardware:	All
OS:	Linux
Priority:	low
Severity:	low
Target Milestone:	---
Assignee:	Ondrej Vasik
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2009-06-08 10:50 UTC by Daniel Mach
Modified:	2012-12-11 12:24 UTC
CC List:	3 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2012-12-11 12:24:39 UTC
Type:	---
Embargoed:
Dependent Products:

Description Daniel Mach 2009-06-08 10:50:40 UTC

When md5sum is run on a bunch of files, some of them might be hardlinked, and I don't see any reason to compute the hash of the same content again.

It should be possible to implement hash caching without significant code changes:
- create a new function that manages a (dev, ino) -> hash cache
- call it instead of the original function
- on a cache hit, return the cached hash
- otherwise call the original function to compute the hash, store it in the cache, and return it

Comment 1 Ondrej Vasik 2009-06-08 14:31:21 UTC

It's quite easy to handle this with a short wrapper shell script: get the dev/inode info for each file (ls/find/stat/whatever), sort by device and inode, and call md5/shaxxxsum only when the device and inode differ from the previous file. Because the list is sorted, hardlinks with the same sums end up in a row, so you only need to remember the last file.

At the moment the md5/shaxxxsum utilities do not call stat(2), just fopen(3), and AFAIK the file descriptor structure holds no information about the dev/inode of the file. So IMHO adding a cache, stat calls, and dynamic memory allocation would make the compact code of md5sum.c much more difficult to read for quite a small benefit to common users. I'll check the upstream opinion before working on it...
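
The wrapper idea from this comment (sort by device and inode, then hash only when the pair changes) might look something like the sketch below. It is written in Python rather than shell purely for illustration; `md5_of` and `sums_skipping_hardlinks` are hypothetical names, and the sketch assumes every path can be stat'ed and read.

```python
import hashlib
import os


def md5_of(path, chunk_size=65536):
    """Hash a file's contents in chunks and return the hex digest."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sums_skipping_hardlinks(paths):
    """Map each path to its MD5, hashing each (dev, ino) pair only once."""
    # Collect (dev, ino, path) and sort so hardlinks become adjacent.
    keyed = []
    for p in paths:
        st = os.stat(p)
        keyed.append((st.st_dev, st.st_ino, p))
    keyed.sort()

    results = {}
    last_key, last_sum = None, None
    for dev, ino, p in keyed:
        if (dev, ino) != last_key:
            # New inode: compute the hash and remember only this last one.
            last_key, last_sum = (dev, ino), md5_of(p)
        results[p] = last_sum
    return results
```

As in the comment, only the most recent (dev, ino) pair and its sum are kept, so no general-purpose cache is needed once the input is sorted.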
