When md5sum is run on a bunch of files, some of them might be hardlinked, and I don't see any reason to compute the hash of the same content again. It should be possible to implement hash caching without significant code changes:
- create a new function which handles a (dev, ino) -> hash cache
- call it instead of the original function
- on a cache hit, return the cached hash
- otherwise, call the original function to compute the hash, store it in the cache and return it
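A minimal sketch of the proposed wrapper in C, under stated assumptions: `compute_digest`, `cached_digest` and `digest_computations` are hypothetical names, not identifiers from md5sum.c, and `compute_digest` uses a 64-bit FNV-1a hash purely as a placeholder for the real MD5/SHA routine so the sketch runs without a crypto library. Entries are only stored when `st_nlink > 1`, since a file with a single name cannot be hit again.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Counts how often the (expensive) digest was really computed. */
static unsigned long digest_computations = 0;

/* HYPOTHETICAL stand-in for the real digest function: 64-bit FNV-1a
   over the file contents. Returns 0 on success, -1 on error. */
static int compute_digest(const char *path, uint64_t *out)
{
    FILE *fp = fopen(path, "rb");
    if (!fp)
        return -1;
    uint64_t h = 0xcbf29ce484222325ULL;   /* FNV-1a 64-bit offset basis */
    int c;
    while ((c = getc(fp)) != EOF) {
        h ^= (unsigned char) c;
        h *= 0x100000001b3ULL;            /* FNV-1a 64-bit prime */
    }
    int err = ferror(fp);
    fclose(fp);
    if (err)
        return -1;
    digest_computations++;
    *out = h;
    return 0;
}

/* One cache entry: file identity (st_dev, st_ino) -> digest. */
struct cache_entry {
    dev_t dev;
    ino_t ino;
    uint64_t digest;
    struct cache_entry *next;
};

static struct cache_entry *cache_head = NULL;

/* Called instead of compute_digest(): returns the stored digest on a
   (dev, ino) hit, otherwise computes it and remembers it. A linked list
   keeps the sketch short; a real patch would want a hash table. */
static int cached_digest(const char *path, uint64_t *out)
{
    assert(path != NULL && out != NULL);
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;

    for (struct cache_entry *e = cache_head; e; e = e->next) {
        if (e->dev == st.st_dev && e->ino == st.st_ino) {
            *out = e->digest;             /* cache hit: reuse */
            return 0;
        }
    }

    if (compute_digest(path, out) != 0)   /* cache miss: compute */
        return -1;

    /* Only files with more than one name can be seen again. */
    if (st.st_nlink > 1) {
        struct cache_entry *e = malloc(sizeof *e);
        if (e) {                          /* out of memory: just skip caching */
            e->dev = st.st_dev;
            e->ino = st.st_ino;
            e->digest = *out;
            e->next = cache_head;
            cache_head = e;
        }
    }
    return 0;
}
```

Calling `cached_digest()` on a file and then on a hardlink to it computes the digest once. Note that the `stat()`-then-`fopen()` pair can race with a concurrent rename; opening the file first and calling `fstat()` on `fileno()` of the stream would close that gap.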
It's quite easy to handle this with a short wrapper shell script: get the device and inode of each file (via ls/find/stat/whatever), sort the list by device and inode, and call md5/shaxxxsum only when the device and inode differ from the previous file. Because the list is sorted, hardlinks (which have the same sum) end up next to each other, so you only need to remember the last file. At the moment the md5/shaxxxsum utilities do not call stat(2), just fopen(3), and AFAIK the FILE structure carries no information about the file's device/inode. So IMHO adding a cache, a stat call and dynamic memory allocation would make the compact code of md5sum.c much harder to read, for quite a small benefit to common users. I'll check the upstream opinion before working on it...
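The sort-and-compare-with-previous idea can also be sketched in C. This is a minimal illustration under stated assumptions: `hash_file` and `hash_sorted` are hypothetical names, and `hash_file` again uses 64-bit FNV-1a only as a placeholder for md5/sha. After sorting by (dev, ino), hardlinks are adjacent, so only the previous entry has to be remembered; each entry keeps its original index so results can be reported in command-line order.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* HYPOTHETICAL placeholder for md5/sha: 64-bit FNV-1a over the file. */
static int hash_file(const char *path, uint64_t *out)
{
    FILE *fp = fopen(path, "rb");
    if (!fp)
        return -1;
    uint64_t h = 0xcbf29ce484222325ULL;   /* FNV-1a 64-bit offset basis */
    int c;
    while ((c = getc(fp)) != EOF) {
        h ^= (unsigned char) c;
        h *= 0x100000001b3ULL;            /* FNV-1a 64-bit prime */
    }
    int err = ferror(fp);
    fclose(fp);
    if (err)
        return -1;
    *out = h;
    return 0;
}

struct item {
    dev_t dev;
    ino_t ino;
    size_t idx;                           /* position in the caller's list */
};

/* Orders items by device, then inode. */
static int cmp_item(const void *a, const void *b)
{
    const struct item *x = a, *y = b;
    if (x->dev != y->dev)
        return x->dev < y->dev ? -1 : 1;
    if (x->ino != y->ino)
        return x->ino < y->ino ? -1 : 1;
    return 0;
}

/* Fills digests[i] for paths[i]. Returns the number of files actually
   hashed, or -1 on error. */
static long hash_sorted(char *const paths[], size_t n, uint64_t digests[])
{
    assert(n == 0 || (paths != NULL && digests != NULL));
    if (n == 0)
        return 0;
    struct item *items = malloc(n * sizeof *items);
    if (!items)
        return -1;
    for (size_t i = 0; i < n; i++) {
        struct stat st;
        if (stat(paths[i], &st) != 0) {
            free(items);
            return -1;
        }
        items[i].dev = st.st_dev;
        items[i].ino = st.st_ino;
        items[i].idx = i;
    }
    qsort(items, n, sizeof *items, cmp_item);

    long hashed = 0;
    for (size_t i = 0; i < n; i++) {
        size_t k = items[i].idx;
        if (i > 0 && cmp_item(&items[i], &items[i - 1]) == 0) {
            digests[k] = digests[items[i - 1].idx];  /* same inode: reuse */
            continue;
        }
        if (hash_file(paths[k], &digests[k]) != 0) {
            free(items);
            return -1;
        }
        hashed++;
    }
    free(items);
    return hashed;
}
```

For two distinct files plus a hardlink to the first, only two files are hashed. Unlike the pipeline version, writing results back by original index keeps the output in argument order rather than inode order.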