Bug 624964
Summary: | Wasting lots of diskspace for /var/lib/yum/yumdb | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Zdenek Kabelac <zkabelac> |
Component: | yum | Assignee: | Seth Vidal <skvidal> |
Status: | CLOSED NOTABUG | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | medium | Docs Contact: | |
Priority: | low | ||
Version: | rawhide | CC: | fedora, ffesti, james.antill, maxamillion, pmatilai, tim.lauridsen |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Fixed In Version: | Doc Type: | Bug Fix | |
Last Closed: | 2010-08-18 14:21:56 UTC | Type: | --- |
Description
Zdenek Kabelac
2010-08-18 09:02:25 UTC
The latest yum (3.2.28) will now hardlink the information which is similar; you can see the best case for this with:

hardlink -c /var/lib/yumdb

...which we _might_ start calling sporadically from yum. It's true that even after that we will store a 4 byte file using 4k, on ext4 ... but that problem goes away with btrfs.

As for the rationale for why yumdb was designed/created as it was, feel free to read:

http://yum.baseurl.org/wiki/YumDB

IMHO wrong design is not excuse to keep it 'broken' forever. And I seriously doubt btrfs will solve such thing (at least not in any near future) - in fact if you track lkml - you might have noticed that current btrfs is actually extremely bad in storing lots of small files.

Of course pushing users to use one specific filesystem to get decent performance from yum tools is also quite 'strange' idea....

Using plain simple easy to use text file at least for package - would significantly reduce bloat - and would be still compliant to the URL you are mentioning. And of course sqlite/libdb isn't the smartest/fastest DB for this purpose.

I really think this issue deserves to be resolved differently than saying not a bug....

> IMHO wrong design is not excuse to keep it 'broken' forever.

Did you read the page? The most common operations are:

1. Read one entry.
2. Write one entry.

...using a single file for all addon data for a package would make #2 into a read-modify-write cycle, would introduce a lot of complexity as we'd need to serial/unserialize all the data (and likely need to change the API so we could only do a single write for multiple updates), would likely make hand editing the data with vim/etc. hard (if not virtually impossible).

And with 3.2.28, it will almost certainly make a bunch of operations _slower_ due to not only reading much more data and deserializing ... but the fact we can't skip the reads with the hardlinks.

Given that, I'm happy with "wrong and broken".
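
The trade-off being argued here can be made concrete with a small sketch. It is not yum's code: the function names, the JSON encoding, and the attribute keys used in the usage note are all assumptions chosen for illustration. It only shows why writing one entry is a single small file write in a one-file-per-attribute layout, but a read-modify-write of the whole record in a one-file-per-package layout.

```python
import json
import os


def write_attr_per_file(pkg_dir, key, value):
    """One file per attribute: a write touches only that attribute's file."""
    os.makedirs(pkg_dir, exist_ok=True)
    with open(os.path.join(pkg_dir, key), "w") as fh:
        fh.write(value)


def read_attr_per_file(pkg_dir, key):
    """Reading one attribute opens exactly one small file."""
    with open(os.path.join(pkg_dir, key)) as fh:
        return fh.read()


def write_attr_single_file(pkg_file, key, value):
    """One file per package: every write re-reads and rewrites all attributes."""
    data = {}
    if os.path.exists(pkg_file):
        with open(pkg_file) as fh:
            data = json.load(fh)      # read and deserialize everything
    data[key] = value                 # modify one entry
    tmp = pkg_file + ".tmp"
    with open(tmp, "w") as fh:
        json.dump(data, fh)           # serialize and write everything
    os.replace(tmp, pkg_file)         # atomically swap in the new file


def read_attr_single_file(pkg_file, key):
    """Reading one attribute still deserializes the whole record."""
    with open(pkg_file) as fh:
        return json.load(fh)[key]
```

For example, `write_attr_per_file("pkgdir", "from_repo", "fedora")` creates one file named `from_repo`, while the equivalent `write_attr_single_file("pkg.json", "from_repo", "fedora")` rewrites `pkg.json` in full. The single-file variant uses one filesystem block per package at the cost of the extra serialization work described above.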

> And I seriously doubt btrfs will solve such thing

You should let the btrfs developers know then, because they have explicitly told me that if you store 3 small files in a directory btrfs will store much closer to "wc -c" bytes than 12k.

> Of course pushing users to use one specific filesystem to get
> decent performance

Here by "performance" you mean disk space. And, yeh, if users need to save 5¢ in disk space ... my recommendation is they look at btrfs.

(In reply to comment #3)
> > IMHO wrong design is not excuse to keep it 'broken' forever.
>
> Did you read the page? The most common operations are:

Sure I did.

> 1. Read one entry.
>
> 2. Write one entry.
>
> ...using a single file for all addon data for a package would make #2 into a
> read-modify-write cycle, would introduce a lot of complexity as we'd need to

Well as I said - you could reuse lots of projects which already hide this complexity for you - so your tool will really read & write one entry - everything else stays hidden to you. It will be obviously faster than your current implementation - as only one block from filesystem will be taken. Everything else would be resolved in userspace land (you somehow forget that application should not offload its task to kernel - if it can do far better job in userspace).

No one else seems to be going in direction of storing 1 byte information in 1 file yet - as it is big waste of diskspace & CPU. And I hope you are not going to argue it's more safe - as I could keep this 30M database probably efficiently stored in less than 64KB - that gives extra space to keep nearly 500 copies of this db...

> serial/unserialize all the data (and likely need to change the API so we could
> only do a single write for multiple updates), would likely make hand editing
> the data with vim/etc. hard (if not virtually impossible).
> And with 3.2.28, it will almost certainly make a bunch of operations _slower_
> due to not only reading much more data and deserializing ... but the fact we
> can't skip the reads with the hardlinks.
>
> Given that, I'm happy with "wrong and broken".

It is so broken that I'm curious how you can't see that. Do you know how much data is written to SSD when you create those 7 small files? And how much CPU is wasted by looking at free space in filesystem?

> > And I seriously doubt btrfs will solve such thing
>
> You should let the btrfs developers know then, because they have explicitly
> told me that if you store 3 small files in a directory btrfs will store much
> closer to "wc -c" bytes than 12k.

They already know: http://lkml.org/lkml/2010/6/3/313

> > Of course pushing users to use one specific filesystem to get
> > decent performance
>
> Here by "performance" you mean disk space. And, yeh, if users need to save 5¢
> in disk space ... my recommendation is they look at btrfs.

Well so far reiserfs is much more reliable in this case - but still it's so inefficient that it's hard to believe you want to keep it this way forever. Is it really that bad to reconsider to keep at least package dir in one file?
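
Both sides of the thread argue about numbers (4 bytes stored in 4k, "wc -c" bytes versus 12k, 30M versus 64KB) that can be measured directly. The sketch below is one way to do that on a Unix system; `tree_usage` is a hypothetical helper, not part of yum. It assumes `st_blocks` is reported in 512-byte units, which is the POSIX convention, and it counts each hardlinked inode once so the effect of `hardlink` is visible. Space used by the directories themselves is not counted.

```python
import os
import stat


def tree_usage(root):
    """Return (file_count, apparent_bytes, allocated_bytes) for regular files.

    apparent_bytes is what "wc -c" would add up to for unique inodes;
    allocated_bytes is what the filesystem reports as actually reserved.
    """
    seen = set()
    files = apparent = allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue              # skip symlinks, sockets, and so on
            files += 1
            inode = (st.st_dev, st.st_ino)
            if inode in seen:
                continue              # hardlinked copy: no extra space used
            seen.add(inode)
            apparent += st.st_size
            allocated += st.st_blocks * 512   # assumes 512-byte block units
    return files, apparent, allocated
```

Calling `tree_usage("/var/lib/yum/yumdb")` (the path from the bug summary) and comparing the second and third values shows how much of the footprint is per-file allocation overhead on a given filesystem, which is the quantity the two commenters disagree about.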