Bug 1915976
| Summary: | DNF/RPM Copy on Write enablement for all variants | ||
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Ben Cotton <bcotton> |
| Component: | Changes Tracking | Assignee: | Matthew Almond <malmond> |
| Status: | ASSIGNED --- | QA Contact: | |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 39 | CC: | amoloney, chantr4, davide, dustymabe, fedoraproject, Hi-Angel, malmond, richardphibel, thrcka, walters |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Bug Depends On: | 1919003, 1922920 | ||
| Bug Blocks: | 2158243 | ||
|
Description
Ben Cotton
2021-01-13 20:58:19 UTC
Here's my plan

1. Fix the simple stuff in librepo PR
2. Re-write my super trivial dnf python plugin as a libdnf plugin. This eliminates a new top level package from Fedora. It will be a sub-package.
3. Prototype signature/digest verification during transcode (rpm2extents) in rpm. I think this is challenging, but not impossible.
4. Address other issues in rpm PR
5. Produce PRs for the src.fedoraproject.org/rpms packages. This is simple spec changes, and patches derived from PRs. Depending on rate of updates, I might get some or all of the upstream PRs accepted/merged. I expect each will need some kind of version bump so we can update rawhide
6. Performance numbers. I have another patch for rpm that adds a "measure" plugin. It's super hacky and needs a dnf plugin to collect the values. I aim to get some public numbers
7. I want to make the measure tool usable for others, so we go beyond "trust me" to something independently verifiable.

I am aiming for end of January for all this, and I realize that this is insanely optimistic.

For visibility, I'm giving a talk at CentOS Dojo 2021 @ Fosdem (https://hopin.com/events/centos-dojo-fosdem) on Feb 5th which is a 45 minute presentation. I aim to cover what we've got so far, and (more interestingly) what I think we can do next.

1. done
2. I thought I could avoid creating a new package in Fedora by contributing a change to libdnf. Turns out this is not the right approach. The right approach for libdnf is to use separate sources and reference libdnf-devel. Add to this the complexity that CentOS doesn't ship with -devel packages, and the complexity is much higher than I anticipated. My plan now: I *will* create a new package in Fedora, nominally called 'dnf-plugin-cow' that builds python3-dnf-plugin-cow (naming convention), and later will build libdnf-plugin-cow.
3. I will switch focus to this - I have partial / code. I need to spend more time on it.
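Items 6 and 7 of the plan call for numbers that others can reproduce. As a rough illustration only (this is not the rpm "measure" plugin, whose code is not shown in this bug), here is a minimal Python sketch that times a reflink clone against a plain byte copy on whatever filesystem the temporary directory lives on. `reflink_or_copy`, `plain_copy` and `timed` are hypothetical helper names; `FICLONE` is the Linux ioctl from `<linux/fs.h>`, so the sketch assumes Linux. It ignores page cache effects and does not fsync, so treat the output as a starting point rather than a benchmark.

```python
import fcntl
import os
import shutil
import tempfile
import time

# FICLONE from <linux/fs.h>: _IOW(0x94, 9, int). Linux-specific.
FICLONE = 0x40049409


def reflink_or_copy(src: str, dst: str) -> str:
    """Clone src to dst with FICLONE, falling back to a byte copy.

    Returns "reflink" or "copy" so the caller knows which path was timed.
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            return "reflink"
        except OSError:
            # No reflink support on this filesystem (ext4, tmpfs, ...),
            # or src and dst are on different filesystems.
            shutil.copyfileobj(fsrc, fdst)
            return "copy"


def plain_copy(src: str, dst: str) -> str:
    """Copy src to dst with an ordinary read/write loop."""
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        shutil.copyfileobj(fsrc, fdst)
    return "copy"


def timed(func, *args):
    """Run func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.bin")
        with open(src, "wb") as f:
            f.write(os.urandom(16 * 1024 * 1024))  # 16 MiB of test data
        how, t_clone = timed(reflink_or_copy, src, os.path.join(tmp, "clone.bin"))
        _, t_copy = timed(plain_copy, src, os.path.join(tmp, "copy.bin"))
        print(f"{how}: {t_clone:.4f}s, plain copy: {t_copy:.4f}s")
```

If the first label printed is `copy`, the filesystem did not accept the clone request and both timings measure a plain copy.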
Proposal authors say:

> Ballpark performance difference is about half the duration for file download+install time

It sounds all good, the problem is, reflinks are not necessarily faster than plain copy, so seeing actual numbers would be helpful.

Incidentally, just yesterday I benchmarked ccache with and without reflinks¹, and I found out that reflinks/CoW was consistently 30 *slower* in building libinput than plain copy. Of course `ccache` is not dnf, and since I'm no ccache dev, I can't know for sure that it isn't because ccache screwed up their reflinks usage badly. I doubt it though, in part because the benchmark also shows 30% less CPU usage, so apparently the excess time is spent doing IO, which is likely inside BTRFS.

Anyway, that's the reasoning behind the question on actual numbers.

1: https://github.com/ccache/ccache/issues/213#issuecomment-762714286

I wrote an excellent reply, and I just lost it all due to a form-repost action in Bugzilla. Gah. I'm going to summarize what I remember.

> 1. Fix the simple stuff in librepo PR

Done. Now we're pending PR1470 on rpm before this is accepted.

> 2. Re-write my super trivial dnf python plugin as a libdnf plugin. This eliminates a new top level package from Fedora. It will be a sub-package.

See https://github.com/facebookincubator/dnf-plugin-cow/issues/1

> 3. Prototype signature/digest verification during transcode (rpm2extents) in rpm. I think this is challenging, but not impossible.

There are two concerns:

## DoS of disk space on client

There's a potential DoS exploit where a rogue mirror could feed valid data into a compressor that could fill up the local storage during download. This sounds bad, but after looking at rpmfiArchiveReadToFilePsm() I see the header's idea of size is honored, so we only need to verify the header, not the header+payload.

## Potential of a Remote Code Execution (RCE) exploit on a vulnerable decompression library.
I've been experimenting with conditional transcoding dependent on file size. Files under a limit (e.g. 64MB?) can be buffered in memory, revealing the full file digest. It's possible to employ all the checks. The main restriction is that signatures with trusted key should not be transcoded. If a given file is over the limit, it is not transcoded. The bit that sucks here is that larger rpms are the ones that benefit most from transcoding.

> 4. Address other issues in rpm PR

https://github.com/rpm-software-management/rpm/pull/1470/commits

> 5. Produce PRs for the src.fedoraproject.org/rpms packages. This is simple spec changes, and patches derived from PRs. Depending on rate of updates, I might get some or all of the upstream PRs accepted/merged. I expect each will need some kind of version bump so we can update rawhide

I've got forks of rpm, librepo which I need to keep in sync with the PRs. I've made a COPR repo for testing on f33: https://copr.fedorainfracloud.org/coprs/malmond/rpmcow/

> 6. Performance numbers. I have another patch for rpm that adds a "measure" plugin. It's super hacky and needs a dnf plugin to collect the values. I aim to get some public numbers
> 7. I want to make the measure tool usable for others, so we go beyond "trust me" to something independently verifiable.

The goal is to satisfy: https://fedoraproject.org/wiki/Changes/RPMCoW#Performance_Metrics . I've added a new section https://fedoraproject.org/wiki/Changes/RPMCoW#update_2021-01-31 to explain progress. I'm going to use bug 1922920 to track this work.

I've been communicating with the maintainer of RPM on the pull request and it's become clear that this likely depends on the creation of a public, supportable API for RPM. This is not achievable within the window for Fedora 34, so I'm withdrawing the change for Fedora 34 at this time. I will continue to work on this, and expect to re-submit for Fedora 35.
[1] https://github.com/rpm-software-management/rpm/pull/1470#issuecomment-772410935

(In reply to Matthew Almond from comment #5)
> I'm withdrawing the change for Fedora 34 at this time. I will
> continue to work on this, and expect to re-submit for Fedora 35.

You don't need to withdraw it if you don't want to. We can defer it to F35. However, if you think the end result will be significantly different from your F34 proposal, then it's better to withdraw and resubmit at a later time. I'll let you decide which approach is more appropriate.

Didn't know that was an option here[1]. I do prefer to defer - the substance of the change proposal doesn't change, just some of the implementation details.

[1] based on https://docs.fedoraproject.org/en-US/program_management/changes_policy/

Okay, I'll update the appropriate trackers, etc. I'll also add improving that documentation to explicitly address deferring changes.

Deferring to F36 per malmond in chat.

It'd be clearer to call this change something like "DNF/RPM Copy on Write" i.e. drop the "for all variants", since it doesn't apply for rpm-ostree based systems.

This bug appears to have been reported against 'rawhide' during the Fedora Linux 36 development cycle. Changing version to 36.

Today we reached the Code Complete (testable) milestone in the F36 schedule: https://fedorapeople.org/groups/schedule/f-36/f-36-key-tasks.html

All code for this change should be complete enough for testing. You can indicate this by setting the bug status to MODIFIED. (If the code is fully complete, you can go ahead and set it to ON_QA.)

If you need to defer this Change to F37, please needinfo bcotton.

Work has been resumed on RPMCoW, progress made, but this is definitely not on schedule for F36. Let's defer to F37.

Will do, thanks!

Today we reached the Code Complete (Testable) milestone on the F37 schedule: https://fedorapeople.org/groups/schedule/f-37/f-37-key-tasks.html

At this time, all F37 Changes should be complete enough to be testable.
You can indicate this by setting this tracker to the MODIFIED status. If the Change is 100% code complete, you can set the tracker to ON_QA. If you need to defer this Change to F38, please NEEDINFO me. Changes that have not reached at least the MODIFIED status will be given to FESCo for evaluation of contingency plans.

This is still in progress, we're going to defer to Fedora 38.

Deferring to F38, thanks!

Today we reached the Code Complete (Testable) milestone on the F38 schedule: https://fedorapeople.org/groups/schedule/f-38/f-38-key-tasks.html

At this time, all F38 Changes should be complete enough to be testable. You can indicate this by setting this tracker to the MODIFIED status. If the Change is 100% code complete, you can set the tracker to ON_QA. If you need to defer this Change to F39, please NEEDINFO me. Changes that have not reached at least the MODIFIED status will be given to FESCo for evaluation of contingency plans.

Let's defer to F39 please, https://github.com/rpm-software-management/rpm/pull/2378 is the latest iteration on this but it's not ready to merge yet.

Paperwork handled |
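
For readers following the size-gated transcoding idea discussed earlier in this bug (buffer files under a limit in memory, check the full file digest, and leave larger files untranscoded), a minimal Python sketch of that decision might look like the following. It is an illustration under stated assumptions, not the rpm2extents code: `maybe_transcode` and the `transcode` callback are hypothetical names, the expected SHA-256 is assumed to come from already trusted metadata, and the 64 MiB default only mirrors the "64MB?" figure floated in the comment.

```python
import hashlib
import io
from typing import BinaryIO, Callable, Optional

# Assumption: mirrors the "e.g. 64MB?" limit mentioned in the discussion.
SIZE_LIMIT = 64 * 1024 * 1024


def maybe_transcode(
    stream: BinaryIO,
    expected_sha256: str,
    transcode: Callable[[bytes], bytes],
    limit: int = SIZE_LIMIT,
) -> Optional[bytes]:
    """Transcode a package only if it fits in memory and its digest matches.

    Returns the transcoded bytes, or None when the file is over the limit
    and is therefore left untranscoded. Raises ValueError on a digest
    mismatch, before any transcoding (and so any decompression) happens.
    """
    # Read one byte past the limit so "exactly at the limit" and
    # "over the limit" can be told apart.
    data = stream.read(limit + 1)
    if len(data) > limit:
        return None
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        raise ValueError("digest mismatch; refusing to transcode")
    # `transcode` is a hypothetical stand-in for the real conversion step.
    return transcode(data)
```

A caller could then do something like `maybe_transcode(io.BytesIO(blob), digest, convert)` and fall back to the normal install path when the result is `None`. The sketch assumes a buffered stream whose `read(n)` returns `n` bytes unless it hits end of file.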