Bug 1535637
| Field | Value |
| --- | --- |
| Summary | Free disk space check (for raw image) ignores sparseness |
| Product | Red Hat OpenStack |
| Component | openstack-ironic |
| Version | 16.1 (Train) |
| Target Milestone | beta |
| Target Release | 17.0 |
| Status | CLOSED ERRATA |
| Severity | medium |
| Priority | medium |
| Hardware | Unspecified |
| OS | Unspecified |
| Keywords | Reopened, Triaged |
| Reporter | Ian Pilcher <ipilcher> |
| Assignee | Steve Baker <sbaker> |
| QA Contact | mlammon |
| CC | akaris, bfournie, jkreger, mburns, pweeks, rhel-osp-director-maint, sbaker, shdunne, srevivo |
| Fixed In Version | openstack-ironic-17.0.4-0.20210828041812.43cc2b6.el8 |
| Type | Bug |
| Last Closed | 2022-09-21 12:07:40 UTC |
Description (Ian Pilcher, 2018-01-17 19:17:37 UTC)

Related doc bug - https://bugzilla.redhat.com/show_bug.cgi?id=1532745

Comment 2 (Dmitry Tantsur)

Thanks for the report. To be honest, I'm not sure how to solve it. Do you by chance know a way to figure out the raw image size before converting it?

(In reply to Dmitry Tantsur from comment #2)
> Thanks for the report. To be honest, I'm not sure how to solve it. Do you by
> chance know a way to figure out the raw image size before converting it?

I don't think that there's any way to know *exactly* how much space the sparse raw image will use, but I think that the size of the QCOW file (assuming that's what we're dealing with) is a reasonable proxy. Maybe check for 2X the size of the QCOW file and/or change from a fatal error to a warning?

Yeah, this may work.
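As an aside, `qemu-img info` can report an image's fully-allocated (virtual) size alongside its on-disk size; the virtual size is exactly the apparent size a raw conversion will have, and an upper bound on what it can allocate. A minimal sketch (the image path is hypothetical):

```python
# Sketch: compare a QCOW2 image's on-disk size with the virtual (raw) size
# that `qemu-img info` reports. The virtual size is the upper bound a raw
# conversion can occupy; actual usage may be less if the result is sparse.
import json
import subprocess

def image_sizes(path):
    """Return (actual_size, virtual_size) in bytes as reported by qemu-img."""
    out = subprocess.check_output(
        ["qemu-img", "info", "--output=json", path])
    info = json.loads(out)
    return info["actual-size"], info["virtual-size"]

# Hypothetical image path, for illustration only.
actual, virtual = image_sizes("overcloud-full.qcow2")
print(f"on disk: {actual} bytes, fully-allocated raw size: {virtual} bytes")
```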
Comment 5 (Ilya Etingof)

> I don't think that there's any way to know *exactly* how much space the
> sparse raw image will use, but I think that the size of the QCOW file
> (assuming that's what we're dealing with) is a reasonable proxy. Maybe
> check for 2X the size of the QCOW file and/or change from a fatal error
> to a warning?
Just wanted to note some edge cases for us to consider:

1. Generally, the image size growth factor depends on the source and
destination image formats. For instance, if our source image is already
`raw`, the factor would be 1X rather than 2X.

2. I am guessing that even for the `qcow` format, the 2X growth factor may
not always hold. My concern is that if you take a freshly built `qcow` file
and do a bunch of writes, it will grow (because of copy-on-write stacking).
However, when you convert such a "used" `qcow` into `raw`, the overwritten
data may collapse (only the latest version of each block ends up in the
`raw` image).

3. Finally, `qcow` files can optionally be compressed, which may influence
their growth when dumped into `raw`.

4. Purely theoretically (noting just for posterity), not all filesystems/OSes
support sparse files, though modern filesystems and Linux seem to handle
sparse files pretty well (see the sketch after this list). On the other hand,
Solaris and network filesystems might not be that advanced in that regard.
Does this require further research?
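To illustrate the sparseness point in item 4: on a filesystem that supports holes, a file's apparent size and its actual block allocation can differ dramatically. A minimal sketch (the file path is hypothetical):

```python
# Sketch: create a 1 GiB sparse file and compare its apparent size
# (st_size) with the space actually allocated (st_blocks * 512).
import os

path = "/tmp/sparse-demo.raw"  # hypothetical path
with open(path, "wb") as f:
    f.truncate(1024 ** 3)  # 1 GiB apparent size; no blocks written

st = os.stat(path)
print(f"apparent size: {st.st_size} bytes")          # ~1073741824
print(f"allocated:     {st.st_blocks * 512} bytes")  # ~0 on ext4/XFS
os.remove(path)
```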
Given all these complications, I am thinking that maybe we could just bite
the bullet and try writing the destination image regardless of its expected
size, while watching the free space being consumed. When we [somehow]
observe that free space is running low, we just abort the conversion and
clean up the remnants?
(In reply to Ilya Etingof from comment #5)
> Given all these complications, I am thinking that maybe we could just bite
> the bullet and try writing the destination image regardless of its expected
> size, while watching the free space being consumed. When we [somehow]
> observe that free space is running low, we just abort the conversion and
> clean up the remnants?

Makes sense to me.

Comment 7 (Ilya Etingof)

The tricky part, however, is how to watch the growing space allocation on the file system. I can imagine SIGSTOP'ing the running qemu-img periodically, checking free space, cleaning up some caches, and SIGCONT'ing qemu-img for some more time. Besides the general weirdness of such a design, it seems non-trivial to implement within the oslo process execution facilities. Therefore, please welcome the quick and straightforward fix: https://review.openstack.org/#/c/544839/

(In reply to Ilya Etingof from comment #7)
> The tricky part, however, is how to watch the growing space allocation on
> the file system.

There's always the wait-for-it-to-die-and-clean-up approach ...

> Therefore, please welcome the quick and straightforward fix:
>
> https://review.openstack.org/#/c/544839/

I, for one, welcome our new heuristic overlords! ;-)

Patches seem to have stalled; do we still plan on making this change?

I've re-read the original bug filing, and I'm wondering if we're overthinking this issue. The issue at hand is that we have guard rails for the size check that don't account for virtual vs. physical size. At what point do we "really" need a size check beyond having some percentage of free space? I guess this heads into the debate of "gracefully fail" vs. "hard failure" vs. "failure where corrective action can be taken or a forward path identified (i.e. an alarm goes off in a monitoring system) and maybe a periodic task begins to delete old files."

Upstream patches have stalled; we're closing this as won't-fix for now based on Julia's comment 11.

I just encountered this in OSP 16.1, and the upstream patch changed recently, in October: https://review.opendev.org/c/openstack/ironic/+/544839 I am reopening this issue; feel free to close it again, but given that the upstream bug moved, it might make sense to keep this open?

I refreshed the upstream patch; we'll have a talk about what to do with this bug.

The RHOS-15 flag is probably wrong, but otherwise, what's missing here to move this bug to the next step?

The community originally pushed back against the original code proposal, and the patches stalled until someone else (sbaker in this case) picked it up and revised the check based on the discussion. That patch landed in time for OSP 17, so I don't believe any additional action is really required. I anticipate we would possibly get pushback if we tried to backport this upstream down to Train; as such, I've updated the tag for 17, and I'm moving it to MODIFIED state and removing the triaged flag so the team revisits this item for discussion.

I'll set this back to MODIFIED when I've dug up a package with the fix.

Steve, reminder to follow up on this, please.

I've confirmed this change is in the latest RHOSP-17.0 build package.
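For posterity, the SIGSTOP/SIGCONT scheme Ilya floated in comment 7 (and dismissed as awkward) would look roughly like this; a sketch only, with a hypothetical qemu-img invocation and threshold:

```python
# Sketch: periodically pause a running qemu-img with SIGSTOP, check free
# space while it is frozen, then resume it with SIGCONT. Rejected upstream
# as awkward; shown only to illustrate the idea being discussed.
import shutil
import signal
import subprocess
import time

MIN_FREE = 1 * 1024 ** 3  # hypothetical 1 GiB floor

proc = subprocess.Popen(
    ["qemu-img", "convert", "-O", "raw", "image.qcow2", "/var/lib/image.raw"])
while proc.poll() is None:
    time.sleep(5)                       # let it convert for a while
    if proc.poll() is not None:
        break                           # finished while we slept
    proc.send_signal(signal.SIGSTOP)    # freeze qemu-img
    free = shutil.disk_usage("/var/lib").free
    if free < MIN_FREE:
        proc.kill()                     # abort instead of resuming
        proc.wait()
        break
    proc.send_signal(signal.SIGCONT)    # let it run some more
```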
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:6543