Bug 1775875 - 'oc adm release extract' uses low-sensitivity match-and-replace and silently ignores partial writes
Summary: 'oc adm release extract' uses low-sensitivity match-and-replace and silently ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: oc
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
unspecified
medium
Target Milestone: ---
: 4.5.0
Assignee: W. Trevor King
QA Contact: zhou ying
URL:
Whiteboard:
Depends On:
Blocks: 1778882
TreeView+ depends on / blocked
 
Reported: 2019-11-23 05:18 UTC by W. Trevor King
Modified: 2020-07-13 17:12 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
: 1778882 (view as bug list)
Environment:
Last Closed: 2020-07-13 17:12:14 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github openshift oc pull 167 0 'None' closed Bug 1775875: pkg/cli/admin/release/extract_tools: Pass []replacements to copyAndReplace 2020-09-22 20:36:14 UTC
Red Hat Product Errata RHBA-2020:2409 0 None None None 2020-07-13 17:12:31 UTC

Description W. Trevor King 2019-11-23 05:18:59 UTC
Since it landed way back in 2019-04-04 [1], there have been some bugs in the extraction match-and-replace implementation:

* It uses len(value) of the marker when searching the incoming bytes. Especially for short strings like version replacement (which is new in 4.3 [2]), this meant we were only looking for five bytes for 4.2.7, which is \x00_REL. That's not very specific. It does not distinguish between \x00_RELEASE_IMAGE_LOCATION_\x00XX... and \x00_RELEASE_VERSION_LOCATION_\x00XX.... It doesn't even distinguish between the defaultVersionPadded and defaultVersionPrefix constants in pkg/version. We should search for the full marker, regardless of len(value).

* I has:

    nextOffset := end - len(marker)
    ...
    _, wErr := w.Write(buf[:end-nextOffset])
    ...
    copy(buf[:nextOffset], buf[end-nextOffset:end])
    offset = nextOffset

  That is problematic for a few reasons:

  * It doesn't write much data. Substituting for nextOffset, we were writing to:

          end - nextOffset
        = end - (end - len(marker))
        = len(marker)
        	  ```

    this makes the buffer size largely meaningless, and means lots of inefficient, small reads/writes.  We should try to write everything except the *last* `len(marker)`.

  * It ignores the amount of data written. You can want to write 1k bytes but only actually write 50 bytes with a given Write call.  That didn't happen often before; because of the previous list entry we were only attempting to write a hundred or so bytes at a time. But once we start trying to write the bulk of the buffer size, it will happen more often. We should keep attempting Write until it errors out on us or we finish pushing all the bytes we no longer need.

[1]: https://github.com/openshift/origin/pull/22439#event-2252366741
[2]: https://github.com/openshift/oc/pull/88

Comment 2 zhou ying 2019-12-18 01:16:28 UTC
Compared with old version, the issue still  could reproduce :

[root@dhcp-140-138 ~]# ./oc-old version
Client Version: version.Info{Major:"", Minor:"", GitVersion:"openshift-clients-4.2.0-201909020729", GitCommit:"9ff96feb1aea1217938e2f1aeaf0be091cc59728", GitTreeState:"clean", BuildDate:"2019-09-02T15:22:22Z", GoVersion:"go1.12.8", Compiler:"gc", Platform:"linux/amd64"}

[root@dhcp-140-138 ~]# strace -fo strace-43 ./oc-old adm release extract --command=oc registry.svc.ci.openshift.org/ocp/release:4.4.0-0.nightly-2019-12-17-223355

[root@dhcp-140-138 ~]# oc version 
Client Version: v4.4.0
[root@dhcp-140-138 ~]# strace -fo strace-44 oc adm release extract --command=oc registry.svc.ci.openshift.org/ocp/release:4.4.0-0.nightly-2019-12-17-223355

[root@dhcp-140-138 ~]# grep write strace-44 |tail -n 20
5251  write(12, "ine a default period\\n\"\n\"\\t\\tbef"..., 16384 <unfinished ...>
5251  <... write resumed> )             = 16384
5251  write(12, "rt of the kubernetes api and als"..., 16384) = 16384
5249  write(11, "\27\3\3\0%\0\0\0\0\0\0\3\201\303\354\7\tX\320 \262\364;\32kS\322\372/\256\302\250"..., 42 <unfinished ...>
5249  <... write resumed> )             = 42
5249  write(12, "s of all of the pods in the clus"..., 16384) = 16384
5249  write(12, "xisting image, ignored otherwise"..., 16384) = 16384
5249  write(12, "\nmsgid \"\"\n\"The resource requirem"..., 16384) = 16384
5249  write(12, "h the app=nginx label\\n\"\n\"\\t\\t# "..., 16384) = 16384
5249  write(12, "# Listen on ports 5000 and 6000 "..., 16384) = 16384
5249  write(12, "<arg1> ... <argN>\\n\"\n\"\\n\"\n\"\\t\\t#"..., 16384 <unfinished ...>
5249  <... write resumed> )             = 16384
5249  write(12, " autoscaler can automatically in"..., 16384 <unfinished ...>
5249  <... write resumed> )             = 16384
5249  write(12, "e spec must be provided. This ca"..., 16384) = 16384
5249  write(12, "' and the value 'my \"\n\"frontend "..., 16384 <unfinished ...>
5249  <... write resumed> )             = 16384
5249  write(12, "kubernetes/blob/master/pkg/kubec"..., 16384) = 16384
5249  write(12, "ed object.\"\nmsgstr \"The name for"..., 11928 <unfinished ...>
5249  <... write resumed> )             = 11928
[root@dhcp-140-138 ~]# 
[root@dhcp-140-138 ~]# grep write strace-43 |tail -n 20
5291  write(10, " 'foo' only if the resource is u"..., 16384) = 16384
5291  write(10, "ine a default period\\n\"\n\"\\t\\tbef"..., 16384) = 16384
5291  write(10, "rt of the kubernetes api and als"..., 16384) = 16384
5280  write(9, "\27\3\3\0%\0\0\0\0\0\0\5\326\227\26\0\\\345\31q\345\r\230\230|CH\243\332\222(3"..., 42) = 42
5280  write(10, "s of all of the pods in the clus"..., 16384) = 16384
5280  write(10, "xisting image, ignored otherwise"..., 16384 <unfinished ...>
5280  <... write resumed> )             = 16384
5280  write(10, "\nmsgid \"\"\n\"The resource requirem"..., 16384 <unfinished ...>
5280  <... write resumed> )             = 16384
5280  write(10, "h the app=nginx label\\n\"\n\"\\t\\t# "..., 16384) = 16384
5280  write(10, "# Listen on ports 5000 and 6000 "..., 16384) = 16384
5280  write(10, "<arg1> ... <argN>\\n\"\n\"\\n\"\n\"\\t\\t#"..., 16384 <unfinished ...>
5280  <... write resumed> )             = 16384
5280  write(10, " autoscaler can automatically in"..., 16384 <unfinished ...>
5280  <... write resumed> )             = 16384
5280  write(10, "e spec must be provided. This ca"..., 16384) = 16384
5280  write(10, "' and the value 'my \"\n\"frontend "..., 16384 <unfinished ...>
5280  <... write resumed> )             = 16384
5280  write(10, "kubernetes/blob/master/pkg/kubec"..., 16384) = 16384
5280  write(10, "ed object.\"\nmsgstr \"The name for"..., 11928) = 11928

Comment 3 W. Trevor King 2019-12-18 03:29:07 UTC
Ah, I think I understand what's wrong with our verification procedure.  By looking at the tail, we're only hitting the post-replacement phase where even the old code made nice, fat writes.  I'll see if I can get a more-specific test to use...

Comment 4 Maciej Szulik 2020-02-27 16:46:44 UTC
I'm pushing this over to 4.5, since this is not a regression and I don't think we'll be able to fix that in 4.4.
Trevor any chances you can pick it up in 4.5 or share what/how needs changing?

Comment 5 W. Trevor King 2020-04-02 15:41:06 UTC
I'll take another look, but https://github.com/openshift/oc/pull/167 did not get reverted, so I'm pretty happy with where things stand ;).  The issue may just be how we're trying to verify existing behavior, but I'm not sure.

Comment 6 Maciej Szulik 2020-04-03 11:07:54 UTC
Based on Trevor previous comment I'd move this for verification, we can revisit this at any time.

Comment 8 W. Trevor King 2020-04-04 02:21:57 UTC
This failed verification back in comment 2.  I think it's unlikely that the current code will pass verification unless we either land additional patches or figure out why the previous verification procedure didn't turn out like we'd expected.

Comment 10 W. Trevor King 2020-04-09 16:44:50 UTC
> ... or figure out why the previous verification procedure didn't turn out like we'd expected.


I still don't have a procedure for this, but if nobody else thinks one up I will work one up eventually.

Comment 12 W. Trevor King 2020-04-13 19:52:53 UTC
If folks want to test this, it's been in 4.3 oc since:

$ git --no-pager log -1 'origin/release-4.3^{/replacements to copyAndReplace}'
commit a82d2ce4efdbfa9836480d8567acccfde19d297b
Merge: 57997ba65 960c5070b
Author: OpenShift Merge Robot <openshift-merge-robot.github.com>
Date:   Tue Dec 3 06:06:00 2019 -0800

    Merge pull request #198 from openshift-cherrypick-robot/cherry-pick-167-to-release-4.3
    
    Bug 1778882: pkg/cli/admin/release/extract_tools: Pass []replacements to copyAndReplace

and 4.4 oc since:

$ git --no-pager log -1 'origin/release-4.4^{/replacements to copyAndReplace}'
commit d3f137b710de331abc946b9ad42e42cc8d5ded52
Merge: a8894f3e0 267009391
Author: OpenShift Merge Robot <openshift-merge-robot.github.com>
Date:   Mon Dec 2 16:22:23 2019 -0800

    Merge pull request #167 from wking/release-extract-cleanup
    
    Bug 1775875: pkg/cli/admin/release/extract_tools: Pass []replacements to copyAndReplace

So anything 4.3.0 [1] or later should be fine for testing.  Double-checking:

$ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.3.0-x86_64 | grep cli-artifacts
  cli-artifacts                                 https://github.com/openshift/oc                                            6a937dfe56ff26255d09702c69b8406040c14505
$ git log --first-parent --oneline origin/release-4.3 | grep 'a82d2ce4\|6a937dfe'
6a937dfe5 Merge pull request #208 from openshift-cherrypick-robot/cherry-pick-207-to-release-4.3
a82d2ce4e Merge pull request #198 from openshift-cherrypick-robot/cherry-pick-167-to-release-4.3

[1]: https://mirror.openshift.com/pub/openshift-v4/clients/ocp/4.3.0/

Comment 14 zhou ying 2020-04-30 02:14:08 UTC
Hi W. Trevor King :

Since this issue has been On_QA for a long time, I'm wondering do we have some method  to verify this issue ? thanks.

Comment 15 zhou ying 2020-06-19 06:03:11 UTC
I've checked no regression issue for command `oc adm release extract` , will move to verified.

Comment 17 W. Trevor King 2020-06-20 02:44:13 UTC
I guess I'm off-the-hook for figuring out why my previous testing plan wasn't giving the expected results :p

Comment 19 errata-xmlrpc 2020-07-13 17:12:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409


Note You need to log in before you can comment on or make changes to this bug.