Bug 2080006 - Pipelines that worked fine on operator version 1.5 are failing in version 1.7
Summary: Pipelines that worked fine on operator version 1.5 are failing in version 1.7
Keywords:
Status: VERIFIED
Alias: None
Product: Red Hat OpenShift Pipelines
Classification: Red Hat
Component: pipelines
Version: 1.7
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Vincent Demeester
QA Contact: Ruchir Garg
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-04-28 17:26 UTC by Anand Paladugu
Modified: 2023-09-09 03:20 UTC
CC List: 9 users

Fixed In Version: 1.7.1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-09-28 10:13:19 UTC
Target Upstream Version:
Embargoed:



Description Anand Paladugu 2022-04-28 17:26:20 UTC
Description of problem:

The pipeline fails when more than 2694 characters are written to a task's result, with the error message "could not find result with name RESULT_STRING for task generate-result".

The customer never had an issue with these pipelines on pipelines operator version 1.5.

Version-Release number of selected component (if applicable):

OCP 4.10, Pipelines operator version 1.7

How reproducible:

Every time

Steps to Reproduce:
1. Run a pipeline in which one task generates a result of more than 2694 characters that is consumed by another task.
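The reproduction step above can be sketched as a minimal Tekton Pipeline. The names (`large-result-repro`, `RESULT_STRING`) and the image are illustrative assumptions, not taken from the customer's actual pipeline; the producing task writes a ~3000-character result, which exceeds the observed 2694-character threshold:

```yaml
# Hypothetical reproducer sketch: one task emits an oversized result,
# a second task consumes it via a parameter.
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: large-result-repro   # hypothetical name
spec:
  tasks:
    - name: generate-result
      taskSpec:
        results:
          - name: RESULT_STRING
        steps:
          - name: generate
            image: registry.access.redhat.com/ubi8/ubi-minimal  # assumed image
            script: |
              # Emit a ~3000-character result, above the observed limit
              head -c 3000 /dev/zero | tr '\0' 'x' > $(results.RESULT_STRING.path)
    - name: consume-result
      params:
        - name: input
          value: $(tasks.generate-result.results.RESULT_STRING)
      taskSpec:
        params:
          - name: input
        steps:
          - name: consume
            image: registry.access.redhat.com/ubi8/ubi-minimal
            script: |
              echo "$(params.input)" | wc -c
```

On operator 1.7, a PipelineRun created from a pipeline like this should fail with the "could not find result" error described above, while the generate-result TaskRun itself succeeds.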

Actual results:

PipelineRun fails with error message "could not find result with name RESULT_STRING for task generate-result"


Expected results:

PipelineRun should succeed.

Additional info:

The taskRun that generates the result completes successfully.

Comment 3 Anand Paladugu 2022-04-29 14:03:51 UTC
Yeah, I can change it to reflect the actual problem (things worked fine on 1.5 but failing on the 1.7 operator).

Comment 4 Anand Paladugu 2022-05-02 14:56:22 UTC
@vdemeest 

From the issue https://github.com/tektoncd/pipeline/issues/4808, I notice that you suggested the following, among others:

1. Split results over multiple steps in case of multiple results, maybe by having results per step and TaskResults submitting all?


But from the emitting results page, I note the following:

If your Task writes a large number of small results, you can work around this limitation by writing each result from a separate Step so that each Step has its own termination message. If a termination message is detected as being too large, the TaskRun will be placed into a failed state with the following message: "Termination message is above max allowed size 4096, caused by large task result." Because Tekton also uses the termination message for some internal information, the real available size is less than 4096 bytes.
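The workaround quoted above can be sketched as follows. This is a hypothetical Task (the name, result names, and image are illustrative, not from the customer's environment); each result is written by its own step, so each step gets its own termination message and therefore its own ~4K budget:

```yaml
# Hypothetical sketch: one step per result, one termination message per step.
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: multi-step-results   # hypothetical name
spec:
  results:
    - name: first
    - name: second
  steps:
    - name: write-first
      image: registry.access.redhat.com/ubi8/ubi-minimal  # assumed image
      script: |
        echo "value-one" > $(results.first.path)
    - name: write-second
      image: registry.access.redhat.com/ubi8/ubi-minimal
      script: |
        echo "value-two" > $(results.second.path)
```

Note that this helps with many *small* results; it does not raise the per-step size limit for a single large result, which is the situation in this bug.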


The above seems to imply that we already have per-step results in place, and also that a warning would be issued if the message exceeds the 4k limit?

Also, how much of the 4K is used for internal information?

Thx

Anand

Comment 6 Anand Paladugu 2022-05-03 18:43:44 UTC
@vdemeest   Thanks.


It seems the 12k limit is due to other constraints in the architecture. A previous PR says this:

"I have a separate PR that will be part of this proposal to limit number of bytes for container termination message. I think the limit has to be around 5-15k because we have a hard limit in etcd and size is very important. Algorithm would probably be max size / containers per container."

So you are right in saying that the only option would be to reduce the number of containers.

Is there any way the customer can achieve that now without code changes?

Comment 7 Vincent Demeester 2022-05-04 08:25:25 UTC
@apaladug as of today, not really. If the `Task` has multiple steps, the workaround would be to group steps that use the same image into a single step using a script.
The PR https://github.com/tektoncd/pipeline/pull/4826 should remove one init container, and effectively make the result size the same as before 1.7. We will try to backport it to 1.7.x as well.
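The step-grouping workaround mentioned above might look like the following sketch. The step names, image, and commands are illustrative assumptions; the point is only that fewer containers leave a larger share of the termination-message budget per container:

```yaml
# Hypothetical before/after sketch of grouping steps that share an image.

# Instead of two steps using the same image...
steps:
  - name: build
    image: registry.access.redhat.com/ubi8/ubi  # assumed image
    script: make build
  - name: test
    image: registry.access.redhat.com/ubi8/ubi
    script: make test

# ...group them into a single scripted step, reducing the container count:
steps:
  - name: build-and-test
    image: registry.access.redhat.com/ubi8/ubi
    script: |
      make build
      make test
```

The trade-off is coarser step granularity in TaskRun status and logs, in exchange for fewer containers counted against the result-size budget.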

Comment 8 Vincent Demeester 2022-05-09 14:50:06 UTC
This has been backported to the upcoming 1.7.1 and is currently being validated by QA.

Comment 9 Anand Paladugu 2022-05-10 00:07:07 UTC
Thank you. I have created a KCS in the interim.

Comment 11 Vincent Demeester 2022-09-28 06:59:52 UTC
Looking at the date and the 1.7.1 numbers, this should already have been released, as we are now working on 1.7.3 (for bugfixes). So I would assume this has been fixed and can be closed.
cc @ppitonak @varadhya

Comment 12 Veeresh Aradhya 2022-09-28 10:13:19 UTC
Marking the bug as closed, as the fix has been released as part of 1.8.0 and 1.7.3.

