1908704 – OpenShift Dockerfile build slowness

Summary: OpenShift Dockerfile build slowness

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Build
Sub Component:
Version:	4.6
Hardware:	x86_64
OS:	Linux
Priority:	urgent
Severity:	urgent
Target Milestone:	---
Target Release:	4.8.0
Assignee:	Adam Kaplan
QA Contact:	Michael Nguyen
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-12-17 11:47 UTC by Vinu K
Modified:	2024-06-13 23:45 UTC (History)
CC List:	16 users (show)
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-03-05 17:38:39 UTC
Target Upstream Version:
Embargoed:

Attachments
Build logs (275.04 KB, text/plain) 2020-12-17 11:47 UTC, Vinu K

Description Vinu K 2020-12-17 11:47:24 UTC

Created attachment 1739955
Build logs

Description of problem:

An OpenShift Dockerfile build takes 40 seconds to complete a simple RUN instruction such as 'mkdir foo'. With the build log level set to 10, the build appears to be stuck at the following message:

---
stdio is not a terminal, defaulting to not using a terminal
---

Version-Release number of selected component (if applicable):

OpenShift 4.6

How reproducible:

Hard to reproduce in another environment

Steps to Reproduce:

1. oc project foo

2. cat << EOF | oc new-build --dockerfile=- --to=bar
   FROM registry.access.redhat.com/ubi8/ubi
   RUN mkdir /tmp/data1
   RUN mkdir /tmp/data2
   RUN mkdir /tmp/data3
   ENTRYPOINT ["sleep", "infinity"]
   EOF

3. oc start-build bar --follow=true --wait=true --build-loglevel=10 | tee build-bar.log

Actual results:

Each RUN instruction in the Dockerfile takes 40 seconds to complete.

Expected results:

Each RUN instruction completes in one second.

Additional info:

Build logs are attached.

Comment 2 Adam Kaplan 2020-12-22 15:00:01 UTC

Tested on GCP running OCP 4.9. Could not reproduce this issue - `mkdir` commands take no more than 1 second to complete.

Comment 4 Adam Kaplan 2020-12-23 15:36:46 UTC

Correction - test was running 4.6.9 on GCP.

Comment 22 Timothée Ravier 2021-01-25 17:25:01 UTC

As a comparison point, can they try building the same Dockerfile with podman directly on a node via oc debug node/... ?
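One way to run that comparison is sketched below. It wraps the Dockerfile from the report in a helper that runs `podman build` on the node through `oc debug`. The helper name `compare_build_on_node` and the image tag `localhost/bar-compare` are made up for illustration, and the sketch assumes `podman` and `bash` are available on the node's host filesystem.

```shell
# Sketch: build the Dockerfile from this report with podman on a node and time it.
# compare_build_on_node is a hypothetical helper name, not an oc subcommand.
compare_build_on_node() {
  node="$1"   # node name as shown by `oc get nodes`
  # `oc debug node/...` starts a debug pod on the node; `chroot /host` switches
  # to the host filesystem so the podman installed on the node is used.
  oc debug "node/${node}" -- chroot /host bash -c '
set -e
cd "$(mktemp -d)"
cat > Dockerfile <<EOF
FROM registry.access.redhat.com/ubi8/ubi
RUN mkdir /tmp/data1
RUN mkdir /tmp/data2
RUN mkdir /tmp/data3
ENTRYPOINT ["sleep", "infinity"]
EOF
# --no-cache forces every RUN step to execute instead of reusing cached layers.
time podman build --no-cache -t localhost/bar-compare .
'
}
```

Calling `compare_build_on_node <node-name>` on the node where the slow build pod ran gives a wall-clock figure to set against the 40 seconds per RUN step seen in the OpenShift build. The test image stays on the node afterwards and may need to be removed by hand.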

Comment 51 Adam Kaplan 2021-03-05 17:38:39 UTC

Root Cause:

Some customers run the Dynatrace OneAgent operator on their clusters. OneAgent by default enables automatic "deep" monitoring of all processes, which causes the performance of OpenShift Builds to degrade significantly [1]. Any fix to address the performance degradation would need to be provided by Dynatrace (in partnership with Red Hat if necessary).

Work Around:

OpenShift admins who install Dynatrace OneAgent can configure Dynatrace to exclude deep monitoring of certain workloads [2]. Admins can add a monitoring rule which excludes all OpenShift Builds, if that is desired.

Admins can also use Tolerations and Node Selectors to isolate Builds from nodes that run Dynatrace OneAgent. This could be accomplished as follows:

1. Add node labels and taints to the build worker nodes
a. Taint worker nodes to be used for builds with a desired key, value, and the `NoSchedule` effect:
```
$ oc adm taint nodes <worker-node> build-node=true:NoSchedule
```
b. Label these worker nodes with a desired key and value. These can be the same as above:
```
$ oc label node <worker-node> build-node=true
```
2. Alternatively, add or update the labels and taints on a MachineSet, like so [3]:
```
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
...
spec:
  template: # this is the template for the Machines to be provisioned
    metadata:
      labels:
        build-node: "true"
    ...
    spec:
      metadata: # this is metadata applied to all Nodes underlying the MachineSet
        labels:
          build-node: "true"
      taints: # taints applied to all Nodes underlying the MachineSet
      - effect: NoSchedule
        key: build-node
        value: "true"
```

3. Set up a cluster-wide BuildOverride that allows builds to tolerate the "build-node" taint and forces builds onto the labeled build-nodes [4].

```
$ oc edit build.config.openshift.io/cluster

spec:
  buildOverrides:
    nodeSelector:
      build-node: "true"
    tolerations:
    - effect: NoSchedule
      key: build-node
      operator: Exists
```

4. Deploy Dynatrace OneAgent via OperatorHub. The agents will not tolerate the custom "build-node" taint by default and therefore will not run on these nodes.
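
To check that the steps above had the intended effect, a few read-only queries can help. This is a sketch only: the helper names are hypothetical, the `build-node=true` label and taint key are the example values used above, and the build name passed to `show_build_pod_node` (for example `bar-1`) is whatever `oc get builds` reports in the current project.

```shell
# List the nodes that carry the build-node label.
list_build_nodes() {
  oc get nodes -l build-node=true -o name
}

# Show the taint keys set on each labeled build node.
show_build_node_taints() {
  oc get nodes -l build-node=true \
    -o 'custom-columns=NAME:.metadata.name,TAINTS:.spec.taints[*].key'
}

# Show which node a build ran on. Build pods are named <build-name>-build.
show_build_pod_node() {
  build="$1"
  oc get pod "${build}-build" -o wide
}

# List every pod scheduled on a node, to confirm no OneAgent pod runs there.
pods_on_node() {
  oc get pods --all-namespaces -o wide --field-selector "spec.nodeName=$1"
}
```

If `show_build_pod_node` reports a node that `list_build_nodes` does not return, the cluster-wide build override is probably not being applied to that build.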

[1] https://access.redhat.com/solutions/4978291
[2] https://www.dynatrace.com/support/help/shortlink/process-group-monitoring#enable-automatic-deep-monitoring
[3] https://docs.openshift.com/container-platform/4.7/machine_management/creating_machinesets/creating-machineset-aws.html
[4] https://docs.openshift.com/container-platform/4.7/cicd/builds/build-configuration.html

Comment 56 Red Hat Bugzilla 2023-09-15 00:53:12 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days
