Bug 1908704 - OpenShift Dockerfile build slowness
Summary: OpenShift Dockerfile build slowness
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Build
Version: 4.6
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.8.0
Assignee: Adam Kaplan
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-12-17 11:47 UTC by Vinu K
Modified: 2024-06-13 23:45 UTC
CC List: 16 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-05 17:38:39 UTC
Target Upstream Version:
Embargoed:


Attachments
Build logs (275.04 KB, text/plain)
2020-12-17 11:47 UTC, Vinu K

Description Vinu K 2020-12-17 11:47:24 UTC
Created attachment 1739955
Build logs

Description of problem:

An OpenShift Dockerfile build takes 40 seconds to complete a simple RUN instruction such as 'mkdir foo'. With build log level 10, the log shows the build stalling after this line:

---
stdio is not a terminal, defaulting to not using a terminal
---

Version-Release number of selected component (if applicable):

OpenShift 4.6

How reproducible:

Hard to reproduce in another environment

Steps to Reproduce:

1. oc project foo

2. cat << EOF | oc new-build --dockerfile=- --to=bar
   FROM registry.access.redhat.com/ubi8/ubi
   RUN mkdir /tmp/data1
   RUN mkdir /tmp/data2
   RUN mkdir /tmp/data3
   ENTRYPOINT ["sleep", "infinity"]
   EOF

3. oc start-build bar --follow=true --wait=true --build-loglevel=10 | tee build-bar.log

Actual results:

Each RUN instruction in the Dockerfile takes 40 seconds to complete.

Expected results:

Each RUN instruction completes in about one second.

Additional info:

Build logs are attached.

Comment 2 Adam Kaplan 2020-12-22 15:00:01 UTC
Tested on GCP running OCP 4.9. Could not reproduce this issue - `mkdir` commands take no more than 1 second to complete.

Comment 4 Adam Kaplan 2020-12-23 15:36:46 UTC
Correction - test was running 4.6.9 on GCP.

Comment 22 Timothée Ravier 2021-01-25 17:25:01 UTC
As a comparison point, can they try building the same Dockerfile with podman directly on a node via `oc debug node/...`?

Comment 51 Adam Kaplan 2021-03-05 17:38:39 UTC
Root Cause:

Some customers run the Dynatrace OneAgent operator on their clusters. OneAgent by default enables automatic "deep" monitoring of all processes, which causes the performance of OpenShift Builds to degrade significantly [1]. Any fix to address the performance degradation would need to be provided by Dynatrace (in partnership with Red Hat if necessary).

Work Around:

OpenShift admins who install Dynatrace OneAgent can configure Dynatrace to exclude deep monitoring of certain workloads [2]. Admins can add a monitoring rule which excludes all OpenShift Builds, if that is desired.

Admins can also use Tolerations and Node Selectors to isolate Builds from nodes that run Dynatrace OneAgent. This could be accomplished as follows:

1. Add node labels and taints to the build worker nodes
  a. Taint worker nodes to be used for builds with a desired key, value, and the `NoSchedule` effect:
    ```
    $ oc taint node <worker-node> build-node=true:NoSchedule
    ```
  b. Label these worker nodes with a desired key and value. These can be the same as above:
    ```
    $ oc label node <worker-node> build-node=true
    ```
2. Alternatively (instead of step 1), add or update the labels and taints on a MachineSet so that every Node it provisions carries them, like so [3]:
  ```
  apiVersion: machine.openshift.io/v1beta1
  kind: MachineSet
  ...
  spec:
    template: # this is the template for the Machines to be provisioned
      metadata:
        labels:
          build-node: "true"
      ...
      spec:
        metadata: # this is metadata applied to all Nodes underlying the MachineSet
          labels:
            build-node: "true"
        taints: # taints applied to all Nodes underlying the MachineSet
        - effect: NoSchedule
          key: build-node
          value: "true"
  ```

3. Set up a cluster-wide BuildOverride that allows builds to tolerate the "build-node" taint and forces builds onto the labeled build-nodes [4].

```
$ oc edit build.config.openshift.io/cluster

spec:
  buildOverrides:
    nodeSelector:
      build-node: "true"
    tolerations:
    - effect: NoSchedule
      key: build-node
      operator: Equal
      value: "true"
```

4. Deploy Dynatrace OneAgent via OperatorHub. The agents will not tolerate the custom "build-node" taint by default and therefore will not run on these nodes.
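To make the scheduling logic behind steps 3 and 4 concrete, here is a minimal Python sketch. It is a simplified model, not the real Kubernetes scheduler: it only checks that a pod's nodeSelector matches the node's labels and that every `NoSchedule` taint on the node is tolerated. The toleration rule is modeled on Kubernetes' `Toleration.ToleratesTaint` (an empty effect or key matches anything, `Exists` ignores the value, and `Equal`, the default operator, requires an exact value match). All node and pod names below are hypothetical.

```python
# Simplified model of two Kubernetes scheduling filters (NOT the real scheduler):
#   1. the pod's nodeSelector must match the node's labels
#   2. every NoSchedule taint on the node must be tolerated by the pod
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str


@dataclass(frozen=True)
class Toleration:
    key: str
    effect: str = ""          # empty effect matches every effect
    operator: str = "Equal"   # Kubernetes defaults to Equal
    value: str = ""


@dataclass
class Node:
    name: str
    labels: dict[str, str] = field(default_factory=dict)
    taints: list[Taint] = field(default_factory=list)


@dataclass
class Pod:
    name: str
    node_selector: dict[str, str] = field(default_factory=dict)
    tolerations: list[Toleration] = field(default_factory=list)


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Modeled on the matching rule of Kubernetes' Toleration.ToleratesTaint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    return tol.value == taint.value  # "Equal" (or empty) operator


def fits(pod: Pod, node: Node) -> bool:
    """True if the pod passes both simplified filters for this node."""
    selector_ok = all(node.labels.get(k) == v for k, v in pod.node_selector.items())
    # Only NoSchedule taints are modeled; other effects are ignored here.
    taints_ok = all(
        any(tolerates(t, taint) for t in pod.tolerations)
        for taint in node.taints
        if taint.effect == "NoSchedule"
    )
    return selector_ok and taints_ok


build_taint = Taint("build-node", "true", "NoSchedule")
build_node = Node("worker-build", {"build-node": "true"}, [build_taint])
app_node = Node("worker-app")

# Build pod after the BuildOverride: nodeSelector plus a matching toleration.
build_pod = Pod("bar-1-build", {"build-node": "true"},
                [Toleration("build-node", "NoSchedule", "Equal", "true")])
# Hypothetical OneAgent pod with no toleration for the custom taint.
agent_pod = Pod("oneagent-example")
# Toleration that omits the value while keeping the default Equal operator.
no_value_pod = Pod("build-no-value", {"build-node": "true"},
                   [Toleration("build-node", "NoSchedule")])

print(fits(build_pod, build_node))     # True
print(fits(build_pod, app_node))       # False: nodeSelector not satisfied
print(fits(agent_pod, build_node))     # False: taint not tolerated
print(fits(agent_pod, app_node))       # True
print(fits(no_value_pod, build_node))  # False: "" != "true" under Equal
```

The last case illustrates why a toleration's value matters: under the default `Equal` operator, a toleration without `value: "true"` does not match the `build-node=true:NoSchedule` taint, while `operator: Exists` would match regardless of the value.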

[1] https://access.redhat.com/solutions/4978291
[2] https://www.dynatrace.com/support/help/shortlink/process-group-monitoring#enable-automatic-deep-monitoring
[3] https://docs.openshift.com/container-platform/4.7/machine_management/creating_machinesets/creating-machineset-aws.html
[4] https://docs.openshift.com/container-platform/4.7/cicd/builds/build-configuration.html

Comment 56 Red Hat Bugzilla 2023-09-15 00:53:12 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

