Created attachment 1739955 [details]
Build logs

Description of problem:
An OpenShift Dockerfile build takes 40 seconds to complete a simple RUN instruction like 'mkdir foo'. With build log level 10, the log shows the build stuck at the following message:

---
stdio is not a terminal, defaulting to not using a terminal
---

Version-Release number of selected component (if applicable):
OpenShift 4.6

How reproducible:
Hard to reproduce in another environment

Steps to Reproduce:
1. oc project foo
2. cat << EOF | oc new-build --dockerfile=- --to=bar
FROM registry.access.redhat.com/ubi8/ubi
RUN mkdir /tmp/data1
RUN mkdir /tmp/data2
RUN mkdir /tmp/data3
ENTRYPOINT ["sleep", "infinity"]
EOF
3. oc start-build bar --follow=true --wait=true --build-loglevel=10 | tee build-bar.log

Actual results:
Each RUN instruction in the Dockerfile takes 40 seconds to complete.

Expected results:
Each RUN instruction completes in about one second.

Additional info:
Build logs are attached.
Tested on GCP running OCP 4.9. Could not reproduce this issue - `mkdir` commands take no more than 1 second to complete.
Correction - test was running 4.6.9 on GCP.
As a comparison point, can they try building the same Dockerfile with podman directly on a node via oc debug node/... ?
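A minimal sketch of that comparison, assuming podman is installed on the node and the node can pull the UBI image. It is meant to be run on the node itself, for example from `oc debug node/<node-name>` after `chroot /host`. The image tag `bar-test` and the temporary directory are placeholders, not anything from the original report.

```shell
#!/bin/sh
# Sketch: build the Dockerfile from the reproducer with podman and report
# how long the whole build takes. Assumes this runs directly on a node.

workdir=$(mktemp -d)

# Same Dockerfile as in the steps to reproduce.
cat > "$workdir/Dockerfile" << 'EOF'
FROM registry.access.redhat.com/ubi8/ubi
RUN mkdir /tmp/data1
RUN mkdir /tmp/data2
RUN mkdir /tmp/data3
ENTRYPOINT ["sleep", "infinity"]
EOF

if command -v podman >/dev/null 2>&1; then
    start=$(date +%s)
    # --no-cache forces every RUN step to execute again.
    podman build --no-cache -t bar-test "$workdir"
    end=$(date +%s)
    echo "podman build took $((end - start)) seconds"
else
    echo "podman not found; run this on the node" >&2
fi
```

If the podman build on the node is also slow, the delay is probably not specific to the OpenShift build pod.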
Root Cause:
Some customers run the Dynatrace OneAgent operator on their clusters. By default, OneAgent enables automatic "deep" monitoring of all processes, which significantly degrades the performance of OpenShift Builds [1]. Any fix for the performance degradation would need to be provided by Dynatrace (in partnership with Red Hat if necessary).

Workaround:
OpenShift admins who install Dynatrace OneAgent can configure Dynatrace to exclude certain workloads from deep monitoring [2]. Admins can add a monitoring rule which excludes all OpenShift Builds, if that is desired.

Admins can also use Tolerations and Node Selectors to keep Builds off the nodes that run Dynatrace OneAgent. This could be accomplished as follows:

1. Add node labels and taints to the build worker nodes

a. Taint the worker nodes to be used for builds with a desired key, value, and the `NoSchedule` effect:

```
$ oc taint node <worker-node> build-node=true:NoSchedule
```

b. Label these worker nodes with a desired key and value. These can be the same as above:

```
$ oc label node <worker-node> build-node=true
```

2. Alternatively, add or update the labels and taints on a MachineSet, like so [3]:

```
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
...
spec:
  template: # this is the template for the Machines to be provisioned
    metadata:
      labels:
        build-node: "true"
        ...
    spec:
      metadata: # this is metadata applied to all Nodes underlying the MachineSet
        labels:
          build-node: "true"
      taints: # taints applied to all Nodes underlying the MachineSet
      - effect: NoSchedule
        key: build-node
        value: "true"
```

3. Set up a cluster-wide BuildOverride that allows builds to tolerate the "build-node" taint and forces builds onto the labeled build nodes [4]:

```
$ oc edit build.config.openshift.io/cluster

spec:
  buildOverrides:
    nodeSelector:
      build-node: "true"
    tolerations:
    - effect: NoSchedule
      key: build-node
      operator: Exists # tolerate the taint regardless of its value
```

4. Deploy Dynatrace OneAgent via OperatorHub. The agents will not tolerate the custom "build-node" taint by default and therefore will not run on these nodes.

[1] https://access.redhat.com/solutions/4978291
[2] https://www.dynatrace.com/support/help/shortlink/process-group-monitoring#enable-automatic-deep-monitoring
[3] https://docs.openshift.com/container-platform/4.7/machine_management/creating_machinesets/creating-machineset-aws.html
[4] https://docs.openshift.com/container-platform/4.7/cicd/builds/build-configuration.html
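One way to check that the workaround above took effect is sketched below. It assumes `oc` is logged in with cluster-admin rights, that the build lives in the `foo` project from the reproducer, and that OneAgent pods run in a namespace called `dynatrace`. That namespace name is an assumption; set `ONEAGENT_NS` to wherever the agent is actually deployed. The function names are illustrative only.

```shell
#!/bin/sh
# Sketch: confirm that build nodes carry the label and taint, and compare
# pod placement for builds and OneAgent.

# Assumption: OneAgent namespace; override with ONEAGENT_NS=<namespace>.
ONEAGENT_NS=${ONEAGENT_NS:-dynatrace}

# List nodes labeled build-node=true together with their taint keys.
list_build_nodes() {
    oc get nodes -l build-node=true \
        -o custom-columns='NAME:.metadata.name,TAINT_KEYS:.spec.taints[*].key'
}

# Show which node each pod in a namespace was scheduled on (NODE column).
show_pod_placement() {
    oc get pods -n "$1" -o wide
}

if command -v oc >/dev/null 2>&1; then
    list_build_nodes
    show_pod_placement foo             # build pods, e.g. bar-1-build
    show_pod_placement "$ONEAGENT_NS"  # OneAgent pods
else
    echo "oc not found; run this from a machine with cluster access" >&2
fi
```

Build pods should appear only on the nodes listed by the first command, and no OneAgent pod should be scheduled on those nodes.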
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days