This bug has been migrated to another issue tracking site. It has been closed here and may no longer be monitored.

If you would like to get updates for this issue, or to participate in it, you may do so at Red Hat Issue Tracker.
Bug 2243672 - [DPDK checkup] Teardown should happen immediately after setup failure
Summary: [DPDK checkup] Teardown should happen immediately after setup failure
Keywords:
Status: CLOSED MIGRATED
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Networking
Version: 4.14.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.15.0
Assignee: Orel Misan
QA Contact: Yossi Segev
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-10-12 13:53 UTC by Yossi Segev
Modified: 2023-12-14 16:16 UTC (History)

Fixed In Version: v4.15.0.rhel9-1377
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-12-14 16:16:14 UTC
Target Upstream Version:
Embargoed:


Attachments
DPDK checkup resources (3.58 KB, application/zip), 2023-10-12 13:53 UTC, Yossi Segev


Links
System ID                                        Private  Priority  Status  Summary                            Last Updated
Github kiagnose kubevirt-dpdk-checkup pull 196   0        None      open    Checkup, setup: Add setup context  2023-10-17 08:15:20 UTC
Red Hat Issue Tracker CNV-33899                  0        None      None    None                               2023-12-14 16:16:13 UTC

Description Yossi Segev 2023-10-12 13:53:23 UTC
Created attachment 1993631: DPDK checkup resources

Description of problem:
When the DPDK checkup setup fails (e.g. due to an invalid parameter in the ConfigMap, as in https://bugzilla.redhat.com/show_bug.cgi?id=2188244), the teardown should occur immediately rather than waiting for the configured timeout (which is set in the job's ConfigMap).


Version-Release number of selected component (if applicable):
CNV 4.14.0
container-native-virtualization/kubevirt-dpdk-checkup-rhel9:v4.14.0-116


How reproducible:
Always


Steps to Reproduce:
1. Apply the attached resources in their numeric order.
$ oc apply -f 1-dpdk-checkup-mcp.yaml
$ oc apply -f 2-dpdk-checkup-performance-profile.yaml
...
Change the parameters according to your environment.
Note that the ConfigMap (6-dpdk-checkup-configmap.yaml) has this invalid parameter on purpose:
spec.param.trafficGenTargetNodeName: "invalid-node-name"


Actual results:
The job fails and teardown removes all created resources (pods and VMIs), but it only happens after the timeout that is configured in the ConfigMap (10 minutes in this example).


Expected results:
If an invalid parameter is entered and the checkup fails, it should fail immediately and tear down all the created resources right away.

Comment 1 Orel Misan 2023-10-16 08:04:40 UTC
If there is a problem with the creation of either of the VMIs - they are deleted immediately.

After the creation of the two VMIs, there is a wait for both of the VMIs to be ready (the wait is serial).
This wait is bounded by the overall timeout, specified in the user-supplied ConfigMap.

The solution needs to limit the setup timeout to a certain time - after which the setup will fail and the VMIs will be deleted.
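
To illustrate the idea, here is a minimal sketch in Python (not the checkup's actual code) of a serial readiness wait that has its own setup deadline and deletes the VMIs when that deadline passes. The names `wait_for_vmis_ready`, `run_setup`, `is_ready` and `delete_vmi` are hypothetical, and the injectable `clock`/`sleep` parameters are only there so the behaviour can be exercised without real waiting.

```python
import time
from typing import Callable, Sequence


class SetupTimeoutError(Exception):
    """Raised when a VMI is not ready before the setup deadline."""


def wait_for_vmis_ready(
    vmi_names: Sequence[str],
    is_ready: Callable[[str], bool],      # hypothetical readiness probe
    setup_timeout: float,                 # seconds allowed for the whole setup wait
    poll_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    # A single deadline is shared by all VMIs, so waiting serially
    # cannot add up to more than setup_timeout in total.
    deadline = clock() + setup_timeout
    for name in vmi_names:
        while not is_ready(name):
            if clock() >= deadline:
                raise SetupTimeoutError(f"VMI {name!r} was not ready before the setup deadline")
            sleep(poll_interval)


def run_setup(
    vmi_names: Sequence[str],
    is_ready: Callable[[str], bool],
    delete_vmi: Callable[[str], None],    # hypothetical cleanup hook
    setup_timeout: float,
    **wait_kwargs,
) -> None:
    try:
        wait_for_vmis_ready(vmi_names, is_ready, setup_timeout, **wait_kwargs)
    except SetupTimeoutError:
        # Setup failed: delete the VMIs now instead of waiting
        # for the overall checkup timeout to expire.
        for name in vmi_names:
            delete_vmi(name)
        raise
```

With this shape, a VMI that can never become ready (for example, one targeting a non-existent node) causes cleanup as soon as the setup deadline passes, regardless of how long the overall timeout is.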

Comment 2 Yossi Segev 2023-10-23 08:06:20 UTC
The implemented solution grants the setup a 10-minute grace period to succeed; if the setup fails, teardown happens once this period ends.
So the teardown does not occur immediately, but rather after this 10-minute timeout (unless `spec.timeout` in the job's ConfigMap is less than 10 minutes).
To verify this bug, set `spec.timeout` to 15m and verify that the teardown occurs after 10 minutes.
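
The arithmetic behind this verification can be sketched as follows. This is an illustrative Python snippet, not the checkup's code: `SETUP_GRACE` and `effective_setup_timeout` are hypothetical names, and the only facts taken from this comment are the 10-minute grace period and the rule that a shorter `spec.timeout` takes precedence.

```python
from datetime import timedelta

# Grace period for setup, as described in this comment.
SETUP_GRACE = timedelta(minutes=10)


def effective_setup_timeout(spec_timeout: timedelta,
                            grace: timedelta = SETUP_GRACE) -> timedelta:
    """Return how long setup may run before teardown starts.

    The setup wait is capped by the grace period, but it can never
    outlast the overall timeout taken from `spec.timeout`.
    """
    return min(spec_timeout, grace)


# spec.timeout = 15m: teardown is expected after the 10-minute grace period.
print(effective_setup_timeout(timedelta(minutes=15)))
# spec.timeout = 5m: the overall timeout is shorter, so it wins.
print(effective_setup_timeout(timedelta(minutes=5)))
```

Setting `spec.timeout` above 10 minutes, as in the 15m case, is what makes the two timeouts distinguishable during verification.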

