Bug 1999689

Summary: Integrate upgrade testing from ocs-ci to the acceptance job for final builds before important milestones
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Raz Tamir <ratamir>
Component: build
Assignee: Zack Cerza <zcerza>
Status: CLOSED ERRATA
QA Contact: Petr Balogh <pbalogh>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.9
CC: bniver, branto, ebenahar, etamir, madam, muagarwa, ocs-bugs, odf-bz-bot, pbalogh, rcyriac, sostapov, tmuthami, zcerza
Target Milestone: ---
Target Release: ODF 4.10.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-04-13 18:49:40 UTC
Type: Bug

Description Raz Tamir 2021-08-31 15:06:32 UTC
In addition to the acceptance testing for new builds, we need upgrade testing to run for builds at important milestones (FF, DF, RCs).

Comment 3 Boris Ranto 2021-09-30 07:12:05 UTC
Hi Raz,

can you be more specific as to what this is supposed to encompass? Why is this assigned to the build team? Do you want this to be fully automated, i.e. running the upgrade testing for important builds? Can you point us to a suite?

We could schedule another ocs-ci run for RC builds, I suppose. (In our terminology, RC builds include the DF build; in fact, that is the RC0 build.) You could also implement that automation in QE Jenkins by listening for UMB messages and looking for the RELEASE_CANDIDATE field.

Regards,
Boris

Comment 4 Raz Tamir 2021-10-25 17:13:27 UTC
Hi Boris,

The request here is to have the upgrade test suite run for the release milestones in addition to the acceptance tests that are running today.
We've seen upgrades break, and this is caught only in QE testing. That is what we are trying to improve: the build should be rejected if the upgrade is not working properly (similar to deployment).
QE has all the automation in place; this just needs to be integrated into the acceptance pipeline.

This is assigned to the build team because I was told to file a BZ for this request.
Please let me know if you want me to create a ticket somewhere else.

Comment 5 Boris Ranto 2021-10-27 12:26:07 UTC
OK, it makes more sense now. Reversing the Assignee/QA Contact roles as you will be implementing the feature and we will be technically testing it. :)

Comment 6 Michael Adam 2021-11-09 15:17:23 UTC
(In reply to Boris Ranto from comment #5)
> OK, it makes more sense now. Reversing the Assignee/QA Contact roles as you
> will be implementing the feature and we will be technically testing it. :)

Boris, I think there is a misunderstanding. Let me explain:

Upgrade tests exist. The task is for the dev team to add a run of the upgrade tests to the dev-ci pipeline in addition to the existing run of the acceptance test suite. Both the existing acceptance test suite and the upgrade test suite would need to pass for the build to be marked stable (at least at certain milestones).

So it is correctly assigned to the build team.

Comment 7 Boris Ranto 2021-11-09 18:48:37 UTC
In that case, we need more information/details on this. What is the name of the suite? Any special configurations for the suite? How does it work -- do we need a separate OCP deployment or can we combine it somehow with the acceptance suite? What upgrade do you actually want to test?

This was requested for special builds only (DF/FF/RC) but if we do it that way, any non-RC build could simply overwrite the latest-stable tag without passing the upgrade tests.

If you want the test to run automatically for DF/FF/RC, you can set up the job (even in your own Jenkins) to listen for a UMB message with RC set in it (we currently set the RC field for both FF and RC builds, and we could do that for the DF build as well). That might actually be a better option for you, as it would give you complete control over what is actually being tested.
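A minimal sketch of the consumer-side filter described above, in Python. Only the RELEASE_CANDIDATE field name comes from this thread; the flat dict layout, the `BUILD` key, the build labels, and the helper name `should_run_upgrade` are hypothetical assumptions, and connecting to the message bus itself is out of scope.

```python
from typing import Any, Mapping


def should_run_upgrade(message: Mapping[str, Any]) -> bool:
    """Return True when a decoded build message is flagged as a release candidate.

    Hypothetical schema: a top-level "RELEASE_CANDIDATE" key that is absent or
    empty for ordinary builds and set (e.g. "RC1") for FF/RC builds.
    """
    return bool(message.get("RELEASE_CANDIDATE"))


# Hypothetical, already-decoded messages for illustration only.
messages = [
    {"BUILD": "build-1", "RELEASE_CANDIDATE": ""},
    {"BUILD": "build-2", "RELEASE_CANDIDATE": "RC1"},
    {"BUILD": "build-3"},
]
to_test = [m["BUILD"] for m in messages if should_run_upgrade(m)]
print(to_test)  # ['build-2']
```

Treating a missing key and an empty value the same way keeps ordinary builds out of the upgrade run even if the producer omits the field entirely.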

Comment 21 Petr Balogh 2022-01-12 23:03:58 UTC
Hello Zack,

Sorry for the late reply.
I provided some feedback here:
https://github.com/red-hat-storage/ocs-ci/pull/5298#issuecomment-1011527471

Can you please join the automation meeting tomorrow so we can sync up there?
Thanks

Comment 22 Zack Cerza 2022-01-12 23:33:41 UTC
No problem Petr, here's my response: https://github.com/red-hat-storage/ocs-ci/pull/5298#issuecomment-1011548060

I can join tomorrow for a short time, but I do have an overlapping meeting.

Comment 23 Mudit Agarwal 2022-02-08 11:56:15 UTC
The PR is merged, is anything else pending here?

Comment 24 Elad 2022-02-09 20:42:10 UTC
Hi Tamil, Zack, Boris,

Please let us know if any help is needed. 
It is extremely important to have OCS upgrade be part of the acceptance pipeline, not only for validating the builds' functionality but also for frequently testing OCS upgrade with the OCP nightly builds, in order to catch any regression caused by OCP, such as bug 2035484.

Comment 25 Zack Cerza 2022-02-09 22:31:24 UTC
(In reply to Elad from comment #24)
> Hi Tamil, Zack, Boris,
> 
> Please let us know if any help is needed. 
> It is extremely important to have OCS upgrade being part of the acceptance
> pipeline, not only for the builds' functionality validation but also for
> frequently testing OCS upgrade with the OCP nightly builds, in order to catch
> up any regression caused by OCP, such as bug 2035484

Are you saying that we want the normal acceptance tests plus the upgrade+acceptance tests to gate tagging as latest-stable, or just that we want to execute upgrade+acceptance in addition to the normal acceptance tests?

All that's remaining is getting an answer to the above, and implementing the trigger - I'm assuming we want to trigger the upgrades in exactly the same way that we trigger the other job; please let me know if I'm wrong in that assumption.

Comment 26 Mudit Agarwal 2022-02-10 07:36:58 UTC
Yes, we want to trigger upgrades in exactly the same way that we trigger the other job.
But we don't want to run it for every build; it should be run either on demand or for every milestone build starting from feature freeze.

For milestone builds only, the following should gate tagging the build as latest-stable:
1. deployment + acceptance
2. upgrade + acceptance
If both pass, mark the build stable (only for milestone builds).
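A minimal Python sketch of the gating rule proposed above. The function name and the use of `None` to mean "the upgrade run was not executed" are illustrative assumptions, not code from the actual pipeline.

```python
from typing import Optional


def can_tag_latest_stable(is_milestone: bool,
                          acceptance_passed: bool,
                          upgrade_passed: Optional[bool]) -> bool:
    """Decide whether a build may be tagged latest-stable under the proposed rule.

    upgrade_passed is None when the upgrade run was not executed (assumption).
    """
    if not acceptance_passed:
        # 1. deployment + acceptance must always pass.
        return False
    if is_milestone:
        # 2. upgrade + acceptance must also pass, but only for milestone builds;
        #    a skipped upgrade run (None) does not satisfy the gate.
        return upgrade_passed is True
    # Non-milestone builds are gated on acceptance alone.
    return True
```

For example, `can_tag_latest_stable(False, True, None)` returns `True`, while `can_tag_latest_stable(True, True, False)` returns `False`.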

Comment 27 Boris Ranto 2022-02-10 10:37:03 UTC
There is a pretty fundamental issue with that workflow. If we only run the upgrade tests for milestone builds and the upgrade fails, the following non-milestone build will be tagged as stable even if it would not pass the upgrade tests. All in all, we don't have a clean way of tagging the builds that passed upgrade and acceptance tests right now. We could probably introduce something like latest-stable-upgrade-4.Y tags if that helps.
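The loophole described above can be reproduced by replaying a hypothetical build history under the milestone-only rule. A milestone build that fails upgrade is not tagged, but the next non-milestone build, which never ran the upgrade tests, still takes the latest-stable tag. Build names and results are invented for illustration, and a single moving tag is a simplifying assumption.

```python
from typing import List, Optional, Tuple

# (build, is_milestone, acceptance_passed, upgrade_passed); upgrade_passed is
# None for non-milestone builds because the upgrade run is skipped for them.
Build = Tuple[str, bool, bool, Optional[bool]]


def latest_stable(builds: List[Build]) -> Optional[str]:
    """Replay builds in order and return the build left holding latest-stable."""
    tag: Optional[str] = None
    for name, is_milestone, acceptance, upgrade in builds:
        # Milestone builds need upgrade too; other builds need acceptance only.
        gate_ok = acceptance and (upgrade is True if is_milestone else True)
        if gate_ok:
            tag = name
    return tag


history: List[Build] = [
    ("build-1", False, True, None),   # ordinary build: acceptance only
    ("build-2", True, True, False),   # milestone build: upgrade fails, not tagged
    ("build-3", False, True, None),   # next ordinary build: upgrade never run
]
print(latest_stable(history))  # build-3
```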

Anyway, we can now run upgrade tests on demand (the ocs-ci job now has an upgrade flag), so we can schedule an additional ocs-ci run with upgrade for milestone builds. This is not automated in any way, though.

Comment 28 Elad 2022-02-12 16:48:02 UTC
Hi Boris, Mudit,

Integrating OCS upgrade into the acceptance pipeline of all new OCS builds would be better for the reason Boris specified, as well as for the following additional reasons:
1) We will be able to identify regressions caused by OCP quickly enough to raise the flag before the next OCP z-stream is released
2) If OCS upgrade is included in the OCS build acceptance pipeline only for milestones, there is a good chance that OCS upgrade will be broken (the feature freeze milestone is a good example), and we will learn about it only when the feature freeze build is validated. This would block the milestone.

Comment 29 Boris Ranto 2022-02-14 08:14:54 UTC
I believe it would be best to introduce a new tag for builds that passed upgrade+acceptance, in the form of latest-stable-upgrade-4.Y tags.

In this case, we would be running ocs-ci twice. The first run would be a standard ocs-ci run for the build itself, which would continue to use the latest-stable-4.Y tags. In parallel, we can run ocs-ci with upgrade+acceptance tests, and that run would update the latest-stable-upgrade-4.Y tags. This is easily doable on our side, and I believe it would suit all of our needs best. We would get the upgrade tests run regularly (for all builds...), and we could still test the builds where we don't care about upgrades with the latest-stable tags.
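A minimal sketch of the two-track tagging scheme described above, assuming each ocs-ci run simply moves its own tag to the build it tested when that run passes. `TagStore` and `record_results` are hypothetical stand-ins for the real registry and CI job, and the two runs are processed sequentially here even though they would execute in parallel.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TagStore:
    """Hypothetical in-memory stand-in for the registry's tag -> build mapping."""
    tags: Dict[str, str] = field(default_factory=dict)

    def move(self, tag: str, build: str) -> None:
        self.tags[tag] = build

    def get(self, tag: str) -> Optional[str]:
        return self.tags.get(tag)


def record_results(store: TagStore, version: str, build: str,
                   acceptance_passed: bool, upgrade_passed: bool) -> None:
    # Regular ocs-ci run: manages latest-stable-<version> exactly as before.
    if acceptance_passed:
        store.move(f"latest-stable-{version}", build)
    # Parallel upgrade+acceptance run: touches only latest-stable-upgrade-<version>.
    if upgrade_passed:
        store.move(f"latest-stable-upgrade-{version}", build)


store = TagStore()
record_results(store, "4.10", "build-1", acceptance_passed=True, upgrade_passed=True)
record_results(store, "4.10", "build-2", acceptance_passed=True, upgrade_passed=False)
print(store.get("latest-stable-4.10"))          # build-2
print(store.get("latest-stable-upgrade-4.10"))  # build-1
```

Because neither run reads or writes the other's tag, a failed upgrade run never blocks QE from picking up the newest acceptance-tested build, which matches the concern raised in the next comment.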

Comment 30 Mudit Agarwal 2022-02-14 08:26:49 UTC
I agree with Boris. I just don't want to block QE when a build passes acceptance but fails upgrade.
If we can mark it stable once it passes acceptance and then run upgrade, I am fine with it.

It will serve 2 purposes:

1. QE can use a build that passed acceptance; upgrade can still be a blocker (if it fails), but overall progress can be made.
2. We don't have to wait additional time for upgrade to pass. The build + deployment + acceptance time is already huge, and I don't want upgrade to add more time to it for every build.

Elad/Eran comments?

Comment 31 Boris Ranto 2022-02-14 13:13:09 UTC
FYI: These runs would be triggered in parallel. One of them would define the latest-stable build as before, while the upgrade run would define the latest-stable-upgrade build.

Comment 35 Boris Ranto 2022-02-15 08:58:42 UTC
This implements the upgrade tests auto-trigger and introduces the new set of managed latest-stable-upgrade-4.Y tags:

https://gitlab.cee.redhat.com/ceph/rhcs-jenkins-jobs/-/merge_requests/872

Comment 37 Boris Ranto 2022-02-15 14:37:53 UTC
All new CPaaS builds should now trigger both a regular ocs-ci run and an upgrade ocs-ci run; moving to ON_QA.

Comment 43 Petr Balogh 2022-02-16 09:44:49 UTC
Once there is a build that passes both, can you point out which build it is?

Thanks

Comment 44 Boris Ranto 2022-02-17 08:32:16 UTC
You will get notified any time a build passes the upgrade+acceptance tests. This is a new tag maintained by the new CI run. You should have already received this e-mail:

Custom 'stable-upgrade' OCS 4.10 build '4.10.0-158' is available and ready for testing

I will update the automation to not treat this as a custom build/CI run and to provide a more useful email subject.

Comment 49 errata-xmlrpc 2022-04-13 18:49:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.10.0 enhancement, security & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1372