Bug 1777193

Summary: CI run platform upgrade
Product: [Community] GlusterFS Reporter: Sunil Kumar Acharya <sheggodu>
Component: project-infrastructure Assignee: Deepshikha khandelwal <dkhandel>
Status: CLOSED UPSTREAM QA Contact:
Severity: high Docs Contact:
Priority: high    
Version: 6 CC: bugs, gluster-infra, misc, mscherer, pasik, sankarshan.mukhopadhyay, ykaul
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-03-12 12:36:45 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Sunil Kumar Acharya 2019-11-27 05:42:25 UTC
Description of problem:

Please upgrade the CI platform to CentOS 8.

Comment 1 M. Scherer 2019-11-27 08:56:53 UTC
So, what do you mean by "upgrade the CI platform"?

Is the request "we need to have CentOS 8 builders"? If so, for which jobs?

Comment 2 Sunil Kumar Acharya 2019-11-27 09:00:18 UTC
Yes, we need to have CentOS 8 builders for Gluster regression/nightly runs.

Comment 3 Deepshikha khandelwal 2019-11-27 10:21:03 UTC
I doubt that we need it only for the regression job.

Comment 4 M. Scherer 2019-11-27 11:22:50 UTC
Ok, let's convert builders 30 to 39 to CentOS 8. I will remove them from Jenkins, erase them and reinstall them with CentOS 8, then we can test the build.

Comment 5 M. Scherer 2019-11-27 11:29:16 UTC
I am doing a test on 38 and 39 for now; no need to test on 10 hosts if that does not work.

Comment 6 M. Scherer 2019-11-28 08:45:57 UTC
So, we do have 2 CentOS 8 builders, but package names changed (yeah).

For example:
- python-libcloud, cppcheck, python-docopt no longer exist

I think we should have split the packages by the job we wanted to run, because right now I can't figure out what is important and what is not.

Comment 7 M. Scherer 2019-11-28 09:08:00 UTC
Even worse:

fatal: [builder39.int.rht.gluster.org]: FAILED! => {"changed": false, "failures": ["No package cmockery2-devel available.", "No package lvm2-devel available.", "No package python-devel available.", "No package userspace-rcu-devel available.", "No package python-requests available.", "No package python-flask available.", "No package python-prettytable available.", "No package python-virtualenv available.", "No package pyxattr available.", "No package lcov available.", "No package bzr available."], "msg": ["Failed to install some of the specified packages"], "rc": 1, "results": []}

So:
- bzr, not sure why we need it (GD2 testing, guess we can clean it)
- cmockery2-devel, not there
- lvm2-devel and userspace-rcu-devel are in CentOS-PowerTools
- python-* , they were renamed to python3-*
- no lcov either

So I am going to remove bzr, try to figure out something for the python vs python3 naming, then we have to decide how we can build without cmockery.
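
The triage above can be sketched in Python. This is only an illustration, assuming the failure list has the shape shown in the Ansible output; the names `missing_packages`, `triage`, `POWERTOOLS` and `DROPPED` are hypothetical and not part of the actual playbook. Only the renames and repositories mentioned in this comment are encoded, and the python3- names would still need to be checked one by one.

```python
import re

# Packages this comment places in the CentOS-PowerTools repository.
POWERTOOLS = {"lvm2-devel", "userspace-rcu-devel"}
# Packages this comment proposes to drop.
DROPPED = {"bzr"}

# Shape of the messages in the "failures" list of the Ansible output.
_MISSING = re.compile(r"^No package (\S+) available\.$")


def missing_packages(failures):
    """Extract package names from 'No package X available.' messages."""
    names = []
    for message in failures:
        match = _MISSING.match(message)
        if match:
            names.append(match.group(1))
    return names


def triage(name):
    """Return (action, package) for one missing CentOS 7 package name."""
    if name in DROPPED:
        return ("drop", name)
    if name in POWERTOOLS:
        return ("enable-powertools", name)
    if name.startswith("python-"):
        # Assumption: a python3- equivalent exists; verify each one.
        return ("rename", "python3-" + name[len("python-"):])
    # Everything else (cmockery2-devel, lcov, pyxattr) needs a decision.
    return ("unresolved", name)


if __name__ == "__main__":
    failures = [
        "No package cmockery2-devel available.",
        "No package lvm2-devel available.",
        "No package python-devel available.",
        "No package bzr available.",
    ]
    for package in missing_packages(failures):
        print(triage(package))
```

Packages that come back as "unresolved" are the ones that still block the build, which matches the open question about cmockery.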

Comment 8 M. Scherer 2019-11-28 17:03:33 UTC
So, I removed bzr and the GD2 stuff. I moved a few jobs into separate files, which makes it easier to clean up later. But that's still messy and would need a big refactor. For example, I am not sure why we need some of the packages. I did some cleanup already; I will push it later.

Has anyone already managed to build Gluster on CentOS 8 and run the regression tests? The lack of cmockery2 seems to be a blocker to me.

Comment 9 M. Scherer 2019-12-04 15:54:45 UTC
So, testing the regression, I did hit this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1779742

Comment 10 sankarshan 2019-12-23 04:36:59 UTC
(In reply to M. Scherer from comment #9)
> So, testing the regression, I did hit this bug:
> https://bugzilla.redhat.com/show_bug.cgi?id=1779742

With that CLOSED NEXTRELEASE, what would be the next steps for this bug?

Comment 11 Deepshikha khandelwal 2020-01-03 09:50:46 UTC
Let's do another test regression run on the CentOS 8 machine.

Sankarshan, once this test regression run passes, we should have nightly runs on the CentOS 8 builders.

Comment 12 M. Scherer 2020-01-03 15:21:14 UTC
Before the shutdown, I was running the test suite and disabling tests one by one to decide whether it works or not. Ideally, someone telling me "I did a run and it worked fine each time" would give me confidence that the builder is problematic, not that the test suite is broken.
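
One way to separate "the builder is problematic" from "the test is unreliable" is to compare the outcome of each test across several runs. A minimal sketch in Python, assuming the results have already been collected as one dictionary per run; the function name `classify` and the test names in the example are hypothetical.

```python
from collections import defaultdict


def classify(runs):
    """Classify tests from several runs of the same suite.

    `runs` is a list of dicts mapping a test path to True (passed)
    or False (failed). Returns a dict mapping each test to
    "stable-pass", "stable-fail" or "flaky".
    """
    outcomes = defaultdict(list)
    for run in runs:
        for test, passed in run.items():
            outcomes[test].append(passed)

    result = {}
    for test, seen in outcomes.items():
        if all(seen):
            result[test] = "stable-pass"
        elif not any(seen):
            result[test] = "stable-fail"
        else:
            # Mixed results on the same builder point at the test itself.
            result[test] = "flaky"
    return result


if __name__ == "__main__":
    runs = [
        {"a.t": True, "b.t": False, "c.t": True},
        {"a.t": True, "b.t": False, "c.t": False},
    ]
    print(classify(runs))
```

A test that is "stable-fail" on the CentOS 8 builder but "stable-pass" elsewhere would suggest a builder or platform problem, while "flaky" results suggest the test needs attention.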

Comment 13 Deepshikha khandelwal 2020-01-23 10:16:30 UTC
We have a testing Jenkins job for regression on CentOS 8.

https://build.gluster.org/job/testing-regression-on-centos8/6/consoleFull

Currently, 9 tests are failing.

9 test(s) failed 
./tests/basic/playground/template-xlator-sanity.t
./tests/basic/trace.t
./tests/bugs/glusterfs-server/bug-887145.t
./tests/bugs/nfs/bug-1053579.t
./tests/bugs/nfs/bug-1116503.t
./tests/bugs/nfs/bug-1157223-symlink-mounting.t
./tests/bugs/posix/bug-990028.t
./tests/bugs/rpc/bug-954057.t
./tests/features/ssl-authz.t
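
To see at a glance which areas these failures fall into, the list can be grouped by directory. A minimal sketch in Python; `group_by_area` is an illustrative helper and not part of the Gluster test framework.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_area(failed):
    """Group failed test paths by their directory below tests/."""
    groups = defaultdict(list)
    for path in failed:
        parts = PurePosixPath(path).parts  # a leading "./" is dropped
        # parts[0] is "tests", parts[-1] is the .t file itself.
        area = "/".join(parts[1:-1])
        groups[area].append(parts[-1])
    return dict(groups)


if __name__ == "__main__":
    failed = [
        "./tests/basic/trace.t",
        "./tests/bugs/nfs/bug-1053579.t",
        "./tests/bugs/nfs/bug-1116503.t",
        "./tests/bugs/nfs/bug-1157223-symlink-mounting.t",
        "./tests/features/ssl-authz.t",
    ]
    for area, tests in sorted(group_by_area(failed).items()):
        print(area, len(tests))
```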

Comment 15 M. Scherer 2020-01-23 10:37:55 UTC
While discussing with Deepshikha, she pointed out that a few tests are related to RPC.


./tests/basic/playground/template-xlator-sanity.t
   not ok   3 [     23/   1198] <  24> './tests/basic/playground/../rpc-coverage.sh --no-locks /mnt/glusterfs/0' -> ''

./tests/basic/trace.t
   not ok   3 [     22/   1189] <  25> './tests/basic/rpc-coverage.sh --no-locks /mnt/glusterfs/0' -> ''


./tests/bugs/glusterfs-server/bug-887145.t
    ok  20 [     25/      2] <  61> '[ 0 -eq 0 ]'
    touch: cannot touch '/mnt/glusterfs/0/dir/file': Permission denied
    not ok  21 [     20/     10] <  65> 'touch /mnt/glusterfs/0/dir/file' -> ''
    touch: cannot touch '/mnt/nfs/0/dir/foo': Permission denied
    not ok  22 [     20/     11] <  66> 'touch /mnt/nfs/0/dir/foo' -> ''
    mkdir: cannot create directory ‘/mnt/glusterfs/0/dir/new’: Permission denied
    not ok  23 [     20/      8] <  67> 'mkdir /mnt/glusterfs/0/dir/new' -> ''
    mkdir: cannot create directory ‘/mnt/nfs/0/dir/other’: Permission denied
    not ok  24 [     20/      7] <  68> 'mkdir /mnt/nfs/0/dir/other' -> ''
    ok  25 [     20/     10] <  69> 'rm -f /mnt/glusterfs/0/dir/file /mnt/glusterfs/0/dir/foo'
    rmdir: failed to remove '/mnt/nfs/0/dir/*': No such file or directory
    not ok  26 [     23/      6] <  70> 'rmdir /mnt/nfs/0/dir/*' -> ''

887145, 1053579, 1116503, 1157223 are NFS related

So I guess we should focus on finding out what is failing there.
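
Pulling the failing steps out of a console log like the one above can be automated. A minimal sketch in Python, assuming the step lines keep the shape shown in this comment; the regular expression and the name `failed_steps` are illustrative, and the number in angle brackets is assumed (not confirmed here) to be the line number in the .t file.

```python
import re

# Matches lines such as:
#   not ok  21 [     20/     10] <  65> 'touch /mnt/glusterfs/0/dir/file' -> ''
_STEP = re.compile(
    r"^\s*(?P<status>not ok|ok)\s+(?P<number>\d+)\s+"
    r"\[\s*\d+/\s*\d+\]\s+<\s*(?P<line>\d+)>\s+"
    r"'(?P<command>.*?)'(?:\s+->\s+'.*')?\s*$"
)


def failed_steps(log_text):
    """Return (step number, assumed line number, command) for each 'not ok'."""
    steps = []
    for raw in log_text.splitlines():
        match = _STEP.match(raw)
        if match and match.group("status") == "not ok":
            steps.append(
                (
                    int(match.group("number")),
                    int(match.group("line")),
                    match.group("command"),
                )
            )
    return steps
```

Lines that are not step results, such as the "Permission denied" messages, do not match the pattern and are skipped.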

Comment 16 M. Scherer 2020-01-23 15:08:22 UTC
So, Deepshikha diagnosed some tests and found that the failures are related to the merge of the nfsnobody and nobody users on CentOS 8, and that there is a patch for that at https://review.gluster.org/#/c/glusterfs/+/23710/
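
The linked patch is not reproduced here, and the following is not necessarily what it does. As an illustration only, a helper could cope with the merged accounts by picking whichever one exists on the host. A minimal sketch in Python; `unprivileged_user` and its parameters are hypothetical names.

```python
import pwd


def unprivileged_user(lookup=pwd.getpwnam, candidates=("nfsnobody", "nobody")):
    """Return the first candidate account name that exists on this host.

    `lookup` must raise KeyError for an unknown name, as pwd.getpwnam does.
    """
    for name in candidates:
        try:
            lookup(name)
        except KeyError:
            continue
        return name
    raise LookupError("none of %r exist" % (candidates,))
```

On a host that still has a separate nfsnobody account this returns "nfsnobody"; on a host where only nobody exists it falls back to "nobody".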

Comment 17 sankarshan 2020-01-23 15:16:22 UTC
(In reply to M. Scherer from comment #16)
> So, Deepshikha diagnosed some tests and found that the failures are related
> to the merge of the nfsnobody and nobody users on CentOS 8, and that there
> is a patch for that at https://review.gluster.org/#/c/glusterfs/+/23710/

This is excellent to know - out of general curiosity - how was the issue triangulated to be because of the CentOS8 merge?

Comment 18 M. Scherer 2020-01-23 16:07:44 UTC
She also found that ./tests/features/ssl-authz.t is failing because there is a test that verifies the memory consumption, and after discussing, we suspect that the limit might be too low. It sometimes works, sometimes doesn't.
It was limited to 5 MB, but using a different openssl version (like on CentOS 8 vs 7) could consume more memory (because different ciphers would be enabled by default, or just different code).

I think someone should take a look at that; the related bug is https://bugzilla.redhat.com/show_bug.cgi?id=1768407 (and so the test). We found during our test that it consumed 5.6 MB.

Comment 19 Deepshikha khandelwal 2020-02-13 06:01:20 UTC
After fixing an infra issue (env.rc not found), 11 tests are still failing: https://build.gluster.org/job/testing-regression-on-centos8/26/consoleFull


./tests/bugs/posix/bug-990028.t is failing -- no space left on device 

./tests/bugs/nfs/bug-1157223-symlink-mounting.t -- timeout issue

./tests/bugs/nfs/bug-1116503.t -- timeout issue

./tests/bugs/fuse/many-groups-for-acl.t -- timeout issue

./tests/basic/afr/entry-self-heal.t -- timeout issue

./tests/basic/afr/split-brain-healing-ctime.t -- timeout issue

./tests/basic/afr/split-brain-healing.t -- timeout issue

Most of them are timing out.

Comment 20 Michael S. 2020-02-13 08:48:08 UTC
The bug-990028.t failure is weird: we did see that when the tests were run on ext4, but now it should be xfs everywhere, so that shouldn't be an issue, unless some default quota/setting changed somewhere.
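
Whether a builder really uses xfs for the test directory can be checked from the mount table. A minimal sketch in Python, assuming the content of /proc/mounts is passed in as text; `fs_type` is an illustrative helper and the paths in the usage note are hypothetical.

```python
import os


def fs_type(path, mounts_text):
    """Return the filesystem type of the mount that contains `path`.

    `mounts_text` is the content of /proc/mounts. Mount points with
    spaces (escaped as \\040 in that file) are not handled here.
    """
    path = os.path.normpath(path)
    best_point, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        point, kind = fields[1], fields[2]
        prefix = point.rstrip("/") + "/"
        # Longest matching mount point wins; on a tie the later entry
        # wins, since it was mounted on top of the earlier one.
        if (path + "/").startswith(prefix) and len(point) >= len(best_point):
            best_point, best_type = point, kind
    return best_type
```

On a builder this could be called as `fs_type("/d/backends", open("/proc/mounts").read())`, with the directory replaced by wherever the test bricks actually live.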

Comment 21 Worker Ant 2020-03-12 12:36:45 UTC
This bug is moved to https://github.com/gluster/project-infrastructure/issues/20, and will be tracked there from now on. Visit GitHub issues URL for further details