Bug 1777193
Summary: | CI run platform upgrade | ||
---|---|---|---|
Product: | [Community] GlusterFS | Reporter: | Sunil Kumar Acharya <sheggodu> |
Component: | project-infrastructure | Assignee: | Deepshikha khandelwal <dkhandel> |
Status: | CLOSED UPSTREAM | QA Contact: | |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 6 | CC: | bugs, gluster-infra, misc, mscherer, pasik, sankarshan.mukhopadhyay, ykaul |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2020-03-12 12:36:45 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description

Sunil Kumar Acharya, 2019-11-27 05:42:25 UTC
So, what do you mean by "upgrade the CI platform"? Is the request "we need to have CentOS 8 builders", and if so, for what jobs?

Yes, we need to have CentOS 8 builders for gluster regression/nightly runs. I doubt that we just need it for the regression job.

Ok, let's convert builders 30 to 39 to CentOS 8. I will remove them from Jenkins, erase them and reinstall them on CentOS 8, then we can test the build. I am doing a test on 38 and 39 for now; no need to test on 10 hosts if it does not work.

So, we do have 2 CentOS 8 builders, but package names changed (yeah). For example, python-libcloud, cppcheck and python-docopt no longer exist. I think we should have split the packages by the job we wanted to run, because right now I can't figure out what is important and what is not. Even worse:

    fatal: [builder39.int.rht.gluster.org]: FAILED! => {"changed": false, "failures": ["No package cmockery2-devel available.", "No package lvm2-devel available.", "No package python-devel available.", "No package userspace-rcu-devel available.", "No package python-requests available.", "No package python-flask available.", "No package python-prettytable available.", "No package python-virtualenv available.", "No package pyxattr available.", "No package lcov available.", "No package bzr available."], "msg": ["Failed to install some of the specified packages"], "rc": 1, "results": []}

So:
- bzr: not sure why we need it (GD2 testing, I guess we can clean it)
- cmockery2-devel: not there
- lvm2-devel and userspace-rcu-devel: these are in CentOS PowerTools
- python-*: they were renamed to python3-*
- no lcov either

So I am going to remove bzr, try to figure something out for the python vs python3 naming, then we have to decide how we can build without cmockery.

So, I removed bzr and the GD2 stuff. I moved a few jobs into a separate file, to make them easier to clean later. But that's still messy, and it would need a big refactor. For example, I am not sure why we need some of the packages. I did some cleanup already; I will push later.
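A minimal sketch of the two package fixes discussed above, assuming a CentOS 8 builder; the repo id and the exact package set are assumptions based on the comments, not the actual ansible change (on later CentOS 8 releases the repo id is lowercase `powertools`):

```shell
# EL7 "python-*" packages became "python3-*" on CentOS 8; derive the new
# names with a simple prefix rewrite instead of hardcoding a second list.
old_pkgs="python-devel python-requests python-flask python-prettytable python-virtualenv"
new_pkgs=""
for pkg in $old_pkgs; do
  new_pkgs="$new_pkgs $(echo "$pkg" | sed 's/^python-/python3-/')"
done

# Print the commands a builder playbook would run (lvm2-devel and
# userspace-rcu-devel live in the PowerTools repo on CentOS 8).
echo "dnf config-manager --set-enabled PowerTools"
echo "dnf install -y lvm2-devel userspace-rcu-devel$new_pkgs"
```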
Has someone already managed to build gluster on CentOS 8 and run the regression tests? The lack of cmockery2 seems to be blocking to me.

So, testing the regression, I did hit this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1779742

(In reply to M. Scherer from comment #9)
> So, testing the regression, I did hit this bug:
> https://bugzilla.redhat.com/show_bug.cgi?id=1779742

With that CLOSED NEXTRELEASE, what would be the next steps for this bug?

Let's give another test regression run on the CentOS 8 machine. Sankarshan, once this test regression run passes we should be having nightly runs on the CentOS 8 builders.

Before the shutdown, I was running and disabling the tests from the test suite one by one to decide whether they work or not. Ideally, if someone could tell me "I did a run and it worked fine each time", that would give me confidence that the builder is problematic, not that the test suite is broken.

We have a testing Jenkins job for regression on CentOS 8: https://build.gluster.org/job/testing-regression-on-centos8/6/consoleFull

Currently, 9 tests are failing:

    ./tests/basic/playground/template-xlator-sanity.t
    ./tests/basic/trace.t
    ./tests/bugs/glusterfs-server/bug-887145.t
    ./tests/bugs/nfs/bug-1053579.t
    ./tests/bugs/nfs/bug-1116503.t
    ./tests/bugs/nfs/bug-1157223-symlink-mounting.t
    ./tests/bugs/posix/bug-990028.t
    ./tests/bugs/rpc/bug-954057.t
    ./tests/features/ssl-authz.t

Discussing with Deepshikha, she pointed out that a few tests are related to RPC.
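To iterate on those failures locally, re-running only the failed `.t` files one at a time helps isolate each problem. A sketch that just prints the re-run commands; the `prove -vf` invocation is an assumption (the real CI uses its own regression wrapper):

```shell
# Emit one re-run command per failed test so each can be inspected in
# isolation instead of re-running the whole suite.
failed_tests="
./tests/bugs/nfs/bug-1053579.t
./tests/bugs/nfs/bug-1116503.t
./tests/bugs/nfs/bug-1157223-symlink-mounting.t
"
for t in $failed_tests; do
  printf 'prove -vf %s\n' "$t"
done
```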
./tests/basic/playground/template-xlator-sanity.t

    not ok 3 [ 23/ 1198] < 24> './tests/basic/playground/../rpc-coverage.sh --no-locks /mnt/glusterfs/0' -> ''

./tests/basic/trace.t

    not ok 3 [ 22/ 1189] < 25> './tests/basic/rpc-coverage.sh --no-locks /mnt/glusterfs/0' -> ''

./tests/bugs/glusterfs-server/bug-887145.t

    ok 20 [ 25/ 2] < 61> '[ 0 -eq 0 ]'
    touch: cannot touch '/mnt/glusterfs/0/dir/file': Permission denied
    not ok 21 [ 20/ 10] < 65> 'touch /mnt/glusterfs/0/dir/file' -> ''
    touch: cannot touch '/mnt/nfs/0/dir/foo': Permission denied
    not ok 22 [ 20/ 11] < 66> 'touch /mnt/nfs/0/dir/foo' -> ''
    mkdir: cannot create directory ‘/mnt/glusterfs/0/dir/new’: Permission denied
    not ok 23 [ 20/ 8] < 67> 'mkdir /mnt/glusterfs/0/dir/new' -> ''
    mkdir: cannot create directory ‘/mnt/nfs/0/dir/other’: Permission denied
    not ok 24 [ 20/ 7] < 68> 'mkdir /mnt/nfs/0/dir/other' -> ''
    ok 25 [ 20/ 10] < 69> 'rm -f /mnt/glusterfs/0/dir/file /mnt/glusterfs/0/dir/foo'
    rmdir: failed to remove '/mnt/nfs/0/dir/*': No such file or directory
    not ok 26 [ 23/ 6] < 70> 'rmdir /mnt/nfs/0/dir/*' -> ''

887145, 1053579, 1116503 and 1157223 are NFS related, so I guess we should focus on seeing what is failing for those.

So, Deepshikha diagnosed some tests and found that the failures are related to the merge of the nfsnobody and nobody users on CentOS 8, and that there is a patch for that at https://review.gluster.org/#/c/glusterfs/+/23710/

(In reply to M. Scherer from comment #16)
> So, Deepshika did diagnose some tests and found that it is related to the
> merge of nfsnobody and nobody user on Centos 8, and that there is a patch
> for that on https://review.gluster.org/#/c/glusterfs/+/23710/

This is excellent to know. Out of general curiosity: how was the issue triangulated to be caused by the CentOS 8 merge?

She also found that ./tests/features/ssl-authz.t is failing because there is a test that verifies the memory consumption, and after discussing, we suspect that the limit might be too low.
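The nfsnobody/nobody merge can be spotted directly on a builder. On CentOS 7 they are separate users (nobody is UID 99, nfsnobody is UID 65534); on CentOS 8 nfsnobody is gone and nobody itself is UID 65534, so NFS tests that expect the `nfsnobody` name break. A diagnostic sketch, not part of the fix itself:

```shell
# Check which scheme this host uses: a separate nfsnobody user (EL7-style)
# or nfsnobody merged into nobody at UID 65534 (EL8-style).
if getent passwd nfsnobody >/dev/null 2>&1; then
  echo "EL7-style: separate nfsnobody user present"
else
  echo "EL8-style: nfsnobody merged into $(getent passwd nobody | cut -d: -f1,3)"
fi
```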
It sometimes works, sometimes doesn't. It was limited to 5 MB, but using a different OpenSSL version (like on CentOS 8 vs 7) could consume more memory (because different ciphers would be enabled by default, or just because of different code). I think someone should take a look at that; the related bug is https://bugzilla.redhat.com/show_bug.cgi?id=1768407 (and so the test). We found during our tests that it did consume 5.6 MB.

After fixing an infra issue (env.rc not found), 11 tests are still failing: https://build.gluster.org/job/testing-regression-on-centos8/26/consoleFull

    ./tests/bugs/posix/bug-990028.t -- no space left on device
    ./tests/bugs/nfs/bug-1157223-symlink-mounting.t -- timeout issue
    ./tests/bugs/nfs/bug-1116503.t -- timeout issue
    ./tests/bugs/fuse/many-groups-for-acl.t -- timeout issue
    ./tests/basic/afr/entry-self-heal.t -- timeout issue
    ./tests/basic/afr/split-brain-healing-ctime.t -- timeout issue
    ./tests/basic/afr/split-brain-healing.t -- timeout issue

Most of them are timing out. bug-990028.t is weird: we used to see that when the tests were run on ext4, but now it should be xfs everywhere, so that shouldn't be an issue, unless some default quota/setting changed somewhere.

This bug is moved to https://github.com/gluster/project-infrastructure/issues/20, and will be tracked there from now on. Visit the GitHub issue URL for further details.
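For reference, the ssl-authz.t memory check described earlier has roughly this shape: read a process's resident set size and compare it against a fixed ceiling in kB. This is an illustrative sketch measuring the current shell, not the test's exact code, which measures the glusterfsd brick process; a hard 5120 kB (5 MB) ceiling is fragile when a newer OpenSSL, as on CentOS 8, legitimately uses around 5.6 MB.

```shell
# Read VmRSS (resident set size, in kB) of this shell from /proc and
# compare it to a fixed limit, the way a memory-consumption check would.
limit_kb=5120
rss_kb=$(awk '/^VmRSS:/ {print $2}' "/proc/$$/status")
if [ "$rss_kb" -le "$limit_kb" ]; then
  echo "within limit: ${rss_kb} kB <= ${limit_kb} kB"
else
  echo "over limit: ${rss_kb} kB > ${limit_kb} kB"
fi
```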