Bug 1548087 - failed to collect logs with No such file or directory error
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: ovirt-log-collector
Classification: oVirt
Component: General
Version: 4.3.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Douglas Schilling Landgraf
QA Contact: Pavel Stehlik
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-02-22 16:38 UTC by Dafna Ron
Modified: 2018-03-14 10:48 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-03-09 15:25:18 UTC
oVirt Team: Integration
Embargoed:
dron: planning_ack?
dron: devel_ack?
dron: testing_ack?



Description Dafna Ron 2018-02-22 16:38:31 UTC
We failed a test in OST: 003_00_metrics_bootstrap.metrics_and_log_collector

The failure reason was "No such file or directory".

I don't know yet whether it reproduces 100% of the time or randomly, but I will post further details as we have them.


Link to Job:
http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/5829/

Link to all logs:
http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/5829/artifacts



/var/tmp:
drwxr-x--x. root abrt system_u:object_r:abrt_var_cache_t:s0 abrt
-rw-------. root root unconfined_u:object_r:user_tmp_t:s0 rpm-tmp.aLitM7
-rw-------. root root unconfined_u:object_r:user_tmp_t:s0 rpm-tmp.G2r7IM
-rw-------. root root unconfined_u:object_r:user_tmp_t:s0 rpm-tmp.kVymZE
-rw-------. root root unconfined_u:object_r:user_tmp_t:s0 rpm-tmp.uPDvvU
drwx------. root root system_u:object_r:tmp_t:s0       systemd-private-cd49c74726d5463f8d6f6502380e5e12-chronyd.service-i1T5IE
drwx------. root root system_u:object_r:tmp_t:s0       systemd-private-cd49c74726d5463f8d6f6502380e5e12-systemd-timedated.service-lhoUsS

/var/tmp/abrt:
-rw-------. root root system_u:object_r:abrt_var_cache_t:s0 last-via-server

/var/tmp/systemd-private-cd49c74726d5463f8d6f6502380e5e12-chronyd.service-i1T5IE:
drwxrwxrwt. root root system_u:object_r:tmp_t:s0       tmp

/var/tmp/systemd-private-cd49c74726d5463f8d6f6502380e5e12-chronyd.service-i1T5IE/tmp:

/var/tmp/systemd-private-cd49c74726d5463f8d6f6502380e5e12-systemd-timedated.service-lhoUsS:
drwxrwxrwt. root root system_u:object_r:tmp_t:s0       tmp

/var/tmp/systemd-private-cd49c74726d5463f8d6f6502380e5e12-systemd-timedated.service-lhoUsS/tmp:

/var/yp:
)
2018-02-22 07:24:05::DEBUG::__main__::251::root:: STDERR(/bin/ls: cannot open directory /rhev/data-center/mnt/blockSD/6babba93-09c8-4846-9ccb-07728f72eecb/master/tasks/bd563276-5092-4d28-86c4-63aa6c0b4344.temp: No such file or directory
)
2018-02-22 07:24:05::ERROR::__main__::832::root:: Failed to collect logs from: lago-basic-suite-master-host-0; /bin/ls: cannot open directory /rhev/data-center/mnt/blockSD/6babba93-09c8-4846-9ccb-07728f72eecb/master/tasks/bd563276-5092-4d28-86c4-63aa6c0b4344.temp: No such file or directory
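
For illustration only: the error above is what a recursive listing reports when a directory disappears between being discovered and being opened. Whether that is what happened to the .temp task directory here is an assumption, not something the logs confirm. The sketch below is not ovirt-log-collector code; list_tree and the directory name in the demo are hypothetical. It shows one way a recursive listing could record a vanished directory instead of failing the whole collection.

```python
import os
import shutil
import tempfile


def list_tree(root):
    """Recursively list entries under root, tolerating vanished directories.

    Returns (entries, vanished): the paths that were listed, and the
    directories that no longer existed when we tried to open them.
    """
    entries, vanished = [], []
    pending = [root]
    while pending:
        current = pending.pop()
        try:
            with os.scandir(current) as it:
                children = list(it)
        except FileNotFoundError:
            # The directory was removed after we learned about it.
            vanished.append(current)
            continue
        for child in children:
            entries.append(child.path)
            if child.is_dir(follow_symlinks=False):
                pending.append(child.path)
    return entries, vanished


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    # Hypothetical stand-in for a short-lived task directory.
    task_dir = os.path.join(root, "example-task.temp")
    os.mkdir(task_dir)
    shutil.rmtree(task_dir)  # stands in for a concurrent cleanup
    print(list_tree(task_dir))
    shutil.rmtree(root)
```

With this approach the vanished path ends up in the second list and the rest of the tree is still collected.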

Comment 1 Douglas Schilling Landgraf 2018-02-23 00:00:29 UTC
Thanks Dafna, a reproducer would be appreciated. I see you mentioned some 'suspected patches' on the mailing list [1]. Why do you suspect these changes? Can you share how to set up such an environment on a local test machine?

[1] https://www.mail-archive.com/infra@ovirt.org/msg32099.html

Sandro, any ideas?

Comment 2 Dafna Ron 2018-02-23 13:22:02 UTC
As Yaniv mentioned on the list, it is probably a race.
The only way I can think of to reproduce it is to run OST locally several times and see if it happens randomly. Maybe add a short sleep in the code, for example in the create storage domain step, to try to delay the previous tests from finishing?

To run OST locally you need to clone the ovirt-system-tests project and install lago.
Running ./run_suite.sh <suite name> runs the tests locally.

On the mailing list I reported the patch that failed the OST test.
The automation tests a batch of changes and zeroes in on the single change that may be causing the issue. That does not, however, mean the change was at fault.

If you want to create the environment without it being deleted, you will need to install lago locally and run OST locally.
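
A minimal sketch of the "run it several times and see if it happens randomly" idea, assuming the suite script exits non-zero on failure. run_repeatedly is a hypothetical helper, not part of OST or lago, and the command passed to it is a placeholder to adjust for the local checkout.

```python
import subprocess


def run_repeatedly(command, attempts):
    """Run command `attempts` times; return the 1-based numbers of failed runs.

    A run counts as failed when the process exits with a non-zero status.
    If the command itself cannot be started, the exception propagates.
    """
    failures = []
    for attempt in range(1, attempts + 1):
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
        )
        if result.returncode != 0:
            failures.append(attempt)
    return failures
```

For example, run_repeatedly(["./run_suite.sh", "<suite name>"], attempts=5) from the ovirt-system-tests checkout would return the list of failed attempts. An empty list after several runs suggests the failure is not deterministic, although it does not rule out a rare race.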

Comment 3 Douglas Schilling Landgraf 2018-02-23 22:09:07 UTC
(In reply to Dafna Ron from comment #2)
> As Yaniv mentioned on the list, it is probably a race. 
> the only thing I can think of that would reproduce it is to run OST locally
> several times and see if it happens randomly. maybe create a short sleep in
> the code on create storage domain for example to try and delay the previous
> tests from finishing? 

Thanks for the information. Yes, that would help, especially if we can't reproduce it.

> 
> to run ost locally you need to clone ovirt-system-tests project and install
> lago. 
> if you run ./run_suite <suite name> it would run the tests locally. 
> 
> In the mailing list I reported the patch  that failed the OST test. 
> The way the automation works is that it tests a bunch of changes and zero's
> in to a single change that may be causing the issue. it does not however
> mean the change was at fault. 
> 
> if you want to create the enviornment without it being deleted you will need
> to install lago locally and run ost locally.


$ rpm -qa | grep lago
python-lago-ovirt-0.6.0-1.fc27.noarch
python-lago-0.6.0-1.fc27.noarch
lago-ovirt-0.6.0-1.fc27.noarch
lago-0.6.0-1.fc27.noarch


<clone ovirt-system-tests>
$ ./run_suite.sh basic-suite-4.2
<snip>
+ lago init /home/douglas/ovirt-system-tests/deployment-basic-suite-4.2 /home/douglas/ovirt-system-tests/basic-suite-4.2/LagoInitFile --template-repo-path /home/douglas/ovirt-system-tests/basic-suite-4.2/template-repo.json
./run_suite.sh: line 84: lago: command not found

Should it be lagocli instead?

Comment 4 Dafna Ron 2018-02-26 10:43:59 UTC
(In reply to Douglas Schilling Landgraf from comment #3)
> $ rpm -qa | grep lago
> python-lago-ovirt-0.6.0-1.fc27.noarch
> python-lago-0.6.0-1.fc27.noarch
> lago-ovirt-0.6.0-1.fc27.noarch
> lago-0.6.0-1.fc27.noarch
> 
> 
> <clone ovirt-system-tests>
> $ ./run_suite.sh basic-suite-4.2
> <snip>
> + lago init /home/douglas/ovirt-system-tests/deployment-basic-suite-4.2
> /home/douglas/ovirt-system-tests/basic-suite-4.2/LagoInitFile
> --template-repo-path
> /home/douglas/ovirt-system-tests/basic-suite-4.2/template-repo.json
> ./run_suite.sh: line 84: lago: command not found
> 
> should it be lagocli instead ?

No :) These are the packages I have:

[dron@dron ds-ovirt-system-tests]$ rpm -qa |grep lago
lago-ovirt-0.44.0-1.el7.centos.noarch
python-lago-0.42.0-1.el7.centos.noarch
python-lago-ovirt-0.44.0-1.el7.centos.noarch
lago-0.42.0-1.el7.centos.noarch
[dron@dron ds-ovirt-system-tests]$ 


But ping me if there is any issue running the tests.

Comment 5 Douglas Schilling Landgraf 2018-03-09 15:25:18 UTC
As we discussed, Jenkins only triggered this once, and I can't reproduce it either.
Closing this report for now. Feel free to re-open it, Dafna.

Thanks!

