Description of problem:
When running 5.6.z virt-tests, lots of PV guests failed with "installation timed out", for example:
http://beaker-archive.app.eng.bos.redhat.com/beaker-logs/2011/01/455/45581/93230/1037413///rhel5u6_i386_pv_install.log
At the bottom of the log we can see "[18;69Hinstallation timed out.", and then the job failed.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
5.6 virt-test on kernel 2.6.18-194.el5
https://beaker.engineering.redhat.com/jobs/46337
https://beaker.engineering.redhat.com/jobs/46338
https://beaker.engineering.redhat.com/jobs/46339
https://beaker.engineering.redhat.com/jobs/46340

5.6.z virt-test on kernel 2.6.18-238.1.1.el5
https://beaker.engineering.redhat.com/jobs/45581
https://beaker.engineering.redhat.com/jobs/46019
https://beaker.engineering.redhat.com/jobs/46020
https://beaker.engineering.redhat.com/jobs/46021

Expected results:
https://beaker.engineering.redhat.com/jobs/46018

Additional info:
Looks like the installations are happening, but painfully slowly, and they eventually time out. I will look deeper, since this symptom has been seen before.
I also suspected the virt install timeout, so I deleted the "<param name="KILLTIMEOVERRIDE" value="59280"/>" option from the task /distribution/virt/install and submitted another job, but got the same result. See https://beaker.engineering.redhat.com/jobs/47005
Marian, Bill - any idea what's wrong here? I'm wondering if this wasn't really a temporary issue in the lab (slow filer? too many machines installing at once?).
Hangbin: KILLTIMEOVERRIDE specifies only the total test run time, and if that were the problem you would see either a Local or an External Watchdog. This looks more like the test's internal timeout. I think it may be the one in virtinstall.exp, which seems to be hardcoded in the test:

> set timeout 7200

As for why this happens:
1. These failures seem to happen in clusters of multiple jobs failing at approximately the same time.
2. I noticed that the majority of virt-install tasks (at least recently) Pass, and it's mostly jobs containing GuestOne and GuestTwo that fail consistently.

This suggests it may be _recipe related_. A few possible causes:
1. Could it be a wrong hypervisor setup? Have it checked by someone more familiar with the virtual workflow than I am.
2. Is there any pattern in which distro/version runs on the host of the failed tests?
3. Could it be a bottleneck on the (network | storage) side?
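To illustrate why changing KILLTIMEOVERRIDE would not alter the outcome, here is a minimal Python sketch of which limit ends a slow install first. It uses the two values quoted in this thread (7200 from virtinstall.exp, 59280 from the job XML). Treating both as seconds is an assumption, and the function names `first_to_fire` and `as_hms` are hypothetical helpers, not part of the test suite or Beaker.

```python
from datetime import timedelta

# Values quoted in this thread; interpreting both as seconds is an assumption.
INNER_TIMEOUT_S = 7200   # "set timeout 7200" in virtinstall.exp
WATCHDOG_S = 59280       # KILLTIMEOVERRIDE value from the job XML


def first_to_fire(install_duration_s, inner_s=INNER_TIMEOUT_S, watchdog_s=WATCHDOG_S):
    """Return which limit (if any) ends an install needing install_duration_s."""
    limits = {"internal timeout": inner_s, "watchdog": watchdog_s}
    # The smaller limit is always reached first.
    name, limit = min(limits.items(), key=lambda kv: kv[1])
    if install_duration_s <= limit:
        return "completed"
    return name


def as_hms(seconds):
    """Format a number of seconds as H:MM:SS."""
    return str(timedelta(seconds=seconds))


if __name__ == "__main__":
    print(as_hms(INNER_TIMEOUT_S), as_hms(WATCHDOG_S))
    print(first_to_fire(3 * 3600))  # an install that would need three hours
```

Under these assumptions, any install slower than two hours is stopped by the internal timeout long before the watchdog is reached, which matches the absence of a Local or External Watchdog result in the failed jobs.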
ping - is this still an issue? If I don't hear back in a week I'll close.
I still hit this problem (job ID 65400). I also think the cause is "set timeout 7200", but I think the best way forward is to ask the script's owner.
Gurhan, what do you think? If the timeout parameter really causes this, can you please increase it? Thanks, Hangbin
7200 seconds is two hours, which should be plenty for the installations to go through. If an installation is too slow to finish in 2 hours, that should be reported as a bug. If this is still happening and you need the timeout increased for a reason, let me know.
There hasn't been an update to this bug for 3 months, so I am closing it as NOTABUG. It would have come up again had this still been an issue.