Bug 1574880

Summary: failing 004_basic_sanity.verify_suspend_resume_vm0
Product: [oVirt] ovirt-engine
Component: BLL.Virt
Status: CLOSED INSUFFICIENT_DATA
Severity: high
Priority: unspecified
Version: 4.2.0
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Reporter: Dafna Ron <dron>
Assignee: Michal Skrivanek <michal.skrivanek>
QA Contact: meital avital <mavital>
CC: bugs, dfediuck, dron, rbarry
Flags: dron: planning_ack?, dron: devel_ack?, dron: testing_ack?
Doc Type: If docs needed, set a value
Story Points: ---
Last Closed: 2018-08-15 17:05:00 UTC
Type: Bug
Regression: ---
Documentation: ---
oVirt Team: Virt
Cloudforms Team: ---

Description Dafna Ron 2018-05-04 09:09:01 UTC
We had a failure in OST that is not related to the change being tested.
This is also not the first time I have seen this failure.

This is the failed job: http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/7220/


From what I can see, the VM was suspended and then received a stop from an unknown process.

2018-05-03T04:11:55.774660Z qemu-kvm: terminating on signal 15 from pid 4782 (<unknown process>)

Grepping for the pid, I can see errors in the audit log:

lago-basic-suite-master-host-1/_var_log/audit/audit.log:type=VIRT_RESOURCE msg=audit(1525320727.733:5420): pid=4782 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:virtd_t:s0-s0:c0.c1023 msg='virt=kvm resrc=rng reason=start vm="vm0" uuid=23e86d6b-ae2f-4e8e-b55e-339351c9a025 old-rng="?" new-rng="/dev/urandom" exe="/usr/sbin/libvirtd" hostname=? addr=? terminal=? res=success'

I am not sure which process this pid belongs to, but in the host-deploy log I can see it was used by ansible (a small sketch for checking a live pid follows the log excerpt below):

lago-basic-suite-master-engine/_var_log/ovirt-engine/host-deploy/ovirt-host-deploy-20180503000218-lago-basic-suite-master-host-1-4a67ad42.log:D: create     100644  1 (   0,   0)  4782 /usr/lib/python2.7/site-packages/ansible/modules/cloud/amazon/ec2_customer_gateway_facts.py;5aea8956
lago-basic-suite-master-engine/_var_log/ovirt-engine/host-deploy/ovirt-host-deploy-20180503000218-lago-basic-suite-master-host-1-4a67ad42.log:D: create     100644  2 (   0,   0)  4782 /usr/lib/python2.7/site-packages/ansible/module_utils/network/dellos10/dellos10.pyo;5aea8956
 
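For reference, a minimal sketch (not part of OST or vdsm) of how a live pid could be resolved to its executable via /proc on the host. This only works while the process is still running, so it assumes it is run on lago-basic-suite-master-host-1 shortly after the failure; pid 4782 is the one from the qemu-kvm log line above.

#!/usr/bin/env python2
# Hypothetical helper, for illustration only: map a pid to its comm/exe
# via /proc. Pids get reused, so a stale pid from old logs may no longer
# point at the process that sent the signal.
import os


def describe_pid(pid):
    proc = "/proc/%d" % pid
    if not os.path.isdir(proc):
        return "pid %d is no longer running (pids get reused)" % pid
    with open(os.path.join(proc, "comm")) as f:
        comm = f.read().strip()
    # /proc/<pid>/exe is a symlink to the running binary
    exe = os.path.realpath(os.path.join(proc, "exe"))
    return "pid %d: comm=%s exe=%s" % (pid, comm, exe)


if __name__ == "__main__":
    print(describe_pid(4782))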

Another issue that ykaul noticed is that we are silently failing virt-sparsify (which is hopefully unrelated and may need a separate bug): we do not actually check whether it succeeded, which is why the failure is silent (a sketch of such a check follows the traceback below).


2018-05-03 00:12:36,070-0400 DEBUG (periodic/3) [virt.periodic] Looking for stale paused VMs (periodic:388)
2018-05-03 00:12:36,080-0400 DEBUG (periodic/0) [virt.sampling.VMBulkstatsMonitor] sampled timestamp 4295744.91 elapsed 0.020 acquired True domains all (sampling:447)
2018-05-03 00:12:37,179-0400 DEBUG (tasks/8) [root] FAILED: <err> = "virt-sparsify: error: libguestfs error: guestfs_launch failed.\nThis usually means the libguestfs appliance failed to start or crashed.\nDo:\n  export LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1\nand run the command again.  For further information, read:\n  http://libguestfs.org/guestfs-faq.1.html#debugging-libguestfs\nYou can also run 'libguestfs-test-tool' and post the *complete* output\ninto a bug report or message to the libguestfs mailing list.\n\nIf reporting bugs, run virt-sparsify with debugging enabled and include the \ncomplete output:\n\n  virt-sparsify -v -x [...]\n"; <rc> = 1 (commands:87)
2018-05-03 00:12:37,186-0400 INFO  (tasks/8) [storage.SANLock] Releasing Lease(name='4acdf7b8-2a3b-494b-b9db-aba65b78cbc6', path=u'/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share1/a8bf97c4-0265-4fb0-b93a-e37ec766325c/images/5483a9bf-54b6-4826-9546-d1bf3d13dfa7/4acdf7b8-2a3b-494b-b9db-aba65b78cbc6.lease', offset=0) (clusterlock:435)
2018-05-03 00:12:37,192-0400 INFO  (tasks/8) [storage.SANLock] Successfully released Lease(name='4acdf7b8-2a3b-494b-b9db-aba65b78cbc6', path=u'/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share1/a8bf97c4-0265-4fb0-b93a-e37ec766325c/images/5483a9bf-54b6-4826-9546-d1bf3d13dfa7/4acdf7b8-2a3b-494b-b9db-aba65b78cbc6.lease', offset=0) (clusterlock:444)
2018-05-03 00:12:37,193-0400 ERROR (tasks/8) [root] Job u'93d447b5-f5b5-45e2-9821-8e54dc04305d' failed (jobs:221)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/jobs.py", line 157, in run
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdm/api/sparsify_volume.py", line 56, in _run
    virtsparsify.sparsify_inplace(self._vol_info.path)
  File "/usr/lib/python2.7/site-packages/vdsm/virtsparsify.py", line 71, in sparsify_inplace
    raise cmdutils.Error(cmd, rc, out, err)
Error: Command ['/usr/bin/virt-sparsify', '--machine-readable', '--in-place', u'/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share1/a8bf97c4-0265-4fb0-b93a-e37ec766325c/images/5483a9bf-54b6-4826-9546-d1bf3d13dfa7/4acdf7b8-2a3b-494b-b9db-aba65b78cbc6'] failed with rc=1 out=['3/12'] err=['virt-sparsify: error: libguestfs error: guestfs_launch failed.', 'This usually means the libguestfs appliance failed to start or crashed.', 'Do:', '  export LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1', 'and run the command again.  For further information, read:', '  http://libguestfs.org/guestfs-faq.1.html#debugging-libguestfs', "You can also run 'libguestfs-test-tool' and post the *complete* output", 'into a bug report or message to the libguestfs mailing list.', '', 'If reporting bugs, run virt-sparsify with debugging enabled and include the ', 'complete output:', '', '  virt-sparsify -v -x [...]']
2018-05-03 00:12:37,194-0400 INFO  (tasks/8) [root] Job u'93d447b5-f5b5-45e2-9821-8e54dc04305d' will be deleted in 3600 seconds (jobs:249)
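For illustration only, a minimal, hypothetical sketch of checking virt-sparsify's exit status so the failure is not swallowed. This is not the vdsm or OST code; vdsm's virtsparsify.sparsify_inplace (seen in the traceback above) already raises on a non-zero rc, so the missing check would be at the caller or test level. The command line mirrors the one in the error above and the volume path is a placeholder.

#!/usr/bin/env python2
# Sketch: run virt-sparsify --in-place and fail loudly on a non-zero
# exit status instead of ignoring it.
import subprocess


def sparsify_inplace(vol_path):
    cmd = ["/usr/bin/virt-sparsify", "--machine-readable", "--in-place", vol_path]
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    if proc.returncode != 0:
        # Propagate the failure so callers (and the test) see that
        # sparsification did not actually happen.
        raise RuntimeError(
            "virt-sparsify failed: rc=%d err=%r" % (proc.returncode, err))
    return out


if __name__ == "__main__":
    sparsify_inplace("/path/to/volume")  # placeholder path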

All the logs can be found in the job.

Comment 1 Doron Fediuck 2018-06-14 09:19:45 UTC
Is this still relevant?

Comment 2 Ryan Barry 2018-08-15 17:05:00 UTC
Closing since there's no response.

Please re-open if this is still relevant, Dafna.

Comment 3 Red Hat Bugzilla 2023-09-14 04:27:39 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.