Description of problem: Action of delete VM failed since '_IOProcessOs' object has no attribute 'listdir' and storage domain become unknown. Version-Release number of selected component (if applicable): rhv-4.1.3-6 How reproducible: We see it in automation for the first time. till now happened once. Steps to Reproduce: Delete VM Actual results: Delete action failed and storage domain become unknown. Additional info: delete action (from art log): 2017-06-27 06:41:40,556 - MainThread - vms - INFO - Using Correlation-Id: vms_delete_6cc7db65-fa68-45a4 Engine log: 2017-06-27 06:41:48,031+03 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand] (default task-15) [vms_delete_6cc7db65-fa68-45a4] Failed in 'DeleteImageGroupVDS' method 2017-06-27 06:41:48,046+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-15) [vms_delete_6cc7db65-fa68-45a4] EVENT_ID: IRS_BROKER_COMMAND_FAILURE(10,803), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM command DeleteImageGroupVDS failed: '_IOProcessOs' object has no attribute 'listdir' 2017-06-27 06:41:48,046+03 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (default task-15) [vms_delete_6cc7db65-fa68-45a4] IrsBroker::Failed::DeleteImageGroupVDS: IRSGenericException: IRSErrorException: Failed to DeleteImageGroupVDS, error = '_IOProcessOs' object has no attribute 'listdir', code = 100 2017-06-27 06:41:48,735+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-15) [vms_delete_6cc7db65-fa68-45a4] START, SpmStopVDSCommand(HostName = host_mixed_2, SpmStopVDSCommandParameters:{runAsync='true', hostId='d722c533-2b1e-4c0d-8bb5-b4e07ef04b26', storagePoolId='25a88c81-0e3d-4ba8-8444-19dffc5a38c7'}), log id: eb204f0 2017-06-27 06:41:49,062+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-15) [vms_delete_6cc7db65-fa68-45a4] SpmStopVDSCommand::Not stopping SPM on vds 'host_mixed_2', pool id '25a88c81-0e3d-4ba8-8444-19dffc5a38c7' as there are uncleared tasks 2017-06-27 06:41:49,062+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-15) [vms_delete_6cc7db65-fa68-45a4] FINISH, SpmStopVDSCommand, log id: eb204f0 2017-06-27 06:41:49,062+03 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (default task-15) [vms_delete_6cc7db65-fa68-45a4] IRS failover failed - can't allocate vds server 2017-06-27 06:41:49,062+03 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand] (default task-15) [vms_delete_6cc7db65-fa68-45a4] FINISH, DeleteImageGroupVDSCommand, log id: 24eadc1 2017-06-27 06:41:49,063+03 ERROR [org.ovirt.engine.core.bll.storage.disk.image.RemoveImageCommand] (default task-15) [vms_delete_6cc7db65-fa68-45a4] Command 'org.ovirt.engine.core.bll.storage.disk.image.RemoveImageCommand' failed: EngineException: org.ovirt.engine.core.vdsbroker.irsbroker.IRSErrorException: IRSGenericException: IRSErrorException: Failed to DeleteImageGroupVDS, error = '_IOProcessOs' object has no attribute 'listdir', code = 100 (Failed with error GeneralException and code 100) 2017-06-27 06:41:49,076+03 INFO [org.ovirt.engine.core.bll.storage.pool.SetStoragePoolStatusCommand] (org.ovirt.thread.pool-6-thread-13) [20b6d3a2] Running command: SetStoragePoolStatusCommand internal: true. Entities affected : ID: 25a88c81-0e3d-4ba8-8444-19dffc5a38c7 Type: StoragePool 2017-06-27 06:41:49,077+03 INFO [org.ovirt.engine.core.vdsbroker.storage.StoragePoolDomainHelper] (org.ovirt.thread.pool-6-thread-13) [20b6d3a2] Storage Pool '25a88c81-0e3d-4ba8-8444-19dffc5a38c7' - Updating Storage Domain 'e526aaaa-61af-41e6-8f4d-cd4f5af4b34d' status from 'Active' to 'Unknown', reason: null 2017-06-27 06:41:49,078+03 INFO [org.ovirt.engine.core.vdsbroker.storage.StoragePoolDomainHelper] (org.ovirt.thread.pool-6-thread-13) [20b6d3a2] Storage Pool '25a88c81-0e3d-4ba8-8444-19dffc5a38c7' - Updating Storage Domain '4db55473-17f5-4670-bce8-1d55fc07eb16' status from 'Active' to 'Unknown', reason: null 2017-06-27 06:41:49,078+03 INFO [org.ovirt.engine.core.vdsbroker.storage.StoragePoolDomainHelper] (org.ovirt.thread.pool-6-thread-13) [20b6d3a2] Storage Pool '25a88c81-0e3d-4ba8-8444-19dffc5a38c7' - Updating Storage Domain '14bcd23f-1775-49fa-9b3b-d9fecb4bf1a7' status from 'Active' to 'Unknown', reason: null 2017-06-27 06:41:49,079+03 INFO [org.ovirt.engine.core.vdsbroker.storage.StoragePoolDomainHelper] (org.ovirt.thread.pool-6-thread-13) [20b6d3a2] Storage Pool '25a88c81-0e3d-4ba8-8444-19dffc5a38c7' - Updating Storage Domain '35b248b4-7208-4c13-9797-99af769dfc3c' status from 'Active' to 'Unknown', reason: null 2017-06-27 06:41:49,079+03 INFO [org.ovirt.engine.core.vdsbroker.storage.StoragePoolDomainHelper] (org.ovirt.thread.pool-6-thread-13) [20b6d3a2] Storage Pool '25a88c81-0e3d-4ba8-8444-19dffc5a38c7' - Updating Storage Domain '9c9b1fef-4e2b-4c59-87b1-529a5581c922' status from 'Active' to 'Unknown', reason: null 2017-06-27 06:41:49,079+03 INFO [org.ovirt.engine.core.vdsbroker.storage.StoragePoolDomainHelper] (org.ovirt.thread.pool-6-thread-13) [20b6d3a2] Storage Pool '25a88c81-0e3d-4ba8-8444-19dffc5a38c7' - Updating Storage Domain 'a3bf0b0a-3570-4c6f-9a4e-823a6524c610' status from 'Active' to 'Unknown', reason: null 2017-06-27 06:41:49,079+03 INFO [org.ovirt.engine.core.vdsbroker.storage.StoragePoolDomainHelper] (org.ovirt.thread.pool-6-thread-13) [20b6d3a2] Storage Pool '25a88c81-0e3d-4ba8-8444-19dffc5a38c7' - Updating Storage Domain '3abce0aa-7d81-4222-9a83-f0bd41d4a969' status from 'Active' to 'Unknown', reason: null 2017-06-27 06:41:49,080+03 INFO [org.ovirt.engine.core.vdsbroker.storage.StoragePoolDomainHelper] (org.ovirt.thread.pool-6-thread-13) [20b6d3a2] Storage Pool '25a88c81-0e3d-4ba8-8444-19dffc5a38c7' - Updating Storage Domain '0cbf468e-af80-4175-814c-b098524bca4f' status from 'Active' to 'Unknown', reason: null 2017-06-27 06:41:49,080+03 INFO [org.ovirt.engine.core.vdsbroker.storage.StoragePoolDomainHelper] (org.ovirt.thread.pool-6-thread-13) [20b6d3a2] Storage Pool '25a88c81-0e3d-4ba8-8444-19dffc5a38c7' - Updating Storage Domain 'a2810ce6-3a0c-48db-9641-0e783c18a935' status from 'Active' to 'Unknown', reason: null 2017-06-27 06:41:49,080+03 INFO [org.ovirt.engine.core.vdsbroker.storage.StoragePoolDomainHelper] (org.ovirt.thread.pool-6-thread-13) [20b6d3a2] Storage Pool '25a88c81-0e3d-4ba8-8444-19dffc5a38c7' - Updating Storage Domain '4a320b08-f6af-49fa-b20b-3c1913a5dbc3' status from 'Active' to 'Unknown', reason: null 2017-06-27 06:41:49,080+03 INFO [org.ovirt.engine.core.vdsbroker.storage.StoragePoolDomainHelper] (org.ovirt.thread.pool-6-thread-13) [20b6d3a2] Storage Pool '25a88c81-0e3d-4ba8-8444-19dffc5a38c7' - Updating Storage Domain '7b63366c-fccf-49f2-a430-d8664fcf9b92' status from 'Active' to 'Unknown', reason: null 2017-06-27 06:41:49,080+03 INFO [org.ovirt.engine.core.vdsbroker.storage.StoragePoolDomainHelper] (org.ovirt.thread.pool-6-thread-13) [20b6d3a2] Storage Pool '25a88c81-0e3d-4ba8-8444-19dffc5a38c7' - Updating Storage Domain '5d07ef06-ace8-4c73-9414-aad359612b81' status from 'Active' to 'Unknown', reason: null 2017-06-27 06:41:49,080+03 INFO [org.ovirt.engine.core.vdsbroker.storage.StoragePoolDomainHelper] (org.ovirt.thread.pool-6-thread-13) [20b6d3a2] Storage Pool '25a88c81-0e3d-4ba8-8444-19dffc5a38c7' - Updating Storage Domain '8ddcbc09-a8e9-4a5c-855b-ac8cf4b4571c' status from 'Active' to 'Unknown', reason: null 2017-06-27 06:41:49,084+03 INFO [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (default task-15) [vms_delete_6cc7db65-fa68-45a4] Removed task 'd6bb3326-348c-4686-a730-3bc11e2eebde' from DataBase 2017-06-27 06:41:49,088+03 ERROR [org.ovirt.engine.core.bll.storage.disk.image.RemoveAllVmImagesCommand] (default task-15) [vms_delete_6cc7db65-fa68-45a4] Can't remove image id '3d6f632d-1e26-4f99-bc34-dd2c8631f49c' for VM id 'aee91749-a298-45f9-b9fc-70ca67a19081' from domain id '5d07ef06-ace8-4c73-9414-aad359612b81' due to: GeneralException. 2017-06-27 06:41:49,088+03 INFO [org.ovirt.engine.core.bll.storage.disk.image.RemoveAllVmImagesCommand] (default task-15) [vms_delete_6cc7db65-fa68-45a4] Image id '3d6f632d-1e26-4f99-bc34-dd2c8631f49c' is not on a data domain and will not be marked as illegal. Vdsm log: 2017-06-27 06:41:46,975+0300 ERROR (jsonrpc/0) [storage.TaskManager.Task] (Task='90e56977-a3d2-4e12-a8bd-6f4dffd47eb8') Unexpected error (task:870) Traceback (most recent call last): File "/usr/share/vdsm/storage/task.py", line 877, in _run return fn(*args, **kargs) File "<string>", line 2, in deleteImage File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method ret = func(*args, **kwargs) File "/usr/share/vdsm/storage/hsm.py", line 1473, in deleteImage pool.deleteImage(dom, imgUUID, volsByImg) File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper return method(self, *args, **kwargs) File "/usr/share/vdsm/storage/sp.py", line 2021, in deleteImage domain.deleteImage(domain.sdUUID, imgUUID, volsByImg) File "/usr/share/vdsm/storage/sd.py", line 665, in deleteImage self._manifest.deleteImage(sdUUID, imgUUID, volsImgs) File "/usr/share/vdsm/storage/fileSD.py", line 229, in deleteImage self.oop.os.rename(currImgDir, toDelDir) File "/usr/lib/python2.7/site-packages/vdsm/storage/outOfProcess.py", line 243, in rename for fname in self.listdir(oldpath): AttributeError: '_IOProcessOs' object has no attribute 'listdir' 2017-06-27 06:41:47,020+0300 INFO (jsonrpc/0) [storage.TaskManager.Task] (Task='90e56977-a3d2-4e12-a8bd-6f4dffd47eb8') aborting: Task is aborted: u"'_IOProcessOs' object has no attribute 'listdir'" - code 100 (task:1175) 2017-06-27 06:41:47,020+0300 ERROR (jsonrpc/0) [storage.Dispatcher] FINISH deleteImage error='_IOProcessOs' object has no attribute 'listdir' (dispatcher:82) Traceback (most recent call last): File "/usr/share/vdsm/storage/dispatcher.py", line 72, in wrapper result = ctask.prepare(func, *args, **kwargs) File "/usr/share/vdsm/storage/task.py", line 105, in wrapper return m(self, *a, **kw) File "/usr/share/vdsm/storage/task.py", line 1183, in prepare raise self.error AttributeError: '_IOProcessOs' object has no attribute 'listdir' VM info partially from art log: <vms> <vm href="/ovirt-engine/api/storagedomains/5d07ef06-ace8-4c73-9414-aad359612b81/vms/aee91749-a298-45f9-b9fc-70ca67a19081" id="aee91749-a298-45f9-b9fc-70ca67a19081"> <actions> <link href="/ovirt-engine/api/storagedomains/5d07ef06-ace8-4c73-9414-aad359612b81/vms/aee91749-a298-45f9-b9fc-70ca67a19081/import" rel="import"/> <link href="/ovirt-engine/api/storagedomains/5d07ef06-ace8-4c73-9414-aad359612b81/vms/aee91749-a298-45f9-b9fc-70ca67a19081/register" rel="register"/> </actions> <name>virt_export_vm</name> <link href="/ovirt-engine/api/storagedomains/5d07ef06-ace8-4c73-9414-aad359612b81/vms/aee91749-a298-45f9-b9fc-70ca67a19081/disks" rel="disks"/> <link href="/ovirt-engine/api/storagedomains/5d07ef06-ace8-4c73-9414-aad359612b81/vms/aee91749-a298-45f9-b9fc-70ca67a19081/diskattachments" rel="diskattachments"/> <bios> <boot_menu> <enabled>false</enabled> </boot_menu> </bios> <cpu> <architecture>x86_64</architecture> <topology> <cores>1</cores> <sockets>1</sockets> <threads>1</threads> </topology> </cpu> <cpu_shares>0</cpu_shares> <creation_time>2017-06-27T06:40:02.000+03:00</creation_time> <delete_protected>false</delete_protected> <display> <allow_override>false</allow_override> <copy_paste_enabled>true</copy_paste_enabled> <disconnect_action>LOCK_SCREEN</disconnect_action> <file_transfer_enabled>true</file_transfer_enabled> <monitors>1</monitors> <single_qxl_pci>false</single_qxl_pci> <smartcard_enabled>false</smartcard_enabled> </display> <high_availability> <enabled>false</enabled> <priority>0</priority> </high_availability> <initialization/> <io> <threads>0</threads> </io> <large_icon href="/ovirt-engine/api/icons/cdd8f205-0957-4b7a-9eb2-49ad4d925456" id="cdd8f205-0957-4b7a-9eb2-49ad4d925456"/> <memory>1073741824</memory> <memory_policy> <guaranteed>1073741824</guaranteed> <max>4294967296</max> </memory_policy> <migration> <auto_converge>inherit</auto_converge> <compressed>inherit</compressed> </migration> <migration_downtime>-1</migration_downtime> <origin>ovirt</origin> <os> <boot> <devices> <device>hd</device> </devices> </boot> <type>rhel_7x64</type> </os> <small_icon href="/ovirt-engine/api/icons/d5807bea-b28f-4788-8113-53bfe41b433d" id="d5807bea-b28f-4788-8113-53bfe41b433d"/> <sso> <methods> <method id="guest_agent"/> </methods> </sso> <start_paused>false</start_paused> <stateless>false</stateless> <time_zone> <name>Etc/GMT</name> </time_zone> <type>desktop</type> <usb> <enabled>false</enabled> </usb> <quota id="49b91573-18e1-4dae-829d-45e7d9545467"/> <storage_domain href="/ovirt-engine/api/storagedomains/5d07ef06-ace8-4c73-9414-aad359612b81" id="5d07ef06-ace8-4c73-9414-aad359612b81"/> <next_run_configuration_exists>false</next_run_configuration_exists> <numa_tune_mode>interleave</numa_tune_mode> <placement_policy> <affinity>migratable</affinity> </placement_policy> <status>down</status> <original_template href="/ovirt-engine/api/templates/00000000-0000-0000-0000-000000000000" id="00000000-0000-0000-0000-000000000000"/> <template href="/ovirt-engine/api/templates/00000000-0000-0000-0000-000000000000" id="00000000-0000-0000-0000-000000000000"/> </vm> </vms>
Created attachment 1292608 [details] host_logs
Created attachment 1292609 [details] engine logs
Created attachment 1292610 [details] art log
Host info: RHEV-H from: http://download-node-02.eng.bos.redhat.com/devel/candidate-trees/RHVH-4.1-20170626.6/compose/RHVH/x86_64/iso/RHVH-4.1-20170626.6-RHVH-x86_64-dvd1.iso kernel release: 3.10.0-686.el7.x86_64 redhat release: Red Hat Virtualization Host 4.1 (el7.3) vdsm: vdsm-4.19.20-1.el7ev.x86_64
The attribute error fixed by bug 1444659. It's neat to see how something reported by pylint has a real-world impact :-) Something seems "off" with the flow here, although I can't quite put my finger on it. I think the first step is to fix the attribute error by backporting 5341a1, and then we can more easily research the real root cause. Although, given that: (In reply to Israel Pinto from comment #0) > How reproducible: > We see it in automation for the first time. till now happened once. and that this flow has been touched in ages, I wonder how much of a priority it is. Nir - what do you think? P.S.: I posted the backport to https://gerrit.ovirt.org/#/c/78808/, so if we have concerns wrt the patch, we can discuss there.
(In reply to Allon Mureinik from comment #6) > The attribute error fixed by bug 1444659. It's neat to see how something > reported by pylint has a real-world impact :-) > > Something seems "off" with the flow here, although I can't quite put my > finger on it. > I think the first step is to fix the attribute error by backporting 5341a1, > and then we can more easily research the real root cause. I agree, the current error make it very hard to debug, hiding the real error.
Israel - the attribute error is now fixed for 4.1.4 (will be available for testing in the upcoming build), and upstream since late April. If you ever notice a problem with removing a VM again, please open a clean bug with the new logs. Thanks!
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason: [Tag 'v4.19.21' doesn't contain patch 'https://gerrit.ovirt.org/78808'] gitweb: https://gerrit.ovirt.org/gitweb?p=vdsm.git;a=shortlog;h=refs/tags/v4.19.21 For more info please contact: infra
Francesco, same issue again, patch was merged last week.
This is most definitely in the tag: amureini@mureinik ~/src/git/vdsm [master] $ git log -n1 f8285c517e0e076d35c826f5cbe0cd8d512e1fb6 commit f8285c517e0e076d35c826f5cbe0cd8d512e1fb6 Author: Nir Soffer <nsoffer> Date: Sat Apr 22 22:56:18 2017 +0300 pylint: Remove broken and unneeded code oop.os.rename() was trying to handle the case when the destination directory exists and is not empty. In this case, the code tried to delete the destination directory, and rename the source directory. This code was broken, failing with: AttributeError: '_IOProcessOs' object has no attribute 'listdir' The broken code was introduced in commit 576428e876c2 (ioprocess implementation), and is available since 4.16.0. The only effected flow is Image.delete, when we rename the image directory to remove_me_<uuid>. In this flow, we do not expect that the destination directory to exist. This patch removes the unneeded functionality, restoring POSIX semantics. Change-Id: I352e8643a47f517c879d2085bced6847e0372b30 Bug-Url: https://bugzilla.redhat.com/1465836 Reported-by: pylint Signed-off-by: Nir Soffer <nsoffer> amureini@mureinik ~/src/git/vdsm [master] $ git tag --contains f8285c517e0e076d35c826f5cbe0cd8d512e1fb6 v4.19.21 Eyal - can someone from your team please look into this? Is this perhaps since we don't seem to have a 4.19.21 target milestone?
I can approve this is in the v4.19.21 tag. We've identified an issue with the code that moves bugs to ON_QA. the problem was we've had an old outdated mirror of gerrit.ovirt.org which didn't have the new tag. This should be fixed for the next build, sorry for the noise, moving to ON_QA. BTW, the logic doesn't involve anything around target release missing (v4.19.21).
Verify with automation run: https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv-4.2-ge-runner-compute/74/ Version: 4.1.4.1-0.1.el7