Description of problem: Cannot start a VM after upgrading from 3.3.0 to 3.3.1. No error message appears in the UI. Attaching the engine.log.
Created attachment 837304 [details] engine log
I'm pretty sure this has already been fixed in 3.3.2. Considering it's in RC, please try to reproduce with it first.
Same problem after upgrading to 3.3.2 RC:

2013-12-17 11:12:19,068 ERROR [org.ovirt.engine.core.bll.SearchQuery] (ajp--127.0.0.1-8702-9) Query SearchQuery failed. Exception message is StatementCallback; bad SQL grammar [SELECT * FROM (SELECT * FROM storage_domains_for_search WHERE ( id IN (SELECT storage_domains_with_hosts_view.id FROM storage_domains_with_hosts_view )) ORDER BY storage_name ASC ) as T1 OFFSET (1 -1) LIMIT 100]; nested exception is org.postgresql.util.PSQLException: ERROR: relation "tt_temp22" does not exist
Where: SQL statement "truncate table tt_TEMP22"
Please attach a DB backup taken before the upgrade.
I'm attaching three SQL files: one for 3.3.0, one for 3.3.1, and one for 3.3.2 RC.
Created attachment 837695 [details] db1
Created attachment 837698 [details] db2
Created attachment 837699 [details] db3
Ohad, is this still relevant?
Yes, it is still relevant.
Can this bug be assigned? It's currently blocking the use of nested virtualization on oVirt.
The real cause of this is not the DB; before that, there is a canDoAction failure of the RunVm command:

2013-12-16 17:50:31,187 WARN [org.ovirt.engine.core.bll.RunVmCommand] (ajp--127.0.0.1-8702-2) [22dda41f] CanDoAction of action RunVm failed. Reasons:VAR__ACTION__RUN,VAR__TYPE__VM

Please also attach vdsm.log so we can see more info on this failure.
After analysing with remote debug, it fails in SchedulingManager::canSchedule. We get an empty list of available hosts although we have one host up and running (the host is filtered out by the scheduling/policy mechanism).

RunVmCommand fails in canDoAction on the following code:

--snip--
if (!SchedulingManager.getInstance().canSchedule(
        vdsGroup, vm, vdsBlackList, vdsWhiteList, destVds, messages)) {
    return false;
}
--snip--

Moving the bug to SLA.
(In reply to Eli Mesika from comment #16)
..
..

In such a case the log provides the filtering information (which hosts were filtered out by which filter). The attached log does not show this. Can you please provide a relevant log or access to the system?
Doron is right, the logs are stale; Martin already provided a fix for the engine logging issues [1].

The current issue that Ohad is experiencing is a storage issue.

From the vdsm log:

Thread-1585711::ERROR::2014-01-05 14:03:25,214::vm::2062::vm.Vm::(_startUnderlyingVm) vmId=`774c712d-a130-48ca-8280-be847e80f216`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 2022, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/vm.py", line 2820, in _run
    self.preparePaths(devices[DISK_DEVICES])
  File "/usr/share/vdsm/vm.py", line 2083, in preparePaths
    drive['path'] = self.cif.prepareVolumePath(drive, self.id)
  File "/usr/share/vdsm/clientIF.py", line 256, in prepareVolumePath
    raise vm.VolumeError(drive)
VolumeError: Bad volume specification {'index': 0, 'iface': 'virtio', 'reqsize': '0', 'format': 'cow', 'bootOrder': '2', 'poolID': '3e0452c8-21b5-4df1-aced-44bc4c568229', 'volumeID': 'ace57d6a-0f7c-4ee5-a072-46ecb4df8ffe', 'apparentsize': '1073741824', 'imageID': 'c6978895-ab61-4df1-889c-d1319957574c', 'specParams': {}, 'readonly': 'false', 'domainID': 'eb317662-1dab-458b-b9dd-82634e6222ae', 'optional': 'false', 'deviceId': 'c6978895-ab61-4df1-889c-d1319957574c', 'truesize': '1073741824', 'address': {'bus': '0x00', ' slot': '0x06', ' domain': '0x0000', ' type': 'pci', ' function': '0x0'}, 'device': 'disk', 'shared': 'false', 'propagateErrors': 'off', 'type': 'disk'}

We try to run the VM on the host, fail (see above), try to rerun, and are left with no hosts. That's why the second iteration of canSchedule fails with no hosts, and there is no error popup for it since it's internal.

[1] https://bugzilla.redhat.com/1002005
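To make the scheduling side of this clearer, here is a small, purely illustrative sketch of the rerun behaviour described above (this is not the engine code; all names are made up): the single host in the cluster fails the run, is excluded from the automatic rerun, and the second canSchedule call therefore sees an empty candidate list, which is reported internally rather than as a UI popup.

# Illustrative sketch only -- not engine code.
hosts = ['host1']        # single host in the cluster
excluded = []            # hosts already tried and failed

for attempt in (1, 2):
    candidates = [h for h in hosts if h not in excluded]
    if not candidates:
        # Second attempt: canSchedule finds no hosts; the failure is
        # internal to the rerun flow, so no error popup is shown.
        break
    host = candidates[0]
    started = False      # VM start fails on the host (the VolumeError above)
    if not started:
        excluded.append(host)   # failed host is excluded from the rerun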
Re-assigning to storage based on Gilad's findings.
Setting target release to the current version for consideration and review. Please do not push non-RFE bugs to an undefined target release, so that bugs are reviewed for relevancy, fix, closure, etc.
(In reply to Gilad Chaplik from comment #19)
..
> From the vdsm log:
..
> We try to run the VM on the host, fail (see above), try to rerun, and are
> left with no hosts. That's why the second iteration of canSchedule fails
> with no hosts, and there is no error popup for it since it's internal.

Where did you take this log from? The relevant part is missing from this excerpt.
(In reply to Ayal Baron from comment #22)
..
..
> Where did you take this log from? The relevant part is missing from this
> excerpt.

Attached vdsm log.
(In reply to Gilad Chaplik from comment #23)
..
> Attached vdsm log.

Sorry, it looks like the initial download wasn't fully successful and I was looking at a partial log.

Looks like the images directory on the domain doesn't exist for some reason.

Thread-1585711::ERROR::2014-01-05 14:03:25,206::blockVolume::446::Storage.Volume::(validateImagePath) Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/blockVolume.py", line 444, in validateImagePath
    os.mkdir(imageDir, 0755)
OSError: [Errno 2] No such file or directory: '/rhev/data-center/3e0452c8-21b5-4df1-aced-44bc4c568229/eb317662-1dab-458b-b9dd-82634e6222ae/images/c6978895-ab61-4df1-889c-d1319957574c'
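For clarity on the OSError above: os.mkdir creates only the last path component, so when the parent directory is missing on the domain mount the call fails with ENOENT (errno 2) rather than EEXIST. A minimal sketch of that failure mode, using an illustrative path rather than the exact blockVolume.py code:

# Minimal sketch, not vdsm code; the path is illustrative only.
import errno
import os

image_dir = '/rhev/data-center/<pool-id>/<domain-id>/images/<image-id>'

try:
    # os.mkdir creates only the leaf directory; if the parent 'images'
    # directory is absent, it raises OSError with errno.ENOENT, as seen
    # in the traceback above.
    os.mkdir(image_dir, 0o755)
except OSError as e:
    if e.errno == errno.ENOENT:
        print('parent directory missing: %s' % os.path.dirname(image_dir))
    raise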
(In reply to Ayal Baron from comment #24)
..
> Looks like the images directory on the domain doesn't exist for some reason.
..

The logs don't contain enough info to understand how the setup reached this situation, and the logs no longer exist. If this happens again, please reopen.