User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.57 Safari/537.36
Build Identifier:

After a successful upgrade from oVirt 3.1 to 3.2.2, with every VM up and running and every function verified, I decided to also upgrade my cluster compatibility level from 3.1 to 3.2. When trying to upgrade the Data Center to 3.2 I was advised to upgrade the cluster level to 3.2 first. When trying to do that I was advised to put my nodes in maintenance mode, so I did. Everything went OK according to the event log in webadmin, and I then brought the nodes back up again.

None of the nodes got the SPM role; they just kept switching over and over, trying to contend. As a result the Data Center was down and unusable. In the log file I found:

ImageIsNotLegalChain: Image is not a legal chain: ('5d9cc5dc-7664-4624-8e72-479a7cec35f5',)

telling us that the image should be a node in a chain of volumes. When looking in the meta file we could not find a reference to a parent UUID:

PUUID=00000000-0000-0000-0000-000000000000 (this is a special value for "no such volume")
VOLTYPE=INTERNAL

So, my VM was up and running earlier in oVirt 3.2.2, and it just stopped working after I upgraded the compatibility level of the cluster. If I moved the image out of the data domain's file structure everything would be happy again, but then I would lose my VM. What I did instead was change the VOLTYPE from "INTERNAL" to "LEAF":

1. I put one of the nodes in maintenance mode. On the last node I stopped vdsmd.
2. I stopped ovirt-engine on the management server.
3. I changed the VOLTYPE from INTERNAL to LEAF.
4. I started ovirt-engine.
5. I started vdsmd on the first node. After a short while it got the SPM role and the status of everything was up.
6. I started the VM with the affected image and checked that it booted OK.
7. I activated node 2.

Everything went up and the log files are all happy.
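The manual fix described above can be sketched as a small script. This is a minimal illustration, not vdsm code: it assumes the volume metadata file is a plain key=value text file containing PUUID and VOLTYPE lines, as shown in the comment, and it only flips VOLTYPE when the volume has no parent (PUUID is the null UUID), to avoid touching genuinely internal volumes.

```python
# Hedged sketch of the workaround: rewrite VOLTYPE=INTERNAL to
# VOLTYPE=LEAF in a volume metadata file, but only when PUUID is the
# null UUID (i.e. the volume has no parent and so cannot legally be
# INTERNAL). The key=value layout is an assumption from the report.

NULL_UUID = "00000000-0000-0000-0000-000000000000"

def fix_voltype(meta_text):
    """Return metadata text with VOLTYPE rewritten to LEAF when the
    volume claims to be INTERNAL yet has no parent volume."""
    lines = meta_text.splitlines()
    entries = dict(line.split("=", 1) for line in lines if "=" in line)
    if entries.get("PUUID") == NULL_UUID and entries.get("VOLTYPE") == "INTERNAL":
        lines = ["VOLTYPE=LEAF" if l == "VOLTYPE=INTERNAL" else l
                 for l in lines]
    return "\n".join(lines)

# Example metadata fragment matching the one quoted in the report.
sample = ("PUUID=00000000-0000-0000-0000-000000000000\n"
          "VOLTYPE=INTERNAL")
fixed = fix_voltype(sample)
```

As in the report, this should only be done with all hosts in maintenance (or vdsmd stopped) and ovirt-engine down, so nothing else is reading or writing the metadata.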
The affected image is an old image that was first created with oVirt 3.0 and has since been exported and imported a couple of times.

Reproducible: Always

Expected Results:
If there are bad images (wrong metadata), the upgrade should take care of that and mark them. What we want is the Data Center up and running.
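The detection the reporter asks for could look something like the sketch below. The rule encoded here is inferred from the "Image is not a legal chain" error above, namely that a legal chain should end in exactly one LEAF volume; this is an illustration with made-up names, not the actual vdsm validation.

```python
# Hedged sketch of a pre-upgrade sanity check: flag images whose
# volume chain has no LEAF (or more than one), which appears to be
# what "Image is not a legal chain" means in the report. Function
# and structure names are illustrative assumptions.

def find_bad_images(images):
    """images: dict mapping image UUID -> list of per-volume metadata
    dicts (each with a 'VOLTYPE' key). Return the UUIDs of images
    whose chain does not contain exactly one LEAF volume."""
    bad = []
    for img_uuid, volumes in images.items():
        leaves = [v for v in volumes if v.get("VOLTYPE") == "LEAF"]
        if len(leaves) != 1:
            bad.append(img_uuid)  # broken chain: mark instead of failing
    return bad
```

An upgrade that ran a check like this first could mark the broken images and continue, rather than leaving the whole Data Center down.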
Ricky, thanks for your report! Can you attach vdsm.log? It would be useful to understand how and why the conversion code failed to convert this domain.
Aharon,
1. Do we test this scenario - upgrading from ovirt-3.0 to the current release?
2. Can we reproduce this issue with the current version?
Setting to urgent, since such an error in a single disk brings down the whole system.
The tested version was 3.2, but we don't have such an option in the version menu.
Created attachment 866007 [details] vdsm.log
(In reply to Nir Soffer from comment #2)
> Aharon,
> 1. Do we test this scenario - upgrading from ovirt-3.0 to the current release?
> 2. Can we reproduce this issue with the current version?

To clarify: the upgrade was not from 3.0 to 3.2.2. The affected image was made with oVirt 3.0 (March or April 2012), then ran on oVirt 3.1, and finally on oVirt 3.2.2. Just to clear things up.

Regards,
//Ricky
Setting target release to current version for consideration and review. Please do not push non-RFE bugs to an undefined target release; that makes sure bugs are reviewed for relevancy, fix, closure, etc.
Aharon,
1. Do we test this scenario - upgrading from ovirt-3.0/3.1 to the current release?
2. Can we reproduce this issue with the current version?
This is an automated message. Re-targeting all non-blocker bugs still open on 3.4.0 to 3.4.1.
Tried to reproduce using 3.4. In order to upgrade the DC we must upgrade the cluster first, so:

1. I upgraded the cluster from 3.1 to 3.4.
2. I upgraded the DC from 3.1 to 3.4.

The VM is up and running; I didn't see any error. Anyway, logs attached.
Created attachment 873491 [details] logs
(In reply to Aharon Canan from comment #10) > Tried to reproduce using 3.4 > > in order to upgrade DC we must upgrade cluster first, > > 1. I upgraded the cluster from 3.1 > 3.4 > 2. Upgraded DC from 3.1 > 3.4 > > VM is up and running, didn't see any error. > > anyway, logs attached. How did you corrupt the volume?
Allon, I didn't reproduce the issue; I just checked that the upgrade works as needed.
Aharon, to reproduce this bug you must "corrupt" the metadata file of a domain. You should deactivate all hosts accessing the domain, then open the volume meta file and change the VOLTYPE from "LEAF" to "INTERNAL". Then activate a host and try to perform an upgrade.
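The corruption step above can be sketched as follows. As with the metadata quoted earlier in the report, the exact key=value file layout is an assumption; the point is simply to flip the one field while no host is touching the domain.

```python
# Hedged sketch of the reproduction step: rewrite VOLTYPE=LEAF to
# VOLTYPE=INTERNAL in a volume metadata file (the inverse of the
# reporter's fix), to be done only while all hosts accessing the
# domain are in maintenance. The file format is an assumption.

def corrupt_voltype(meta_text):
    """Replace the VOLTYPE=LEAF line with VOLTYPE=INTERNAL,
    leaving every other metadata line untouched."""
    return "\n".join(
        "VOLTYPE=INTERNAL" if line == "VOLTYPE=LEAF" else line
        for line in meta_text.splitlines()
    )
```

After activating a host again, attempting the upgrade should then hit the same ImageIsNotLegalChain error as in the original report.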
The manual solution should be to either remove the corrupted volume or manually fix the metadata. In any event, I'm fine with having the automatic upgrade fail on a corrupted volume, especially in light of the fact that this is an upgrade to 3.2.2, which isn't exactly the latest version.