Created attachment 834644 [details]
engine log

Description of problem:
While doing some NFS failure tests, the master storage domain version was incremented. However, the domain information on the NFS server was not updated. The important lines of the log are:

2013-12-09 21:58:59,015 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.DeactivateStorageDomainVDSCommand] (pool-6-thread-50) [27d543d8] START, DeactivateStorageDomainVDSCommand( storagePoolId = 94ed7a19-fade-4bd6-83f2-2cbb2f730b95, ignoreFailoverLimit = false, storageDomainId = 272ec473-6041-42ee-bd1a-732789dd18d4, masterDomainId = 2c51d320-88ce-4f23-8215-e15f55f66906, masterVersion = 3), log id: 7cc2cca
2013-12-09 21:59:07,665 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.DeactivateStorageDomainVDSCommand] (pool-6-thread-50) [27d543d8] FINISH, DeactivateStorageDomainVDSCommand, log id: 7cc2cca
2013-12-09 21:59:07,668 INFO [org.ovirt.engine.core.bll.storage.DeactivateStorageDomainCommand] (pool-6-thread-50) [27d543d8] Lock freed to object EngineLock [exclusiveLocks= key: 272ec473-6041-42ee-bd1a-732789dd18d4 value: STORAGE key: 94ed7a19-fade-4bd6-83f2-2cbb2f730b95 value: POOL , sharedLocks= ]
2013-12-09 21:59:07,715 INFO [org.ovirt.engine.core.bll.storage.AfterDeactivateSingleAsyncOperation] (pool-6-thread-45) [27d543d8] After deactivate treatment vds: colovn03,pool Collogia
2013-12-09 21:59:07,720 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.RefreshStoragePoolVDSCommand] (pool-6-thread-45) [27d543d8] START, RefreshStoragePoolVDSCommand(HostName = colovn03, HostId = 5d233303-559e-4602-88fb-de4e07170261, storagePoolId = 94ed7a19-fade-4bd6-83f2-2cbb2f730b95, masterStorageDomainId=2c51d320-88ce-4f23-8215-e15f55f66906, masterVersion=3), log id: 655db14d  <<<<<<< OLD VERSION!
2013-12-09 21:59:07,742 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.RefreshStoragePoolVDSCommand] (pool-6-thread-45) [27d543d8] FINISH, RefreshStoragePoolVDSCommand, log id: 655db14d
2013-12-09 21:59:07,747 INFO [org.ovirt.engine.core.bll.storage.DisconnectStorageServerConnectionCommand] (pool-6-thread-50) Running command: DisconnectStorageServerConnectionCommand internal: true. Entities affected : ID: aaa00000-0000-0000-0000-123456789aaa Type: System
2013-12-09 21:59:07,753 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStorageServerVDSCommand] (pool-6-thread-50) START, DisconnectStorageServerVDSCommand(HostName = colovn03, HostId = 5d233303-559e-4602-88fb-de4e07170261, storagePoolId = 00000000-0000-0000-0000-000000000000, storageType = NFS, connectionList = [{ id: ff550c17-a171-4a46-93b5-3ae4c8600f60, connection: 10.10.30.252:/var/nas2/OVirtIB, iqn: null, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };]), log id: 642e6e5a
2013-12-09 21:59:10,289 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStorageServerVDSCommand] (pool-6-thread-50) FINISH, DisconnectStorageServerVDSCommand, return: {ff550c17-a171-4a46-93b5-3ae4c8600f60=0}, log id: 642e6e5a
2013-12-09 21:59:10,309 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-6-thread-50) Correlation ID: 27d543d8, Job ID: 235fb0d2-808e-434c-8db8-0f701476bcdb, Call Stack: null, Custom Event ID: -1, Message: Storage Domain colnas02_IB (Data Center Collogia) was deactivated by admin@internal
2013-12-09 22:00:08,561 INFO [org.ovirt.engine.core.bll.storage.DeactivateStorageDomainCommand] (ajp--127.0.0.1-8702-6) [3518986c] Lock Acquired to object EngineLock [exclusiveLocks= key: 2c51d320-88ce-4f23-8215-e15f55f66906 value: STORAGE key: 94ed7a19-fade-4bd6-83f2-2cbb2f730b95 value: POOL , sharedLocks= ]
2013-12-09 22:00:08,620 INFO [org.ovirt.engine.core.bll.storage.DeactivateStorageDomainCommand] (pool-6-thread-50) [3518986c] Running command: DeactivateStorageDomainCommand internal: false. Entities affected : ID: 2c51d320-88ce-4f23-8215-e15f55f66906 Type: Storage
2013-12-09 22:00:08,661 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.DeactivateStorageDomainVDSCommand] (pool-6-thread-50) [3518986c] START, DeactivateStorageDomainVDSCommand( storagePoolId = 94ed7a19-fade-4bd6-83f2-2cbb2f730b95, ignoreFailoverLimit = false, storageDomainId = 2c51d320-88ce-4f23-8215-e15f55f66906, masterDomainId = 965ca3b6-4f9c-4e81-b6e8-5ed4a9e58545, masterVersion = 4), log id: 24d2cc8  <<<< NEW VERSION!
2013-12-09 22:00:08,705 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.DeactivateStorageDomainVDSCommand] (pool-6-thread-50) [3518986c] Failed in DeactivateStorageDomainVDS method
2013-12-09 22:00:08,708 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.DeactivateStorageDomainVDSCommand] (pool-6-thread-50) [3518986c] Error code StorageDomainActionError and error message IRSGenericException: IRSErrorException: Failed to DeactivateStorageDomainVDS, error = Error in storage domain action: ('sdUUID=2c51d320-88ce-4f23-8215-e15f55f66906, spUUID=94ed7a19-fade-4bd6-83f2-2cbb2f730b95, msdUUID=965ca3b6-4f9c-4e81-b6e8-5ed4a9e58545, masterVersion=4',)
2013-12-09 22:00:08,713 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (pool-6-thread-50) [3518986c] IrsBroker::Failed::DeactivateStorageDomainVDS due to: IRSErrorException: IRSGenericException: IRSErrorException: Failed to DeactivateStorageDomainVDS, error = Error in storage domain action: ('sdUUID=2c51d320-88ce-4f23-8215-e15f55f66906, spUUID=94ed7a19-fade-4bd6-83f2-2cbb2f730b95, msdUUID=965ca3b6-4f9c-4e81-b6e8-5ed4a9e58545, masterVersion=4',)

Version-Release number of selected component (if applicable):
ovirt-engine 3.3.1

How reproducible:
Unknown

Steps to Reproduce:
Do not know how to reproduce.

Actual results:
Domain is inaccessible afterwards.

Expected results:
Domain should recover.

Additional info:
Log attached.
The storage domain was reactivated in the database with the SQL statement "update storage_pool set master_domain_version=3;".
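The failure above boils down to the engine database and the NFS domain metadata disagreeing on the master version. A minimal sketch of checking for that mismatch is shown below; it assumes VDSM's usual `dom_md/metadata` key/value format with a `MASTER_VERSION=N` line (verify the exact path and key on your own installation), and the sample values are taken from the logs above.

```python
def parse_master_version(metadata_text):
    """Extract MASTER_VERSION from a VDSM-style domain metadata file.

    Returns None if the key is absent (e.g. a non-master domain).
    """
    for line in metadata_text.splitlines():
        if line.startswith("MASTER_VERSION="):
            return int(line.split("=", 1)[1])
    return None


# Example: metadata as it still looked on the NFS server (old version 3);
# the key names below follow VDSM conventions but are illustrative here.
nfs_metadata = """\
CLASS=Data
MASTER_VERSION=3
POOL_UUID=94ed7a19-fade-4bd6-83f2-2cbb2f730b95
"""

# Value the engine holds in storage_pool.master_domain_version (new version 4)
engine_db_version = 4

nfs_version = parse_master_version(nfs_metadata)
if nfs_version != engine_db_version:
    print(f"mismatch: NFS metadata has {nfs_version}, "
          f"engine DB has {engine_db_version}")
```

In the reported state this check flags exactly the inconsistency that the SQL workaround papers over by forcing the database back to version 3.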
Created attachment 834665 [details]
vdsm log
Could you please add additional info before RCA: number of domains/hosts, vdsm log. Thanks!
OK, here we go: we have three NFS storage servers with a dedicated network segment:

colnas01 - 10.10.30.251
colnas02 - 10.10.30.252
colnas03 - 10.10.30.253

One oVirt node based on Fedora 19. For each storage server we have created a domain.

colovn03 - 192.168.10.53 - 10.10.30.3

[root@colovn03 ~]# df
Filesystem                      1K-blocks   Used        Available   Use% Mounted on
10.10.30.253:/var/nas3/OVirtIB  7810410496  4616333312  3194077184  60%  /rhev/data-center/mnt/10.10.30.253:_var_nas3_OVirtIB
10.10.30.251:/var/nas1/OVirtIB  11611801600 9678794752  1933006848  84%  /rhev/data-center/mnt/10.10.30.251:_var_nas1_OVirtIB
10.10.30.252:/var/nas2/OVirtIB  11611801600 10525883392 1085918208  91%  /rhev/data-center/mnt/10.10.30.252:_var_nas2_OVirtIB

The oVirt engine is based on CentOS 6.5: colove01 - 192.168.10.110

Any other things you need?
ReconstructMasterDomain should have been performed onto the other domain in the provided scenario. The provided patch should solve that.
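To illustrate the intended behavior (this is a hedged sketch, not the actual ovirt-engine code): when deactivating the current master fails or the master is taken down, the engine should move the master role onto another valid domain instead of stalling on a stale masterVersion. The `pick_new_master` helper and the status strings below are illustrative assumptions; the UUID prefixes come from the logs above.

```python
def pick_new_master(domains, current_master_id):
    """Choose a candidate domain to receive the master role.

    Skips the current master and returns the first remaining
    active domain, or None if no candidate exists.
    """
    for d in domains:
        if d["id"] != current_master_id and d["status"] == "Active":
            return d
    return None


domains = [
    {"id": "2c51d320", "status": "Active"},  # current master (being deactivated)
    {"id": "965ca3b6", "status": "Active"},  # candidate for ReconstructMasterDomain
]

new_master = pick_new_master(domains, "2c51d320")
print(new_master["id"])  # the domain the reconstruct should target
```

With no remaining active domain the helper returns None, which corresponds to the case where a reconstruct cannot proceed at all.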
Is the bugfix in 3.3.2 or do we have to wait until 3.4?
3.3.2 GA'd today. Allon, please review whether this is material for the 3.3 stable branch, for 3.3.3.
Thanks for the update. Will the patch find its way into the next released oVirt version, be it 3.3.3 or 3.4.0? The critical side effects are not to be neglected.
There's been a slight bookkeeping snafu here, I'm afraid. The provided patch (oVirt gerrit #20564) was originally intended to fix bug 1023741, but solves this one too. Since the use cases are a bit different, and since one bug is reported on oVirt and one on RHEVM, both bugs were left open instead of closing one as a duplicate. The provided patch (oVirt gerrit #20564) was merged to ovirt-engine's master branch and will be available in 3.4. However, it was also backported to the ovirt-engine-3.3 branch (oVirt gerrit #20584, added to the external trackers), and is available in the released oVirt Engine 3.3.2. Setting target version back to 3.3.2. Markus - I'd be glad if you could verify this scenario.
I could run a test for you if only I knew how to reproduce the error. As stated, it occurred somewhere in my failure tests, but I cannot remember what the source of it all was.
The verification we did here was by manually updating the database, which simulates the scenario but does not really reproduce it. Moving the BZ to CLOSED CURRENTRELEASE - if you encounter it again, by all means, feel free to reopen.