Bug 1039835 - Domain Master Version changed without success
Summary: Domain Master Version changed without success
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: oVirt
Classification: Retired
Component: ovirt-engine-core
Version: 3.3
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: medium
Target Milestone: ---
Target Release: 3.3.2
Assignee: Liron Aravot
QA Contact: Aharon Canan
URL:
Whiteboard: storage
Depends On:
Blocks:
Reported: 2013-12-10 06:23 UTC by Markus Stockhausen
Modified: 2016-02-10 20:46 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-12-24 17:10:24 UTC
oVirt Team: Storage


Attachments
engine log (137.13 KB, application/zip)
2013-12-10 06:23 UTC, Markus Stockhausen
vdsm log (1.03 MB, application/zip)
2013-12-10 08:43 UTC, Markus Stockhausen


Links
oVirt gerrit 20564
oVirt gerrit 20584

Description Markus Stockhausen 2013-12-10 06:23:36 UTC
Created attachment 834644
engine log

Description of problem:

While running some NFS failure tests, the master storage domain version was incremented. However, the domain information on the NFS server was not updated. The relevant log lines are:

2013-12-09 21:58:59,015 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.DeactivateStorageDomainVDSCommand] (pool-6-thread-50) [27d543d8] START, DeactivateStorageDomainVDSCommand( storagePoolId = 94ed7a19-fade-4bd6-83f2-2cbb2f730b95, ignoreFailoverLimit = false, storageDomainId = 272ec473-6041-42ee-bd1a-732789dd18d4, masterDomainId = 2c51d320-88ce-4f23-8215-e15f55f66906, masterVersion = 3), log id: 7cc2cca
2013-12-09 21:59:07,665 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.DeactivateStorageDomainVDSCommand] (pool-6-thread-50) [27d543d8] FINISH, DeactivateStorageDomainVDSCommand, log id: 7cc2cca
2013-12-09 21:59:07,668 INFO  [org.ovirt.engine.core.bll.storage.DeactivateStorageDomainCommand] (pool-6-thread-50) [27d543d8] Lock freed to object EngineLock [exclusiveLocks= key: 272ec473-6041-42ee-bd1a-732789dd18d4 value: STORAGE
key: 94ed7a19-fade-4bd6-83f2-2cbb2f730b95 value: POOL
, sharedLocks= ]
2013-12-09 21:59:07,715 INFO  [org.ovirt.engine.core.bll.storage.AfterDeactivateSingleAsyncOperation] (pool-6-thread-45) [27d543d8] After deactivate treatment vds: colovn03,pool Collogia
2013-12-09 21:59:07,720 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.RefreshStoragePoolVDSCommand] (pool-6-thread-45) [27d543d8] START, RefreshStoragePoolVDSCommand(HostName = colovn03, HostId = 5d233303-559e-4602-88fb-de4e07170261, storagePoolId = 94ed7a19-fade-4bd6-83f2-2cbb2f730b95, masterStorageDomainId=2c51d320-88ce-4f23-8215-e15f55f66906, masterVersion=3), log id: 655db14d <<<<<<< OLD VERSION !
2013-12-09 21:59:07,742 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.RefreshStoragePoolVDSCommand] (pool-6-thread-45) [27d543d8] FINISH, RefreshStoragePoolVDSCommand, log id: 655db14d
2013-12-09 21:59:07,747 INFO  [org.ovirt.engine.core.bll.storage.DisconnectStorageServerConnectionCommand] (pool-6-thread-50) Running command: DisconnectStorageServerConnectionCommand internal: true. Entities affected :  ID: aaa00000-0000-0000-0000-123456789aaa Type: System
2013-12-09 21:59:07,753 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStorageServerVDSCommand] (pool-6-thread-50) START, DisconnectStorageServerVDSCommand(HostName = colovn03, HostId = 5d233303-559e-4602-88fb-de4e07170261, storagePoolId = 00000000-0000-0000-0000-000000000000, storageType = NFS, connectionList = [{ id: ff550c17-a171-4a46-93b5-3ae4c8600f60, connection: 10.10.30.252:/var/nas2/OVirtIB, iqn: null, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };]), log id: 642e6e5a
2013-12-09 21:59:10,289 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStorageServerVDSCommand] (pool-6-thread-50) FINISH, DisconnectStorageServerVDSCommand, return: {ff550c17-a171-4a46-93b5-3ae4c8600f60=0}, log id: 642e6e5a
2013-12-09 21:59:10,309 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-6-thread-50) Correlation ID: 27d543d8, Job ID: 235fb0d2-808e-434c-8db8-0f701476bcdb, Call Stack: null, Custom Event ID: -1, Message: Storage Domain colnas02_IB (Data Center Collogia) was deactivated by admin@internal
2013-12-09 22:00:08,561 INFO  [org.ovirt.engine.core.bll.storage.DeactivateStorageDomainCommand] (ajp--127.0.0.1-8702-6) [3518986c] Lock Acquired to object EngineLock [exclusiveLocks= key: 2c51d320-88ce-4f23-8215-e15f55f66906 value: STORAGE
key: 94ed7a19-fade-4bd6-83f2-2cbb2f730b95 value: POOL
, sharedLocks= ]
2013-12-09 22:00:08,620 INFO  [org.ovirt.engine.core.bll.storage.DeactivateStorageDomainCommand] (pool-6-thread-50) [3518986c] Running command: DeactivateStorageDomainCommand internal: false. Entities affected :  ID: 2c51d320-88ce-4f23-8215-e15f55f66906 Type: Storage
2013-12-09 22:00:08,661 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.DeactivateStorageDomainVDSCommand] (pool-6-thread-50) [3518986c] START, DeactivateStorageDomainVDSCommand( storagePoolId = 94ed7a19-fade-4bd6-83f2-2cbb2f730b95, ignoreFailoverLimit = false, storageDomainId = 2c51d320-88ce-4f23-8215-e15f55f66906, masterDomainId = 965ca3b6-4f9c-4e81-b6e8-5ed4a9e58545, masterVersion = 4), log id: 24d2cc8 <<<< NEW VERSION !!!
2013-12-09 22:00:08,705 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.DeactivateStorageDomainVDSCommand] (pool-6-thread-50) [3518986c] Failed in DeactivateStorageDomainVDS method
2013-12-09 22:00:08,708 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.DeactivateStorageDomainVDSCommand] (pool-6-thread-50) [3518986c] Error code StorageDomainActionError and error message IRSGenericException: IRSErrorException: Failed to DeactivateStorageDomainVDS, error = Error in storage domain action: ('sdUUID=2c51d320-88ce-4f23-8215-e15f55f66906, spUUID=94ed7a19-fade-4bd6-83f2-2cbb2f730b95, msdUUID=965ca3b6-4f9c-4e81-b6e8-5ed4a9e58545, masterVersion=4',)
2013-12-09 22:00:08,713 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (pool-6-thread-50) [3518986c] IrsBroker::Failed::DeactivateStorageDomainVDS due to: IRSErrorException: IRSGenericException: IRSErrorException: Failed to DeactivateStorageDomainVDS, error = Error in storage domain action: ('sdUUID=2c51d320-88ce-4f23-8215-e15f55f66906, spUUID=94ed7a19-fade-4bd6-83f2-2cbb2f730b95, msdUUID=965ca3b6-4f9c-4e81-b6e8-5ed4a9e58545, masterVersion=4',)
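To spot this kind of jump in a longer engine.log, one could pull the masterVersion value out of each command's START line and flag where it changes. The following is a minimal Python sketch, not part of the original report or of oVirt itself; the regular expressions assume the log format shown above, and the sample lines are trimmed copies of lines from this excerpt (logger names shortened, reporter annotations removed).

```python
import re
from typing import Iterable, Iterator, NamedTuple, Tuple

# Timestamp at the start of an engine.log line, e.g. "2013-12-09 21:59:07,720".
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")
# Command name in a "START, <Command>(" entry.
CMD_RE = re.compile(r"START, (\w+)\(")
# Accepts both "masterVersion = 3" and "masterVersion=3".
VER_RE = re.compile(r"masterVersion\s*=\s*(\d+)")

# Trimmed copies of lines from the excerpt above (assumption: abbreviated for brevity).
SAMPLE_LOG = [
    "2013-12-09 21:58:59,015 INFO  [...irsbroker.DeactivateStorageDomainVDSCommand] (pool-6-thread-50) [27d543d8] "
    "START, DeactivateStorageDomainVDSCommand( storagePoolId = 94ed7a19-fade-4bd6-83f2-2cbb2f730b95, masterVersion = 3), log id: 7cc2cca",
    "2013-12-09 21:59:07,665 INFO  [...irsbroker.DeactivateStorageDomainVDSCommand] (pool-6-thread-50) [27d543d8] "
    "FINISH, DeactivateStorageDomainVDSCommand, log id: 7cc2cca",
    "2013-12-09 21:59:07,720 INFO  [...vdsbroker.RefreshStoragePoolVDSCommand] (pool-6-thread-45) [27d543d8] "
    "START, RefreshStoragePoolVDSCommand(HostName = colovn03, masterVersion=3), log id: 655db14d",
    "2013-12-09 22:00:08,661 INFO  [...irsbroker.DeactivateStorageDomainVDSCommand] (pool-6-thread-50) [3518986c] "
    "START, DeactivateStorageDomainVDSCommand( storagePoolId = 94ed7a19-fade-4bd6-83f2-2cbb2f730b95, masterVersion = 4), log id: 24d2cc8",
    "2013-12-09 22:00:08,708 ERROR [...irsbroker.DeactivateStorageDomainVDSCommand] (pool-6-thread-50) [3518986c] "
    "Error code StorageDomainActionError ... masterVersion=4',)",
]


class VersionSample(NamedTuple):
    timestamp: str
    command: str
    master_version: int


def master_versions(lines: Iterable[str]) -> Iterator[VersionSample]:
    """Yield one sample per START line that carries a masterVersion value."""
    for line in lines:
        ts, cmd, ver = TS_RE.match(line), CMD_RE.search(line), VER_RE.search(line)
        if ts and cmd and ver:  # FINISH and ERROR lines have no "START, " and are skipped
            yield VersionSample(ts.group(1), cmd.group(1), int(ver.group(1)))


def version_changes(samples: Iterable[VersionSample]) -> Iterator[Tuple[VersionSample, VersionSample]]:
    """Yield (previous, current) pairs wherever masterVersion differs between consecutive samples."""
    previous = None
    for sample in samples:
        if previous is not None and sample.master_version != previous.master_version:
            yield previous, sample
        previous = sample


if __name__ == "__main__":
    for before, after in version_changes(master_versions(SAMPLE_LOG)):
        print(f"{before.timestamp} {before.command} masterVersion={before.master_version}"
              f" -> {after.timestamp} {after.command} masterVersion={after.master_version}")
```

A version change by itself can be legitimate (for example after a successful master reconstruction), so the sketch only points at where to look in the log, not at whether the change was correct.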



Version-Release number of selected component (if applicable):

oVirt Engine 3.3.1


How reproducible:

Unknown

Steps to Reproduce:

Do not know how to reproduce.

Actual results:

Domain is inaccessible afterwards.

Expected results:

The domain should recover.

Additional info:

Log attached. The storage domain was reactivated by resetting the version in the database with the SQL statement "update storage_pool set master_domain_version=3;".

Comment 1 Markus Stockhausen 2013-12-10 08:43:42 UTC
Created attachment 834665
vdsm log

Comment 2 Liron Aravot 2013-12-10 09:29:39 UTC
Could you please add some additional information before the RCA: the number of domains and hosts, and the vdsm log?

thanks!

Comment 3 Markus Stockhausen 2013-12-10 09:49:01 UTC
Ok here we go:

We have three NFS storage servers on a dedicated network segment:

colnas01 - 10.10.30.251
colnas02 - 10.10.30.252
colnas03 - 10.10.30.253

One oVirt node based on Fedora 19. We created one storage domain per storage server.

colovn03 - 192.168.10.53 - 10.10.30.3

[root@colovn03 ~]# df
Filesystem                        1K-blocks        Used  Available Use% Mounted on
10.10.30.253:/var/nas3/OVirtIB   7810410496  4616333312 3194077184   60% /rhev/data-center/mnt/10.10.30.253:_var_nas3_OVirtIB
10.10.30.251:/var/nas1/OVirtIB  11611801600  9678794752 1933006848   84% /rhev/data-center/mnt/10.10.30.251:_var_nas1_OVirtIB
10.10.30.252:/var/nas2/OVirtIB  11611801600 10525883392 1085918208   91% /rhev/data-center/mnt/10.10.30.252:_var_nas2_OVirtIB

oVirt engine based on CentOS 6.5

colove01 - 192.168.10.110

Anything else you need?

Comment 4 Liron Aravot 2013-12-19 16:06:26 UTC
In the provided scenario, ReconstructMasterDomain should have been performed to move the master role to the other given domain.
The provided patch should solve that.

Comment 5 Markus Stockhausen 2013-12-19 19:38:40 UTC
Is the bug fix in 3.3.2, or do we have to wait until 3.4?

Comment 6 Itamar Heim 2013-12-20 01:50:42 UTC
3.3.2 GA'd today. Allon, please review whether this is material for the 3.3 stable branch (3.3.3).

Comment 7 Markus Stockhausen 2013-12-20 06:00:28 UTC
Thanks for the update. Will the patch make it into the next oVirt release, be it 3.3.3 or 3.4.0? The critical side effects should not be neglected.

Comment 8 Allon Mureinik 2013-12-22 17:19:55 UTC
There's been a slight bookkeeping snafu here, I'm afraid.

The provided patch (oVirt gerrit #20564) was originally intended to fix bug 1023741, but it solves this one too.
Since the use cases are a bit different, and since one bug is reported against oVirt and the other against RHEVM, both bugs were left open instead of closing one as a duplicate.

The provided patch (oVirt gerrit #20564) was merged to ovirt-engine's master branch and will be available in 3.4. However, it was also backported to the ovirt-engine-3.3 branch (oVirt gerrit #20584, added to the external trackers), and is available in the oVirt Engine 3.3.2 release.

Setting target version back to 3.3.2.

Markus - I'd be glad if you could verify this scenario.

Comment 9 Markus Stockhausen 2013-12-23 08:54:29 UTC
I could run a test for you if only I knew how to reproduce the error. As stated, it occurred somewhere during my failure tests, but I cannot remember what triggered it.

Comment 10 Allon Mureinik 2013-12-24 17:10:24 UTC
The verification we did here was by manually updating the database, which simulates the scenario but does not really reproduce it.

Moving the BZ to CLOSED CURRENTRELEASE. If you encounter it again, by all means feel free to reopen it.

