Bug 890022

Summary: 3.2 - host-deploy: enforce random vdsm id when detecting duplication
Product: Red Hat Enterprise Virtualization Manager
Reporter: Alon Bar-Lev <alonbl>
Component: ovirt-engine
Assignee: Alon Bar-Lev <alonbl>
Status: CLOSED WONTFIX
QA Contact:
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: unspecified
CC: bazulay, danken, dyasny, iheim, lpeer, Rhev-m-bugs, sgrinber, yeylon, ykaul
Target Milestone: ---
Keywords: Improvement
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: Infra
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-12-31 06:45:16 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Alon Bar-Lev 2012-12-24 11:46:52 UTC
Currently we fail installation if we detect a duplicate vdsm id reported by a host. This has been the behavior since 3.0.

With ovirt-host-deploy there is the option of enforcing the vdsm id from the engine, so we can detect this state and enforce a random id.

The problem is that if I add two addresses of the *SAME* host, we think that the second addition is a duplicate vdsm id, and generate our own. As a result we have two hosts up and running on the engine side, polling the same vdsm, while its unique id is the random enforced one (the first can be either random or the real bios uuid).

Pros/Cons for that?
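
To make the scenario concrete, here is a minimal sketch of the behavior described above. It is a toy model, not the ovirt-engine or ovirt-host-deploy code: the `Host` and `Engine` classes, the `add_host` method, the `enforce_random` flag, and the addresses and ids are all hypothetical names chosen for illustration.

```python
import uuid


class Host:
    """A physical host running vdsm; vdsm_id stands in for /etc/vdsm/vdsm.id."""

    def __init__(self, bios_uuid):
        self.vdsm_id = bios_uuid


class Engine:
    """Toy host table (hypothetical): address -> vdsm id recorded at install time."""

    def __init__(self):
        self.hosts = {}

    def add_host(self, address, host, enforce_random):
        if host.vdsm_id in self.hosts.values():
            if not enforce_random:
                # Current implementation: refuse to install on a duplicate id.
                raise RuntimeError("duplicate vdsm id, installation failed")
            # New implementation: enforce a random id on the host.
            host.vdsm_id = str(uuid.uuid4())
        self.hosts[address] = host.vdsm_id
        return host.vdsm_id


def demo():
    engine = Engine()
    host = Host("bios-uuid-1")  # placeholder value, not a real BIOS UUID
    # The same physical host is added twice, by IP and by name.
    engine.add_host("192.0.2.10", host, enforce_random=True)
    engine.add_host("node1.example.com", host, enforce_random=True)
    return engine, host
```

After `demo()`, the toy engine holds two entries for one physical host: the first still records the original id, while the host itself now carries the random id recorded for the second entry.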

Comment 1 Alon Bar-Lev 2012-12-24 11:54:22 UTC
This keeps being asked, so I thought a proper discussion was in order, so that I can refer to it.

Comment 2 Alon Bar-Lev 2012-12-24 11:54:56 UTC
commit 47df0540d741bdc9380873309966d5c47fe665a1
Author: Alon Bar-Lev <alonbl>
Date:   Mon Dec 24 11:45:23 2012 +0200

    host-deploy: enforce random vdsm id in case of duplicate
    
    CURRENT IMPLEMENTATION
    
    Fail installation if a duplicate vdsm id is reported by vdsm.
    
    PROBLEM IN CURRENT IMPLEMENTATION
    
    Manual intervention required:
    
     # uuidgen > /etc/vdsm/vdsm.id
    
    NEW IMPLEMENTATION
    
    Enforce random vdsm id when duplicate is reported.
    
    PROBLEM IN NEW IMPLEMENTATION
    
    A host can be added twice with different addresses/names, results in
    duplicate active hosts polling the same vdsm.
    
    Bug-Url: https://bugzilla.redhat.com/show_bug.cgi?id=890022
    Change-Id: Id448585b502cd53510526bc0a09bd7f9cd22230b
    Signed-off-by: Alon Bar-Lev <alonbl>

http://gerrit.ovirt.org/10338

Comment 3 Alon Bar-Lev 2012-12-24 12:25:38 UTC
Testing the external tracker. It should be nice, but it seems one cannot add more than one in a single commit?

Comment 4 Itamar Heim 2012-12-25 07:41:05 UTC
You can have vdsm report the unique id in getVdsStats and verify that it matches the engine's.
Then, once a host changes its unique id, you can detect this and move the "old host" to maintenance or something like that.
If we assume vdsm will go down and up for such a change, using getVdsCaps will suffice, and there is no need to check it on each getVdsStats.
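
The check suggested here could look roughly like the sketch below. It assumes the engine keeps a mapping from host address to the vdsm id it recorded, and that some callable can fetch the id a host currently reports. Both `find_stale_hosts` and `poll_reported_id` are hypothetical names; the latter is only a stand-in for a getVdsCaps (or getVdsStats) call, whose real return structure is not modeled.

```python
def find_stale_hosts(engine_hosts, poll_reported_id):
    """Return addresses whose recorded vdsm id no longer matches the reported one.

    engine_hosts: dict of address -> vdsm id recorded by the engine.
    poll_reported_id: callable(address) -> vdsm id the host reports now
                      (hypothetical stand-in for a getVdsCaps call).
    """
    stale = []
    for address, recorded_id in engine_hosts.items():
        if poll_reported_id(address) != recorded_id:
            # Candidate "old host" entry, e.g. to be moved to maintenance.
            stale.append(address)
    return stale
```

For example, with `engine_hosts = {"192.0.2.10": "bios-uuid-1", "node1.example.com": "random-uuid-2"}` and a poll function that returns `"random-uuid-2"` for both addresses (the same physical host), the result is `["192.0.2.10"]`.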

Comment 5 Alon Bar-Lev 2012-12-25 09:10:44 UTC
Question: is it worth the effort, or is it better to leave the implementation as-is?

Comment 6 Simon Grinberg 2012-12-25 16:42:49 UTC
(In reply to comment #5)
> Question: is it worth the effort, or is it better to leave the implementation as-is?

My initial thought was that it's worth the effort, because when it happens it's just too confusing to figure out the problem.

Don't forget duplicate IDs happen in some blade centers, meaning you have a bunch of identical hosts, not just two. A simple mistake may lead to adding the same host twice. 

On the other hand, if I understand correctly, this can't happen with RHEV-H hosts if RHEV-H registers to the manager.

So for it to happen:
1. Must be RHEL hosts 
2. Hosts must have same bios UUID 
3. Hosts must have more than one IP to begin with.
If the 3 above are accurate then you may leave the current implementation as is.

Comment 7 Alon Bar-Lev 2012-12-25 19:48:14 UTC
0. It can be RHEV-H.
3. A host can have a single IP, but be added by IP address, name, name.suffix, or name.suffix.suffix.

I think Andrew did not want to handle this as it is less than 0.5%...