Bug 1016461 - [vdsm] engine fails to add host with vdsm version 4.13.0
Summary: [vdsm] engine fails to add host with vdsm version 4.13.0
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.4.0
Assignee: Yaniv Bronhaim
QA Contact: Tareq Alayan
URL:
Whiteboard: infra
Depends On:
Blocks: 1017285 1098767
 
Reported: 2013-10-08 08:31 UTC by Eyal Edri
Modified: 2016-02-10 19:36 UTC
CC List: 17 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1017285 1098767
Environment:
Last Closed:
oVirt Team: Infra
Target Upstream Version:
Embargoed:


Attachments
engine.log (665.04 KB, application/x-bzip)
2013-10-08 08:41 UTC, Eyal Edri
engine-log (63.94 KB, application/x-tar-gz)
2014-05-19 12:15 UTC, Tareq Alayan


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 19913 0 None None None Never
oVirt gerrit 19973 0 None None None Never

Description Eyal Edri 2013-10-08 08:31:44 UTC
Description of problem:

When trying to add a host running vdsm version 4.13.0-x to the engine, the host add fails.


Version-Release number of selected component (if applicable):


How reproducible:
always 

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
The proposed fix seems to be working - http://gerrit.ovirt.org/19913 -
but it might hide a bigger underlying problem.

Comment 1 Eyal Edri 2013-10-08 08:41:01 UTC
Created attachment 809178 [details]
engine.log

The error can be found on 2013-10-06, around 18:30 - 18:42.

Comment 3 Yair Zaslavsky 2013-10-08 08:57:30 UTC
Let's handle the current issue and open a bug for the other issues.

Comment 4 Dan Kenigsberg 2013-10-08 09:50:02 UTC
(In reply to Yair Zaslavsky from comment #3)
> Let's handle the current issue and open a bug for the other issues.

This bug was opened in order to track the "other issues", which were supposedly triggered by

http://gerrit.ovirt.org/#/c/17719/
backend: Fixing log print when vdsm version is not supported in cluster

The Engine must honor vdsm's supportedENGINEs list. If the Engine sees its own version there, it must accept this vdsm's version.
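
A minimal sketch of that rule in Python, with hypothetical names (the real check lives in ovirt-engine's Java code):

# Hypothetical sketch of the rule above, not the actual ovirt-engine code.
def engine_accepts_host(engine_version, host_supported_engines):
    # Accept the host when the Engine's own version appears in the
    # supportedENGINEs list reported by the host's getVdsCaps.
    return engine_version in host_supported_engines

# Example: a 3.3 Engine meeting vdsm-4.13, which reports
# supportedENGINEs = ['3.0', '3.1', '3.2', '3.3']
assert engine_accepts_host('3.3', ['3.0', '3.1', '3.2', '3.3'])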

Comment 5 Yaniv Bronhaim 2013-10-08 12:55:11 UTC
Even if the Vdsm version is not one of the SupportedVDSMVersions? Why so? And why have both?

Comment 6 Dan Kenigsberg 2013-10-08 13:46:44 UTC
We have Engine-3.0.0 deployed in the field for eons. We are about to deploy vdsm-4.13.0. Vdsm's supportedENGINEs is the only way to tell that old Engine that it should accept the new version.

We should maintain this future-compatibility feature and add a unit test so it does not break again.
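
A sketch of the kind of unit test meant here, reusing the hypothetical helper from the sketch in comment 4 (the real test would live in ovirt-engine's Java test suite):

import unittest

def engine_accepts_host(engine_version, host_supported_engines):
    # Hypothetical helper, repeated here so the test is self-contained.
    return engine_version in host_supported_engines

class FutureCompatibilityTest(unittest.TestCase):
    def test_old_engine_accepts_new_vdsm(self):
        # vdsm-4.13 lists the old Engine's version in supportedENGINEs,
        # even though 4.13 is newer than anything that Engine knows about.
        self.assertTrue(engine_accepts_host('3.0', ['3.0', '3.1', '3.2', '3.3']))

    def test_engine_not_listed_is_rejected(self):
        # A vdsm that does not list the Engine's version is not accepted
        # by this rule alone.
        self.assertFalse(engine_accepts_host('3.5', ['3.0', '3.1', '3.2', '3.3']))

if __name__ == '__main__':
    unittest.main()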

Comment 7 Yaniv Bronhaim 2013-10-08 13:51:32 UTC
So why keep SupportedVDSMVersions? If we verify that the cluster is supported and that the engine's version is supported by the host's vdsm process, what is SupportedVDSMVersions supposed to mean?

Comment 8 Dan Kenigsberg 2013-10-08 14:12:13 UTC
It's useful for the opposite use case, where a newly-deployed Engine meets an ancient Vdsm.
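
A rough sketch of how the two lists could complement each other, again with hypothetical names (not the actual ovirt-engine logic):

def host_is_compatible(engine_version, vdsm_version,
                       host_supported_engines, supported_vdsm_versions):
    # supportedENGINEs lets an old Engine accept a newer vdsm.
    if engine_version in host_supported_engines:
        return True
    # SupportedVDSMVersions covers the opposite case: a new Engine meeting
    # an ancient vdsm that predates supportedENGINEs reporting.
    return vdsm_version in supported_vdsm_versions

# New Engine meeting an ancient vdsm that reports no supportedENGINEs at all:
assert host_is_compatible('3.4', '4.9', [],
                          ['4.9', '4.10', '4.11', '4.12', '4.13', '4.14'])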

Comment 9 Yaniv Bronhaim 2013-10-09 07:54:55 UTC
But in such a case the newly-deployed engine's version won't be in the old vdsm's supportedENGINEs either, so SupportedVDSMVersions is still not used... It might be useful for refusing versions that are too old.

Comment 11 Itamar Heim 2014-01-21 22:21:02 UTC
Closing - RHEV 3.3 Released

Comment 12 Itamar Heim 2014-01-21 22:26:30 UTC
Closing - RHEV 3.3 Released

Comment 13 Oved Ourfali 2014-05-15 13:19:44 UTC
Please verify in the 3.4.0 bug with a 3.5 host, and in the 3.3.z bug with both 3.4 and 3.5 hosts.

Comment 16 Tareq Alayan 2014-05-19 12:15:46 UTC
Created attachment 897120 [details]
engine-log

Failed QA.

Couldn't add a host with vdsm-4.13.2-0.16.el6ev.x86_64 to an engine with rhevm-3.4.0-0.20.el6ev.noarch.

Error message:
Host rose04 is compatible with versions (3.0,3.1,3.2,3.3) and cannot join Cluster m3 which is set to version 3.4.

Comment 17 Oved Ourfali 2014-05-19 12:29:39 UTC
(In reply to Tareq Alayan from comment #16)
> Created attachment 897120 [details]
> engine-log
> 
> failed qa.
> 
> couldn't add host with vdsm-4.13.2-0.16.el6ev.x86_64 to engine with
> rhevm-3.4.0-0.20.el6ev.noarch
> 
> 
> 	
> error msg:
> Host rose04 is compatible with versions (3.0,3.1,3.2,3.3) and cannot join
> Cluster m3 which is set to version 3.4.

The use-case of this bug is adding a new host to an old cluster.
Here you're adding a host that doesn't support cluster level 3.4 to such a cluster.
As written in comment #13, you need to verify adding a new (3.5) host to an old (3.4) engine, and not vice versa.

Moving to ON_QA.

Comment 18 Tareq Alayan 2014-05-19 13:07:53 UTC
I added vdsm-4.13.2-0.16.el6ev.x86_64 to rhevm-3.3.3-0.51.el6ev.noarch, in a cluster with compatibility version 3.2.

I added vdsm-4.13.2-0.16.el6ev.x86_64 to rhevm-3.4.0-0.20.el6ev.noarch, in a cluster with compatibility version 3.3.

Verified.

Comment 19 Itamar Heim 2014-05-19 17:12:09 UTC
Please review comment 13.
What you checked in comment 18 does not exercise the mechanism for a VDSM which does *not* appear in SupportedVDSMVersions.
Please detail the vdsm version and the SupportedVDSMVersions value in your verification.

Comment 20 Tareq Alayan 2014-05-20 09:20:12 UTC
case1: 3.4 engine with 3.5 vdsm
================================
vdsm-4.14.1-339.gitedb07b8.el6.x86_64
rhevm-3.4.0-0.20.el6ev.noarch
cluster compatibility version 3.3 - host is UP
cluster compatibility version 3.4 - host is UP


select option_name,option_value,version from vdc_options where option_name='SupportedVDSMVersions';
      option_name      |         option_value         | version 
-----------------------+------------------------------+---------
 SupportedVDSMVersions | 4.9,4.10,4.11,4.12,4.13,4.14 | general


case2: 3.4 engine with vdsm-4.13 (3.3)
=======================================
vdsm-4.13.2-0.16.el6ev.x86_64
rhevm-3.4.0-0.20.el6ev.noarch
cluster compatibility version 3.3 - host is UP
cluster compatibility version 3.4 - host is non-operational


select option_name,option_value,version from vdc_options where option_name='SupportedVDSMVersions';
      option_name      |         option_value         | version 
-----------------------+------------------------------+---------
 SupportedVDSMVersions | 4.9,4.10,4.11,4.12,4.13,4.14 | general

 


case3: 3.3.3 engine with 3.5 vdsm
===================================
vdsm-4.14.1-339.gitedb07b8.el6.x86_64
rhevm-3.3.3-0.51.el6ev.noarch
cluster compatibility version 3.2 - host is non operational
cluster compatibility version 3.3 - host is non operational

select option_name,option_value,version from vdc_options where option_name='SupportedVDSMVersions';
     option_name      |      option_value       | version 
-----------------------+-------------------------+---------
 SupportedVDSMVersions | 4.9,4.10,4.11,4.12,4.13 | general


case4: 3.3 engine with vdsm-4.13 (3.3)
=======================================
vdsm-4.13.2-0.16.el6ev.x86_64
rhevm-3.3.3-0.51.el6ev.noarch
cluster compatibility version 3.2 - host is UP
cluster compatibility version 3.3 - host is UP

select option_name,option_value,version from vdc_options where option_name='SupportedVDSMVersions';
     option_name      |      option_value       | version 
-----------------------+-------------------------+---------
 SupportedVDSMVersions | 4.9,4.10,4.11,4.12,4.13 | general


Is this OK?

Comment 21 Tareq Alayan 2014-05-20 14:20:58 UTC
Additional info:
================
When the host is connected to a 3.4 engine and vdsm is 4.13.2:
vdsClient -s 0 getVdsCaps
supportedENGINEs = ['3.0', '3.1', '3.2', '3.3']

When the host is connected to a 3.4 engine and vdsm is vdsm-4.14.1-339.gitedb07b8.el6.x86_64:
supportedENGINEs = ['3.0', '3.1', '3.2', '3.3', '3.4']

Comment 22 Yaniv Bronhaim 2014-05-20 14:46:30 UTC
Thanks, but there is still something suspicious about the non-operational state. A host with vdsm >= 4.13 is compatible with cluster levels 3.0-3.3, so the reason for the non-operational state is not clear to me (referring to cases 2 and 3). What does the audit log say about those hosts?

Comment 23 Tareq Alayan 2014-05-21 14:08:52 UTC
Host rose04 is compatible with versions (3.0,3.1,3.2,3.3) and cannot join Cluster m3

Comment 24 Oved Ourfali 2014-05-22 06:04:13 UTC
(In reply to Tareq Alayan from comment #23)
> Host rose04 is compatible with versions (3.0,3.1,3.2,3.3) and cannot join
> Cluster m3

I guess that's the error in this case:

case2: 3.4 engine with vdsm-4.13 (3.3)
=======================================
vdsm-4.13.2-0.16.el6ev.x86_64
rhevm-3.4.0-0.20.el6ev.noarch
cluster compatibility version 3.3 - host is UP
> cluster compatibility version 3.4 - host is non-operational

Which is okay.

But, what's the error in the two cases below?

case3: 3.3.3 engine with 3.5 vdsm
===================================
vdsm-4.14.1-339.gitedb07b8.el6.x86_64
rhevm-3.3.3-0.51.el6ev.noarch
cluster compatibility version 3.2 - host is non operational
cluster compatibility version 3.3 - host is non operational

Comment 25 Tareq Alayan 2014-05-22 10:35:31 UTC
Retest with updated vdsm:
=========================
rhevm-3.3.3-0.52.el6ev.noarch
vdsm-4.15.0-22.gitdcd07f4.el6.x86_64
cluster compatibility version 3.3 and 3.2
host state = up


select option_name,option_value,version from vdc_options where option_name='SupportedVDSMVersions';
      option_name      |      option_value       | version 
-----------------------+-------------------------+---------
 SupportedVDSMVersions | 4.9,4.10,4.11,4.12,4.13 | general


vdsClient -s 0 getVdsCaps
        HBAInventory = {'FC': [], 'iSCSI': [{'InitiatorName': 'iqn.1994-05.com.redhat:3865ad2788b0'}]}
        ISCSIInitiatorName = 'iqn.1994-05.com.redhat:3865ad2788b0'
        autoNumaBalancing = 2
        bondings = {'bond0': {'addr': '',
                              'cfg': {},
                              'hwaddr': '00:00:00:00:00:00',
                              'ipv6addrs': [],
                              'mtu': '1500',
                              'netmask': '',
                              'slaves': []},
                    'bond1': {'addr': '',
                              'cfg': {},
                              'hwaddr': '00:00:00:00:00:00',
                              'ipv6addrs': [],
                              'mtu': '1500',
                              'netmask': '',
                              'slaves': []},
                    'bond2': {'addr': '',
                              'cfg': {},
                              'hwaddr': '00:00:00:00:00:00',
                              'ipv6addrs': [],
                              'mtu': '1500',
                              'netmask': '',
                              'slaves': []},
                    'bond3': {'addr': '',
                              'cfg': {},
                              'hwaddr': '00:00:00:00:00:00',
                              'ipv6addrs': [],
                              'mtu': '1500',
                              'netmask': '',
                              'slaves': []},
                    'bond4': {'addr': '',
                              'cfg': {},
                              'hwaddr': '00:00:00:00:00:00',
                              'ipv6addrs': [],
                              'mtu': '1500',
                              'netmask': '',
                              'slaves': []}}
        bridges = {'rhevm': {'addr': '10.35.97.27',
                             'cfg': {'BOOTPROTO': 'dhcp',
                                     'DEFROUTE': 'yes',
                                     'DELAY': '0',
                                     'DEVICE': 'rhevm',
                                     'MTU': '1500',
                                     'NM_CONTROLLED': 'no',
                                     'ONBOOT': 'no',
                                     'STP': 'off',
                                     'TYPE': 'Bridge'},
                             'gateway': '10.35.97.254',
                             'ipv6addrs': ['fe80::d6ae:52ff:fec6:1a0e/64'],
                             'ipv6gateway': '::',
                             'mtu': '1500',
                             'netmask': '255.255.255.0',
                             'opts': {'ageing_time': '29995',
                                      'bridge_id': '8000.d4ae52c61a0e',
                                      'forward_delay': '0',
                                      'gc_timer': '36',
                                      'group_addr': '1:80:c2:0:0:0',
                                      'hash_elasticity': '4',
                                      'hash_max': '512',
                                      'hello_time': '199',
                                      'hello_timer': '36',
                                      'max_age': '1999',
                                      'multicast_last_member_count': '2',
                                      'multicast_last_member_interval': '99',
                                      'multicast_membership_interval': '25996',
                                      'multicast_querier': '0',
                                      'multicast_querier_interval': '25496',
                                      'multicast_query_interval': '12498',
                                      'multicast_query_response_interval': '999',
                                      'multicast_router': '1',
                                      'multicast_snooping': '1',
                                      'multicast_startup_query_count': '2',
                                      'multicast_startup_query_interval': '3124',
                                      'priority': '32768',
                                      'root_id': '8000.d4ae52c61a0e',
                                      'root_path_cost': '0',
                                      'root_port': '0',
                                      'stp_state': '0',
                                      'tcn_timer': '0',
                                      'topology_change': '0',
                                      'topology_change_detected': '0',
                                      'topology_change_timer': '0'},
                             'ports': ['em1'],
                             'stp': 'off'}}
        clusterLevels = ['3.0', '3.1', '3.2', '3.3', '3.4']
        cpuCores = '4'
        cpuFlags = 'fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,sep,mtrr,pge,mca,cmov,pat,pse36,clflush,dts,acpi,mmx,fxsr,sse,sse2,ss,ht,tm,pbe,syscall,nx,rdtscp,lm,constant_tsc,arch_perfmon,pebs,bts,rep_good,xtopology,nonstop_tsc,aperfmperf,pni,pclmulqdq,dtes64,monitor,ds_cpl,vmx,smx,est,tm2,ssse3,cx16,xtpr,pdcm,pcid,sse4_1,sse4_2,x2apic,popcnt,tsc_deadline_timer,aes,xsave,avx,f16c,rdrand,lahf_lm,ida,arat,epb,xsaveopt,pln,pts,dts,tpr_shadow,vnmi,flexpriority,ept,vpid,fsgsbase,smep,erms,model_Nehalem,model_Conroe,model_coreduo,model_core2duo,model_Penryn,model_Westmere,model_n270,model_SandyBridge'
        cpuModel = 'Intel(R) Xeon(R) CPU E3-1280 V2 @ 3.60GHz'
        cpuSockets = '1'
        cpuSpeed = '3601.000'
        cpuThreads = '8'
        emulatedMachines = ['rhel6.5.0',
                            'pc',
                            'rhel6.4.0',
                            'rhel6.3.0',
                            'rhel6.2.0',
                            'rhel6.1.0',
                            'rhel6.0.0',
                            'rhel5.5.0',
                            'rhel5.4.4',
                            'rhel5.4.0']
        guestOverhead = '65'
        hooks = {'after_disk_hotplug': {'aaa.jpg': {'md5': 'd41d8cd98f00b204e9800998ecf8427e'}},
                 'after_nic_hotplug': {'aaa.jpg': {'md5': 'd41d8cd98f00b204e9800998ecf8427e'}}}
        kvmEnabled = 'true'
        lastClient = '10.35.97.27'
        lastClientIface = 'rhevm'
        management_ip = '0.0.0.0'
        memSize = '15921'
        netConfigDirty = 'False'
        networks = {'rhevm': {'addr': '10.35.97.27',
                              'bootproto4': 'dhcp',
                              'bridged': True,
                              'cfg': {'BOOTPROTO': 'dhcp',
                                      'DEFROUTE': 'yes',
                                      'DELAY': '0',
                                      'DEVICE': 'rhevm',
                                      'MTU': '1500',
                                      'NM_CONTROLLED': 'no',
                                      'ONBOOT': 'no',
                                      'STP': 'off',
                                      'TYPE': 'Bridge'},
                              'gateway': '10.35.97.254',
                              'iface': 'rhevm',
                              'ipv6addrs': ['fe80::d6ae:52ff:fec6:1a0e/64'],
                              'ipv6gateway': '::',
                              'mtu': '1500',
                              'netmask': '255.255.255.0',
                              'ports': ['em1'],
                              'stp': 'off'}}
        nics = {'em1': {'addr': '',
                        'cfg': {'BRIDGE': 'rhevm',
                                'DEVICE': 'em1',
                                'HWADDR': 'd4:ae:52:c6:1a:0e',
                                'MTU': '1500',
                                'NM_CONTROLLED': 'no',
                                'ONBOOT': 'no'},
                        'hwaddr': 'd4:ae:52:c6:1a:0e',
                        'ipv6addrs': [],
                        'mtu': '1500',
                        'netmask': '',
                        'speed': 1000},
                'em2': {'addr': '',
                        'cfg': {'BOOTPROTO': 'dhcp',
                                'DEVICE': 'em2',
                                'HWADDR': 'D4:AE:52:C6:1A:0F',
                                'NM_CONTROLLED': 'yes',
                                'ONBOOT': 'no',
                                'TYPE': 'Ethernet',
                                'UUID': '7053e28c-50c7-4dbc-8e4d-b7ee85072871'},
                        'hwaddr': 'd4:ae:52:c6:1a:0f',
                        'ipv6addrs': [],
                        'mtu': '1500',
                        'netmask': '',
                        'speed': 0}}
        numaNodeDistance = {'0': [10]}
        numaNodes = {'0': {'cpus': [0, 1, 2, 3, 4, 5, 6, 7], 'totalMemory': '15921'}}
        operatingSystem = {'name': 'RHEL', 'release': '6.5.0.1.el6', 'version': '6Server'}
        packages2 = {'kernel': {'buildtime': 1393846365.0,
                                'release': '431.11.2.el6.x86_64',
                                'version': '2.6.32'},
                     'libvirt': {'buildtime': 1395830257,
                                 'release': '29.el6_5.7',
                                 'version': '0.10.2'},
                     'mom': {'buildtime': 1391960594,
                             'release': '0.0.master.20140209.gitd79b9d6.el6',
                             'version': '0.4.0'},
                     'qemu-img': {'buildtime': 1398757031,
                                  'release': '2.415.el6_5.9',
                                  'version': '0.12.1.2'},
                     'qemu-kvm': {'buildtime': 1398757031,
                                  'release': '2.415.el6_5.9',
                                  'version': '0.12.1.2'},
                     'spice-server': {'buildtime': 1385990636,
                                      'release': '6.el6_5.1',
                                      'version': '0.12.4'},
                     'vdsm': {'buildtime': 1400688219,
                              'release': '22.gitdcd07f4.el6',
                              'version': '4.15.0'}}
        reservedMem = '321'
        rngSources = ['random']
        selinux = {'mode': '0'}
        software_revision = '22'
        software_version = '4.15'
        supportedENGINEs = ['3.0', '3.1', '3.2', '3.3', '3.4']
        supportedProtocols = ['2.2', '2.3']
        uuid = '4C4C4544-0058-5610-8056-B6C04F4D5731'
        version_name = 'Snow Man'
        vlans = {}
        vmTypes = ['kvm']

Comment 26 Oved Ourfali 2014-05-22 10:50:59 UTC
Great. Thanks for re-testing it.
Please move it to VERIFIED.

Comment 27 Itamar Heim 2014-06-12 14:11:00 UTC
Closing as part of 3.4.0

Comment 28 SATHEESARAN 2015-01-28 11:11:15 UTC
Hi guys,

I have a query: I have RHS 3.0.3 running vdsm-4.14.7.3-1.el6rhs.x86_64.
This node couldn't be added to RHEV 3.3 with cluster compatibility version 3.3.

Is this expected?

Comment 29 Tareq Alayan 2015-01-29 07:04:11 UTC
Hi Satheesaran,

vdsm-4.14 is compatible with cluster compatibility version 3.4 and above.

