Bug 1024618

Summary: Cannot create a host through RHEVM since it is not able to detect virtualization support in the BIOS
Product: [oVirt] ovirt-host-deploy
Component: Plugins.VDSM
Version: master
Hardware: x86_64
OS: Unspecified
Status: CLOSED INSUFFICIENT_DATA
Severity: medium
Priority: unspecified
Reporter: Ranjit Noronha <ranjit_noronha>
Assignee: Alon Bar-Lev <alonbl>
QA Contact: Haim <hateya>
CC: aberezin, acathrow, alonbl, bazulay, bugs, dougsland, iheim, Rhev-m-bugs, yeylon
Target Milestone: ---
Target Release: ---
Whiteboard: infra
Doc Type: Bug Fix
Type: Bug
Regression: ---
oVirt Team: Infra
Last Closed: 2013-11-28 09:34:22 UTC
Attachments:
- requested host deploy log (flags: none)
- vmx bios (flags: none)
- attaching with debugging info (flags: none)

Description Ranjit Noronha 2013-10-30 05:50:22 UTC
Description of problem:

Creating a host fails with the following log error entry:
<-------------------------
2013-10-23 15:47:15 DEBUG otopi.plugins.ovirt_host_deploy.vdsm.hardware hardware._isVirtualizationEnabled:179 virtualization support GenuineIntel (cpu: True, bios: False)
2013-10-23 15:47:15 DEBUG otopi.context context._executeMethod:130 method exception
Traceback (most recent call last):
  File "/tmp/ovirt-D7lCbutUQy/pythonlib/otopi/context.py", line 120, in _executeMethod
    method['method']()
  File "/tmp/ovirt-D7lCbutUQy/otopi-plugins/ovirt-host-deploy/vdsm/hardware.py", line 205, in _validate_virtualization
    _('Hardware does not support virtualization')
RuntimeError: Hardware does not support virtualization
2013-10-23 15:47:15 ERROR otopi.context context._executeMethod:139 Failed to execute stage 'Setup validation': Hardware does not support virtualization
------------------>

However, this host has virtualization enabled in the BIOS and the kvm modules load without any issue.

Looks like the following function in hardware.py is broken:

/usr/share/ovirt-host-deploy/plugins/ovirt-host-deploy/vdsm/hardware.py:126

msr = self._prdmsr(0, MSR_IA32_FEATURE_CONTROL)

This function should return true, but is returning false.

Version-Release number of selected component (if applicable):


How reproducible:

Yes.

Steps to Reproduce:
1. Try to create a new host with virtualization enabled in rhev-m. It will fail.

Actual results:

Adding the new host fails with the log error above.

Expected results:

The host should be added successfully, since the kvm modules load on the system and we are able to run VMs on it.

Additional info:

Comment 1 Alon Bar-Lev 2013-10-30 09:25:25 UTC
Can you please send complete host-deploy log?

This should have been solved long ago (since 1.0.0)[1]

[1] http://gerrit.ovirt.org/#/c/9758/

Comment 2 Ranjit Noronha 2013-10-30 09:31:16 UTC
Created attachment 817369 [details]
requested host deploy log.

Attaching the host-deploy log as requested.

Comment 3 Alon Bar-Lev 2013-10-30 10:32:31 UTC
Thanks!!!

First, are you using oVirt or RHEVM?

I cannot understand... from the log I see:
---
2013-10-23 15:47:15 DEBUG otopi.plugins.ovirt_host_deploy.vdsm.hardware hardware._vmx_enabled_by_bios:134 vmx bios: True
2013-10-23 15:47:15 DEBUG otopi.plugins.ovirt_host_deploy.vdsm.hardware hardware._cpuid:81 cpuid: (67237, 269486080, 10281917, 3219913727)
2013-10-23 15:47:15 DEBUG otopi.plugins.ovirt_host_deploy.vdsm.hardware hardware._cpu_has_vmx_support:88 vmx support: True
2013-10-23 15:47:15 DEBUG otopi.plugins.ovirt_host_deploy.vdsm.hardware hardware._isVirtualizationEnabled:179 virtualization support GenuineIntel (cpu: True, bios: False)
---

This means that both functions (self._vmx_enabled_by_bios, self._cpu_has_vmx_support) return True, but somehow bios_ok is turned into False... I do not understand how this is possible.

Can you please investigate further? It seems that nobody else is affected by this.
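As an aside, the "vmx support: True" line can be checked directly against the cpuid tuple quoted from the log above (a standalone sketch; the bit position comes from Intel's CPUID leaf 1 definition, not from the plugin code):

```python
# (eax, ebx, ecx, edx) for CPUID leaf 1, copied from the log above.
eax, ebx, ecx, edx = (67237, 269486080, 10281917, 3219913727)

# On Intel CPUs the VMX feature flag is bit 5 of ECX (CPUID leaf 1).
CPUID_1_ECX_VMX = 1 << 5

print(bool(ecx & CPUID_1_ECX_VMX))  # True, matching "vmx support: True"
```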

Comment 4 Ranjit Noronha 2013-10-30 11:00:02 UTC
Created attachment 817395 [details]
vmx bios

I am using RHEV-M 3.3.

vmx bios is True because I manually changed it in the code while debugging. You can look at the earlier log I have provided, where vmx bios is False.

I'm not sure what further investigation you want me to do. If there is some command, etc., you would like me to try to help narrow down the problem, I can try it out.

Perhaps the python command is broken and needs to be investigated? Is the flag you are sending into the command correct?

Comment 5 Alon Bar-Lev 2013-10-30 11:29:42 UTC
Please state when you manually change anything...

If you are using rhevm-3.3 how come you use otopi-1.0 and ovirt-host-deploy-1.0?

Both should have been pulled to 1.1.

What channel do you use? what exact rpms:

rpm -qa | grep rhevm
rpm -q otopi
rpm -q ovirt-host-deploy

Thanks!

Comment 6 Ranjit Noronha 2013-10-30 11:55:30 UTC
I have no clue about the versions of otopi, ovirt-host-deploy, etc. If such an update is required, it should have been specified as an RPM dependency that would be managed by yum/apt-get automatically, and not require the customer to open Bugzilla entries to figure this out. We use internal Symantec channels that are synchronized to Red Hat channels. Which channel would you like us to use?


[root@ranjit ~]# rpm -qa|grep rhevm
rhevm-config-3.2.0-11.30.el6ev.noarch
rhevm-doc-3.2.0-4.el6eng.noarch
rhevm-genericapi-3.2.0-11.30.el6ev.noarch
rhevm-userportal-3.2.0-11.30.el6ev.noarch
rhevm-iso-uploader-3.2.2-1.el6ev.noarch
rhevm-dbscripts-3.2.0-11.30.el6ev.noarch
rhevm-image-uploader-3.2.2-1.el6ev.noarch
rhevm-sdk-3.2.0.11-1.el6ev.noarch
rhevm-tools-common-3.2.0-11.30.el6ev.noarch
rhevm-3.2.0-11.30.el6ev.noarch
rhevm-backend-3.2.0-11.30.el6ev.noarch
rhevm-spice-client-x64-cab-3.2-10.el6ev.noarch
rhevm-log-collector-3.2.2-3.el6ev.noarch
rhevm-cli-3.2.0.9-1.el6ev.noarch
rhevm-setup-3.2.0-11.30.el6ev.noarch
rhevm-webadmin-portal-3.2.0-11.30.el6ev.noarch
rhevm-spice-client-x86-cab-3.2-10.el6ev.noarch
rhevm-restapi-3.2.0-11.30.el6ev.noarch
rhevm-notification-service-3.2.0-11.30.el6ev.noarch

[root@ranjit ~]# rpm -q otopi
otopi-1.0.0-2.el6ev.noarch

[root@ranjit ~]# rpm -q ovirt-host-deploy
ovirt-host-deploy-1.0.0-2.el6ev.noarch
[root@ranjit ~]#

Comment 7 Alon Bar-Lev 2013-10-30 12:54:34 UTC
I do not understand what is happening... are you sure that attachment#817395 [details] is now the product of an execution without any modification to the software?

2013-10-23 14:46:49 DEBUG otopi.plugins.ovirt_host_deploy.vdsm.hardware hardware._prdmsr:118 prdmsr: 5

---
    def _vmx_enabled_by_bios(self):
        MSR_IA32_FEATURE_CONTROL = 0x3a
        MSR_IA32_FEATURE_CONTROL_LOCKED = 0x1
        MSR_IA32_FEATURE_CONTROL_VMXON_ENABLED = 0x4

        msr = self._prdmsr(0, MSR_IA32_FEATURE_CONTROL)
        ret = (
            msr & (
                MSR_IA32_FEATURE_CONTROL_LOCKED |
                MSR_IA32_FEATURE_CONTROL_VMXON_ENABLED
            )
        ) != MSR_IA32_FEATURE_CONTROL_LOCKED

        self.logger.debug('vmx bios: %s', ret)
        return ret
---

This reduces to the following statement:

 5 & (4 | 1) != 1

This evaluates to True, while in the log I can see the statement evaluates to False, which is kind of impossible.
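The evaluation can be verified standalone (a minimal sketch of the quoted bit-mask logic, not the shipped module):

```python
# Bit definitions of the IA32_FEATURE_CONTROL MSR, as in the quoted code.
MSR_IA32_FEATURE_CONTROL_LOCKED = 0x1
MSR_IA32_FEATURE_CONTROL_VMXON_ENABLED = 0x4

def vmx_enabled_by_bios(msr):
    # VMX is unusable only when the lock bit is set without VMXON enabled;
    # any other combination (unlocked, or locked with VMXON) passes.
    mask = (MSR_IA32_FEATURE_CONTROL_LOCKED |
            MSR_IA32_FEATURE_CONTROL_VMXON_ENABLED)
    return (msr & mask) != MSR_IA32_FEATURE_CONTROL_LOCKED

print(vmx_enabled_by_bios(5))  # True  (locked, VMXON enabled)
print(vmx_enabled_by_bios(1))  # False (locked, VMXON disabled by BIOS)
```

So an MSR value of 5, as printed by _prdmsr in the log, should indeed yield True.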

Comment 8 Ranjit Noronha 2013-10-31 05:49:10 UTC
No changes to that portion of the code. Are you sure the return value from that function is 5?

Comment 9 Alon Bar-Lev 2013-10-31 07:43:58 UTC
(In reply to Ranjit Noronha from comment #8)
> No changes to that portion of the code. Are you sure the return value from
> that function is 5?

---
    def _prdmsr(self, cpu, index):
        ret = -1

        msr = '/dev/cpu/%d/msr' % cpu
        if not os.path.exists(msr):
            raise AttributeError('No %s' % msr)
        with open(msr, 'r') as f:
            try:
                f.seek(index)
                ret = struct.unpack('L', f.read(8))[0]
            except struct.error:
                pass

        self.logger.debug('prdmsr: %s', ret)
        return ret
---

---
2013-10-23 14:46:49 DEBUG otopi.plugins.ovirt_host_deploy.vdsm.hardware hardware._prdmsr:118 prdmsr: 5
---

If the above code and message are used, the function returns exactly what the message prints, unless something is terribly wrong somewhere.

Can you please add more debug lines so we can analyze that?

At engine side: /usr/share/ovirt-host-deploy/plugins/ovirt-host-deploy/vdsm/hardware.py
---
    def _vmx_enabled_by_bios(self):
        MSR_IA32_FEATURE_CONTROL = 0x3a
        MSR_IA32_FEATURE_CONTROL_LOCKED = 0x1
        MSR_IA32_FEATURE_CONTROL_VMXON_ENABLED = 0x4

        msr = self._prdmsr(0, MSR_IA32_FEATURE_CONTROL)
+       self.logger.debug('DEBUG001: %s:%s', type(msr), msr)
+       self.logger.debug('DEBUG002: %s', msr & 5)
+       self.logger.debug('DEBUG002: %s', msr & (MSR_IA32_FEATURE_CONTROL_LOCKED|MSR_IA32_FEATURE_CONTROL_VMXON_ENABLED))
        ret = (
            msr & (
                MSR_IA32_FEATURE_CONTROL_LOCKED |
                MSR_IA32_FEATURE_CONTROL_VMXON_ENABLED
            )
        ) != MSR_IA32_FEATURE_CONTROL_LOCKED

        self.logger.debug('vmx bios: %s', ret)
        return ret
---

Then re-deploy host.

Thanks
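One platform-dependent detail worth keeping in mind while debugging _prdmsr (an observation about Python's struct module, not a confirmed cause of this bug): the native 'L' format follows the platform's unsigned long width, which is 8 bytes on 64-bit Linux but 4 bytes on 32-bit builds, so unpacking the 8 bytes read from /dev/cpu/0/msr only succeeds when the sizes happen to match. A fixed-width, explicit-endianness format avoids the ambiguity:

```python
import struct

# Stand-in for the 8-byte read from /dev/cpu/0/msr at offset 0x3a.
data = struct.pack('<Q', 5)

# '<Q' is always a little-endian 64-bit unsigned integer, regardless of
# the platform's native unsigned long width.
value = struct.unpack('<Q', data)[0]
print(value)  # 5
```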

Comment 10 Ranjit Noronha 2013-10-31 11:16:00 UTC
Created attachment 817841 [details]
attaching with debugging info

Here you go. The full log is attached. Looks like the function is returning 1.

Comment 11 Ranjit Noronha 2013-10-31 11:25:47 UTC
Looks like we are not seeing this issue with 3.3 so you can close this bug.

Comment 12 Alon Bar-Lev 2013-10-31 11:32:27 UTC
(In reply to Ranjit Noronha from comment #10)
> Created attachment 817841 [details]
> attaching with debugging info
> 
> Here you go. The full log is attached.

Thank you!

> Looked like the function is returning 1.

It seems to be impossible, as we print the return value of the function before returning, and it is 5.

---
2013-10-31 22:05:43 DEBUG otopi.plugins.ovirt_host_deploy.vdsm.hardware hardware._prdmsr:118 prdmsr: 5
2013-10-31 22:05:43 DEBUG otopi.plugins.ovirt_host_deploy.vdsm.hardware hardware._vmx_enabled_by_bios:127 DEBUG001: <type 'int'>:1
2013-10-31 22:05:43 DEBUG otopi.plugins.ovirt_host_deploy.vdsm.hardware hardware._vmx_enabled_by_bios:128 DEBUG002: 1
2013-10-31 22:05:43 DEBUG otopi.plugins.ovirt_host_deploy.vdsm.hardware hardware._vmx_enabled_by_bios:130 DEBUG002: 1
2013-10-31 22:05:43 DEBUG otopi.plugins.ovirt_host_deploy.vdsm.hardware hardware._vmx_enabled_by_bios:139 vmx bios: False
---

Can you please add more debug? Notice the new lines at _vmx_enabled_by_bios.

Thank you for your help!

---
    def _prdmsr(self, cpu, index):
        ret = -1

+       self.logger.debug('DEBUG011: entry')

        msr = '/dev/cpu/%d/msr' % cpu
        if not os.path.exists(msr):
            raise AttributeError('No %s' % msr)
        with open(msr, 'r') as f:
            try:
                f.seek(index)
                ret = struct.unpack('L', f.read(8))[0]
            except struct.error:
                pass

        self.logger.debug('prdmsr: %s', ret)
+       self.logger.debug('DEBUG012: %s:%s', type(ret), ret)
        return ret

    def _vmx_enabled_by_bios(self):
        MSR_IA32_FEATURE_CONTROL = 0x3a
        MSR_IA32_FEATURE_CONTROL_LOCKED = 0x1
        MSR_IA32_FEATURE_CONTROL_VMXON_ENABLED = 0x4

+       self.logger.debug('DEBUG000: call')
        msr = self._prdmsr(0, MSR_IA32_FEATURE_CONTROL)
+       self.logger.debug('DEBUG001: %s:%s', type(msr), msr)
+       self.logger.debug('DEBUG002: %s', msr & 5)
+       self.logger.debug('DEBUG002: %s', msr & (MSR_IA32_FEATURE_CONTROL_LOCKED|MSR_IA32_FEATURE_CONTROL_VMXON_ENABLED))
        ret = (
            msr & (
                MSR_IA32_FEATURE_CONTROL_LOCKED |
                MSR_IA32_FEATURE_CONTROL_VMXON_ENABLED
            )
        ) != MSR_IA32_FEATURE_CONTROL_LOCKED

        self.logger.debug('vmx bios: %s', ret)
        return ret
---

Comment 13 Alon Bar-Lev 2013-10-31 11:33:30 UTC
(In reply to Ranjit Noronha from comment #11)
> Looks like we are not seeing this issue with 3.3 so you can close this bug.

There is no difference in this regard between 3.2 and 3.3; I suspect the python version.

Comment 14 Barak 2013-11-03 12:53:39 UTC
 (In reply to Alon Bar-Lev from comment #13)
> (In reply to Ranjit Noronha from comment #11)
> > Looks like we are not seeing this issue with 3.3 so you can close this bug.
> 
> there is no difference in this regard between 3.2 and 3.3, I suspect python
> version.

3.2 & 3.3 both run on RHEL 6.4 - why do you suspect it's a python version issue ?

Comment 15 Alon Bar-Lev 2013-11-03 12:56:33 UTC
(In reply to Barak from comment #14)
>  (In reply to Alon Bar-Lev from comment #13)
> > (In reply to Ranjit Noronha from comment #11)
> > > Looks like we are not seeing this issue with 3.3 so you can close this bug.
> > 
> > there is no difference in this regard between 3.2 and 3.3, I suspect python
> > version.
> 
> 3.2 & 3.3 both run on RHEL 6.4 - why do you suspect it's a python version
> issue ?

As I wrote in comment#12, based on the analysis so far it seems that a function returns '5' and the caller gets '1'. This is not a global issue, as we would probably have gotten many reports about it, so it is something either local or some edge case in python.

Comment 16 Arthur Berezin 2013-11-11 18:13:44 UTC
Alon, is this behaviour reproducible in your environment?

Arthur

Comment 17 Alon Bar-Lev 2013-11-11 19:16:33 UTC
(In reply to Arthur Berezin from comment #16)
> Alon, Is this behaviour reproducible in your environment ?

No... had it been reproduced, it would have already been solved :)

It is also the first report of this kind we got... a very strange one... had I not known better, I would have guessed that this is memory corruption within python.

Comment 18 Alon Bar-Lev 2013-11-28 09:34:22 UTC
Please reopen if reproduced.
Thanks!