Bug 1340912 - Hosted-engine-setup accepts /root as a valid alternative scratch dir but then fails since it's not readable by VDSM user
Summary: Hosted-engine-setup accepts /root as a valid alternative scratch dir but then...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-hosted-engine-setup
Classification: oVirt
Component: General
Version: 2.0.0
Hardware: x86_64
OS: Linux
Priority: low
Severity: medium
Target Milestone: ovirt-4.1.0-beta
Target Release: 2.1.0
Assignee: Ido Rosenzwig
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-05-30 16:26 UTC by Nikolai Sednev
Modified: 2017-05-11 09:29 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2017-02-15 14:57:22 UTC
oVirt Team: Integration
Embargoed:
rule-engine: ovirt-4.1+


Attachments
sosreport from host (6.06 MB, application/x-xz)
2016-05-30 16:27 UTC, Nikolai Sednev


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 69100 0 master MERGED Core: Added check for VDSM-user read permissions on tmp directory 2021-01-06 09:59:21 UTC
oVirt gerrit 70094 0 ovirt-hosted-engine-setup-2.1 MERGED Core: Added check for VDSM-user read permissions on tmp directory 2021-01-06 09:59:21 UTC

Description Nikolai Sednev 2016-05-30 16:26:49 UTC
Description of problem:
During deployment of HE on NGN 4.0, the FQDN changed to localhost.localdomain and the HE deployment failed. There was also insufficient space in /var/tmp to install HE from the appliance.

[root@alma03 ~]# hosted-engine --deploy
[ INFO  ] Stage: Initializing
[ INFO  ] Generating a temporary VNC password.
[ INFO  ] Stage: Environment setup
          During customization use CTRL-D to abort.
          Continuing will configure this host for serving as hypervisor and create a VM where you have to install the engine afterwards.
          Are you sure you want to continue? (Yes, No)[Yes]: 
          It has been detected that this program is executed through an SSH connection without using screen.
          Continuing with the installation may lead to broken installation if the network connection fails.
          It is highly recommended to abort the installation and run it inside a screen session using command "screen".
          Do you want to continue anyway? (Yes, No)[No]: yes
[ INFO  ] Hardware supports virtualization
          Configuration files: []
          Log file: /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20160530154518-pkecdc.log
          Version: otopi-1.5.0_beta1 (otopi-1.5.0-0.1.beta1.el7.centos)
[ INFO  ] Stage: Environment packages setup
[ INFO  ] Stage: Programs detection
[ INFO  ] Stage: Environment setup
[ INFO  ] Generating libvirt-spice certificates
[WARNING] Cannot locate gluster packages, Hyper Converged setup support will be disabled.
[ INFO  ] Please abort the setup and install vdsm-gluster, gluster-server >= 3.7.2 and restart vdsmd service in order to gain Hyper Converged setup support.
[ INFO  ] Stage: Environment customization
         
          --== STORAGE CONFIGURATION ==--
         
          Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs3, nfs4)[nfs3]: 
          Please specify the full shared storage connection path to use (example: host:/path): 10.35.64.11:/vol/RHEV/Virt/nsednev_3_6_HE_1
[ INFO  ] Installing on first host
         
          --== SYSTEM CONFIGURATION ==--
         
         
          --== NETWORK CONFIGURATION ==--
         
          Please indicate a nic to set ovirtmgmt bridge on: (p1p1, p1p2, em1, em2) [em1]: p1p1
          iptables was detected on your computer, do you wish setup to configure it? (Yes, No)[Yes]: 
          Please indicate a pingable gateway IP address [10.35.117.254]: 
         
          --== VM CONFIGURATION ==--
         
          Booting from cdrom on RHEL7 is ISO image based only, as cdrom passthrough is disabled (BZ760885)
          Please specify the device to boot the VM from (choose disk for the oVirt engine appliance)
          (cdrom, disk, pxe) [disk]: 
          Please specify the console type you would like to use to connect to the VM (vnc, spice) [vnc]: 
[ INFO  ] Detecting available oVirt engine appliances
          The following appliance have been found on your system:
                [1] - The oVirt Engine Appliance image (OVA) - 4.0-20160528.1.el7.centos
                [2] - Directly select an OVA file
          Please select an appliance (1, 2) [1]: 
[ INFO  ] Verifying its sha1sum
[ INFO  ] Checking OVF archive content (could take a few minutes depending on archive size)
[ INFO  ] Checking OVF XML content (could take a few minutes depending on archive size)
[WARNING] OVF does not contain a valid image description, using default.
[ ERROR ] Not enough space in the temporary directory [/var/tmp]
          Please specify path to a temporary directory with at least 10 GB [/var/tmp]: /root
          Would you like to use cloud-init to customize the appliance on the first boot (Yes, No)[Yes]? 
          Would you like to generate on-fly a cloud-init ISO image (of no-cloud type)
          or do you have an existing one (Generate, Existing)[Generate]? 
          Please provide the FQDN you would like to use for the engine appliance.
          Note: This will be the FQDN of the engine VM you are now going to launch,
          it should not point to the base host or to any other existing machine.
          Engine VM FQDN: (leave it empty to skip):  []: nsednev-he-1.qa.lab.tlv.redhat.com
          Automatically execute engine-setup on the engine appliance on first boot (Yes, No)[Yes]? 
          Automatically restart the engine VM as a monitored service after engine-setup (Yes, No)[Yes]? 
          Please provide the domain name you would like to use for the engine appliance.
          Engine VM domain: [qa.lab.tlv.redhat.com]
          Enter root password that will be used for the engine appliance (leave it empty to skip): 
          Confirm appliance root password: 
          The following CPU types are supported by this host:
                 - model_SandyBridge: Intel SandyBridge Family
                 - model_Westmere: Intel Westmere Family
                 - model_Nehalem: Intel Nehalem Family
                 - model_Penryn: Intel Penryn Family
                 - model_Conroe: Intel Conroe Family
          Please specify the CPU type to be used by the VM [model_SandyBridge]: 
          Please specify the number of virtual CPUs for the VM (Defaults to appliance OVF value): [4]: 
[WARNING] Minimum requirements for disk size not met
          You may specify a unicast MAC address for the VM or accept a randomly generated default [00:16:3e:05:1a:74]: 00:16:3E:7B:B8:53
          Please specify the memory size of the VM in MB (Defaults to appliance OVF value): [16384]: 
          How should the engine VM network be configured (DHCP, Static)[DHCP]? 
          Add lines for the appliance itself and for this host to /etc/hosts on the engine VM?
          Note: ensuring that this host could resolve the engine VM hostname is still up to you
          (Yes, No)[No] yes
         
          --== HOSTED ENGINE CONFIGURATION ==--
         
          Enter the name which will be used to identify this host inside the Administrator Portal [hosted_engine_1]: alma03.qa.lab.tlv.redhat.com
          Enter engine admin password: 
          Confirm engine admin password: 
          Please provide the name of the SMTP server through which we will send notifications [localhost]: smtp.redhat.com
          Please provide the TCP port number of the SMTP server [25]: 
          Please provide the email address from which notifications will be sent [root@localhost]: nsednevhe1
          Please provide a comma-separated list of email addresses which will get notifications [root@localhost]: nsednev
[ INFO  ] Stage: Setup validation
         
          --== CONFIGURATION PREVIEW ==--
         
          Bridge interface                   : p1p1
          Engine FQDN                        : nsednev-he-1.qa.lab.tlv.redhat.com
          Bridge name                        : ovirtmgmt
          Host address                       : alma03.qa.lab.tlv.redhat.com
          SSH daemon port                    : 22
          Firewall manager                   : iptables
          Gateway address                    : 10.35.117.254
          Host name for web application      : alma03.qa.lab.tlv.redhat.com
          Storage Domain type                : nfs3
          Host ID                            : 1
          Image size GB                      : 10
          Storage connection                 : 10.35.64.11:/vol/RHEV/Virt/nsednev_3_6_HE_1
          Console type                       : vnc
          Memory size MB                     : 16384
          MAC address                        : 00:16:3E:7B:B8:53
          Boot type                          : disk
          Number of CPUs                     : 4
          OVF archive (for disk boot)        : /usr/share/ovirt-engine-appliance/ovirt-engine-appliance-4.0-20160528.1.el7.centos.ova
          Restart engine VM after engine-setup: True
          CPU Type                           : model_SandyBridge
         
          Please confirm installation settings (Yes, No)[Yes]: 
[ INFO  ] Stage: Transaction setup
[ INFO  ] Stage: Misc configuration
[ INFO  ] Stage: Package installation
[ INFO  ] Stage: Misc configuration
[ INFO  ] Configuring libvirt
[ INFO  ] Configuring VDSM
[ INFO  ] Starting vdsmd
[ INFO  ] Configuring the management bridge
[ INFO  ] Creating Storage Domain
[ INFO  ] Creating Storage Pool
[ INFO  ] Connecting Storage Pool
[ INFO  ] Verifying sanlock lockspace initialization
[ INFO  ] Creating Image for 'hosted-engine.lockspace' ...
[ INFO  ] Image for 'hosted-engine.lockspace' created successfully
[ INFO  ] Creating Image for 'hosted-engine.metadata' ...
[ INFO  ] Image for 'hosted-engine.metadata' created successfully
[ INFO  ] Creating VM Image
[ INFO  ] Extracting disk image from OVF archive (could take a few minutes depending on archive size)
[ INFO  ] Validating pre-allocated volume size
[ INFO  ] Image not uploaded to data domain
[ ERROR ] Failed to execute stage 'Misc configuration': Command '/bin/sudo' failed to execute
[ INFO  ] Stage: Clean up
[ INFO  ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20160530161141.conf'
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ ERROR ] Hosted Engine deployment failed: this system is not reliable, please check the issue, fix and redeploy
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20160530154518-pkecdc.log


Version-Release number of selected component (if applicable):
ovirt-engine-appliance-4.0-20160528.1.el7.centos.noarch
mom-0.5.3-1.1.el7.noarch
ovirt-vmconsole-host-1.0.2-0.0.master.20160517094103.git06df50a.el7.noarch
vdsm-4.17.999-1155.gitcf216a0.el7.centos.x86_64
ovirt-setup-lib-1.0.2-0.0.master.20160502125738.gitf05af9e.el7.centos.noarch
ovirt-release40-4.0.0-0.3.beta1.noarch
ovirt-vmconsole-1.0.2-0.0.master.20160517094103.git06df50a.el7.noarch
libvirt-client-1.2.17-13.el7_2.4.x86_64
ovirt-engine-sdk-python-3.6.5.1-0.1.20160507.git5fb7e0e.el7.centos.noarch
ovirt-host-deploy-1.5.0-0.1.alpha1.el7.centos.noarch
ovirt-hosted-engine-setup-2.0.0-0.1.beta1.el7.centos.noarch
ovirt-release-host-node-4.0.0-0.3.beta1.el7.noarch
ovirt-engine-appliance-4.0-20160528.1.el7.centos.noarch
sanlock-3.2.4-2.el7_2.x86_64
ovirt-hosted-engine-ha-2.0.0-0.1.beta1.el7.centos.noarch
ovirt-node-ng-image-update-placeholder-4.0.0-0.3.beta1.el7.noarch
CentOS Linux release 7.2.1511 (Core) 
Linux 3.10.0-327.18.2.el7.x86_64 #1 SMP Thu May 12 11:03:55 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Linux version 3.10.0-327.18.2.el7.x86_64 (builder.centos.org) (gcc version 4.8.3 20140911 (Red Hat 4.8.3-9) (GCC) ) #1 SMP Thu May 12 11:03:55 UTC 2016



How reproducible:
100%

Steps to Reproduce:
1. Deploy HE via the CLI on NGN 4.0 from the appliance.

Actual results:
1) Deployment failed.
2) There is not enough space in /var/tmp for the appliance.
Expected results:
Deployment should succeed.
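
For reference, the kind of free-space test behind the "Not enough space in the temporary directory" prompt can be approximated as below. This is only a sketch and not the hosted-engine-setup code: the helper names `free_bytes` and `has_room` are hypothetical, and treating "10 GB" as 10 GiB is an assumption.

```python
import os

# Assumption: the "10 GB" in the prompt is treated here as 10 GiB.
GIB = 1024 ** 3


def free_bytes(path):
    """Bytes available to unprivileged processes on the filesystem holding path."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize


def has_room(path, required_gib, free_fn=free_bytes):
    """True if the filesystem holding path has at least required_gib GiB free.

    free_fn is injectable so the comparison can be exercised without a real disk.
    """
    return free_fn(path) >= required_gib * GIB
```

Note that a check like this says nothing about who can read the directory, which is why /root passed it in the run above.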

Additional info:

Comment 1 Nikolai Sednev 2016-05-30 16:27:37 UTC
Created attachment 1162912
sosreport from host

Comment 2 Sandro Bonazzola 2016-05-31 05:59:37 UTC
Can you please also attach the HE VM sosreport?

Comment 3 Simone Tiraboschi 2016-05-31 06:43:26 UTC
The error is here:
[ ERROR ] Failed to execute stage 'Misc configuration': Command '/bin/sudo' failed to execute

But in my opinion the issue is here:
Please specify path to a temporary directory with at least 10 GB [/var/tmp]: /root

Can you please try using a scratch directory that the vdsm user can access?
If that is the issue, we have a bug, since setup needs to enforce this.

Comment 4 Fabian Deutsch 2016-05-31 07:01:53 UTC
I'm aware of two bugs that might have an impact here:

Bug 1338511 - HE does not work when /var is too small (default case) MODIFIED
Bug 1329943 - myhostname is missing from the hosts line in nsswitch.conf

When we've got the logs, then we can probably see if it's bug 1338511 or something else.

Comment 5 Simone Tiraboschi 2016-05-31 07:26:00 UTC
Yes, the issue is simply here:

2016-05-30 16:1:33 DEBUG otopi.plugins.gr_he_common.vm.boot_disk plugin.execute:926 execute-output: ('/bin/sudo', '-u', 'vdsm', '-g', 'kvm', '/bin/qemu-img', 'info', '--output', 'json', '/root/tmpZd0aEB') stderr:
qemu-img: Could not open '/root/tmpZd0aEB': Could not open '/root/tmpZd0aEB': Permission denied

2016-05-30 16:1:33 DEBUG otopi.transaction transaction._prepare:66 exception during prepare phase
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/transaction.py", line 62, in _prepare
    element.prepare()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-common/vm/boot_disk.py", line 218, in prepare
    self._validate_volume()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-common/vm/boot_disk.py", line 101, in _validate_volume
    raiseOnError=True
  File "/usr/lib/python2.7/site-packages/otopi/plugin.py", line 931, in execute
    command=args[0],
RuntimeError: Command '/bin/sudo' failed to execute
2016-05-30 16:1:33 DEBUG otopi.transaction transaction.abort:19 aborting 'Image Transaction'
2016-05-30 16:1:33 INFO otopi.plugins.gr_he_common.vm.boot_disk boot_disk.abort:222 Image not uploaded to data domain

The scratch dir where we extract the OVF image must be readable by the VDSM user, since we upload the image as the VDSM user.
Having more space in /var/tmp, as requested in bug 1338511, would avoid the problem in the default case because the default dir would have enough space, but we should also validate the path better so that the user cannot enter an unusable one.
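
One way to validate the path up front is to probe it with the same `sudo -u vdsm -g kvm` invocation that appears in the failing log line, before anything is extracted. The following is a minimal sketch under assumptions, not the merged fix: the function names are hypothetical, and it assumes `sudo` and `test` are on the PATH and that the vdsm user and kvm group exist on the host.

```python
import subprocess


def build_probe_command(path, user="vdsm", group="kvm"):
    """Return a sudo command that exits 0 only if user:group can read and enter path."""
    return [
        "sudo", "-n", "-u", user, "-g", group,
        "test", "-r", path, "-a", "-x", path,
    ]


def readable_by_service_user(path, user="vdsm", group="kvm",
                             runner=subprocess.call):
    """True if the probe command exits with status 0.

    runner is injectable so the logic can be exercised without sudo
    or a real vdsm account.
    """
    return runner(build_probe_command(path, user, group)) == 0
```

With a check of this kind, answering /root at the temporary directory prompt could be rejected immediately and the question asked again, instead of failing later in the 'Misc configuration' stage.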

It seems instead that hosted-engine-setup got the right hostname, and I don't see it changing in the logs, so the bug title seems a bit confusing:
2016-05-30 16:08:17 DEBUG otopi.plugins.gr_he_setup.network.bridge bridge._get_hostname_from_bridge_if:308 hostname: 'alma03.qa.lab.tlv.redhat.com', aliaslist: '[]', ipaddrlist: '['10.35.17.24']'
2016-05-30 16:08:17 DEBUG otopi.context context.dumpEnvironment:760 ENVIRONMENT DUMP - BEGIN
2016-05-30 16:08:17 DEBUG otopi.context context.dumpEnvironment:770 ENV OVEHOSTED_NETWORK/host_name=str:'alma03.qa.lab.tlv.redhat.com'

Comment 6 Nikolai Sednev 2016-05-31 09:08:29 UTC
(In reply to Sandro Bonazzola from comment #2)
> Can you please also attach the HE VM sosreport?

It didn't get that far at all.
[root@localhost ~]# hosted-engine --vm-status
You must run deploy first

Comment 7 Simone Tiraboschi 2016-05-31 09:16:55 UTC
(In reply to Nikolai Sednev from comment #6)
> (In reply to Sandro Bonazzola from comment #2)
> > Can you please also attach the HE VM sosreport?
> 
> It didn't get that far at all.
> [root@localhost ~]# hosted-engine --vm-status
> You must run deploy first

But when you started, the prompt was:

[root@alma03 ~]# hosted-engine --deploy
[ INFO  ] Stage: Initializing

When did it change?

Comment 8 Nikolai Sednev 2016-05-31 10:27:22 UTC
(In reply to Simone Tiraboschi from comment #7)
> (In reply to Nikolai Sednev from comment #6)
> > (In reply to Sandro Bonazzola from comment #2)
> > > Can you please also attach the HE VM sosreport?
> > 
> > It didn't get that far at all.
> > [root@localhost ~]# hosted-engine --vm-status
> > You must run deploy first
> 
> But when you started, the prompt was:
> 
> [root@alma03 ~]# hosted-engine --deploy
> [ INFO  ] Stage: Initializing
> 
> When did it change?

During the HE deployment phase.

Comment 9 Jiri Belka 2016-08-09 09:29:32 UTC
There should be a simple check of whether the vdsm user can access the provided scratch dir. The user runs the deployment as root, so it may not be obvious that some steps in the flow are performed by another user, i.e. vdsm.
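
Such a check could also be done without spawning a process, by walking the classic permission bits from the directory up to the filesystem root. The sketch below is an illustration only: `ids_for` and `dir_usable_by` are hypothetical names, and the check deliberately ignores ACLs, SELinux, and capabilities, so a real implementation may need to be stricter.

```python
import grp
import os
import pwd


def ids_for(username):
    """Return (uid, set of gids) for an account, e.g. ids_for("vdsm")."""
    pw = pwd.getpwnam(username)
    gids = {pw.pw_gid}
    gids.update(g.gr_gid for g in grp.getgrall() if username in g.gr_mem)
    return pw.pw_uid, gids


def _mode_allows(st, uid, gids, want):
    """Check owner/group/other bits; want is a mask of 4 (r), 2 (w), 1 (x)."""
    if uid == 0:
        return True  # simplification: root bypasses mode bits
    if st.st_uid == uid:
        granted = (st.st_mode >> 6) & 7
    elif st.st_gid in gids:
        granted = (st.st_mode >> 3) & 7
    else:
        granted = st.st_mode & 7
    return (granted & want) == want


def dir_usable_by(path, uid, gids):
    """True if uid/gids can traverse every ancestor of path and read path itself."""
    path = os.path.realpath(path)
    if not os.path.isdir(path):
        return False
    # Every ancestor up to "/" must grant search (x) permission.
    parent = os.path.dirname(path)
    while True:
        if not _mode_allows(os.stat(parent), uid, gids, 1):
            return False
        if parent == os.path.dirname(parent):
            break
        parent = os.path.dirname(parent)
    # The directory itself must be readable and searchable.
    return _mode_allows(os.stat(path), uid, gids, 5)
```

On a host where the vdsm account exists, `dir_usable_by("/root", *ids_for("vdsm"))` would be expected to return False when /root has its usual root-only permissions, which matches the failure seen in this bug.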

Comment 10 Nikolai Sednev 2017-02-07 18:58:53 UTC
HE deployment successfully completed on 4.1 RHEVH (rhvh-4.1-0.20170202.0+1):
# rpm -qa | grep appliance
rhvm-appliance-4.1.20170126.0-1.el7ev.noarch
[root@puma18 ~]# find / | grep rhvm-appliance-4.1.20170126.0-1.el7ev.noarch
/usr/share/yum/yumdb/r/50269336266f59037ccd019653df2fe785410fa3-rhvm-appliance-4.1.20170126.0-1.el7ev-noarch
/usr/share/yum/yumdb/r/50269336266f59037ccd019653df2fe785410fa3-rhvm-appliance-4.1.20170126.0-1.el7ev-noarch/checksum_data
/usr/share/yum/yumdb/r/50269336266f59037ccd019653df2fe785410fa3-rhvm-appliance-4.1.20170126.0-1.el7ev-noarch/origin_url
/usr/share/yum/yumdb/r/50269336266f59037ccd019653df2fe785410fa3-rhvm-appliance-4.1.20170126.0-1.el7ev-noarch/checksum_type
/usr/share/yum/yumdb/r/50269336266f59037ccd019653df2fe785410fa3-rhvm-appliance-4.1.20170126.0-1.el7ev-noarch/releasever
/usr/share/yum/yumdb/r/50269336266f59037ccd019653df2fe785410fa3-rhvm-appliance-4.1.20170126.0-1.el7ev-noarch/command_line
/usr/share/yum/yumdb/r/50269336266f59037ccd019653df2fe785410fa3-rhvm-appliance-4.1.20170126.0-1.el7ev-noarch/from_repo_revision
/usr/share/yum/yumdb/r/50269336266f59037ccd019653df2fe785410fa3-rhvm-appliance-4.1.20170126.0-1.el7ev-noarch/installed_by
/usr/share/yum/yumdb/r/50269336266f59037ccd019653df2fe785410fa3-rhvm-appliance-4.1.20170126.0-1.el7ev-noarch/reason
/usr/share/yum/yumdb/r/50269336266f59037ccd019653df2fe785410fa3-rhvm-appliance-4.1.20170126.0-1.el7ev-noarch/from_repo
/usr/share/yum/yumdb/r/50269336266f59037ccd019653df2fe785410fa3-rhvm-appliance-4.1.20170126.0-1.el7ev-noarch/from_repo_timestamp
/usr/share/yum/yumdb/r/50269336266f59037ccd019653df2fe785410fa3-rhvm-appliance-4.1.20170126.0-1.el7ev-noarch/var_uuid
/var/tmp/yum-root-SPSf4h/rhvm-appliance-4.1.20170126.0-1.el7ev.noarch.rpm
/var/imgbased/persisted-rpms/rhvm-appliance-4.1.20170126.0-1.el7ev.noarch.rpm

Components on host:
rhvm-appliance-4.1.20170126.0-1.el7ev.noarch
ovirt-imageio-common-1.0.0-0.el7ev.noarch
ovirt-hosted-engine-ha-2.1.0.1-1.el7ev.noarch
ovirt-hosted-engine-setup-2.1.0.1-1.el7ev.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
ovirt-host-deploy-1.6.0-1.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
ovirt-node-ng-nodectl-4.1.0-0.20170104.1.el7.noarch
libvirt-client-2.0.0-10.el7_3.4.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.3.x86_64
vdsm-4.19.4-1.el7ev.x86_64
sanlock-3.4.0-1.el7.x86_64
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
mom-0.5.8-1.el7ev.noarch
ovirt-imageio-daemon-1.0.0-0.el7ev.noarch
ovirt-setup-lib-1.1.0-1.el7ev.noarch
Linux version 3.10.0-514.6.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Sat Dec 10 11:15:38 EST 2016
Linux 3.10.0-514.6.1.el7.x86_64 #1 SMP Sat Dec 10 11:15:38 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux release 7.3

Moving to verified.

