+++ This bug is a downstream clone. The original bug is: +++
+++ bug 1466234 +++
======================================================================

Description of problem:

If the NFS storage for the hosted engine is exported with root_squash, hosted-engine --upgrade-appliance fails when injecting the backup file into the image. root_squash is the default setting for almost all NAS storage and even for the RHEL NFS server.

The upgrade fails when it injects the backup file into the HE image using guestfish.

2017-06-29 14:59:28 DEBUG otopi.context context._executeMethod:142 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-common/vm/boot_disk.py", line 716, in _misc
    ohostedcons.Upgrade.BACKUP_FILE
  File "/usr/lib/python2.7/site-packages/otopi/transaction.py", line 156, in __exit__
    self.commit()
  File "/usr/lib/python2.7/site-packages/otopi/transaction.py", line 148, in commit
    element.commit()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-common/vm/boot_disk.py", line 219, in commit
    self._injectBackup()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-common/vm/boot_disk.py", line 154, in _injectBackup
    g.add_drive_opts(filename=destination, format='raw', readonly=0)
  File "/usr/lib64/python2.7/site-packages/guestfs.py", line 559, in add_drive
    r = libguestfsmod.add_drive (self._o, filename, readonly, format, iface, name, label, protocol, server, username, secret, cachemode, discard, copyonread)
RuntimeError: /rhev/data-center/mnt/10.65.209.210:_data_nfs/c65acd3e-2ed8-46b6-8be9-a6d472c35441/images/1554afb0-1b74-497b-9d23-de7eed355595/c29d4ad0-ccf7-4863-a504-0bef62e6988c: Permission denied

We are using the direct libguestfs backend (LIBGUESTFS_BACKEND=direct), which means guestfish is executed as the root user, because hosted-engine-setup itself is executed as root.

    g = guestfs.GuestFS(python_return_dict=True)
    g.set_backend('direct')
    g.add_drive_opts(filename=destination, format='raw', readonly=0)

Manually executing as root:

===
export LIBGUESTFS_BACKEND=direct
guestfish -a /rhev/data-center/mnt/10.65.209.210:_data_nfs/c65acd3e-2ed8-46b6-8be9-a6d472c35441/images/1554afb0-1b74-497b-9d23-de7eed355595/c29d4ad0-ccf7-4863-a504-0bef62e6988c
/rhev/data-center/mnt/10.65.209.210:_data_nfs/c65acd3e-2ed8-46b6-8be9-a6d472c35441/images/1554afb0-1b74-497b-9d23-de7eed355595/c29d4ad0-ccf7-4863-a504-0bef62e6988c: Permission denied
===

With the vdsm user:

===
su vdsm -s /bin/bash -c 'guestfish -a /rhev/data-center/mnt/10.65.209.210:_data_nfs/c65acd3e-2ed8-46b6-8be9-a6d472c35441/images/1554afb0-1b74-497b-9d23-de7eed355595/c29d4ad0-ccf7-4863-a504-0bef62e6988c'

Welcome to guestfish, the guest filesystem shell for
editing virtual machine filesystems and disk images.

Type: 'help' for help on commands
      'man' to read the manual
      'quit' to quit the shell

><fs>
====

Version-Release number of selected component (if applicable):

ovirt-hosted-engine-setup-2.0.4.3-3.el7ev.noarch

How reproducible:

100%

Steps to Reproduce:

1. Create an NFS share with root_squash and use it for the HE deployment.
2. Upgrade using hosted-engine --upgrade-appliance

Actual results:

hosted-engine --upgrade-appliance fails.

Expected results:

hosted-engine --upgrade-appliance should work fine with an NFS share exported with root_squash, as this is the default setting for many NAS servers.

Additional info:

(Originally by Nijin Ashok)
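For anyone trying to confirm the symptom without starting an upgrade, here is a minimal sketch that only tries to open a volume read-write as whatever user runs it. It is not part of ovirt-hosted-engine-setup; `can_open_read_write` is a hypothetical helper name, and the volume path is whatever you pass on the command line. On a root_squash export like the one described above, I would expect it to report EACCES when run as root and to succeed when run as vdsm.

```python
import errno
import os
import sys


def can_open_read_write(path):
    """Try to open `path` read-write; return (ok, errno_name_or_None)."""
    try:
        fd = os.open(path, os.O_RDWR)
    except OSError as exc:
        # Translate the numeric errno (e.g. 13) into its symbolic name (EACCES).
        return False, errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    return True, None


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_rw.py <volume path> ...
    for image in sys.argv[1:]:
        ok, reason = can_open_read_write(image)
        print("{}: {}".format(image, "OK" if ok else "FAILED ({})".format(reason)))
```

Running it once as root and once via `su vdsm -s /bin/bash -c ...` gives the same comparison as the two guestfish runs above, without opening a guestfish shell.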
Very strange: I have root_squash enabled on my share, and the described scenario worked just fine for me in a 3.6->4.0 upgrade-appliance run.
Do you also have 'anonuid=36,anongid=36'? root_squash maps requests from uid/gid 0 to the anonymous uid/gid, so if you also have 'anonuid=36,anongid=36' it is supposed to work. The only way to hit this is to have root_squash without anonuid=36,anongid=36 (which is not the recommended and documented configuration).
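To make the mapping concrete, here is a rough sketch that models how an export option string treats requests from uid/gid 0. It is only an illustration: `squash_summary` is a hypothetical helper, it assumes the exports(5) defaults (root_squash in effect unless no_root_squash is given, anonymous uid/gid 65534 unless anonuid/anongid are set), and it does not try to handle contradictory options.

```python
DEFAULT_ANON_ID = 65534  # assumed exports(5) default for squashed access
VDSM_ID = 36             # uid/gid used in 'anonuid=36,anongid=36' above


def squash_summary(options):
    """Return the uid/gid that client root is mapped to for an option string."""
    opts = [o.strip() for o in options.split(",") if o.strip()]
    flags = set(o for o in opts if "=" not in o)
    values = dict(o.split("=", 1) for o in opts if "=" in o)

    # Root is squashed by default; all_squash squashes every uid, root included.
    squashed = "all_squash" in flags or "no_root_squash" not in flags
    anonuid = int(values.get("anonuid", DEFAULT_ANON_ID))
    anongid = int(values.get("anongid", DEFAULT_ANON_ID))
    return {
        "root_squashed": squashed,
        "root_uid": anonuid if squashed else 0,
        "root_gid": anongid if squashed else 0,
    }


def root_maps_to_vdsm(options):
    """True when client root ends up acting as 36:36 on the server."""
    summary = squash_summary(options)
    return summary["root_uid"] == VDSM_ID and summary["root_gid"] == VDSM_ID


print(squash_summary("rw,root_squash"))
print(root_maps_to_vdsm("rw,root_squash,anonuid=36,anongid=36"))
```

With plain root_squash, root lands on the anonymous id rather than 36, which is the case that hits the Permission denied in the description.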
*** Bug 1469909 has been marked as a duplicate of this bug. ***
Reproduction steps:

I've properly configured the NFS server with the root_squash parameter enabled:

mnt]# touch 1.txt
mnt]# ls -lsha
total 8.0K
4.0K drwxrwxrwx.  2 root      root      4.0K Jul 20 15:51 .
4.0K dr-xr-xr-x. 18 root      root      4.0K Jul 20 15:51 ..
   0 -rw-r--r--.  1 nfsnobody nfsnobody    0 Jul 20 15:51 1.txt

I cleaned the NFS share and then deployed the latest 3.6 SHE environment on that share. The environment consisted of two ha el7.4 hosts and the 3.6 engine. I also added two NFS data storage domains from different NFS shares on a different server, and the hosted-engine storage domain was auto-imported successfully.

Components on engine:
rhevm-log-collector-3.6.1-1.el6ev.noarch
rhevm-setup-plugin-vmconsole-proxy-helper-3.6.11.3-0.1.el6.noarch
rhevm-userportal-3.6.11.3-0.1.el6.noarch
rhevm-dependencies-3.6.1-1.el6ev.noarch
rhevm-branding-rhev-3.6.0-10.el6ev.noarch
rhevm-setup-plugin-websocket-proxy-3.6.11.3-0.1.el6.noarch
rhevm-setup-3.6.11.3-0.1.el6.noarch
rhevm-spice-client-x86-cab-3.6-7.el6.noarch
ovirt-engine-extension-aaa-jdbc-1.0.7-2.el6ev.noarch
ovirt-host-deploy-1.4.1-1.el6ev.noarch
rhev-guest-tools-iso-3.6-6.el6ev.noarch
rhevm-sdk-python-3.6.9.1-1.el6ev.noarch
ovirt-vmconsole-1.0.4-1.el6ev.noarch
rhevm-lib-3.6.11.3-0.1.el6.noarch
rhevm-websocket-proxy-3.6.11.3-0.1.el6.noarch
rhevm-backend-3.6.11.3-0.1.el6.noarch
rhevm-setup-plugins-3.6.5-1.el6ev.noarch
rhevm-tools-3.6.11.3-0.1.el6.noarch
rhevm-spice-client-x64-msi-3.6-7.el6.noarch
rhevm-iso-uploader-3.6.0-1.el6ev.noarch
ovirt-setup-lib-1.0.1-1.el6ev.noarch
rhevm-doc-3.6.10-1.el6ev.noarch
rhevm-cli-3.6.9.0-1.el6ev.noarch
rhevm-setup-base-3.6.11.3-0.1.el6.noarch
rhevm-tools-backup-3.6.11.3-0.1.el6.noarch
rhevm-restapi-3.6.11.3-0.1.el6.noarch
rhevm-vmconsole-proxy-helper-3.6.11.3-0.1.el6.noarch
rhevm-dbscripts-3.6.11.3-0.1.el6.noarch
rhevm-spice-client-x86-msi-3.6-7.el6.noarch
ovirt-host-deploy-java-1.4.1-1.el6ev.noarch
rhevm-image-uploader-3.6.1-2.el6ev.noarch
rhevm-setup-plugin-ovirt-engine-common-3.6.11.3-0.1.el6.noarch
rhevm-webadmin-portal-3.6.11.3-0.1.el6.noarch
rhevm-spice-client-x64-cab-3.6-7.el6.noarch
rhevm-extensions-api-impl-3.6.11.3-0.1.el6.noarch
rhevm-setup-plugin-ovirt-engine-3.6.11.3-0.1.el6.noarch
rhevm-3.6.11.3-0.1.el6.noarch
ovirt-vmconsole-proxy-1.0.4-1.el6ev.noarch
Linux version 2.6.32-696.el6.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-18) (GCC) ) #1 SMP Tue Feb 21 00:53:17 EST 2017
Linux RHEL6.9 2.6.32-696.el6.x86_64 #1 SMP Tue Feb 21 00:53:17 EST 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 6.9 (Santiago)

Components on hosts:
qemu-kvm-rhev-2.9.0-16.el7_4.2.x86_64
libvirt-client-3.2.0-14.el7.x86_64
sanlock-3.5.0-1.el7.x86_64
ovirt-host-deploy-1.4.1-1.el7ev.noarch
rhevm-sdk-python-3.6.9.1-1.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.5.10-2.el7ev.noarch
mom-0.5.6-1.el7ev.noarch
vdsm-4.17.43-1.el7ev.noarch
rhevm-appliance-20160620.0-1.el7ev.noarch
ovirt-setup-lib-1.0.1-1.el7ev.noarch
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
ovirt-hosted-engine-setup-1.3.7.4-1.el7ev.noarch
Linux version 3.10.0-514.26.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Jun 20 01:16:02 EDT 2017
Linux 3.10.0-514.26.1.el7.x86_64 #1 SMP Tue Jun 20 01:16:02 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.4 (Maipo)

1. I've upgraded one of the hosts to the latest 4.1 components and rebooted it.
2. I've migrated the HE-VM to the upgraded 4.1 host.
3. I've upgraded the remaining 3.6 ha-host to 4.1 components and rebooted it.
4. I've placed the ha-hosts into global maintenance.
5. I've backed up the engine's DB and copied it to the ha-host running the 3.6 engine.
6. I've started the engine's upgrade using "hosted-engine --upgrade-appliance" from the ha-host that was running the engine.
7. During the upgrade I've provided the appropriate path to the backup file and successfully restored the engine's DB.
No issues with root_squash were detected.
Still getting this error with these components on hosts:
qemu-kvm-rhev-2.9.0-16.el7_4.3.x86_64
ovirt-setup-lib-1.1.3-1.el7ev.noarch
sanlock-3.5.0-1.el7.x86_64
mom-0.5.9-1.el7ev.noarch
vdsm-4.19.23-1.el7ev.x86_64
ovirt-hosted-engine-setup-2.1.3.4-1.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
ovirt-imageio-common-1.0.0-0.el7ev.noarch
ovirt-hosted-engine-ha-2.1.4-1.el7ev.noarch
libvirt-client-3.2.0-14.el7.x86_64
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
ovirt-imageio-daemon-1.0.0-0.el7ev.noarch
ovirt-host-deploy-1.6.6-1.el7ev.noarch
Linux version 3.10.0-514.26.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Jun 20 01:16:02 EDT 2017
Linux 3.10.0-514.26.1.el7.x86_64 #1 SMP Tue Jun 20 01:16:02 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.4 (Maipo)

[ ERROR ] Failed to execute stage 'Misc configuration': /run/user/0/libguestfsAh29q7: cannot create temporary directory: Permission denied
[ INFO  ] Yum Performing yum transaction rollback
[ INFO  ] Stage: Clean up
[ ERROR ] Failed to execute stage 'Clean up': 'Plugin' object has no attribute 'log'
[ ERROR ] Failed to execute stage 'Clean up': [Errno 13] Permission denied: '/tmp/tmpbKaOwu'
[ ERROR ] Failed to execute stage 'Clean up': [Errno 13] Permission denied: '/tmp/tmpgtzauo'
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ ERROR ] Hosted Engine upgrade failed: this system is not reliable, you can use --rollback-upgrade option to recover the engine VM disk from a backup
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20170723150218-f8x0cz.log

Returning back to assigned.
Created attachment 1303159: sosreport from host
Created attachment 1303160: ovirt-hosted-engine-setup-20170723150218-f8x0cz.log
To get back to a functional engine I had to run "hosted-engine --rollback-upgrade" on host alma04 and then cancel global maintenance, so that the engine could start.
What Nikolai reported on https://bugzilla.redhat.com/show_bug.cgi?id=1467813#c16 is about ovirt-hosted-engine-setup, while temporarily running as the VDSM user, not being able to write a temporary file under /run/user/0 (as expected). That issue happens only with libguestfs 1:1.36.3-6.el7 from RHEL 7.4, while it works correctly with libguestfs 1:1.32.7-3.el7_3.3 from RHEL 7.3. The issue is well described here: https://bugzilla.redhat.com/show_bug.cgi?id=1469134#c4
As Simone says, the error in the log file is:

/run/user/0/libguestfsAh29q7: cannot create temporary directory: Permission denied

which is indeed caused by the su / XDG_RUNTIME_DIR problem, as described fully in this comment: https://bugzilla.redhat.com/show_bug.cgi?id=1469134#c4
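To illustrate the kind of environment problem being described, here is a small sketch of building a child-process environment that does not carry root's XDG_RUNTIME_DIR over to another user. This is not the fix that was shipped, just one way to picture it: `env_for_user` is a hypothetical helper, and it assumes the usual /run/user/<uid> layout for runtime directories.

```python
import os


def env_for_user(env, uid):
    """Copy `env`, dropping XDG_RUNTIME_DIR unless it is /run/user/<uid>."""
    child = dict(env)  # never modify the caller's mapping
    runtime_dir = child.get("XDG_RUNTIME_DIR")
    expected = "/run/user/{}".format(uid)  # assumed per-user layout
    if runtime_dir is not None and runtime_dir != expected:
        # The inherited directory belongs to some other user (for example
        # /run/user/0), so the target user could not create files in it.
        del child["XDG_RUNTIME_DIR"]
    return child


if __name__ == "__main__":
    # 36 is the vdsm uid mentioned earlier in this bug.
    child_env = env_for_user(os.environ, 36)
    print("XDG_RUNTIME_DIR" in child_env)
```

The returned dict could then be passed as `env=` to `subprocess.call` when starting a process under the other uid, so that nothing in the child tries to create temporary files under /run/user/0.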
I've deployed a clean 3.6.11.3 HE on a RHEL7.4 host over NFS and attached an NFS data storage domain to get the HE-VM and the HE's storage domain auto-imported.

Components on host:
rhevm-appliance-20160620.0-1.el7ev.noarch
libvirt-client-3.2.0-14.el7.x86_64
mom-0.5.6-1.el7ev.noarch
sanlock-3.5.0-1.el7.x86_64
ovirt-setup-lib-1.0.1-1.el7ev.noarch
vdsm-4.17.43-1.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.5.10-2.el7ev.noarch
rhevm-sdk-python-3.6.9.1-1.el7ev.noarch
qemu-kvm-rhev-2.9.0-16.el7_4.3.x86_64
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
ovirt-hosted-engine-setup-1.3.7.4-1.el7ev.noarch
ovirt-host-deploy-1.4.1-1.el7ev.noarch
Linux version 3.10.0-693.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Thu Jul 6 19:56:57 EDT 2017
Linux 3.10.0-693.el7.x86_64 #1 SMP Thu Jul 6 19:56:57 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.4 (Maipo)

Components on engine:
rhevm-spice-client-x64-cab-3.6-7.el6.noarch
rhevm-setup-plugins-3.6.5-1.el6ev.noarch
rhevm-tools-backup-3.6.11.3-0.1.el6.noarch
rhevm-setup-plugin-vmconsole-proxy-helper-3.6.11.3-0.1.el6.noarch
rhevm-branding-rhev-3.6.0-10.el6ev.noarch
rhevm-tools-3.6.11.3-0.1.el6.noarch
rhevm-setup-base-3.6.11.3-0.1.el6.noarch
rhevm-spice-client-x86-cab-3.6-7.el6.noarch
rhevm-guest-agent-common-1.0.11-6.el6ev.noarch
rhevm-restapi-3.6.11.3-0.1.el6.noarch
rhevm-3.6.11.3-0.1.el6.noarch
rhevm-setup-plugin-ovirt-engine-3.6.11.3-0.1.el6.noarch
rhevm-websocket-proxy-3.6.11.3-0.1.el6.noarch
rhevm-image-uploader-3.6.1-2.el6ev.noarch
rhevm-extensions-api-impl-3.6.11.3-0.1.el6.noarch
rhevm-log-collector-3.6.1-1.el6ev.noarch
rhevm-spice-client-x86-msi-3.6-7.el6.noarch
rhevm-webadmin-portal-3.6.11.3-0.1.el6.noarch
rhevm-backend-3.6.11.3-0.1.el6.noarch
rhevm-lib-3.6.11.3-0.1.el6.noarch
rhevm-sdk-python-3.6.9.1-1.el6ev.noarch
rhevm-setup-plugin-websocket-proxy-3.6.11.3-0.1.el6.noarch
rhevm-setup-3.6.11.3-0.1.el6.noarch
rhevm-cli-3.6.9.0-1.el6ev.noarch
rhevm-dependencies-3.6.1-1.el6ev.noarch
rhevm-setup-plugin-ovirt-engine-common-3.6.11.3-0.1.el6.noarch
rhevm-doc-3.6.10-1.el6ev.noarch
rhevm-userportal-3.6.11.3-0.1.el6.noarch
rhev-guest-tools-iso-3.6-6.el6ev.noarch
rhevm-spice-client-x64-msi-3.6-7.el6.noarch
rhevm-iso-uploader-3.6.0-1.el6ev.noarch
rhevm-vmconsole-proxy-helper-3.6.11.3-0.1.el6.noarch
rhevm-dbscripts-3.6.11.3-0.1.el6.noarch
Linux version 2.6.32-642.el6.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-17) (GCC) ) #1 SMP Wed Apr 13 00:51:26 EDT 2016
Linux 2.6.32-642.el6.x86_64 #1 SMP Wed Apr 13 00:51:26 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 6.9 (Santiago)

Then I set the ha-host into global maintenance, backed up the engine, and copied the backup files to the ha-host. Then I updated the repositories on the host to 4.1, upgraded the host, and started upgrade-appliance. I provided the backup file during the upgrade and it finished successfully.
Components on upgraded host:
ovirt-imageio-daemon-1.0.0-0.el7ev.noarch
libvirt-client-3.2.0-14.el7.x86_64
ovirt-imageio-common-1.0.0-0.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
ovirt-hosted-engine-setup-2.1.3.5-1.el7ev.noarch
qemu-kvm-rhev-2.9.0-16.el7_4.3.x86_64
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
rhevm-appliance-4.0.20170302.0-1.el7ev.noarch
ovirt-host-deploy-1.6.6-1.el7ev.noarch
mom-0.5.9-1.el7ev.noarch
vdsm-4.19.25-1.el7ev.x86_64
ovirt-setup-lib-1.1.3-1.el7ev.noarch
sanlock-3.5.0-1.el7.x86_64
ovirt-hosted-engine-ha-2.1.4-1.el7ev.noarch
Linux version 3.10.0-693.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Thu Jul 6 19:56:57 EDT 2017
Linux 3.10.0-693.el7.x86_64 #1 SMP Thu Jul 6 19:56:57 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.4 (Maipo)

Engine:
rhevm-spice-client-x64-msi-4.0-3.el7ev.noarch
rhevm-4.0.7.4-0.1.el7ev.noarch
rhevm-spice-client-x86-msi-4.0-3.el7ev.noarch
rhev-guest-tools-iso-4.0-7.el7ev.noarch
rhevm-setup-plugins-4.0.0.3-1.el7ev.noarch
rhevm-doc-4.0.7-1.el7ev.noarch
rhevm-dependencies-4.0.0-1.el7ev.noarch
rhevm-guest-agent-common-1.0.12-4.el7ev.noarch
rhevm-branding-rhev-4.0.0-7.el7ev.noarch
Linux version 3.10.0-514.6.2.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Fri Feb 17 19:21:31 EST 2017
Linux 3.10.0-514.6.2.el7.x86_64 #1 SMP Fri Feb 17 19:21:31 EST 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

These were seen on the NFS share with the root_squash attribute:

# ls -lsha /home/share/
total 12K
4.0K drwxrwxrwx. 3 root root 4.0K Aug  1 13:10 .
4.0K drwxr-xr-x. 4 root root 4.0K Jul 23 11:32 ..
4.0K drwxr-xr-x. 5   36   36 4.0K Aug  1 13:11 38d23a4d-1184-4498-a198-09ec23ab4c1f
   0 -rwxr-xr-x. 1   36   36    0 Aug  1 20:33 __DIRECT_IO_TEST__

Moving to verified.
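For reference, the ownership check done by eye in the listing above can also be scripted. This is only a sketch: `paths_not_owned_by` is a hypothetical helper, and the 36:36 pair and the /home/share path are simply the values visible in the listing.

```python
import os


def paths_not_owned_by(root, uid, gid):
    """Return sorted paths under `root` whose owner is not uid:gid."""
    wrong = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: report the link itself, not its target
            if st.st_uid != uid or st.st_gid != gid:
                wrong.append(path)
    return sorted(wrong)


if __name__ == "__main__":
    # Values taken from the listing above; adjust the path for your share.
    for path in paths_not_owned_by("/home/share", 36, 36):
        print(path)
```

An empty result means everything below the share root is owned by 36:36, which is what the listing shows for the storage domain directory and __DIRECT_IO_TEST__.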
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2422