Bug 1469134

Summary: virt-sparsify failed with permission denied on rhel 7.4 host
Product: Red Hat Enterprise Linux 7 Reporter: Israel Pinto <ipinto>
Component: libguestfsAssignee: Richard W.M. Jones <rjones>
Status: CLOSED NOTABUG QA Contact: Virtualization Bugs <virt-bugs>
Severity: high Docs Contact:
Priority: unspecified    
Version: 7.4CC: ptoscano, rbalakri, smelamud
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-07-11 08:34:18 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
The log from libguestfs-test-tool with kernel panic none

Description Israel Pinto 2017-07-10 12:58:36 UTC
Description of problem:
I am working with RHEVM and invoke sparsify on VM disk.
The sparsify failed on rhel 7.4 host and works on 7.3 host.

I try to run the libguestfs-test-tool with vdsm user and failed also on
7.4 host.


Version-Release number of selected component (if applicable):
Red Hat Virtualization Manager Version: 4.1.4-0.2.el7
7.4 host:
OS Version:RHEL - 7.4 - 11.el7
Kernel Version:3.10.0 - 663.el7.x86_64
KVM Version:2.9.0 - 16.el7
LIBVIRT Version:libvirt-3.2.0-4.el7
VDSM Version:vdsm-4.19.21-1.el7ev
libguestfs-1.36.3-3.el7.x86_64

7.3 host:
OS Version:RHEL - 7.3 - 7.el7
Kernel Version:3.10.0 - 514.21.1.el7.x86_64
KVM Version:2.6.0 - 28.el7_3.9
LIBVIRT Version:libvirt-2.0.0-10.el7_3.9
VDSM Version:vdsm-4.19.21-1.el7ev
SPICE Version:0.12.4 - 20.el7_3
libguestfs-1.32.7-3.el7_3.3.x86_64

How reproducible:
all the time

Steps to Reproduce:
1. Create vm with disk
2. Invoke sparsify on disk


Actual results:
sparsify failed

Additional info:
VDSM log:
2017-07-10 15:55:40,684+0300 INFO  (tasks/0) [storage.SANLock] Releasing Lease(name='2f9dd1a7-7e14-4cca-bf88-aa1796bef6bc', path=u'/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_GE_41_ENV__4_nfs__0/04cf8130-d771-4821-9984-49340e91a7b3/images/37890459-1d85-4419-9103-bff1ecee1667/2f9dd1a7-7e14-4cca-bf88-aa1796bef6bc.lease', offset=0) (clusterlock:390)
2017-07-10 15:55:40,687+0300 ERROR (tasks/0) [root] Job '48907bf9-d3b1-4e01-9e33-e7539b4a04c5' failed (jobs:217)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/jobs.py", line 154, in run
    self._run()
  File "/usr/share/vdsm/storage/sdm/api/sparsify_volume.py", line 57, in _run
    virtsparsify.sparsify_inplace(self._vol_info.path)
  File "/usr/lib/python2.7/site-packages/vdsm/virtsparsify.py", line 71, in sparsify_inplace
    raise cmdutils.Error(cmd, rc, out, err)
Error: Command ['/usr/bin/virt-sparsify', '--machine-readable', '--in-place', u'/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_GE_41_ENV__4_nfs__0/04cf8130-d771-4821-9984-49340e91a7b3/images/37890459-1d85-4419-9103-bff1ecee1667/2f9dd1a7-7e14-4cca-bf88-aa1796bef6bc'] failed with rc=1 out=[] err=['virt-sparsify: error: libguestfs error: guestfs_launch failed.', 'This usually means the libguestfs appliance failed to start or crashed.', 'Do:', '  export LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1', 'and run the command again.  For further information, read:', '  http://libguestfs.org/guestfs-faq.1.html#debugging-libguestfs', "You can also run 'libguestfs-test-tool' and post the *complete* output", 'into a bug report or message to the libguestfs mailing list.', '', 'If reporting bugs, run virt-sparsify with debugging enabled and include the ', 'complete output:', '', '  virt-sparsify -v -x [...]']


**********************************************************************

libguestfs-test-tool with vdsm user output:

libguestfs: trace: set_verbose true
libguestfs: trace: set_verbose = 0
libguestfs: trace: set_verbose true
libguestfs: trace: set_verbose = 0
     ************************************************************
     *                    IMPORTANT NOTICE
     *
     * When reporting bugs, include the COMPLETE, UNEDITED
     * output below in your bug report.
     *
     ************************************************************
LIBGUESTFS_DEBUG=1
LIBGUESTFS_TRACE=1
PATH=/sbin:/bin:/usr/sbin:/usr/bin
XDG_RUNTIME_DIR=/run/user/0
SELinux: Enforcing
libguestfs: trace: add_drive_scratch 104857600
libguestfs: trace: get_tmpdir
libguestfs: trace: get_tmpdir = "/tmp"
libguestfs: trace: disk_create "/tmp/libguestfsjF3mc5/scratch.1" "raw" 104857600
libguestfs: trace: disk_create = 0
libguestfs: trace: add_drive "/tmp/libguestfsjF3mc5/scratch.1" "format:raw" "cachemode:unsafe"
libguestfs: trace: add_drive = 0
libguestfs: trace: add_drive_scratch = 0
libguestfs: trace: get_append
libguestfs: trace: get_append = "NULL"
libguestfs: trace: get_autosync
libguestfs: trace: get_autosync = 1
libguestfs: trace: get_backend
libguestfs: trace: get_backend = "libvirt"
libguestfs: trace: get_backend_settings
libguestfs: trace: get_backend_settings = []
libguestfs: trace: get_cachedir
libguestfs: trace: get_cachedir = "/var/tmp"
libguestfs: trace: get_direct
libguestfs: trace: get_direct = 0
libguestfs: trace: get_hv
libguestfs: trace: get_hv = "/usr/libexec/qemu-kvm"
libguestfs: trace: get_memsize
libguestfs: trace: get_memsize = 500
libguestfs: trace: get_network
libguestfs: trace: get_network = 0
libguestfs: trace: get_path
libguestfs: trace: get_path = "/usr/lib64/guestfs"
libguestfs: trace: get_pgroup
libguestfs: trace: get_pgroup = 0
libguestfs: trace: get_program
libguestfs: trace: get_program = "libguestfs-test-tool"
libguestfs: trace: get_recovery_proc
libguestfs: trace: get_recovery_proc = 1
libguestfs: trace: get_smp
libguestfs: trace: get_smp = 1
libguestfs: trace: get_sockdir
libguestfs: trace: get_sockdir = "/run/user/0"
libguestfs: trace: get_tmpdir
libguestfs: trace: get_tmpdir = "/tmp"
libguestfs: trace: get_trace
libguestfs: trace: get_trace = 1
libguestfs: trace: get_verbose
libguestfs: trace: get_verbose = 1
guestfs_get_append: (null)
guestfs_get_autosync: 1
guestfs_get_backend: libvirt
guestfs_get_backend_settings: []
guestfs_get_cachedir: /var/tmp
guestfs_get_direct: 0
guestfs_get_hv: /usr/libexec/qemu-kvm
guestfs_get_memsize: 500
guestfs_get_network: 0
guestfs_get_path: /usr/lib64/guestfs
guestfs_get_pgroup: 0
guestfs_get_program: libguestfs-test-tool
guestfs_get_recovery_proc: 1
guestfs_get_smp: 1
guestfs_get_sockdir: /run/user/0
guestfs_get_tmpdir: /tmp
guestfs_get_trace: 1
guestfs_get_verbose: 1
host_cpu: x86_64
Launching appliance, timeout set to 600 seconds.
libguestfs: trace: launch
libguestfs: trace: version
libguestfs: trace: version = <struct guestfs_version = major: 1, minor: 36, release: 3, extra: rhel=7,release=3.el7,libvirt, >
libguestfs: trace: get_backend
libguestfs: trace: get_backend = "libvirt"
libguestfs: launch: program=libguestfs-test-tool
libguestfs: launch: version=1.36.3rhel=7,release=3.el7,libvirt
libguestfs: launch: backend registered: unix
libguestfs: launch: backend registered: uml
libguestfs: launch: backend registered: libvirt
libguestfs: launch: backend registered: direct
libguestfs: launch: backend=libvirt
libguestfs: launch: tmpdir=/tmp/libguestfsjF3mc5
libguestfs: launch: umask=0022
libguestfs: launch: euid=36
libguestfs: libvirt version = 3002000 (3.2.0)
libguestfs: guest random name = guestfs-3u10z1pyuxconba5
libguestfs: connect to libvirt
libguestfs: opening libvirt handle: URI = qemu:///session, auth = default+wrapper, flags = 0
libvirt: XML-RPC error : Cannot create user runtime directory '/run/user/0/libvirt': Permission denied
libguestfs: error: could not connect to libvirt (URI = qemu:///session): Cannot create user runtime directory '/run/user/0/libvirt': Permission denied [code=38 int1=13]
libguestfs: trace: launch = -1 (error)
libguestfs: trace: close
libguestfs: closing guestfs handle 0x55dea87a6c30 (state 0)
libguestfs: command: run: rm
libguestfs: command: run: \ -rf /tmp/libguestfsjF3mc5

Comment 2 Peter Krempa 2017-07-10 13:04:00 UTC
virt-sparsify is part of libguestfs. 

The error from libguestfs looks like originating from libvirt though, but the initial triage should be done in libguestfs.

Comment 3 Pino Toscano 2017-07-10 13:15:40 UTC
This looks like another effect of:
https://bugzilla.redhat.com/show_bug.cgi?id=753882#c2

Since XDG_RUNTIME_DIR=/run/user/0, then most probably the shell was a root one, and then `su - $user` (where $user is the user with uid=36, it seems) was used to switch to a non-priviledged user.

Most probably this should be solved by unset'ing XDG_RUNTIME_DIR.

Comment 4 Richard W.M. Jones 2017-07-10 13:19:34 UTC
Just copying my answer from a private email from yesterday ...

From the log:

  libvirt: XML-RPC error : Cannot create user runtime directory '/run/user/0/lib
virt': Permission denied
  libguestfs: error: could not connect to libvirt (URI = qemu:///session): Canno
t create user runtime directory '/run/user/0/libvirt': Permission denied  
[code=38int1=13]

This is a libvirt error message, but it's really caused because the
process has changed user to vdsm, but an environment variable
(XDG_RUNTIME_DIR I think) still references the root-owned /run/user/0
directory.  Libvirt tries to create a socket in the runtime dir but
fails because it doesn't have access.

So basically if you're going to setuid to a different user you need to
cleanse or change any environment variables like this.

XDG_RUNTIME_DIR is unfortunately a difficult case because there is no
default for it.  It's supposed to be set by the login process, and the
same login process is also supposed to create the actual
/run/user/<UID> directory with the right permissions (and clean it
when you log out).  If you setuid then that may not be happening.  It
doesn't always happen even when you use the proper tools (notoriously
‘su’ gets it wrong).  More discussion here:

https://bugzilla.redhat.com/show_bug.cgi?id=753882#c49
https://bugs.freedesktop.org/show_bug.cgi?id=70810
https://serverfault.com/questions/388840/good-default-for-xdg-runtime-dir

Comment 6 Shmuel Melamud 2017-07-10 15:13:47 UTC
(In reply to Richard W.M. Jones from comment #4)

You're right, the XDG_RUNTIME_DIR was the source of the error above. Because we used sudo -Eu vdsm for testing, it passed also the XDG_RUNTIME_DIR environment variable from root. But after unsetting XDG_RUNTIME_DIR we still get error from libguestfs-test-tool. And it is a kernel panic:

[    0.812095] RIP  [<ffffffffb86b68c8>] atomic_notifier_chain_register+0x38/0x70
[    0.812924]  RSP <ffff8fda1db1fce0>
[    0.813326] ---[ end trace 6f1814589326ee27 ]---
[    0.813853] Kernel panic - not syncing: Fatal exception
[    0.814747] Kernel Offset: 0x37600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[    0.815959] Rebooting in 1 seconds..libguestfs: error: appliance closed the connection unexpectedly, see earlier error messages
libguestfs: child_cleanup: 0x562f34a8dc30: child process died
libguestfs: error: guestfs_launch failed, see earlier error messages
libguestfs: trace: launch = -1 (error)
libguestfs: trace: close
libguestfs: closing guestfs handle 0x562f34a8dc30 (state 0)

What can cause this failure?

I've attached the full log.

Comment 7 Shmuel Melamud 2017-07-10 15:14:44 UTC
Created attachment 1295853 [details]
The log from libguestfs-test-tool with kernel panic

Comment 8 Richard W.M. Jones 2017-07-10 15:37:40 UTC
(In reply to Shmuel Melamud from comment #7)
> Created attachment 1295853 [details]
> The log from libguestfs-test-tool with kernel panic

[    0.789843] RIP: 0010:[<ffffffffb86b68c8>]  [<ffffffffb86b68c8>] atomic_notifier_chain_register+0x38/0x70

This is a kernel bug, see bug 1448312.

Comment 9 Israel Pinto 2017-07-11 08:34:18 UTC
update to kernel-3.10.0-693.el7.x86_64
all work, thanks Richard closing the BZ