Bug 1201355

Summary: [4.0] [HC] Hosted Engine storage domains disappear while running ovirt-host-deploy in Hyper Converged configuration
Product: [oVirt] vdsm
Reporter: Sandro Bonazzola <sbonazzo>
Component: General
Assignee: Sandro Bonazzola <sbonazzo>
Status: CLOSED CURRENTRELEASE
QA Contact: SATHEESARAN <sasundar>
Severity: high
Docs Contact:
Priority: unspecified
Version: ---
CC: acanan, amureini, bazulay, bugs, dfediuck, didi, ebenahar, eedri, fromani, fsimonce, gklein, kaushal, kripper, lsurette, mgoldboi, nsoffer, pkarampu, ravishankar, rbalakri, riehecky, sabose, sbonazzo, srevivo, stefano.stagnaro, stirabos, teigland, ykaul
Target Milestone: ovirt-4.0.0-alpha
Keywords: DevelBlocker
Target Release: 4.17.999
Flags: ylavi: ovirt-4.0.0?
       ylavi: planning_ack+
       rule-engine: devel_ack+
       sasundar: testing_ack+
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ovirt 4.0.0 alpha1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-08-25 12:42:36 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Gluster
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1022961, 1253939
Bug Blocks: 1173669, 1175354, 1258386
Attachments:
  host logs (flags: none)
  hosted engine vm logs (flags: none)

Description Sandro Bonazzola 2015-03-12 14:30:23 UTC
Created attachment 1001003 [details]
host logs

In the Hyper Converged configuration, the gluster daemon is configured and running on the host.
Hosted Engine Setup creates a local brick and uses it to create a master data storage domain on which to deploy the Hosted Engine VM.
Once the VM is deployed and the engine is running, the setup uses the SDK to add the host to the cluster.
ovirt-host-deploy is then executed by the engine.

At this point the mount point under /rhev used by the data storage domain disappears and the Hosted Engine VM enters a paused state.

In the mount logs:
[2015-02-12 15:33:45.223019] W [glusterfsd.c:1194:cleanup_and_exit] (--> 0-: received signum (15), shutting down
[2015-02-12 15:33:45.223040] I [fuse-bridge.c:5599:fini] 0-fuse: Unmounting '/rhev/data-center/mnt/glusterSD/mididell.home:_hosted__engine__glusterfs'.

It seems that something is terminating glusterd with SIGTERM.
Restarting glusterd should not cause this, and neither ovirt-host-deploy nor ovirt-hosted-engine-setup sends SIGTERM.
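
A quick way to see where the mount process lives is to inspect its cgroup. The following is a minimal diagnostic sketch, assuming the FUSE client can be found by matching 'glusterfs --volfile-server' on its command line:

import subprocess

# Find glusterfs FUSE client processes by their command line (assumption:
# each gluster mount helper is started with a --volfile-server argument).
pids = subprocess.check_output(
    ['pgrep', '-f', 'glusterfs --volfile-server']
).decode().split()

for pid in pids:
    # /proc/<pid>/cgroup shows which systemd slice/service owns the process,
    # e.g. system.slice/vdsmd.service vs. system.slice/glusterd.service.
    with open('/proc/%s/cgroup' % pid) as cgroup_file:
        print('glusterfs pid %s:' % pid)
        print(cgroup_file.read())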

Comment 1 Sandro Bonazzola 2015-03-12 14:30:56 UTC
Created attachment 1001005 [details]
hosted engine vm logs

Comment 2 Sandro Bonazzola 2015-03-12 14:35:20 UTC
Host is RHEL 7.1
Fabric script for installing the RPMs used for the test:

import fabric.api


def testing():
    fabric.api.run('yum -y install glusterfs-server python-magic')
    urls = [
        ('http://jenkins.ovirt.org/job/'
        'ovirt-hosted-engine-setup_any_create-rpms_manual/11/ARCH=x86_64,'
        'DISTRIBUTION=epel-7/artifact/exported-artifacts/'
        'ovirt-hosted-engine-setup-1.2.3-0.0.master.el7.centos.noarch.rpm')
    ]
    for package in (
        'vdsm-4.16.12-7.gita30da75.el7.centos.x86_64.rpm',
        'vdsm-cli-4.16.12-7.gita30da75.el7.centos.noarch.rpm',
        'vdsm-python-zombiereaper-4.16.12-7.gita30da75.el7.centos.noarch.rpm',
        'vdsm-jsonrpc-4.16.12-7.gita30da75.el7.centos.noarch.rpm',
        'vdsm-xmlrpc-4.16.12-7.gita30da75.el7.centos.noarch.rpm',
        'vdsm-yajsonrpc-4.16.12-7.gita30da75.el7.centos.noarch.rpm',
        'vdsm-python-4.16.12-7.gita30da75.el7.centos.noarch.rpm',
        'vdsm-gluster-4.16.12-7.gita30da75.el7.centos.noarch.rpm',
    ):
        urls.append(
            'http://jenkins.ovirt.org/job/manual-build-tarball/490/label=el7/'
            'artifact/exported-artifacts/%s' % package
        )
    fabric.api.run('rpm -Uvh --force ' + ' '.join(urls))

Gluster RPMs come from http://download.gluster.org/pub/gluster/glusterfs/3.6/3.6.2/

Other oVirt RPMs are from the nightly master snapshot.

Comment 3 Sandro Bonazzola 2015-03-12 14:38:49 UTC
Moving to the VDSM component, since it seems that the removal of the storage is due to FUSE termination, while libgfapi is supposed to be enabled in the RPMs used.

Comment 4 Christopher Pereira 2015-03-26 20:49:18 UTC
I experienced a similar behaviour.

While hosted-engine-setup was adding the host to the newly created cluster, VDSM crashed.

In the hosted-engine-setup logs we have:

	[...]

	2015-03-26 18:36:48 DEBUG otopi.plugins.ovirt_hosted_engine_setup.engine.add_host add_host._closeup:635 Adding the host to the cluster

	2015-03-26 18:36:49 INFO otopi.plugins.ovirt_hosted_engine_setup.engine.add_host add_host._wait_host_ready:169 Waiting for the host to become operational in the engine. This may take several minutes...

	2015-03-26 18:36:50 DEBUG otopi.plugins.ovirt_hosted_engine_setup.engine.add_host add_host._wait_host_ready:189 VDSM host in installing state
	2015-03-26 18:36:54 DEBUG otopi.plugins.ovirt_hosted_engine_setup.engine.add_host add_host._wait_host_ready:189 VDSM host in installing state
	2015-03-26 18:36:57 DEBUG otopi.plugins.ovirt_hosted_engine_setup.engine.add_host add_host._wait_host_ready:189 VDSM host in installing state

	2015-03-26 18:54:43 DEBUG otopi.plugins.ovirt_hosted_engine_setup.engine.add_host add_host._wait_host_ready:183 Error fetching host state: [ERROR]::oVirt API connection failure, (7, 'I/O operation timed out')
	2015-03-26 18:54:43 DEBUG otopi.plugins.ovirt_hosted_engine_setup.engine.add_host add_host._wait_host_ready:189 VDSM host in  state
	2015-03-26 18:56:51 DEBUG otopi.plugins.ovirt_hosted_engine_setup.engine.add_host add_host._wait_host_ready:183 Error fetching host state: [ERROR]::oVirt API connection failure, (7, 'Failed connect to m5.imatronix.com:443; Connection timed out')
	2015-03-26 18:56:51 DEBUG otopi.plugins.ovirt_hosted_engine_setup.engine.add_host add_host._wait_host_ready:189 VDSM host in  state
	2015-03-26 18:59:00 DEBUG otopi.plugins.ovirt_hosted_engine_setup.engine.add_host add_host._wait_host_ready:183 Error fetching host state: [ERROR]::oVirt API connection failure, (7, 'Failed connect to m5.imatronix.com:443; Connection timed out')
	[...]
	
In the VDSM logs, we have:

	[...]

	MainThread::DEBUG::2015-03-26 18:36:56,767::vdsm::66::vds::(sigtermHandler) Received signal 15

	MainThread::DEBUG::2015-03-26 18:36:56,768::protocoldetector::158::vds.MultiProtocolAcceptor::(stop) Stopping Acceptor
	MainThread::INFO::2015-03-26 18:36:56,768::__init__::565::jsonrpc.JsonRpcServer::(stop) Stopping JsonRPC Server
	Detector thread::DEBUG::2015-03-26 18:36:56,768::protocoldetector::130::vds.MultiProtocolAcceptor::(_cleanup) Cleaning up
	Thread-88::DEBUG::2015-03-26 18:36:56,769::fileSD::262::Storage.Misc.excCmd::(getReadDelay) FAILED: <err> = ''; <rc> = -15
	MainThread::DEBUG::2015-03-26 18:36:56,769::vmchannels::192::vds::(stop) VM channels listener was stopped.
	Thread-88::ERROR::2015-03-26 18:36:56,769::misc::208::Storage.Misc::(readspeed) Unable to read file '/rhev/data-center/mnt/glusterSD/h2.imatronix.com:engine/7ba46e75-51af-4648-becc-5a469cb8e9c2/dom_md/metadata'
	Thread-88::ERROR::2015-03-26 18:36:56,769::monitor::250::Storage.Monitor::(_monitorDomain) Error monitoring domain 7ba46e75-51af-4648-becc-5a469cb8e9c2
	Traceback (most recent call last):
	  File "/usr/share/vdsm/storage/monitor.py", line 247, in _monitorDomain
		self._checkReadDelay()
	  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 726, in wrapper
		value = meth(self, *a, **kw)
	  File "/usr/share/vdsm/storage/monitor.py", line 319, in _checkReadDelay
		self.nextStatus.readDelay = self.domain.getReadDelay()
	  File "/usr/share/vdsm/storage/fileSD.py", line 262, in getReadDelay
		stats = misc.readspeed(self.metafile, 4096)
	  File "/usr/share/vdsm/storage/misc.py", line 209, in readspeed
		raise se.MiscFileReadException(path)
	MiscFileReadException: Internal file read failure: (u'/rhev/data-center/mnt/glusterSD/h2.imatronix.com:engine/7ba46e75-51af-4648-becc-5a469cb8e9c2/dom_md/metadata',)


	ioprocess communication (3617)::ERROR::2015-03-26 18:36:56,768::__init__::152::IOProcessClient::(_communicate) IOProcess failure
	Traceback (most recent call last):
	  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 107, in _communicate
		raise Exception("FD closed")
	Exception: FD closed
	MainThread::INFO::2015-03-26 18:36:56,769::momIF::87::MOM::(stop) Shutting down MOM
	Thread-88::INFO::2015-03-26 18:36:56,770::monitor::273::Storage.Monitor::(_notifyStatusChanges) Domain 7ba46e75-51af-4648-becc-5a469cb8e9c2 became INVALID
	Thread-147::DEBUG::2015-03-26 18:36:56,771::misc::813::Storage.Event.Storage.DomainMonitor.onDomainStateChange::(_emit) Emitting event
	MainThread::DEBUG::2015-03-26 18:36:56,771::task::592::Storage.TaskManager.Task::(_updateState) Task=`aac0fcbf-3a3c-48be-bd72-26e4bcc2baf6`::moving from state init -> state preparing
	Thread-147::DEBUG::2015-03-26 18:36:56,771::misc::823::Storage.Event.Storage.DomainMonitor.onDomainStateChange::(_emit) Calling registered method `contEIOVms`
	ioprocess communication (3617)::DEBUG::2015-03-26 18:36:56,771::__init__::298::IOProcessClient::(_run) Starting IOProcess...
	MainThread::INFO::2015-03-26 18:36:56,771::logUtils::48::dispatcher::(wrapper) Run and protect: prepareForShutdown(options=None)
	Thread-147::DEBUG::2015-03-26 18:36:56,771::misc::833::Storage.Event.Storage.DomainMonitor.onDomainStateChange::(_emit) Event emitted
	Thread-11::DEBUG::2015-03-26 18:36:56,772::storageServer::725::Storage.ConnectionMonitor::(_monitorConnections) Monitoring stopped
	MainThread::INFO::2015-03-26 18:36:56,782::monitor::134::Storage.Monitor::(close) Stop monitoring all domains
	MainThread::INFO::2015-03-26 18:36:56,782::monitor::147::Storage.Monitor::(_stopMonitors) Stop monitoring 7ba46e75-51af-4648-becc-5a469cb8e9c2
	Thread-88::DEBUG::2015-03-26 18:36:56,782::monitor::209::Storage.Monitor::(_run) Domain monitor for 7ba46e75-51af-4648-becc-5a469cb8e9c2 stopped
	MainThread::DEBUG::2015-03-26 18:36:56,782::monitor::153::Storage.Monitor::(_stopMonitors) Waiting for monitor 7ba46e75-51af-4648-becc-5a469cb8e9c2
	Thread-88::INFO::2015-03-26 18:36:56,782::clusterlock::245::Storage.SANLock::(releaseHostId) Releasing host id for domain 7ba46e75-51af-4648-becc-5a469cb8e9c2 (id: 1)
	Thread-88::ERROR::2015-03-26 18:36:56,782::monitor::366::Storage.Monitor::(_releaseHostId) Error releasing host id 1 for domain 7ba46e75-51af-4648-becc-5a469cb8e9c2
	Traceback (most recent call last):
	  File "/usr/share/vdsm/storage/monitor.py", line 363, in _releaseHostId
		self.domain.releaseHostId(self.hostId, unused=True)
	  File "/usr/share/vdsm/storage/sd.py", line 480, in releaseHostId
		self._clusterLock.releaseHostId(hostId, async, unused)
	  File "/usr/share/vdsm/storage/clusterlock.py", line 252, in releaseHostId
		raise se.ReleaseHostIdFailure(self._sdUUID, e)
	ReleaseHostIdFailure: Cannot release host id: (u'7ba46e75-51af-4648-becc-5a469cb8e9c2', SanlockException(16, 'Sanlock lockspace remove failure', 'Device or resource busy'))

	MainThread::DEBUG::2015-03-26 18:36:56,783::taskManager::90::Storage.TaskManager::(prepareForShutdown) Request to stop all tasks
	MainThread::INFO::2015-03-26 18:36:56,783::taskManager::96::Storage.TaskManager::(prepareForShutdown) cdb4f72b-1789-4a9f-adc5-69f7488f5992
	MainThread::INFO::2015-03-26 18:36:56,783::logUtils::51::dispatcher::(wrapper) Run and protect: prepareForShutdown, Return response: None
	MainThread::DEBUG::2015-03-26 18:36:56,783::task::1188::Storage.TaskManager.Task::(prepare) Task=`aac0fcbf-3a3c-48be-bd72-26e4bcc2baf6`::finished: None
	MainThread::DEBUG::2015-03-26 18:36:56,783::task::592::Storage.TaskManager.Task::(_updateState) Task=`aac0fcbf-3a3c-48be-bd72-26e4bcc2baf6`::moving from state preparing -> state finished

	[...]

When trying to restart VDSM, we get:

	Thread-46::ERROR::2015-03-26 19:24:31,843::vm::1237::vm.Vm::(_startUnderlyingVm) vmId=`191045ac-79e4-4ce8-aad7-52cc9af313c5`::The vm start process failed
	Traceback (most recent call last):
	  File "/usr/share/vdsm/virt/vm.py", line 1185, in _startUnderlyingVm
		self._run()
	  File "/usr/share/vdsm/virt/vm.py", line 2253, in _run
		self._connection.createXML(domxml, flags),
	  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 126, in wrapper
		ret = f(*args, **kwargs)
	  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3427, in createXML
		if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
	libvirtError: Failed to acquire lock: No space left on device
	Thread-46::INFO::2015-03-26 19:24:31,844::vm::1709::vm.Vm::(setDownStatus) vmId=`191045ac-79e4-4ce8-aad7-52cc9af313c5`::Changed state to Down: Failed to acquire lock: No space left on device (code=1)
	Thread-46::DEBUG::2015-03-26 19:24:31,844::vmchannels::214::vds::(unregister) Delete fileno 60 from listener.
	VM Channels Listener::DEBUG::2015-03-26 19:24:32,346::vmchannels::121::vds::(_do_del_channels) fileno 60 was removed from listener.

In sanlock.log we have:

	2015-03-26 19:24:30+0000 7589 [752]: cmd 9 target pid 9559 not found
	2015-03-26 19:24:31+0000 7589 [764]: r7 cmd_acquire 2,8,9559 invalid lockspace found -1 failed 935819904 name 7ba46e75-51af-4648-becc-5a469cb8e9c2

All 3 lease files are present.

Comment 5 Christopher Pereira 2015-03-26 21:20:09 UTC
Facts:

1) The engine storage (/rhev/data-center/mnt/...) was unmounted during this process.

2) Another mount of the same volume (I had mounted it before the setup) was still mounted after the VDSM crash, so maybe the problem is not related to gluster.

3) After reconnecting the storage with "hosted-engine --connect-storage", the volume is mounted again.
Now, when trying to restart VDSM, we get an "invalid lockspace" error:

	Thread-46::ERROR::2015-03-26 19:24:31,843::vm::1237::vm.Vm::(_startUnderlyingVm) vmId=`191045ac-79e4-4ce8-aad7-52cc9af313c5`::The vm start process failed
	Traceback (most recent call last):
	  File "/usr/share/vdsm/virt/vm.py", line 1185, in _startUnderlyingVm
		self._run()
	  File "/usr/share/vdsm/virt/vm.py", line 2253, in _run
		self._connection.createXML(domxml, flags),
	  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 126, in wrapper
		ret = f(*args, **kwargs)
	  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3427, in createXML
		if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
	libvirtError: Failed to acquire lock: No space left on device
	Thread-46::INFO::2015-03-26 19:24:31,844::vm::1709::vm.Vm::(setDownStatus) vmId=`191045ac-79e4-4ce8-aad7-52cc9af313c5`::Changed state to Down: Failed to acquire lock: No space left on device (code=1)
	Thread-46::DEBUG::2015-03-26 19:24:31,844::vmchannels::214::vds::(unregister) Delete fileno 60 from listener.
	VM Channels Listener::DEBUG::2015-03-26 19:24:32,346::vmchannels::121::vds::(_do_del_channels) fileno 60 was removed from listener.

In sanlock.log we have:

	2015-03-26 19:24:30+0000 7589 [752]: cmd 9 target pid 9559 not found
	2015-03-26 19:24:31+0000 7589 [764]: r7 cmd_acquire 2,8,9559 invalid lockspace found -1 failed 935819904 name 7ba46e75-51af-4648-becc-5a469cb8e9c2

(All 3 lease files are present)

Comment 6 Christopher Pereira 2015-03-26 21:21:25 UTC
Regarding the hosted-engine VM not being resumed after restarting VDSM: I confirmed that QEMU does not reopen the file descriptors when resuming a paused VM.

See also:

- https://bugzilla.redhat.com/show_bug.cgi?id=1172905
- https://bugzilla.redhat.com/show_bug.cgi?id=1058300

Comment 7 Christopher Pereira 2015-03-26 23:05:06 UTC
I found out that the gluster engine storage disappears when VDSM is restarted during the activation process, because the glusterfs mount process is a child process of vdsmd.service:

   CGroup: /system.slice/vdsmd.service
           ├─14895 /usr/bin/python /usr/share/vdsm/vdsm
           ├─14964 /usr/libexec/ioprocess --read-pipe-fd 43 --write-pipe-fd 42 --max-threads 10 --ma
           ├─15836 /usr/libexec/ioprocess --read-pipe-fd 49 --write-pipe-fd 48 --max-threads 10 --ma
           ├─15911 /usr/sbin/glusterfs --volfile-server=h2.imatronix.com --volfile-id=engine /rhev/d
           └─15922 /usr/libexec/ioprocess --read-pipe-fd 61 --write-pipe-fd 60 --max-threads 10 --ma

A workaround is to mount the engine storage manually, so that when vdsmd is restarted on the host, it doesn't kill the storage mount and the engine VM.
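
For illustration, a minimal sketch of that manual-mount workaround; the server name, volume name, and mount path below are placeholders:

import os
import subprocess

# Placeholders: substitute the real gluster server, volume, and VDSM mount path.
SERVER = 'gluster.example.com'
VOLUME = 'hosted_engine_glusterfs'
MOUNTPOINT = '/rhev/data-center/mnt/glusterSD/%s:_%s' % (SERVER, VOLUME)

# Mounting from an interactive shell (or any unit other than vdsmd.service)
# keeps the glusterfs FUSE client out of the vdsmd cgroup, so restarting
# vdsmd no longer tears down the mount.
if not os.path.ismount(MOUNTPOINT):
    subprocess.check_call(['mkdir', '-p', MOUNTPOINT])
    subprocess.check_call(
        ['mount', '-t', 'glusterfs', '%s:/%s' % (SERVER, VOLUME), MOUNTPOINT])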

Comment 8 Doron Fediuck 2015-03-31 06:49:46 UTC
(In reply to Christopher Pereira from comment #7)
> I found out that the gluster engine storage disappears when VDSM is
> restarted during the activation process, because it is a child-process of
> the vdsmd.service:
> 
>    CGroup: /system.slice/vdsmd.service
>            ├─14895 /usr/bin/python /usr/share/vdsm/vdsm
>            ├─14964 /usr/libexec/ioprocess --read-pipe-fd 43 --write-pipe-fd
> 42 --max-threads 10 --ma
>            ├─15836 /usr/libexec/ioprocess --read-pipe-fd 49 --write-pipe-fd
> 48 --max-threads 10 --ma
>            ├─15911 /usr/sbin/glusterfs --volfile-server=h2.imatronix.com
> --volfile-id=engine /rhev/d
>            └─15922 /usr/libexec/ioprocess --read-pipe-fd 61 --write-pipe-fd
> 60 --max-threads 10 --ma
> 
> A workarround is to mount the engine-storage manually, so when vdsmd is
> restarted on the host, it doesn't kill the storage and the engine VM.

Thanks Christopher.
The one thing that bothers me is that this issue only happens when we're
working with Gluster storage. Any other storage type works fine with the same
flow. Any idea why?

Comment 9 Christopher Pereira 2015-04-06 08:18:21 UTC
Doron,

Gluster FDs get invalidated because the glusterfs process is killed together with vdsmd.
Maybe with NFS the process hosting those FDs is not inside the same cgroup and is not killed during vdsmd restart.

Comment 10 Christopher Pereira 2015-04-06 08:18:59 UTC
Guys, an easy workaround to keep the hosted-engine VM alive during the setup is to move the glusterfs process back to the glusterd.service cgroup with this command:

echo $glusterfspid > /sys/fs/cgroup/systemd/system.slice/glusterd.service/tasks
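
A minimal sketch of the same move in Python, assuming the legacy (cgroup v1) systemd hierarchy used on EL7 and a single glusterfs mount process; the cgroup tasks path is the one from the command above:

import subprocess

# Find the glusterfs FUSE mount process (assumption: exactly one such process,
# matched by its --volfile-server argument).
glusterfs_pid = subprocess.check_output(
    ['pgrep', '-f', 'glusterfs --volfile-server']
).decode().split()[0]

# Writing the PID into the glusterd.service tasks file moves the process back
# under that cgroup, so a vdsmd restart no longer kills it.
with open('/sys/fs/cgroup/systemd/system.slice/glusterd.service/tasks', 'w') as tasks:
    tasks.write(glusterfs_pid)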

Comment 11 Pranith Kumar K 2015-04-08 01:39:37 UTC
Hi Christopher,
     Will restarting glusterd kill the glusterfs process when this is done?
Pranith

Comment 12 Christopher Pereira 2015-04-21 19:19:48 UTC
No.
Actually, even after stopping glusterd, I still see glusterfs and glusterfsd processes running. I'm not sure if this is normal, but it happens with 3.7dev.
But if you kill those processes, QEMU's file descriptors will be invalidated and the VM won't start anymore.
You will see a "block I/O error in device 'drive-virtio-disk0': Transport endpoint is not connected (107)".
But AFAIR there was a trick: virsh suspend + resume.
This probably works because QEMU reopens the files (this is possible because libvirt passes file paths, not file descriptors).
In the future, when we pass file descriptors to QEMU instead of file paths, we will need to reinitialize them in libvirt or wherever they are received.
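
A minimal sketch of that suspend + resume trick through the libvirt Python bindings (the domain name 'HostedEngine' is an assumption; adjust it to the actual VM name):

import libvirt

# Connect to the local libvirt daemon and look up the paused VM.
conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('HostedEngine')

# Pause and resume the domain; per the comment above, this apparently makes
# QEMU reopen the image files, refreshing descriptors invalidated by the
# dead gluster mount.
dom.suspend()
dom.resume()
conn.close()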

Comment 13 Kaushal 2015-04-23 14:11:47 UTC
(In reply to Christopher Pereira from comment #9)
> Doron,
> 
> Gluster FDs get invalidated because the glusterfs process is killed together
> with vdsmd.
> Maybe with NFS the process hosting those FDs is not inside the same cgroup
> and is not killed during vdsmd restart.

This could be because the GlusterFS mount process is a user-space process, unlike NFS mounts which are handled by the kernel.

(In reply to Christopher Pereira from comment #12)
> No.
> Actually even after stopping glusterd, I still see glusterfs and glusterfsd
> processes running. I'm not sure if this is normal, but it happens with
> 3.7dev.

This is actually surprising to me. I always thought that systemd would keep track of processes under a service's cgroup and terminate all of them when the service was stopped.
Did you stop glusterd manually or via systemctl?

Comment 14 Kaushal 2015-04-24 06:37:12 UTC
The best solution here would be to have QEMU use gfapi to access the disk image. This would make it independent of the VDSM mount, and VDSM would be free to do as it pleases with the mount.
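
For illustration only, a sketch of what a libgfapi-backed disk looks like in libvirt domain XML; all values (server, volume, image path) are placeholders, and this is not the change that eventually fixed this bug:

# Hypothetical libvirt disk definition using the gluster network protocol
# instead of a FUSE mount path; server, volume, and image path are placeholders.
GLUSTER_DISK_XML = """
<disk type='network' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source protocol='gluster' name='engine/images/vm-disk.img'>
    <host name='gluster.example.com' port='24007' transport='tcp'/>
  </source>
  <target dev='vda' bus='virtio'/>
</disk>
"""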

Comment 15 Kaushal 2015-04-24 06:40:30 UTC
(In reply to Kaushal from comment #13)
> (In reply to Christopher Pereira from comment #12)
> > No.
> > Actually even after stopping glusterd, I still see glusterfs and glusterfsd
> > processes running. I'm not sure if this is normal, but it happens with
> > 3.7dev.
> 
> This is actually surprising to me. I always thought that systemd would keep
> track of processes under a service's cgroup and terminate all of them when
> the service was stopped.
> Did you stop glusterd manually or via systemctl?

I checked the glusterd.service unit file. It has KillMode=process, which instructs systemd to kill only the main glusterd process, not its child processes. This is why the mount wasn't being killed when you stopped GlusterD.
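
A minimal sketch for checking this property on both units via systemctl (the interpretation of vdsmd's mode is an inference from the behaviour described in this bug):

import subprocess

# 'systemctl show -p KillMode <unit>' prints a line like 'KillMode=process'.
# glusterd.service uses KillMode=process, so stopping it leaves child mounts
# alive; vdsmd.service presumably uses the default control-group mode, so a
# glusterfs mount started inside its cgroup is killed on stop/restart.
for unit in ('glusterd.service', 'vdsmd.service'):
    output = subprocess.check_output(
        ['systemctl', 'show', '-p', 'KillMode', unit]
    ).decode().strip()
    print('%s: %s' % (unit, output))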

Comment 16 Christopher Pereira 2015-04-26 06:48:52 UTC
Why is libgfapi not being used here?

Would it be possible to patch VDSM so that the glusterfs mount process is not executed inside the VDSM cgroup?

As a workaround, to keep the gluster volume alive, I am currently:

Stopping VDSM, mounting the /rhev share manually, and then restarting VDSM.
I check that the gluster process stays outside the VDSM cgroup, i.e. "systemctl status vdsmd" must not show the glusterfs process under the vdsmd cgroup.

Comment 17 Christopher Pereira 2015-04-30 18:16:06 UTC
Please change status to fixed.
Thanks

Comment 18 Nir Soffer 2015-04-30 18:19:04 UTC
(In reply to Christopher Pereira from comment #17)
> Please change status to fixed.

Hi Christopher, the patches fixing this have not been merged upstream yet, so we cannot
change the status to fixed yet.

Comment 19 Doron Fediuck 2015-05-12 07:26:36 UTC
*** Bug 1220310 has been marked as a duplicate of this bug. ***

Comment 21 Christopher Pereira 2015-05-25 21:22:42 UTC
Working fine in 3.6 Alpha (tested on CentOS 7).
Can be closed.

Comment 22 Nikolai Sednev 2015-09-07 14:41:43 UTC
Hi Sandro,
Could you please change the target release from 3.6.0 to 4.0?

Comment 23 Sandro Bonazzola 2015-09-11 12:23:50 UTC
Dropping Bug #1083025 from the blocked bugs; this bug shouldn't affect external glusterfs support.

Comment 24 Yaniv Kaul 2015-09-22 12:35:13 UTC
Sandro, when will this bug move to ON_QA?

Comment 25 Sandro Bonazzola 2015-09-22 14:55:03 UTC
I'm going to move this to ON_QA when we release a 4.0 milestone build.
We may have a "checkpoint" build after 3.6.0 is out if we don't want to wait for the first alpha.

Comment 26 Sahina Bose 2016-03-31 14:10:36 UTC
Sandro, can this be closed?

Comment 27 Sandro Bonazzola 2016-04-05 14:37:35 UTC
(In reply to Sahina Bose from comment #26)
> Sandro, can this be closed?

Closed, no; moved to QA, yes.

Comment 28 Sahina Bose 2016-04-05 14:51:40 UTC
Isn't this fix available in 3.6.x?

Comment 29 Sandro Bonazzola 2016-04-18 12:47:06 UTC
(In reply to Sahina Bose from comment #28)
> Isn't this fix available in 3.6.x?

As far as I can tell, no, it's not in 3.6 (change-id: I79714a9a6ecbde17c396eca6b17b9e3605fad688).

Comment 30 Allon Mureinik 2016-04-18 12:53:35 UTC
(In reply to Sandro Bonazzola from comment #29)
> (In reply to Sahina Bose from comment #28)
> > Isn't this fix available in 3.6.x?
> 
> As far as I can tell, no, it's not in 3.6 (change-id:
> I79714a9a6ecbde17c396eca6b17b9e3605fad688 )

This patch was backported to the 3.6 branch; see commit hash 48841c0956ca91af5cd6aeb66782c8d2115f1d1e.
It's included in 4.17.0.4.

Comment 32 SATHEESARAN 2016-08-23 04:22:17 UTC
Tested with Gluster 3.7.9 and RHV 4.0.2-7.

While deploying the hosted engine, I used a glusterfs storage domain backed by a replica 3 volume. No problems were found.

The glusterfs mount process is under system.slice:

└─system.slice
  ├─run-18300.scope
  │ └─18363 /usr/sbin/glusterfs --volfile-server=10.70.37.157 --volfile-server=10.70.37.162 --volfile-server=10.70.37.187 --volfile-id=/rep2vol /rhev/data-center/mnt/glusterSD/10.70.37.157:_rep2vol

Comment 33 SATHEESARAN 2016-08-23 04:31:23 UTC
Also found that the gluster FUSE mount persisted even after restarting glusterd.

Comment 34 SATHEESARAN 2016-08-23 04:31:55 UTC
(In reply to SATHEESARAN from comment #33)
> Also found that the gluster fuse mount persisted, even after restarting
> glusterd

And also after restarting vdsmd