Description of problem:
I have deployed an oVirt test bed: a cluster with a GlusterFS Storage Domain. The supervdsmServer daemon on every vdsm node appears to leak memory. A cluster that uses an NFS Storage Domain behaves normally. Please see the reproduction steps below for details.

Version-Release number of selected component (if applicable):
[root@node-01 ~]# rpm -qa | grep vdsm
vdsm-python-zombiereaper-4.16.4-0.el6.noarch
vdsm-xmlrpc-4.16.4-0.el6.noarch
vdsm-jsonrpc-4.16.4-0.el6.noarch
vdsm-4.16.4-0.el6.x86_64
vdsm-cli-4.16.4-0.el6.noarch
vdsm-python-4.16.4-0.el6.noarch
vdsm-yajsonrpc-4.16.4-0.el6.noarch
vdsm-gluster-4.16.4-0.el6.noarch

[root@node-01 ~]# rpm -qa | grep gluster
glusterfs-cli-3.5.2-1.el6.x86_64
glusterfs-libs-3.5.2-1.el6.x86_64
glusterfs-3.5.2-1.el6.x86_64
glusterfs-rdma-3.5.2-1.el6.x86_64
glusterfs-server-3.5.2-1.el6.x86_64
glusterfs-api-3.5.2-1.el6.x86_64
glusterfs-fuse-3.5.2-1.el6.x86_64
vdsm-gluster-4.16.4-0.el6.noarch

[root@node-01 ~]# rpm -qa | grep ioprocess
ioprocess-0.12.0-2.el6.x86_64
python-ioprocess-0.12.0-2.el6.noarch

Bugs already excluded:
https://bugzilla.redhat.com/show_bug.cgi?id=1130045
https://bugzilla.redhat.com/show_bug.cgi?id=1124369

How reproducible:

Steps to Reproduce:
1. Create a data center whose cluster has "Enable Gluster Service" checked.
2. Add two nodes through the ovirt-engine dashboard.
3. Create two Storage Domains: a Data (Master) domain of type GlusterFS and an ISO domain of type POSIX compliant FS.
4. Create some VMs.
5. Wait a few minutes and watch the memory of the supervdsmServer daemon on any node with the top command.

Actual results:
As time goes on, the supervdsmServer daemon occupies more and more system memory until nothing more can be allocated. As a result, the node's status changes to "NonOperational" in the ovirt-engine web UI, and I can no longer do anything useful with the cluster. Restarting the vdsm and supervdsmServer daemons while a node is "NonOperational" brings it back to normal operation, but the problem reappears over time. If the cluster has no GlusterFS Storage Domain, everything works as expected.

Expected results:
supervdsmServer memory use stays stable over time.

Additional info:
I could not find any helpful information in vdsm.log or supervdsm.log.
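For anyone trying to confirm the growth described above, here is a minimal sketch of a monitoring helper (hypothetical, not part of vdsm; the use of pgrep and /proc/<pid>/status is an assumption about the host) that samples the daemon's resident set size once a minute so it can be correlated with vdsm.log:

#!/usr/bin/python
# Hypothetical monitoring helper (not part of vdsm): sample the resident set
# size of supervdsmServer once a minute.
import subprocess
import time


def rss_kb(pid):
    """Return VmRSS in kB for a pid, read from /proc/<pid>/status."""
    with open('/proc/%d/status' % pid) as status:
        for line in status:
            if line.startswith('VmRSS:'):
                return int(line.split()[1])
    return 0


def main():
    # pgrep -o picks the oldest match, -f matches the full command line.
    out = subprocess.Popen(['pgrep', '-of', 'supervdsmServer'],
                           stdout=subprocess.PIPE).communicate()[0]
    pid = int(out.split()[0])
    while True:
        print('%s supervdsmServer rss=%d kB' % (time.ctime(), rss_kb(pid)))
        time.sleep(60)


if __name__ == '__main__':
    main()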
Could you attach supervdsm.log anyway? Do you spot anything different in the log, relative to the cluster that has no glusterFS?
Created attachment 938700 [details]
vdsm logs and some pngs

Thanks, first of all! I really can't find a helpful log, so I attach all the vdsm logs from the nodes here, plus some pngs that may be useful for analyzing bug 1142647.
I see that supervdsm is asked to call /usr/sbin/gluster --mode=script volume info --xml every 5 seconds. Is this expected?

Also (and unrelated to the leak), the log line

MainProcess|Thread-51::DEBUG::2014-09-17 10:16:59,274::supervdsmServer::101::SuperVdsm.ServerCallback::(wrapper) call wrapper with (None,) {}

does not show the called function name, only "wrapper".
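Regarding the "wrapper" name: this is the usual symptom of a decorator that does not use functools.wraps, so the logging layer only ever sees the wrapper's __name__. A minimal sketch of the pattern and the fix (this is not the vdsm source; volumeInfo/volumeStatus are made-up callback names):

import functools
import logging

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger('SuperVdsm.ServerCallback')


def logged(func):
    """Log the call using the callable's __name__, like the server callback does."""
    def wrapper(*args, **kwargs):
        log.debug('call %s with %s %s', func.__name__, args, kwargs)
        return func(*args, **kwargs)
    return wrapper


def plain_decorator(func):
    # Loses the original name: the returned callable is named 'wrapper'.
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


def wraps_decorator(func):
    # functools.wraps copies __name__ (and the docstring) onto the wrapper.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


@logged
@plain_decorator
def volumeInfo():
    return {}


@logged
@wraps_decorator
def volumeStatus():
    return {}


volumeInfo()    # logged as: call wrapper with () {}
volumeStatus()  # logged as: call volumeStatus with () {}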
supervdsm calling "/usr/sbin/gluster --mode=script volume info --xml" every 5 sec is expected behaviour.
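For reference, a minimal sketch of what such a 5-second poll boils down to, assuming it simply shells out to the gluster CLI and parses the XML (this is not vdsm's actual code path; the name/statusStr element names are assumptions based on the gluster 3.5 XML output):

import subprocess
import time
import xml.etree.cElementTree as etree


def volume_info():
    """Run 'gluster volume info --xml' once and map volume name -> status."""
    proc = subprocess.Popen(
        ['/usr/sbin/gluster', '--mode=script', 'volume', 'info', '--xml'],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError('gluster volume info failed: %r' % err)
    root = etree.fromstring(out)
    return dict((vol.findtext('name'), vol.findtext('statusStr'))
                for vol in root.findall('.//volume'))


if __name__ == '__main__':
    while True:
        print(volume_info())
        time.sleep(5)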
Please note that the nodes are VMs created in my OpenStack environment, whose hypervisor is also KVM. Inside a node (VM), "cat /proc/cpuinfo | grep vmx" produces output, so I treat it as an oVirt node. I'm not sure whether this setup makes any difference to bug 1142647. Thanks all!
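A quick way to check for the nested-virtualization capability described above (a hypothetical helper, not part of vdsm):

# Hypothetical check: report whether hardware virtualization flags are
# exposed inside this (possibly nested) guest.
def has_hw_virt():
    with open('/proc/cpuinfo') as cpuinfo:
        for line in cpuinfo:
            if line.startswith('flags'):
                flags = line.split(':', 1)[1].split()
                return 'vmx' in flags or 'svm' in flags
    return False


if __name__ == '__main__':
    print('hardware virt available: %s' % has_hw_virt())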
Darshan, can you post this to the ovirt-3.5 branch? It's a nasty regression that I'd like to avoid.
(In reply to Dan Kenigsberg from comment #6)
> Darshan, can you post this to the ovirt-3.5 branch? It's a nasty regression
> that I'd like to avoid.

Done.
I also have GlusterFS. After upgrading from 3.4.4 to 3.5.0 I can see on all 3 of my nodes:

  PID USER  PR  NI    VIRT    RES   SHR S %CPU %MEM    TIME+  COMMAND
nod1 (SPM, running 0 VMs):
  753 root  15  -5 17.208g 7.737g 10832 S  0.0 49.4  1:28.46  supervdsmServer
nod2 (running 3 VMs):
  641 root  15  -5 17.573g 7.888g 10768 S  0.0 33.5  1:17.09  supervdsmServer
nod3 (running 2 VMs):
 6391 root  15  -5 19.072g 8.646g 10844 S  9.3 44.1 38:17.05  supervdsmServer

So the supervdsmServer alone occupies around 33-49% of memory!

I also got this from systemctl status supervdsmd:

supervdsmd.service - "Auxiliary vdsm service for running helper functions as root"
   Loaded: loaded (/usr/lib/systemd/system/supervdsmd.service; static)
   Active: active (running) since Tue 2014-10-21 11:32:40 EEST; 23h ago
 Main PID: 753 (supervdsmServer)
   CGroup: name=systemd:/system/supervdsmd.service
           └─753 /usr/bin/python /usr/share/vdsm/supervdsmServer --sockfile /var/run/vdsm/svdsm.sock

Oct 21 11:39:51 nod1 daemonAdapter[753]: Process Process-4:
Oct 21 11:39:51 nod1 daemonAdapter[753]: Traceback (most recent call last):
Oct 21 11:39:51 nod1 daemonAdapter[753]: File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
Oct 21 11:39:51 nod1 daemonAdapter[753]: self.run()
Oct 21 11:39:51 nod1 daemonAdapter[753]: File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
Oct 21 11:39:51 nod1 daemonAdapter[753]: self._target(*self._args, **self._kwargs)
Oct 21 11:39:51 nod1 daemonAdapter[753]: File "/usr/share/vdsm/supervdsmServer", line 242, in child
Oct 21 11:39:51 nod1 daemonAdapter[753]: pipe.recv()
Oct 21 11:39:51 nod1 daemonAdapter[753]: IOError: [Errno 4] Interrupted system call
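The IOError at the end of that traceback is EINTR from pipe.recv(): a signal interrupted the blocking read and the child process exited instead of retrying. A minimal sketch of the usual retry pattern for this situation (not the actual change that went into vdsm):

import errno


def uninterruptible(call, *args, **kwargs):
    """Retry a call that fails with EINTR because a signal interrupted it."""
    while True:
        try:
            return call(*args, **kwargs)
        except (IOError, OSError) as err:
            if err.errno != errno.EINTR:
                raise


# For example, in a child loop similar to the one in the traceback:
#     data = uninterruptible(pipe.recv)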
Thanks for your report. This bug is destined to be hacked away in the oVirt 3.5.1 release.
This is an automated message: This bug should be fixed in oVirt 3.5.1 RC1, moving to QA
oVirt 3.5.1 has been released. If problems still persist, please make note of it in this bug report.