Bug 1570840
Summary: Memory leak in vdsmd
Product: [oVirt] vdsm
Component: Core
Version: 4.20.23
Status: CLOSED WORKSFORME
Severity: high
Priority: unspecified
Reporter: Chris Adams <linux>
Assignee: Marcin Sobczyk <msobczyk>
QA Contact: Nikolai Sednev <nsednev>
CC: bugs, fromani, linux, michal.skrivanek, mperina, msobczyk, nsednev, nsoffer, pkliczew
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
oVirt Team: Infra
Type: Bug
Doc Type: If docs needed, set a value
Last Closed: 2019-01-29 13:50:44 UTC
Description (Chris Adams, 2018-04-23 14:04:47 UTC)
Due to an unfortunate power failure, followed by a breaker failure, followed by a generator failure, my dev cluster was power-cycled yesterday afternoon (the UPS batteries weren't expected to last through that much failure). I'm not sure why, but I ended up with SPM and all VMs running on the same node after the power returned (and after I fixed the storage; the iSCSI is a "home brew" SAN, Linux running targetcli, and it came back from the power cycle with the LUNs not attached to the storage).

After about 14 hours, vdsmd on node1 (no VMs) is using about 98MB, and vdsmd on node2 (SPM, 6 VMs including hosted engine) is up to almost 220MB. Nothing jumps out at me in the logs; let me know what logs you'd like to see and I'll attach them. It's been a while since I needed to debug a memory leak, so I'm not sure of the best way to see where the RAM is going.

How fast is it growing? Only on one of them? Any more concrete evidence? Anything in the logs?

I have several production oVirt clusters with similar setups (except iSCSI is to a real SAN); they're still on 4.1.9. I see similar vdsmd memory leaks on a couple of them; it seems to follow the node with the hosted engine. On one, for example, I restarted vdsmd last week (April 16), and it is currently using 1.8GB RAM.

(In reply to Chris Adams from comment #2)
"using" is too vague. Are you looking at RSS? Python allocation works in mysterious ways, and it's normal for it to grow initially (for hours/days) as long as it stops eventually; it's also OK to see a very high total virtual memory size (tens of GB). There are also a few known leaks, but they are considered small enough to be irrelevant. It's not easy to profile remotely, I'm afraid; you'd have to do it yourself or provide some repeatable scenario where we can see that in our lab.

Sorry - I'm looking at resident size, not virtual.
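For readers following the RSS-vs-virtual distinction above: resident size can be read directly from /proc on Linux. The following is my own minimal sketch of an RSS sampler of the kind used informally in this thread (the reporter's actual script was not attached; all names here are illustrative):

```python
# Sample the resident set size (RSS) of a process every `interval`
# seconds and append timestamped values to a log file. Linux-only:
# reads the VmRSS field from /proc/<pid>/status.
import sys
import time

def read_rss_kb(pid):
    """Return the resident set size of `pid` in kB."""
    with open("/proc/%d/status" % pid) as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # reported in kB
    raise RuntimeError("VmRSS not found for pid %d" % pid)

def sample_rss(pid, logfile, interval=300):
    """Append 'timestamp rss_kb' lines forever, one per interval."""
    with open(logfile, "a") as out:
        while True:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            out.write("%s %d\n" % (stamp, read_rss_kb(pid)))
            out.flush()
            time.sleep(interval)

if __name__ == "__main__" and len(sys.argv) >= 3:
    sample_rss(int(sys.argv[1]), sys.argv[2])
```

Plotting the resulting file, or diffing consecutive samples, gives the per-interval deltas discussed later in the thread.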
Last week I had a vdsmd on one node of one of the 4.1 clusters that was sitting at 15GB resident (I didn't open a ticket then because I needed to restart it ASAP, and because I didn't figure 4.1 was getting much attention at this point). The 4.2.2 dev cluster has vdsmd resident up to 314MB now on the one node, so it has grown over 90MB in 4 hours. It seems to grow fairly steadily when I've watched it.

If you can give me some tips, I can try to see what's happening (I just need some pointers, because it's been a while since I had to do that). I'm guessing it must be something related to how I installed these, since it seems to happen on multiple clusters for me, but I thought I went pretty much right by the guides, so... I don't know? These were all installed at previous versions and updated over time (all 3.5 or 3.6 initially; I did go major release by major release when updating).

Just to see if I could narrow down causes, I wrote a script to write the RSS of vdsmd to a file every five minutes. node1, with no VMs, was flat, while node2, with 6 VMs (including the engine) and also the SPM, increased every five minutes.

I moved SPM to node1, and it took one jump, but then stayed level, while node2 continued increasing. I moved the engine VM to node1, and it started increasing steadily, while node2 almost leveled off (still increased a small amount). node1 vdsmd RSS is up about 14MB in the last hour.

I'm still reading how to diagnose python memory leaks - python is not a language I'm strong in.

(In reply to Chris Adams from comment #5)
> Just to see if I could narrow down causes, I wrote a script to write the RSS
> of vdsmd to a file every five minutes. node1, with no VMs, was flat, while
> node2, with 6 VMs (including the engine) and also the SPM, increased every
> five minutes.
>
> I moved SPM to node1, and it took one jump, but then stayed level, while
> node2 continued increasing.

SPM does a lot more than regular nodes.
Maybe you can try to see if the pattern is really related to that role, or rather to how many VMs a node runs, or whether it runs the HostedEngine VM. A node running the HE VM is also doing a bit more/other stuff than a regular non-SPM node.

> I moved the engine VM to node1, and it started increasing steadily, while
> node2 almost leveled off (still increased a small amount). node1 vdsmd RSS
> is up about 14MB in the last hour.

This would still sound OK, as long as it stops in about a day if you leave it like that.

> I'm still reading how to diagnose python memory leaks - python is not a
> language I'm strong in.

It's not easy. Most of what you can find around does not work on vdsm, because it's much more complex (especially the threading) than average Python apps. Nir Soffer and Francesco Romani were hunting leaks about a year or two ago. Please bring it up on the users/devel list; that's easier to follow up than here.

I brought this up on the devel list and was referred back to BZ. Per the list, I enabled the vdsm health checker, and it shows the RSS for vdsm increasing every minute, by an average of about 190K, but only on the node with the hosted engine. As a test, I put the engine HA in global maintenance mode, and even stopped ovirt-ha-broker and ovirt-ha-agent on both nodes. Watching that for a short time seemed to show the RSS increase slow; most minutes it was 24K-44K (but still increasing every minute).

I am not sure what information you'd need to try to replicate my setup, please let me know. This two-node dev cluster was installed with CentOS 7 + oVirt 3.5 (with the engine on CentOS 6), then upgraded 3.5->3.6->4.0->4.1->4.2 (with the CentOS 7 engine appliance deployed when CentOS 6 engine support was dropped). It is using iSCSI storage, with separate LUNs for the hosted engine and regular VMs (the regular VM storage is using multipath).

Alternately, if someone can send me notes/patches/etc. to try to dump where vdsm is using RAM, I can give that a try.
I'm just not that good with Python to do it myself, I don't think (I was reading about the tracemalloc module, but I'm not sure where to start with it).

And after a few hours, vdsm RSS is back to growing at a rate of 200-450K per minute.

Technically this is Infra, but I'm not sure who can help with this. Nir, Francesco, I'm afraid you'd need to look at it or we have to close this. I do not see any more details in the mail thread to be able to recreate this in the lab. 200-400KB/s sounds serious enough to look at. OTOH no one else is reporting such a problem.

Just to add - I'm guessing it is something about how I set things up, since I see it on multiple clusters... just not sure what that could have been (sorry for being a typical clueless end-user in that way!). My dev cluster is out of service right now, but I do see this in production on several 4.1.9 setups (completely independent systems). If there is any specific info I can provide, please just ask. Here's how they were set up:

- installed CentOS 7 from a basic kickstart, added oVirt 3.6 repos, installed oVirt with hosted engine on iSCSI (installed the engine VM from the same basic kickstart when prompted by "hosted-engine --deploy")
- updated within 3.6 a few times, then upgraded to 4.0 and then 4.1

They're each 2-3 servers with iSCSI storage (one has multipath but the others don't, so it's not that). I'm not sure of anything "unique" about how I set them up (I wish I knew).

(In reply to Michal Skrivanek from comment #9)
> technically this is Infra
>
> but I'm not sure who can help with this. Nir, Francesco, I'm afraid you'd
> need to look at it or we have to close this. I do not see any more details
> in the mail thread to be able to recreate this in the lab. 200-400KB/s
> sounds serious enough to look at it. OTOH no one else is reporting such
> problem.

Nir/Francesco, did you have time to take a look into it?
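As an aside on the tracemalloc module mentioned above, a minimal starting point looks like this. (My illustration, not from the thread; note that vdsm of this era ran on Python 2, where tracemalloc requires the pytracemalloc backport, so this is Python 3 syntax and the "leak" is simulated.)

```python
# Minimal tracemalloc workflow: take two snapshots and diff them to see
# which source lines grew their allocations in between.
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# Simulate a leak: keep references so the memory cannot be freed.
leak = [bytearray(1024) for _ in range(1000)]

after = tracemalloc.take_snapshot()
stats = after.compare_to(before, "lineno")
for stat in stats[:5]:
    print(stat)  # top allocation sites, largest growth first
```

In a long-running daemon the same idea applies: snapshot periodically and diff against the previous snapshot, which is roughly what the per-interval log grouping later in this thread approximates from the outside.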
(In reply to Martin Perina from comment #11)
> Nir/Francesco, did you have time to take a look into it?

Nope, but I will have some time hopefully later today or tomorrow.

(In reply to Martin Perina from comment #11)
> Nir/Francesco, did you have time to take a look into it?

No, and I will not have time for this. If this leak is in storage, please move the bug to storage.

(In reply to Chris Adams from comment #5)
> Just to see if I could narrow down causes, I wrote a script to write the RSS
> of vdsmd to a file every five minutes. [...]

No need for custom scripts, btw. You can use the built-in Vdsm "health" reporting:

https://github.com/oVirt/vdsm/blob/master/lib/vdsm/common/config.py.in#L565

The health reporting will periodically dump the memstats, including deltas from the last check, thread count, and other info. Find them in the logs.

From what I've learned from the comments, it seems somehow related to hosted engine. Chris, could you confirm this behaviour is showing up only with hosted engine?
In other words, does it happen if you use a non-hosted engine?

I haven't used oVirt with a non-hosted engine, so I don't know about that or have a good way to test it. I did run a test with the engine HA agent/broker shut down, and the RSS increase appeared to slow somewhat, but it still continued to grow, so I don't know if that was just coincidental or what.

The vdsm RSS growth does seem to follow the node hosting the engine, so if it isn't the engine HA agent/broker, could it be related to the engine storage domain? I can't really think of much else that would be different between a node hosting the engine and other nodes. I'm trying to remember if I did anything "off script" in setting up the engine storage, but I don't think I did. It is an iSCSI LUN dedicated to just the engine.

Chris, can you enable the health monitor, run with DEBUG log level for a couple of days, and attach vdsm logs to the bug?

Just an FYI: I did see your request, but I'm on vacation for the next week. I will look at that when I get back. I did enable the health monitor on my dev cluster a while back; it did show the steadily growing RSS (but not really any additional info).

(In reply to Chris Adams from comment #18)
Great, please attach logs with the health monitor enabled to the bug. It will help us understand the nature of this issue.

Created attachment 1482831 [details]
vdsm log from node1
Created attachment 1482832 [details]
vdsm log from node2
Created attachment 1482833 [details]
vdsm log from node3
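For reference, the health monitor requested above is switched on through vdsm's configuration. A sketch of a drop-in file, assuming the [devel] options in the config.py.in linked in comment #14 (the drop-in path is hypothetical and the option names should be checked against the installed vdsm version):

```ini
# /etc/vdsm/vdsm.conf.d/99-health.conf (hypothetical drop-in path)
[devel]
# Periodically log process memory stats (RSS with deltas, thread count)
health_monitor_enable = true
# Seconds between health reports in vdsm.log
health_check_interval = 60
```

After a vdsmd restart, the periodic "rss=... kB (+delta)" lines analyzed later in this thread should appear in vdsm.log.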
I enabled the health checker on one cluster; here are the logs from around the same time (they didn't rotate at the same time, so they're not exactly lined up). node2 is the node currently hosting the Hosted Engine VM and is the one with vdsm's RSS growing.

Just checking in - was that sufficient log info, or do I need to send more?

Nikolai, it would be best if you run it for some time (a few hours at least); please enable health monitoring (comment #14) and attach vdsm.log. Thanks

systemctl restart vdsmd

puma18 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:          32065       17679       10706         489        3679       13077
Swap:          2047           0        2047

Every 2.0s: ps -o etime,rss,vsz 1618                 Tue Nov 20 15:16:06 2018
    ELAPSED   RSS     VSZ
      00:21 82356 4413804

puma18 ~]# while sleep 60; do echo; date; ps -o rsz= 1618; done
Tue Nov 20 15:18:17 IST 2018
98724

puma18 ~]# top
top - 15:19:24 up 5 days, 21:15, 2 users, load average: 1.26, 1.03, 0.90
Tasks: 327 total, 1 running, 326 sleeping, 0 stopped, 0 zombie
%Cpu(s): 13.0 us, 2.6 sy, 0.0 ni, 83.9 id, 0.5 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem : 32835032 total, 10974984 free, 18091540 used, 3768508 buff/cache
KiB Swap: 2097148 total, 2097148 free, 0 used. 13404044 avail Mem

  PID USER  PR  NI    VIRT    RES   SHR S %CPU %MEM    TIME+ COMMAND
 1618 vdsm   0 -20 4635000  98760 13540 S  4.3  0.3  0:12.36 vdsmd
18808 root  15  -5 1547364  36516  9936 S  4.0  0.1 60:07.53 supervdsmd

puma18 ~]# while sleep 360; do echo; date; ps -o rsz= 1618; done
Tue Nov 20 15:24:32 IST 2018
99076
Tue Nov 20 15:30:32 IST 2018
99136
Tue Nov 20 15:42:32 IST 2018
99292
Tue Nov 20 16:24:33 IST 2018
100092
Tue Nov 20 16:30:33 IST 2018
100104
Tue Nov 20 16:42:33 IST 2018
100132

I can't really reproduce the issue. I have my HE environment up and running, there are some VMs launched, and VDSM's memory usage ascends to ~100 MB, but after ~30 hours it stalls at the same level.

I was trying to analyze the 'node2' log that Chris provided. I wrote a script that:

- searches for lines containing the delta information given by the health check (i.e. "rss=408532 kB (+152)")
- takes all the log lines that were printed between the current check and the previous one
- groups the lines by occurrence count, so we get something like this:

31 x (Reactor thread) [Broker.StompAdapter] Processing CONNECT request (stompreactor:103)
31 x (JsonRpc (StompReactor)) [Broker.StompAdapter] Subscribe command received (stompreactor:132)
4 x (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call Host.getHardwareInfo succeeded in 0.00 seconds (__init__:539)
...

- prints the grouped data into files named after the delta value (i.e. "440", "320", etc.) with contents like the above

The span of the deltas is huge: the smallest are ~80 kB and the largest ~1000 kB. I was hoping that some pattern would emerge (i.e. for the big deltas the log file would contain calls to some stuff that doesn't occur for the small ones), but I can't find a correlation.

Nikolai's log unfortunately doesn't have the health check info, so I can't really work with it. 'node2', on the other hand, seems to be written with default logging levels, so there are no debug messages. If someone could use my utility in an environment that has both all the loggers set to debug and the health checks enabled, that would be great.

I'm attaching this utility along with the statistics I've created for the 'node2' log file. The usage is simple:

./process_log.py vdsm.log

Please note that this utility will create *a lot* of files in the directory you're currently in.

Created attachment 1521512 [details]
process_log.py utility
Created attachment 1521513 [details]
'node2' logs grouped by delta and occurrence of log lines
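The grouping approach Marcin describes can be sketched as follows. (This is my reconstruction from his description for readers without the attachment, not the actual process_log.py; the sample input is made up.)

```python
# Group log lines between consecutive health-check reports, keyed by the
# RSS delta that the health check reported for that interval.
import re
from collections import Counter

# Matches the health-check delta, e.g. "rss=408532 kB (+152)"
DELTA_RE = re.compile(r"rss=\d+ kB \(\+(\d+)\)")

def group_by_delta(lines):
    """Yield (delta_kb, Counter of normalized log lines) per interval."""
    bucket = []
    for line in lines:
        m = DELTA_RE.search(line)
        if m:
            yield int(m.group(1)), Counter(bucket)
            bucket = []
        else:
            # Strip the timestamp prefix so identical messages group together.
            bucket.append(line.split(" ", 1)[-1].strip())

sample = [
    "10:00:01 (Reactor thread) Processing CONNECT request",
    "10:00:02 (Reactor thread) Processing CONNECT request",
    "10:00:59 rss=408532 kB (+152)",
]
for delta, counts in group_by_delta(sample):
    for msg, n in counts.most_common():
        print("%d x %s (delta +%d kB)" % (n, msg, delta))
```

A real reimplementation would also write each interval's Counter to a file named after the delta, as the attached utility does.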
(In reply to Marcin Sobczyk from comment #34)
> I can't really reproduce the issue. I have my HE environment up and running,
> there are some VMs launched, VDSM's memory usage ascends to ~100 MB but
> after ~30 hours it stalls on the same level.
> [...]

Nikolai, could you please try to reproduce in your environment with debug log enabled on all hosts?

(In reply to Martin Perina from comment #37)
> Nikolai, could you please try to reproduce in your environment with debug
> log enabled on all hosts?

I already provided all relevant logs, including debug logs, in comment #33. At this point, due to capacity, I can't supply any more data.

Tested on latest 4.3.0-0.8 with vdsm-4.30.6-1.el7ev.x86_64; it stabilized at 106600:

alma03 ~]# while sleep 60; do echo; date; ps -o rsz= 23584; done
Sun Jan 20 15:57:55 IST 2019
105504
Sun Jan 20 16:06:56 IST 2019
105540
Sun Jan 20 16:55:01 IST 2019
105588
Mon Jan 21 07:54:35 IST 2019
106596
Mon Jan 21 07:55:35 IST 2019
106600
Mon Jan 21 07:56:35 IST 2019
106600
Mon Jan 21 09:39:46 IST 2019
106600

alma03 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:          32010       19749        5255          97        7005       11584
Swap:          2047           0        2047

I think that at this point it has stopped where it is and is not going to rise. Please consider closing this bug, as it works as designed.

Since the issue is not reproducible by me or QE, we're closing the bug.