555125 – kernel triggers a lot of udev "change" events per second on some CD drives, probably due to faulty media change events from the drive

Bug 555125 - kernel triggers a lot of udev "change" events per second on some CD drives, probably due to faulty media change events from the drive

Summary: kernel triggers a lot of udev "change" events per second on some CD drives, p...

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	12
Hardware:	x86_64
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:	552932
Blocks:
TreeView+	depends on / blocked

Reported:	2010-01-13 18:27 UTC by Lennart Poettering
Modified:	2010-12-04 00:27 UTC (History)
CC List:	11 users (show)
Fixed In Version:
Clone Of:	552932
Environment:
Last Closed:	2010-12-04 00:27:34 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments	(Terms of Use)

Description Lennart Poettering 2010-01-13 18:27:52 UTC

This triggered a memleak issue in PulseAudio which improperly released all memory each time udev sent an event. That issue is now fixed, but the original cause of this issue still exists: there are gazillions of udev events generated for that driver which certainly should not happen.

+++ This bug was initially created as a clone of Bug #552932 +++

User-Agent:       Mozilla/5.0 (X11; U; Linux x86_64; fr; rv:1.9.1.6) Gecko/20091216 Fedora/3.5.6-1.fc12 Firefox/3.5.6

Pulseaudio memory leak after gnome start, leading to complete RAM saturation, swap and so on in a few hours.
This is independant of audio softwares running at the same time.


Reproducible: Always

Steps to Reproduce:
Just start gnome usually.
Actual Results:  
After a few hours, 2 Gb RAM consumed, checked with gnome-system-monitor.

Expected Results:  
pulseaudio staying < 10Mo occupation in RAM space.

List of files given in the archive :
uname : result of uname -a
alsa_info : result of alsa-info.sh --no-upload
lsmod.txt : result of lsmod
lspci.txt : result of lspci
rpm.txt : result of rpm -qa
pulse_log_console.txt : result of pulse -vvvvv after having deactivated pulse auto-restart
gnome_boot.txt : list of the services concerning pulseaudio started at gnome start.
valgrind_pulse.txt : result of G_SLICE=always-malloc G_DEBUG=gc-friendly valgrind -v --tool=memcheck --leak-check=full --num-callers=40 --log-file=valgrind.log pulseaudio -vvvv
valgrind_pulse_error.txt contains the output in the console of the previous command : pulseaudio failed to start so this isn't a good demonstration of the problem ... When starting pulseaudio without valgrind, I don't have this problem.

Check ubuntu forums : seems to be the same problem because of udev and pulseaudio incompatibility (maybe a file descriptor not closed if I remember correctly).

--- Additional comment from sebtemp1 on 2010-01-06 16:29:07 CET ---

Created an attachment (id=382010)
log files

--- Additional comment from lpoetter on 2010-01-06 16:54:03 CET ---

Uh, measuring actual memory consumption is not as simple as you might think:

http://pulseaudio.org/wiki/MemoryConsumption

And that ubuntu bug appears to be about an fd leak, not a memory leak, so I don't think that is related here.

--- Additional comment from sebtemp1 on 2010-01-07 11:33:33 CET ---

Hello,

Thanks for your time, but actually I know the limitation of checking the memory like I did.
I'm not speaking of a 64Mb pb here, I'm speaking of 3Go Ram memory consumption in approximately 10 hours, LEADING TO SWAP on hard drive, even without using any audio software. I don't care about top not being accurate, if there is 3 Go taken and swap, there is a problem, this is not something that could happen just because of preallocation mechanism. This is why I put this bug as urgent : this is not just a crash impacting only pulseaudio, after one day I have to reboot because my laptop is useless.
I already used pulseaudio before, and this is not the correct behavior : normally the memory print is nearly constant over time, even while using audio, < 10Mo according to gnome-system-monitor.

You can take a look at ubuntu bug 424655 :
https://bugs.launchpad.net/ubuntu/+source/pulseaudio/+bug/424655

This is EXACTLY the same behavior.
I was speaking of fd leak because of this (seen in ubuntu 424655) :

One potential explanation: the fd leak made udev completely stop recognizing new hardware on a lot of systems (which was fixed in -6.1). So the fix might uncover another leak by adding/probing sound hw which previously didn't appear in udev (since it simply stopped working because of exceeding the 1024 fd limit)

For those of you who can reproduce this pulseaudio memleak (I can't), it would be interesting to compare an udev dump with both versions. Please install 147~6, do udevadm info --export-db > /tmp/udev.log
attach /tmp/udev.log here. Then upgrade to 147~-6.1, reboot, do the command again, and attach /tmp/udev.log again.

You can have a look at https://bugzilla.redhat.com/show_bug.cgi?id=552932 for nearly the same pb, too.

Sébastien

--- Additional comment from lpoetter on 2010-01-07 19:12:27 CET ---

Those valgrind runs show that PA is leaking exactly no memory.

So, valgrind shows no leak, and top/ps are useful for identifying leaks, so why do you cleam that PA is doing anything it shouldnt do?

What does "pacmd stat" say? It should tell you how much memory PA itself actually thinks it uses.

--- Additional comment from lpoetter on 2010-01-07 19:21:20 CET ---

Apparently, gnome-system-monitor's "Memory" column reflects the number of resident pages in memory. That is mostly just a hint how a process is treated by the swapper. And not surprisingly PA is treated nicely, since it has a negative nice level...

--- Additional comment from lpoetter on 2010-01-07 21:00:22 CET ---

In g-s-m you can activate a "Writable Memory" column. That's way more interesting than "Memory". What does it say for you for PA when you think pa is bahaving badly?

--- Additional comment from sebtemp1 on 2010-01-08 13:21:28 CET ---

I cleam there is a problem mostly because of hard drive swap (swap memory visible vith g-s-m too, always 0, begining to fill after 15 hours because no more RAM, everything getting slow).

pacmd stat shows no problem it seems : 

Welcome to PulseAudio! Use "help" for usage information.
>>> Memory blocks currently allocated: 1, size: 63,9 Kio.
Memory blocks allocated during the whole lifetime: 42, size: 2,1 Mio.
Memory blocks imported from other processes: 0, size: 0 B.
Memory blocks exported to other processes: 0, size: 0 B.
Total sample cache size: 0 B.
Default sample spec: s16le 2ch 44100Hz
Default channel map: front-left,front-right
Default sink name: alsa_output.pci-0000_00_1b.0.analog-stereo
Default source name: alsa_input.pci-0000_00_1b.0.analog-stereo
Memory blocks of type POOL: 1 allocated/1 accumulated.
Memory blocks of type POOL_EXTERNAL: 0 allocated/0 accumulated.
Memory blocks of type APPENDED: 0 allocated/0 accumulated.
Memory blocks of type USER: 0 allocated/0 accumulated.
Memory blocks of type FIXED: 0 allocated/41 accumulated.
Memory blocks of type IMPORTED: 0 allocated/0 accumulated.

The Writable memory column is exactly equal to the Memory column, always increasing slowly, approximately 0.2Mo/second.

As indicated, the test with valgrind may not be representative of the problem, because pulseaudio is immediatly shutting down with valgrind. I joined the log of the problem in the archive.

--- Additional comment from lpoetter on 2010-01-08 14:59:48 CET ---

so neither valgrind nor pa itself seem to think there was a memory leak...

Could you please show a screenshot or so how this looks like after a while?

--- Additional comment from lpoetter on 2010-01-08 15:00:11 CET ---

(In reply to comment #8)
> Could you please show a screenshot or so how this looks like after a while?  

a screenshot of g-s-m that is

--- Additional comment from sebtemp1 on 2010-01-10 11:22:39 CET ---

Created an attachment (id=382775)
Screenshots of the problem

Here you are.
I let it run during the night, the screenshots show the state in the morning. There was already swap, so this is a good example : took 10 seconds to open firefox ...

For your previous answer, I don't think you can conclude that valgrind didn't show there was a memory leak. As I said, pulseaudio was running for 1 or 2 seconds under valgrind and failed. I don't know why. I thought first this was because of SELinux context (not the same entry point because this is launched by valgrind) but I don't have any alert, even while deactivating silent denials.

I joined the error in valgrind_pulse_error.txt in my first post, the main thing is :
E: module-console-kit.c: GetSessionsForUnixUser() call failed: org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
E: module.c: Failed to load  module "module-console-kit" (argument: ""): initialization failed.
E: main.c: Module load failed.
E: main.c: Échec lors de l'initialisation du démon

--- Additional comment from lpoetter on 2010-01-11 18:07:47 CET ---

Hmm, interesting. This indeed doesn't look right.

Do you think you would be able to run PA through valgrind's massif tool? This should give us a hint what leaks the memory here.

--- Additional comment from sebtemp1 on 2010-01-11 20:13:43 CET ---

I wouuld like to do it, but currently I'm stuck with this dbus problem.
I don't see the reason why pulseaudio is able to contact ConsoleKit via Dbus when launched from a terminal and not when launched by valgrind.

--- Additional comment from lpoetter on 2010-01-11 23:30:50 CET ---

(please do not attach compresses logs, makes it hard to read)

Looks like a CK incompat with valgrind. No clue. I'd probably just suggest commenting the loading of that module in /etc/pulse/default.pa while debugging this. It should strictly be necessary as long as you don't switch sessions too often ;-)

--- Additional comment from sebtemp1 on 2010-01-12 00:50:41 CET ---

Created an attachment (id=383124)
Valgrind massif result

This is the result of :
valgrind --trace-children=yes --tool=massif --alloc-fn=g_malloc --alloc-fn=g_realloc --alloc-fn=g_try_malloc --alloc-fn=g_malloc0 --alloc-fn=g_mem_chunk_alloc pulseaudio

It ran around 10 minutes.

--- Additional comment from sebtemp1 on 2010-01-12 00:53:03 CET ---

Ok, I hope the previous attachment is good : I'm not an expert in valgrind options !
Anyway, indeed no more errors after commenting the module, and I didn't compress the log this time ;)

--- Additional comment from lpoetter on 2010-01-12 16:59:30 CET ---

Hmm, this run seems to lack debug symbols. Could you install those?

Hmm, could you also run "ms_print" on the output this generates? I am already on F13 and lack the right setup to run that the same way on F12.

--- Additional comment from lpoetter on 2010-01-12 23:11:25 CET ---

This could be your fix:

http://git.0pointer.de/?p=pulseaudio.git;a=patch;h=15a3018bda8ee92393671ceaf272be418e95a81f

Fixes a memleak when devices we are not interested in come and go. Normally those events should be suppressed by udev anyway and even if they are not this should happen very seldomly, since most folks dont keep unplugging/replugging their mice all the time. ;-)

So, if this really your fix this would suggest that for some reason (virtual?) devices come and go all the time on your machine. Could you check that by running "udevadm monitor" for a while? Is there a lot going on?

--- Additional comment from sebtemp1 on 2010-01-13 00:26:24 CET ---

Created an attachment (id=383373)
ms_print output with pulseaudio debuginfo symbols

Here you are ;)
I installed the symbols only for pulseaudio rpm, not the other packages for the moment.

You seem to be right for the other test : udevadm returns the following lines :

UDEV  [1263251760.702821] change   /devices/pci0000:00/0000:00:1f.2/host5/target5:0:0/5:0:0:0/block/sr0 (block)
KERNEL[1263251760.707446] change   /devices/pci0000:00/0000:00:1f.2/host5/target5:0:0/5:0:0:0 (scsi)
UDEV  [1263251760.707464] change   /devices/pci0000:00/0000:00:1f.2/host5/target5:0:0/5:0:0:0 (scsi)
KERNEL[1263251760.707481] change   /devices/pci0000:00/0000:00:1f.2/host5/target5:0:0/5:0:0:0/block/sr0 (block)
UDEV  [1263251760.777527] change   /devices/pci0000:00/0000:00:1f.2/host5/target5:0:0/5:0:0:0/block/sr0 (block)
...

I did a capture of the output of udevadm during 1 minute, and I have 3216 lines like these in the file. And i did not deco/reco my mice 3216 times in 1 minute for sure ;)

--- Additional comment from lpoetter on 2010-01-13 00:39:07 CET ---

Ah, nice. There's probably something wrong with your CD drive. Could you redo the udev monitor call and add a "--property"? That should tell us what exactly changed on those devices when those "change" events are triggered.

Given that indeed there is quite some udev traffic going on I am quite sure that the aforementioned patch fixes that issue.

Now, as mentioned there is definitely a bug somewhere else, since that udev traffic should definitely *not* happen.

--- Additional comment from sebtemp1 on 2010-01-13 18:55:06 CET ---

Created an attachment (id=383529)
Result of udevadm monitor --property

Ok, seems to be CD drive as you said !
But I'm not sure of what happens in there ...

For pulseaudio, I'll try to find an SRPM and recompile it with the modif.

--- Additional comment from lpoetter on 2010-01-13 19:20:07 CET ---

OK, talked to Kay Sievers now. This seems to be a firmware problem in the drive. The drive sends out media changed events all the time. And it should not. There probably needs to be added some framework in the kernel that hose events can be suppressed.

I will now close this bug, since the memleak in PA is actually fixed. I'll then duplicate the bug and assign to the kernel, since there really needs to be done something about the firmware/kernel bug that triggered this memleak issue in PA.

Comment 1 Lennart Poettering 2010-01-13 18:30:27 UTC

sebtemp1, I think it would be good if you could post the output of hdparm -i /dev/sr0 here.

Comment 2 sebtemp1 2010-01-13 19:37:08 UTC

Oki, here is the result :

[root@Dune ~]# hdparm -i /dev/sr0

/dev/sr0:

 Model=Optiarc, FwRev=LX01, SerialNo=30656340
 Config={ Fixed Removeable DTR<=5Mbs DTR>10Mbs nonMagnetic }
 RawCHS=0/0/0, TrkSize=0, SectSize=0, ECCbytes=0
 BuffType=unknown, BuffSize=unknown, MaxMultSect=0
 (maybe): CurCHS=0/0/0, CurSects=0, LBA=yes, LBAsects=0
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4 
 DMA modes:  mdma0 mdma1 mdma2 
 UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5 
 AdvancedPM=no
 Drive conforms to: unknown:  ATA/ATAPI-3,4,5,6,7

 * signifies the current active mode

Comment 3 Bug Zapper 2010-11-04 01:04:13 UTC

This message is a reminder that Fedora 12 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 12.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '12'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 12's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 12 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 4 Bug Zapper 2010-12-04 00:27:34 UTC

Fedora 12 changed to end-of-life (EOL) status on 2010-12-02. Fedora 12 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Note You need to log in before you can comment on or make changes to this bug.