Bug 471279 - gdbm "Can't be writer" on nfs
Summary: gdbm "Can't be writer" on nfs
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Fedora
Classification: Fedora
Component: pulseaudio
Version: 10
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Lennart Poettering
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-11-12 20:00 UTC by Thomas J. Baker
Modified: 2008-12-17 19:33 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-12-17 19:33:29 UTC
Type: ---
Embargoed:


Attachments
strace -s 9999999 -o /tmp/foo pulseaudio (266.49 KB, text/plain)
2008-12-11 18:26 UTC, Enrico Scholz

Description Thomas J. Baker 2008-11-12 20:00:45 UTC
Most often when logging in, pulseaudio fails to start. Looking at the logs, the errors are these:


Nov 12 14:56:42 raptor pulseaudio[19800]: module-stream-restore.c: Failed to open volume database '/net/home/rcc/tjb/.pulse/5efeea4d6741dae157345d44490ad10b:stream-volumes.x86_64-redhat-linux-gnu.gdbm': Can't be writer
Nov 12 14:56:42 raptor pulseaudio[19800]: module.c: Failed to load  module "module-stream-restore" (argument: ""): initialization failed.
Nov 12 14:56:42 raptor pulseaudio[19800]: main.c: Module load failed.
Nov 12 14:56:42 raptor pulseaudio[19800]: main.c: Failed to initialize daemon.
Nov 12 14:56:42 raptor pulseaudio[19798]: main.c: Daemon startup failed.

I am using an nfs mounted home directory and have selinux in enforcing mode but putting it into permissive mode doesn't seem to help. If I blow away the .pulse directory, it starts up fine.

Comment 1 Bug Zapper 2008-11-26 05:15:59 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 10 development cycle.
Changing version to '10'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 2 Lennart Poettering 2008-12-08 18:32:35 UTC
What kind of NFS directory is this? Could you please run PA in "strace" and paste the fragment before the "Can't be writer" message is generated? This should tell us what exactly is the kernel error code that causes gdbm to fail.

Maybe the file name "/net/home/rcc/tjb/.pulse/5efeea4d6741dae157345d44490ad10b:stream-volumes.x86_64-redhat-linux-gnu.gdbm" is incompatible with your NFS server?

Comment 3 Enrico Scholz 2008-12-11 18:26:58 UTC
Created attachment 326645
strace -s 9999999 -o /tmp/foo pulseaudio

ditto here;

--
W: main.c: High-priority scheduling enabled in configuration but not allowed by policy.
W: core-util.c: setpriority(): Keine Berechtigung
E: module-stream-restore.c: Failed to open volume database '/home/ensc/.pulse/8fe6b107f2977228597415004786e0e3:stream-volumes.x86_64-redhat-linux-gnu.gdbm': Can't be writer
E: module.c: Failed to load  module "module-stream-restore" (argument: ""): initialization failed.
E: main.c: Module load failed.
E: main.c: Failed to initialize daemon.
--

there is no other 'pulseaudio' daemon running, and it works again
after removing ~/.pulse:

$ rm -rf ~/.pulse
$ pulseaudio


After some time (I see it e.g. after a suspend-resume cycle), the
'pulseaudio' daemon dies and the "Can't be writer" error happens
again.


I see a '8fe6b107f2977228597415004786e0e3:runtime ->
/tmp/pulse-2vPHqfvCCay2' link in ~/.pulse.  Why not place the database
there?  gdbm on NFS just cries for problems...

Comment 4 Lennart Poettering 2008-12-13 11:08:12 UTC
(In reply to comment #3)

> I see a '8fe6b107f2977228597415004786e0e3:runtime ->
> /tmp/pulse-2vPHqfvCCay2' link in ~/.pulse.  Why not place the database
> there?  gdbm on NFS just cries for problems...

The runtime dir may be deleted at any time. However, the stream database is supposed to be persistent. Both the stream db and the runtime dir are per-homedir/machine. The runtime dir needs to allow unix sockets. Hence we put the runtime dir in /tmp and the stream database in the homedir. We include a host identifier and an arch identifier in the stream db name to make sure that the db is actually per-homedir/machine. Normally that should mean that only a single instance of PA ever accesses the db at any given time. Accesses like that don't need to be synchronized or anything, so there is not much chance for file corruption.

gdbm usually locks the db files completely during open. Is it possible that this is the call that fails for you because you have a misconfigured lockd? If that's the case we should be able to work around that without any ill effects by simply passing GDBM_NOLOCK to gdbm_open().

Could you please check with strace whether the failure happens when gdbm tries to lock? And then if GDBM_NOLOCK fixes the problems for you?
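
For illustration, the naming scheme described above (host identifier plus arch identifier in the db file name) can be sketched as follows. This is a minimal sketch, not PulseAudio's actual code: the helper name stream_db_path() and its parameters are hypothetical, and the file name layout is only inferred from the paths quoted in this bug.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of PulseAudio): builds
 * "<dir>/<host_id>:stream-volumes.<arch>.gdbm", the layout seen in the
 * log messages of this report.
 * Returns 0 on success, -1 if an identifier is empty or buf is too small. */
int stream_db_path(char *buf, size_t len, const char *dir,
                   const char *host_id, const char *arch)
{
    assert(buf != NULL && dir != NULL && host_id != NULL && arch != NULL);

    /* Without both identifiers the name would no longer be per-machine. */
    if (strlen(host_id) == 0 || strlen(arch) == 0)
        return -1;

    int n = snprintf(buf, len, "%s/%s:stream-volumes.%s.gdbm",
                     dir, host_id, arch);
    if (n < 0 || (size_t)n >= len)
        return -1; /* encoding error or truncated output */
    return 0;
}
```

Two machines sharing one NFS home directory would pass different host_id values and therefore never open the same file, which is why the comment above expects only one PA instance per db.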

Comment 5 Enrico Scholz 2008-12-14 10:21:58 UTC
I already attached the strace log. There,

| open("/home/ensc/.pulse/8fe6b107f2977228597415004786e0e3:stream-volumes.x86_64-redhat-linux-gnu.gdbm", O_RDWR|O_CREAT, 0600) = 25
| fstat(25, {st_mode=S_IFREG|0600, st_size=394972, ...}) = 0
| flock(25, LOCK_EX|LOCK_NB)              = -1 EAGAIN (Resource temporarily unavailable)

happens.  How can I configure pulseaudio to open the db with GDBM_NOLOCK?


'lockd' seems to work usually but after some crash situations (program
dies after resume, nfs client tries to reconnect but fails to send the
LOCK_UN as it counts as a new connection) it stays locked forever.

E.g. pulseaudio died after resume (??) with

| E: module-alsa-sink.c: Error opening PCM device front:0: Das Argument ist ungültig
| E: sink-input.c: Assertion 'i->thread_info.rewrite_nbytes == 0' failed at pulsecore/sink-input.c:1150, function pa_sink_input_request_rewind(). Aborting.
| Abgebrochen

(The first German phrase translates to 'invalid argument', the last line to 'aborted'.)
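
The flock() failure in the strace excerpt above can be reproduced on a local filesystem with a small sketch. The helper name open_locked() is hypothetical; it only mirrors the open() and flock(LOCK_EX|LOCK_NB) sequence from the trace and is not gdbm's or PulseAudio's code.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: open path and take an exclusive, non-blocking
 * flock() on it, the same two calls seen in the strace.
 * Returns the open fd on success (the caller closes it, which also
 * drops the lock), or -1 with errno set. errno is EWOULDBLOCK (the same
 * value as EAGAIN on Linux) when the lock is already held elsewhere. */
int open_locked(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    if (flock(fd, LOCK_EX | LOCK_NB) < 0) {
        int saved = errno; /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

Calling open_locked() twice on the same path without closing the first descriptor makes the second call fail with EWOULDBLOCK, because flock() locks belong to the open file description. On a local filesystem the lock disappears when the holder closes the file or exits; the situation reported here is that after a crash the lock stayed behind on the NFS side, so every later attempt failed the same way.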

Comment 6 Lennart Poettering 2008-12-17 19:33:29 UTC
lockf is generally not compatible with NFS (only the newer fcntl() based POSIX locking is NFS compatible). I have now changed upstream to not use lockf anymore and pass GDBM_NOLOCK instead. I will backport this shortly to rawhide and F10.

The other issue looks like an ALSA driver resume issue that PA doesn't know how to handle right. There's already a bug report about the PA issue.
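
For comparison, the fcntl()-based POSIX locking mentioned above looks roughly like the sketch below. Note that the upstream change described in this comment is to pass GDBM_NOLOCK, i.e. to skip locking entirely; this sketch only illustrates the alternative mechanism and is not what PulseAudio does. The helper names try_posix_write_lock() and posix_unlock() are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: try to take a POSIX (fcntl) write lock covering
 * the whole file, without blocking.
 * Returns 0 on success, or -1 with errno set to EACCES or EAGAIN when
 * another process holds a conflicting lock. */
int try_posix_write_lock(int fd)
{
    assert(fd >= 0);

    struct flock fl = {0};
    fl.l_type = F_WRLCK;    /* exclusive (write) lock */
    fl.l_whence = SEEK_SET; /* l_start is relative to the file start */
    fl.l_start = 0;
    fl.l_len = 0;           /* 0 means "through end of file" */
    return fcntl(fd, F_SETLK, &fl);
}

/* Hypothetical helper: release the lock taken above. */
int posix_unlock(int fd)
{
    assert(fd >= 0);

    struct flock fl = {0};
    fl.l_type = F_UNLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;
    return fcntl(fd, F_SETLK, &fl);
}
```

Unlike flock() locks, these locks are owned by the process rather than by the open file description, and they are dropped as soon as the process closes any descriptor referring to the file, which is a common source of surprises when a library opens the same file behind the caller's back.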

