Bug 488663 - hald fails with stale or empty /var/cache/hald/fdi-cache
Summary: hald fails with stale or empty /var/cache/hald/fdi-cache
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: hal
Version: 10
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Richard Hughes
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-03-05 03:46 UTC by Chuck Ebbert
Modified: 2009-12-18 08:57 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-12-18 08:57:17 UTC
Type: ---
Embargoed:


Attachments
patch to fix the problem (813 bytes, patch)
2009-03-05 09:43 UTC, Richard Hughes

Description Chuck Ebbert 2009-03-05 03:46:14 UTC
Description of problem:
With a 0-byte /var/cache/hald/fdi-cache, hald tries to mmap 0 bytes and fails.
hald then fails to start and no input devices work.

With a stale file (left over from the previous shutdown), hald appears to start, but both connected displays are blank when X starts (only the backlights are on). This only happens on one machine.
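
For context: mmap() rejects a zero-length request with EINVAL, so an empty cache file cannot be mapped at all. A minimal sketch of a guard, assuming a hypothetical helper map_cache_file() (illustrative only, not hald's code and not the attached patch), could check the size reported by fstat() before mapping. The choice of ENODATA to flag the empty file is also an assumption of this sketch:

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from hald: map a cache file read-only.
 * On success returns the mapping and stores its size in *len_out.
 * On failure returns NULL with errno set. An empty file is reported as
 * ENODATA instead of being handed to mmap(), which would fail with EINVAL.
 */
void *map_cache_file(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        int saved_errno = errno;      /* close() may change errno */
        close(fd);
        errno = saved_errno;
        return NULL;
    }

    if (st.st_size == 0) {            /* empty cache: caller should rebuild it */
        close(fd);
        errno = ENODATA;
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    int saved_errno = errno;
    close(fd);                        /* the mapping stays valid after close */
    if (p == MAP_FAILED) {
        errno = saved_errno;
        return NULL;
    }

    *len_out = (size_t)st.st_size;
    return p;
}
```

A caller that sees NULL with errno == ENODATA could treat the cache as missing and regenerate it rather than exiting.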

Version-Release number of selected component (if applicable):
hal-0.5.12-14.20081027git.fc10.x86_64

How reproducible:
Stale file causes failure every time. Workaround was to edit the hald startup script to erase fdi-cache before starting the service. This fixes both the empty-file and stale-file failure cases.

A zero-byte fdi-cache file sometimes gets left behind when restarting the system after X starts with blank screens. This causes the other failure mode on the next boot.


Steps to Reproduce:
1. start system
2. use xrandr to turn off internal display and change resolution of external
3. reboot and get blank displays
4. turning off power after X fails to start can cause the 0-byte fdi-cache
  
Actual results:

Additional info:
strace of startup with 0-byte fdi-cache... looks like it also prints the wrong error message because the ioctl following the mmap also fails:

open("/var/cache/hald/fdi-cache", O_RDONLY) = 16
fstat(16, {st_dev=makedev(8, 8), st_ino=107580, st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=0,
mmap(NULL, 0, PROT_READ, MAP_SHARED, 16, 0) = -1 EINVAL (Invalid argument)
fstat(1, {st_dev=makedev(0, 15), st_ino=381, st_mode=S_IFCHR|0666, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=0, st_
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff5ae02580) = -1 ENOTTY (Inappropriate ioctl for device)
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1f52d00000
unlink("/var/run/haldaemon.pid")        = -1 EACCES (Permission denied)
write(1, "*** [DIE] mmap_cache.c:di_rules_init():79 : Couldn't mmap file '/var/cache/hald/fdi-cache', errno=25: Inappropriate ioctl
exit_group(1)                           = ?
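
The mismatch between the trace (mmap fails with EINVAL, 22) and the printed message (errno=25, ENOTTY) is consistent with errno being overwritten by the failed ioctl before the message was formatted. A minimal sketch of the usual defence is to copy errno immediately after the failing call. try_mmap_report() below is a hypothetical helper for illustration, not hald's code and not necessarily what the attached patch does:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper, not hald's code: try to map `len` bytes of `fd`
 * read-only. On failure, format a diagnostic into `msg` using the errno
 * value captured right after mmap(), and return that value.
 * Returns 0 on success (the mapping is released again; this is a demo).
 */
int try_mmap_report(int fd, size_t len, char *msg, size_t msglen)
{
    void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        /* Capture errno first: later library calls may overwrite it. */
        int saved_errno = errno;
        snprintf(msg, msglen, "Couldn't mmap file, errno=%d: %s",
                 saved_errno, strerror(saved_errno));
        return saved_errno;
    }
    munmap(p, len);
    return 0;
}
```

Called with len 0 on an open regular file, this reports the EINVAL from mmap() no matter what any later output routine does to errno.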

Comment 1 Richard Hughes 2009-03-05 09:39:21 UTC
(In reply to comment #0)
> Stale file causes failure every time. Workaround was to edit the hald startup
> script to erase fdi-cache before starting the service. This fixes both the
> empty-file and stale-file failure cases.

We don't want to do that -- it would significantly slow down the startup.

> A zero-byte fdi-cache file sometimes gets left behind when restarting the
> system after X starts with blank screens. This causes the other failure
> mode on the next boot.

I've never experienced this. The cache file should only be re-written when hal starts and finds new mtimes on the fdi directories, rather than at X startup. Can you nuke the 0 byte file, recreate it and then find out exactly when the file is written to zero bytes?

> Steps to Reproduce:
> 1. start system
> 2. use xrandr to turn off internal display and change resolution of external
> 3. reboot and get blank displays
> 4. turning off power after X fails to start can cause the 0-byte fdi-cache

I can't reproduce this -- what hardware are you using?

> Additional info:
> strace of startup with 0-byte fdi-cache... looks like it also prints the wrong
> error message because the ioctl following the mmap also fails:

I'll attach a patch that should fix things up in this case.

Comment 2 Richard Hughes 2009-03-05 09:43:49 UTC
Created attachment 334115
patch to fix the problem

This is what I've applied upstream. I'll add this to the rawhide rpm if you want me to.

Comment 3 Chuck Ebbert 2009-03-05 19:39:04 UTC
(In reply to comment #1)
> > Steps to Reproduce:
> > 1. start system
> > 2. use xrandr to turn off internal display and change resolution of external
> > 3. reboot and get blank displays
> > 4. turning off power after X fails to start can cause the 0-byte fdi-cache
> 
> I can't reproduce this -- what hardware are you using?

Acer Aspire 5735 notebook:
http://www.smolts.org/client/show/pub_b177e97d-6b2e-4604-b4ec-1d4c360beeaf

I can't really try to reproduce the 0-byte file ... it means booting up and killing the system with the power switch -- I really don't want to do that.

Comment 4 Josua Dietze 2009-06-20 18:38:38 UTC
This happened to me and several other users during installation of FC11.

See
http://www.fedoraforum.org/forum/showthread.php?t=223372

Comment 5 Mario Lopez 2009-07-06 07:31:49 UTC
In my case this exact behaviour was due to an unrecognised wireless card after a LiveCD install.

HAL was not starting: during boot it showed as FAILED, and when started by hand, the message referred to the "fdi cache".

gdb hald 
run --daemon=no --verbose=yes

RESULT
*** [DIE] mmap_cache.c:di_rules_init():79 : Couldn't mmap file '/var/cache/hald/fdi-cache', errno=22: Invalid argument

*****************

After reading this report, I deleted fdi-cache and HAL started, but when starting Xorg the system froze.

More information is available here:

http://mariotpc.blogspot.com/2009/07/fedora-11-keyboard-and-mouse-not.html

Comment 6 Bug Zapper 2009-11-18 11:16:59 UTC
This message is a reminder that Fedora 10 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 10.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '10'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 10's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 10 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 7 Bug Zapper 2009-12-18 08:57:17 UTC
Fedora 10 changed to end-of-life (EOL) status on 2009-12-17. Fedora 10 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

