Bug 452333 - stale blkid cache causes unmountable usb keys.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: e2fsprogs
Version: 9
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Eric Sandeen
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-06-21 00:51 UTC by Dave Jones
Modified: 2015-01-04 22:30 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2008-11-02 21:02:20 UTC
Type: ---
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Debian BTS 487758 0 None None None Never
Debian BTS 487783 0 None None None Never

Description Dave Jones 2008-06-21 00:51:52 UTC
I insert a usb key formatted as vfat with the label 'key'.
dmesg shows..

usb 1-2.4: New USB device found, idVendor=0781, idProduct=5150
usb 1-2.4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 1-2.4: Product: Cruzer Mini
usb 1-2.4: Manufacturer: SanDisk Corporation
usb 1-2.4: SerialNumber: SNDK8BF4240517406901
usb-storage: device scan complete
scsi 12:0:0:0: Direct-Access     SanDisk  Cruzer Mini      0.1  PQ: 0 ANSI: 2
sd 12:0:0:0: [sdl] 501759 512-byte hardware sectors (257 MB)
sd 12:0:0:0: [sdl] Write Protect is off
sd 12:0:0:0: [sdl] Mode Sense: 03 00 00 00
sd 12:0:0:0: [sdl] Assuming drive cache: write through
sd 12:0:0:0: [sdl] 501759 512-byte hardware sectors (257 MB)
sd 12:0:0:0: [sdl] Write Protect is off
sd 12:0:0:0: [sdl] Mode Sense: 03 00 00 00
sd 12:0:0:0: [sdl] Assuming drive cache: write through
 sdl: sdl1
sd 12:0:0:0: [sdl] Attached SCSI removable disk
sd 12:0:0:0: Attached scsi generic sg12 type 0


my fstab has..
LABEL=key		/media/key		vfat	noatime,user,exec,noauto	0 0

I run mount /media/key and I get..

mount: special device /dev/sdh1 does not exist

sdh is a slot on my multi-card-reader.  sdh isn't referred to in my fstab at all.

gvfs-mount -li shows that hal has the right knowledge about the key..

Drive(6): USB Drive
  ids:
   hal-udi: '/org/freedesktop/Hal/devices/storage_serial_SanDisk_Cruzer_Mini_SNDK8BF4240517406901_0_0'
   unix-device: '/dev/sdl'
  is_media_removable=1
  has_media=1
  is_media_check_automatic=1
  can_poll_for_media=1
  can_eject=0
  Volume(0): key
    ids:
     hal-udi: '/org/freedesktop/Hal/devices/volume_uuid_3B69_1AFD'
     unix-device: '/dev/sdl1'
     label: 'key'
     uuid: '3B69-1AFD'
    uuid=3B69-1AFD
    themed icons:  [drive-removable-media-usb]  [drive-removable-media]  [drive-removable]  [drive]
    can_mount=1
    can_eject=0

I run mount under strace and note that it reads /etc/blkid/blkid.tab, which contains..

<device DEVNO="0x0871" TIME="1213465324" LABEL="key" UUID="3B69-1AFD" TYPE="vfat" SEC_TYPE="msdos">/dev/sdh1</device>
<device DEVNO="0x08b1" TIME="1201024493" SEC_TYPE="msdos" LABEL="key" UUID="3B69-1AFD" TYPE="vfat">/dev/sdl1</device>

The first entry must be from a previous boot when, for whatever reason, the key
was detected before the card reader.
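
To make the failure mode concrete, here is a minimal sketch of the lookup I would have expected: among cache entries sharing a label, skip any whose device node is gone and prefer the most recently verified survivor. This is not libblkid code. The struct and both function names are made up for illustration, and a real check would also compare the node's device number against DEVNO rather than just testing that the path exists.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical, simplified view of one <device> line from blkid.tab. */
struct cache_entry {
    const char *devname;  /* e.g. "/dev/sdh1" */
    const char *label;    /* e.g. "key" */
    long time;            /* TIME attribute of the cache entry */
};

/* Return 1 if the device node is currently present, 0 otherwise. */
static int device_node_exists(const char *devname)
{
    struct stat st;
    return stat(devname, &st) == 0;
}

/*
 * Among entries carrying the given label, return the one with the newest
 * TIME whose device node still exists, or NULL if there is none.
 */
static const struct cache_entry *
pick_live_entry(const struct cache_entry *entries, size_t n, const char *label)
{
    const struct cache_entry *best = NULL;
    size_t i;

    assert(label != NULL);
    for (i = 0; i < n; i++) {
        const struct cache_entry *e = &entries[i];

        if (strcmp(e->label, label) != 0)
            continue;
        if (!device_node_exists(e->devname))
            continue;   /* stale: the cache remembers a node that is gone */
        if (best == NULL || e->time > best->time)
            best = e;
    }
    return best;
}
```

With the two entries above, the /dev/sdh1 line has the newer TIME, so picking by timestamp alone would still return the missing device; the existence check is what lets /dev/sdl1 win.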

I read about blkid's -g option ('Perform a garbage collection pass on the blkid
cache.'), which sounds great! Only to find..

$ sudo blkid -g
Segmentation fault

Sadness.

Removing the entry by hand from the blkid.tab allows me to mount my usb key again.

So, should something be running blkid -g during boot?
If so, we should probably also make it not segfault.

This isn't the only duplicate in my blkid.tab. There are a bunch of entries
that are clearly dupes, and not all of them are even for removable devices.

Comment 1 Dave Jones 2008-06-21 00:57:10 UTC
Here's the output of blkid -g from gdb  ..

(gdb) run -g
Starting program: /sbin/blkid -g

Program received signal SIGSEGV, Segmentation fault.
blkid_free_dev (dev=<value optimized out>) at ../../lib/blkid/list.h:90
90		prev->next = next;
Missing separate debuginfos, use: debuginfo-install glibc.x86_64
libselinux.x86_64 libsepol.x86_64 lvm2.x86_64
(gdb) bt
#0  blkid_free_dev (dev=<value optimized out>) at ../../lib/blkid/list.h:90
#1  0x00000030c7003ac7 in blkid_gc_cache (cache=<value optimized out>) at cache.c:170
#2  0x00000000004016ae in main (argc=2, argv=0x7fffeb2e9748) at blkid.c:216
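
The trace alone doesn't pinpoint the root cause; it only shows the crash in the list unlink reached from the garbage-collection loop. One classic way to get there is to free the current node while walking the list and then follow its pointers. The sketch below shows the usual defensive pattern under that assumption: save the next pointer before freeing. The list type and function names are stand-ins written for this example, not the ones in lib/blkid/list.h or cache.c.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for a cached device linked into a circular list. */
struct node {
    struct node *next, *prev;
    int stale;              /* 1 if the entry should be collected */
};

/* Initialise a sentinel head that points at itself (empty list). */
static void list_init(struct node *head)
{
    head->next = head->prev = head;
}

/* Append n just before the sentinel, i.e. at the tail. */
static void list_add_tail(struct node *n, struct node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink n from whatever list it is on. */
static void list_del(struct node *n)
{
    n->next->prev = n->prev;
    n->prev->next = n->next;
}

/* Free every stale node and return how many were removed. */
static int gc_stale(struct node *head)
{
    struct node *p, *next;
    int removed = 0;

    assert(head != NULL);
    for (p = head->next; p != head; p = next) {
        next = p->next;     /* saved first: p may be freed below */
        if (p->stale) {
            list_del(p);
            free(p);
            removed++;
        }
    }
    return removed;
}
```

Writing the loop as `p = p->next` instead would read freed memory after every removal, which may appear to work for a while and then crash in exactly this kind of unlink.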


Comment 2 Eric Sandeen 2008-06-21 03:53:08 UTC
Patch for the segfault bit put into F9 and devel and sent upstream anyway...

Comment 3 Eric Sandeen 2008-06-21 04:02:13 UTC
At first glance it seems like blkid_find_dev_with_tag() and the things it calls
(the path we go down when trying to mount by label) are doing the right thing:
validating the entry, removing it if the device isn't found, starting over,
etc.  I'll try recreating it and see what's going on.

Comment 4 Eric Sandeen 2008-06-21 04:59:48 UTC
Ok there's a problem in blkid_verify():

        if (((probe.fd = open(dev->bid_name, O_RDONLY)) < 0) ||
            (fstat(probe.fd, &st) < 0)) {
                if (probe.fd >= 0) close(probe.fd);
                if ((errno != EPERM) && (errno != EACCES) &&
                    (errno != ENOENT)) { 
                        DBG(DEBUG_PROBE, 
                            printf("blkid_verify: error %s (%d) while "
                                   "opening %s\n", strerror(errno), errno, 
                                   dev->bid_name));
                        blkid_free_dev(dev);
                        return NULL;
                }
                /* We don't have read permission, just return cache data. */
                DBG(DEBUG_PROBE,
                    printf("returning unverified data for %s\n",
                           dev->bid_name));
                return dev;

We find the bad device and try to open it - if the device node doesn't exist,
open() fails with ENOENT.  But we return the stale data for the nonexistent
device anyway.  Eh?

http://git.kernel.org/?p=fs/ext2/e2fsprogs.git;a=commitdiff;h=8bcaaabb1a023af4852dbf0dba76249982c62e40

did this:

When a nonprivileged user uses the blkid command, we want to keep the
cached filesystem information, and opening a device file could result
in an EACCESS or ENOENT (if an intervening directory is mode 700).  We
were previously testing for EPERM, which was really the wrong error
code to be testing against.

but I'm not sure about the ENOENT part ...this seems wrong.  We find a device in
the cache, stat it, get ENOENT and return it anyway?
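
To spell out the distinction being questioned here, a minimal sketch of the policy that would seem less surprising: keep unverified cache data only when the failure is about permissions, and treat a missing node as a stale entry. This only illustrates the concern above and is not the upstream fix. The enum and both function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

enum cache_verdict {
    ENTRY_VERIFIABLE,       /* device opened; caller can re-probe it */
    ENTRY_KEEP_UNVERIFIED,  /* no permission to look; trust the cache */
    ENTRY_STALE             /* device gone or unusable; drop the entry */
};

/* Map the errno from a failed open() to what to do with the cache entry. */
static enum cache_verdict classify_open_errno(int err)
{
    switch (err) {
    case EPERM:
    case EACCES:
        /* Unprivileged caller: cannot verify, so keep the cached data. */
        return ENTRY_KEEP_UNVERIFIED;
    case ENOENT:
        /* The node does not exist: the cached entry is stale. */
        return ENTRY_STALE;
    default:
        /* Any other error: do not trust the entry either. */
        return ENTRY_STALE;
    }
}

/* Try to open the cached device name and report the verdict. */
static enum cache_verdict check_cached_device(const char *devname)
{
    int fd;

    assert(devname != NULL);
    fd = open(devname, O_RDONLY);
    if (fd < 0)
        return classify_open_errno(errno);
    close(fd);
    return ENTRY_VERIFIABLE;
}
```

On the box from the description, check_cached_device("/dev/sdh1") would fail with ENOENT and come back as ENTRY_STALE, so the lookup could move on to the /dev/sdl1 entry instead of handing mount a device that isn't there.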

Comment 5 Eric Sandeen 2008-07-06 21:39:54 UTC
Ted has some recent commits claiming to fix the rest, see also debian bugs
487758 and 487783.  I'll recreate & give it a whirl.

Comment 6 Eric Sandeen 2008-07-31 02:40:07 UTC
I've pushed 1.41.0 to f9 testing, it should address this.

Comment 7 Eric Sandeen 2008-11-02 21:02:20 UTC
1.41.0 is in stable now; this should be resolved (guess I forgot the bug nr. in the push for the bodhi-magic)

