Bug 452333

Summary: stale blkid cache causes unmountable usb keys.
Product: [Fedora] Fedora
Reporter: Dave Jones <davej>
Component: e2fsprogs
Assignee: Eric Sandeen <esandeen>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: low
Docs Contact:
Priority: low
Version: 9
CC: kzak, oliver, pfrields
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2008-11-02 21:02:20 UTC

Description Dave Jones 2008-06-21 00:51:52 UTC
I insert a usb key formatted as vfat with the label 'key'.
dmesg shows..

usb 1-2.4: New USB device found, idVendor=0781, idProduct=5150
usb 1-2.4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 1-2.4: Product: Cruzer Mini
usb 1-2.4: Manufacturer: SanDisk Corporation
usb 1-2.4: SerialNumber: SNDK8BF4240517406901
usb-storage: device scan complete
scsi 12:0:0:0: Direct-Access     SanDisk  Cruzer Mini      0.1  PQ: 0 ANSI: 2
sd 12:0:0:0: [sdl] 501759 512-byte hardware sectors (257 MB)
sd 12:0:0:0: [sdl] Write Protect is off
sd 12:0:0:0: [sdl] Mode Sense: 03 00 00 00
sd 12:0:0:0: [sdl] Assuming drive cache: write through
sd 12:0:0:0: [sdl] 501759 512-byte hardware sectors (257 MB)
sd 12:0:0:0: [sdl] Write Protect is off
sd 12:0:0:0: [sdl] Mode Sense: 03 00 00 00
sd 12:0:0:0: [sdl] Assuming drive cache: write through
 sdl: sdl1
sd 12:0:0:0: [sdl] Attached SCSI removable disk
sd 12:0:0:0: Attached scsi generic sg12 type 0


my fstab has..
LABEL=key		/media/key		vfat	noatime,user,exec,noauto	0 0

I run mount /media/key and I get..

mount: special device /dev/sdh1 does not exist

sdh is a slot on my multi-card-reader.  sdh isn't referred to in my fstab at all.

gvfs-mount -li shows that hal has the right knowledge about the key..

Drive(6): USB Drive
  ids:
   hal-udi: '/org/freedesktop/Hal/devices/storage_serial_SanDisk_Cruzer_Mini_SNDK8BF4240517406901_0_0'
   unix-device: '/dev/sdl'
  is_media_removable=1
  has_media=1
  is_media_check_automatic=1
  can_poll_for_media=1
  can_eject=0
  Volume(0): key
    ids:
     hal-udi: '/org/freedesktop/Hal/devices/volume_uuid_3B69_1AFD'
     unix-device: '/dev/sdl1'
     label: 'key'
     uuid: '3B69-1AFD'
    uuid=3B69-1AFD
    themed icons:  [drive-removable-media-usb]  [drive-removable-media]  [drive-removable]  [drive]
    can_mount=1
    can_eject=0

I run strace, and note that it's reading /etc/blkid/blkid.tab which contains..

<device DEVNO="0x0871" TIME="1213465324" LABEL="key" UUID="3B69-1AFD" TYPE="vfat" SEC_TYPE="msdos">/dev/sdh1</device>
<device DEVNO="0x08b1" TIME="1201024493" SEC_TYPE="msdos" LABEL="key" UUID="3B69-1AFD" TYPE="vfat">/dev/sdl1</device>

The first must be from a previous boot when for whatever reason, it got picked
up before the card reader did.
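
To make the duplicate-entry problem concrete, here is a minimal sketch in C of how one might flag cache entries whose device node is gone. It is not libblkid code: the helper names (tab_attr, tab_devname, tab_entry_is_stale) are made up for illustration, and it assumes each entry sits on a single line in the format shown above.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: copy the value of attribute `name` (e.g. "LABEL")
 * from one single-line blkid.tab entry into `out`.
 * Returns 1 on success, 0 if the attribute is missing or does not fit. */
static int tab_attr(const char *entry, const char *name,
                    char *out, size_t outlen)
{
    char key[64];
    const char *start, *end;
    size_t len;
    int n;

    assert(entry && name && out && outlen > 0);
    /* The leading space keeps "TYPE" from matching inside "SEC_TYPE". */
    n = snprintf(key, sizeof(key), " %s=\"", name);
    if (n < 0 || (size_t)n >= sizeof(key))
        return 0;
    start = strstr(entry, key);
    if (!start)
        return 0;
    start += strlen(key);
    end = strchr(start, '"');
    if (!end)
        return 0;
    len = (size_t)(end - start);
    if (len >= outlen)
        return 0;
    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}

/* Hypothetical helper: copy the device path (the text between the first
 * '>' and "</device>") into `out`. Returns 1 on success, 0 otherwise. */
static int tab_devname(const char *entry, char *out, size_t outlen)
{
    const char *start, *end;
    size_t len;

    assert(entry && out && outlen > 0);
    start = strchr(entry, '>');
    if (!start)
        return 0;
    start++;
    end = strstr(start, "</device>");
    if (!end)
        return 0;
    len = (size_t)(end - start);
    if (len == 0 || len >= outlen)
        return 0;
    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}

/* Returns 1 when the entry names a device node that does not exist
 * (stat fails with ENOENT), 0 in every other case. */
static int tab_entry_is_stale(const char *entry)
{
    char dev[256];
    struct stat st;

    if (!tab_devname(entry, dev, sizeof(dev)))
        return 0;
    return stat(dev, &st) < 0 && errno == ENOENT;
}
```

Run over the two entries above on this machine, tab_attr would report LABEL "key" for both, and tab_entry_is_stale would presumably flag only the /dev/sdh1 one, since mount complains that the device does not exist.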

I read about the -g option ('Perform a garbage collection pass on the blkid
cache.'), which sounds great! Only to find..

$ sudo blkid -g
Segmentation fault

Sadness.

Removing the entry by hand from the blkid.tab allows me to mount my usb key again.

So, should something be running blkid -g during boot?
If so, we should probably also make it not segfault.

This isn't the only duplicate in my blkid.tab. There are several more that are
clearly dupes, and not all of them are even on removable devices.

Comment 1 Dave Jones 2008-06-21 00:57:10 UTC
Here's the output of blkid -g from gdb  ..

(gdb) run -g
Starting program: /sbin/blkid -g

Program received signal SIGSEGV, Segmentation fault.
blkid_free_dev (dev=<value optimized out>) at ../../lib/blkid/list.h:90
90		prev->next = next;
Missing separate debuginfos, use: debuginfo-install glibc.x86_64 libselinux.x86_64 libsepol.x86_64 lvm2.x86_64
(gdb) bt
#0  blkid_free_dev (dev=<value optimized out>) at ../../lib/blkid/list.h:90
#1  0x00000030c7003ac7 in blkid_gc_cache (cache=<value optimized out>) at cache.c:170
#2  0x00000000004016ae in main (argc=2, argv=0x7fffeb2e9748) at blkid.c:216
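
The backtrace shows the crash in the list-unlink code (prev->next = next) reached from the garbage-collection loop. This thread does not say what the eventual patch changed, so the following is only an illustration of one common pitfall that can crash at a line like that: freeing a node while walking the list, then following pointers through the freed node. The sketch below uses its own made-up list type and function names (struct node, gc_stale), not libblkid's list.h, and saves the successor before unlinking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical circular doubly linked list with a sentinel head. */
struct node {
    struct node *next, *prev;
    int stale;              /* 1 if this entry should be collected */
};

static void list_init(struct node *head)
{
    head->next = head->prev = head;
}

/* Allocate a node and append it at the tail. Returns NULL on failure. */
static struct node *list_push(struct node *head, int stale)
{
    struct node *n = malloc(sizeof(*n));

    if (!n)
        return NULL;
    n->stale = stale;
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
    return n;
}

static void list_del(struct node *n)
{
    n->next->prev = n->prev;
    n->prev->next = n->next;
}

static int list_len(const struct node *head)
{
    const struct node *p;
    int len = 0;

    for (p = head->next; p != head; p = p->next)
        len++;
    return len;
}

/* Unlink and free every stale node. The successor is saved in `pnext`
 * before the current node is freed, so the loop never reads freed
 * memory. Returns the number of nodes removed. */
static int gc_stale(struct node *head)
{
    struct node *p, *pnext;
    int removed = 0;

    assert(head);
    for (p = head->next, pnext = p->next; p != head;
         p = pnext, pnext = p->next) {
        if (p->stale) {
            list_del(p);
            free(p);
            removed++;
        }
    }
    return removed;
}
```

A loop that instead advances with p = p->next after free(p) has undefined behaviour, and it may only crash on some cache contents, which would fit a segfault that shows up on a cache with stale entries in it.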


Comment 2 Eric Sandeen 2008-06-21 03:53:08 UTC
Patch for the segfault bit put into F9 and devel and sent upstream anyway...

Comment 3 Eric Sandeen 2008-06-21 04:02:13 UTC
At first glance it seems like blkid_find_dev_with_tag() and the things it calls
(the path we go down when trying to mount by label) are doing the right thing:
trying to validate the entry, removing it if the device isn't found, starting
over, etc.  I'll try recreating it and see what's going on.

Comment 4 Eric Sandeen 2008-06-21 04:59:48 UTC
Ok there's a problem in blkid_verify():

        if (((probe.fd = open(dev->bid_name, O_RDONLY)) < 0) ||
            (fstat(probe.fd, &st) < 0)) {
                if (probe.fd >= 0) close(probe.fd);
                if ((errno != EPERM) && (errno != EACCES) &&
                    (errno != ENOENT)) {
                        DBG(DEBUG_PROBE,
                            printf("blkid_verify: error %s (%d) while "
                                   "opening %s\n", strerror(errno), errno,
                                   dev->bid_name));
                        blkid_free_dev(dev);
                        return NULL;
                }
                /* We don't have read permission, just return cache data. */
                DBG(DEBUG_PROBE,
                    printf("returning unverified data for %s\n",
                           dev->bid_name));
                return dev;
        }

We find the bad device, and stat it - if the device doesn't exist, we get
ENOENT.  But we return the stale data for the nonexistent device anyway.  Eh?

http://git.kernel.org/?p=fs/ext2/e2fsprogs.git;a=commitdiff;h=8bcaaabb1a023af4852dbf0dba76249982c62e40

did this:

When a nonprivileged user uses the blkid command, we want to keep the
cached filesystem information, and opening a device file could result
in an EACCESS or ENOENT (if an intervening directory is mode 700).  We
were previously testing for EPERM, which was really the wrong error
code to be testing against.

but I'm not sure about the ENOENT part... this seems wrong.  We find a device in
the cache, stat it, get ENOENT and return it anyway?
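
To spell out the alternative being questioned here, below is a rough sketch of the decision with ENOENT moved to the "drop the entry" side. This is not the libblkid code and not necessarily what upstream ended up doing; the enum and function names (verify_action, classify_open_errno, check_cached_device) are invented for the example.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical outcome of trying to verify one cached device entry. */
enum verify_action {
    ENTRY_VERIFIED,         /* device could be opened, go on and probe it */
    ENTRY_KEEP_UNVERIFIED,  /* no permission: return the cached data */
    ENTRY_DROP              /* device is gone or broken: discard entry */
};

/* Map the errno from open() (0 on success) to an action.
 * Assumption in this sketch: ENOENT means the device node no longer
 * exists, so the cached entry is treated as stale. */
static enum verify_action classify_open_errno(int err)
{
    switch (err) {
    case 0:
        return ENTRY_VERIFIED;
    case EPERM:
    case EACCES:
        return ENTRY_KEEP_UNVERIFIED;
    default:                /* includes ENOENT */
        return ENTRY_DROP;
    }
}

/* Try to open the cached device path and classify the result. */
static enum verify_action check_cached_device(const char *path)
{
    int fd, err = 0;

    assert(path);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        err = errno;        /* save errno before any other call */
    else
        close(fd);
    return classify_open_errno(err);
}
```

The trade-off is the one the commit message above is worried about: if an unprivileged user can really get ENOENT for a device that does exist, a rule like this would throw away good cached data for that user.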

Comment 5 Eric Sandeen 2008-07-06 21:39:54 UTC
Ted has some recent commits claiming to fix the rest, see also debian bugs
487758 and 487783.  I'll recreate & give it a whirl.

Comment 6 Eric Sandeen 2008-07-31 02:40:07 UTC
I've pushed 1.41.0 to f9 testing, it should address this.

Comment 7 Eric Sandeen 2008-11-02 21:02:20 UTC
1.41.0 is in stable now; this should be resolved. (I guess I forgot the bug number in the push for the bodhi magic.)