Bug 510797 - Cannot use recursive find statement on GFS v1 volume
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: gfs-kmod
Version: 5.2
Platform: All Linux
Priority: low    Severity: high
Target Milestone: rc
Assigned To: Robert Peterson
QA Contact: Cluster QE
Reported: 2009-07-10 15:36 EDT by Michael Worsham
Modified: 2010-01-11 22:31 EST
CC: 5 users

Doc Type: Bug Fix
Last Closed: 2009-08-03 10:32:26 EDT

Attachments (Terms of Use)
CPU Usage during 'find' script run (36.16 KB, image/png), 2009-07-13 08:20 EDT, Michael Worsham
EMC SAN Device I/O during 'find' script run (29.47 KB, image/png), 2009-07-13 08:26 EDT, Michael Worsham

Description Michael Worsham 2009-07-10 15:36:03 EDT
Description of problem:
We have a 4-node GFS v1 cluster running on RHEL 5.2. The GFS infrastructure is hooked into an EMC SAN array. When I attempt to run the command 'find /web -user old.name -print -exec chown new.name {} \;', the GFS environment hangs and the server load skyrockets.

Reason for Need:
Under the /web/ directory are hundreds to thousands of files and directories that need their user ID changed to a new one, because a member of our development team recently left.

Version-Release number of selected component (if applicable):
findutils-4.2.27-4.1
gfs2-utils-0.1.44-1.el5_2.1
kmod-gfs-PAE-0.1.23-5.el5_2.2
gfs-utils-0.1.17-1.el5
kmod-gfs2-PAE-1.92-1.1.el5_2.2
kmod-gfs-PAE-0.1.23-5.el5_2.4

How reproducible:
Always 

Steps to Reproduce:
1. Run 'find /web myfile' <-- This works
2. Run 'find /web -user old.name' <-- This hangs
3. Run 'find /web -uid 5000' <-- This hangs
4. Run 'find /web -user old.name -print -exec chown new.name {} \;' <-- This hangs
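The split above matches how find evaluates its predicates: a name test can be answered from the directory entries alone, while -user and -uid force an lstat() on every entry, and on GFS each lstat() has to acquire a cluster lock. A small local demonstration of the two predicate classes (the temp directory and user are placeholders, not the reporter's environment; on a local filesystem both forms are fast, on GFS only the second is slow):

```shell
#!/bin/sh
# Contrast a name-only predicate with an ownership predicate.
set -e
tmp=$(mktemp -d)
touch "$tmp/myfile" "$tmp/other"

# Name test: answered from the getdents() directory listing alone,
# no per-file stat required.
by_name=$(find "$tmp" -name myfile)

# Ownership test: find must lstat() each entry to learn its uid;
# on GFS each of these lstat() calls takes a cluster-wide lock.
by_owner=$(find "$tmp" -name myfile -user "$(id -un)")

echo "$by_name"
echo "$by_owner"
```

Both commands print the same path here; the difference is only in the system calls (and, on GFS, the locks) needed to get there.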
 
Actual results:
CPU usage skyrockets, the node becomes unresponsive (though Ctrl+C still breaks out of the command), and the server/filesystem becomes sluggish as well.

As per Red Hat Support (SR# 1934308):
When GFS reads the file descriptor, it has to create a lock for each file it reads. In your case you have lots of small files; this creates locks that are waiting on locks, which gives the appearance that the command has locked up.
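Independent of the locking issue, the exec overhead of the reported command can be reduced: '-exec chown new.name {} \;' forks one chown per file, while GNU find's '-exec ... {} +' packs many paths into each invocation (so the report's command would become 'find /web -user old.name -exec chown new.name {} +'). This does not avoid the per-file stat and its lock, so it will not cure the hang, but it removes one fork/exec per file. A runnable sketch on a throwaway tree, using chmod in place of chown so it works without root:

```shell
#!/bin/sh
# Batch form of -exec: `{} +` passes many paths to one invocation of
# the helper instead of forking it once per file.
set -e
tmp=$(mktemp -d)
mkdir -p "$tmp/a/b"
touch "$tmp/a/f1" "$tmp/a/b/f2" "$tmp/a/skip.txt"
chmod 0600 "$tmp/a/f1" "$tmp/a/b/f2" "$tmp/a/skip.txt"

# One chmod call covers all matching files:
find "$tmp" -type f -name 'f*' -exec chmod 0640 {} +

# Count how many files actually changed mode:
changed=$(find "$tmp" -type f -perm 0640 | wc -l)
echo "files changed: $changed"
```

The same batching is available via 'find ... -print0 | xargs -0 chown new.name' on systems whose find lacks '{} +'.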
Comment 1 Steve Whitehouse 2009-07-10 16:20:48 EDT
It seems the common element is that the hanging commands require a stat call; in other words, a lot more I/O is generated.
Comment 2 Steve Whitehouse 2009-07-10 16:30:13 EDT
The question is which of the nodes are using which subsets of the /web directory? I presume that they are not all using the complete tree? If so then it will just be slow, but it should complete eventually.
Comment 3 Michael Worsham 2009-07-10 16:32:42 EDT
Just so you can get an idea of the output, we tried the following command:

'find /web/jacksonville.com -user old.name -print'

At the screen, this just sits there forever and never returns a response. In fact, you have to Ctrl+C to break out of it.

But when we use 'strace find /web/jacksonville.com -user old.name -print', we get the following output:

stat64(".", {st_mode=S_IFDIR|0775, st_size=3864, ...}) = 0
open("..", O_RDONLY|O_LARGEFILE|O_NOFOLLOW) = 5
fchdir(5)                               = 0
close(5)                                = 0
lstat64("Florida16.JPG", {st_mode=S_IFREG|0664, st_size=652334, ...}) = 0
lstat64("Gardening_50.jpg", {st_mode=S_IFREG|0664, st_size=516911, ...}) = 0
lstat64("eg0215.jpg", {st_mode=S_IFREG|0664, st_size=92837, ...}) = 0
lstat64("web_SingingRain.jpg", {st_mode=S_IFREG|0664, st_size=164874, ...}) = 0
lstat64("u16001", {st_mode=S_IFDIR|0775, st_size=3864, ...}) = 0
open("u16001", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 5
fstat64(5, {st_mode=S_IFDIR|0775, st_size=3864, ...}) = 0
fcntl64(5, F_SETFD, FD_CLOEXEC)         = 0
getdents64(5, /* 2 entries */, 4096)    = 48
getdents64(5, /* 0 entries */, 4096)    = 0
close(5)                                = 0
Comment 4 Michael Worsham 2009-07-10 16:36:03 EDT
Additional information via gfs_tool...

[root@app015 ~]# gfs_tool counters /web/jacksonville.com

                                  locks 97932
                             locks held 50523
                           freeze count 0
                          incore inodes 48920
                       metadata buffers 58
                        unlinked inodes 0
                              quota IDs 0
                     incore log buffers 0
                         log space used 0.20%
              meta header cache entries 0
                     glock dependencies 0
                 glocks on reclaim list 0
                              log wraps 5825
                   outstanding LM calls 0
                  outstanding BIO calls 0
                       fh2dentry misses 0
                       glocks reclaimed 0
                         glock nq calls 0
                         glock dq calls 0
                   glock prefetch calls 1493214107
                          lm_lock calls 1789092649
                        lm_unlock calls 1626987709
                           lm callbacks 0
                     address operations 86175728
                      dentry operations 0
                      export operations 0
                        file operations 1481983202
                       inode operations 0
                       super operations 936230891
                          vm operations 26270743
                        block I/O reads 0
                       block I/O writes 0

[root@app015 ~]# gfs_tool stat /web/jacksonville.com
  mh_magic = 0x01161970
  mh_type = 4
  mh_generation = 268576
  mh_format = 400
  mh_incarn = 0
  no_formal_ino = 25
  no_addr = 25
  di_mode = 0755
  di_uid = 0
  di_gid = 0
  di_nlink = 7
  di_size = 3864
  di_blocks = 1
  di_atime = 1247255070
  di_mtime = 1247257904
  di_ctime = 1247257904
  di_major = 0
  di_minor = 0
  di_rgrp = 0
  di_goal_rgrp = 0
  di_goal_dblk = 0
  di_goal_mblk = 0
  di_flags = 0x00000001
  di_payload_format = 1200
  di_type = 2
  di_height = 0
  di_incarn = 0
  di_pad = 0
  di_depth = 0
  di_entries = 8
  no_formal_ino = 0
  no_addr = 0
  di_eattr = 0
  di_reserved =
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 

Flags:
  jdata
Comment 5 Michael Worsham 2009-07-10 16:37:13 EDT
df output...

[root@app015 ~]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/cciss/c0d0p2      4061572    370340   3481584  10% /
/dev/mapper/vg00-opt   1015704    129676    833600  14% /opt
/dev/mapper/vg00-tmp   1015704     34116    929160   4% /tmp
/dev/mapper/vg00-usr   4062912   1417492   2435708  37% /usr
/dev/mapper/vg00-home
                       4955776   1020912   3678916  22% /home
/dev/mapper/vg00-var   1982288    289608   1590292  16% /var
/dev/mapper/vg00-varrtte
                       1015704    531408    431868  56% /var/rotate
/dev/mapper/vg00-varweb
                      20158332    454028  18680304   3% /var/web
/dev/cciss/c0d0p1       194442     23290    161113  13% /boot
tmpfs                  4089492         0   4089492   0% /dev/shm
/dev/mapper/vgsmsapp-phptmp
                        466768        36    466732   1% /web/phptmp
/dev/mapper/vgsmsapp-webjax
                      18381472   9549580   8831892  52% /web/jacksonville.com
/dev/mapper/vgsmsapp-webcjo
                      18381472   2040876  16340596  12% /web/cjonline.com
/dev/mapper/vgsmsapp-webcon
                      17823680    169852  17653828   1% /web/thecabin.net
Comment 6 Michael Worsham 2009-07-13 08:20:46 EDT
Created attachment 351467 [details]
CPU Usage during 'find' script run
Comment 7 Michael Worsham 2009-07-13 08:26:44 EDT
Created attachment 351469 [details]
EMC SAN Device I/O during 'find' script run
Comment 8 Michael Worsham 2009-07-13 08:31:13 EDT
Actually, I let the 'find' script run over the weekend. The script completed successfully and I verified the contents this morning.
 
In light of this, I am attaching a graph of the Device I/O from our SAN array, which shows that it took well over 15 minutes to actually 'think' before it started parsing the data.
 
Just for metric/stat purposes: I started the 'find' script at 16:00 (4:00pm EST) and it seems the script finally completed its run at around 17:30 (5:30pm EST).

This may no longer be classified as a 'bug', but I do think it is at least an issue when using the find utility on GFS v1 environments, because find gives no feedback about the disk I/O or the GFS locking that is occurring.
Comment 9 Robert Peterson 2009-08-03 10:32:26 EDT
Unfortunately, this is a known performance problem and the worst-case scenario for GFS performance: doing lstat on millions of small files. There's not much we can do to increase the performance of it at this time. As a circumvention, you might be able to unmount the file system from all nodes, then mount it on one node with "-o lockproto=lock_nolock", perform your find commands, then unmount it, eventually remounting everything normally again.
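Written out as a command sequence, the circumvention above might look like the following. This is a sketch only, not something to run outside a maintenance window: the device path /dev/mapper/vgsmsapp-webjax and the mount point are taken from the df output earlier in this report, and the filesystem really must be unmounted on every node first, since mounting a GFS volume with lock_nolock while another node still has it mounted would corrupt it.

```shell
# 1. On EVERY node in the cluster:
umount /web/jacksonville.com

# 2. On exactly ONE node, remount without cluster locking:
mount -t gfs -o lockproto=lock_nolock \
    /dev/mapper/vgsmsapp-webjax /web/jacksonville.com

# 3. Run the recursive chown locally; no cluster locks are taken:
find /web/jacksonville.com -user old.name -exec chown new.name {} +

# 4. Unmount, then remount normally on all nodes:
umount /web/jacksonville.com
mount /web/jacksonville.com
```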
