Bug 615907 - [lvm2] failure in lvm: rename failed: File exists
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.1
Hardware: All
OS: Linux
Priority: high
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Peter Rajnoha
QA Contact: Corey Marthaler
Depends On:
Blocks:
Reported: 2010-07-19 05:05 EDT by Haim
Modified: 2014-01-12 19:46 EST
CC List: 17 users

Fixed In Version: lvm2-2.02.82-1.el6
Doc Type: Bug Fix
Last Closed: 2011-05-19 10:26:19 EDT
Description Haim 2010-07-19 05:05:33 EDT
Description of problem:

While creating a VM pool (lvcreate commands that create VMs), I see lots of errors in the vdsm log saying an LVM rename failed because the file already exists.
Attached is some of the log:

'/usr/bin/sudo -n /sbin/lvcreate --config " devices { preferred_names = [\
\"^/dev/mapper/\\"] write_cache_state=0 filter = [ \\"a%/dev/mapper/14f504e46494c4500354c376e50522d696430312d664e3962|/dev/mapper/14f504e46494c450074726662556e
2d424132492d79454a42|/dev/mapper/14f504e46494c45007632334e45362d35536d302d50634569%\\", \\"r%.*%\\" ] }  backup {  retain_min = 50  retain_days = 0 } " --autob
ackup n --contiguous n --size 1024M --name 49d88b3f-7900-4fea-8661-06af7447f4ae aaac4a9b-ae1f-4e4b-9c71-d25eb10bc83f' (cwd None)


'/usr/bin/sudo -n /sbin/vgs --config " devices { preferred_names = [\\"^/dev/mapper/\\"] write_cache_state=0 filter = [ \\"a%/dev/mapper/14f504e46494c4500354c376e50522d696430312d664e3962|/dev/mapper/14f504e46494c450074726662556e2d424132492d79454a42|/dev/mapper/14f504e46494c45007632334e45362d35536d302d50634569%\\", \\"r%.*%\\" ] }  backup {  retain_min = 50  retain_days = 0 } " --noheadings -o pv_name aaac4a9b-ae1f-4e4b-9c71-d25eb10bc83f' (cwd None)


lvm dump of system state before and after the error can be found under: 

/net/orion.tlv.redhat.com/export/public/qe-integration/logs

This issue is reproducible in my environment when I create new VMs.

Version-Release number of selected component (if applicable):

lvm2-2.02.69-2.el6.x86_64
vdsm-4.9-10.el6.x86_64

Steps to Reproduce:

Create several VMs (LVs).
Comment 1 Haim 2010-07-19 05:10:20 EDT
and the error itself: 

'/usr/bin/sudo -n /sbin/lvs --config " devices { preferred_names = [\\"^/d
ev/mapper/\\"] write_cache_state=0 filter = [ \\"a%/dev/mapper/14f504e46494c4500354c376e50522d696430312d664e3962|/dev/mapper/14f504e46494c450074726662556e2d424
132492d79454a42|/dev/mapper/14f504e46494c45007632334e45362d35536d302d50634569%\\", \\"r%.*%\\" ] }  backup {  retain_min = 50  retain_days = 0 } " --noheadings
 -o tags aaac4a9b-ae1f-4e4b-9c71-d25eb10bc83f/8022d143-247d-4205-9db9-9c72fa69e2de' (cwd None)
8409080c-1a2b-470a-9b95-d0760ee9f601::DEBUG::2010-07-19 11:20:16,739::misc::111::irs::SUCCESS: <err> = '  /etc/lvm/archive/.lvm_silver-vdse.qa.lab.tlv.redhat.com_6006_975457195: rename to /etc/lvm/archive/aaac4a9b-ae1f-4e4b-9c71-d25eb10bc83f_04386.vg failed: File exists\n  /etc/lvm/archive/.lvm_silver-vdse.qa.lab.tlv.redhat.com_6006_1396122488: rename to /etc/lvm/archive/aaac4a9b-ae1f-4e4b-9c71-d25eb10bc83f_04388.vg failed: File exists\n'; <rc> = 0
Comment 3 RHEL Product and Program Management 2010-07-19 05:37:46 EDT
This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release. It has
been denied for the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **
Comment 4 Milan Broz 2010-07-19 07:11:19 EDT
I found a lot of empty files in the archive and backup dirs; this seems to be a major problem in LVM archive handling and should be addressed in 6.0.

(Not sure if it is regression yet.)
Comment 5 RHEL Product and Program Management 2010-07-19 07:17:45 EDT
This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release. It has
been denied for the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **
Comment 7 Alasdair Kergon 2010-07-28 17:16:27 EDT
1) The code should not be leaving behind empty files.
2) The protocol for selecting the "next" available filename has races.
3) The occasional extra duplicate backup file, or out-of-sequence file is not important.  But missing them out is.
4) Changing the filenames is within scope of a fix: if so, amend the namespace so there's no conflict with any existing filenames.  (One possibility may be to use the seqno in the name - so if the file exists you may assume it contains the same contents;  and renaming over the top of it wouldn't matter as the contents would still be the same.)
Comment 10 Peter Rajnoha 2010-08-30 10:43:15 EDT
Hopefully, simple timestamps will help here (patch posted to lvm-devel for review). This should prevent the renaming problem. There's also a patch for additional "--config" line changes related to backup/archive handling: previously, these settings were not applied and the default (or the value written in lvm.conf) was always used instead.

As for the blank temporary files - I'd need more log information here. Apparently, the export of the VG metadata failed for some reason. Unfortunately, we don't have logs for that (I couldn't reproduce it with the configuration mentioned in this bug report).
Comment 11 Peter Rajnoha 2010-08-31 10:02:05 EDT
As for the race that occurs - this is because of the "--autobackup n" use, I'm quite sure (looking at the logs). If we use:

  1)   lvcreate --autobackup n ...
       ...
  X)   lvcreate --autobackup n ...
  X+1) vgs/pvs/lvs

The X-th lvcreate won't create any backup/archive files, as requested by the "--autobackup n" switch. Running vgs just after that checks the backup state and decides that there's a newer version of metadata to backup/archive, so it runs all the backup machinery (creating two archive files at once - the first is copied from the current backup dir and the other is the current metadata state; then a new backup file is written as well).

The problem is that vgs does not hold an rw VG lock, so we're open to races when deciding on the name of the archive file.

(A similar situation happens when we remove the backup file in /etc/lvm/backup and then run vgs.)
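The unlocked name-selection race can be sketched like this (a hypothetical Python simplification of the sequential naming scheme, not the actual lvm2 C code):

```python
import os
import tempfile

def next_archive_name(archive_dir, vg_name):
    """Racy: scan for the highest existing sequence number and add one.
    Two processes running without a VG lock can both compute the same
    'next' name, and the loser's rename(2) fails with 'File exists'."""
    seqs = [
        int(f[len(vg_name) + 1:-3])
        for f in os.listdir(archive_dir)
        if f.startswith(vg_name + "_") and f.endswith(".vg")
    ]
    n = max(seqs, default=-1) + 1
    return os.path.join(archive_dir, "%s_%05d.vg" % (vg_name, n))

archive_dir = tempfile.mkdtemp()
# Both "processes" scan before either renames its temp file into place:
a = next_archive_name(archive_dir, "myvg")
b = next_archive_name(archive_dir, "myvg")
assert a == b  # same target name -> the second rename collides
```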
Comment 12 Peter Rajnoha 2010-09-09 09:33:22 EDT
Finally, we decided to use random suffix instead of timestamps for archive file names...

Patches checked in upstream (2.02.74).
Comment 13 Peter Rajnoha 2010-09-09 10:14:41 EDT
(..those empty files are probably the result of insufficient free space to create new archives. We can't say for certain because we don't have logs from the time period when this happened, but it's the most probable reason - it's reproducible when the disk is full..)
Comment 14 Ayal Baron 2010-09-12 08:48:22 EDT
(In reply to comment #12)
> Finally, we decided to use random suffix instead of timestamps for archive file
> names...
How do you guarantee uniqueness?
Comment 15 Peter Rajnoha 2010-09-13 04:04:38 EDT
(In reply to comment #14)
> How do you guarantee uniqueness?

Well, uniqueness depends on the quality of the random number generator here (/dev/urandom). We use these values when generating temporary archive file names as well...

Normally, this would not be needed at all, since we're under a lock when modifying metadata (and then archiving and making backups). But there are corner cases like the use of "--autobackup n" followed by "vgs". So the solution should be adequate (not adding too much complexity that could be the source of other, possibly more severe, bugs).
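A minimal sketch of that scheme (hypothetical Python, not the actual lvm2 code; the name format mirrors archive files such as TAFT_00020-1448498064.vg seen later in this report):

```python
import os
import re

def archive_name_with_random_suffix(archive_dir, vg_name, seq):
    # Hypothetical helper: append a random suffix drawn from the
    # /dev/urandom-backed os.urandom(), so two unlocked writers
    # almost never compute the same target name.
    rnd = int.from_bytes(os.urandom(4), "big")
    return os.path.join(archive_dir, "%s_%05d-%u.vg" % (vg_name, seq, rnd))

name = archive_name_with_random_suffix("/etc/lvm/archive", "TAFT", 20)
# e.g. /etc/lvm/archive/TAFT_00020-1448498064.vg (suffix varies per call)
assert re.fullmatch(r"TAFT_00020-\d+\.vg", os.path.basename(name))
```

Collisions are still possible in principle, but with a 32-bit suffix the probability is negligible for the corner cases described above.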
Comment 20 Corey Marthaler 2011-03-18 17:21:51 EDT
I did a sanity check of my cluster nodes and still saw a couple of empty files in the archive dir. Is this something to be worried about?

[root@taft-01 archive]# ls -l | wc -l
31355
[root@taft-01 archive]# ls -l | grep ' 0 '


[root@taft-02 archive]# ls -l | wc -l
32392
[root@taft-02 archive]# ls -l | grep ' 0 '
-rw-------. 1 root root     0 Mar  8 14:21 TAFT_00020-1448498064.vg
-rw-------. 1 root root     0 Mar  8 17:32 TAFT_00054-1123469189.vg


[root@taft-03 archive]# ls -l | wc -l
33891
[root@taft-03 archive]# ls -l | grep ' 0 '
-rw-------. 1 root root     0 Mar  8 14:20 TAFT_00036-249318537.vg
-rw-------. 1 root root     0 Mar  8 14:20 TAFT_00037-1570106112.vg


[root@taft-04 archive]# ls -l | wc -l
31234
[root@taft-04 archive]# ls -l | grep ' 0 '
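For what it's worth, grepping ls output for ' 0 ' can match unrelated fields (link counts, dates); checking file sizes directly is more robust - e.g. `find . -maxdepth 1 -type f -size 0`, or this hypothetical Python helper:

```python
import os
import tempfile

def zero_length_files(directory):
    """Return sorted names of empty regular files (e.g. leftover .vg archives)."""
    return sorted(
        name
        for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
        and os.path.getsize(os.path.join(directory, name)) == 0
    )

# quick self-check in a scratch directory
d = tempfile.mkdtemp()
open(os.path.join(d, "TAFT_00020-1448498064.vg"), "w").close()  # empty
with open(os.path.join(d, "TAFT_00021-7.vg"), "w") as f:
    f.write("vgname = ...")  # non-empty
assert zero_length_files(d) == ["TAFT_00020-1448498064.vg"]
```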
Comment 21 Corey Marthaler 2011-03-18 18:03:37 EDT
Moving back to assigned based on the 0 length files found in comment #20
Comment 22 Peter Rajnoha 2011-03-21 05:46:28 EDT
(In reply to comment #21)
> Moving back to assigned based on the 0 length files found in comment #20

Do you have any logs from when this happens? Unfortunately, I was not able to reproduce it unless I was running out of disk space. So if that's not the case here, any kind of log would be very helpful. Thanks.
Comment 24 Alasdair Kergon 2011-03-30 08:05:11 EDT
But the original problem - loads and loads of empty files appearing - is fixed?
This is just a couple of unexplained files left behind now and quite harmless, compared to the many many zero-length files appearing before?
I'd suggest WONTFIX for the remaining problem - it would be a waste of everyone's effort.
Comment 25 Corey Marthaler 2011-03-30 12:01:23 EDT
As a follow-up to comment #24, I'm fine with marking this bug verified since, as agk said, it's just a few empty files. I re-ran the lvm regression tests on the latest tree on two different clusters (which results in thousands of archive files being created) and I didn't see any empty ones. The few I found in comment #20 may be difficult to reproduce.
Comment 26 Corey Marthaler 2011-04-04 11:51:23 EDT
I once again ran the standard lvm REG tests on the latest tree/kernel (2.6.32-128.el6.x86_64) and did not see any 0 length files. Marking this verified.

2.6.32-128.el6.x86_64

lvm2-2.02.83-3.el6    BUILT: Fri Mar 18 09:31:10 CDT 2011
lvm2-libs-2.02.83-3.el6    BUILT: Fri Mar 18 09:31:10 CDT 2011
lvm2-cluster-2.02.83-3.el6    BUILT: Fri Mar 18 09:31:10 CDT 2011
udev-147-2.35.el6    BUILT: Wed Mar 30 07:32:05 CDT 2011
device-mapper-1.02.62-3.el6    BUILT: Fri Mar 18 09:31:10 CDT 2011
device-mapper-libs-1.02.62-3.el6    BUILT: Fri Mar 18 09:31:10 CDT 2011
device-mapper-event-1.02.62-3.el6    BUILT: Fri Mar 18 09:31:10 CDT 2011
device-mapper-event-libs-1.02.62-3.el6    BUILT: Fri Mar 18 09:31:10 CDT 2011
cmirror-2.02.83-3.el6    BUILT: Fri Mar 18 09:31:10 CDT 2011
Comment 27 errata-xmlrpc 2011-05-19 10:26:19 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0772.html
