Bug 615907
| Summary: | [lvm2] failure in lvm: rename failed: File exists\n | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Haim <hateya> |
| Component: | lvm2 | Assignee: | Peter Rajnoha <prajnoha> |
| Status: | CLOSED ERRATA | QA Contact: | Corey Marthaler <cmarthal> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | ||
| Version: | 6.1 | CC: | abaron, agk, antillon.maurizio, borgan, coughlan, danken, dwysocha, hateya, heinzm, jbrassow, joe.thornber, mbroz, mgoldboi, prockai, syeghiay, yeylon, ykaul |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | lvm2-2.02.82-1.el6 | Doc Type: | Bug Fix |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2011-05-19 14:26:19 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
and the error itself:
'/usr/bin/sudo -n /sbin/lvs --config " devices { preferred_names = [\\"^/d
ev/mapper/\\"] write_cache_state=0 filter = [ \\"a%/dev/mapper/14f504e46494c4500354c376e50522d696430312d664e3962|/dev/mapper/14f504e46494c450074726662556e2d424
132492d79454a42|/dev/mapper/14f504e46494c45007632334e45362d35536d302d50634569%\\", \\"r%.*%\\" ] } backup { retain_min = 50 retain_days = 0 } " --noheadings
-o tags aaac4a9b-ae1f-4e4b-9c71-d25eb10bc83f/8022d143-247d-4205-9db9-9c72fa69e2de' (cwd None)
8409080c-1a2b-470a-9b95-d0760ee9f601::DEBUG::2010-07-19 11:20:16,739::misc::111::irs::SUCCESS: <err> = ' /etc/lvm/archive/.lvm_silver-vdse.qa.lab.tlv.redhat.com_6006_975457195: rename to /etc/lvm/archive/aaac4a9b-ae1f-4e4b-9c71-d25eb10bc83f_04386.vg failed: File exists\n /etc/lvm/archive/.lvm_silver-vdse.qa.lab.tlv.redhat.com_6006_1396122488: rename to /etc/lvm/archive/aaac4a9b-ae1f-4e4b-9c71-d25eb10bc83f_04388.vg failed: File exists\n'; <rc> = 0
This issue has been proposed when we are only considering blocker issues in the current Red Hat Enterprise Linux release. It has been denied for the current Red Hat Enterprise Linux release. ** If you would still like this issue considered for the current release, ask your support representative to file as a blocker on your behalf. Otherwise ask that it be considered for the next Red Hat Enterprise Linux release. ** I found a lot of empty files is archive and backup dir, this seems to be major problem in lvm archive handling, should be addressed in 6.0. (Not sure if it is regression yet.) This issue has been proposed when we are only considering blocker issues in the current Red Hat Enterprise Linux release. It has been denied for the current Red Hat Enterprise Linux release. ** If you would still like this issue considered for the current release, ask your support representative to file as a blocker on your behalf. Otherwise ask that it be considered for the next Red Hat Enterprise Linux release. ** 1) The code should not be leaving behind empty files. 2) The protocol for selecting the "next" available filename has races. 3) The occasional extra duplicate backup file, or out-of-sequence file is not important. But missing them out is. 4) Changing the filenames is within scope of a fix: if so, amend the namespace so there's no conflict with any existing filenames. (One possibility may be to use the seqno in the name - so if the file exists you may assume it contains the same contents; and renaming over the top of it wouldn't matter as the contents would still be the same.) Hopefully, simple timestamps will help here (patch posted to lvm-devel for review). This should prevent the problem with renaming. Also, there's a patch for additional "--config" line changes related to backup/archive handling. These changes were not applied, but default (or the value written in lvm.conf) was always used instead. The problem with blank temporary files - well, I'd need more log information here. Apparently, it seems that the export of the VG metadata failed for some reason. Unfortunately, we don't have logs here (I couldn't reproduce it with the configuration mentioned here in this bug report). As for the race that occurs - this is because of the "--autobackup n" use, I'm quite sure (looking at the logs). If we use:
1) lvcreate --autobackup n ...
...
X) lvcreate --autobackup n ...
X+1) vgs/pvs/lvs
The X-th lvcreate won't create any backup/archive files, as requested by the "--autobackup n" switch. Running vgs just after that will check the backup state and will decide that there's newer version of metadata to backup/archive and so it runs all the backup machinery (creating two archive files at once - the first one being copied from current backup dir and the other one is the current metadata state, then new backup file is written as well).
The problem is that vgs is not under an rw VG lock and so we're open to races while deciding about the name of the archive file.
(Similar situation happens when we remove the backup file in /etc/lvm/backup and then we run vgs)
Finally, we decided to use random suffix instead of timestamps for archive file names... Patches checked in upstream (2.02.74). (..those empty files are probably the result of insufficient free space to create new archives. Though we can't say 100% because we don't have the logs from time period when this happened but it's the most probable reason - it's reproducible when the disk is full..) (In reply to comment #12) > Finally, we decided to use random suffix instead of timestamps for archive file > names... How do you guarantee uniqueness? (In reply to comment #14) > How do you guarantee uniqueness? well, uniqueness depends on the quality of random number generator here (/dev/urandom). We use these values while generating temporary archive file names as well... Normally, this would not be needed at all since we're normally under a lock when modifying metadata (and then archiving and making backups). But there are those corner cases like the use of "--autobackup n" and subsequent "vgs". So the solution should be adequate (not adding too much complexity that could be the source of other possible and even more severe bugs). I did a sanity check of my cluster nodes and did still see a couple empty files in the archive dir. Is this something to be worried about? [root@taft-01 archive]# ls -l | wc -l 31355 [root@taft-01 archive]# ls -l | grep ' 0 ' [root@taft-02 archive]# ls -l | wc -l 32392 [root@taft-02 archive]# ls -l | grep ' 0 ' -rw-------. 1 root root 0 Mar 8 14:21 TAFT_00020-1448498064.vg -rw-------. 1 root root 0 Mar 8 17:32 TAFT_00054-1123469189.vg [root@taft-03 archive]# ls -l | wc -l 33891 [root@taft-03 archive]# ls -l | grep ' 0 ' -rw-------. 1 root root 0 Mar 8 14:20 TAFT_00036-249318537.vg -rw-------. 1 root root 0 Mar 8 14:20 TAFT_00037-1570106112.vg [root@taft-04 archive]# ls -l | wc -l 31234 [root@taft-04 archive]# ls -l | grep ' 0 ' Moving back to assigned based on the 0 length files found in comment #20 (In reply to comment #21) > Moving back to assigned based on the 0 length files found in comment #20 Do you have any logs when this happens. Unfortunately, I was not able to reproduce unless I was running out of disk space. So if that's not the case here, any kind of a log would be very helpful. Thanks. But the original problem - loads and loads of empty files appearing - is fixed? This is just a couple of unexplained files left behind now and quite harmless, compared to the many many zero-length files appearing before? I'd suggest WONTFIX for the remaining problem - it would be a waste of everyone's effort. As a follow-up to comment #24, I'm fine with marking this bug verified since like agk said, it's just a few empty files. I re-ran the lvm regression tests on the latest tree on two different clusters (which results in thousands of archive files being created) and I didn't see any empty ones. The few ones I found in comment #20 may be difficult to reproduce. I once again ran the standard lvm REG tests on the latest tree/kernel (2.6.32-128.el6.x86_64) and did not see any 0 length files. Marking this verified. 2.6.32-128.el6.x86_64 lvm2-2.02.83-3.el6 BUILT: Fri Mar 18 09:31:10 CDT 2011 lvm2-libs-2.02.83-3.el6 BUILT: Fri Mar 18 09:31:10 CDT 2011 lvm2-cluster-2.02.83-3.el6 BUILT: Fri Mar 18 09:31:10 CDT 2011 udev-147-2.35.el6 BUILT: Wed Mar 30 07:32:05 CDT 2011 device-mapper-1.02.62-3.el6 BUILT: Fri Mar 18 09:31:10 CDT 2011 device-mapper-libs-1.02.62-3.el6 BUILT: Fri Mar 18 09:31:10 CDT 2011 device-mapper-event-1.02.62-3.el6 BUILT: Fri Mar 18 09:31:10 CDT 2011 device-mapper-event-libs-1.02.62-3.el6 BUILT: Fri Mar 18 09:31:10 CDT 2011 cmirror-2.02.83-3.el6 BUILT: Fri Mar 18 09:31:10 CDT 2011 An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2011-0772.html |
Description of problem: while creating vm pool (lvcreate commands that create vms) I see lots of errors in vdsm log says lvm failed to rename vg as it already exists. attached some of the log: '/usr/bin/sudo -n /sbin/lvcreate --config " devices { preferred_names = [\ \"^/dev/mapper/\\"] write_cache_state=0 filter = [ \\"a%/dev/mapper/14f504e46494c4500354c376e50522d696430312d664e3962|/dev/mapper/14f504e46494c450074726662556e 2d424132492d79454a42|/dev/mapper/14f504e46494c45007632334e45362d35536d302d50634569%\\", \\"r%.*%\\" ] } backup { retain_min = 50 retain_days = 0 } " --autob ackup n --contiguous n --size 1024M --name 49d88b3f-7900-4fea-8661-06af7447f4ae aaac4a9b-ae1f-4e4b-9c71-d25eb10bc83f' (cwd None) '/usr/bin/sudo -n /sbin/vgs --config " devices { preferred_names = [\\"^/dev/mapper/\\"] write_cache_state=0 filter = [ \\"a%/dev/mapper/14f504e46494c4500354c376e50522d696430312d664e3962|/dev/mapper/14f504e46494c450074726662556e2d424132492d79454a42|/dev/mapper/14f504e46494c45007632334e45362d35536d302d50634569%\\", \\"r%.*%\\" ] } backup { retain_min = 50 retain_days = 0 } " --noheadings -o pv_name aaac4a9b-ae1f-4e4b-9c71-d25eb10bc83f' (cwd None) lvm dump of system state before and after the error can be found under: /net/orion.tlv.redhat.com/export/public/qe-integration/logs This issue is reproducible in my environment when I crate new vms Version-Release number of selected component (if applicable): lvm2-2.02.69-2.el6.x86_64 vdsm-4.9-10.el6.x86_64 create several vms (lvs).