Bug 1262370 - Race in snapperd while performing background job and status command is requested
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: snapper
Version: 7.2
Hardware: All
OS: All
Priority: medium
Severity: medium
Target Milestone: rc
Assigned To: Ondrej Kozina
QA Contact: Bruno Goncalves
Depends On: 1250371
Blocks:
Reported: 2015-09-11 09:49 EDT by Bruno Goncalves
Modified: 2016-11-04 03:29 EDT (History)
CC: 6 users

See Also:
Fixed In Version: snapper-0.2.8-4.el7
Doc Type: Bug Fix
Doc Text:
There was a race when two or more snapper commands were about to mount (or unmount) the same filesystem snapshot on the same snapper-configured filesystem. Usually the race occurred when the first command created a 'post' type snapshot on a snapper config with background comparison enabled and a second command tried to access the very same snapshot. Only the LVM2 thin provisioning backend, with an arbitrary filesystem on top, was affected. As a result of the race, a snapper 'status' command initiated right after another command that triggered a background comparison occasionally failed with the following error message:

# snapper -c my_config status 1..2
Failure (error.mount_snapshot)

With the current fix, the mount and umount operations are properly serialized via a mutex per snapper config (for the configured LVM2 backend).
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-04 03:29:32 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
snapperd debug on aarch64 (68.78 KB, text/plain)
2015-09-11 09:51 EDT, Bruno Goncalves

Description Bruno Goncalves 2015-09-11 09:49:21 EDT
Description of problem:
Failed to create snapshot on RHEL-7.2 Snapshot#2 on aarch64

snapper -c bugtest status <number1>..<number2>
Failure (error.mount_snapshot).

Version-Release number of selected component (if applicable):
snapper-0.1.7-10.el7.aarch64
kernel-4.2.0-0.19.el7.aarch64

How reproducible:
It seems easily reproducible on aarch64.

Steps to Reproduce:
1. Create thinp volume
# lvcreate -L5G -T rhel_hp-moonshot-02-c03/pool -V 30G --name thin_lv

2. Create a FS on the device and mount it
# mkfs.xfs -f  /dev/mapper/rhel_hp--moonshot--02--c03-thin_lv

# mkdir /mnt/test

# mount /dev/mapper/rhel_hp--moonshot--02--c03-thin_lv /mnt/test

3. Add new user and create snapper config
# useradd -m dummy_a

# chown dummy_a:dummy_a /mnt/test

# snapper -c bugtest create-config -f "lvm(xfs)" /mnt/test

# su - dummy_a

4. Run the reproducer from bz#1229353
# cd /mnt/test

# for i in $(seq 1 800000); do mkdir ./dir_$i; touch ./dir_$i/file_$i; done

# exit

# snapper -c bugtest list

5. Create snapshots
# snapper -c bugtest create -t pre -p
<number1>

# snapper -c bugtest list

# su - dummy_a

# chmod -R g-w /mnt/test

# exit

# snapper -c bugtest create -t post --pre-num <number1> -p
<number2>

# snapper -c bugtest status <number1>..<number2>
Failure (error.mount_snapshot).
Comment 1 Bruno Goncalves 2015-09-11 09:49:55 EDT
# snapper -c bugtest list
Type   | # | Pre # | Date                            | User | Cleanup  | Description | Userdata
-------+---+-------+---------------------------------+------+----------+-------------+---------
single | 0 |       |                                 | root |          | current     |         
single | 1 |       | Fri 11 Sep 2015 09:01:01 AM EDT | root | timeline | timeline    |         
pre    | 2 |       | Fri 11 Sep 2015 09:40:22 AM EDT | root |          |             |         
post   | 3 | 2     | Fri 11 Sep 2015 09:40:56 AM EDT | root |          |             |         


# snapper -c bugtest status 2..3
Failure (error.mount_snapshot).
Comment 2 Bruno Goncalves 2015-09-11 09:51:41 EDT
Created attachment 1072567 [details]
snapperd debug on aarch64
Comment 6 Ondrej Kozina 2015-09-14 10:34:45 EDT
Sigh... there's a race in the snapper daemon.

The background job doesn't hold a lock while mounting the snapshot. The "status" thread tests whether the snapshot is mounted and gets a negative result. By the time it tries to mount the snapshot itself, the "background" thread has already mounted it, so the "status" thread fails with -EBUSY.
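The check-then-mount race described above can be simulated in a few lines of Python. This is a toy model, not the snapperd code: `Snapshot`, `ensure_mounted_racy`, and the barrier are all illustrative, and the barrier only exists to force the interleaving deterministically. The internal device lock mirrors the fact that the kernel's mount itself is atomic; the bug lives in the unlocked check above it.

```python
import threading

class EBusy(Exception):
    """Stand-in for the kernel's -EBUSY error."""

class Snapshot:
    """Toy model of an LVM snapshot mount (illustrative names)."""
    def __init__(self):
        self._mounted = False
        self._dev_lock = threading.Lock()  # mount(2) itself is atomic

    def is_mounted(self):
        return self._mounted

    def mount(self):
        with self._dev_lock:
            if self._mounted:          # a second mount attempt fails,
                raise EBusy("-EBUSY")  # just like the real mount does
            self._mounted = True

def ensure_mounted_racy(snap, barrier):
    # The buggy pattern: check-then-mount without holding a lock.
    if not snap.is_mounted():
        barrier.wait()   # force both threads past the check (demo only)
        snap.mount()     # the losing thread raises EBusy here

snap = Snapshot()
barrier = threading.Barrier(2)
errors = []

def worker():
    try:
        ensure_mounted_racy(snap, barrier)
    except EBusy as exc:
        errors.append(exc)

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Exactly one thread hits the EBUSY failure, mirroring the
# 'Failure (error.mount_snapshot)' seen from the status command.
```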

As a workaround you may add a short sleep between the last two commands:

snapper -c bugtest create -t post --pre-num <number1> -p
<number2>

sleep 1

snapper -c bugtest status <number1>..<number2>
Comment 7 Ondrej Kozina 2015-09-14 10:40:31 EDT
(Only the lvm2 backend is affected; btrfs snapshots always stay mounted.)
Comment 8 Ondrej Kozina 2016-03-17 09:14:26 EDT
Patch offered to upstream: https://github.com/openSUSE/snapper/pull/225
Comment 9 Ondrej Kozina 2016-03-23 06:32:34 EDT
Got accepted upstream.
Comment 10 Mike McCune 2016-03-28 19:14:23 EDT
This bug was accidentally moved from POST to MODIFIED via an error in automation; please contact mmccune@redhat.com with any questions.
Comment 12 Bruno Goncalves 2016-07-26 08:32:42 EDT
It passed in our tests using snapper-0.2.8-2.el7.
Comment 14 Bruno Goncalves 2016-08-12 02:35:56 EDT
The problem still happens sometimes when trying to get the snapshot status.

# snapper -c bugtest list
Type   | # | Pre # | Date                         | User | Cleanup | Description | Userdata
-------+---+-------+------------------------------+------+---------+-------------+---------
single | 0 |       |                              | root |         | current     |         
single | 1 |       | Fri 12 Aug 2016 02:33:35 EDT | root |         |             |         
pre    | 2 |       | Fri 12 Aug 2016 02:33:36 EDT | root |         |             |         
post   | 3 | 2     | Fri 12 Aug 2016 02:33:36 EDT | root |         |             |

# snapper -c bugtest status 2..3
Failure (error.mount_snapshot).
Comment 15 Ondrej Kozina 2016-08-12 06:53:32 EDT
Patch posted upstream: https://github.com/openSUSE/snapper/pull/261
Comment 16 Bruno Goncalves 2016-08-16 10:45:56 EDT
Tested snapper-0.2.8-4.el7 and it is working well.
Comment 21 errata-xmlrpc 2016-11-04 03:29:32 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2481.html
