Bug 709318 - conflicting file hashes lead to data corruption
Summary: conflicting file hashes lead to data corruption
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Pulp
Classification: Retired
Component: z_other
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
: Sprint 24
Assignee: Pradeep Kilambi
QA Contact: Preethi Thomas
URL:
Whiteboard:
Depends On:
Blocks: 563609 verified-to-close
Reported: 2011-05-31 11:28 UTC by Daniel Mach
Modified: 2011-08-16 14:19 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-08-16 14:19:58 UTC
Embargoed:


Attachments
files to reproduce the issue (187 bytes, application/x-gzip)
2011-05-31 11:28 UTC, Daniel Mach

Description Daniel Mach 2011-05-31 11:28:46 UTC
Created attachment 501974
files to reproduce the issue

Files are staged into the following directory structure:
<pulp_root>/files/<sha256[:3]>/<file_name>

Only the first 3 characters of the hash are used, which gives 16^3 = 4096 possible combinations.
That should be fine for most files, but with common file names like README, LICENSE, etc., it's easy to get a conflict.
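
To put a rough number on the 16^3 figure, here is a small sketch of the birthday-style estimate. It assumes SHA-256 prefixes are uniformly distributed; the function name is made up for illustration and is not part of Pulp.

```python
def prefix_collision_probability(n_files: int, prefix_len: int = 3) -> float:
    """Estimate the chance that at least two of n_files different files
    with the same name share the first prefix_len hex digits of their hash.

    Assumes hash prefixes are uniformly distributed over 16**prefix_len buckets.
    """
    buckets = 16 ** prefix_len
    if n_files > buckets:
        # Pigeonhole: more files than buckets guarantees a shared prefix.
        return 1.0
    p_no_collision = 1.0
    for i in range(n_files):
        # The i-th file must avoid the i buckets already taken.
        p_no_collision *= (buckets - i) / buckets
    return 1.0 - p_no_collision
```

Under that assumption, on the order of 75 different files with the same name already give about even odds that two of them land in the same directory.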


I was able to reproduce this issue:
$ pulp-admin content upload --repoid=<REPO> file1/reproducer
$ sha256sum /var/lib/pulp/files/963/reproducer 
963b29f07b9c24a234123676dfad905fd61d93e9c8bcca1002625966535ff96a  /var/lib/pulp/files/963/reproducer

$ pulp-admin content upload --repoid=<REPO> file2/reproducer
$ sha256sum /var/lib/pulp/files/963/reproducer 
96364261a9e076d43a21d6ff17fa0694eaf66f0e6f577ef59847202a1708fb26  /var/lib/pulp/files/963/reproducer

The file is overwritten *WITHOUT ANY WARNING*.
The database contains both files, but there's only one file on disk.
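
A minimal sketch of why the overwrite happens, assuming the staging path is built exactly as described above (first three hex characters of the SHA-256, then the file name). `old_staging_path`, `find_colliding_contents` and the sample contents are hypothetical; this is not Pulp code and not the attached reproducer.

```python
import hashlib
import posixpath


def old_staging_path(root: str, file_name: str, content: bytes) -> str:
    """Path under the 3-character scheme: <root>/<sha256[:3]>/<file_name>."""
    digest = hashlib.sha256(content).hexdigest()
    return posixpath.join(root, digest[:3], file_name)


def find_colliding_contents(prefix_len: int = 3):
    """Brute-force two different byte strings whose SHA-256 digests share
    the first prefix_len hex characters (at most 16**prefix_len + 1 tries)."""
    seen = {}
    i = 0
    while True:
        content = b"sample-%d\n" % i
        prefix = hashlib.sha256(content).hexdigest()[:prefix_len]
        if prefix in seen:
            return seen[prefix], content
        seen[prefix] = content
        i += 1


if __name__ == "__main__":
    first, second = find_colliding_contents()
    # Different contents, same name, same destination: the second upload
    # would replace the first on disk.
    print(old_staging_path("/var/lib/pulp/files", "README", first))
    print(old_staging_path("/var/lib/pulp/files", "README", second))
```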


BTW, packages shouldn't conflict, because we usually release only a few copies of an RPM with the same name, each signed with a different key. These files are identical except for a few bytes and should have completely different hashes due to the nature of hashing.

Comment 1 Daniel Mach 2011-06-01 08:31:19 UTC
I forgot one thing:
If full hashes are used in paths, it will probably be an incompatible change.
I'd use this opportunity to propose a new path schema.

This would be easy to search:
<pulp_root>/files/<file_name[:3]>/<file_name>/<sha256>

And this is closer to the current paths:
<pulp_root>/files/<sha256[:3]>/<sha256>/<file_name>

I prefer the first option because it makes manual searching easier.
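
For comparison, a small sketch of both proposed schemas as path-building functions. The function names are hypothetical; the two hashes are the ones from the description. Because the full hash is part of the path in both variants, the two reproducer files no longer share a destination.

```python
import posixpath

# SHA-256 sums of the two reproducer files, as reported in the description.
HASH_1 = "963b29f07b9c24a234123676dfad905fd61d93e9c8bcca1002625966535ff96a"
HASH_2 = "96364261a9e076d43a21d6ff17fa0694eaf66f0e6f577ef59847202a1708fb26"


def path_by_name(pulp_root: str, file_name: str, sha256_hex: str) -> str:
    """First option: <pulp_root>/files/<file_name[:3]>/<file_name>/<sha256>."""
    return posixpath.join(pulp_root, "files", file_name[:3], file_name, sha256_hex)


def path_by_hash(pulp_root: str, file_name: str, sha256_hex: str) -> str:
    """Second option: <pulp_root>/files/<sha256[:3]>/<sha256>/<file_name>."""
    return posixpath.join(pulp_root, "files", sha256_hex[:3], sha256_hex, file_name)


if __name__ == "__main__":
    for digest in (HASH_1, HASH_2):
        print(path_by_name("/var/lib/pulp", "reproducer", digest))
        print(path_by_hash("/var/lib/pulp", "reproducer", digest))
```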

Comment 2 Pradeep Kilambi 2011-06-02 19:39:31 UTC
commit bfb294c0b679d4ec43d261dc2d77cbeeca8d23c1
Author: Pradeep Kilambi <pkilambi>
Date:   Thu Jun 2 15:39:01 2011 -0400


$ sudo pulp-admin content upload --repoid=test test/file1/reproducer  -v
* Starting Content Upload

* Performing Content Uploads to Pulp server
Successfully uploaded [reproducer] to server

* Performing Repo Associations 
Content association Complete for Repo [test]: 
 Packages: 
None 
 
 Files: 
reproducer

* Content Upload complete.
[pkilambi@prad ~]$ sudo pulp-admin content upload --repoid=test test/file2/reproducer  -v
* Starting Content Upload

* Performing Content Uploads to Pulp server
Successfully uploaded [reproducer] to server

* Performing Repo Associations 
Content association Complete for Repo [test]: 
 Packages: 
None 
 
 Files: 
reproducer

* Content Upload complete.


$ ls -l /var/lib/pulp/files/rep/reproducer/963*
/var/lib/pulp/files/rep/reproducer/96364261a9e076d43a21d6ff17fa0694eaf66f0e6f577ef59847202a1708fb26:
total 4
-rw-r--r--. 1 apache apache 31 Jun  2 15:34 reproducer

/var/lib/pulp/files/rep/reproducer/963b29f07b9c24a234123676dfad905fd61d93e9c8bcca1002625966535ff96a:
total 4
-rw-r--r--. 1 apache apache 9 Jun  2 15:34 reproducer

Comment 3 Pradeep Kilambi 2011-06-02 19:44:42 UTC
Just a note: if you already have files pushed, you might have to delete and re-push them to get the new format. Otherwise, if you prefer, let me know and I can put together a migration script that updates the file system and db paths.
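
For illustration only, a rough sketch of what the file-system half of such a migration might look like. This is not the script offered above: `migrate_files` is a hypothetical name, it copies into a separate new root rather than rearranging in place, it assumes the old layout is `<old_root>/<sha256[:3]>/<file_name>` and the new one matches the listing in Comment 2 (`<file_name[:3]>/<file_name>/<sha256>/<file_name>`), and it does not touch the database.

```python
import hashlib
import os
import shutil


def migrate_files(old_root: str, new_root: str) -> list:
    """Copy every <old_root>/<prefix>/<name> to
    <new_root>/<name[:3]>/<name>/<sha256>/<name>.

    Returns (old_path, new_path) pairs so that stored paths could be
    updated in a separate step.
    """
    migrated = []
    for prefix in sorted(os.listdir(old_root)):
        bucket = os.path.join(old_root, prefix)
        if not os.path.isdir(bucket):
            continue
        for name in sorted(os.listdir(bucket)):
            src = os.path.join(bucket, name)
            if not os.path.isfile(src):
                continue
            # Recompute the full hash from what is actually on disk.
            with open(src, "rb") as fh:
                digest = hashlib.sha256(fh.read()).hexdigest()
            dest_dir = os.path.join(new_root, name[:3], name, digest)
            os.makedirs(dest_dir, exist_ok=True)
            dest = os.path.join(dest_dir, name)
            shutil.copy2(src, dest)
            migrated.append((src, dest))
    return migrated
```

A file that was already overwritten by a colliding upload is gone from disk, so a script like this cannot bring it back; those files would still need to be re-pushed.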

Comment 4 Jeff Ortel 2011-06-08 23:58:27 UTC
build: 0.188

Comment 5 dgao 2011-06-17 19:15:16 UTC
[root@pulp-qe ~]# pulp-admin content upload --repoid=bar file1/reproducer -v
* Starting Content Upload

* Performing Content Uploads to Pulp server
Successfully uploaded [reproducer] to server

* Performing Repo Associations 
Content association Complete for Repo [bar]: 
 Packages: 
None 
 
 Files: 
reproducer

* Content Upload complete.
[root@pulp-qe ~]# pulp-admin content upload --repoid=bar file2/reproducer -v
* Starting Content Upload

* Performing Content Uploads to Pulp server
Successfully uploaded [reproducer] to server

* Performing Repo Associations 
Content association Complete for Repo [bar]: 
 Packages: 
None 
 
 Files: 
reproducer

* Content Upload complete.
[root@pulp-qe ~]# ls -l /var/lib/pulp/repos/bar/
ABCD        ABCE        MANIFEST    repodata/   reproducer  
[root@pulp-qe ~]# ls -l /var/lib/pulp/files/rep/reproducer/963
96364261a9e076d43a21d6ff17fa0694eaf66f0e6f577ef59847202a1708fb26/ 963b29f07b9c24a234123676dfad905fd61d93e9c8bcca1002625966535ff96a/
[root@pulp-qe ~]# ls -l /var/lib/pulp/files/rep/reproducer/963*
/var/lib/pulp/files/rep/reproducer/96364261a9e076d43a21d6ff17fa0694eaf66f0e6f577ef59847202a1708fb26:
total 4
-rw-r--r--. 1 apache apache 31 Jun 17 15:14 reproducer

/var/lib/pulp/files/rep/reproducer/963b29f07b9c24a234123676dfad905fd61d93e9c8bcca1002625966535ff96a:
total 4
-rw-r--r--. 1 apache apache 9 Jun 17 15:14 reproducer
[root@pulp-qe ~]#

Comment 6 Preethi Thomas 2011-08-16 14:19:58 UTC
Closing with Community Release 15

pulp-0.0.223-4.

