Bug 709318

Summary: conflicting file hashes lead to data corruption
Product: [Retired] Pulp
Reporter: Daniel Mach <dmach>
Component: z_other
Assignee: Pradeep Kilambi <pkilambi>
Status: CLOSED CURRENTRELEASE
QA Contact: Preethi Thomas <pthomas>
Severity: urgent
Docs Contact:
Priority: urgent
Version: unspecified
CC: dgao, dgregor
Target Milestone: ---
Keywords: Triaged
Target Release: Sprint 24
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-08-16 14:19:58 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 563609, 647488
Attachments:
files to reproduce the issue (flags: none)

Description Daniel Mach 2011-05-31 11:28:46 UTC
Created attachment 501974 [details]
files to reproduce the issue

Files are staged into the following directory structure:
<pulp_root>/files/<sha256[:3]>/<file_name>

Only the first 3 characters of the hash are used, which gives 16^3 = 4096 possible prefixes.
That should be fine for most files, but for common file names like README, LICENSE, etc., two different files only need to share a prefix to conflict, so a collision is easy to hit.
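To make the "easy to get a conflict" point concrete, here is a minimal Python sketch (the helper names `old_staging_path` and `collision_probability` are hypothetical, not Pulp code). Under the old layout, two uploads land on the same path exactly when they have the same file name and the same 3-hex-digit digest prefix, so for N files that all share a name like README this is the birthday problem over 4096 buckets, assuming digest prefixes are uniformly distributed.

```python
import hashlib
import posixpath

# Distinct directory prefixes when only the first 3 hex digits of the
# SHA-256 digest are used: 16 ** 3 == 4096.
BUCKETS = 16 ** 3


def old_staging_path(pulp_root: str, file_name: str, data: bytes) -> str:
    """Staging path under the old layout: <root>/files/<sha256[:3]>/<file_name>."""
    digest = hashlib.sha256(data).hexdigest()
    return posixpath.join(pulp_root, "files", digest[:3], file_name)


def collision_probability(n: int, buckets: int = BUCKETS) -> float:
    """Chance that at least two of n same-named files share a digest prefix.

    Birthday problem: 1 - prod_{k=0}^{n-1} (1 - k / buckets),
    assuming digest prefixes are uniformly distributed.
    """
    p_all_distinct = 1.0
    for k in range(n):
        p_all_distinct *= 1 - k / buckets
    return 1 - p_all_distinct


if __name__ == "__main__":
    for n in (10, 50, 76, 100):
        print(f"{n:3d} files named README: P(some file overwritten) = "
              f"{collision_probability(n):.3f}")
```

Under these assumptions, about 76 same-named files already give a better-than-even chance that one upload silently replaces another, and 100 files give roughly 70%.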


I was able to reproduce this issue:
$ pulp-admin content upload --repoid=<REPO> file1/reproducer
$ sha256sum /var/lib/pulp/files/963/reproducer 
963b29f07b9c24a234123676dfad905fd61d93e9c8bcca1002625966535ff96a  /var/lib/pulp/files/963/reproducer

$ pulp-admin content upload --repoid=<REPO> file2/reproducer
$ sha256sum /var/lib/pulp/files/963/reproducer 
96364261a9e076d43a21d6ff17fa0694eaf66f0e6f577ef59847202a1708fb26  /var/lib/pulp/files/963/reproducer

The file is overwritten *WITHOUT ANY WARNING*.
The database contains both files, but there's only one file on disk.
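The silent overwrite can be reproduced without Pulp. The Python sketch below uses hypothetical helpers `stage_naive` and `stage_checked` and assumes staging is a plain open-and-write at the computed path. It brute-forces two payloads whose SHA-256 digests share the first 3 hex digits, shows that a naive write puts both at one path, and shows one possible guard: create the file with `O_EXCL` and refuse to replace existing content that differs.

```python
import hashlib
import os
import tempfile


class StagingConflict(Exception):
    """Raised when a staging path already holds different content."""


def find_prefix_collision(prefix_len: int = 3) -> tuple[bytes, bytes]:
    """Return two distinct payloads whose SHA-256 hex digests share a prefix."""
    seen: dict[str, bytes] = {}
    # Pigeonhole: among 16**prefix_len + 1 payloads, two must share a prefix.
    for i in range(16 ** prefix_len + 1):
        data = f"payload {i}\n".encode()
        prefix = hashlib.sha256(data).hexdigest()[:prefix_len]
        if prefix in seen:
            return seen[prefix], data
        seen[prefix] = data
    raise AssertionError("unreachable by the pigeonhole principle")


def _old_path(root: str, name: str, data: bytes) -> str:
    """Old layout: <root>/files/<sha256[:3]>/<name>."""
    prefix = hashlib.sha256(data).hexdigest()[:3]
    return os.path.join(root, "files", prefix, name)


def stage_naive(root: str, name: str, data: bytes) -> str:
    """Write unconditionally, replacing whatever is already at the path."""
    path = _old_path(root, name, data)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as fh:
        fh.write(data)
    return path


def stage_checked(root: str, name: str, data: bytes) -> str:
    """Create the file exclusively; accept an identical copy, reject others."""
    path = _old_path(root, name, data)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except FileExistsError:
        with open(path, "rb") as fh:
            if fh.read() == data:
                return path  # same content already staged: nothing to do
        raise StagingConflict(f"{path} already holds different content")
    with os.fdopen(fd, "wb") as fh:
        fh.write(data)
    return path


if __name__ == "__main__":
    first, second = find_prefix_collision()
    with tempfile.TemporaryDirectory() as root:
        p1 = stage_naive(root, "reproducer", first)
        p2 = stage_naive(root, "reproducer", second)
        with open(p2, "rb") as fh:
            print("same path:", p1 == p2, "| first upload lost:", fh.read() == second)
    with tempfile.TemporaryDirectory() as root:
        stage_checked(root, "reproducer", first)
        try:
            stage_checked(root, "reproducer", second)
        except StagingConflict as exc:
            print("refused:", exc)
```

The guard only turns silent corruption into a visible error; it does not let both files coexist, which still needs a layout that includes the full digest.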


BTW, packages shouldn't conflict, because we usually release only a few copies of an RPM with the same name, signed with different keys. These files are identical except for a few bytes, and by the nature of the hash function they should still have completely different hashes.
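The "hash nature" argument is the avalanche property of SHA-256: changing even one input bit changes about half of the output bits. The Python sketch below illustrates it on a made-up payload (the zero-filled buffer is a hypothetical stand-in for an RPM, and flipping one bit stands in for a different signature). It also reports whether the two digests share the 3-hex-digit prefix; for any pair of distinct payloads that happens with a probability of about 1/4096, so a few re-signed copies are unlikely, though not guaranteed, to conflict.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def compare(original: bytes, modified: bytes) -> tuple[int, bool]:
    """Return (differing digest bits, whether the 3-hex-digit prefixes match)."""
    d1 = hashlib.sha256(original).digest()
    d2 = hashlib.sha256(modified).digest()
    return differing_bits(d1, d2), d1.hex()[:3] == d2.hex()[:3]


if __name__ == "__main__":
    original = bytes(4096)            # hypothetical stand-in for a package
    modified = bytearray(original)
    modified[100] ^= 0x01             # flip a single bit
    bits, same_prefix = compare(original, bytes(modified))
    print(f"{bits} of 256 digest bits differ; same prefix: {same_prefix}")
```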

Comment 1 Daniel Mach 2011-06-01 08:31:19 UTC
I forgot one thing:
If full hashes are used in paths, it will probably be an incompatible change.
I'd use this opportunity to propose a new path scheme.

This one would be easy to search:
<pulp_root>/files/<file_name[:3]>/<file_name>/<sha256>

and this one is closer to the current paths:
<pulp_root>/files/<sha256[:3]>/<sha256>/<file_name>

I prefer the first option because it makes manual searching easier.
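Both proposed schemes keep the full digest in the path, so two different files with the same name no longer map to the same location. A minimal Python sketch of the two layouts as written above (the function names are hypothetical, and `digest` is assumed to be the hex SHA-256 of the file content):

```python
import hashlib
import posixpath


def path_by_name(pulp_root: str, file_name: str, digest: str) -> str:
    """Option 1: <pulp_root>/files/<file_name[:3]>/<file_name>/<sha256>."""
    return posixpath.join(pulp_root, "files", file_name[:3], file_name, digest)


def path_by_hash(pulp_root: str, file_name: str, digest: str) -> str:
    """Option 2: <pulp_root>/files/<sha256[:3]>/<sha256>/<file_name>."""
    return posixpath.join(pulp_root, "files", digest[:3], digest, file_name)


if __name__ == "__main__":
    for content in (b"first README\n", b"second README\n"):
        digest = hashlib.sha256(content).hexdigest()
        print(path_by_name("/var/lib/pulp", "README", digest))
        print(path_by_hash("/var/lib/pulp", "README", digest))
```

Option 1 groups every version of README under one directory, which is what makes it easy to browse by hand; option 2 spreads entries roughly evenly across the 4096 hash-prefix directories, whereas name prefixes (for example "lib") may be heavily skewed.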

Comment 2 Pradeep Kilambi 2011-06-02 19:39:31 UTC
commit bfb294c0b679d4ec43d261dc2d77cbeeca8d23c1
Author: Pradeep Kilambi <pkilambi>
Date:   Thu Jun 2 15:39:01 2011 -0400


$ sudo pulp-admin content upload --repoid=test test/file1/reproducer  -v
* Starting Content Upload

* Performing Content Uploads to Pulp server
Successfully uploaded [reproducer] to server

* Performing Repo Associations 
Content association Complete for Repo [test]: 
 Packages: 
None 
 
 Files: 
reproducer

* Content Upload complete.
[pkilambi@prad ~]$ sudo pulp-admin content upload --repoid=test test/file2/reproducer  -v
* Starting Content Upload

* Performing Content Uploads to Pulp server
Successfully uploaded [reproducer] to server

* Performing Repo Associations 
Content association Complete for Repo [test]: 
 Packages: 
None 
 
 Files: 
reproducer

* Content Upload complete.


$ ls -l /var/lib/pulp/files/rep/reproducer/963*
/var/lib/pulp/files/rep/reproducer/96364261a9e076d43a21d6ff17fa0694eaf66f0e6f577ef59847202a1708fb26:
total 4
-rw-r--r--. 1 apache apache 31 Jun  2 15:34 reproducer

/var/lib/pulp/files/rep/reproducer/963b29f07b9c24a234123676dfad905fd61d93e9c8bcca1002625966535ff96a:
total 4
-rw-r--r--. 1 apache apache 9 Jun  2 15:34 reproducer
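The listing above suggests the fix stores each file as <pulp_root>/files/<file_name[:3]>/<file_name>/<sha256>/<file_name>. Because the full digest is now part of the path, corruption like the original overwrite becomes detectable after the fact. A minimal Python sketch, assuming that inferred layout (the helper names are hypothetical), walks the tree and reports any file whose content no longer matches the digest in its parent directory name:

```python
import hashlib
import os
import sys

HEX_DIGITS = set("0123456789abcdef")


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks like `sha256sum`."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_mismatches(files_root: str) -> list[str]:
    """Paths whose content digest differs from their parent directory name.

    Assumes <files_root>/<name[:3]>/<name>/<sha256>/<name>; directories whose
    name is not a 64-character lowercase hex string are skipped.
    """
    bad = []
    for dirpath, _dirnames, filenames in os.walk(files_root):
        parent = os.path.basename(dirpath)
        if len(parent) != 64 or not set(parent) <= HEX_DIGITS:
            continue
        for name in filenames:
            path = os.path.join(dirpath, name)
            if sha256_file(path) != parent:
                bad.append(path)
    return sorted(bad)


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/pulp/files"
    for path in find_mismatches(root):
        print("digest mismatch:", path)
```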

Comment 3 Pradeep Kilambi 2011-06-02 19:44:42 UTC
Just a note: if you already have files pushed, you might have to delete and re-push them to get the new format. Otherwise, if you prefer, let me know and I can put together a migration script that updates the file system and DB paths.
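A migration along these lines could look roughly like the hypothetical Python sketch below; it is not the script offered in this comment. It assumes the old layout <files_root>/<sha256[:3]>/<file_name> and the new layout <files_root>/<file_name[:3]>/<file_name>/<sha256>/<file_name> (inferred from the directory listing in comment 2), writes into a separate destination tree so old and new 3-character directories cannot clash, and returns (old, new) path pairs that a caller would still have to apply to the database. It cannot bring back files that were already overwritten.

```python
import hashlib
import os
import shutil
import sys


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def migrate(old_root: str, new_root: str) -> list[tuple[str, str]]:
    """Move files from the old layout under old_root to the new one under new_root.

    Returns (old_path, new_path) pairs so database paths can be updated.
    """
    moves = []
    for prefix in sorted(os.listdir(old_root)):
        prefix_dir = os.path.join(old_root, prefix)
        # Old-layout directories are exactly 3 characters (a digest prefix).
        if len(prefix) != 3 or not os.path.isdir(prefix_dir):
            continue
        for name in sorted(os.listdir(prefix_dir)):
            old_path = os.path.join(prefix_dir, name)
            if not os.path.isfile(old_path):
                continue
            # Recompute the digest: the prefix directory alone cannot tell
            # which of several colliding uploads survived on disk.
            digest = sha256_file(old_path)
            new_path = os.path.join(new_root, name[:3], name, digest, name)
            os.makedirs(os.path.dirname(new_path), exist_ok=True)
            shutil.move(old_path, new_path)
            moves.append((old_path, new_path))
    return moves


if __name__ == "__main__":
    for old, new in migrate(sys.argv[1], sys.argv[2]):
        print(f"{old} -> {new}")
```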

Comment 4 Jeff Ortel 2011-06-08 23:58:27 UTC
build: 0.188

Comment 5 dgao 2011-06-17 19:15:16 UTC
[root@pulp-qe ~]# pulp-admin content upload --repoid=bar file1/reproducer -v
* Starting Content Upload

* Performing Content Uploads to Pulp server
Successfully uploaded [reproducer] to server

* Performing Repo Associations 
Content association Complete for Repo [bar]: 
 Packages: 
None 
 
 Files: 
reproducer

* Content Upload complete.
[root@pulp-qe ~]# pulp-admin content upload --repoid=bar file2/reproducer -v
* Starting Content Upload

* Performing Content Uploads to Pulp server
Successfully uploaded [reproducer] to server

* Performing Repo Associations 
Content association Complete for Repo [bar]: 
 Packages: 
None 
 
 Files: 
reproducer

* Content Upload complete.
[root@pulp-qe ~]# ls -l /var/lib/pulp/repos/bar/
ABCD        ABCE        MANIFEST    repodata/   reproducer  
[root@pulp-qe ~]# ls -l /var/lib/pulp/files/rep/reproducer/963
96364261a9e076d43a21d6ff17fa0694eaf66f0e6f577ef59847202a1708fb26/ 963b29f07b9c24a234123676dfad905fd61d93e9c8bcca1002625966535ff96a/
[root@pulp-qe ~]# ls -l /var/lib/pulp/files/rep/reproducer/963*
/var/lib/pulp/files/rep/reproducer/96364261a9e076d43a21d6ff17fa0694eaf66f0e6f577ef59847202a1708fb26:
total 4
-rw-r--r--. 1 apache apache 31 Jun 17 15:14 reproducer

/var/lib/pulp/files/rep/reproducer/963b29f07b9c24a234123676dfad905fd61d93e9c8bcca1002625966535ff96a:
total 4
-rw-r--r--. 1 apache apache 9 Jun 17 15:14 reproducer
[root@pulp-qe ~]#

Comment 6 Preethi Thomas 2011-08-16 14:19:58 UTC
Closing with Community Release 15

pulp-0.0.223-4.