Bug 1276718

Summary: ceph df displays incorrect usage statistics
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Narendra Trivedi <natrived>
Component: RADOS
Assignee: Samuel Just <sjust>
Status: CLOSED NOTABUG
QA Contact: ceph-qe-bugs <ceph-qe-bugs>
Severity: medium
Priority: unspecified
Version: 1.2.3
CC: ceph-eng-bugs, dzafman, flucifre, kchai, natrived
Target Milestone: rc
Target Release: 1.3.3
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2016-07-18 13:00:09 UTC

Description Narendra Trivedi 2015-10-30 15:59:43 UTC
Description of problem:

10 volumes of 10TB each were created in an RBD pool. The volumes were filled with unique (non-sparse) data in order to add used capacity to a cluster that was showing only 6.73% used. After writing 65TB of non-sparse data to the cluster (total capacity 1176TB), %RAW USED rose to only 6.74%, which is impossible. RAW USED shows 81165G; before filling up the cluster it was 81107G. In other words, according to ceph df and rados df, RAW USED increased by only 58G.
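For reference, a quick arithmetic check of the percentages quoted above (a sketch only; it assumes the 1176TB total is the 1176T that ceph df reports, i.e. 1176*1024 G):

echo "scale=6; 81107 / (1176*1024) * 100" | bc    # ~6.7351 -> the 6.73% seen before the fill
echo "scale=6; 81165 / (1176*1024) * 100" | bc    # ~6.7400 -> the 6.74% seen after the fill

So the 58G RAW USED delta and the 0.01% change in %RAW USED are internally consistent with each other; the question is why the delta is only 58G.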

Version-Release number of selected component (if applicable):
1.2.3

How reproducible:
Reproducible every time 

Steps to Reproduce:
1. Create some volumes in a pool with the intention of filling up the pool to more than 50% 
2. Attach each volume to a VM, create a filesystem on it, and run the following command from inside the mount point (say /mnt/vdb, where /dev/vdb is the volume); an annotated version of the same loop is sketched after these steps:

for i in {1..10}; do ruby -e 'a=STDIN.readlines;100000000.times do;b=[];1000.times do; b << a[rand(a.size)].chomp end; puts b.join(" "); end' < /usr/share/dict/words > file$i.txt; done
 
3. Repeat for all the volumes until the total usage exceeds 50%.
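
The same fill loop as step 2, spread over multiple lines and commented (a sketch only; it assumes bash, Ruby, and /usr/share/dict/words are present in the guest):

for i in {1..10}; do
    ruby -e '
        a = STDIN.readlines                            # load the word list once
        100000000.times do                             # emit up to 10^8 lines...
            b = []
            1000.times { b << a[rand(a.size)].chomp }  # ...of 1000 random words each
            puts b.join(" ")
        end
    ' < /usr/share/dict/words > file$i.txt             # each file grows until the fs fills (or 10^8 lines)
done

In the df -h output below, several volumes end up at 100% used, consistent with the files growing until the filesystem fills.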

Actual results:

The total non-sparse data written to the pool, as reported by df -h:

1) [root@pavana-vm vdb]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1        47G  1.3G   44G   3% /
devtmpfs        3.9G     0  3.9G   0% /dev
tmpfs           3.9G     0  3.9G   0% /dev/shm
tmpfs           3.9G   17M  3.9G   1% /run
tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/vdc        985G  985G     0 100% /mnt/vdc
/dev/vdb        985G  985G     0 100% /mnt/vdb
/dev/vdd        985G  478G  458G  52% /mnt/vdd
/dev/vde        985G  985G     0 100% /mnt/vde
/dev/vdf        985G  932G  3.2G 100% /mnt/vdf
/dev/vdg        985G  476G  459G  51% /mnt/vdg
/dev/vdh        985G  474G  462G  51% /mnt/vdh
/dev/vdi        985G  392G  543G  42% /mnt/vdi
/dev/vdj        985G  377G  559G  41% /mnt/vdj
/dev/vdk        985G  396G  539G  43% /mnt/vdk


2) df -h | grep mnt | grep vd | awk '{print $3}' | cut -d"G" -f1 | awk '{sum += $1} END {print sum}'
6511

3) Before writing 65TB of data to the alln01-a-css-cinder-volume-1 pool, ceph df showed:

POOLS:
    NAME                           ID     USED       %USED     MAX AVAIL     OBJECTS
    alln01-a-css-cinder-volume-1   43     23142G     1.92      358T          7663661

After writing 65TB, ceph df shows the following for the pool:

    alln01-a-css-cinder-volume-1   43     23144G     1.92      358T          7664486
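
One way to make the before/after delta unambiguous is to capture the pool line from both ceph df and rados df at each step (a sketch; the awk field numbers assume the ceph df column layout shown above):

ceph df | awk '$1 == "alln01-a-css-cinder-volume-1" {print "USED=" $3, "OBJECTS=" $6}'
rados df | grep alln01-a-css-cinder-volume-1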

Expected results:

The USED field in the ceph df output for the alln01-a-css-cinder-volume-1 pool should have increased by 65TB.
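
For context, a rough way to derive the expected growth (a sketch; the pool's replication factor is not shown in this report, so it needs to be read from the cluster first):

ceph osd pool get alln01-a-css-cinder-volume-1 size   # replica count for the pool
# The pool's USED should grow by roughly the amount of unique data written,
# and the cluster-wide RAW USED by roughly that amount times the replica count.
ceph df
rados df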


Additional info:
[Integration]root@css-a-ceph1-001:~# ceph -v
ceph version 0.80.8.5 (4ec83c2359fe63663ae28befbcaa65e755bcd462)

Comment 2 Samuel Just 2016-06-15 15:09:31 UTC
2) df -h | grep mnt | grep vd | awk '{print $3}' | cut -d"G" -f1 | awk '{sum += $1} END {print sum}'
6511

That looks like 6.5 TB to me, right?  Did these volumes exist before you started this test?