Bug 707559

Summary: mongodb files occupying large disk space (around 6G)
Product: Red Hat Update Infrastructure for Cloud Providers Reporter: Sachin Ghai <sghai>
Component: ToolsAssignee: Jason Connor <jconnor>
Status: CLOSED CURRENTRELEASE QA Contact: wes hayutin <whayutin>
Severity: high Docs Contact:
Priority: unspecified    
Version: 2.0CC: jason.dobies, jconnor, jmatthew, mmccune, tsanders
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-05-31 12:55:18 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 711223    

Description Sachin Ghai 2011-05-25 11:57:36 UTC
Description of problem:
Upgraded the RHUI2.0 setup as per following builds:

rhui-tools: 2.0.22 to 2.0.25
pulp: 0.171 to 0.179

Version-Release number of selected component (if applicable):
Upgrade setup:

rhui-tools: 2.0.22 to 2.0.25
pulp: 0.171 to 0.179

How reproducible:


Steps to Reproduce:
1. Upgraded the RHUI 2.0 setup 

  
Actual results:
[root@dhcp193-157 mongodb]# ll -h
total 6.0G
-rwxr-xr-x. 1 mongodb mongodb    6 May 25 13:38 mongod.lock
-rw-------. 1 mongodb mongodb  64M May 25 16:32 pulp_database.0
-rw-------. 1 mongodb mongodb 128M May 25 16:24 pulp_database.1
-rw-------. 1 mongodb mongodb 256M May 25 14:00 pulp_database.2
-rw-------. 1 mongodb mongodb 512M May 25 14:01 pulp_database.3
-rw-------. 1 mongodb mongodb 1.0G May 25 14:01 pulp_database.4
-rw-------. 1 mongodb mongodb 2.0G May 25 14:01 pulp_database.5
-rw-------. 1 mongodb mongodb 2.0G May 25 11:59 pulp_database.6
-rw-------. 1 mongodb mongodb  16M May 25 14:00 pulp_database.ns

The size of logs files are of 2G

Expected results:
The log file size shouldn't be od this much size

Additional info:[root@dhcp193-157 ~]# tail -f /var/log/pulp/db.log 
migration to data model version 13 started
updating url source type for repo rhui-1.2-5Server-x86_64
updating url source type for repo rhui-1.2-5Server-i386
updating url source type for repo rhui-1.1-5Server-x86_64
updating url source type for repo rhui-1.1-5Server-i386
updating url source type for repo rhel-server-6-updates-6Server-x86_64
updating url source type for repo rhel-server-6-releases-6Server-x86_64
migration to data model version 13 complete
migration to data model version 14 started
migration to data model version 14 complete

[root@dhcp193-157 ~]#tail -f /var/log/pulp/pulp.log
2011-05-25 14:32:00,022 [INFO][Thread-39] sync() @ repo_sync.py:438 - Limiting download speed to 0 KB/sec per thread. [15] threads will be used
2011-05-25 14:32:00,025 [INFO][Thread-38] run() @ task.py:352 - Task id:43a96db3-86a6-11e0-b43a-525400da69d0, method_name:_sync Exception: [Errno 28] No space left on device: '/tmp/tmpjTQHvq'
2011-05-25 14:32:00,029 [INFO][Thread-39] run() @ task.py:352 - Task id:43a91238-86a6-11e0-b438-525400da69d0, method_name:_sync Exception: [Errno 28] No space left on device: '/var/lib/pulp//repos/content/dist/rhel/rhui/server/5Server/x86_64/rhui/1.1/os//repodata.new'
2011-05-25 14:32:00,045 [ERROR][Thread-38] failed() @ task.py:393 - Task id:43a96db3-86a6-11e0-b43a-525400da69d0, method_name:_sync:
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/pulp/server/tasking/task.py", line 336, in run
    result = self.callable(*self.args, **self.kwargs)
  File "/usr/lib/python2.6/site-packages/pulp/server/api/repo.py", line 1574, in _sync
    threads)
  File "/usr/lib/python2.6/site-packages/pulp/server/api/repo_sync.py", line 116, in sync
    progress_callback, max_speed, threads)
  File "/usr/lib/python2.6/site-packages/pulp/server/api/repo_sync.py", line 453, in sync
    report = self.yum_repo_grinder.fetchYumRepo(store_path, callback=progress_callback)
  File "/usr/lib/python2.6/site-packages/grinder/RepoFetch.py", line 391, in fetchYumRepo
    self.yumFetch.setupRepo()
  File "/usr/lib/python2.6/site-packages/grinder/RepoFetch.py", line 65, in setupRepo
    self.repo.basecachedir = self.makeTempDir()
  File "/usr/lib/python2.6/site-packages/grinder/BaseFetch.py", line 91, in makeTempDir
    return tempfile.mkdtemp()
  File "/usr/lib64/python2.6/tempfile.py", line 318, in mkdtemp
    _os.mkdir(file, 0700)
OSError: [Errno 28] No space left on device: '/tmp/tmpjTQHvq'

2011-05-25 14:32:00,046 [WARNING][Thread-38] schedule() @ task.py:270 - Task 43a96db3-86a6-11e0-b43a-525400da69d0: RepoApi._sync(rhui-1.1-5Server-i386, synchronizer=<pulp.server.api.repo_sync.YumSynchronizer object at 0x7fbc30b9a410>) missed 1 scheduled runs
2011-05-25 14:32:00,052 [ERROR][Thread-39] failed() @ task.py:393 - Task id:43a91238-86a6-11e0-b438-525400da69d0, method_name:_sync:
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/pulp/server/tasking/task.py", line 336, in run
    result = self.callable(

Comment 1 Jay Dobies 2011-05-25 12:50:43 UTC
Can you reproduce this regularly? I wonder if this is a side effect of hardcore testing.

Comment 2 Sachin Ghai 2011-05-25 13:16:13 UTC
I noticed this today only and specially after updating the setup. 

Moreover the timestamps of these files shows that all files are of same date(i.e. today's)


[root@dhcp193-157 mongodb]# ll -h
total 6.0G
-rwxr-xr-x. 1 mongodb mongodb    6 May 25 13:38 mongod.lock
-rw-------. 1 mongodb mongodb  64M May 25 16:32 pulp_database.0
-rw-------. 1 mongodb mongodb 128M May 25 16:24 pulp_database.1
-rw-------. 1 mongodb mongodb 256M May 25 14:00 pulp_database.2
-rw-------. 1 mongodb mongodb 512M May 25 14:01 pulp_database.3
-rw-------. 1 mongodb mongodb 1.0G May 25 14:01 pulp_database.4
-rw-------. 1 mongodb mongodb 2.0G May 25 14:01 pulp_database.5
-rw-------. 1 mongodb mongodb 2.0G May 25 11:59 pulp_database.6
-rw-------. 1 mongodb mongodb  16M May 25 14:00 pulp_database.ns

I'll clean the few logs files and will try to reproduce it again.

Comment 3 wes hayutin 2011-06-06 21:15:20 UTC
added to rhui-20 tracker

Comment 4 Sachin Ghai 2011-06-08 11:58:13 UTC
Today I noticed the mongodb size; its around 4GB on my new setup. I configured this setup on last friday ( 4 days back).

[root@dhcp193-79 mongodb]# ll -h
total 4.0G
-rwxr-xr-x. 1 mongodb mongodb    6 Jun  8 16:30 mongod.lock
-rw-------. 1 mongodb mongodb  64M Jun  8 17:23 pulp_database.0
-rw-------. 1 mongodb mongodb 128M Jun  8 16:58 pulp_database.1
-rw-------. 1 mongodb mongodb 256M Jun  8 16:58 pulp_database.2
-rw-------. 1 mongodb mongodb 512M Jun  8 17:23 pulp_database.3
-rw-------. 1 mongodb mongodb 1.0G Jun  8 16:58 pulp_database.4
-rw-------. 1 mongodb mongodb 2.0G Jun  7 13:44 pulp_database.5
-rw-------. 1 mongodb mongodb  16M Jun  8 17:23 pulp_database.ns

Comment 5 John Matthews 2011-07-14 13:41:44 UTC
Sachin would you provide me with login information so I can examine a mongo instance where you see this behavior?

Additionally, when you see this behavior if you would include extra information on the database usage it would be helpful.  Running the below and including the output in a BZ would help us understand which collection is taking up a lot of room.

1) Run 'mongo' CLI
2) Execute 'use pulp_database`
3) Execute 'db.getCollectionNames().forEach(function(c) {s = db[c].stats(); printjson(s) });'
Include the output from the above

Additionally run
4) Execute 'db.stats()'  and include that output
5) 'exit;'

Comment 6 John Matthews 2011-07-14 17:41:12 UTC
Sachin,

I checked in a script to pulp git (pulp/playpen/mongodb/display_coll_stats.js).
Please run this script and attach the output.  It will sort the collections based on storage size usage.

To run the script, you will need 2 files:  
 collect_db_file_usage_stats.sh and display_coll_stats.js

Run the sh script: collect_db_file_usage_stats.sh and save the output.

 
Links to the 2 files needed are here:
http://git.fedorahosted.org/git/?p=pulp.git;a=blob;f=playpen/mongodb/collect_db_file_usage_stats.sh;h=d6bb58c6e17edc016a5d106438f04d1e014caf7e;hb=HEAD

http://git.fedorahosted.org/git/?p=pulp.git;a=blob;f=playpen/mongodb/display_coll_stats.js;h=cd5b0edcc18c98826ae80a2d75089c7649e5c67e;hb=HEAD

An example of what the output should look like is here:
https://fedorahosted.org/pulp/wiki/DataStorage/SizeAnalysis

Comment 7 Jason Connor 2011-07-14 21:06:26 UTC
As of commit 7dee46cc3b5ce12a7ddb7ba778d15093a68a8033,
pulp now has a new configuration option:

[auditing]
audit_events: false

This toggles all of auditing. The default setting is false or "off". Setting the audit_events option to true will turn auditing, both to the file and to the db on.

The server will need to be restarted for the change to take effect.

Comment 8 Jay Dobies 2011-07-14 21:10:43 UTC
Fixed in Pulp .208 and higher.

Comment 9 Sachin Ghai 2011-07-27 05:30:52 UTC
Verified with pulp 0.211.

Seems that issue has been fixed. I started a sync of multiple repos. And sync ran successfully, It has downloaded more than 20,000 packages from CDN. 

Following output list the count of downloaded rpms:

-------------------------------------
Enter value (1-6) to toggle selection, 'c' to confirm selections, or '?' for more commands: c


Name:               Red Hat Enterprise Linux Server (RPMs) (5Server-x86_64)
Type:               Red Hat
Relative Path:      content/dist/rhel/rhui/server/5Server/x86_64/os
Package Count:      11868
Last Sync:          07-27-2011 09:25
Next Sync:          07-27-2011 12:35

Name:               Red Hat Enterprise Linux Server 6 Optional Updates (RPMs) (6Server-x86_64)
Type:               Red Hat
Relative Path:      content/dist/rhel/rhui/server-6/updates/6Server/x86_64/optional/os
Package Count:      3538
Last Sync:          07-27-2011 06:43
Next Sync:          07-27-2011 12:12

Name:               Red Hat Enterprise Linux Server 6 Releases (RPMs) (6Server-x86_64)
Type:               Red Hat
Relative Path:      content/dist/rhel/rhui/server-6/releases/6Server/x86_64/os
Package Count:      3391
Last Sync:          07-27-2011 09:19
Next Sync:          07-27-2011 12:12

Name:               Red Hat Enterprise Linux Server 6 Updates (RPMs) (6Server-x86_64)
Type:               Red Hat
Relative Path:      content/dist/rhel/rhui/server-6/updates/6Server/x86_64/os
Package Count:      5164
Last Sync:          07-27-2011 06:44
Next Sync:          07-27-2011 12:13

---------------------------------------------------

[root@dhcp201-127 mongodb]# ll -h
total 465M
-rwxr-xr-x. 1 mongodb mongodb    5 Jul 26 16:45 mongod.lock
-rw-------. 1 mongodb mongodb  64M Jul 27 09:25 pulp_database.0
-rw-------. 1 mongodb mongodb 128M Jul 27 09:25 pulp_database.1
-rw-------. 1 mongodb mongodb 256M Jul 27 02:45 pulp_database.2
-rw-------. 1 mongodb mongodb  16M Jul 27 09:25 pulp_database.ns
drwxr-xr-x. 2 mongodb mongodb 4.0K Jul 26 16:48 _tmp
[root@dhcp201-127 mongodb]# rpm -qa | grep pulp
pulp-0.0.211-1.el6.noarch
pulp-client-0.0.211-1.el6.noarch
pulp-common-0.0.211-1.el6.noarch
[root@dhcp201-127 mongodb]# 


Here db size is 465M, even after syncing multiple repos. Since this defect is not reproducible in the new build so moving this verified.

Comment 10 wes hayutin 2011-08-01 21:39:54 UTC
moving to release pending

Comment 11 wes hayutin 2012-05-31 12:55:18 UTC
closing out, product released