Bug 707559 - mongodb files occupying large disk space (around 6G)
Summary: mongodb files occupying large disk space (around 6G)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Update Infrastructure for Cloud Providers
Classification: Red Hat
Component: Tools
Version: 2.0
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: ---
: ---
Assignee: Jason Connor
QA Contact: wes hayutin
URL:
Whiteboard:
Depends On:
Blocks: rhui-20
TreeView+ depends on / blocked
 
Reported: 2011-05-25 11:57 UTC by Sachin Ghai
Modified: 2014-03-31 01:39 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-05-31 12:55:18 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Sachin Ghai 2011-05-25 11:57:36 UTC
Description of problem:
Upgraded the RHUI2.0 setup as per following builds:

rhui-tools: 2.0.22 to 2.0.25
pulp: 0.171 to 0.179

Version-Release number of selected component (if applicable):
Upgrade setup:

rhui-tools: 2.0.22 to 2.0.25
pulp: 0.171 to 0.179

How reproducible:


Steps to Reproduce:
1. Upgraded the RHUI 2.0 setup 

  
Actual results:
[root@dhcp193-157 mongodb]# ll -h
total 6.0G
-rwxr-xr-x. 1 mongodb mongodb    6 May 25 13:38 mongod.lock
-rw-------. 1 mongodb mongodb  64M May 25 16:32 pulp_database.0
-rw-------. 1 mongodb mongodb 128M May 25 16:24 pulp_database.1
-rw-------. 1 mongodb mongodb 256M May 25 14:00 pulp_database.2
-rw-------. 1 mongodb mongodb 512M May 25 14:01 pulp_database.3
-rw-------. 1 mongodb mongodb 1.0G May 25 14:01 pulp_database.4
-rw-------. 1 mongodb mongodb 2.0G May 25 14:01 pulp_database.5
-rw-------. 1 mongodb mongodb 2.0G May 25 11:59 pulp_database.6
-rw-------. 1 mongodb mongodb  16M May 25 14:00 pulp_database.ns

The size of logs files are of 2G

Expected results:
The log file size shouldn't be od this much size

Additional info:[root@dhcp193-157 ~]# tail -f /var/log/pulp/db.log 
migration to data model version 13 started
updating url source type for repo rhui-1.2-5Server-x86_64
updating url source type for repo rhui-1.2-5Server-i386
updating url source type for repo rhui-1.1-5Server-x86_64
updating url source type for repo rhui-1.1-5Server-i386
updating url source type for repo rhel-server-6-updates-6Server-x86_64
updating url source type for repo rhel-server-6-releases-6Server-x86_64
migration to data model version 13 complete
migration to data model version 14 started
migration to data model version 14 complete

[root@dhcp193-157 ~]#tail -f /var/log/pulp/pulp.log
2011-05-25 14:32:00,022 [INFO][Thread-39] sync() @ repo_sync.py:438 - Limiting download speed to 0 KB/sec per thread. [15] threads will be used
2011-05-25 14:32:00,025 [INFO][Thread-38] run() @ task.py:352 - Task id:43a96db3-86a6-11e0-b43a-525400da69d0, method_name:_sync Exception: [Errno 28] No space left on device: '/tmp/tmpjTQHvq'
2011-05-25 14:32:00,029 [INFO][Thread-39] run() @ task.py:352 - Task id:43a91238-86a6-11e0-b438-525400da69d0, method_name:_sync Exception: [Errno 28] No space left on device: '/var/lib/pulp//repos/content/dist/rhel/rhui/server/5Server/x86_64/rhui/1.1/os//repodata.new'
2011-05-25 14:32:00,045 [ERROR][Thread-38] failed() @ task.py:393 - Task id:43a96db3-86a6-11e0-b43a-525400da69d0, method_name:_sync:
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/pulp/server/tasking/task.py", line 336, in run
    result = self.callable(*self.args, **self.kwargs)
  File "/usr/lib/python2.6/site-packages/pulp/server/api/repo.py", line 1574, in _sync
    threads)
  File "/usr/lib/python2.6/site-packages/pulp/server/api/repo_sync.py", line 116, in sync
    progress_callback, max_speed, threads)
  File "/usr/lib/python2.6/site-packages/pulp/server/api/repo_sync.py", line 453, in sync
    report = self.yum_repo_grinder.fetchYumRepo(store_path, callback=progress_callback)
  File "/usr/lib/python2.6/site-packages/grinder/RepoFetch.py", line 391, in fetchYumRepo
    self.yumFetch.setupRepo()
  File "/usr/lib/python2.6/site-packages/grinder/RepoFetch.py", line 65, in setupRepo
    self.repo.basecachedir = self.makeTempDir()
  File "/usr/lib/python2.6/site-packages/grinder/BaseFetch.py", line 91, in makeTempDir
    return tempfile.mkdtemp()
  File "/usr/lib64/python2.6/tempfile.py", line 318, in mkdtemp
    _os.mkdir(file, 0700)
OSError: [Errno 28] No space left on device: '/tmp/tmpjTQHvq'

2011-05-25 14:32:00,046 [WARNING][Thread-38] schedule() @ task.py:270 - Task 43a96db3-86a6-11e0-b43a-525400da69d0: RepoApi._sync(rhui-1.1-5Server-i386, synchronizer=<pulp.server.api.repo_sync.YumSynchronizer object at 0x7fbc30b9a410>) missed 1 scheduled runs
2011-05-25 14:32:00,052 [ERROR][Thread-39] failed() @ task.py:393 - Task id:43a91238-86a6-11e0-b438-525400da69d0, method_name:_sync:
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/pulp/server/tasking/task.py", line 336, in run
    result = self.callable(

Comment 1 Jay Dobies 2011-05-25 12:50:43 UTC
Can you reproduce this regularly? I wonder if this is a side effect of hardcore testing.

Comment 2 Sachin Ghai 2011-05-25 13:16:13 UTC
I noticed this today only and specially after updating the setup. 

Moreover the timestamps of these files shows that all files are of same date(i.e. today's)


[root@dhcp193-157 mongodb]# ll -h
total 6.0G
-rwxr-xr-x. 1 mongodb mongodb    6 May 25 13:38 mongod.lock
-rw-------. 1 mongodb mongodb  64M May 25 16:32 pulp_database.0
-rw-------. 1 mongodb mongodb 128M May 25 16:24 pulp_database.1
-rw-------. 1 mongodb mongodb 256M May 25 14:00 pulp_database.2
-rw-------. 1 mongodb mongodb 512M May 25 14:01 pulp_database.3
-rw-------. 1 mongodb mongodb 1.0G May 25 14:01 pulp_database.4
-rw-------. 1 mongodb mongodb 2.0G May 25 14:01 pulp_database.5
-rw-------. 1 mongodb mongodb 2.0G May 25 11:59 pulp_database.6
-rw-------. 1 mongodb mongodb  16M May 25 14:00 pulp_database.ns

I'll clean the few logs files and will try to reproduce it again.

Comment 3 wes hayutin 2011-06-06 21:15:20 UTC
added to rhui-20 tracker

Comment 4 Sachin Ghai 2011-06-08 11:58:13 UTC
Today I noticed the mongodb size; its around 4GB on my new setup. I configured this setup on last friday ( 4 days back).

[root@dhcp193-79 mongodb]# ll -h
total 4.0G
-rwxr-xr-x. 1 mongodb mongodb    6 Jun  8 16:30 mongod.lock
-rw-------. 1 mongodb mongodb  64M Jun  8 17:23 pulp_database.0
-rw-------. 1 mongodb mongodb 128M Jun  8 16:58 pulp_database.1
-rw-------. 1 mongodb mongodb 256M Jun  8 16:58 pulp_database.2
-rw-------. 1 mongodb mongodb 512M Jun  8 17:23 pulp_database.3
-rw-------. 1 mongodb mongodb 1.0G Jun  8 16:58 pulp_database.4
-rw-------. 1 mongodb mongodb 2.0G Jun  7 13:44 pulp_database.5
-rw-------. 1 mongodb mongodb  16M Jun  8 17:23 pulp_database.ns

Comment 5 John Matthews 2011-07-14 13:41:44 UTC
Sachin would you provide me with login information so I can examine a mongo instance where you see this behavior?

Additionally, when you see this behavior if you would include extra information on the database usage it would be helpful.  Running the below and including the output in a BZ would help us understand which collection is taking up a lot of room.

1) Run 'mongo' CLI
2) Execute 'use pulp_database`
3) Execute 'db.getCollectionNames().forEach(function(c) {s = db[c].stats(); printjson(s) });'
Include the output from the above

Additionally run
4) Execute 'db.stats()'  and include that output
5) 'exit;'

Comment 6 John Matthews 2011-07-14 17:41:12 UTC
Sachin,

I checked in a script to pulp git (pulp/playpen/mongodb/display_coll_stats.js).
Please run this script and attach the output.  It will sort the collections based on storage size usage.

To run the script, you will need 2 files:  
 collect_db_file_usage_stats.sh and display_coll_stats.js

Run the sh script: collect_db_file_usage_stats.sh and save the output.

 
Links to the 2 files needed are here:
http://git.fedorahosted.org/git/?p=pulp.git;a=blob;f=playpen/mongodb/collect_db_file_usage_stats.sh;h=d6bb58c6e17edc016a5d106438f04d1e014caf7e;hb=HEAD

http://git.fedorahosted.org/git/?p=pulp.git;a=blob;f=playpen/mongodb/display_coll_stats.js;h=cd5b0edcc18c98826ae80a2d75089c7649e5c67e;hb=HEAD

An example of what the output should look like is here:
https://fedorahosted.org/pulp/wiki/DataStorage/SizeAnalysis

Comment 7 Jason Connor 2011-07-14 21:06:26 UTC
As of commit 7dee46cc3b5ce12a7ddb7ba778d15093a68a8033,
pulp now has a new configuration option:

[auditing]
audit_events: false

This toggles all of auditing. The default setting is false or "off". Setting the audit_events option to true will turn auditing, both to the file and to the db on.

The server will need to be restarted for the change to take effect.

Comment 8 Jay Dobies 2011-07-14 21:10:43 UTC
Fixed in Pulp .208 and higher.

Comment 9 Sachin Ghai 2011-07-27 05:30:52 UTC
Verified with pulp 0.211.

Seems that issue has been fixed. I started a sync of multiple repos. And sync ran successfully, It has downloaded more than 20,000 packages from CDN. 

Following output list the count of downloaded rpms:

-------------------------------------
Enter value (1-6) to toggle selection, 'c' to confirm selections, or '?' for more commands: c


Name:               Red Hat Enterprise Linux Server (RPMs) (5Server-x86_64)
Type:               Red Hat
Relative Path:      content/dist/rhel/rhui/server/5Server/x86_64/os
Package Count:      11868
Last Sync:          07-27-2011 09:25
Next Sync:          07-27-2011 12:35

Name:               Red Hat Enterprise Linux Server 6 Optional Updates (RPMs) (6Server-x86_64)
Type:               Red Hat
Relative Path:      content/dist/rhel/rhui/server-6/updates/6Server/x86_64/optional/os
Package Count:      3538
Last Sync:          07-27-2011 06:43
Next Sync:          07-27-2011 12:12

Name:               Red Hat Enterprise Linux Server 6 Releases (RPMs) (6Server-x86_64)
Type:               Red Hat
Relative Path:      content/dist/rhel/rhui/server-6/releases/6Server/x86_64/os
Package Count:      3391
Last Sync:          07-27-2011 09:19
Next Sync:          07-27-2011 12:12

Name:               Red Hat Enterprise Linux Server 6 Updates (RPMs) (6Server-x86_64)
Type:               Red Hat
Relative Path:      content/dist/rhel/rhui/server-6/updates/6Server/x86_64/os
Package Count:      5164
Last Sync:          07-27-2011 06:44
Next Sync:          07-27-2011 12:13

---------------------------------------------------

[root@dhcp201-127 mongodb]# ll -h
total 465M
-rwxr-xr-x. 1 mongodb mongodb    5 Jul 26 16:45 mongod.lock
-rw-------. 1 mongodb mongodb  64M Jul 27 09:25 pulp_database.0
-rw-------. 1 mongodb mongodb 128M Jul 27 09:25 pulp_database.1
-rw-------. 1 mongodb mongodb 256M Jul 27 02:45 pulp_database.2
-rw-------. 1 mongodb mongodb  16M Jul 27 09:25 pulp_database.ns
drwxr-xr-x. 2 mongodb mongodb 4.0K Jul 26 16:48 _tmp
[root@dhcp201-127 mongodb]# rpm -qa | grep pulp
pulp-0.0.211-1.el6.noarch
pulp-client-0.0.211-1.el6.noarch
pulp-common-0.0.211-1.el6.noarch
[root@dhcp201-127 mongodb]# 


Here db size is 465M, even after syncing multiple repos. Since this defect is not reproducible in the new build so moving this verified.

Comment 10 wes hayutin 2011-08-01 21:39:54 UTC
moving to release pending

Comment 11 wes hayutin 2012-05-31 12:55:18 UTC
closing out, product released


Note You need to log in before you can comment on or make changes to this bug.