Bug 733507

Summary: ODS: raw data collection infrastructure to NoSQL database
Product: Red Hat Enterprise MRG
Reporter: Pete MacKinnon <pmackinn>
Component: condor-plumage
Assignee: Pete MacKinnon <pmackinn>
Status: CLOSED ERRATA
QA Contact: Lubos Trilety <ltrilety>
Severity: unspecified
Docs Contact:
Priority: high
Version: Development
CC: ltrilety, matt, mkudlej, tstclair
Target Milestone: 2.1
Keywords: FutureFeature, TechPreview
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: condor-7.6.5-0.3
Doc Type: Technology Preview
Doc Text:
Previously, existing statistic collection facilities (ViewCollector) used flat files that were only usable through the condor_stats tool. With this update, a new view collector plug-in has been developed to provide a new operational data store capability for Grid using a NoSQL database. The plug-in writes classad data (Machine, Submitter) to a mongodb NoSQL database. Grid machine and submitter statistics are now generally available to a variety of mongodb programming language drivers for C++, Python, Ruby, and other languages. This Technology Preview is only supported on RHEL6 at this time. Technology Preview features are not currently supported under Red Hat Enterprise Linux subscription services, may not be functionally complete, and are generally not suitable for production use. However, these features are included as a customer convenience and to provide the technologies with wider exposure. Customers may find these features useful in non-production environments, and can provide feedback and functionality suggestions prior to their transition to fully supported status. Erratas will be provided for high-priority security issues. During its development additional components of a Technology Preview feature may become available to the public for testing. It is the intention of Red Hat to fully support Technology Preview features in a future release.
Last Closed: 2012-01-23 17:28:45 UTC
Bug Depends On: 742007    
Bug Blocks: 743350, 755648    

Description Pete MacKinnon 2011-08-25 20:20:27 UTC
Scope: sufficient functionality to provide reporting to a designated
internal client (i.e., cumin)

Data Reports:
- same as what is provided today by a view collector and condor_stats (as a start)

Time series:
- hourly (e.g., what was a given Startd's load over the past 5 hours?)
- daily
- weekly (e.g., show me the job submissions and their status for
submitter X over the previous 3 weeks)
- monthly
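
The granularities listed above can be illustrated with a minimal sketch. It assumes each stored record carries a Python datetime. The function name `bucket_key` and the sample timestamps are hypothetical and are not part of Plumage.

```python
from collections import Counter
from datetime import datetime


def bucket_key(ts, granularity):
    """Return a string key that groups a timestamp by the given granularity."""
    if granularity == "hourly":
        return ts.strftime("%Y-%m-%d %H:00")
    if granularity == "daily":
        return ts.strftime("%Y-%m-%d")
    if granularity == "weekly":
        # ISO year and week number, so weeks spanning a year boundary group correctly.
        iso_year, iso_week, _ = ts.isocalendar()
        return "%d-W%02d" % (iso_year, iso_week)
    if granularity == "monthly":
        return ts.strftime("%Y-%m")
    raise ValueError("unknown granularity: %r" % granularity)


# Hypothetical sample timestamps, counted per hour.
samples = [
    datetime(2011, 9, 29, 14, 2),
    datetime(2011, 9, 29, 14, 3),
    datetime(2011, 9, 29, 15, 4),
]
hourly_counts = Counter(bucket_key(ts, "hourly") for ts in samples)
```

Grouping by a derived key keeps the raw records untouched, so the same data can answer hourly and weekly questions without being stored twice.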

Deployment Capabilities:
- plug-in that can be dynamically loaded into a ViewServer-configured Collector

Unresolved:
- how to model data of dynamic slots?

Comment 1 Martin Kudlej 2011-09-29 12:12:38 UTC
How can we test this bug? How can we set up this new feature?

Comment 2 Pete MacKinnon 2011-09-30 12:19:06 UTC
Verification is based on the information in the contrib README, using condor_q and condor_stats.

Comment 3 Matthew Farrellee 2011-09-30 12:22:13 UTC
Snapshot of src/condor_contrib/plumage/README at d982bb40; refer to the source repo or RPM for the current document:

Plumage - NoSQL Operational Data Store for Condor
.................................................

Overview
========
This contrib provides components for an ODS capability in Condor using the
NoSQL database mongodb. The Quill contrib is a similar effort but is based on
an RDBMS model. The essential design of Plumage is to capture the data traffic
emitted by the various Condor daemons and convert each ClassAd into a document
in a mongodb collection. Once converted, these ClassAd records can be queried
simply and directly on any attribute.
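
As an illustration of the ClassAd-to-document idea, here is a minimal sketch in Python. It is not the plugin's code (the plugin is a C++ collector plugin). It assumes the simple `Attr = value` text form of a ClassAd, and the names `classad_to_document` and `_convert` are hypothetical.

```python
def _convert(raw):
    """Map a raw ClassAd value string to a Python value (simplified)."""
    if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
        return raw[1:-1]  # quoted string literal
    if raw.lower() in ("true", "false"):
        return raw.lower() == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw  # anything else (e.g. an expression) is kept as text


def classad_to_document(ad_text):
    """Turn 'Attr = value' lines into a flat dict suitable for a document store."""
    doc = {}
    for line in ad_text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        # Split on the first '=' only, so expressions containing '==' survive.
        name, _, raw = line.partition("=")
        doc[name.strip()] = _convert(raw.strip())
    return doc
```

Because every attribute becomes a top-level key, a query on any attribute needs no schema change, which is the property the overview describes.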

The initial focus of this contrib is to embed a Plumage ODS plugin inside a
view collector to capture the raw ClassAds for the Machine and Submitter types.
The plugin also operates on a timer: at a configurable interval it takes a
snapshot of essential attributes and writes those values to a separate
mongodb collection.
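
The snapshot step might look roughly like the following sketch. The attribute list is an assumption based on the resource fields this README mentions later (arch, OS, keyboard idle time, load average, state). The key `ts` and the function name `take_snapshot` are hypothetical.

```python
from datetime import datetime, timezone

# Assumed set of "essential" machine attributes; the plugin's real list may differ.
ESSENTIAL = ("Name", "Arch", "OpSys", "KeyboardIdle", "LoadAvg", "State")


def take_snapshot(raw_ad, now=None):
    """Copy the essential attributes of one raw ad and stamp them with a time."""
    snapshot = {key: raw_ad[key] for key in ESSENTIAL if key in raw_ad}
    snapshot["ts"] = now or datetime.now(timezone.utc)  # hypothetical field name
    return snapshot
```

Keeping snapshots in their own collection means the time series grows by one small document per interval, while the raw collection holds the full ClassAds.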

Installation
============
You will need to install mongodb and the C++ driver (1.6.4 minimum) as well as
dependencies:
- js
- boost
- pymongo

Plumage can be included in a Condor source build using the following variables 
when cmake is invoked:

	-DWANT_CONTRIB:BOOL=TRUE -DWITH_MANAGEMENT:BOOL=TRUE -DWITH_PLUMAGE:BOOL=TRUE

Configuration
=============
This initial release is a plugin to the existing view collector, so much of the
Condor configuration is that of a standard view collector.

################
# View Server
################
VIEW_SERVER = $(COLLECTOR)
VIEW_SERVER_ARGS = -f -p 12345 -local-name VIEW_SERVER
VIEW_SERVER_ENVIRONMENT = "_CONDOR_COLLECTOR_LOG=$(LOG)/ViewServerLog"
COLLECTOR.CONDOR_VIEW_HOST = $(CONDOR_HOST):12345
# make sure the view server doesn't point at itself
VIEW_SERVER.CONDOR_VIEW_HOST = 
VIEW_SERVER.KEEP_POOL_HISTORY = True
VIEW_SERVER.SAMPLING_INTERVAL=20
VIEW_SERVER.PLUGINS = $(LIBEXEC)/ODSCollectorPlugin.so
CONDOR_VIEW_CLASSAD_TYPES = Machine, Submitter
POOL_HISTORY_SAMPLING_INTERVAL = 60
UPDATE_INTERVAL = 300
# the following are defaults in code also
ODS_DB_HOST = localhost
ODS_DB_PORT =

plumage_stats.py
================

Plumage has a client tool with *similar* capabilities to the condor_stats tool.
It provides listings of submitters, resources and records over time spans. The 
submitter records include point-in-time snapshots of running, held and idle jobs.
The resource records show arch, OS, keyboard idle time, load average and state.

Usage: plumage_stats.py [options]

Query Condor ODS for statistics

Options:
  -h, --help            show this help message and exit
  -v, --verbose         enable logging
  -s SERVER, --server=SERVER
                        mongodb database server location: e.g., somehost,
                        localhost:2011
  --u=USER, --user=USER
                        stats for a single submitter
  --r=RESOURCE, --resource=RESOURCE
                        stats for a single resource
  --f=START, --from=START
                        records from datetime
  --t=END, --to=END     records to datetime
  --ul, --userlist      list all submitters
  --ugl, --usergrouplist
                        list all submitter groups
  --rl, --resourcelist  list all resources

Use Cases
---------

Find the names of all submitters.
$ ./plumage_stats.py --ul
pmackinn

Find the names of all user groups (i.e., users plus submitter machine).
$ ./plumage_stats.py --ugl
pmackinn/milo.usersys.redhat.com

Find the names of all resources (aka startds, slots).
$ ./plumage_stats.py --rl
slot1.redhat.com
slot2.redhat.com

Print the stats for any username that starts with 'pmackinn' for the previous hour (default time lookback).
$ ./plumage_stats.py --u pmackinn
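
A minimal sketch of how a one-hour default lookback could be derived from optional --from/--to values. The datetime format matches the examples in this README. The function name `time_window` and the rule that a missing start falls one hour before the end are assumptions, not taken from the tool's source.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M"  # format used by the --from/--to examples


def time_window(start=None, end=None, now=None, lookback=timedelta(hours=1)):
    """Return (start, end) datetimes, defaulting to the hour before 'now'."""
    now = now or datetime.now()
    end_dt = datetime.strptime(end, FMT) if end else now
    # Assumption: with no explicit start, look back one hour from the end.
    start_dt = datetime.strptime(start, FMT) if start else end_dt - lookback
    return start_dt, end_dt
```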

Print the stats for user 'pmackinn' over a given datetime range.
$ ./plumage_stats.py --u pmackinn --from '2011-09-29 14:02' --to '2011-09-29 14:05'
pmackinn 	2011-09-29 14:02:19.239000 	2 	0 	0
pmackinn 	2011-09-29 14:03:19.235000 	2 	0 	0
pmackinn 	2011-09-29 14:04:19.469000 	0 	0 	0
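
A query like the one above could be expressed as a mongodb filter document. The sketch below only builds the filter. `$regex`, `$gte` and `$lt` are standard mongodb query operators, but the field names `sn` and `ts` and the function name `submitter_filter` are hypothetical, since this README does not state the schema.

```python
import re
from datetime import datetime


def submitter_filter(user_prefix, start, end):
    """Build a filter for submitters whose name starts with user_prefix
    and whose records fall in the half-open range [start, end)."""
    return {
        "sn": {"$regex": "^" + re.escape(user_prefix)},  # hypothetical field
        "ts": {"$gte": start, "$lt": end},               # hypothetical field
    }


query = submitter_filter(
    "pmackinn",
    datetime(2011, 9, 29, 14, 2),
    datetime(2011, 9, 29, 14, 5),
)
```

With pymongo, such a dictionary could be passed to a collection's find() method. The database and collection names are not given in this README, so none are shown here.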

Print the stats for a resource name that contains 'milo' for the previous hour (default time lookback).
$ ./plumage_stats.py --r milo

Print the stats for resource 'milo' over a given datetime range.
$ ./plumage_stats.py --r milo --from '2011-09-29 14:02' --to '2011-09-29 14:05'
slot1.redhat.com 	2011-09-29 14:02:19.240000 	INTEL/LINUX 	205 	0.24 	Claimed
slot2.redhat.com 	2011-09-29 14:02:19.240000 	INTEL/LINUX 	533 	0.00 	Claimed
slot1.redhat.com 	2011-09-29 14:03:19.236000 	INTEL/LINUX 	12 		0.33 	Claimed
slot2.redhat.com 	2011-09-29 14:03:19.236000 	INTEL/LINUX 	603 	0.00 	Claimed
slot2.redhat.com 	2011-09-29 14:04:19.470000 	INTEL/LINUX 	653 	0.00 	Unclaimed
slot1.redhat.com 	2011-09-29 14:04:19.471000 	INTEL/LINUX 	30 		0.28 	Unclaimed
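
For post-processing, the resource rows above can be parsed with a short sketch like the following. The column meanings (name, timestamp, arch/OS, keyboard idle time, load average, state) are inferred from the description of resource records earlier in this README. The function name `parse_resource_row` and the dictionary keys are hypothetical.

```python
from datetime import datetime


def parse_resource_row(line):
    """Parse one tab-separated resource row into a dict."""
    # Some rows carry a doubled tab, so drop empty fields after stripping.
    fields = [f.strip() for f in line.split("\t") if f.strip()]
    name, stamp, arch_os, kbd_idle, load_avg, state = fields
    arch, opsys = arch_os.split("/", 1)
    return {
        "name": name,
        "timestamp": datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f"),
        "arch": arch,
        "opsys": opsys,
        "keyboard_idle": int(kbd_idle),
        "load_avg": float(load_avg),
        "state": state,
    }
```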

Comment 4 Pete MacKinnon 2011-10-04 20:03:11 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: Desire for a new operational data store capability for Grid using a NoSQL database.
Consequence: Existing statistic collection facilities (ViewCollector) use flat files that are only usable through condor_stats tool.
Change: New view collector plugin was developed to write classad data (Machine, Submitter) to a mongodb NoSQL database.
Result: Grid machine and submitter statistics are now generally available to a variety of mongodb programming language drivers (C++, Python, Ruby, etc.).

Comment 6 Pete MacKinnon 2011-10-24 14:18:01 UTC
    Technical note updated.
    
    Diffed Contents:
@@ -1,4 +1,10 @@
 Cause: Desire for a new operational data store capability for Grid using a NoSQL database.
 Consequence: Existing statistic collection facilities (ViewCollector) use flat files that are only usable through condor_stats tool.
 Change: New view collector plugin was developed to write classad data (Machine, Submitter) to a mongodb NoSQL database.
-Result: Grid machine and submitter statistics are now generally available to a variety of mongodb programming language drivers (C++, Python, Ruby, etc.).
+Result: Grid machine and submitter statistics are now generally available to a variety of mongodb programming language drivers (C++, Python, Ruby, etc.).
+
+Technology Preview features are not currently supported under Red Hat Enterprise Linux subscription services, may not be functionally complete, and are generally not suitable for production use. However, these features are included as a customer convenience and to provide the technologies with wider exposure.
+
+Customers may find these features useful in non-production environments, and can provide feedback and functionality suggestions prior to their transition to fully supported status. Erratas will be provided for high-priority security issues.
+
+During its development additional components of a Technology Preview feature may become available to the public for testing. It is the intention of Red Hat to fully support Technology Preview features in a future release.

Comment 7 Pete MacKinnon 2011-10-24 14:22:14 UTC
    Technical note updated.
    
    Diffed Contents:
@@ -3,6 +3,8 @@
 Change: New view collector plugin was developed to write classad data (Machine, Submitter) to a mongodb NoSQL database.
 Result: Grid machine and submitter statistics are now generally available to a variety of mongodb programming language drivers (C++, Python, Ruby, etc.).
 
+This Technology Preview is only supported on RHEL6 at this time.
+
 Technology Preview features are not currently supported under Red Hat Enterprise Linux subscription services, may not be functionally complete, and are generally not suitable for production use. However, these features are included as a customer convenience and to provide the technologies with wider exposure.
 
 Customers may find these features useful in non-production environments, and can provide feedback and functionality suggestions prior to their transition to fully supported status. Erratas will be provided for high-priority security issues.

Comment 8 Lubos Trilety 2011-10-24 14:29:28 UTC
Tested on:
condor-7.6.5-0.2
condor-plumage-7.6.5-0.2

The view server dumps a stack trace when it tries to open
/var/lib/condor/ViewHist/viewhist0.0.new.

# cat /var/log/condor/ViewServerLog
...
10/24/11 17:45:18 Accumulating data: Time=1319471118
10/24/11 17:45:18 Openning file /var/lib/condor/ViewHist/viewhist0.0.new
10/24/11 17:45:18 Could not open data file
/var/lib/condor/ViewHist/viewhist0.0.new for appending!!! errno=13
10/24/11 17:45:18 ERROR "Could not open data file appending!!!" at line 739 in
file /builddir/build/BUILD/condor-7.6.3/src/condor_collector.V6/view_server.cpp
Stack dump for process 22932 at timestamp 1319471118 (13 frames)
condor_collector(dprintf_dump_stack+0x44)[0x811d354]
condor_collector[0x81235c7]
[0x252400]
[0x252416]
/lib/libc.so.6(gsignal+0x51)[0x68eaf1]
/lib/libc.so.6(abort+0x17a)[0x6903ca]
condor_collector(_EXCEPT_+0xb2)[0x811c472]
condor_collector(_ZN10ViewServer12WriteHistoryEv+0x86c)[0x80a108c]
condor_collector(_ZN12TimerManager7TimeoutEv+0x3bb)[0x80ca07b]
condor_collector(_ZN10DaemonCore6DriverEv+0x265)[0x80bf595]
condor_collector(main+0x13c2)[0x80afff2]
/lib/libc.so.6(__libc_start_main+0xe6)[0x67ace6]
condor_collector[0x809aa91]
...

# ls -l /var/lib/condor/ViewHist/
total 0

>>> ASSIGNED

Comment 9 Timothy St. Clair 2011-10-24 16:30:30 UTC
It seems this requires a permissions modification in the spec.
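
The errno=13 in the log above is consistent with a permissions problem: on Linux, 13 is EACCES (permission denied). A small sketch for decoding such codes from a log, using only the Python standard library. The helper name `describe_errno` is hypothetical.

```python
import errno
import os


def describe_errno(code):
    """Return the symbolic name and system message for an errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return "%s (%s)" % (name, os.strerror(code))


message = describe_errno(13)
```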

Comment 11 Tomas Capek 2011-11-17 16:58:02 UTC
    Technical note updated.
    
    Diffed Contents:
@@ -1,7 +1,4 @@
-Cause: Desire for a new operational data store capability for Grid using a NoSQL database.
-Consequence: Existing statistic collection facilities (ViewCollector) use flat files that are only usable through condor_stats tool.
-Change: New view collector plugin was developed to write classad data (Machine, Submitter) to a mongodb NoSQL database.
-Result: Grid machine and submitter statistics are now generally available to a variety of mongodb programming language drivers (C++, Python, Ruby, etc.).
+Previously, existing statistic collection facilities (ViewCollector) used flat files that were only usable through the condor_stats tool. With this update, a new view collector plug-in has been developed to provide a new operational data store capability for Grid using a NoSQL database. The plug-in writes classad data (Machine, Submitter) to a mongodb NoSQL database. Grid machine and submitter statistics are now generally available to a variety of mongodb programming language drivers for C++, Python, Ruby, and other languages.
 
 This Technology Preview is only supported on RHEL6 at this time.

Comment 12 Lubos Trilety 2011-11-23 14:04:33 UTC
Tested with:
condor-7.6.5-0.7
condor-plumage-7.6.5-0.7
mongodb-1.6.4-4
pymongo-1.9-8
js-1.70-8

Tested on:
RHEL6 i386, x86_64

Statistics were correctly collected in the new database.

>>> VERIFIED

Comment 13 errata-xmlrpc 2012-01-23 17:28:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2012-0045.html