Bug 872746
Summary: | RHN: Cleaning technical debt by enabling search (GSA) to crawl packages and errata | ||
---|---|---|---|
Product: | [Retired] Red Hat Network | Reporter: | Nicky <nbronson> |
Component: | RHN/Maintenance | Assignee: | Jared Blashka <jblashka> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Nicole Yancey <nyancey> |
Severity: | low | Docs Contact: | |
Priority: | high | ||
Version: | MR48 (AMS) | CC: | bsaylor, dspaldin, hdandala, ihands, jpazdziora, jturel, nbronson, nyancey, rbernlei, tjackson, vlaad |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2013-04-25 13:43:09 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Nicky
2012-11-02 21:35:57 UTC
US27779 added to AMS backlog.

Spoke with Mike Amburn. He wants new pages created that output information on RHN Errata & Packages. The pages should output XML formatted like the Errata & Package examples from the GSA feed structure document: https://docspace.corp.redhat.com/docs/DOC-125957

The pages should allow limiting the search results by supplying constraints on the content's update date (updatedBefore & updatedAfter). If no dates are supplied, the feed returns results for all Errata/Packages.

The page should probably be secured in some fashion to prevent abuse, as the underlying query is resource intensive and time consuming to run if unlimited. The method of securing the page is still undecided.

Gov Board update 1/10/13 - will be in MR45, which is scheduled for release in the 1st week of Feb.

Code reviewed, good. http://git.corp.redhat.com/cgit/rhn/rhn/commit/?id=a02269017bc95b08a5e58256c917c3510c82c89c

Feed pages are:
/rhn/gsa/PackageFeed.do
/rhn/gsa/ErrataFeed.do

Each page can have one, neither, or both of the following arguments in the URL:
updatedAfter
updatedBefore

Date format is YYYY-MM-DD for those arguments (ex. .../rhn/gsa/ErrataFeed.do?updatedAfter=2012-10-01&updatedBefore=2012-12-31). Maximum time span that is searchable is one year. If updatedAfter isn't supplied, the page sets updatedAfter=2000-01-01. If updatedBefore isn't supplied, the page sets updatedBefore to tomorrow's date.

Deployed on rhn.webdev.

Link to test run - https://tcms.engineering.redhat.com/run/54834/

verified on am-qa

Spoke with Ian Hands, who will be creating the crawler for this page. He requested a few changes.

He'd like to be able to specify the time down to the second for the updatedAfter & updatedBefore parameters.

Rather than limiting the searching to a two month time span, the results should be paginated (&page=0 fetches xml for first n records, &page=1 fetches the next n records after that, etc). When requesting a timespan, the crawler will increment the page parameter until it doesn't get back any results in the xml file.

The results returned by the xml page should be ordered by updated date in ascending order.

The addition of a meta field that has some sort of description of the errata/package (ex. first 120 characters of the description).

The url for the errata/packages in the xml should include the hostname, as GSA will have no idea where the xml was generated.

This won't be going out with MR45 anymore.

Jared, Thank you. I understand these changes could not be consumed in time to make the QA push. Therefore this bug was pushed to AMS's next release. Please ensure that you have all the changes needed to prevent further delays. Overall, I am really glad that the changes were caught in time to prevent issues, though I do need to watch this delay. Please let me know if there is anything else I can do to facilitate / ensure that this gets into MR46.

Gov Board update 1/24/13 - due to the incoming RHEL7 extended QA work scheduled for Feb, AMS will try hard to get this into MR46 (late Feb), however at this time there are no guarantees. We need to ensure that no further requirements are needed from GSS side, and no further changes are requested of the AMS developer.

Move to AMS backlog - US30282

Gov Board update 2/7/13 - confirmed that this is in sprint.

There are a couple of things that might need to change. For reference have a quick look at the page setup to browse packages: https://access.devgssci.devlab.phx1.redhat.com/search/beta/browse/packages

1) One thing that appears missing immediately is the ability to filter packages based off of RHEL version. This might be an RFE though. This is because there is no RHEL version data in the feed metadata.
Example:
<meta name="portal_id" content="749185"/>
<meta name="portal_title" content="vino-debuginfo-2.28.1-8.el6_3.x86_64.rpm"/>
<meta name="portal_description" content="This package provides debug information for package vino. Debug information is useful when developing applications that "/>
<meta name="portal_product" content="vino-debuginfo"/>
<meta name="portal_product_version" content="2.28.1"/>
<meta name="portal_architecture" content="x86_64"/>
<meta name="portal_package_version" content="4.8.0"/>
<meta name="portal_publication_date" content="2013-01-21T17:30:12"/>
<meta name="portal_update_date" content="2013-01-21T17:30:11"/>
<meta name="portal_requires_subscription" content="no"/>

I think the values in portal_product and portal_product_version might be useful at some point, but what is more useful is the parent product/version. For example, in the metas given above we *could* build a filter on portal_product, but almost every record is going to have a unique product name, so the filter would be thousands of entries long... and less useful as a filter. If instead this vino-debuginfo had the product of "Red Hat Enterprise Linux" and the version of "5", and another vino-debuginfo record had the product of "Red Hat Enterprise Linux" and the version of "6", then we could easily build the filter for RHEL 5 and RHEL 6.

Is there any way to relate a record with its "parent" product/version (where parent product is ??in all cases?? the RHEL prod/vers)? If so I'd like to see portal_product and portal_product_version be the parent. You can continue to provide the info you currently do in a portal_package and portal_package_version or some similar meta field.

FYI: errata currently behaves this way, where the portal_product is like RHEL, RHEL AS, RHEL WS, etc. see: https://access.devgssci.devlab.phx1.redhat.com/search/beta/browse/errata

2) I have performed a few full crawls and noticed that the package counts seem lower than expected:
scratch/errata/.state.yml: 7021
scratch/package/.state.yml: 3148
Is this number 3148 (total uniqed records after the crawl from 1980 to today) right? If it is just an "AM-QA only has a subset of data" thing I understand, and it is probably not much to worry about. I think I recall doing one full crawl and seeing a much larger number though.

Ignore any previous comments detailing functionality. Refer to https://docspace.corp.redhat.com/docs/DOC-125957 for updated feed format. In addition to the fields listed in the above doc, the feed includes a <meta name="portal_description"/> field.

Feed pages are:
/rhn/gsa/PackageFeed.do
/rhn/gsa/ErrataFeed.do

Each of these pages can take either, neither, or both of the following parameters:
onOrAfter (restricts returned results to those with a last_modified time on or after the supplied time)
onOrBefore (restricts returned results to those with a last_modified time on or before the supplied time)

These parameters are a timestamp of the following format: YYYY-MM-DDTHH24:MI:SS (That is literally a "T" character in the timestamp). This is Year-Month-DayTHour:Minute:Second. Ex: 2010-05-15T15:20:11

There is no restriction on the length of time queried by the feed. Rather than restricting a time length, the feed will limit the returned results to 200 entries. If a page=# parameter is appended to the url, the feed will return the #th set of 200 results. Page # starts at 1. Ex. page=1 is 1st 200, page=2 is 2nd 200. If there are no results to return, the XML returned will have no <record> elements.

Records are returned in ascending order of timestamps (oldest records first). If multiple records share the same last_update timestamp then those records are ordered by their relevant id (package/errata) in ascending order.
If a package/errata has no associated products (Red Hat etc. etc.) it will not be returned by the feed.

Example feed page in dev: https://rhn.webdev.redhat.com/rhn/gsa/PackageFeed.do?page=2&onOrBefore=2012-11-30T00:00:00&onOrAfter=2012-10-10T00:00:00

Deployed on rhn.webdev

Updated the doc to reflect current feed format

Must use gsa-doc-crawler login

Testing blocked due to change in requirements

Link to test run - https://tcms.engineering.redhat.com/run/57217/

Need to update the error message to reflect latest requirements.

verified on rhn.webdev.redhat.com

Gov Board update 2/21/13 - there is still ambiguity around the fine tuning of this bug. Until then we cannot release it. A meeting will be held to figure this out.

Per meeting on 2/26/13 with Vkumar and Jared, this Bug is ready to go to QA as soon as QA is available.

Gov Board Update 3/7/13 - confirmed this is verified and ready to push to prod when environment is available.

This will be released with the RHN MR48 GSA release. Changing Version to MR48. Check our release schedule for RHN MR48 release date: https://docspace.corp.redhat.com/docs/DOC-126420/

Currently scheduled for RHN MR48 on 4/17

fail on QA - bad queries

fail on qa and stage
Proxy error generated on the following pages:
1) /rhn/gsa/ErrataFeed.do?&onOrBefore=2012-11-30T00:00:00&page=1
2) /rhn/gsa/ErrataFeed.do?page=2&onOrBefore=2012-11-30T00:00:00
3) /rhn/gsa/ErrataFeed.do?&onOrBefore=2012-11-30T00:00:00&page=3
4) /rhn/gsa/ErrataFeed.do?&onOrBefore=2012-11-30T00:00:00
5) /rhn/gsa/ErrataFeed.do?
verified on rhn.webdev - https://tcms.engineering.redhat.com/run/60849/
verified in qa - https://tcms.engineering.redhat.com/run/60616/

This is scheduled for release on 4/22 according to https://docspace.corp.redhat.com/docs/DOC-139955

fail on stage
Proxy error generated on the following pages:
1) https://rhn.code.stage.redhat.com/rhn/gsa/PackageFeed.do?page=3&onOrBefore=2012-11-30T00:00:00
2) https://rhn.code.stage.redhat.com/rhn/gsa/PackageFeed.do?&onOrBefore=2012-11-30T00:00:00
3) https://rhn.code.stage.redhat.com/rhn/gsa/PackageFeed.do?&onOrBefore=2012-11-30T00:00:00&page=2
4) https://rhn.code.stage.redhat.com/rhn/gsa/PackageFeed.do?

verified in stage

Stage DB is running slow, which is the cause of the Proxy error timeouts in the UI.

This is now scheduled for release on 4/24, but won't be made available until architectural review sign off.

Released to Production 4/24
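
For anyone writing a client against the released feed, here is a rough Python sketch of the crawl loop implied by the final feed description in this bug: onOrAfter/onOrBefore timestamps in YYYY-MM-DDTHH:MI:SS form, page numbering starting at 1, and stopping when a page comes back with no <record> elements. The names build_feed_url, crawl_feed and fetch are hypothetical and not taken from the RHN or crawler code. The sketch assumes only that records appear as <record> elements somewhere in the returned XML, and it leaves authentication (the gsa-doc-crawler login) to whatever fetch callable the caller supplies.

```python
from datetime import datetime
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Timestamp format from this bug, e.g. 2010-05-15T15:20:11
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S"


def build_feed_url(
    base_url: str,
    feed_path: str,
    page: int = 1,
    on_or_before: Optional[datetime] = None,
    on_or_after: Optional[datetime] = None,
) -> str:
    """Build a feed URL; both time bounds are optional (hypothetical helper)."""
    params = {"page": str(page)}
    if on_or_before is not None:
        params["onOrBefore"] = on_or_before.strftime(TIMESTAMP_FORMAT)
    if on_or_after is not None:
        params["onOrAfter"] = on_or_after.strftime(TIMESTAMP_FORMAT)
    # safe=":" leaves colons unescaped, as in the example URLs in this bug.
    return f"{base_url}{feed_path}?{urlencode(params, safe=':')}"


def crawl_feed(
    fetch: Callable[[str], str],
    base_url: str,
    feed_path: str,
    on_or_before: Optional[datetime] = None,
    on_or_after: Optional[datetime] = None,
) -> Iterator[ET.Element]:
    """Yield <record> elements page by page until a page has none.

    `fetch` is a caller-supplied callable (assumption) that takes a URL
    and returns the XML body as a string; it is expected to handle login.
    """
    page = 1  # page numbering starts at 1 per the feed description
    while True:
        url = build_feed_url(base_url, feed_path, page, on_or_before, on_or_after)
        root = ET.fromstring(fetch(url))
        records = list(root.iter("record"))
        if not records:
            return  # an empty page marks the end of the result set
        yield from records
        page += 1
```

Passing fetch in as a parameter keeps the loop independent of how the request is authenticated, and lets it be exercised against canned XML rather than a live environment.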