Description of problem:
There is a plan for the Google Search Appliance (GSA) to expand the search capabilities in the Customer Portal and crawl the entire browsing experience for the customer. This involves making changes to Package Search and Errata Search in RHN.
Proposed Resolution (for the above to happen, the following is needed):
(1) AMS would need to edit / add metatags to the Packages and Errata. This can be done by editing the content itself or via the feed. Method can be determined by AMS.
(2) Update the feed so that any changes to the packages or errata are seen by the GSA in a more timely fashion, allowing the GSA to reindex the changes and ultimately speeding up search.
(3) (Not as time sensitive, nor a blocker in accomplishing the goals of this BZ) Delete the code and eliminate the technical debt in RHN. Specifically, there is Public Errata (PERL) and Private Errata (JAVA). The Public is a list of packages without links, while the Private is a list of packages with links. The suggestion from the RHN Product Owner is to remove the Public Errata altogether.
Timeline:
In the meeting that took place on 11/1/2012, Mike mentioned that we wanted to have this accomplished ASAP.
Contacts:
Mike Amburn, Product Owner RHN
Nicky Bronson, Sr BA RHN
US27779 added to AMS backlog.
Spoke with Mike Amburn. He wants new pages created that output information on RHN Errata & Packages. The pages should output XML formatted like the Errata & Package examples from the GSA feed structure document: https://docspace.corp.redhat.com/docs/DOC-125957 The pages should allow limiting the search results by supplying constraints on the content's update date (updatedBefore & updatedAfter). If no dates are supplied, the feed returns results for all Errata/Packages. The page should probably be secured in some fashion to prevent abuse, as the underlying query is resource intensive and time consuming to run if unlimited. The method of securing the page is still undecided.
Gov Board update 1/10/13 - will be in MR45 which is scheduled for 1st week of Feb release.
Code reviewed, good.
http://git.corp.redhat.com/cgit/rhn/rhn/commit/?id=a02269017bc95b08a5e58256c917c3510c82c89c
Feed pages are:
/rhn/gsa/PackageFeed.do
/rhn/gsa/ErrataFeed.do
Each page can have one, neither, or both of the following arguments in the URL:
updatedAfter
updatedBefore
Date format is YYYY-MM-DD for those arguments (ex. .../rhn/gsa/ErrataFeed.do?updatedAfter=2012-10-01&updatedBefore=2012-12-31). Maximum time span that is searchable is one year. If updatedAfter isn't supplied, the page sets updatedAfter=2000-01-01. If updatedBefore isn't supplied, the page sets updatedBefore to tomorrow's date.
Deployed on rhn.webdev.
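For anyone testing the defaults described above, here is a minimal Python sketch of how the date window could be resolved. This is not the committed RHN code: the function name resolve_window and the constant DEFAULT_AFTER are made up for illustration, and the one-year span limit is deliberately left out because the comment does not say how it is enforced.

```python
from datetime import date, timedelta
from typing import Optional, Tuple

# Default lower bound from the comment above (updatedAfter=2000-01-01).
DEFAULT_AFTER = date(2000, 1, 1)


def resolve_window(
    updated_after: Optional[str],
    updated_before: Optional[str],
    today: date,
) -> Tuple[date, date]:
    """Hypothetical helper: apply the documented defaults to the two
    optional YYYY-MM-DD arguments and return (after, before) as dates."""
    after = date.fromisoformat(updated_after) if updated_after else DEFAULT_AFTER
    # A missing updatedBefore defaults to tomorrow's date.
    before = (
        date.fromisoformat(updated_before)
        if updated_before
        else today + timedelta(days=1)
    )
    return after, before
```

Passing today in explicitly keeps the helper deterministic, so a test run can pin the date instead of depending on the clock.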
Link to test run - https://tcms.engineering.redhat.com/run/54834/
verified on am-qa
Spoke with Ian Hands, who will be creating the crawler for this page. He requested a few changes:
- He'd like to be able to specify the time down to the second for the updatedAfter & updatedBefore parameters.
- Rather than limiting the searching to a two month time span, the results should be paginated (&page=0 fetches xml for first n records, &page=1 fetches the next n records after that, etc). When requesting a timespan, the crawler will increment the page parameter until it doesn't get back any results in the xml file.
- The results returned by the xml page should be ordered by updated date in ascending order.
- A meta field should be added that has some sort of description of the errata/package (ex. first 120 characters of the description).
- The url for the errata/packages in the xml should include the hostname, as GSA will have no idea where the xml was generated.
This won't be going out with MR45 anymore.
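To make the requested crawl behaviour concrete, here is a rough Python sketch of the paging loop: keep incrementing page until a response comes back with no results. It is not Ian's crawler. The crawl function and the fetch_page callable are hypothetical, and the sketch assumes each result is a record element somewhere under the root of the returned XML.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator


def crawl(
    fetch_page: Callable[[int], str],
    first_page: int = 0,
) -> Iterator[ET.Element]:
    """Yield every <record> element, page by page, stopping at the
    first page that contains no records.

    fetch_page is a hypothetical callable that returns the feed XML
    (as a string) for the given page number.
    """
    page = first_page
    while True:
        root = ET.fromstring(fetch_page(page))
        records = root.findall(".//record")  # assumed element name
        if not records:
            return  # an empty page signals the end of the timespan
        yield from records
        page += 1
```

The first page number is a parameter so the same loop works whichever index the feed ends up using for its first page.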
Jared, Thank you. I understand these changes could not be consumed in time to make the QA push. Therefore this bug was pushed to AMS's next release. Please ensure that you have all the changes needed to prevent further delays. Overall, I am really glad that the changes were caught in time to prevent issues, though I do need to watch this delay. Please let me know if there is anything else I can do to facilitate / ensure that this gets into MR46.
Gov Board update 1/24/13 - due to the incoming RHEL7 extended QA work scheduled for Feb, AMS will try hard to get this into MR46 (late Feb), however at this time there are no guarantees. We need to ensure that no further requirements are needed from GSS side, and no further changes are requested of the AMS developer.
Move to AMS backlog - US30282
Gov Board update 2/7/13 - confirmed that this is in sprint.
There are a couple of things that might need to change. For reference have a quick look at the page set up to browse packages: https://access.devgssci.devlab.phx1.redhat.com/search/beta/browse/packages
1) One thing that appears missing immediately is the ability to filter packages based on RHEL version. This might be an RFE though. This is because there is no RHEL version data in the feed metadata. Example:
  <meta name="portal_id" content="749185"/>
  <meta name="portal_title" content="vino-debuginfo-2.28.1-8.el6_3.x86_64.rpm"/>
  <meta name="portal_description" content="This package provides debug information for package vino. Debug information is useful when developing applications that "/>
  <meta name="portal_product" content="vino-debuginfo"/>
  <meta name="portal_product_version" content="2.28.1"/>
  <meta name="portal_architecture" content="x86_64"/>
  <meta name="portal_package_version" content="4.8.0"/>
  <meta name="portal_publication_date" content="2013-01-21T17:30:12"/>
  <meta name="portal_update_date" content="2013-01-21T17:30:11"/>
  <meta name="portal_requires_subscription" content="no"/>
I think the values in portal_product and portal_product_version might be useful at some point, but what is more useful is the parent product/version. For example, with the meta tags given above we *could* build a filter on portal_product, but almost every record is going to have a unique product name, so the filter would be thousands of entries long and less useful as a filter. If instead this vino-debuginfo had the product "Red Hat Enterprise Linux" and the version "5", and another vino-debuginfo record had the product "Red Hat Enterprise Linux" and the version "6", then we could easily build the filter for RHEL 5 and RHEL 6. Is there any way to relate a record with its "parent" product/version (where parent product is ??in all cases?? the RHEL prod/vers)? If so I'd like to see portal_product and portal_product_version be the parent.
You can continue to provide the info you currently do in a portal_package and portal_package_version or some similar meta field. FYI: errata currently behaves this way, where the portal_product is like RHEL, RHEL AS, RHEL WS, etc. See: https://access.devgssci.devlab.phx1.redhat.com/search/beta/browse/errata
2) I have performed a few full crawls and noticed that the package counts seem lower than expected:
  scratch/errata/.state.yml: 7021
  scratch/package/.state.yml: 3148
Is this number 3148 (total uniqued records after the crawl from 1980 to today) right? If it is just an "AM-QA only has a subset of data" thing I understand, and it is probably not much to worry about. I think I recall doing one full crawl and seeing a much larger number though.
Ignore any previous comments detailing functionality. Refer to https://docspace.corp.redhat.com/docs/DOC-125957 for updated feed format. In addition to the fields listed in the above doc, the feed includes a <meta name="portal_description"/> field.
Feed pages are:
/rhn/gsa/PackageFeed.do
/rhn/gsa/ErrataFeed.do
Each of these pages can take either, neither, or both of the following parameters:
onOrAfter (restricts returned results to those with a last_modified time on or after the supplied time)
onOrBefore (restricts returned results to those with a last_modified time on or before the supplied time)
These parameters are timestamps of the following format: YYYY-MM-DDTHH24:MI:SS (that is literally a "T" character in the timestamp). This is Year-Month-DayTHour:Minute:Second. Ex: 2010-05-15T15:20:11
There is no restriction on the length of time queried by the feed. Rather than restricting a time length, the feed will limit the returned results to 200 entries. If a page=# parameter is appended to the url, the feed will return the #th set of 200 results. Page # starts at 1. Ex. page=1 is 1st 200, page=2 is 2nd 200. If there are no results to return, the XML returned will have no <record> elements.
Records are returned in ascending order of timestamps (oldest records are first). If multiple records share the same last_update timestamp then those records are ordered by their relevant id (package/errata) in ascending order.
If a package/errata has no associated products (Red Hat etc. etc.) it will not be returned by the feed.
Example feed page in dev: https://rhn.webdev.redhat.com/rhn/gsa/PackageFeed.do?page=2&onOrBefore=2012-11-30T00:00:00&onOrAfter=2012-10-10T00:00:00
Deployed on rhn.webdev
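For QA or crawler work against these pages, a small Python sketch of building a request URL in the documented timestamp format may help. The names build_feed_url and feed_order_key are made up for illustration and are not part of RHN. The sort key only mirrors the ordering described above (timestamp ascending, then id ascending), on the assumption that timestamps are compared as strings in the fixed format.

```python
from datetime import datetime
from typing import Dict, Optional, Tuple
from urllib.parse import urlencode

# YYYY-MM-DDTHH24:MI:SS, with a literal "T" between date and time.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S"


def build_feed_url(
    base_url: str,
    page: Optional[int] = None,
    on_or_before: Optional[datetime] = None,
    on_or_after: Optional[datetime] = None,
) -> str:
    """Hypothetical helper: append whichever optional parameters are given."""
    params: Dict[str, str] = {}
    if page is not None:
        params["page"] = str(page)  # pages start at 1
    if on_or_before is not None:
        params["onOrBefore"] = on_or_before.strftime(TIMESTAMP_FORMAT)
    if on_or_after is not None:
        params["onOrAfter"] = on_or_after.strftime(TIMESTAMP_FORMAT)
    if not params:
        return base_url
    # safe=":" keeps the colons readable, as in the example URL above.
    return base_url + "?" + urlencode(params, safe=":")


def feed_order_key(last_modified: str, record_id: int) -> Tuple[str, int]:
    """Sort key for the documented order. Fixed-width timestamps sort
    chronologically when compared as plain strings."""
    return (last_modified, record_id)
```

Calling build_feed_url with the PackageFeed.do address on rhn.webdev, page=2, and the two dates from the example should reproduce the example URL.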
Updated the doc to reflect current feed format
Must use gsa-doc-crawler login
Testing blocked due to change in requirements
Link to test run - https://tcms.engineering.redhat.com/run/57217/ Need to update the error message to reflect latest requirements.
verified on rhn.webdev.redhat.com
Gov Board update 2/21/13 - there is still ambiguity around the fine tuning of this bug. Until then we cannot release it. A meeting will be held to figure this out.
Per meeting on 2/26/13 with Vkumar and Jared, this bug is ready to go to QA as soon as QA is available.
Gov Board Update 3/7/13 - confirmed this is verified and ready to push to prod when environment is available.
This will be released with the RHN MR48 GSA release. Changing Version to MR48. Check our release schedule for RHN MR48 release date: https://docspace.corp.redhat.com/docs/DOC-126420/
Currently scheduled for RHN MR48 on 4/17
fail on QA - bad queries
fail on qa and stage
Proxy error generated on the following pages:
1) /rhn/gsa/ErrataFeed.do?&onOrBefore=2012-11-30T00:00:00&page=1
2) /rhn/gsa/ErrataFeed.do?page=2&onOrBefore=2012-11-30T00:00:00
3) /rhn/gsa/ErrataFeed.do?&onOrBefore=2012-11-30T00:00:00&page=3
4) /rhn/gsa/ErrataFeed.do?&onOrBefore=2012-11-30T00:00:00
5) /rhn/gsa/ErrataFeed.do?
verified on rhn.webdev - https://tcms.engineering.redhat.com/run/60849/
verified in qa - https://tcms.engineering.redhat.com/run/60616/
This is scheduled for release on 4/22 according to https://docspace.corp.redhat.com/docs/DOC-139955
fail on stage
Proxy error generated on the following pages:
1) https://rhn.code.stage.redhat.com/rhn/gsa/PackageFeed.do?page=3&onOrBefore=2012-11-30T00:00:00
2) https://rhn.code.stage.redhat.com/rhn/gsa/PackageFeed.do?&onOrBefore=2012-11-30T00:00:00
3) https://rhn.code.stage.redhat.com/rhn/gsa/PackageFeed.do?&onOrBefore=2012-11-30T00:00:00&page=2
4) https://rhn.code.stage.redhat.com/rhn/gsa/PackageFeed.do?
verified in stage
Stage DB is running slow, which is the cause of the Proxy error timeouts in the UI.
This is now scheduled for release on 4/24, but won't be made available until architectural review sign-off.
Released to Production 4/24