Bug 535759 (RHQ-2421)

Summary: content source syncing fails due to "IOException: Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" error during RSS4J parsing of RSS XML
Product: [Other] RHQ Project
Reporter: Ian Springer <ian.springer>
Component: Content
Assignee: RHQ Project Maintainer <rhq-maint>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Version: 1.2
CC: ccrouch, cwelton, jshaughn
Keywords: SubBug
Hardware: All
OS: All
URL: http://jira.rhq-project.org/browse/RHQ-2421
Doc Type: Bug Fix
Last Closed: 2014-05-09 15:48:49 UTC
Bug Blocks: 565635    

Description Ian Springer 2009-09-11 17:08:00 UTC
This appears to be caused by the W3C web server rejecting the request for the XHTML DTD with a 503 error; see http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic for details. The likely fix is to modify the RssParser class from RSS4J to use the Xerces XML Catalog support (http://xerces.apache.org/xerces2-j/faq-xcatalogs.html) so that the XHTML DTD and other DTDs, schemas, etc. are cached locally instead of being fetched from w3.org. Based on the comments on the W3C blog, we may also need to set the User-Agent so the W3C web server can tell that the requests are not coming from the raw Java libs.
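
For illustration only, here is a minimal sketch of the general idea of serving the DTD locally so the parser never contacts w3.org. It deliberately uses the plain JDK org.xml.sax.EntityResolver hook rather than the Xerces XML Catalog support mentioned above, and it has not been checked against the RSS4J source, so whether RssParser exposes a place to install a resolver is an assumption. The class name LocalDtdResolverSketch and the one-line STUB_DTD are hypothetical; a real fix would bundle the full DTD file.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical sketch: resolve the XHTML DTD from a local copy instead of the network.
class LocalDtdResolverSketch {
    static final String XHTML_TRANSITIONAL =
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd";

    // Placeholder for a locally cached copy of the real DTD (assumption:
    // production code would load the complete DTD from a bundled resource).
    static final String STUB_DTD = "<!ENTITY nbsp \"&#160;\">";

    // System ID -> locally cached DTD text.
    static final Map<String, String> LOCAL_DTDS =
        Collections.singletonMap(XHTML_TRANSITIONAL, STUB_DTD);

    // Returns the local copy for known system IDs; returning null lets the
    // parser fall back to its default behavior for everything else.
    static final EntityResolver RESOLVER = (publicId, systemId) -> {
        String dtd = LOCAL_DTDS.get(systemId);
        if (dtd == null) {
            return null;
        }
        InputSource source = new InputSource(new StringReader(dtd));
        source.setSystemId(systemId);
        return source;
    };

    // Parses XML text with the local resolver installed.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(RESOLVER);
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \""
            + XHTML_TRANSITIONAL + "\"><html><body>a&nbsp;b</body></html>";
        Document doc = parse(xml);
        // The &nbsp; entity is defined by the locally served DTD.
        System.out.println(doc.getDocumentElement().getTextContent().length());
    }
}
```

An XML Catalog would express the same mapping declaratively; the resolver above is just the smallest self-contained way to show that the DOCTYPE's system ID can be answered without an HTTP request.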

Note, we need to find out where the source code for our forked RSS4J version lives, since we'll need to update it. Jess Sant worked on that. Charles said John Sanda may know where it is.

Here's the stack trace for the error:

2009-09-11 11:19:46,613 INFO  [org.quartz.core.JobRunShell] Job syncContentSource.f57d23fd--d225257a--123a9b04a3f threw a JobExecutionException:
org.quartz.JobExecutionException: Failed to sync content source in job [JobDetail 'syncContentSource.f57d23fd--d225257a--123a9b04a3f':  jobClass: 'org.rhq.enterprise.server.scheduler.jobs.ContentSourceSyncJob isStateful: true isVolatile: false isDurable: false requestsRecovers: false] [See nested exception: java.lang.Exception: Failed to sync content source [10001]]
        at org.rhq.enterprise.server.scheduler.jobs.ContentSourceSyncJob.execute(ContentSourceSyncJob.java:89)
        at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:525)
Caused by: java.lang.Exception: Failed to sync content source [10001]
        at org.rhq.enterprise.server.plugin.content.ContentSourceAdapterManager.synchronizeContentSource(ContentSourceAdapterManager.java:261)
        at org.rhq.enterprise.server.content.ContentSourceManagerBean.internalSynchronizeContentSource(ContentSourceManagerBean.java:717)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:112)
        at org.jboss.ejb3.interceptor.InvocationContextImpl.proceed(InvocationContextImpl.java:166)
        at org.rhq.enterprise.server.common.TransactionInterruptInterceptor.addCheckedActionToTransactionManager(TransactionInterruptInterceptor.java:77)
        at sun.reflect.GeneratedMethodAccessor159.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.jboss.ejb3.interceptor.InvocationContextImpl.proceed(InvocationContextImpl.java:118)
        at org.rhq.enterprise.server.authz.RequiredPermissionsInterceptor.checkRequiredPermissions(RequiredPermissionsInterceptor.java:153)
        at sun.reflect.GeneratedMethodAccessor158.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.jboss.ejb3.interceptor.InvocationContextImpl.proceed(InvocationContextImpl.java:118)
        at org.jboss.ejb3.interceptor.EJB3InterceptorsInterceptor.invoke(EJB3InterceptorsInterceptor.java:63)
        at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101)
        at org.jboss.ejb3.entity.TransactionScopedEntityManagerInterceptor.invoke(TransactionScopedEntityManagerInterceptor.java:54)
        at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101)
        at org.jboss.ejb3.AllowedOperationsInterceptor.invoke(AllowedOperationsInterceptor.java:47)
        at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101)
        at org.jboss.aspects.tx.TxPolicy.invokeInNoTx(TxPolicy.java:66)
        at org.jboss.aspects.tx.TxInterceptor$Never.invoke(TxInterceptor.java:66)
        at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101)
        at org.jboss.aspects.tx.TxPropagationInterceptor.invoke(TxPropagationInterceptor.java:95)
        at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101)
        at org.jboss.ejb3.stateless.StatelessInstanceInterceptor.invoke(StatelessInstanceInterceptor.java:62)
        at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101)
        at org.jboss.aspects.security.AuthenticationInterceptor.invoke(AuthenticationInterceptor.java:77)
        at org.jboss.ejb3.security.Ejb3AuthenticationInterceptor.invoke(Ejb3AuthenticationInterceptor.java:110)
        at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101)
        at org.jboss.ejb3.ENCPropagationInterceptor.invoke(ENCPropagationInterceptor.java:46)
        at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101)
        at org.jboss.ejb3.asynchronous.AsynchronousInterceptor.invoke(AsynchronousInterceptor.java:106)
        at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:101)
        at org.jboss.ejb3.stateless.StatelessContainer.localInvoke(StatelessContainer.java:240)
        at org.jboss.ejb3.stateless.StatelessContainer.localInvoke(StatelessContainer.java:210)
        at org.jboss.ejb3.stateless.StatelessLocalProxy.invoke(StatelessLocalProxy.java:84)
        at $Proxy254.internalSynchronizeContentSource(Unknown Source)
        at org.rhq.enterprise.server.scheduler.jobs.ContentSourceSyncJob.synchronizeAndLoad(ContentSourceSyncJob.java:142)
        at org.rhq.enterprise.server.scheduler.jobs.ContentSourceSyncJob.execute(ContentSourceSyncJob.java:84)
        ... 2 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.rhq.enterprise.server.plugin.content.ContentSourceAdapterManager$IsolatedInvocationHandler.invoke(ContentSourceAdapterManager.java:571)
        at $Proxy446.synchronizePackages(Unknown Source)
        at org.rhq.enterprise.server.plugin.content.ContentSourceAdapterManager.synchronizeContentSource(ContentSourceAdapterManager.java:206)
        ... 45 more
Caused by: java.io.IOException: Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
        at churchillobjects.rss4j.parser.RssParser.parseRss(RssParser.java:221)
        at churchillobjects.rss4j.parser.RssParser.parseRss(RssParser.java:117)
        at org.rhq.enterprise.server.plugins.jboss.software.JBossSoftwareContentSourceAdapter.retrieveRssDocument(JBossSoftwareContentSourceAdapter.java:227)
        at org.rhq.enterprise.server.plugins.jboss.software.JBossSoftwareContentSourceAdapter.synchronizePackages(JBossSoftwareContentSourceAdapter.java:120)
        ... 52 more


Comment 1 Ian Springer 2009-09-11 17:12:46 UTC
The RSS4J source is available in the Maven repo (http://repository.jboss.com/maven2/rss4j/rss4j/0.92-on.2/rss4j-0.92-on.2-sources.jar). If we can't find where it is in some SVN repo, we may just want to add it to the RHQ SVN repo and use that location to maintain it from here on out.

Charles, to clarify, did you say to ask John because the CSP also uses the same modified RSS4J that we do and it therefore might live in their SVN?


Comment 2 Red Hat Bugzilla 2009-11-10 21:04:08 UTC
This bug was previously known as http://jira.rhq-project.org/browse/RHQ-2421


Comment 3 wes hayutin 2010-02-16 16:56:19 UTC
Temporarily adding the keyword "SubBug" so we can be sure we have accounted for all the bugs.

keyword:
new = Tracking + FutureFeature + SubBug

Comment 4 wes hayutin 2010-02-16 17:01:30 UTC
making sure we're not missing any bugs in rhq_triage

Comment 5 Corey Welton 2010-12-23 14:44:48 UTC
ips - what's status on this?