Bug 788576

Summary: Publican generating duplicate id labels in html output
Product: [Community] Publican
Reporter: William Cohen <wcohen>
Component: publican
Assignee: Jeff Fearn <jfearn>
Status: CLOSED CURRENTRELEASE
QA Contact: Ruediger Landmann <rlandman>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 2.8
CC: mhideo, mmcallis, r.landmann, rlandman, sgordon
Target Milestone: 3.0
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: 3.0.0
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-10-30 23:11:47 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Bug Depends On:
Bug Blocks: 820023

Description William Cohen 2012-02-08 09:04:04 EST
Description of problem:

The HTML output that publican generates for the SystemTap Beginners Guide contains duplicated id="<label>" attributes. Each id on a web page should be unique.


Version-Release number of selected component (if applicable):
publican-2.8-1.fc16.noarch
systemtap-1.6-1.fc16.x86_64


How reproducible:
always

Steps to Reproduce:
1. yumdownloader --source systemtap
2. yum-builddep ./systemtap-1.6-1.fc16.src.rpm
3. rpm -Uvh systemtap-1.6-1.fc16.src.rpm
4. cd rpmbuild/SPECS/; rpmbuild --define "with_publican 1" -ba systemtap.spec
5. cd ~/rpmbuild/BUILD/systemtap-1.6/doc/beginners/SystemTap_Beginners_Guide


  
Actual results:

Many of the .html pages contain duplicated 'id="<label>"' attributes, such as 'id="goal"' in introduction.html. This can be checked with http://validator.w3.org/check


Expected results:

No duplicated labels in the generated html.


Additional info:

The same problem also appears on a number of the Red Hat documentation pages, such as:

http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/6.0_Release_Notes/installer.html

http://validator.w3.org/check?uri=http%3A%2F%2Fdocs.redhat.com%2Fdocs%2Fen-US%2FRed_Hat_Enterprise_Linux%2F6%2Fhtml%2F6.0_Release_Notes%2Finstaller.html&charset=%28detect+automatically%29&doctype=Inline&group=0
Comment 1 William Cohen 2012-02-08 11:34:45 EST
The duplicates all seem to come from <section id="<label>"> elements, for example:

<section id="cross-compiling">
  <title>Generating Instrumentation for Other Computers</title>

</section>

It also looks like only the first <section id="<label>"> in a file might produce the problematic HTML.
Comment 2 Ruediger Landmann 2012-03-12 19:40:26 EDT
Thanks William; moving this upstream
Comment 3 Jeff Fearn 2012-03-13 02:50:53 EDT
Removed duplicate IDs in HTML outputs.

Pushed To ssh://git.fedorahosted.org/git/publican.git
   55c8a86..a033b42  master -> master
Comment 4 Michael Hideo 2012-06-07 21:51:14 EDT
(In reply to comment #0)
Follow Will's 5 steps and verify.
Comment 5 Michael Hideo 2012-06-07 21:58:23 EDT
(In reply to comment #0)
Follow Will's 5 steps and verify.
Comment 6 Stephen Gordon 2012-06-08 14:09:36 EDT
An additional step was actually required here, because the revision history entries of the SystemTap Beginners Guide don't match the format expected by publican 3.0 (they use 2.0 instead of 2-0).

To get around this I had to extract the tar file in ~/rpmbuild/SOURCES/, modify Revision_History.xml in the source tree, and then re-create the tar file. These changes should not, however, affect the validity of the test results.
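
For anyone repeating this, the Revision_History.xml change could be scripted rather than done by hand. The following is only a minimal sketch, not the edit that was actually made: it assumes the revision numbers sit in DocBook <revnumber> elements and that only the "major.minor" form needs converting to "major-minor". The function name fix_revnumbers is hypothetical.

```python
import re

# Assumption: revision numbers are stored as <revnumber>MAJOR.MINOR</revnumber>
# and publican 3.0 wants MAJOR-MINOR instead. Other forms are left untouched.
REVNUMBER = re.compile(r"<revnumber>\s*(\d+)\.(\d+)\s*</revnumber>")


def fix_revnumbers(xml_text: str) -> str:
    """Rewrite <revnumber>2.0</revnumber> as <revnumber>2-0</revnumber>."""
    return REVNUMBER.sub(r"<revnumber>\1-\2</revnumber>", xml_text)


# Hypothetical fragment, not copied from the real Revision_History.xml.
sample = "<revision><revnumber>2.0</revnumber></revision>"
print(fix_revnumbers(sample))
# prints: <revision><revnumber>2-0</revnumber></revision>
```

The extract and re-create steps for the tar file would still need to be done around this, and the result should be compared against the original file before rebuilding.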

Once the build completed, I changed into the directory containing the HTML and ran a check for duplicate IDs. None were returned (the sort is required because uniq -d only reports duplicates that are on adjacent lines):

$ grep -o 'id=\"[^ ]*\"' *.html | sort | uniq -d
$

I also checked specifically for one of the examples cited in the bug description and found only the one instance, with no duplicates:

$ grep -o 'id=\"goals\"' *.html
introduction.html:id="goals"

Based on the above, moving to VERIFIED.
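
The grep check above matches id="..." text wherever it appears on a line. As a cross-check, the same per-page test could be sketched with Python's standard html.parser module, which only looks at real attributes on tags. This is an illustrative sketch, not part of the verification that was run; the names IdCollector and duplicate_ids are hypothetical, and the sample page is made up.

```python
from collections import Counter
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect the value of every id attribute seen in one document."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        for name, value in attrs:
            if name == "id" and value is not None:
                self.ids.append(value)


def duplicate_ids(html_text):
    """Return the sorted list of id values that occur more than once."""
    collector = IdCollector()
    collector.feed(html_text)
    collector.close()
    counts = Counter(collector.ids)
    return sorted(value for value, n in counts.items() if n > 1)


# Made-up page that reproduces the kind of duplication reported in this bug.
page = '<div id="goals"><h2 id="goals">Goals</h2><p id="intro">text</p></div>'
print(duplicate_ids(page))  # prints: ['goals']
```

To cover a whole build, the function could be called once per file, for example over pathlib.Path(".").glob("*.html"), since ids only need to be unique within a single page.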