Bug 804472 - RFE - ability to report on unintended duplicates in a content spec
Summary: RFE - ability to report on unintended duplicates in a content spec
Keywords:
Status: NEW
Alias: None
Product: PressGang CCMS
Classification: Community
Component: DocBook-builder
Version: 1.x
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Assignee: Nobody
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-03-19 01:39 UTC by David Le Sage
Modified: 2023-02-21 23:20 UTC
CC List: 0 users

Fixed In Version:
Clone Of:
Environment:
Last Closed:
Embargoed:


Attachments:

Description David Le Sage 2012-03-19 01:39:16 UTC
When you run

csprocessor push spec_file_name.txt

it should check for duplicates (multiple instances of the same topic ID) in the spec.


This would be a handy validation test for those of us dealing with long spec files where errors like this can easily occur.
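
For illustration only, a rough sketch of the kind of check meant here, assuming topic IDs are the bracketed numbers in the spec (e.g. [486]); this is not an existing csprocessor option:

grep -oE '\[[0-9]+' spec_file_name.txt | sort | uniq -d

This extracts each bracketed topic ID and lists any ID that occurs more than once in the file.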

Comment 1 David Le Sage 2012-03-26 00:44:16 UTC
Hey Lee,

To clarify, I simply mean when the same topic appears twice in a single content spec.

I don't think there would be many use cases where we want the same topic appearing multiple times in one book.

At the moment, though, the CSP does not check for this.


Cheers,


David

Comment 2 Lee Newson 2012-03-26 00:57:40 UTC
Thanks for clearing that up; I'll take a look at how often a topic may be used twice in one spec and then come back to this.

Comment 3 Joshua Wulf 2012-03-28 02:46:33 UTC
The CSP actually contains functionality specifically to support reusing the same topic multiple times in a book; for example, declaring a new topic and then reusing it:

 # Initial specification of new topic
  Some new topic that I want to use several times in this book [N2, Task]
  ......
 # Second occurrence of that new topic
  Some new topic that I want to use several times in this book [X2]
  ......
 # Third occurrence of that new topic
  Some new topic that I want to use several times in this book [X2]

This is in addition to reusing an existing topic by simply including it more than once with its existing topic ID.

I'm writing a book now which has several tutorial chapters. There are tasks that are repeated in each of those tutorials, so I'm reusing topics multiple times. I'm reusing both existing topics and new topics.

So (re)using the same topic multiple times in a book is probably a common use case for reusable modular content. 

I don't know how you could detect (in software) "intentional reuse" vs "unintentional duplication". 

You could have a csprocessor command "reportdups" that reports on reused topics in a single content spec, and use that as an aid in debugging a content spec.
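
Hypothetically, invocation might look something like this (reportdups is only a proposed name here, not an existing csprocessor command):

csprocessor reportdups spec_file_name.txt

with output listing each topic that appears more than once in the spec.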

Comment 4 David Le Sage 2012-03-28 02:50:28 UTC
A reportdups option *would* meet my needs.

Comment 5 Joshua Wulf 2012-03-28 02:57:34 UTC
Here's a partial hack-around.

The shell command:

 sed '/^$/d; s/^[ ]*//g; s/[ ]*$//g' <SPECFILE> | uniq -d

will report duplicate lines in a content spec*.

The sed command removes blank lines and strips leading and trailing whitespace, so it will report duplicates no matter what level of indentation they are at.

* It won't detect topics that are duplicated but differ in their prerequisite or related declarations.

So it will detect:

  Some Topic [486]
  ...
    Some Topic [486]

but won't detect:

  Some Topic [486] [P: 34]
  ... 
    Some Topic [486]

You could make it detect those by selecting each topic declaration only up to the first occurrence of ']' and comparing on that, so the lines would appear like this to the uniq -d command (whitespace removed, truncated at the first ']'):

Some Topic [486]
Some Topic [486]

The sed/awk command for that is outside my kung-fu.

Comment 6 Lee Newson 2012-03-28 03:18:35 UTC
Just thought I'd let you know that that command won't work: uniq only detects duplicates on adjacent lines. So if you have:

  Some Topic [486]
  Section: Some Section
    Some Topic [486]

Then it won't work.

Sources: Tried it and also the uniq man page.

Comment 7 Joshua Wulf 2012-03-28 03:21:33 UTC
Right you are; I forgot the sort part. This works for the test case above:

sed '/^$/d; s/^[ ]*//g; s/[ ]*$//g' <SPECFILE> | sort | uniq -d
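
To also catch the case from comment 5 where duplicates differ only after the topic ID (e.g. a trailing [P: 34]), a rough sketch, assuming every topic declaration contains a ']', is to truncate each line at the first ']' before comparing:

sed '/^$/d; s/^[ ]*//g; s/[ ]*$//g; s/].*$/]/' <SPECFILE> | sort | uniq -d

This removes blank lines and surrounding whitespace, keeps each line only up to its first ']', then reports the lines that occur more than once.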

Comment 8 Lee Newson 2014-01-13 06:15:17 UTC
Moving this over to DocBuilder as we are trying to include all report information in that component now.

