Bug 804472 - RFE - ability to report on unintended duplicates in a content spec
Summary: RFE - ability to report on unintended duplicates in a content spec
Keywords:
Status: NEW
Alias: None
Product: PressGang CCMS
Classification: Community
Component: DocBook-builder
Version: 1.x
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Assignee: Nobody
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-03-19 01:39 UTC by David Le Sage
Modified: 2023-02-21 23:20 UTC
CC List: 0 users

Fixed In Version:
Clone Of:
Environment:
Last Closed:
Embargoed:


Attachments:

Description David Le Sage 2012-03-19 01:39:16 UTC
When you run

csprocessor push spec_file_name.txt

it should check for duplicates (multiple instances of the same topic ID) in the spec.


This would be a handy validation test for those of us dealing with long spec files where errors like this can easily occur.
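
For illustration only, a rough sketch of the kind of check meant here, assuming topic IDs are the bracketed numbers in the spec (e.g. [486]); this is not an existing csprocessor option:

grep -oE '\[[0-9]+' spec_file_name.txt | sort | uniq -d

This extracts each bracketed topic ID and lists any ID that occurs more than once in the file.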

Comment 1 David Le Sage 2012-03-26 00:44:16 UTC
Hey Lee,

To clarify, I simply mean when the same topic appears twice in a single content spec.

I don't think there would be many use cases where we want the same topic appearing multiple times in one book.

At the moment, though, the CSP does not check for this.


Cheers,


David

Comment 2 Lee Newson 2012-03-26 00:57:40 UTC
Thanks for clearing that up; I'll take a look at how often a topic may be used twice in one spec and then come back to this.

Comment 3 Joshua Wulf 2012-03-28 02:46:33 UTC
The CSP actually contains functionality specifically to support reusing the same topic multiple times in a book; for example, declaring a new topic and then reusing it:

 # Initial specification of new topic
  Some new topic that I want to use several times in this book [N2, Task]
  ......
 # Second occurrence of that new topic
  Some new topic that I want to use several times in this book [X2]
  ......
 # Third occurrence of that new topic
  Some new topic that I want to use several times in this book [X2]

This is in addition to reusing an existing topic by simply including it more than once with its existing topic ID.

I'm writing a book now which has several tutorial chapters. There are tasks that are repeated in each of those tutorials, so I'm reusing topics multiple times. I'm reusing both existing topics and new topics.

So (re)using the same topic multiple times in a book is probably a common use case for reusable modular content. 

I don't know how you could detect (in software) "intentional reuse" vs "unintentional duplication". 

You could have a csprocessor command "reportdups" that reports on reused topics in a single content spec, and use that as an aid in debugging a content spec.
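
Hypothetically, invocation might look something like this (reportdups is only a proposed name here, not an existing csprocessor command):

csprocessor reportdups spec_file_name.txt

with output listing each topic that appears more than once in the spec.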

Comment 4 David Le Sage 2012-03-28 02:50:28 UTC
A reportdups option *would* meet my needs.

Comment 5 Joshua Wulf 2012-03-28 02:57:34 UTC
Here's a partial hack-around.

The shell command:

 sed '/^$/d; s/^[ ]*//g; s/[ ]*$//g' <SPECFILE> | uniq -d

will report duplicate lines in a content spec*.

The sed command removes blank lines and strips leading and trailing whitespace, so it will report duplicates no matter what level of indentation they are at.

* It won't detect topics that are duplicated but differ in their prerequisite or related declarations.

So it will detect:

  Some Topic [486]
  ...
    Some Topic [486]

but won't detect:

  Some Topic [486] [P: 34]
  ... 
    Some Topic [486]

You could make it detect those by selecting each topic declaration only up to the first occurrence of ']' and comparing on that, so the lines would appear like this to the uniq -d command (whitespace removed, truncated at the first ']'):

Some Topic [486]
Some Topic [486]

The sed/awk command for that is outside my kung-fu.

Comment 6 Lee Newson 2012-03-28 03:18:35 UTC
Just thought I'd let you know that that command won't work: uniq only detects duplicates on adjacent lines. So if you have:

  Some Topic [486]
  Section: Some Section
    Some Topic [486]

Then it won't work.

Sources: Tried it and also the uniq man page.

Comment 7 Joshua Wulf 2012-03-28 03:21:33 UTC
Right you are; I forgot the sort part. This works for the test case above:

sed '/^$/d; s/^[ ]*//g; s/[ ]*$//g' <SPECFILE> | sort | uniq -d
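
To also catch the case from comment 5 where duplicates differ only after the topic ID (e.g. a trailing [P: 34]), a rough sketch, assuming every topic declaration contains a ']', is to truncate each line at the first ']' before comparing:

sed '/^$/d; s/^[ ]*//g; s/[ ]*$//g; s/].*$/]/' <SPECFILE> | sort | uniq -d

This removes blank lines and surrounding whitespace, keeps each line only up to its first ']', then reports the lines that occur more than once.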

Comment 8 Lee Newson 2014-01-13 06:15:17 UTC
Moving this over to DocBuilder as we are trying to include all report information in that component now.

