resolves #85 document additional conversion scripts
- document the tool that converts Confluence XHTML to AsciiDoc
- document DocBookRx, a tool that converts DocBook to AsciiDoc
jaredmorgs authored and mojavelinux committed Mar 3, 2016
1 parent b4c9185 commit 5288b26
Showing 3 changed files with 97 additions and 0 deletions.
70 changes: 70 additions & 0 deletions docs/_includes/confluence.adoc
@@ -0,0 +1,70 @@
////
Header: Convert Confluence XHTML to Asciidoctor

Included in:

- user-manual
////
You can convert Atlassian Confluence XHTML pages to AsciiDoc using this http://www.groovy-lang.org/download.html[Groovy] script.

The script calls the http://pandoc.org/[Pandoc] tool to convert single or multiple HTML files exported from Confluence to AsciiDoc files.
You will need Pandoc installed before running this script.

NOTE: If you have trouble running this script, you can use the Pandoc command inside the script to manually convert XHTML files to AsciiDoc.
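
As a rough illustration of that manual route, the sketch below builds a Pandoc command for a single file and prints it so it can be pasted into a terminal.
It is only a sketch under stated assumptions: the `pandocCommand` helper and the file names `page.html` and `page.adoc` are hypothetical, and the `-R`, `-S`, and `--normalize` options used in the script are left out on the assumption that you are running Pandoc 2.0 or later, which no longer accepts them.

[source,groovy]
----
// Hypothetical helper, not part of the original script: build the Pandoc
// argument list for converting one HTML file to AsciiDoc.
List<String> pandocCommand(String input, String output) {
    // -f and -t select the input and output formats; -s requests a standalone document
    ['pandoc', '-f', 'html', '-t', 'asciidoc', '-s', input, '-o', output]
}

// Assumed example file names
def command = pandocCommand('page.html', 'page.adoc')

// Print the command so it can be pasted into a terminal.
println command.join(' ')

// To run it from Groovy instead, uncomment the next line (requires Pandoc on the PATH):
// command.execute().waitFor()
----

Passing the arguments as a list rather than a single string avoids problems with file names that contain spaces.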

.convert.groovy Confluence XHTML Script
[source,groovy]
----
@Grab('net.sourceforge.htmlcleaner:htmlcleaner:2.4')
import org.htmlcleaner.*

// Input and output directories, relative to the working directory
def src = new File('html').toPath()
def dst = new File('asciidoc').toPath()
// Make sure the output directory exists before anything is written to it
dst.toFile().mkdirs()

def cleaner = new HtmlCleaner()
def props = cleaner.properties
props.translateSpecialEntities = false
def serializer = new SimpleHtmlSerializer(props)

src.toFile().eachFileRecurse { f ->
    def relative = src.relativize(f.toPath())
    def target = dst.resolve(relative)
    if (f.isDirectory()) {
        // Mirror the source directory structure in the output directory
        target.toFile().mkdirs()
    } else if (f.name.endsWith('.html')) {
        def tmpHtml = File.createTempFile('clean', '.html')
        println "Converting $relative"
        def result = cleaner.clean(f)
        result.traverse({ tagNode, htmlNode ->
            // Drop class attributes and reduce table cells to plain-text td cells
            tagNode?.attributes?.remove 'class'
            if ('td' == tagNode?.name || 'th' == tagNode?.name) {
                tagNode.name = 'td'
                String txt = tagNode.text
                tagNode.removeAllChildren()
                tagNode.insertChild(0, new ContentNode(txt))
            }
            true
        } as TagNodeVisitor)
        serializer.writeToFile(
            result, tmpHtml.absolutePath, "utf-8"
        )
        // Convert the cleaned HTML to AsciiDoc with Pandoc.
        // Note: -R, -S, and --normalize are Pandoc 1.x options that were removed in Pandoc 2.0.
        "pandoc -f html -t asciidoc -R -S --normalize -s $tmpHtml -o ${target}.adoc".execute().waitFor()
        tmpHtml.delete()
    }/* else {
        "cp html/$relative $target".execute()
    }*/
}
----

The script is designed to be run locally on HTML files or directories containing HTML files exported from Confluence.

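.Usage
. Save the script contents to a `convert.groovy` file in a working directory.
. Create a directory named `html` in the working directory and place the HTML files exported from Confluence, or directories containing them, inside it.
. Create a directory named `asciidoc` in the working directory, if it does not already exist, to receive the output.
. Run `groovy convert.groovy` from the working directory.
The script reads its input from the `html` directory and does not take file names as arguments.
. Once you have confirmed that the output for a small set of files meets your requirements, add the remaining files to `html` and run the script again; it recurses through every subdirectory and converts each file whose name ends in `.html`.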
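
The script as written reads from a fixed `html` directory rather than from the command line.
As a rough sketch only, the following shows how paths given on the command line could be paired with output files by appending `.adoc` to each input name, as the script does.
The `planConversions` helper and the sample file names are hypothetical; they are not part of the original script.

[source,groovy]
----
// Hypothetical helper, not part of the original script: pair each HTML path
// with an output path that follows the script's "<input>.adoc" naming scheme.
Map<String, String> planConversions(List<String> paths) {
    // Skip anything that is not an HTML file, as the script does
    def htmlPaths = paths.findAll { it.endsWith('.html') }
    htmlPaths.collectEntries { [(it): it + '.adoc'] }
}

// In a script run from the command line, `args` holds the arguments,
// so the call would be planConversions(args as List).
def plan = planConversions(['page.html', 'notes.txt', 'guide/intro.html'])
plan.each { input, output -> println "$input -> $output" }
----

Each resulting pair could then be passed through the same clean-and-convert steps the script applies to the files it finds in `html`.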

This script was created by Cédric Champeau (https://gist.github.com/melix[melix]). You can find the original version of the script on this https://gist.github.com/melix/6020336[GitHub Gist].
19 changes: 19 additions & 0 deletions docs/_includes/docbookrx.adoc
@@ -0,0 +1,19 @@
////
Included in:

- user-manual: Convert DocBook 5 to Asciidoctor
////
One of the things Asciidoctor excels at is converting AsciiDoc source into valid and well-formed DocBook 5 XML content.

What if you're in the position where you need to go the other way: migrate all your legacy DocBook 5 XML content to AsciiDoc?
The prescription (℞) you need to get rid of your DocBook pains could be DocBook℞, which is hosted at https://github.com/opendevise/docbookrx.

DocBookRx is an early-stage DocBook to AsciiDoc converter written in Ruby.
The converter is far from perfect at the moment, and some of the conversion logic was implemented hastily.

The plan is to evolve it into a robust library for performing this conversion in a reliable way.
You can read more about this initiative in the linked repository.

The best thing for this tool is active users putting it through its paces.
The more advanced the DocBook XML converted by the tool, the better the tool will become.
8 changes: 8 additions & 0 deletions docs/user-manual.adoc
@@ -1751,6 +1751,14 @@ NOTE: Section pending
.Discuss and Contribute
TIP: Use {uri-ad-org-issues}/462[Issue 462] to drive development of this section. Your contributions make a difference. No contribution is too small.

== Convert Confluence XHTML to Asciidoctor

include::{includedir}/confluence.adoc[]

== Convert DocBook 5 XML to Asciidoctor

include::{includedir}/docbookrx.adoc[]

= Resources

[partintro]