resolves #85 document additional conversion scripts

- document the tool that converts Confluence XHTML to AsciiDoc
- document DocBookRx, a tool that converts DocBook to AsciiDoc
asciidoctor · Mar 3, 2016 · 5288b26
1 parent b4c9185
commit 5288b26

Showing 3 changed files with 97 additions and 0 deletions.
diff --git a/docs/_includes/confluence.adoc b/docs/_includes/confluence.adoc
@@ -0,0 +1,70 @@
+////
+Header: Convert Confluence XHTML to Asciidoctor
+
+Included in:
+
+- user-manual
+////
+
+You can convert Atlassian Confluence XHTML pages to AsciiDoc using the http://www.groovy-lang.org/download.html[Groovy] script shown below.
+
+The script calls the http://pandoc.org/[Pandoc] tool to convert single or multiple HTML files exported from Confluence to AsciiDoc files.
+You will need Pandoc installed before running this script.
+
+NOTE: If you have trouble running this script, you can use the Pandoc command inside the script to manually convert XHTML files to AsciiDoc.
+
+.convert.groovy Confluence XHTML Script
+[source,groovy]
+----
+@Grab('net.sourceforge.htmlcleaner:htmlcleaner:2.4')
+import org.htmlcleaner.*
+
+def src = new File('html').toPath()
+def dst = new File('asciidoc').toPath()
+
+def cleaner = new HtmlCleaner()
+def props = cleaner.properties
+props.translateSpecialEntities = false
+def serializer = new SimpleHtmlSerializer(props)
+
+src.toFile().eachFileRecurse { f ->
+    def relative = src.relativize(f.toPath())
+    def target = dst.resolve(relative)
+    if (f.isDirectory()) {
+        target.toFile().mkdirs() // also creates any missing parent directories
+    } else if (f.name.endsWith('.html')) {
+        def tmpHtml = File.createTempFile('clean', '.html')
+        println "Converting $relative"
+        def result = cleaner.clean(f)
+        result.traverse({ tagNode, htmlNode ->
+                tagNode?.attributes?.remove 'class'
+                if ('td' == tagNode?.name || 'th'==tagNode?.name) {
+                    tagNode.name='td'
+                    String txt = tagNode.text
+                    tagNode.removeAllChildren()
+                    tagNode.insertChild(0, new ContentNode(txt))
+                }
+
+            true
+        } as TagNodeVisitor)
+        serializer.writeToFile(
+                result, tmpHtml.absolutePath, "utf-8"
+        )
+        "pandoc -f html -t asciidoc -R -S --normalize -s $tmpHtml -o ${target}.adoc".execute().waitFor()
+        tmpHtml.delete()
+    }/* else {
+        "cp html/$relative $target".execute()
+    }*/
+}
+----
+
+The script is designed to be run locally: as listed, it reads HTML files exported from Confluence from an `html` directory in the working directory and writes the converted files to an `asciidoc` directory.
+
+.Usage
+. Save the script contents to a `convert.groovy` file in a working directory.
+. Place the exported HTML files, or directories containing them, in an `html` subdirectory of the working directory.
+. Create an `asciidoc` subdirectory in the working directory to receive the output.
+. Run `groovy convert` from the working directory; the script recurses through `html` and converts every file whose name ends in `.html`.
+. Confirm that the generated `.adoc` files in the `asciidoc` directory meet your requirements.
+
+This script was created by Cédric Champeau (https://gist.github.com/melix[melix]). You can find the original version of the script on this https://gist.github.com/melix/6020336[GitHub Gist].
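
The NOTE in `confluence.adoc` suggests running the Pandoc command by hand when the script gives trouble. A minimal shell sketch of that fallback follows, assuming Pandoc 2.0 or later: the `-R`, `-S`, and `--normalize` options used in the script were removed in Pandoc 2.0, so they are omitted here (restore them for an older Pandoc). The function names `adoc_target` and `convert_one` are hypothetical. Unlike the Groovy script, this sketch skips the HtmlCleaner pass and names the output `page.adoc` rather than `page.html.adoc`.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original script.

# Map an exported HTML path to its AsciiDoc target path.
# The html/ and asciidoc/ directory names mirror the Groovy script.
adoc_target() {
    rel=${1#html/}                               # drop the leading html/ prefix
    printf 'asciidoc/%s.adoc\n' "${rel%.html}"   # swap the .html suffix for .adoc
}

# Convert a single exported file with Pandoc.
convert_one() {
    target=$(adoc_target "$1")
    mkdir -p "$(dirname "$target")"              # make sure the output directory exists
    pandoc -f html -t asciidoc -s "$1" -o "$target"
}

# Example (after sourcing this file): convert_one html/page.html
```

Because the HtmlCleaner step is skipped, pages with complex Confluence tables may still convert more cleanly through the Groovy script.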
diff --git a/docs/_includes/docbookrx.adoc b/docs/_includes/docbookrx.adoc
@@ -0,0 +1,19 @@
+////
+Included in:
+
+- user-manual: Convert DocBook 5 XML to Asciidoctor
+////
+
+One of the things Asciidoctor excels at is converting AsciiDoc source into valid and well-formed DocBook 5 XML content.
+
+What if you need to go the other way and migrate all your legacy DocBook 5 XML content to AsciiDoc?
+The prescription (℞) you need to get rid of your DocBook pains could be DocBook℞, which is hosted at https://github.com/opendevise/docbookrx.
+
+DocBookRx is the start of a DocBook to AsciiDoc converter written in Ruby.
+This converter is far from perfect at the moment, and some of the conversion is done hastily.
+
+The plan is to evolve it into a robust library for performing this conversion in a reliable way.
+You can read more about this initiative in the linked repository.
+
+What this tool needs most is active users who put it through its paces.
+The more advanced the DocBook XML you convert with it, the better the tool will become.
diff --git a/docs/user-manual.adoc b/docs/user-manual.adoc
@@ -1751,6 +1751,14 @@ NOTE: Section pending
 .Discuss and Contribute
 TIP: Use {uri-ad-org-issues}/462[Issue 462] to drive development of this section. Your contributions make a difference. No contribution is too small.
 
+== Convert Confluence XHTML to Asciidoctor
+
+include::{includedir}/confluence.adoc[]
+
+== Convert DocBook 5 XML to Asciidoctor
+
+include::{includedir}/docbookrx.adoc[]
+
 = Resources
 
 [partintro]