Awesome Asciidoctor.js: Find broken links in your documentation

One of the most frustrating experiences for a reader is running into a broken link. Using a good site generator can help, but what about external links?
How do you make sure that all the external links in your documentation still work?

With Asciidoctor.js, you can enable a catalog using the catalog_assets option. Once this option is enabled, the processor captures every link found in your document and stores it in the catalog.
For reference, the parser will also capture footnotes, images, index terms, cross-references and ids. But let’s focus on links for now.

Here’s an example where catalog_assets is enabled:

// assumes the 'asciidoctor' npm package (Asciidoctor.js 2.x) is installed
const asciidoctor = require('asciidoctor')()

const input = `
* link:subdir/foo.pdf[]
* link:subdir/bar.pdf[]
* link:quz.pdf[]
* https://antoraa.org
* https://asciidoctor.org
* https://yuzutech.fr
* https://asciidoctor.org/doc
* http://neverssl.com`

const doc = asciidoctor.load(input, { 'catalog_assets': true }) // (1)
doc.convert() // (2)
const linksCatalog = doc.getLinks() // (3)
console.log(linksCatalog) // [ 'subdir/foo.pdf', 'subdir/bar.pdf', ... ]
1 Enable the catalog_assets option
2 Convert the document because links will only be available after the document has been converted
3 Return an Array of links found in the document

Now that we have all the links present in our document, we can make sure that every one of them is still working.
To do that, we should probably use a library, but here’s a naive implementation to give you an idea.

Here, we are using the http and https modules from Node.js to make sure that the server does not return a 4xx or 5xx error for the URL:

const https = require('https')
const http = require('http')

const checkHttpLink = link => new Promise(resolve => {
  // pick the client matching the scheme (named "client" to avoid shadowing Node's "module")
  const client = link.startsWith('https://') ? https : http
  client.get(link, res => {
    res.resume() // discard the response body so the socket is released
    const isError = res.statusCode >= 400 && res.statusCode < 600
    if (isError) { // (1)
      resolve({
        error: true,
        message: `Found a broken link: ${link} - Status code is: ${res.statusCode}`
      })
    } else { // (2)
      resolve({ error: false })
    }
  }).on('error', e => resolve({ // (3)
    error: true,
    message: `Found a broken link: ${link} - ${e}`
  }))
})
1 The server returns an error code 4xx or 5xx
2 The status code is (considered) valid
3 The client returns an error (most likely because the server is nonexistent)

We are not using reject because we want all the promises to be resolved (i.e., we don’t want to stop at the first failure).
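
To illustrate the idea, here’s a small sketch. The three checks (alwaysOk, alwaysBroken and rejecting) and the two collect functions are hypothetical names made up for this example; they are not part of the program. With resolve-only checks, Promise.all waits for every result. If a check could reject, Promise.allSettled (available since Node.js 12.9) would be one way to still collect everything:

```javascript
// Hypothetical checks, for illustration only
const alwaysOk = () => Promise.resolve({ error: false })
const alwaysBroken = () => Promise.resolve({ error: true, message: 'Found a broken link: nope.pdf' })
const rejecting = () => Promise.reject(new Error('boom'))

// Resolve-only checks: Promise.all collects every result
const collectAll = checks => Promise.all(checks.map(check => check()))

// Alternative: allSettled tolerates rejections, which we normalize afterwards
const collectSettled = checks => Promise.allSettled(checks.map(check => check()))
  .then(results => results.map(r => r.status === 'fulfilled'
    ? r.value
    : { error: true, message: String(r.reason) }))

collectSettled([alwaysOk, alwaysBroken, rejecting]).then(results => {
  // every check is reported, including the one that rejected
  results.filter(item => item.error).forEach(item => console.log(item.message))
})
```

Passing the rejecting check to collectAll would make Promise.all reject as soon as that check fails, and the other results would be lost.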

We also define a function to check if a file exists:

const util = require('util')
const stat = util.promisify(require('fs').stat)

const checkFileLink = path => stat(path) // (1)
  // return the same shape as checkHttpLink when the file exists
  .then(() => ({ error: false }))
  .catch(error => ({
    error: true,
    message: `Found a broken link: ${path} - ${error.toString()}`
  }))
1 Use stat to make sure that the file exists
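
One thing to keep in mind: a relative path passed to stat is resolved against the current working directory, which is not necessarily the directory of your AsciiDoc file. Here’s a minimal sketch of one way to handle that, assuming you know the directory of the document. The names checkFileLinkFrom and baseDir, and the docs directory in the usage line, are made up for this example:

```javascript
const fs = require('fs')
const ospath = require('path')

// Hypothetical helper: resolve the link against baseDir before calling stat
const checkFileLinkFrom = (baseDir, link) => {
  const target = ospath.resolve(baseDir, link)
  return fs.promises.stat(target)
    .then(() => ({ error: false }))
    .catch(error => ({
      error: true,
      message: `Found a broken link: ${link} - ${error.toString()}`
    }))
}

// Usage: 'docs' is an assumed directory that contains the AsciiDoc file
checkFileLinkFrom('docs', 'subdir/foo.pdf').then(result => console.log(result))
```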

And finally, we iterate over each link:

const ospath = require('path')
const url = require('url')

const promises = linksCatalog.map((link) => {
  const uri = url.parse(link) // (1)
  if (uri.protocol === 'https:' || uri.protocol === 'http:') {
    return checkHttpLink(link) // (2)
  }
  if (uri.protocol === 'file:') {
    return checkFileLink(ospath.normalize(`${uri.host}${uri.path}`)) // (3)
  }
  if (uri.protocol === null) {
    return checkFileLink(link) // (4)
  }
  return Promise.resolve({ // (5)
    error: true,
    message: `Unsupported protocol ${uri.protocol}. Unable to check the ${link}.`
  })
})

Promise.all(promises)
  .then((result) => {
    const errors = result.filter(item => item.error === true)
    if (errors.length > 0) { // (6)
      errors.forEach(error => {
        console.log(error.message)
      })
      // abort the mission!
      process.exit(1) // (7)
    } else {
      // all good...
    }
  })
1 Parse the link
2 If the protocol is https: or http:, use the checkHttpLink function
3 If the protocol is file:, normalize the path and use the checkFileLink function
4 If there is no protocol (null), treat the link as a local path and use the checkFileLink function
5 If the protocol is unsupported, return an error
6 Check whether there is at least one error
7 Exit the program with the return code 1
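
As a side note, url.parse is marked as a legacy API in recent Node.js documentation. Here’s a rough sketch of the same dispatch using the global URL class instead. The classifyLink helper and its return values are made up for this example, and the sketch relies on new URL() throwing when the link is a relative reference:

```javascript
// Hypothetical helper: returns 'http', 'file', 'local' or 'unsupported'
const classifyLink = link => {
  let parsed
  try {
    parsed = new URL(link)
  } catch (e) {
    // new URL() throws on relative references such as 'subdir/foo.pdf'
    return 'local'
  }
  if (parsed.protocol === 'https:' || parsed.protocol === 'http:') return 'http'
  if (parsed.protocol === 'file:') return 'file'
  return 'unsupported'
}

const links = ['subdir/foo.pdf', 'https://asciidoctor.org', 'file:///tmp/foo.pdf', 'mailto:hello@example.org']
console.log(links.map(classifyLink)) // prints [ 'local', 'http', 'file', 'unsupported' ]
```

The program in this post keeps the url.parse version shown earlier.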

And here’s the output of the complete program:

Found a broken link: subdir/bar.pdf - Error: ENOENT: no such file or directory, stat 'subdir/bar.pdf'
Found a broken link: quz.pdf - Error: ENOENT: no such file or directory, stat 'quz.pdf'
Found a broken link: https://antoraa.org - Error: getaddrinfo ENOTFOUND antoraa.org antoraa.org:443
Found a broken link: https://asciidoctor.org/doc - Status code is: 404