Create Keywords Metadata From Index Terms

In an HTML page, keywords can be listed in the metadata. An example might look like this:

<meta name="keywords" content="DocBook, XML, XSLT, index terms" />

This metadata is rarely used these days. At one time, those words may have helped search engines find the page, but for the past several years most search engines have ignored the keywords metadata and most authors no longer supply it. It can still be useful in some situations, for example if you are generating pages for a corporation using a Google Search Appliance that has been explicitly configured to include the keywords meta tag.

If you are in a situation where keyword metadata is actually used and you are converting from DocBook XML to HTML, this article can help. It shows how, using XSLT 1.0, you can harvest the DocBook primary index terms and use them as keyword metadata in the HTML generated by the DocBook XSL stylesheets.
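To make the goal concrete, here is a small, hypothetical DocBook 5 section (the id and the text are invented for illustration). It marks three index terms, one of them with a repeated primary term. For input like this, the stylesheet described here aims to produce a keywords value of XSLT,DocBook on the page that contains the section.

```xml
<section xmlns="http://docbook.org/ns/docbook" version="5.0" xml:id="transforming">
  <title>Transforming documents</title>
  <para>
    <indexterm><primary>XSLT</primary></indexterm>
    Stylesheets turn the source into HTML.
  </para>
  <para>
    <indexterm><primary>DocBook</primary><secondary>customization</secondary></indexterm>
    A customization layer adjusts the stock stylesheets.
  </para>
  <para>
    <!-- Same primary term as the first indexterm; it should appear only once in the keywords -->
    <indexterm><primary>XSLT</primary><secondary>version 1.0</secondary></indexterm>
    Only primary terms are harvested, so the secondary terms are ignored.
  </para>
</section>
```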

We start by adding a stylesheet to our DocBook customization layer. If you already have a user.head.content template, add the call to it; otherwise, create one from scratch. The DocBook stylesheets call that template automatically as a hook, so you can add your own content to the head element of the output HTML files. You can find the details in the excellent DocBook XSL: The Complete Guide. Here’s an example:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:d="http://docbook.org/ns/docbook"
    xmlns:exsl="http://exslt.org/common" extension-element-prefixes="exsl"
    exclude-result-prefixes="d"
    version="1.0">

    <xsl:template name="user.head.content">
        <xsl:call-template name="keywordset" />
    </xsl:template>
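One way to wire this into a build, sketched under assumptions: the file name keywords.xsl and the relative path to the namespaced DocBook stylesheets are hypothetical, and the chunk.section.depth value is only an example. The driver imports the stock chunking stylesheet and then includes the keywords stylesheet, so that its user.head.content template takes precedence over the empty default.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Assumed location of the namespaced DocBook XSL distribution; adjust to your install -->
  <xsl:import href="docbook-xsl-ns/html/chunk.xsl"/>

  <!-- Hypothetical file name for the stylesheet developed in this article -->
  <xsl:include href="keywords.xsl"/>

  <!-- Example value: chunk sections down to the second level -->
  <xsl:param name="chunk.section.depth" select="2"/>

</xsl:stylesheet>
```

Because xsl:import must come before any other top-level element, the include and the parameter follow it.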

The head element will have content from the following template named keywordset. First, define a variable section_level that describes the depth of the current section; we only want keywords that are actually in the HTML page, not necessarily all the terms in the section. That depends on the chunking depth, so we need to know how deep we are.

<xsl:template name="keywordset">
    <xsl:variable name="section_level">
        <xsl:number value="count(ancestor-or-self::d:section)" />
    </xsl:variable>

In the following code block, if the current element is a part or a book, the template does nothing. Otherwise, if the element contains index terms, create a new variable called indexterms.

If the section level is 0, we are at the top of a component such as a chapter, and the variable is left empty. The only thing that gets created at that level is a table of contents, so we don’t want any keywords there.

If we’re not at the bottom (that is, the current section level is less than the level we are chunking to, as set in the parameter chunk.section.depth), get all index primary terms that are immediate children and all index primary terms that are children of non-section elements (for example, index terms inside itemized lists). The main issue is that we do not want to gather the terms inside subsections—they will be on a different HTML page because of the chunking depth.

<xsl:if test="not(self::d:part) and not(self::d:book) and .//d:indexterm">
    <xsl:variable name="indexterms">
        <xsl:choose>
        <xsl:when test="$section_level = 0" />
        <xsl:when test="$section_level &lt; $chunk.section.depth">
            <xsl:copy-of select="./d:indexterm/d:primary|./*[not(self::d:section)]//d:indexterm/d:primary" />
        </xsl:when>

Otherwise, we are in a section at a depth greater than or equal to the chunking depth. So get all of the descendant primary index terms.

      <xsl:otherwise>
        <xsl:copy-of select=".//d:indexterm/d:primary" />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:variable>
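As a sketch of what the two branches select, consider this hypothetical fragment and assume chunk.section.depth is set to 2. The comments mark which primary terms would be gathered when the outer section (level 1) is the current node, and which are left for the inner section (level 2), where the otherwise branch applies.

```xml
<section xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>Outer section</title>

  <!-- Gathered for the outer page: the indexterm is an immediate child of the section -->
  <indexterm><primary>alpha</primary></indexterm>

  <!-- Gathered for the outer page: the indexterm sits inside a non-section child -->
  <itemizedlist>
    <listitem>
      <para>A list item<indexterm><primary>beta</primary></indexterm> with a term.</para>
    </listitem>
  </itemizedlist>

  <section>
    <title>Inner section</title>
    <!-- Skipped for the outer page: it belongs to the inner section, which is chunked separately -->
    <para>Inner text<indexterm><primary>gamma</primary></indexterm> with a term.</para>
  </section>
</section>
```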

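So now the variable indexterms holds copies (we used copy-of) of all the primary index terms that are germane to the current chunked section. In XSLT 1.0, a variable built this way is a result tree fragment rather than a node-set, so it has to be passed through exsl:node-set before XPath can be applied to it.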

Of course it’s likely we have duplicate terms, so let’s get all the terms that are unique and put them into the variable indexterms-unique. The following technique is one of many recipes that you might find useful in XSLT CookBook.

Use xsl:for-each to loop through the primary terms we’ve just gathered, and filter them so that we skip any that are already in the new unique list. That is, skip the term if it matches a preceding-sibling.

Then we use normalize-space to get the string value of the element, and append a comma (,) if this term isn’t the last one of the list.

  <xsl:variable name="indexterms-unique">
    <xsl:for-each select="exsl:node-set($indexterms)/*[not(. = preceding-sibling::*)]">
      <xsl:value-of select="normalize-space(.)" />
      <xsl:if test="not(position() =  last())">,</xsl:if>
    </xsl:for-each>
  </xsl:variable>
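To try the de-duplication step in isolation, here is a minimal standalone sketch. The list of terms is hard-coded and purely illustrative, and it assumes a processor that supports the EXSLT node-set extension (xsltproc does). Run against any well-formed XML input, it should emit XSLT,DocBook,index terms.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common" extension-element-prefixes="exsl"
    version="1.0">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Illustrative stand-in for the harvested terms; note the repeated XSLT entry -->
    <xsl:variable name="terms">
      <primary>XSLT</primary>
      <primary>DocBook</primary>
      <primary>XSLT</primary>
      <primary>index terms</primary>
    </xsl:variable>

    <!-- Keep a term only if no earlier sibling has the same string value -->
    <xsl:for-each select="exsl:node-set($terms)/*[not(. = preceding-sibling::*)]">
      <xsl:value-of select="normalize-space(.)"/>
      <!-- position() and last() refer to the filtered list, so no trailing comma is written -->
      <xsl:if test="position() != last()">,</xsl:if>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Note that the comparison uses the exact string value of each term, while normalize-space is applied only on output, so two terms that differ only in whitespace would both be kept.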

Finally, insert the unique, comma-separated strings into the meta tag as the content attribute string.

  <meta name="keywords">
    <xsl:attribute name="content">
      <xsl:value-of select="$indexterms-unique" />
    </xsl:attribute>
  </meta>
</xsl:if>
</xsl:template>
</xsl:stylesheet>

While keyword metadata isn’t useful for general search engines, you may still run into situations where it is used. Harvesting the primary index terms populates that metadata with exactly the information needed.

For convenience, here is the stylesheet in one block:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:d="http://docbook.org/ns/docbook"
 xmlns:exsl="http://exslt.org/common" extension-element-prefixes="exsl"
 exclude-result-prefixes="d"
 version="1.0">

<xsl:template name="user.head.content">
  <xsl:call-template name="keywordset"/>
</xsl:template>

<xsl:template name="keywordset">
  <xsl:variable name="section_level">
    <xsl:number value="count(ancestor-or-self::d:section)"/>
  </xsl:variable>

  <xsl:if test="not(self::d:part) and not(self::d:book) and .//d:indexterm">
    <xsl:variable name="indexterms">
      <xsl:choose>
        <xsl:when test="$section_level = 0"/>
        <xsl:when test="$section_level &lt; $chunk.section.depth">
          <xsl:copy-of select="./d:indexterm/d:primary|./*[not(self::d:section)]//d:indexterm/d:primary"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:copy-of select=".//d:indexterm/d:primary"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:variable>

    <xsl:variable name="indexterms-unique">
      <xsl:for-each select="exsl:node-set($indexterms)/*[not(. = preceding-sibling::*)]">
        <xsl:value-of select="normalize-space(.)"/>
        <xsl:if test="not(position() =  last())">,</xsl:if>
      </xsl:for-each>
    </xsl:variable>

    <meta name="keywords">
      <xsl:attribute name="content">
        <xsl:value-of select="$indexterms-unique"/>
      </xsl:attribute>
    </meta>
  </xsl:if>

</xsl:template>

</xsl:stylesheet>
