Google Search Appliance Tip: making PDF search even better

Searching within PDFs: a bit of a pain

A client of ours recently had a problem. The Google Search Appliance had indexed a repository of PDF files and was returning relevant results in sub-second response times.

However, the PDFs in the search results ran to about 90 pages on average. So the problem was this: once the PDF was opened from the search results page, the visitor had to initiate another search within the PDF (using Acrobat Reader) to find the information they were looking for.

Bridging the gap between the search results and the PDF search

I came up with a simple solution to make the users' lives a little easier. Appending the search terms to the URL of the PDF as a #search= fragment starts a search within the PDF as soon as it opens, so the user does not have to run the search a second time. All it requires is a small modification to the Google Search Appliance XSLT.

Google Search Appliance XSLT code to modify the search result


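<!-- Output the result URL as usual -->
<xsl:value-of disable-output-escaping='yes' select="U"/>
<!-- For PDF results only, append the search terms as a #search= fragment -->
<xsl:if test="$res_type='[PDF]'">
<xsl:text disable-output-escaping="yes">#search=</xsl:text><xsl:value-of select="$space_normalized_query"/>
</xsl:if>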

The URL parameters are based on Adobe's Open Parameters for PDF (PDF, 156KB).
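
To make the effect of the XSLT change concrete, here is a minimal sketch in Python of the URL it is meant to produce. The function name pdf_search_url, the file-extension check (the stylesheet tests $res_type instead), and the percent-encoding of the terms are assumptions made for illustration; none of them is part of the appliance's stylesheet.

```python
from urllib.parse import quote, urldefrag


def pdf_search_url(url: str, query: str) -> str:
    """Append a #search= fragment to a PDF URL; return other URLs unchanged.

    Hypothetical helper: it guesses the document type from the file
    extension, whereas the XSLT relies on the appliance's $res_type.
    """
    base, _ = urldefrag(url)          # drop any fragment already present
    path = base.split("?", 1)[0]      # ignore query parameters when checking the extension
    if not path.lower().endswith(".pdf"):
        return url
    terms = " ".join(query.split())   # collapse runs of whitespace
    if not terms:
        return url
    # Percent-encode the terms so spaces survive inside the fragment (assumption)
    return f"{base}#search={quote(terms)}"


print(pdf_search_url("http://example.com/docs/handbook.pdf", "annual  leave policy"))
```

Whether the fragment actually triggers a search depends on the PDF viewer that opens the file, so it is worth testing with the readers your users have installed.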
