Tech by LLM
Laura L Martin

Blog by LLM
Speed Up Searching Inside.

A Problem Solving Experience


I was a member of a team responsible for building thousands of custom procedures that documented how to maintain power plant equipment. The client would send the team large packages of equipment manufacturer information in PDF format. The team gathered information from those PDFs and used it as the basis for building custom procedures in Word, enhanced with the custom drawings, designs, and other material the client required.


Each PDF was a very large package of content, housing dozens of equipment manuals and running to thousands of pages. The client sent us hundreds of PDF packages every month.


Initially, the PDFs and the custom procedures (Word files) were stored in the same folder hierarchy on a local server, organized by plant. Searching the folder hierarchy for details inside the files was a very slow process, since Windows search had to look through tens of thousands of pages across both the PDF and Word files.


I suggested a redesign of the repository, and the proposal was approved. I implemented a storage paradigm of two major hierarchies: all the PDF files were stored in one folder hierarchy, and all the custom procedures in Word were stored in another. In other words, documents were now organized first by type and then by plant, which meant each of the two major hierarchies needed its own set of plant subfolders. Replicating the plant subfolder structure in each hierarchy was worth the initial setup effort, since the Word files were now isolated from the much larger and denser PDFs.
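For readers who want to try a similar reorganization, here is a minimal sketch of the split in Python. It is an illustration only, not the process used on the project: the folder names `manufacturer_pdfs` and `custom_procedures`, the function name `split_by_type`, and the file extensions checked are all assumptions of mine. It also assumes the new location sits outside the old hierarchy.

```python
from pathlib import Path
import shutil

# Hypothetical top-level folder names; the original repository's names are not known.
PDF_ROOT = "manufacturer_pdfs"
WORD_ROOT = "custom_procedures"
WORD_SUFFIXES = {".doc", ".docx"}


def split_by_type(old_root: Path, new_base: Path, dry_run: bool = True) -> list:
    """Plan (and optionally perform) moves from one mixed hierarchy into two.

    Returns a list of (source, destination) pairs. With dry_run=True,
    nothing is moved, so the plan can be reviewed first.
    """
    moves = []
    # sorted() builds the full file list before anything is moved.
    for src in sorted(old_root.rglob("*")):
        if not src.is_file():
            continue
        suffix = src.suffix.lower()
        if suffix == ".pdf":
            top = PDF_ROOT
        elif suffix in WORD_SUFFIXES:
            top = WORD_ROOT
        else:
            continue  # leave any other file types where they are
        # Keep the plant subfolder path (relative to the old root) intact.
        dst = new_base / top / src.relative_to(old_root)
        moves.append((src, dst))
        if not dry_run:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dst))
    return moves
```

Running it first with `dry_run=True` gives a list of planned moves to check before any file is touched. Because each file keeps its plant-relative path, the plant subfolders are recreated automatically under both new hierarchies.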


Results: This sped up the search process considerably, especially when searching for information in the custom work (the Word docs). Windows search could now be targeted at the smaller hierarchy of Word files. Previously, when all the PDF and Word files were stored together in the same plant folder, the search function had to "open" and search inside thousands of PDFs as well as the Word docs.


Takeaway: When designing your content repository, weigh the pros and cons of a little setup work to implement a new storage paradigm in exchange for a long-term solution to a problem encountered with the previous one.


All content copyright Laura L. Martin.  All rights reserved.

This content may not be copied or reproduced without prior written permission and credit to Laura L. Martin.