Running our TextPipe, WordPipe, ExcelPipe and PowerPointPipe tools against a large set (100,000+) of Word documents, Excel spreadsheets and PowerPoint presentations requires deliberate planning, particularly around processing speed.

In general, we advise running the document processing after hours, typically on a Friday after close of business. This keeps the number of documents that are locked for editing to a minimum.

Serial processing on File Server / Parallel processing on Workstations (2+)

Our general advice is that, if possible, Microsoft Office and WordPipe, ExcelPipe and PowerPointPipe should be installed directly on the file server. This eliminates network latency overhead, but it is not usually possible.

Alternatively, multiple workstations should be located close to the file server (to reduce network latency). Generate a file list of all Microsoft Word, Excel and PowerPoint documents, and divide this list between the workstations. For example, with 100,000 documents and 4 workstations, use TextPipe Standard (using Filters\Special\Split files) to split the list, and give each workstation ¼ of the complete file list, or 25,000 documents.
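
If TextPipe is not to hand, the same division can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how TextPipe's Split files filter works internally; the function names and the workstation_N.txt output file names are assumptions made for this example.

```python
from pathlib import Path


def split_file_list(paths, workers):
    """Divide paths into `workers` contiguous chunks whose sizes differ by at most one."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    base, extra = divmod(len(paths), workers)
    chunks, start = [], 0
    for i in range(workers):
        # The first `extra` chunks take one additional file each.
        size = base + (1 if i < extra else 0)
        chunks.append(paths[start:start + size])
        start += size
    return chunks


def write_chunks(chunks, out_dir):
    """Write each chunk to its own list file (hypothetical naming scheme)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for i, chunk in enumerate(chunks, start=1):
        target = out / f"workstation_{i}.txt"
        target.write_text("".join(p + "\n" for p in chunk), encoding="utf-8")
        written.append(target)
    return written
```

With 100,000 paths and 4 workstations, split_file_list returns four lists of 25,000 paths each. Contiguous chunks keep files from the same folder together on one workstation, which may or may not suit your folder layout.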


Although the process is automated, our tools do occasionally run into memory leaks in Microsoft products that require Microsoft Word, Excel or PowerPoint to be restarted. The Full versions of our tools do this automatically every 1000 documents (customizable). Even with this automated monitoring, some 'baby-sitting' of the process is required to detect unforeseeable or unusual problems with Microsoft Word, Excel or PowerPoint, such as corrupted documents that cause the Microsoft application to crash.


If a restart is required, the Files to Process tab of WordPipe, ExcelPipe and PowerPointPipe provides a 'Restart at file' facility to skip files that have already been processed. The Log screen shows the last processed file.
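
When working from a divided file list, the same idea can be applied to the list itself. The following is a minimal sketch, assuming you have the original list in order and have copied the last processed file name from the Log screen; remaining_files is a hypothetical helper, not part of our tools.

```python
def remaining_files(file_list, last_processed):
    """Return the entries that come after last_processed in file_list.

    If last_processed is not in the list, the whole list is returned so
    that nothing is skipped by accident.
    """
    try:
        index = file_list.index(last_processed)
    except ValueError:
        return list(file_list)
    return list(file_list[index + 1:])
```

Before relying on the result, check whether the last logged file actually completed; if it was the file that triggered the crash, you may want to inspect it or set it aside rather than simply resuming after it.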

Archive and Retention Policy

WordPipe, ExcelPipe and PowerPointPipe all provide (on the Options tab) the facility to preserve the File Modification date. If your needs require, we can also enhance these products to preserve the Last Access Date. This means that old files that have not been accessed in a long time will not be 'awoken' and retained due to access by our tools.
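
To illustrate what preserving these dates means at the file-system level, here is a minimal Python sketch that records a file's access and modification times before a change and restores them afterwards. It is not how our products implement the option; preserve_timestamps is a hypothetical helper written for this example.

```python
import os
from contextlib import contextmanager


@contextmanager
def preserve_timestamps(path):
    """Restore the file's access and modification times after the block runs."""
    before = os.stat(path)
    try:
        yield
    finally:
        # os.utime takes (access time, modification time) in nanoseconds.
        os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))
```

Any edit made inside a "with preserve_timestamps(path):" block leaves both dates as they were, so an archive or retention policy keyed on them sees the file as untouched.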