Traditional text transformation software has a number of limitations:
You may be familiar with the concept of 'pipes' from Unix or DOS (don't worry if you haven't).
c:\> dir | sort | head | more
This concept lets one program run and feed its output into the next program as input. In the example above, dir produces a directory listing, sort sorts it, head keeps the first 20 lines, and more lets us scroll through the result.
TextPipe's architecture is very similar to this.
Each file is broken up into small pieces, which are progressively fed through a series of filters. Each filter performs a simple piece of text manipulation, like a search and replace, converting to UPPERCASE, removing HTML etc. When a filter is finished with its piece, it passes the piece to the next filter, and the last filter writes to the output file.
This architecture allows TextPipe to work with files or streams of unlimited size, and perform an unlimited number of operations on each file, without using temporary disk space (except for sorting).
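The chained-filter idea described above can be sketched in a few lines of Python. This is only an illustration of the concept, not TextPipe's implementation: the names `to_upper`, `replace` and `run_pipeline` are hypothetical, and the sketch assumes each piece is a whole line so that a search never straddles two pieces.

```python
from typing import Callable, Iterable, Iterator, List

# A filter takes a stream of text pieces and yields transformed pieces.
Filter = Callable[[Iterable[str]], Iterator[str]]


def to_upper(pieces: Iterable[str]) -> Iterator[str]:
    """Convert each piece to UPPERCASE as it passes through."""
    for piece in pieces:
        yield piece.upper()


def replace(old: str, new: str) -> Filter:
    """Build a simple search-and-replace filter (hypothetical helper)."""
    def _filter(pieces: Iterable[str]) -> Iterator[str]:
        for piece in pieces:
            yield piece.replace(old, new)
    return _filter


def run_pipeline(pieces: Iterable[str], filters: List[Filter]) -> Iterator[str]:
    """Chain the filters lazily; only one piece is in flight at a time."""
    stream: Iterator[str] = iter(pieces)
    for f in filters:
        stream = f(stream)
    return stream


lines = ["hello world\n", "goodbye world\n"]
result = list(run_pipeline(lines, [replace("world", "pipe"), to_upper]))
```

Because every stage is a generator, nothing is buffered between filters, which is why this style of design copes with input of any size and needs no temporary files.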
Normally all the input is processed by every filter as it moves sequentially from input to output. TextPipe introduces a new text processing concept called sub filtering (also referred to as filter cascades, or 2-dimensional text processing).
A TextPipe restriction filter (found in the Restrict Menu) allows sub sections of the text to be operated on independently of the main text. Let's look at an example:
[Figure: flowchart representation of the example, alongside how it looks in TextPipe]
A separate document, "TextPipe Restrictions - What are they and how do I use them?", is also available.
In the example above, a restriction filter passes lines 1 to 100 of the main text to the sub filter, which can be any combination of filters, such as a Convert to Uppercase filter, a Database filter, a Search and Replace filter, or another restriction. The processed text then rejoins the main text. The net effect is that the restriction filter limits the effect of the sub filters to just lines 1 to 100; all other lines are unaffected.
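As a rough sketch of the line-range restriction just described, the Python below applies a list of sub filters only to lines inside a given range and passes every other line through unchanged. The function name `restrict_lines` and the choice to model sub filters as per-line functions are assumptions made for illustration; they are not TextPipe's API.

```python
from typing import Callable, Iterable, Iterator, List

# For this sketch a sub filter is simply a function from line to line.
LineFn = Callable[[str], str]


def restrict_lines(first: int, last: int, sub_filters: List[LineFn]):
    """Apply sub_filters only to lines first..last (1-based, inclusive)."""
    def _filter(lines: Iterable[str]) -> Iterator[str]:
        for number, line in enumerate(lines, start=1):
            if first <= number <= last:
                # Restricted line: run it through the sub filters in order.
                for fn in sub_filters:
                    line = fn(line)
            # The line rejoins the main text, changed or not.
            yield line
    return _filter


text = ["alpha", "beta", "gamma", "delta"]
upper_middle = restrict_lines(2, 3, [str.upper])
output = list(upper_middle(text))
```

Here only lines 2 and 3 reach the uppercase sub filter; lines 1 and 4 are emitted untouched.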
Here's another example.
[Figure: flowchart representation of the example, alongside how it looks in TextPipe]
In the example above, a filter restricts lines from the main text to those matching a pattern. These matching lines get passed to the sub filter, which can be any combination of filters, such as a VBScript filter, a Search and Replace filter, or an Output to File filter. The processed text then rejoins the main text. The net effect here is that the restriction filter limits the effect of the sub filters to just the lines matching a pattern; all other lines are unaffected.
A Restrict Lines filter can be combined with a Restrict Columns filter to sub filter a rectangular block of text, for example, to Search and Replace just inside a block of text. We also have filters to restrict CSV fields, restrict Tab-delimited fields, restrict blocks of a specified record length, and restrict matching and non-matching lines. All these restriction filters can be combined together with any number of sub filters.
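The pattern-based restriction can be sketched the same way. In this hypothetical Python version, `restrict_matching` is an assumed name, the pattern is a Python regular expression, and sub filters are again modelled as per-line functions; TextPipe's own pattern syntax and filter interface may differ.

```python
import re
from typing import Callable, Iterable, Iterator, List

LineFn = Callable[[str], str]


def restrict_matching(pattern: str, sub_filters: List[LineFn]):
    """Apply sub_filters only to lines in which the pattern is found."""
    regex = re.compile(pattern)

    def _filter(lines: Iterable[str]) -> Iterator[str]:
        for line in lines:
            if regex.search(line):
                # Matching line: send it through the sub filters.
                for fn in sub_filters:
                    line = fn(line)
            # Non-matching lines pass straight through.
            yield line
    return _filter


log = ["error: disk full", "info: started", "error: timeout"]
shout_errors = restrict_matching(r"^error", [str.upper])
output = list(shout_errors(log))
```

Only the lines beginning with "error" are uppercased; the "info" line is left as it was.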
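Combining a line restriction with a column restriction selects a rectangular block. The sketch below shows one way this could work, assuming 1-based inclusive line and column numbers and a sub filter modelled as a plain string function; `restrict_block` is a hypothetical name, not a TextPipe filter.

```python
from typing import Callable, Iterable, Iterator


def restrict_block(first_line: int, last_line: int,
                   first_col: int, last_col: int,
                   sub_filter: Callable[[str], str]):
    """Apply sub_filter to columns first_col..last_col of lines
    first_line..last_line (all bounds 1-based and inclusive)."""
    def _filter(lines: Iterable[str]) -> Iterator[str]:
        for number, line in enumerate(lines, start=1):
            if first_line <= number <= last_line:
                head = line[:first_col - 1]           # left of the block
                block = line[first_col - 1:last_col]  # restricted columns
                tail = line[last_col:]                # right of the block
                line = head + sub_filter(block) + tail
            yield line
    return _filter


rows = ["aaaa-aaaa", "aaaa-aaaa", "aaaa-aaaa"]
block_replace = restrict_block(2, 3, 1, 4, lambda s: s.replace("a", "b"))
output = list(block_replace(rows))
```

The replacement touches only columns 1 to 4 of lines 2 and 3, so the first line and the right-hand half of every line keep their original text.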
The Search and Replace filter is a special kind of restriction filter. It allows sub filters to operate on the replacement text, which can be used in a number of ways:
TextPipe provides an array of over 100 inbuilt filters that is unmatched by any other processing tool. We have filters for extracting data from databases, extracting text from web pages and mainframe reports for data mining, and a scripting filter where you can code your own filter in VBScript, JScript or a range of other scripting languages.
TextPipe takes advantage of a variety of other performance enhancements:
Given an appropriate market need, TextPipe can support parallel SMP processor arrays that perform huge quantities of text processing. One such need might be search engine document preparation.