Architecture of FAST ESP: An Introduction

I have been working on FAST ESP for 16 months now and thought it would be a good time to share some of the scenarios I have come across. Before going into depth, I will begin with the basics so that even relatively inexperienced users can follow what is being discussed.

What is FAST ESP?

  • FAST ESP (Enterprise Search Platform) is an integrated software application that provides a platform for searching and filtering services.
  • It’s a distributed system that enables information retrieval from any type of content. ESP combines real-time searching, advanced linguistics, and a variety of content access options into a modular, scalable product suite.

Below is the architecture of a single-node installation:

[Figure: Architecture of a single-node FAST ESP installation]

Below is a brief description of the modules. In subsequent posts, I will try to elaborate on each of them.

Content Feeding:

  • Crawler: Crawls web pages.
  • File Traverser: Traverses and retrieves files from directories on file servers.
  • JDBC Connector: Feeds content from databases such as Oracle, MySQL, MSSQL, DB2, and many more.
  • SharePoint Connector: Feeds content from a SharePoint environment.
  • Other sources include the Exchange Connector, Lotus Notes Connector, Documentum Connector, etc.
  • Apart from the sources listed above, the content APIs allow custom applications to push content to the FAST Content Distributor. The API is available in Java, C++, and .NET.
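To make the push model behind the content APIs a little more concrete, here is a toy Python sketch. It does not use the real FAST Content API (which is offered in Java, C++, and .NET); `Document`, `ToyContentDistributor`, and `push_in_batches` are hypothetical names, and sending documents in fixed-size batches is an assumption chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """A document as a custom application might push it: an id plus named fields."""
    doc_id: str
    fields: dict


@dataclass
class ToyContentDistributor:
    """Hypothetical stand-in for the FAST Content Distributor.

    It only collects batches in memory; the real distributor forwards
    content on to document processing.
    """
    batches: list = field(default_factory=list)

    def submit(self, batch):
        self.batches.append(list(batch))


def push_in_batches(distributor, docs, batch_size=2):
    """Send documents in fixed-size batches (an illustrative assumption)."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            distributor.submit(batch)
            batch = []
    if batch:  # flush whatever is left over
        distributor.submit(batch)


distributor = ToyContentDistributor()
docs = [Document(f"doc-{i}", {"title": f"Title {i}"}) for i in range(5)]
push_in_batches(distributor, docs, batch_size=2)
print([len(b) for b in distributor.batches])
```

Batching keeps the number of round trips down when feeding many documents; refer to the Content API documentation for how the real client groups and acknowledges submissions.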

Document Processing:

Performs document processing tasks such as format conversion, along with relevancy-related tasks such as language detection, Asian-language tokenization, and lemmatization.
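The idea of a processing pipeline, where each stage enriches the document before handing it to the next, can be illustrated with a deliberately naive Python sketch. The stage functions, the stop-word language guesser, and the tiny lemma dictionary are hypothetical stand-ins and bear no relation to FAST's actual linguistic components.

```python
import re

# Tiny, hypothetical stop-word lists used for crude language guessing.
STOPWORDS = {
    "en": {"the", "and", "is", "of"},
    "de": {"der", "und", "ist", "die"},
}

# Hypothetical lemma dictionary; real lemmatizers use full linguistic resources.
LEMMAS = {"searches": "search", "searching": "search", "documents": "document"}


def detect_language(doc):
    # Pick the language whose stop words overlap most with the text.
    words = set(re.findall(r"\w+", doc["body"].lower()))
    scores = {lang: len(words & sw) for lang, sw in STOPWORDS.items()}
    doc["language"] = max(scores, key=scores.get)
    return doc


def tokenize(doc):
    # Word-character tokenization; Asian languages would need a
    # dictionary-based segmenter instead, since words are not space-separated.
    doc["tokens"] = re.findall(r"\w+", doc["body"].lower())
    return doc


def lemmatize(doc):
    # Map each token to its base form when the dictionary knows it.
    doc["lemmas"] = [LEMMAS.get(t, t) for t in doc["tokens"]]
    return doc


PIPELINE = [detect_language, tokenize, lemmatize]


def process(doc):
    """Run each stage in order; every stage reads and enriches the same document."""
    for stage in PIPELINE:
        doc = stage(doc)
    return doc


result = process({"body": "The engine searches the documents"})
print(result["language"], result["lemmas"])
```

Keeping stages as independent functions mirrors the modular idea: a stage can be added, removed, or reordered without touching the others.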

Indexing:

In this stage, processed content is stored as a binary index. Note that fixml is also created at this point, and it acts as a backup for the binary index. The fixml also shows how documents have been indexed, i.e. which fields were indexed, whether linguistic rules were applied while indexing them, and so on. Should the binary index become inconsistent, it can be rebuilt from the fixml without having to re-feed and re-process the content.
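The relationship between the binary index and fixml can be illustrated with a toy model: an inverted index kept in memory alongside an XML copy of each processed document, from which the index can be rebuilt. The XML layout and function names below are hypothetical and do not reproduce the real fixml schema.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def to_backup_xml(doc_id, fields):
    """Serialize processed fields to XML (hypothetical layout, not real fixml)."""
    root = ET.Element("document", id=doc_id)
    for name, tokens in fields.items():
        el = ET.SubElement(root, "field", name=name)
        el.text = " ".join(tokens)
    return ET.tostring(root, encoding="unicode")


def build_index(docs):
    """Inverted index: term -> set of (doc_id, field) postings."""
    index = defaultdict(set)
    for doc_id, fields in docs.items():
        for name, tokens in fields.items():
            for tok in tokens:
                index[tok].add((doc_id, name))
    return dict(index)


def rebuild_from_backup(xml_docs):
    """Recreate the index from the XML backups alone, without re-processing content."""
    docs = {}
    for xml_text in xml_docs:
        root = ET.fromstring(xml_text)
        docs[root.get("id")] = {
            f.get("name"): (f.text or "").split() for f in root.findall("field")
        }
    return build_index(docs)


processed = {
    "d1": {"title": ["fast", "esp"], "body": ["search", "platform"]},
    "d2": {"title": ["sharepoint"], "body": ["search", "connector"]},
}
backups = [to_backup_xml(doc_id, f) for doc_id, f in processed.items()]
index = build_index(processed)

print(rebuild_from_backup(backups) == index)
print(sorted(index["search"]))
print(backups[0])  # human-readable: shows which fields were indexed and with what terms
```

Because the XML copy holds the already-processed fields, rebuilding only repeats the cheap indexing step, which is the property the fixml backup provides.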

FAST Search Engine:

Performs the indexing and searching tasks within FAST ESP. It indexes new documents coming from the FAST Document Processing Engine, matches them against search queries submitted by the Query and Result Server, and returns a list of matching documents and result set navigation options to the Query and Result Server.
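Matching a query and returning navigation options alongside the hits can be sketched in a few lines. This toy version does simple AND matching over hypothetical documents and counts a hypothetical `category` attribute to build one navigator; unlike the real engine, it does no relevancy ranking.

```python
from collections import Counter

# Hypothetical indexed documents: searchable terms plus a "category"
# attribute from which a navigator (facet) is built.
DOCS = {
    "d1": {"terms": {"fast", "esp", "search"}, "category": "product"},
    "d2": {"terms": {"sharepoint", "search"}, "category": "product"},
    "d3": {"terms": {"search", "tips"}, "category": "blog"},
}


def match(query):
    """AND-match all query terms; return hit ids and navigator counts."""
    terms = set(query.lower().split())
    hits = sorted(d for d, doc in DOCS.items() if terms <= doc["terms"])
    # The navigator counts how many hits fall into each category value,
    # which a front end can render as drill-down links.
    navigator = Counter(DOCS[d]["category"] for d in hits)
    return hits, dict(navigator)


hits, nav = match("search")
print(hits, nav)
print(match("fast search"))
```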

QRServer (Query and Result Server):

Processes search queries and search results to enable relevancy-focused searching and result presentation. It provides linguistic query processing features like spell checking, and results processing features like result clustering.
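Query-side spell checking can be approximated with Python's standard `difflib` module to show the "did you mean" idea. This is not how the QRServer implements spell checking; the vocabulary, the `suggest` function, and the 0.75 similarity cutoff are hypothetical choices for illustration.

```python
import difflib

# Hypothetical vocabulary; a real spell checker would use dictionaries
# derived from the indexed content.
VOCABULARY = ["sharepoint", "search", "connector", "crawler", "platform"]


def suggest(query, cutoff=0.75):
    """Return a 'did you mean' query, or None if nothing was corrected."""
    corrected, changed = [], False
    for term in query.lower().split():
        if term in VOCABULARY:
            corrected.append(term)
            continue
        # Closest known term by difflib's similarity ratio, if any passes the cutoff.
        close = difflib.get_close_matches(term, VOCABULARY, n=1, cutoff=cutoff)
        if close:
            corrected.append(close[0])
            changed = True
        else:
            corrected.append(term)  # leave unknown, unfixable terms untouched
    return " ".join(corrected) if changed else None


print(suggest("sharepont crawlr"))
print(suggest("search platform"))
```

Returning `None` when nothing changed lets a front end show a suggestion only when one actually exists.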

Search Client:

This can either be the Search Front End shipped with FAST ESP or a custom application built on the Search API.

I felt it was important to cover this product because it is expected to eventually be merged into SharePoint. Once the capabilities of FAST are integrated with SharePoint, SharePoint should benefit immensely. While plenty of material about SharePoint is available online, relatively little can be found about FAST. Going forward, I will try to cover specific issues along with each module so that these posts can act as a reference if you have questions. I will also try to cover troubleshooting approaches for some of the problems and the workarounds available for them.

If you have any questions, feel free to start a discussion in the comments, and I will try to answer whenever possible.
