Fixml defragmentation in FAST ESP: All you need to know

Fixml compaction runs by default as part of the nightly cleanup routine, depending on the configuration set in rtsearchrc.xml. The nightly cleanup (between 3 and 6 am by default) runs compaction daily, which physically removes all invalidated documents from the Elem files. When cleanup runs on the fixml, a few operations are performed:

  • Compaction
  • Compression
  • Defragmentation (potentially)

A batch of documents submitted to the content distributor is split more or less evenly between columns by the indexing dispatcher. Each partial batch is written to data_fixml as a single Elem file, regardless of the numberDocsPerFixml setting in rtsearchrc.xml. As deletes or updates come in, individual documents in this Elem file are “invalidated”, which excludes them from indexing, although they remain in the file on disk.

Compaction is a scheduled operation that removes invalid fixml documents and should not interrupt feeding or indexing.
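The difference between invalidating a document and compacting it away can be illustrated with a toy model. This is only a sketch: the ToyElemFile class and its methods are hypothetical names made up for illustration, and the in-memory dictionary does not reflect the real on-disk layout of an Elem file.

```python
from dataclasses import dataclass, field


@dataclass
class ToyElemFile:
    """Toy stand-in for one Elem file (hypothetical, not the ESP format)."""
    # Maps document id -> [payload, valid flag].
    docs: dict = field(default_factory=dict)

    def add(self, doc_id: str, payload: str) -> None:
        self.docs[doc_id] = [payload, True]

    def invalidate(self, doc_id: str) -> None:
        # A delete or update only flips the flag; the payload stays stored.
        if doc_id in self.docs:
            self.docs[doc_id][1] = False

    def size_on_disk(self) -> int:
        # Invalidated documents still count towards the stored size.
        return sum(len(payload) for payload, _valid in self.docs.values())

    def compact(self) -> int:
        """Physically drop invalidated documents; return the size reclaimed."""
        before = self.size_on_disk()
        self.docs = {k: v for k, v in self.docs.items() if v[1]}
        return before - self.size_on_disk()
```

In this model, calling invalidate() leaves size_on_disk() unchanged, and only compact() actually reclaims the space, which mirrors why invalid documents accumulate until cleanup runs.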

When defragmentation is triggered (because a threshold has been reached), feeding is interrupted while the defragmentation executes. During the cleanup interval all incoming operations are therefore queued, which can use a lot of memory depending on the queue size. Stopping feeding during the cleanup window is a best practice to reduce memory usage.

The following parameters in rtsearchrc.xml relate to the fixml cleanup process:

cleanupTimes: A string holding a set of time intervals in which cleanup is allowed. The string uses a 4-column UNIX crontab-like format with hour precision. The specification string holds up to 4 time elements, and missing entries are given the value “*”. The time and date fields are hour (0-23), day of month (1-31), month (1-12) and day of week (0-7, where 0 or 7 is Sunday). The string “3-4,12-14” allows cleanup to be performed between hours 3 and 4, and between hours 12 and 14, system local time.
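To make the hour field concrete, here is a minimal Python sketch that expands a value such as “3-4,12-14” and checks whether a given time falls inside the window. It is not ESP code: the function names are hypothetical, only the first (hour) column is evaluated, and range ends are assumed to be inclusive as in cron, which may differ from how the indexer treats them.

```python
from datetime import datetime


def parse_hour_field(field: str) -> set:
    """Expand a crontab-style hour field such as "3-4,12-14" into a set of hours."""
    if field.strip() == "*":
        return set(range(24))
    hours = set()
    for part in field.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
        else:
            start = end = int(part)
        if not (0 <= start <= end <= 23):
            raise ValueError(f"hour range out of bounds: {part!r}")
        # Assumption: the end hour is inclusive, as in cron.
        hours.update(range(start, end + 1))
    return hours


def cleanup_allowed(spec: str, now: datetime) -> bool:
    """Check only the hour column; day and month columns are ignored here."""
    columns = spec.split()
    hour_field = columns[0] if columns else "*"
    return now.hour in parse_hour_field(hour_field)
```

For example, cleanup_allowed("3-4,12-14", datetime.now()) reports whether the current local hour falls inside one of the two windows.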

compactFixml: (Optional Parameter)

Instead of defragmenting the fixml, the default way to free the space held by invalid fixml documents in ESP is an in-place compaction of all the fixml files, which removes any invalid documents. When compaction is enabled, this process is performed during the cleanup times.

compressFixml: (Optional Parameter)

Compresses the index source data (fixml data) on disk. Compression can be quite CPU intensive for larger files but saves disk space. Note that it is enabled by default and reduces the performance of the document management unit of the indexer.

maxFixmlFiles:

Specifies an optional threshold for the number of fixml files in the data_fixml folder (the sum across all sets). During the cleanup cycle, fixml defragmentation is triggered if this threshold is reached or exceeded. Note that this is not a hard limit: the indexer will continue to accept new content and push the number of fixml files beyond this limit. A warning is issued when the number of fixml files reaches 90% of the maximum limit. If set to 0, the check is disabled.

maxSetDirs:

Specifies an optional threshold for the number of Set_XXXXX directories in data_fixml. During the cleanup cycle, fixml defragmentation is triggered if this threshold is reached or exceeded. Note that this is not a hard limit: the indexer will continue to accept new content and push the number of set folders beyond this limit. A warning is issued when the number of set folders reaches 90% of the maximum limit. If set to 0, the check is disabled.
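To see how close an installation is to these two thresholds, the counts can be taken directly from the file system. The sketch below is an illustration under stated assumptions, not part of ESP: the function names are hypothetical, every regular file found directly inside a Set_ directory is counted as a fixml file, and the classification simply restates the rules described above (0 disables the check, 90% warns, reaching the limit triggers defragmentation at the next cleanup).

```python
from pathlib import Path


def fixml_counts(data_fixml: Path) -> tuple:
    """Return (number of Set_* directories, number of files inside them)."""
    set_dirs = [p for p in data_fixml.glob("Set_*") if p.is_dir()]
    # Assumption: every regular file directly inside a set directory
    # counts as one fixml file.
    n_files = sum(1 for d in set_dirs for f in d.iterdir() if f.is_file())
    return len(set_dirs), n_files


def classify(count: int, limit: int) -> str:
    """Compare a count with maxSetDirs or maxFixmlFiles as described above."""
    if limit == 0:
        return "disabled"      # a limit of 0 switches the check off
    if count >= limit:
        return "defragment"    # threshold reached or exceeded
    if count * 10 >= limit * 9:
        return "warning"       # at or above 90% of the limit
    return "ok"
```

A possible use is to call fixml_counts() on the data_fixml directory of an indexer node and pass each count to classify() together with the corresponding value from rtsearchrc.xml.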

performCleanup: (True/false)

Controls the automated cleanup process. If set to “false”, cleanup must be triggered manually.

A large fixml file and folder structure increases the time required for a resetindex, because all fixml files must be read before the index reset can start.

 

Should you have any queries, feel free to comment below.

The DNetWorks Team
