Package org.opencms.search.extractors

Contains a generic, low-level framework for extracting plain text content from various popular file formats.


Interface Summary
I_CmsExtractionResult The result of a document text extraction.
I_CmsTextExtractor Allows extraction of the indexable "plain" text plus (optional) meta information from a given binary input document format.
 

Class Summary
A_CmsTextExtractor Base utility class that allows extraction of the indexable "plain" text from a given document format.
A_CmsTextExtractorMsOfficeBase Base class to extract summary information from MS Office documents.
CmsExtractionResult The result of a document text extraction.
CmsExtractorHtml Extracts the text from an HTML document.
CmsExtractorMsExcel Extracts the text from an MS Excel document.
CmsExtractorMsPowerPoint Extracts the text from an MS PowerPoint document.
CmsExtractorMsWord Extracts the text from an MS Word document.
CmsExtractorPdf Extracts the text from a PDF document.
CmsExtractorRtf Extracts the text from an RTF document.
 

Package org.opencms.search.extractors Description

Contains a generic, low-level framework for extracting plain text content from various popular file formats.
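
The split between an extractor and its result can be illustrated with a small, self-contained sketch. It does not use the OpenCms classes: the names ExtractionResult, TextExtractor, NaiveMarkupExtractor and their methods are hypothetical stand-ins chosen for illustration, and the regex-based tag stripping is a toy, not how the extractors in this package work. It only shows the general shape: binary input goes in, plain indexable text plus optional meta information comes out.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractorSketch {

    /** Hypothetical result type: the plain text plus optional meta information. */
    public interface ExtractionResult {
        String getContent();
        Map<String, String> getMetaInfo();
    }

    /** Hypothetical extractor type: turns a binary document into an ExtractionResult. */
    public interface TextExtractor {
        ExtractionResult extractText(byte[] content);
    }

    /** Toy extractor for HTML-like markup; assumes UTF-8 input and uses regexes, not a parser. */
    public static class NaiveMarkupExtractor implements TextExtractor {

        private static final Pattern TITLE =
            Pattern.compile("<title>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

        @Override
        public ExtractionResult extractText(byte[] content) {
            String raw = new String(content, StandardCharsets.UTF_8);

            // Collect optional meta information (here: only the title, if present).
            final Map<String, String> meta = new LinkedHashMap<>();
            Matcher m = TITLE.matcher(raw);
            if (m.find()) {
                meta.put("title", m.group(1).trim());
            }

            // Replace every tag with a space, then collapse runs of whitespace.
            final String text = raw.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();

            return new ExtractionResult() {
                @Override
                public String getContent() {
                    return text;
                }

                @Override
                public Map<String, String> getMetaInfo() {
                    return meta;
                }
            };
        }
    }

    public static void main(String[] args) {
        byte[] doc = "<html><head><title>Demo</title></head><body><p>Hello <b>world</b></p></body></html>"
            .getBytes(StandardCharsets.UTF_8);
        ExtractionResult result = new NaiveMarkupExtractor().extractText(doc);
        System.out.println(result.getContent());
        System.out.println(result.getMetaInfo());
    }
}
```

Keeping the result behind its own interface means an indexer can treat every document format the same way, whichever extractor produced the text.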

Since:
6.0.0
Version:
$Revision: 1.6 $