java.lang.Object
  org.opencms.search.extractors.A_CmsTextExtractor
      org.opencms.search.extractors.A_CmsTextExtractorMsOfficeBase
Base class to extract summary information from MS Office documents.
Field Summary | |
protected static java.lang.String |
ENCODING_CP1252
Windows Cp1252 encoding (Western Europe) is used as the default for single byte fields. |
protected static java.lang.String |
ENCODING_UTF16
UTF-16 encoding is used for double byte fields. |
protected static java.lang.String |
POWERPOINT_EVENT_NAME
Event name for a MS PowerPoint document. |
protected static int |
PPT_TEXTBYTE_ATOM
PPT text byte atom. |
protected static int |
PPT_TEXTCHAR_ATOM
PPT text char atom. |
Fields inherited from class org.opencms.search.extractors.A_CmsTextExtractor |
m_inputBuffer |
Constructor Summary | |
A_CmsTextExtractorMsOfficeBase()
|
Method Summary | |
protected void |
cleanup()
Cleans up some internal memory. |
protected I_CmsExtractionResult |
createExtractionResult(java.lang.String rawContent)
Creates the extraction result for this MS Office document. |
void |
processPOIFSReaderEvent(org.apache.poi.poifs.eventfilesystem.POIFSReaderEvent event)
|
Methods inherited from class org.opencms.search.extractors.A_CmsTextExtractor |
combineContentItem, extractText, extractText, extractText, extractText, getStreamCopy, removeControlChars |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
protected static final java.lang.String ENCODING_CP1252
protected static final java.lang.String ENCODING_UTF16
protected static final java.lang.String POWERPOINT_EVENT_NAME
protected static final int PPT_TEXTBYTE_ATOM
protected static final int PPT_TEXTCHAR_ATOM
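The field constants above suggest how PowerPoint text is read: the stream is a sequence of records, and the two text atom types carry either single byte or double byte text. The following is a minimal sketch of that idea using only the Java standard library, not the actual OpenCms implementation. The record type values (4000 and 4008), the 8-byte little-endian record header, the charset names, and the little-endian byte order for UTF-16 are assumptions taken from the PowerPoint binary format, since the constant values are not shown on this page. The class name and the `record` helper are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class PptTextAtomSketch {

    // Assumed record type values for the two text atoms (not confirmed by this page).
    static final int PPT_TEXTCHAR_ATOM = 4000;
    static final int PPT_TEXTBYTE_ATOM = 4008;

    // Assumed charsets: Cp1252 for single byte text, little-endian UTF-16 for double byte text.
    static final Charset CP1252 = Charset.forName("windows-1252");
    static final Charset UTF16 = StandardCharsets.UTF_16LE;

    /** Walks the records in the buffer and collects the text of all text atoms, one per line. */
    static String extractText(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (pos + 8 <= data.length) {
            // Assumed header layout: 2 bytes version/instance, 2 bytes type, 4 bytes length.
            int verInst = buf.getShort(pos) & 0xFFFF;
            int type = buf.getShort(pos + 2) & 0xFFFF;
            long len = buf.getInt(pos + 4) & 0xFFFFFFFFL;
            pos += 8;
            if ((verInst & 0x0F) == 0x0F) {
                // Container record: its children follow directly, so continue with the first child.
                continue;
            }
            if (len > data.length - pos) {
                break; // truncated record, stop scanning
            }
            int n = (int) len;
            if (type == PPT_TEXTBYTE_ATOM) {
                out.append(new String(data, pos, n, CP1252)).append('\n');
            } else if (type == PPT_TEXTCHAR_ATOM) {
                out.append(new String(data, pos, n, UTF16)).append('\n');
            }
            pos += n; // skip the payload of this atom
        }
        return out.toString();
    }

    /** Hypothetical helper that builds one atom record for demonstration purposes. */
    static byte[] record(int type, byte[] payload) {
        return ByteBuffer.allocate(8 + payload.length)
            .order(ByteOrder.LITTLE_ENDIAN)
            .putShort((short) 0)
            .putShort((short) type)
            .putInt(payload.length)
            .put(payload)
            .array();
    }

    public static void main(String[] args) {
        byte[] a = record(PPT_TEXTBYTE_ATOM, "Hi".getBytes(CP1252));
        byte[] b = record(PPT_TEXTCHAR_ATOM, "Ok".getBytes(UTF16));
        byte[] doc = ByteBuffer.allocate(a.length + b.length).put(a).put(b).array();
        System.out.print(extractText(doc));
    }
}
```

In the real class the bytes would presumably arrive through processPOIFSReaderEvent for the stream identified by POWERPOINT_EVENT_NAME, rather than from a hand-built array as in this sketch.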
Constructor Detail |
public A_CmsTextExtractorMsOfficeBase()
Method Detail |
public void processPOIFSReaderEvent(org.apache.poi.poifs.eventfilesystem.POIFSReaderEvent event)
Specified by: processPOIFSReaderEvent in interface org.apache.poi.poifs.eventfilesystem.POIFSReaderListener
See Also: POIFSReaderListener.processPOIFSReaderEvent(org.apache.poi.poifs.eventfilesystem.POIFSReaderEvent)
protected void cleanup()
protected I_CmsExtractionResult createExtractionResult(java.lang.String rawContent)
The extraction result contains the raw content, plus additional meta information as content items read from the MS Office document properties.
Parameters: rawContent - the raw content extracted from the document
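The description of createExtractionResult can be illustrated with a small value holder that pairs the raw content with meta information stored as named content items. This is only a sketch under stated assumptions: the class `ExtractionResultSketch`, its accessors, and the item keys "title" and "author" are hypothetical and are not taken from I_CmsExtractionResult.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for an extraction result: raw content plus meta information items. */
class ExtractionResultSketch {

    private final String content;
    private final Map<String, String> contentItems;

    ExtractionResultSketch(String rawContent, Map<String, String> metaInfo) {
        this.content = rawContent;
        Map<String, String> items = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : metaInfo.entrySet()) {
            String value = entry.getValue();
            // Keep only document properties that actually carry a value.
            if (value != null && !value.trim().isEmpty()) {
                items.put(entry.getKey(), value.trim());
            }
        }
        this.contentItems = Collections.unmodifiableMap(items);
    }

    String getContent() {
        return content;
    }

    Map<String, String> getContentItems() {
        return contentItems;
    }

    public static void main(String[] args) {
        Map<String, String> meta = new LinkedHashMap<>();
        meta.put("title", " Quarterly report "); // hypothetical property key
        meta.put("author", null);                // missing property, dropped from the result
        ExtractionResultSketch result = new ExtractionResultSketch("raw text", meta);
        System.out.println(result.getContent());
        System.out.println(result.getContentItems());
    }
}
```

Keeping the items in a separate map lets a search index treat document properties as distinct fields while still indexing the raw content as a whole.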