org.opencms.search.documents
Class A_CmsVfsDocument

java.lang.Object
  extended by org.opencms.search.documents.A_CmsVfsDocument
All Implemented Interfaces:
I_CmsDocumentFactory, I_CmsSearchExtractor
Direct Known Subclasses:
CmsDocumentGeneric, CmsDocumentHtml, CmsDocumentMsExcel, CmsDocumentMsPowerPoint, CmsDocumentMsWord, CmsDocumentPdf, CmsDocumentPlainText, CmsDocumentRtf, CmsDocumentXmlContent, CmsDocumentXmlPage

public abstract class A_CmsVfsDocument
extends java.lang.Object
implements I_CmsDocumentFactory

Base document factory class for a VFS CmsResource. Subclasses only need to provide a specialized implementation of I_CmsSearchExtractor.extractContent(CmsObject, A_CmsIndexResource, String), which extracts the text from the binary document content.
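The division of labor described above is a template method: the base class holds the shared indexing logic and delegates only the extraction step. The following is a minimal, self-contained sketch of that idea, not the OpenCms implementation. ExtractionResult, VfsDocumentSketch and PlainTextDocumentSketch are hypothetical stand-ins for I_CmsExtractionResult, A_CmsVfsDocument and a subclass such as CmsDocumentPlainText, and newInstance here returns plain text rather than a Lucene document.

```java
import java.nio.charset.StandardCharsets;

// Demo entry point (declared first so single-file source mode finds main).
class VfsDocumentDemo {
    public static void main(String[] args) {
        VfsDocumentSketch doc = new PlainTextDocumentSketch();
        byte[] raw = "Hello OpenCms".getBytes(StandardCharsets.UTF_8);
        System.out.println(doc.getName() + ": " + doc.newInstance(raw));
    }
}

// Hypothetical, simplified stand-in for I_CmsExtractionResult.
interface ExtractionResult {
    String getContent();
}

// Hypothetical base class playing the role of A_CmsVfsDocument: it keeps the
// document type name and leaves only the extraction step to subclasses.
abstract class VfsDocumentSketch {
    protected final String m_name;

    protected VfsDocumentSketch(String name) {
        m_name = name;
    }

    public String getName() {
        return m_name;
    }

    // The one method a concrete document type must supply.
    protected abstract ExtractionResult extractContent(byte[] rawContent);

    // Template method: shared logic delegates text extraction to the subclass.
    // This sketch returns the extracted text instead of building a Lucene document.
    public String newInstance(byte[] rawContent) {
        return extractContent(rawContent).getContent();
    }
}

// Hypothetical plain-text subclass, similar in spirit to CmsDocumentPlainText.
class PlainTextDocumentSketch extends VfsDocumentSketch {
    PlainTextDocumentSketch() {
        super("plaintext");
    }

    @Override
    protected ExtractionResult extractContent(byte[] rawContent) {
        String text = new String(rawContent, StandardCharsets.UTF_8);
        return () -> text;
    }
}
```

A new binary format is then supported by adding one subclass with its own extractContent, leaving the shared factory logic untouched.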

Since:
6.0.0
Version:
$Revision: 1.14 $
Author:
Carsten Weinholz, Alexander Kandzior

Field Summary
protected  java.lang.String m_name
          Name of the document type.
static java.lang.String VFS_DOCUMENT_KEY_PREFIX
          The VFS prefix for document keys.
 
Fields inherited from interface org.opencms.search.documents.I_CmsDocumentFactory
DOC_CATEGORY, DOC_CONTENT, DOC_DATE_CREATED, DOC_DATE_LASTMODIFIED, DOC_DESCRIPTION, DOC_KEYWORDS, DOC_META, DOC_PATH, DOC_PRIORITY, DOC_ROOT, DOC_TITLE_INDEXED, DOC_TITLE_KEY, DOC_TYPE, SEARCH_PRIORITY_HIGH_VALUE, SEARCH_PRIORITY_LOW_VALUE, SEARCH_PRIORITY_MAX_VALUE, SEARCH_PRIORITY_NORMAL_VALUE
 
Constructor Summary
A_CmsVfsDocument(java.lang.String name)
          Creates a new instance of this Lucene document factory.
 
Method Summary
 java.lang.String getDocumentKey(java.lang.String resourceType)
          Returns the document key for the search manager.
 java.util.List getDocumentKeys(java.util.List resourceTypes, java.util.List mimeTypes)
          Returns a list of document keys for the document type.
 java.lang.String getName()
          Returns the name of the document type.
protected  java.lang.String mergeMetaInfo(I_CmsExtractionResult extractedContent)
          Returns a String that combines the content with the most important meta information from the given extraction result.
 org.apache.lucene.document.Document newInstance(CmsObject cms, A_CmsIndexResource resource, java.lang.String language)
          Generates a new Lucene document instance from the contents of the given resource.
protected  CmsFile readFile(CmsObject cms, CmsResource resource)
          Upgrades the given resource to a CmsFile with content.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.opencms.search.documents.I_CmsSearchExtractor
extractContent
 

Field Detail

VFS_DOCUMENT_KEY_PREFIX

public static final java.lang.String VFS_DOCUMENT_KEY_PREFIX
The VFS prefix for document keys.

See Also:
Constant Field Values

m_name

protected java.lang.String m_name
Name of the document type.

Constructor Detail

A_CmsVfsDocument

public A_CmsVfsDocument(java.lang.String name)
Creates a new instance of this Lucene document factory.

Parameters:
name - name of the document type
Method Detail

getDocumentKey

public java.lang.String getDocumentKey(java.lang.String resourceType)
                                throws CmsIndexException
Description copied from interface: I_CmsDocumentFactory
Returns the document key for the search manager.

Specified by:
getDocumentKey in interface I_CmsDocumentFactory
Parameters:
resourceType - the resource type to get the document key for
Returns:
the document key for the search manager
Throws:
CmsIndexException
See Also:
I_CmsDocumentFactory.getDocumentKey(java.lang.String)

getDocumentKeys

public java.util.List getDocumentKeys(java.util.List resourceTypes,
                                      java.util.List mimeTypes)
                               throws CmsException
Description copied from interface: I_CmsDocumentFactory
Returns a list of document keys for the document type.

The list of accepted resource types may contain a catch-all entry "*"; in this case, a list for all possible resource types is returned, calculated by logic that depends on the document handler class.

Specified by:
getDocumentKeys in interface I_CmsDocumentFactory
Parameters:
resourceTypes - list of accepted resource types
mimeTypes - list of accepted mime types
Returns:
a list of document keys for this document factory
Throws:
CmsException - if something goes wrong
See Also:
I_CmsDocumentFactory.getDocumentKeys(java.util.List, java.util.List)
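How getDocumentKey and getDocumentKeys fit together, including the "*" catch-all, can be sketched as below. This is an illustration under stated assumptions, not the OpenCms code: the prefix value "VFS", the key layout (prefix + resource type id + ":" + MIME type), and the in-memory type registry that replaces the OpenCms resource manager are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DocumentKeySketch {
    // Hypothetical prefix value; the real constant is VFS_DOCUMENT_KEY_PREFIX.
    static final String VFS_DOCUMENT_KEY_PREFIX = "VFS";

    // Hypothetical registry (type name -> type id) standing in for the resource manager.
    private final Map<String, Integer> m_typeIds = new LinkedHashMap<>();

    DocumentKeySketch() {
        m_typeIds.put("plain", 1);
        m_typeIds.put("binary", 2);
        m_typeIds.put("image", 3);
    }

    // Assumed key layout: prefix followed by the numeric type id.
    // The real method throws CmsIndexException; this sketch uses an unchecked exception.
    String getDocumentKey(String resourceType) {
        Integer id = m_typeIds.get(resourceType);
        if (id == null) {
            throw new IllegalArgumentException("Unknown resource type: " + resourceType);
        }
        return VFS_DOCUMENT_KEY_PREFIX + id;
    }

    List<String> getDocumentKeys(List<String> resourceTypes, List<String> mimeTypes) {
        // A "*" entry expands to every registered resource type.
        List<String> types = resourceTypes.contains("*")
            ? new ArrayList<>(m_typeIds.keySet())
            : resourceTypes;
        List<String> keys = new ArrayList<>();
        for (String type : types) {
            String key = getDocumentKey(type);
            if (mimeTypes.isEmpty()) {
                // Without MIME types, the plain type key is used.
                keys.add(key);
            }
            for (String mimeType : mimeTypes) {
                keys.add(key + ":" + mimeType);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        DocumentKeySketch sketch = new DocumentKeySketch();
        System.out.println(sketch.getDocumentKeys(List.of("*"), List.of("text/plain")));
    }
}
```

Under these assumptions, the search manager can map each (resource type, MIME type) pair to exactly one key and look up the responsible document factory by that key.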

getName

public java.lang.String getName()
Description copied from interface: I_CmsDocumentFactory
Returns the name of the document type.

Specified by:
getName in interface I_CmsDocumentFactory
Returns:
the name of the document type
See Also:
I_CmsDocumentFactory.getName()

newInstance

public org.apache.lucene.document.Document newInstance(CmsObject cms,
                                                       A_CmsIndexResource resource,
                                                       java.lang.String language)
                                                throws CmsException
Generates a new Lucene document instance from the contents of the given resource.

Specified by:
newInstance in interface I_CmsDocumentFactory
Parameters:
cms - the current user's OpenCms context
resource - the resource to create the Lucene document for
language - the requested language
Returns:
a Lucene document for the given resource
Throws:
CmsException - if something goes wrong
See Also:
I_CmsDocumentFactory.newInstance(org.opencms.file.CmsObject, org.opencms.search.A_CmsIndexResource, java.lang.String)
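The final step of newInstance, assembling named fields into a search document, can be pictured with the sketch below. It deliberately avoids the Lucene API (whose Document/Field constructors differ between versions) and uses a Map as a stand-in document. The field name strings, the IndexResource record, and the omission of the language parameter are all assumptions; the real field names are the DOC_* constants inherited from I_CmsDocumentFactory.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class NewInstanceSketch {
    // Hypothetical values; the real ones are the DOC_* constants of I_CmsDocumentFactory.
    static final String DOC_PATH = "path";
    static final String DOC_TYPE = "type";
    static final String DOC_TITLE_INDEXED = "title";
    static final String DOC_CONTENT = "content";

    // Hypothetical stand-in for A_CmsIndexResource.
    record IndexResource(String rootPath, String typeName, String title) {}

    // Builds a field-name -> value map in place of a Lucene Document.
    // Language handling is omitted in this sketch.
    static Map<String, String> newInstance(IndexResource resource, String extractedText) {
        Map<String, String> document = new LinkedHashMap<>();
        document.put(DOC_PATH, resource.rootPath());
        document.put(DOC_TYPE, resource.typeName());
        document.put(DOC_TITLE_INDEXED, resource.title());
        document.put(DOC_CONTENT, extractedText);
        return document;
    }

    public static void main(String[] args) {
        IndexResource res = new IndexResource("/sites/default/index.html", "plain", "Home");
        System.out.println(newInstance(res, "Welcome"));
    }
}
```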

mergeMetaInfo

protected java.lang.String mergeMetaInfo(I_CmsExtractionResult extractedContent)
Returns a String that combines the content with the most important meta information from the given extraction result.

OpenCms uses its own properties for fields such as "Title", so this method ensures that the most important document meta information can still be found as part of the content.

Parameters:
extractedContent - the extraction result to merge
Returns:
a String combining the most important meta information from the given extraction result with the content
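A minimal sketch of the merge described above: selected meta values are prepended to the extracted text, so a full-text search on the content also matches, for example, a title embedded in the original file. The chosen meta keys, their order, and the newline separator are assumptions for illustration; the real keys come from I_CmsExtractionResult, and a plain Map stands in for it here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MergeMetaInfoSketch {
    // Hypothetical set and order of "most important" meta keys.
    static final List<String> IMPORTANT_KEYS = List.of("title", "subject", "author", "keywords");

    // Prepends each non-blank important meta value (one per line) to the content.
    static String mergeMetaInfo(String content, Map<String, String> metaInfo) {
        StringBuilder result = new StringBuilder();
        for (String key : IMPORTANT_KEYS) {
            String value = metaInfo.get(key);
            if (value != null && !value.isBlank()) {
                result.append(value).append('\n');
            }
        }
        if (content != null) {
            result.append(content);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        Map<String, String> meta = new LinkedHashMap<>();
        meta.put("title", "Annual Report");
        meta.put("author", "Jane Doe");
        System.out.println(mergeMetaInfo("Revenue grew.", meta));
    }
}
```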

readFile

protected CmsFile readFile(CmsObject cms,
                           CmsResource resource)
                    throws CmsException,
                           CmsIndexException
Upgrades the given resource to a CmsFile with content.

Parameters:
cms - the current user's OpenCms context
resource - the resource to upgrade
Returns:
the given resource upgraded to a CmsFile with content
Throws:
CmsException - if the resource could not be read
CmsIndexException - if the resource has no content
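The two failure modes documented for readFile (the resource cannot be read, or it has no content) can be sketched as follows. Everything here is hypothetical: the records replace CmsResource and CmsFile, Store replaces the read access through CmsObject, and ReadException/NoContentException play the roles of CmsException and CmsIndexException. Treating folders and zero-length files as "no content" is an assumption about what that condition covers.

```java
import java.nio.charset.StandardCharsets;

class ReadFileSketch {
    // Hypothetical stand-ins for CmsResource and CmsFile.
    record Resource(String rootPath, boolean isFolder) {}
    record FileWithContent(String rootPath, byte[] content) {}

    // Plays the role of CmsException: the resource could not be read.
    static class ReadException extends Exception {
        ReadException(String message) { super(message); }
    }

    // Plays the role of CmsIndexException: the resource has no content to index.
    static class NoContentException extends Exception {
        NoContentException(String message) { super(message); }
    }

    // Hypothetical content store standing in for reading through a CmsObject.
    interface Store {
        byte[] read(String rootPath) throws ReadException;
    }

    static FileWithContent readFile(Store store, Resource resource)
            throws ReadException, NoContentException {
        if (resource.isFolder()) {
            // Folders carry no binary content.
            throw new NoContentException("No content: " + resource.rootPath());
        }
        byte[] content = store.read(resource.rootPath());
        if (content.length == 0) {
            throw new NoContentException("Empty file: " + resource.rootPath());
        }
        return new FileWithContent(resource.rootPath(), content);
    }

    public static void main(String[] args) throws Exception {
        Store store = path -> "Hello".getBytes(StandardCharsets.UTF_8);
        FileWithContent file = readFile(store, new Resource("/index.txt", false));
        System.out.println(file.rootPath() + " has " + file.content().length + " bytes");
    }
}
```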