org.opencms.search
Class CmsSearchManager

java.lang.Object
  extended by org.opencms.search.CmsSearchManager
All Implemented Interfaces:
I_CmsEventListener, I_CmsScheduledJob

public class CmsSearchManager
extends java.lang.Object
implements I_CmsScheduledJob, I_CmsEventListener

Implements the general management and configuration of the search and indexing facilities in OpenCms.

Since:
6.0.0
Version:
$Revision: 1.55 $
Author:
Carsten Weinholz, Thomas Weckert

Field Summary
static java.lang.String JOB_PARAM_INDEXLIST
          Scheduler parameter: Update only a specified list of indexes.
static java.lang.String JOB_PARAM_WRITELOG
          Scheduler parameter: Write the output of the update to the logfile.
 
Fields inherited from interface org.opencms.main.I_CmsEventListener
EVENT_BEFORE_PUBLISH_PROJECT, EVENT_CLEAR_CACHES, EVENT_CLEAR_OFFLINE_CACHES, EVENT_CLEAR_ONLINE_CACHES, EVENT_CLEAR_PRINCIPAL_CACHES, EVENT_FLEX_CACHE_CLEAR, EVENT_FLEX_PURGE_JSP_REPOSITORY, EVENT_LOGIN_USER, EVENT_PROJECT_MODIFIED, EVENT_PROPERTY_DEFINITION_CREATED, EVENT_PROPERTY_DEFINITION_MODIFIED, EVENT_PROPERTY_MODIFIED, EVENT_PUBLISH_PROJECT, EVENT_RESOURCE_AND_PROPERTIES_MODIFIED, EVENT_RESOURCE_COPIED, EVENT_RESOURCE_CREATED, EVENT_RESOURCE_DELETED, EVENT_RESOURCE_LIST_MODIFIED, EVENT_RESOURCE_MODIFIED, EVENT_RESOURCES_AND_PROPERTIES_MODIFIED, EVENT_RESOURCES_MODIFIED, EVENT_UPDATE_EXPORTS, KEY_DBCONTEXT, KEY_PROJECTID, KEY_PUBLISHID, KEY_PUBLISHLIST, KEY_REPORT, LISTENERS_FOR_ALL_EVENTS
 
Constructor Summary
CmsSearchManager()
          Default constructor, used when the class is called as a cron job.
 
Method Summary
 void addAnalyzer(CmsSearchAnalyzer analyzer)
          Adds an analyzer.
 void addDocumentTypeConfig(CmsSearchDocumentType documentType)
          Adds a document type.
 void addSearchIndex(CmsSearchIndex searchIndex)
          Adds a search index to the configuration.
 void addSearchIndexSource(CmsSearchIndexSource searchIndexSource)
          Adds a search index source configuration.
protected  boolean checkIndexLock(CmsSearchIndex index, I_CmsReport report)
          Checks if a given index is locked; if so, waits for a number of seconds and checks again, until either the index is unlocked or the limit of seconds set by setIndexLockMaxWaitSeconds(int) is reached.
 void cmsEvent(CmsEvent event)
          Implements the event listener of this class.
protected  org.apache.lucene.analysis.Analyzer getAnalyzer(java.lang.String locale)
          Returns an analyzer for the given language.
 java.util.Map getAnalyzers()
          Returns an unmodifiable view (read-only) of the Analyzers Map.
 CmsSearchAnalyzer getCmsSearchAnalyzer(java.lang.String locale)
          Returns the CmsSearchAnalyzer for the given locale.
 java.lang.String getDirectory()
          Returns the name of the directory below WEB-INF/ where the search indexes are stored.
protected  I_CmsDocumentFactory getDocumentFactory(A_CmsIndexResource resource)
          Returns a lucene document factory for given resource.
 CmsSearchDocumentType getDocumentTypeConfig(java.lang.String name)
          Returns a document type config.
 java.util.Map getDocumentTypeConfigs()
          Returns an unmodifiable view (read-only) of the DocumentTypeConfigs Map.
protected  java.util.List getDocumentTypes()
          Returns the names of all configured document types.
 I_CmsTermHighlighter getHighlighter()
          Returns the highlighter.
 CmsSearchIndex getIndex(java.lang.String indexName)
          Returns the index belonging to the passed name.
 int getIndexLockMaxWaitSeconds()
          Returns the seconds to wait for an index lock during an update operation.
 java.util.List getIndexNames()
          Returns the names of all configured indexes.
 CmsSearchIndexSource getIndexSource(java.lang.String sourceName)
          Returns a search index source for a specified source name.
 int getMaxExcerptLength()
          Returns the max. excerpt length.
protected  java.util.Map getResultCache()
          Returns the common cache for buffering search results.
 java.lang.String getResultCacheSize()
          Returns the result cache size.
 java.util.List getSearchIndexes()
          Returns an unmodifiable list of all configured CmsSearchIndex instances.
 java.util.Map getSearchIndexSources()
          Returns an unmodifiable view (read-only) of the SearchIndexSources Map.
 java.lang.String getTimeout()
          Returns the timeout to abandon threads indexing a resource.
protected  void initAvailableDocumentTypes()
          Initializes the available Cms resource types to be indexed.
 void initialize(CmsObject cms)
          Initializes the search manager.
 void initializeIndexes()
          Initializes all configured document types and search indexes.
protected  void initSearchIndexes()
          Initializes the configured search indexes.
 java.lang.String launch(CmsObject cms, java.util.Map parameters)
          Updates the indexes as a scheduled job.
 void rebuildAllIndexes(I_CmsReport report)
          Rebuilds (if required creates) all configured indexes.
 void rebuildAllIndexes(I_CmsReport report, boolean wait)
          Rebuilds (if required creates) all configured indexes.
 void rebuildIndex(java.lang.String indexName, I_CmsReport report)
          Rebuilds (if required creates) the index with the given name.
 void rebuildIndexes(java.util.List indexNames, I_CmsReport report)
          Rebuilds (if required creates) the indexes with the given names.
 void removeSearchIndex(CmsSearchIndex searchIndex)
          Removes a search index from the configuration.
 void removeSearchIndexes(java.util.List indexNames)
          Removes all indexes whose names are included in the given list.
 boolean removeSearchIndexSource(CmsSearchIndexSource indexsource)
          Removes this index source from the OpenCms configuration (if it is no longer used).
 void setDirectory(java.lang.String value)
          Sets the name of the directory below WEB-INF/ where the search indexes are stored.
 void setHighlighter(java.lang.String highlighter)
          Sets the highlighter.
 void setIndexLockMaxWaitSeconds(int value)
          Sets the seconds to wait for an index lock during an update operation.
 void setMaxExcerptLength(java.lang.String maxExcerptLength)
          Sets the max. excerpt length.
 void setResultCacheSize(java.lang.String value)
          Sets the result cache size.
 void setTimeout(java.lang.String value)
          Sets the timeout to abandon threads indexing a resource.
protected  void updateAllIndexes(CmsObject adminCms, CmsUUID publishHistoryId, I_CmsReport report)
          Incrementally updates all indexes that have their rebuild mode set to "auto" after resources have been published.
protected  void updateIndex(CmsSearchIndex index, I_CmsReport report, boolean wait, java.util.List resourcesToIndex, java.util.Map documentCache)
          Updates (if required creates) the given index.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

JOB_PARAM_INDEXLIST

public static final java.lang.String JOB_PARAM_INDEXLIST
Scheduler parameter: Update only a specified list of indexes.

See Also:
Constant Field Values

JOB_PARAM_WRITELOG

public static final java.lang.String JOB_PARAM_WRITELOG
Scheduler parameter: Write the output of the update to the logfile.

See Also:
Constant Field Values

Constructor Detail

CmsSearchManager

public CmsSearchManager()
Default constructor, used when the class is called as a cron job.

Method Detail

addAnalyzer

public void addAnalyzer(CmsSearchAnalyzer analyzer)
Adds an analyzer.

Parameters:
analyzer - an analyzer

addDocumentTypeConfig

public void addDocumentTypeConfig(CmsSearchDocumentType documentType)
Adds a document type.

Parameters:
documentType - a document type

addSearchIndex

public void addSearchIndex(CmsSearchIndex searchIndex)
Adds a search index to the configuration.

Parameters:
searchIndex - the search index to add

addSearchIndexSource

public void addSearchIndexSource(CmsSearchIndexSource searchIndexSource)
Adds a search index source configuration.

Parameters:
searchIndexSource - a search index source configuration

cmsEvent

public void cmsEvent(CmsEvent event)
Implements the event listener of this class.

Specified by:
cmsEvent in interface I_CmsEventListener
Parameters:
event - CmsEvent that has occurred
See Also:
I_CmsEventListener.cmsEvent(org.opencms.main.CmsEvent)
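
The Javadoc does not list which events the search manager reacts to. A listener like this typically dispatches on the event type; the sketch below assumes that a publish event should trigger an incremental index update (as updateAllIndexes suggests) and that a cache-clear event empties the search result cache. The class name, the integer event codes and the recorded "actions" are hypothetical stand-ins, not OpenCms's actual values or code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of event dispatch in a search manager. */
class SearchEventDispatchSketch {

    // Hypothetical event codes; the real constants are defined in I_CmsEventListener.
    static final int EVENT_CLEAR_CACHES = 1;
    static final int EVENT_PUBLISH_PROJECT = 2;

    // Stand-in for the shared search result cache.
    final Map<String, String> resultCache = new HashMap<>();
    // Records which indexing actions were triggered, for illustration only.
    final List<String> actions = new ArrayList<>();

    /** Dispatches on the event type; unrelated events are ignored. */
    void cmsEvent(int eventType) {
        switch (eventType) {
            case EVENT_CLEAR_CACHES:
                // Cached results may be stale, so drop them all.
                resultCache.clear();
                break;
            case EVENT_PUBLISH_PROJECT:
                // Published resources change search results and must be re-indexed.
                resultCache.clear();
                actions.add("updateAllIndexes");
                break;
            default:
                // Not relevant to search.
                break;
        }
    }
}
```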

getAnalyzers

public java.util.Map getAnalyzers()
Returns an unmodifiable view (read-only) of the Analyzers Map.

Returns:
an unmodifiable view (read-only) of the Analyzers Map

getCmsSearchAnalyzer

public CmsSearchAnalyzer getCmsSearchAnalyzer(java.lang.String locale)
Returns the CmsSearchAnalyzer for the given locale.

Parameters:
locale - the unique locale key under which the CmsSearchAnalyzer is stored in the analyzer map
Returns:
the CmsSearchAnalyzer for the given locale

getDirectory

public java.lang.String getDirectory()
Returns the name of the directory below WEB-INF/ where the search indexes are stored.

Returns:
the name of the directory below WEB-INF/ where the search indexes are stored

getDocumentTypeConfig

public CmsSearchDocumentType getDocumentTypeConfig(java.lang.String name)
Returns a document type config.

Parameters:
name - the name of the document type config
Returns:
the document type config.

getDocumentTypeConfigs

public java.util.Map getDocumentTypeConfigs()
Returns an unmodifiable view (read-only) of the DocumentTypeConfigs Map.

Returns:
an unmodifiable view (read-only) of the DocumentTypeConfigs Map

getHighlighter

public I_CmsTermHighlighter getHighlighter()
Returns the highlighter.

Returns:
the highlighter

getIndex

public CmsSearchIndex getIndex(java.lang.String indexName)
Returns the index belonging to the passed name.

The index must exist already.

Parameters:
indexName - the name of the index
Returns:
an object representing the desired index

getIndexLockMaxWaitSeconds

public int getIndexLockMaxWaitSeconds()
Returns the seconds to wait for an index lock during an update operation.

Returns:
the seconds to wait for an index lock during an update operation

getIndexNames

public java.util.List getIndexNames()
Returns the names of all configured indexes.

Returns:
list of names

getIndexSource

public CmsSearchIndexSource getIndexSource(java.lang.String sourceName)
Returns a search index source for a specified source name.

Parameters:
sourceName - the name of the index source
Returns:
a search index source

getMaxExcerptLength

public int getMaxExcerptLength()
Returns the max. excerpt length.

Returns:
the max excerpt length

getResultCacheSize

public java.lang.String getResultCacheSize()
Returns the result cache size.

Returns:
the result cache size

getSearchIndexes

public java.util.List getSearchIndexes()
Returns an unmodifiable list of all configured CmsSearchIndex instances.

Returns:
an unmodifiable list of all configured CmsSearchIndex instances

getSearchIndexSources

public java.util.Map getSearchIndexSources()
Returns an unmodifiable view (read-only) of the SearchIndexSources Map.

Returns:
an unmodifiable view (read-only) of the SearchIndexSources Map

getTimeout

public java.lang.String getTimeout()
Returns the timeout to abandon threads indexing a resource.

Returns:
the timeout to abandon threads indexing a resource

initialize

public void initialize(CmsObject cms)
                throws CmsRoleViolationException
Initializes the search manager.

Parameters:
cms - the cms object
Throws:
CmsRoleViolationException - in case the given opencms object does not have CmsRole.SEARCH_MANAGER permissions

initializeIndexes

public void initializeIndexes()
Initializes all configured document types and search indexes.

This method needs to be called after a change in the index configuration has been made.


launch

public final java.lang.String launch(CmsObject cms,
                                     java.util.Map parameters)
                              throws java.lang.Exception
Updates the indexes as a scheduled job.

Specified by:
launch in interface I_CmsScheduledJob
Parameters:
cms - the OpenCms user context to use when reading resources from the VFS
parameters - the parameters for the scheduled job
Returns:
the String to write in the scheduler log
Throws:
java.lang.Exception - if something goes wrong
See Also:
I_CmsScheduledJob.launch(org.opencms.file.CmsObject, java.util.Map)
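
When run as a scheduled job, launch reads JOB_PARAM_INDEXLIST and JOB_PARAM_WRITELOG from its parameters map. The literal key strings are only listed on the Constant Field Values page, so this sketch assumes the keys "indexList" and "writeLog", a comma-separated list of index names, and a "true"/"false" flag; when no list is given, all configured indexes are updated. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical parsing of the scheduler parameters passed to launch(). */
class JobParamsSketch {

    // Assumed key strings; check the Constant Field Values page for the real ones.
    static final String JOB_PARAM_INDEXLIST = "indexList";
    static final String JOB_PARAM_WRITELOG = "writeLog";

    /** Returns the listed index names, or all configured ones if no list is given. */
    static List<String> indexesToUpdate(Map<String, String> parameters, List<String> allIndexNames) {
        String list = parameters.get(JOB_PARAM_INDEXLIST);
        if (list == null || list.trim().isEmpty()) {
            return new ArrayList<>(allIndexNames);
        }
        List<String> result = new ArrayList<>();
        for (String name : list.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) { // skip empty entries such as "a,,b"
                result.add(trimmed);
            }
        }
        return result;
    }

    /** Returns true if the update output should go to the log file (false if absent). */
    static boolean writeLog(Map<String, String> parameters) {
        return Boolean.parseBoolean(parameters.get(JOB_PARAM_WRITELOG));
    }
}
```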

rebuildAllIndexes

public void rebuildAllIndexes(I_CmsReport report)
                       throws CmsException
Rebuilds (if required creates) all configured indexes.

Parameters:
report - the report object to write messages (or null)
Throws:
CmsException - if something goes wrong

rebuildAllIndexes

public void rebuildAllIndexes(I_CmsReport report,
                              boolean wait)
                       throws CmsException
Rebuilds (if required creates) all configured indexes.

Parameters:
report - the report object to write messages (or null)
wait - signals to wait until all the indexing threads are finished
Throws:
CmsException - if something goes wrong

rebuildIndex

public void rebuildIndex(java.lang.String indexName,
                         I_CmsReport report)
                  throws CmsException
Rebuilds (if required creates) the index with the given name.

Parameters:
indexName - the name of the index to rebuild
report - the report object to write messages (or null)
Throws:
CmsException - if something goes wrong

rebuildIndexes

public void rebuildIndexes(java.util.List indexNames,
                           I_CmsReport report)
                    throws CmsException
Rebuilds (if required creates) the indexes with the given names.

Parameters:
indexNames - the names (String) of the indexes to rebuild
report - the report object to write messages (or null)
Throws:
CmsException - if something goes wrong

removeSearchIndex

public void removeSearchIndex(CmsSearchIndex searchIndex)
Removes a search index from the configuration.

Parameters:
searchIndex - the search index to remove

removeSearchIndexes

public void removeSearchIndexes(java.util.List indexNames)
Removes all indexes whose names are included in the given list.

Parameters:
indexNames - the names of the indexes to remove

removeSearchIndexSource

public boolean removeSearchIndexSource(CmsSearchIndexSource indexsource)
                                throws CmsIllegalStateException
Removes this index source from the OpenCms configuration (if it is no longer used).

Parameters:
indexsource - the index source to remove from the configuration
Returns:
true if the removal was successful, false if the preconditions for removal are met but the given index source was unknown to the manager.
Throws:
CmsIllegalStateException - if the given indexsource is still used by at least one CmsSearchIndex.
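
The return value and the exception together encode three outcomes: removed, unknown, or still in use. A minimal sketch of that contract, modelling sources and indexes as plain strings (a hypothetical simplification) and using IllegalStateException in place of CmsIllegalStateException:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the removal contract for index sources. */
class IndexSourceRegistrySketch {

    // Index source name -> source configuration (simplified to a String).
    final Map<String, String> sources = new HashMap<>();
    // Index name -> names of the index sources that index uses.
    final Map<String, List<String>> indexSources = new HashMap<>();

    /** Throws if any index still uses the source; returns false if it was never registered. */
    boolean removeSearchIndexSource(String sourceName) {
        for (Map.Entry<String, List<String>> e : indexSources.entrySet()) {
            if (e.getValue().contains(sourceName)) {
                throw new IllegalStateException(
                    "Index source " + sourceName + " is still used by index " + e.getKey());
            }
        }
        // remove() returns null when the source was unknown.
        return sources.remove(sourceName) != null;
    }
}
```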

setDirectory

public void setDirectory(java.lang.String value)
Sets the name of the directory below WEB-INF/ where the search indexes are stored.

Parameters:
value - the name of the directory below WEB-INF/ where the search indexes are stored

setHighlighter

public void setHighlighter(java.lang.String highlighter)
Sets the highlighter.

A highlighter is a class implementing org.opencms.search.documents.I_CmsTermHighlighter.

Parameters:
highlighter - the package/class name of the highlighter
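
Because setHighlighter receives a class name rather than an instance, the highlighter is presumably instantiated by reflection. The sketch below shows that pattern with a hypothetical TermHighlighter interface and BoldHighlighter class standing in for I_CmsTermHighlighter and its implementations; the real interface's methods are not described on this page.

```java
/** Hypothetical reflective loading of a highlighter class by name. */
class HighlighterLoaderSketch {

    /** Stand-in for I_CmsTermHighlighter; the real interface's methods differ. */
    interface TermHighlighter {
        String highlight(String text, String term);
    }

    /** Example implementation that wraps each occurrence of the term in <b> tags. */
    static class BoldHighlighter implements TermHighlighter {
        public String highlight(String text, String term) {
            return text.replace(term, "<b>" + term + "</b>");
        }
    }

    /** Instantiates the named class via its no-arg constructor after checking its type. */
    static TermHighlighter load(String className) throws ReflectiveOperationException {
        Class<?> clazz = Class.forName(className);
        if (!TermHighlighter.class.isAssignableFrom(clazz)) {
            throw new IllegalArgumentException(className + " is not a TermHighlighter");
        }
        return (TermHighlighter) clazz.getDeclaredConstructor().newInstance();
    }
}
```

Checking isAssignableFrom before instantiating rejects a misconfigured class name with a clear message instead of a ClassCastException.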

setIndexLockMaxWaitSeconds

public void setIndexLockMaxWaitSeconds(int value)
Sets the seconds to wait for an index lock during an update operation.

Parameters:
value - the seconds to wait for an index lock during an update operation

setMaxExcerptLength

public void setMaxExcerptLength(java.lang.String maxExcerptLength)
Sets the max. excerpt length.

Parameters:
maxExcerptLength - the max. excerpt length to set

setResultCacheSize

public void setResultCacheSize(java.lang.String value)
Sets the result cache size.

Parameters:
value - the result cache size

setTimeout

public void setTimeout(java.lang.String value)
Sets the timeout to abandon threads indexing a resource.

Parameters:
value - the timeout in milliseconds
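
The timeout arrives as a string holding milliseconds. One common way to abandon a worker that runs too long is to submit it to an executor and bound the wait with Future.get; this page does not say which mechanism OpenCms uses, so the following is a sketch of the idea only, and IndexingTimeoutSketch and runWithTimeout are hypothetical names.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Sketch: run one indexing task and abandon it if it exceeds the configured timeout. */
class IndexingTimeoutSketch {

    /** Returns the task's result, or null if it did not finish within timeoutMillis. */
    static <T> T runWithTimeout(Callable<T> task, String timeoutMillis)
            throws InterruptedException, ExecutionException {
        long timeout = Long.parseLong(timeoutMillis.trim()); // the setter receives a String
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<T> future = executor.submit(task);
            try {
                return future.get(timeout, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                future.cancel(true); // interrupt the worker and give up on this resource
                return null;
            }
        } finally {
            executor.shutdownNow(); // release the worker thread in every case
        }
    }
}
```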

checkIndexLock

protected boolean checkIndexLock(CmsSearchIndex index,
                                 I_CmsReport report)
Checks if a given index is locked; if so, waits for a number of seconds and checks again, until either the index is unlocked or the limit of seconds set by setIndexLockMaxWaitSeconds(int) is reached.

Parameters:
index - the index to check the lock for
report - the report to write error messages on
Returns:
true if the index is locked
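
The description above amounts to a bounded polling loop. A sketch of it, assuming a one-second poll interval; the lock test and the sleep are passed in as functions (hypothetical parameters, not part of the real signature) so the loop can be exercised without a Lucene index:

```java
import java.util.function.BooleanSupplier;
import java.util.function.IntConsumer;

/** Sketch of a bounded wait for an index lock. */
class IndexLockWaitSketch {

    /**
     * Polls isLocked once per second until it reports false or maxWaitSeconds have elapsed.
     * Returns true if the index is still locked when the limit is reached.
     */
    static boolean checkIndexLock(BooleanSupplier isLocked, int maxWaitSeconds, IntConsumer sleepSeconds) {
        int waited = 0;
        while (isLocked.getAsBoolean()) {
            if (waited >= maxWaitSeconds) {
                return true; // give up: still locked after the maximum wait
            }
            sleepSeconds.accept(1);
            waited++;
        }
        return false;
    }
}
```

In production, sleepSeconds could wrap Thread.sleep(s * 1000L) and restore the interrupt flag on InterruptedException.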

getAnalyzer

protected org.apache.lucene.analysis.Analyzer getAnalyzer(java.lang.String locale)
                                                   throws CmsIndexException
Returns an analyzer for the given language.

The analyzer is selected according to the analyzer configuration.

Parameters:
locale - a language id, e.g. de, en, it
Returns:
the appropriate lucene analyzer
Throws:
CmsIndexException - if something goes wrong
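
A sketch of selecting an analyzer from the configuration, assuming the configuration maps language ids to analyzer class names (simplified here to plain strings) and using IllegalStateException in place of CmsIndexException. The fallback from a full locale such as "de_DE" to its language part "de" is a hypothetical addition, not documented behavior.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of per-language analyzer selection from a configuration map. */
class AnalyzerLookupSketch {

    // Language id (e.g. "de", "en") -> configured analyzer class name.
    final Map<String, String> analyzers = new HashMap<>();

    /** Returns the analyzer configured for the locale, trying "de_DE" before "de". */
    String getAnalyzer(String locale) {
        String analyzer = analyzers.get(locale);
        if (analyzer == null && locale.contains("_")) {
            // Hypothetical fallback: use the language part of a full locale string.
            analyzer = analyzers.get(locale.substring(0, locale.indexOf('_')));
        }
        if (analyzer == null) {
            throw new IllegalStateException("No analyzer configured for locale " + locale);
        }
        return analyzer;
    }
}
```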

getDocumentFactory

protected I_CmsDocumentFactory getDocumentFactory(A_CmsIndexResource resource)
Returns a Lucene document factory for the given resource.

The type of the document factory is selected by the type of the resource and the mimetype of the resource content, according to the document type configuration.

Parameters:
resource - a cms resource
Returns:
a lucene document factory or null
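
A sketch of the selection step, assuming factories are registered under "type:mimetype" keys and that a bare resource-type key serves as a fallback; both the key format and the fallback order are assumptions, and factories are reduced to name strings standing in for I_CmsDocumentFactory instances.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of selecting a document factory by resource type and MIME type. */
class DocumentFactoryLookupSketch {

    // Key ("type:mimetype" or just "type") -> factory name.
    final Map<String, String> factories = new HashMap<>();

    /** Returns the most specific matching factory, or null if none is configured. */
    String getDocumentFactory(String resourceType, String mimeType) {
        String factory = null;
        if (mimeType != null) {
            factory = factories.get(resourceType + ":" + mimeType);
        }
        if (factory == null) {
            factory = factories.get(resourceType); // fall back to the type-only key
        }
        return factory;
    }
}
```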

getDocumentTypes

protected java.util.List getDocumentTypes()
Returns the names of all configured document types.

Returns:
the names of all configured document types

getResultCache

protected java.util.Map getResultCache()
Returns the common cache for buffering search results.

Returns:
the cache
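
The result cache holds search results up to the size given to setResultCacheSize (passed as a string). A bounded least-recently-used map is one natural fit; the sketch below builds one from LinkedHashMap's access order and removeEldestEntry. The eviction policy is an assumption, not a statement of what OpenCms uses, and ResultCacheSketch is a hypothetical name.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of a size-bounded, least-recently-used cache for search results. */
class ResultCacheSketch {

    /** Creates a thread-safe LRU map holding at most the given number of entries. */
    static <K, V> Map<K, V> createResultCache(String sizeSetting) {
        final int maxEntries = Integer.parseInt(sizeSetting.trim()); // the setter receives a String
        Map<K, V> lru = new LinkedHashMap<K, V>(16, 0.75f, true) { // true = iterate in access order
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries; // evict the least recently used entry
            }
        };
        return Collections.synchronizedMap(lru);
    }
}
```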

initAvailableDocumentTypes

protected void initAvailableDocumentTypes()
Initializes the available Cms resource types to be indexed.

A map stores document factories keyed by a string representing a colon-separated list of Cms resource types and/or mimetypes.

The keys of this map are used to trigger a document factory to convert a Cms resource into a Lucene index document.

A document factory is a class implementing the interface I_CmsDocumentFactory.
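
A sketch of building such keys, assuming one entry per resource type / mimetype pair joined by a colon, and a bare resource-type key when a document type lists no mimetypes. The exact key layout is an assumption, and factories are reduced to name strings.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of registering a document factory under colon-separated type keys. */
class DocumentTypeKeysSketch {

    /** Adds one entry per (resourceType, mimeType) pair, or per resourceType if no mimetypes are given. */
    static void register(Map<String, String> factories, String factory,
                         List<String> resourceTypes, List<String> mimeTypes) {
        for (String type : resourceTypes) {
            if (mimeTypes.isEmpty()) {
                factories.put(type, factory);
            } else {
                for (String mime : mimeTypes) {
                    factories.put(type + ":" + mime, factory);
                }
            }
        }
    }
}
```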


initSearchIndexes

protected void initSearchIndexes()
Initializes the configured search indexes.

This also initializes the list of Cms resource types to be indexed by each index source.


updateAllIndexes

protected void updateAllIndexes(CmsObject adminCms,
                                CmsUUID publishHistoryId,
                                I_CmsReport report)
Incrementally updates all indexes that have their rebuild mode set to "auto" after resources have been published.

Parameters:
adminCms - an OpenCms user context with Admin permissions
publishHistoryId - the history ID of the published project
report - the report to write the output to
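
Only indexes whose rebuild mode is "auto" are updated after a publish. A sketch of that selection step, modelling each index as a name plus a rebuild mode (a hypothetical simplification of CmsSearchIndex):

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of choosing which indexes to update incrementally after a publish. */
class AutoRebuildFilterSketch {

    /** Minimal stand-in for a configured search index. */
    static class Index {
        final String name;
        final String rebuildMode; // e.g. "auto" or "manual"

        Index(String name, String rebuildMode) {
            this.name = name;
            this.rebuildMode = rebuildMode;
        }
    }

    /** Returns the names of the indexes whose rebuild mode is "auto". */
    static List<String> indexesToUpdate(List<Index> indexes) {
        List<String> result = new ArrayList<>();
        for (Index index : indexes) {
            if ("auto".equals(index.rebuildMode)) {
                result.add(index.name);
            }
        }
        return result;
    }
}
```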

updateIndex

protected void updateIndex(CmsSearchIndex index,
                           I_CmsReport report,
                           boolean wait,
                           java.util.List resourcesToIndex,
                           java.util.Map documentCache)
                    throws CmsException
Updates (if required creates) the given index.

If the optional List of CmsPublishedResource instances is provided, the index will be incrementally updated for these resources only. If this List is null or empty, the index will be fully rebuilt.

Parameters:
index - the index to update or rebuild
report - the report to write output messages to
wait - signals to wait until all the indexing threads are finished
resourcesToIndex - an (optional) list of CmsPublishedResource objects to update in the index
documentCache - a cache for the created search documents, to avoid multiple text extraction
Throws:
CmsException - if something goes wrong
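
A sketch of the two decisions described above: incremental update versus full rebuild based on resourcesToIndex, and reuse of extracted documents through documentCache. Resources are reduced to path strings and text extraction to a hypothetical extractText function; writing to and deleting from the actual Lucene index is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Sketch of incremental-versus-full index updates with a shared document cache. */
class UpdateIndexSketch {

    /**
     * Returns the documents that would be written to the index.
     * A null or empty resourcesToIndex means a full rebuild from all resources.
     */
    static List<String> updateIndex(List<String> allResources, List<String> resourcesToIndex,
                                    Map<String, String> documentCache, Function<String, String> extractText) {
        boolean incremental = resourcesToIndex != null && !resourcesToIndex.isEmpty();
        List<String> toProcess = incremental ? resourcesToIndex : allResources;
        List<String> documents = new ArrayList<>();
        for (String path : toProcess) {
            // Extract each resource at most once, even when several indexes share the cache.
            documents.add(documentCache.computeIfAbsent(path, extractText));
        }
        return documents;
    }
}
```

Passing the same cache to the update of several indexes is what avoids extracting the text of one resource multiple times.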