This section describes the full-text index provided with TM4J. The TM4J Full Text Index API is intended to support integration with any full-text indexing implementation. At the present time, the only implementation distributed with TM4J uses Jakarta Lucene as the full-text search engine.
The TM4J Full Text Index API is encapsulated in the interface org.tm4j.topicmap.index.FullTextIndex. This interface extends the org.tm4j.topicmap.index.Index interface with the following method:
public QueryResult findByText(String query, boolean includeURIs) throws IndexException;
The result of the query is returned in an org.tm4j.topicmap.index.QueryResult object, which somewhat resembles a Java array or ArrayList. Its size() method returns the number of hits in the QueryResult, and its getHit(int index) method retrieves an individual hit. Every hit is represented by an org.tm4j.topicmap.index.QueryHit object, which stores the org.tm4j.topicmap.TopicMapObject that contains the hit and the score that the search engine assigned to it.
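To illustrate how size() and getHit(int) fit together, the following sketch copies the hits out of a result and orders them by descending score. This section does not say whether a QueryResult is already ordered by score, so the sort is shown only as one possible post-processing step. The Hit and Result interfaces are stand-ins that mirror just the methods named in this section; they are not the real TM4J types. The float score type and the helper names hit, resultOf and sortedByScore are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class HitRanking {
    // Stand-in for org.tm4j.topicmap.index.QueryHit (NOT the real TM4J type).
    public interface Hit {
        Object getObject();
        float getScore(); // assumption: the score is a float
    }

    // Stand-in for org.tm4j.topicmap.index.QueryResult (NOT the real TM4J type).
    public interface Result {
        int size();
        Hit getHit(int index);
    }

    /** Copies every hit out of the result and sorts the copy by descending score. */
    public static List<Hit> sortedByScore(Result result) {
        List<Hit> hits = new ArrayList<>();
        for (int i = 0; i < result.size(); i++) {
            hits.add(result.getHit(i));
        }
        hits.sort((a, b) -> Float.compare(b.getScore(), a.getScore()));
        return hits;
    }

    /** Hypothetical helper: builds a hit for the demonstration. */
    public static Hit hit(final Object object, final float score) {
        return new Hit() {
            public Object getObject() { return object; }
            public float getScore() { return score; }
        };
    }

    /** Hypothetical helper: wraps hits in an index-addressable result. */
    public static Result resultOf(final Hit... hits) {
        return new Result() {
            public int size() { return hits.length; }
            public Hit getHit(int index) { return hits[index]; }
        };
    }

    public static void main(String[] args) {
        Result result = resultOf(hit("topic maps", 0.4f), hit("tm4j", 0.9f), hit("lucene", 0.6f));
        for (Hit h : sortedByScore(result)) {
            System.out.println(h.getScore() + " " + h.getObject());
        }
    }
}
```

With TM4J on the classpath, the same loop would presumably work directly on the QueryResult returned by findByText, without the stand-in types.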
The FullText index is not part of the BasicIndexProvider, but is instead provided by a separate FullTextIndexProvider which must be registered with the IndexManager before you can use the FullTextIndex instance.
The FullTextIndexProvider class has a constructor which takes a java.util.Properties instance as a parameter. This Properties instance supplies the configuration properties for the index. These control how indexing is done and whether the index is a transient, in-memory index or a persistent, file-based index. The indexing method is controlled by specifying the Lucene Analyzer to be used. Lucene comes with several different analyzers, including analyzers for German and Russian as well as for English. By default, the Lucene StandardAnalyzer is used. For more details on Lucene Analyzers, please refer to the Lucene FAQ on indexing.
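As a rough sketch of the configuration step, the following code assembles a java.util.Properties instance of the kind the constructor expects. The property keys shown here are hypothetical placeholders, because this section does not list the real key names; consult the FullTextIndexProvider documentation for the names TM4J actually reads. Treating a missing directory setting as "use an in-memory index" is likewise an assumption made for the illustration. The analyzer class names are those of the Lucene analyzers.

```java
import java.util.Properties;

public class FullTextIndexConfig {
    // HYPOTHETICAL keys: placeholders only, not the names TM4J really uses.
    public static final String ANALYZER_KEY = "example.fulltext.analyzer";
    public static final String DIRECTORY_KEY = "example.fulltext.directory";

    /**
     * Builds index settings. A null indexDir leaves the directory key unset,
     * which this sketch ASSUMES would mean a transient, in-memory index.
     */
    public static Properties build(String analyzerClass, String indexDir) {
        Properties props = new Properties();
        props.setProperty(ANALYZER_KEY, analyzerClass);
        if (indexDir != null) {
            props.setProperty(DIRECTORY_KEY, indexDir);
        }
        return props;
    }

    public static void main(String[] args) {
        // Select Lucene's German analyzer and a file-based index location.
        Properties props = build("org.apache.lucene.analysis.de.GermanAnalyzer", "/tmp/tm4j-index");
        props.list(System.out);
        // With TM4J on the classpath, the Properties would then go to the provider:
        // FullTextIndexProvider provider = new FullTextIndexProvider(props);
    }
}
```

Keeping the key names in constants means only two lines need to change once the real property names are known.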
This example shows how a FullText index can be created and used:
Example 6.5. Example of using the Lucene FullText Index
TopicMap map = getTopicMap(); // get a TopicMap
FullTextIndex index = map.getIndexManager().getIndex(FullTextIndex.class);
index.open();
QueryResult result = index.findByText("tm4j", false);

// displaying all BaseNames in the result
for (int i = 0; i < result.size(); i++) {
    QueryHit hit = result.getHit(i);
    System.out.print(hit.getScore() + " ");
    if (hit.getObject() instanceof BaseName) {
        BaseName name = (BaseName) hit.getObject();
        System.out.println(name.getData());
    }
}
The full-text index is a new feature in TM4J release 0.9.0 and the current implementation has a number of limitations which you should be aware of if you plan to use this feature.
Known Limitations of the Full-Text Index
Future releases of TM4J will attempt to address these issues.