[Lucene 5] - Talk and Analyze about Index with goodtac!

by 태하팍 2012. 9. 13.
Oh my god... keyboard bug... I can't write in Korean... T.T
1. optimize() is deprecated!
Goodbye...

optimize

@Deprecated
public void optimize(boolean doWait)
              throws CorruptIndexException,
                     IOException
Deprecated. 

This method has been deprecated, as it is horribly inefficient and very rarely justified. Lucene's multi-segment search performance has improved over time, and the default TieredMergePolicy now targets segments with deletions.

Throws:
CorruptIndexException
IOException

2. The difference between maxDoc() and numDocs()!

    int numIndexed = writer.maxDoc();   
    int numIndexed2 = writer.numDocs();


maxDoc : total count (deleted docs are still counted)

public int maxDoc()
Returns total number of docs in this index, including docs not yet flushed (still in the RAM buffer), not counting deletions.

See Also:
numDocs()

numDocs

: takes deletions into account! (total count - deletion count)

public int numDocs()
            throws IOException
Returns total number of docs in this index, including docs not yet flushed (still in the RAM buffer), and including deletions. NOTE: buffered deletions are not counted. If you really need these to be counted you should call commit() first.

Throws:
IOException
See Also:
numDocs()
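
To make the difference concrete, here is a tiny toy model in plain Java. It is only a sketch of the counting idea and does not use Lucene at all: the class ToyDocCounter and its fields are made up for illustration, and it assumes deletions are tracked as a set of marked document numbers.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy model only: this is NOT Lucene code and does not call the Lucene API.
class ToyDocCounter {
    private final List<String> docs = new ArrayList<>();
    private final BitSet deleted = new BitSet();

    // Adds a document and returns its document number.
    int addDocument(String text) {
        docs.add(text);
        return docs.size() - 1;
    }

    // Marks a document as deleted; its slot is not removed.
    void deleteDocument(int docId) {
        if (docId < 0 || docId >= docs.size()) {
            throw new IndexOutOfBoundsException("no such document: " + docId);
        }
        deleted.set(docId);
    }

    // Counts every slot, deleted or not (the role maxDoc() plays).
    int maxDoc() {
        return docs.size();
    }

    // Counts only live documents (the role numDocs() plays).
    int numDocs() {
        return docs.size() - deleted.cardinality();
    }

    public static void main(String[] args) {
        ToyDocCounter index = new ToyDocCounter();
        index.addDocument("a");
        index.addDocument("b");
        index.addDocument("c");
        index.deleteDocument(1);
        System.out.println("maxDoc  = " + index.maxDoc());   // prints "maxDoc  = 3"
        System.out.println("numDocs = " + index.numDocs());  // prints "numDocs = 2"
    }
}
```

So in this model, maxDoc() - numDocs() is the number of documents marked as deleted.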


Deleting a document from the index!
Use the IndexReader class. It doesn't delete the document right away; it just marks the document as deleted.
If you want the deletion to actually be saved, call "reader.close()".
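
The "mark now, save on close" idea can be sketched with a toy class in plain Java. This is not Lucene code: ToyDeletingReader is a hypothetical name, and the shared set passed to the constructor is just an assumed stand-in for whatever is stored on disk.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only: NOT Lucene's IndexReader.
class ToyDeletingReader {
    private final Set<Integer> onDisk;                      // stand-in for saved deletions
    private final Set<Integer> pending = new HashSet<>();   // marked, not yet saved

    ToyDeletingReader(Set<Integer> onDisk) {
        this.onDisk = onDisk;
    }

    // Only marks the document; nothing is written to "disk" yet.
    void deleteDocument(int docId) {
        pending.add(docId);
    }

    // This reader already treats a marked document as deleted.
    boolean isDeleted(int docId) {
        return pending.contains(docId) || onDisk.contains(docId);
    }

    // Saves the pending marks, which is the step the note above refers to.
    void close() {
        onDisk.addAll(pending);
        pending.clear();
    }

    public static void main(String[] args) {
        Set<Integer> disk = new HashSet<>();
        ToyDeletingReader reader = new ToyDeletingReader(disk);
        reader.deleteDocument(7);
        System.out.println("saved before close: " + disk.contains(7));
        reader.close();
        System.out.println("saved after close:  " + disk.contains(7));
    }
}
```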


About Field...
Lucene 1.4 : provides Keyword, UnIndexed, UnStored, Text
CHANGED!
Lucene 3.5 : provides ANALYZED, ANALYZED_NO_NORMS, NO, NOT_ANALYZED, NOT_ANALYZED_NO_NORMS

Enum Constant Summary
ANALYZED
          Index the tokens produced by running the field's value through an Analyzer.
ANALYZED_NO_NORMS
          Expert: Index the tokens produced by running the field's value through an Analyzer, and also separately disable the storing of norms.
NO
          Do not index the field value.
NOT_ANALYZED
          Index the field's value without using an Analyzer, so it can be searched.
NOT_ANALYZED_NO_NORMS
          Expert: Index the field's value without an Analyzer, and also disable the indexing of norms.


Field

public Field(String name,
             String value,
             Field.Store store,
             Field.Index index,
             Field.TermVector termVector)
Create a field by specifying its name, value and how it will be saved in the index.

Parameters:
name - The name of the field
value - The string to process
store - Whether value should be stored in the index
index - Whether the field should be indexed, and if so, if it should be tokenized before indexing
termVector - Whether term vector should be stored
Throws:
NullPointerException - if name or value is null
IllegalArgumentException - in any of the following situations:
  • the field is neither stored nor indexed
  • the field is not indexed but termVector is TermVector.YES
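
The exception rules listed above can be written out as a small check in plain Java. This is only a sketch of the documented rules, not Lucene's actual constructor: the class ToyFieldRules and its simplified enums are hypothetical stand-ins for Field.Store, Field.Index and Field.TermVector.

```java
// Toy model only: these enums and methods are stand-ins, not Lucene classes.
class ToyFieldRules {
    enum Store { YES, NO }
    enum Index { ANALYZED, NOT_ANALYZED, NO }
    enum TermVector { YES, NO }

    // Applies the rules from the Javadoc above, in the order they are listed.
    static void check(String name, String value, Store store, Index index, TermVector termVector) {
        if (name == null) {
            throw new NullPointerException("name cannot be null");
        }
        if (value == null) {
            throw new NullPointerException("value cannot be null");
        }
        if (store == Store.NO && index == Index.NO) {
            throw new IllegalArgumentException("the field is neither stored nor indexed");
        }
        if (index == Index.NO && termVector == TermVector.YES) {
            throw new IllegalArgumentException("the field is not indexed but termVector is YES");
        }
    }

    // Convenience wrapper: true if the combination passes the argument checks.
    static boolean isValid(String name, String value, Store store, Index index, TermVector termVector) {
        try {
            check(name, value, store, index, termVector);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Stored but not indexed, like the "filename" field: allowed.
        System.out.println(isValid("filename", "/tmp/a.txt", Store.YES, Index.NO, TermVector.NO));
        // Neither stored nor indexed: rejected.
        System.out.println(isValid("x", "y", Store.NO, Index.NO, TermVector.NO));
    }
}
```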


AND..FIELD

1) About doc.add(new Field("contents", new FileReader(f)));

  /**
   * Create a tokenized and indexed field that is not stored. Term vectors will
   * not be stored.  The Reader is read only when the Document is added to the index,
   * i.e. you may not close the Reader until {@link IndexWriter#addDocument(Document)}
   * has been called.
   *
   * @param name The name of the field
   * @param reader The reader with the content
   * @throws NullPointerException if name or reader is <code>null</code>
   */
  public Field(String name, Reader reader) {
    this(name, reader, TermVector.NO);
  }
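
Why the Reader must stay open can be seen with plain java.io, without Lucene. In this sketch the helper class LazyReaderSketch is made up, and drain() just plays the part of whatever consumes the Reader later, when the document is actually added.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Plain java.io sketch; drain() stands in for the consumer that reads at indexing time.
class LazyReaderSketch {
    // Reads everything that is left in the reader.
    static String drain(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    // Returns the content, or null if reading failed (for example, the reader was closed).
    static String tryDrain(Reader reader) {
        try {
            return drain(reader);
        } catch (IOException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        StringReader open = new StringReader("hello lucene");
        System.out.println(tryDrain(open));      // content is available

        StringReader closedTooEarly = new StringReader("hello lucene");
        closedTooEarly.close();                  // closed before anything read it
        System.out.println(tryDrain(closedTooEarly)); // prints "null"
    }
}
```

So the safe order is: create the Reader, add the document, and only then close the Reader.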

============================================================

2) About doc.add(new Field("filename", f.getCanonicalPath(),Field.Store.YES,Field.Index.NO));

 /**
   * <p>Adds a field to a document.  Several fields may be added with
   * the same name.  In this case, if the fields are indexed, their text is
   * treated as though appended for the purposes of search.</p>
   * <p> Note that add like the removeField(s) methods only makes sense
   * prior to adding a document to an index. These methods cannot
   * be used to change the content of an existing index! In order to achieve this,
   * a document has to be deleted from an index and a new changed version of that
   * document has to be added.</p>
   */
  public final void add(Fieldable field) {
    fields.add(field);
  }

Class Document  : http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/index.html
Documents are the unit of indexing and search. A Document is a set of fields. Each field has a name and a textual value. A field may be stored with the document, in which case it is returned with search hits on the document. Thus each document should typically contain one or more stored fields which uniquely identify it.

Method

add

public final void add(Fieldable field)

Adds a field to a document. Several fields may be added with the same name. In this case, if the fields are indexed, their text is treated as though appended for the purposes of search.

Note that add like the removeField(s) methods only makes sense prior to adding a document to an index. These methods cannot be used to change the content of an existing index! In order to achieve this, a document has to be deleted from an index and a new changed version of that document has to be added.
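
The "several fields with the same name" behavior can be pictured with a toy document class in plain Java. This is not Lucene's Document: ToyDocument is a hypothetical name, and joining the values with a single space is only an assumption used to illustrate "treated as though appended".

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT Lucene's Document class.
class ToyDocument {
    private final List<String[]> fields = new ArrayList<>(); // each entry is {name, value}

    // Adding never replaces anything; same-name fields pile up.
    void add(String name, String value) {
        fields.add(new String[] {name, value});
    }

    // All values added under one name, in insertion order.
    List<String> getValues(String name) {
        List<String> values = new ArrayList<>();
        for (String[] field : fields) {
            if (field[0].equals(name)) {
                values.add(field[1]);
            }
        }
        return values;
    }

    // The text a search would see for this name: the values appended one after another.
    String searchableText(String name) {
        StringBuilder sb = new StringBuilder();
        for (String value : getValues(name)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(value);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ToyDocument doc = new ToyDocument();
        doc.add("tag", "lucene");
        doc.add("title", "notes");
        doc.add("tag", "index");
        System.out.println(doc.searchableText("tag")); // prints "lucene index"
    }
}
```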

index info



The index is saved in files with these extensions: .fdt, .fdx, .fnm, .nrm, .prx, .tii, .tis
