[Lucene 5] - Talk and Analyze about Index with goodtac!

Oh my god... keyboard bug... I can't write in Korean... T.T
1. optimize deprecated!
Good bye...

optimize

@Deprecated
public void optimize(boolean doWait)
              throws CorruptIndexException,
                     IOException

Deprecated.

This method has been deprecated, as it is horribly inefficient and very rarely justified. Lucene's multi-segment search performance has improved over time, and the default TieredMergePolicy now targets segments with deletions.

Throws: CorruptIndexException, IOException

2. maxDoc() and numDocs() difference~!!

int numIndexed = writer.maxDoc();
int numIndexed2 = writer.numDocs();

maxDoc : total count (deleted documents are still counted)

public int maxDoc()

Returns total number of docs in this index, including docs not yet flushed (still in the RAM buffer), not counting deletions.

See Also: numDocs()

numDocs

: deletions are taken into account! (total count - deletion count)

public int numDocs()
            throws IOException

Returns total number of docs in this index, including docs not yet flushed (still in the RAM buffer), and including deletions. NOTE: buffered deletions are not counted. If you really need these to be counted you should call commit() first.

Throws: IOException
See Also: numDocs()
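
To make the two counters concrete, here is a minimal toy sketch in plain Java. ToyIndex and its methods are hypothetical names made up for this note, not Lucene classes. It only mimics the idea that a deleted document is marked rather than removed, so maxDoc() keeps counting it while numDocs() does not.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

class DocCountDemo {
    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addDocument("first");
        index.addDocument("second");
        index.addDocument("third");
        index.deleteDocument(1); // mark "second" as deleted

        System.out.println("maxDoc  = " + index.maxDoc());  // prints maxDoc  = 3
        System.out.println("numDocs = " + index.numDocs()); // prints numDocs = 2
    }
}

// Toy model only (assumption): NOT Lucene's IndexWriter, just the counting idea.
class ToyIndex {
    private final List<String> docs = new ArrayList<>();
    private final BitSet deleted = new BitSet();

    void addDocument(String text) {
        docs.add(text);
    }

    // Marks the document as deleted; the slot itself stays in the list.
    void deleteDocument(int docId) {
        if (docId < 0 || docId >= docs.size()) {
            throw new IndexOutOfBoundsException("no such doc: " + docId);
        }
        deleted.set(docId);
    }

    // Every slot ever added, deleted or not.
    int maxDoc() {
        return docs.size();
    }

    // Only the documents that are not marked as deleted.
    int numDocs() {
        return docs.size() - deleted.cardinality();
    }
}
```

So in this toy model, maxDoc() - numDocs() is the number of documents marked as deleted.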

Delete Document in the index~!
Use the IndexReader class. The class doesn't delete the document right away! It just changes the document's status to "deleted".
If you want the deletion to actually be applied, use this call => "reader.close()"
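
The "mark now, apply on close" behavior described above can be sketched with a small toy model in plain Java. ToyStore and ToyReader are hypothetical names invented for this note and are not Lucene classes. The sketch only assumes that deletions stay pending in the reader until close() hands them to the store.

```java
import java.util.BitSet;

class DeleteOnCloseDemo {
    public static void main(String[] args) {
        ToyStore store = new ToyStore(3);
        ToyReader reader = new ToyReader(store);

        reader.deleteDocument(0);
        System.out.println(store.committedDeleteCount()); // prints 0 (still pending)

        reader.close();
        System.out.println(store.committedDeleteCount()); // prints 1 (applied on close)
    }
}

// Hypothetical stand-in for the stored index: remembers which doc ids are committed as deleted.
class ToyStore {
    private final int maxDoc;
    private final BitSet deleted = new BitSet();

    ToyStore(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    int maxDoc() {
        return maxDoc;
    }

    int committedDeleteCount() {
        return deleted.cardinality();
    }

    void commitDeletes(BitSet pending) {
        deleted.or(pending);
    }
}

// Hypothetical reader: deletions are only marked here until close() is called.
class ToyReader {
    private final ToyStore store;
    private final BitSet pending = new BitSet();
    private boolean closed = false;

    ToyReader(ToyStore store) {
        this.store = store;
    }

    void deleteDocument(int docId) {
        if (closed) {
            throw new IllegalStateException("reader is closed");
        }
        if (docId < 0 || docId >= store.maxDoc()) {
            throw new IndexOutOfBoundsException("no such doc: " + docId);
        }
        pending.set(docId); // status change only
    }

    void close() {
        if (!closed) {
            store.commitDeletes(pending); // pending marks become committed
            closed = true;
        }
    }
}
```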

About Field..
Lucene 1.4 : provides Keyword, UnIndexed, UnStored, Text
CHANGE~~
Lucene 3.5 : provides the Field.Index constants ANALYZED, ANALYZED_NO_NORMS, NO, NOT_ANALYZED, NOT_ANALYZED_NO_NORMS

Enum Constant Summary
`ANALYZED` Index the tokens produced by running the field's value through an Analyzer.
`ANALYZED_NO_NORMS` Expert: Index the tokens produced by running the field's value through an Analyzer, and also separately disable the storing of norms.
`NO` Do not index the field value.
`NOT_ANALYZED` Index the field's value without using an Analyzer, so it can be searched.
`NOT_ANALYZED_NO_NORMS` Expert: Index the field's value without an Analyzer, and also disable the indexing of norms.

Field

public Field(String name,
             String value,
             Field.Store store,
             Field.Index index,
             Field.TermVector termVector)

Create a field by specifying its name, value and how it will be saved in the index.

Parameters:

name - The name of the field

value - The string to process

store - Whether value should be stored in the index

index - Whether the field should be indexed, and if so, if it should be tokenized before indexing

termVector - Whether term vector should be stored

Throws:

NullPointerException - if name or value is null

IllegalArgumentException - in any of the following situations:

the field is neither stored nor indexed
the field is not indexed but termVector is TermVector.YES
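
The exception rules listed above can be written out as a tiny checker. This is only a sketch in plain Java: ToyField.validate and its boolean parameters are hypothetical simplifications made up for this note (the real constructor takes Field.Store, Field.Index and Field.TermVector values), so it shows the documented rules, not Lucene's source.

```java
class FieldRulesDemo {
    public static void main(String[] args) {
        // Stored but not indexed, no term vector: allowed by the rules above.
        ToyField.validate("filename", "/tmp/a.txt", true, false, false);
        System.out.println("stored-only field accepted");

        try {
            // Neither stored nor indexed: rejected.
            ToyField.validate("contents", "hello", false, false, false);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}

// Hypothetical helper: mirrors only the documented argument checks.
final class ToyField {
    private ToyField() {
    }

    static void validate(String name, String value,
                         boolean stored, boolean indexed, boolean termVector) {
        if (name == null) {
            throw new NullPointerException("name cannot be null");
        }
        if (value == null) {
            throw new NullPointerException("value cannot be null");
        }
        if (!stored && !indexed) {
            throw new IllegalArgumentException("field is neither stored nor indexed");
        }
        if (!indexed && termVector) {
            throw new IllegalArgumentException("cannot store a term vector for a field that is not indexed");
        }
    }
}
```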

AND..FIELD

1) About doc.add(new Field("contents", new FileReader(f)));

/**
   * Create a tokenized and indexed field that is not stored. Term vectors will
   * not be stored. The Reader is read only when the Document is added to the index,
   * i.e. you may not close the Reader until {@link IndexWriter#addDocument(Document)}
   * has been called.
   *
   * @param name The name of the field
   * @param reader The reader with the content
   * @throws NullPointerException if name or reader is <code>null</code>
   */
public Field(String name, Reader reader) {
    this(name, reader, TermVector.NO);
}

============================================================

2) About doc.add(new Field("filename", f.getCanonicalPath(),Field.Store.YES,Field.Index.NO));

/**
   * <p>Adds a field to a document. Several fields may be added with
   * the same name. In this case, if the fields are indexed, their text is
   * treated as though appended for the purposes of search.</p>
   * <p> Note that add like the removeField(s) methods only makes sense
   * prior to adding a document to an index. These methods cannot
   * be used to change the content of an existing index! In order to achieve this,
   * a document has to be deleted from an index and a new changed version of that
   * document has to be added.</p>
   */
public final void add(Fieldable field) {
    fields.add(field);
}

Class Document : http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/index.html
Documents are the unit of indexing and search. A Document is a set of fields. Each field has a name and a textual value. A field may be stored with the document, in which case it is returned with search hits on the document. Thus each document should typically contain one or more stored fields which uniquely identify it.

sub function

add

public final void add(Fieldable field)

Adds a field to a document. Several fields may be added with the same name. In this case, if the fields are indexed, their text is treated as though appended for the purposes of search.

Note that add like the removeField(s) methods only makes sense prior to adding a document to an index. These methods cannot be used to change the content of an existing index! In order to achieve this, a document has to be deleted from an index and a new changed version of that document has to be added.
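
The "several fields with the same name are treated as though appended" part can be pictured with a toy document in plain Java. ToyDocument and searchableText are hypothetical names made up for this note. Joining the values with a single space is an assumption of the sketch, not a statement about how Lucene combines them internally.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AppendDemo {
    public static void main(String[] args) {
        ToyDocument doc = new ToyDocument();
        doc.add("tag", "red");
        doc.add("tag", "blue");
        System.out.println(doc.searchableText("tag")); // prints red blue
    }
}

// Toy model only (assumption): keeps every value added under the same field name.
class ToyDocument {
    private final Map<String, List<String>> fields = new LinkedHashMap<>();

    // Adding a second field with the same name keeps both values.
    void add(String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    // For search purposes the values are viewed as one appended text.
    String searchableText(String name) {
        List<String> values = fields.get(name);
        return values == null ? "" : String.join(" ", values);
    }
}
```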

index info

The index is saved as files with these extensions: .fdt, .fdx, .fnm, .nrm, .prx, .tii, .tis


Developer 태하팍
