
[BUG] org.apache.lucene.analysis.TokenStream.end() can slow down Elemental #140

@tohidemyname


Describe the bug

Lucene fixed a performance bug: https://issues.apache.org/jira/browse/LUCENE-7419

The report states that TokenStream.end() is quite costly. The issue is marked as a Blocker and, according to the report, affects versions 5.5.5, 6.2, and 7.0. The reporter attributes the cost to TokenStream.end() needlessly calling getAttribute() on every invocation. The buggy end() method is as follows:

public void end() throws IOException {
    clearAttributes(); // LUCENE-3849: don't consume dirty atts
    PositionIncrementAttribute posIncAtt = getAttribute(PositionIncrementAttribute.class);
    if (posIncAtt != null) {
        posIncAtt.setPositionIncrement(0);
    }
}

Elemental uses Lucene 4.10.4. I checked the Lucene 4.10.4 source code, and its end() method is identical to the buggy code:

public void end() throws IOException {
    clearAttributes(); // LUCENE-3849: don't consume dirty atts
    PositionIncrementAttribute posIncAtt = getAttribute(PositionIncrementAttribute.class);
    if (posIncAtt != null) {
        posIncAtt.setPositionIncrement(0);
    }
}

Therefore, the same bug should also affect 4.10.4, and with it Elemental.
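To illustrate where the per-stream overhead comes from, here is a minimal, Lucene-free Java model of the pre-6.2 end() pattern. All names (OldEndModel, ModelAttribute, PositionIncrement, ModelTokenStream, lookups) are hypothetical stand-ins, not Lucene's API, and the model counts lookups rather than measuring time. It only shows that every end() call clears all attributes and then performs an extra class-keyed lookup, so the cost scales with the number of token streams rather than the number of tokens.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class OldEndModel {
    // Hypothetical stand-in for an attribute implementation: only clear() exists here.
    interface ModelAttribute {
        void clear();
    }

    // Hypothetical stand-in for PositionIncrementAttribute.
    static final class PositionIncrement implements ModelAttribute {
        int increment = 1;
        @Override public void clear() { increment = 1; }
    }

    // Hypothetical stand-in for a TokenStream whose attributes are keyed by class.
    static final class ModelTokenStream {
        final Map<Class<?>, ModelAttribute> attributes = new LinkedHashMap<>();
        int lookups = 0; // class-keyed lookups performed by end()

        ModelTokenStream() {
            attributes.put(PositionIncrement.class, new PositionIncrement());
        }

        void clearAttributes() {
            for (ModelAttribute att : attributes.values()) {
                att.clear();
            }
        }

        // Same shape as the pre-6.2 end(): clear everything, then look the
        // position-increment attribute up by class and set it to 0.
        void end() {
            clearAttributes();
            lookups++;
            ModelAttribute att = attributes.get(PositionIncrement.class);
            if (att != null) {
                ((PositionIncrement) att).increment = 0;
            }
        }
    }

    public static void main(String[] args) {
        int totalLookups = 0;
        // Many short streams (e.g. one per short field or term) each pay the lookup once.
        for (int i = 0; i < 1000; i++) {
            ModelTokenStream ts = new ModelTokenStream();
            ts.end();
            totalLookups += ts.lookups;
        }
        System.out.println(totalLookups); // prints 1000
    }
}
```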

To Reproduce

In the Lucene bug report (LUCENE-7419), Michael McCandless mentioned that this bug was found through Elasticsearch:

"This is the apparent source of the very unexpected slowdown here: elastic/elasticsearch#19867 (comment)"

He also explained how to reproduce the slowdown.

Elemental calls the buggy method at the following locations:

<--XMLToQuery.phraseQuery
<--XMLToQuery.nearQuery
<--XMLToQuery.getTerm
<--MarkableTokenFilter.incrementToken
<--MarkableTokenFilter.incrementToken
<--RangeIndexWorker.analyzeContent

The Lucene bug was fixed in 6.2.
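As far as I can tell from LUCENE-7419, the 6.2 fix moves the special case into the attributes themselves: attribute implementations gain an end() hook that defaults to clear(), the position-increment implementation overrides it to set the increment to 0, and TokenStream.end() simply calls end() on every attribute instead of looking one up by class. The Lucene-free model below sketches that idea; all names (NewEndModel, ModelAttribute, PositionIncrement, Term, ModelTokenStream) are hypothetical and this is not Lucene's actual code.

```java
import java.util.ArrayList;
import java.util.List;

class NewEndModel {
    // Hypothetical stand-in for an attribute implementation with an end() hook.
    abstract static class ModelAttribute {
        abstract void clear();

        // Default: the end-of-stream state is the same as a cleared state.
        void end() { clear(); }
    }

    // Position increment handles its own special case, so no class lookup is needed.
    static final class PositionIncrement extends ModelAttribute {
        int increment = 1;
        @Override void clear() { increment = 1; }
        @Override void end() { increment = 0; }
    }

    // An ordinary attribute that just relies on the default end() -> clear().
    static final class Term extends ModelAttribute {
        final StringBuilder text = new StringBuilder();
        @Override void clear() { text.setLength(0); }
    }

    static final class ModelTokenStream {
        final List<ModelAttribute> attributes = new ArrayList<>();
        final PositionIncrement posInc = new PositionIncrement();
        final Term term = new Term();

        ModelTokenStream() {
            attributes.add(posInc);
            attributes.add(term);
        }

        // One pass over the attributes; no getAttribute()-style lookup per call.
        void end() {
            for (ModelAttribute att : attributes) {
                att.end();
            }
        }
    }

    public static void main(String[] args) {
        ModelTokenStream ts = new ModelTokenStream();
        ts.term.text.append("xquery");
        ts.end();
        System.out.println(ts.posInc.increment + " '" + ts.term.text + "'"); // prints: 0 ''
    }
}
```

The design choice here is polymorphism over lookup: the stream no longer needs to know which attribute is special, so end() costs one pass over attributes it already has to clear anyway.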
