Lucene Index: Missing documents
We have a pretty basic Lucene setup. We noticed that some documents aren't written to the index.
This is how we create a document:
    private void addToDirectory(SpecialDomainObject specialDomainObject) throws IOException {
        Document document = new Document();
        document.add(new TextField("id", String.valueOf(specialDomainObject.getId()), Field.Store.YES));
        document.add(new TextField("name", specialDomainObject.getName(), Field.Store.YES));
        document.add(new TextField("tags", joinTags(specialDomainObject.getTags()), Field.Store.YES));
        document.add(new TextField("contents", getContents(specialDomainObject), Field.Store.YES));
        // One numeric field per associated language
        for (Language language : getAllAssociatedLanguages(specialDomainObject)) {
            document.add(new IntField("languageId", language.getId(), Field.Store.YES));
        }
        // Replace any existing document with the same id, then commit
        specialDomainObjectIndexWriter.updateDocument(new Term("id", document.getField("id").stringValue()), document);
        specialDomainObjectIndexWriter.commit();
    }
This is how we create the analyzer and the index writer:
    <bean id="luceneVersion" class="org.apache.lucene.util.Version" factory-method="valueOf">
        <constructor-arg value="LUCENE_46"/>
    </bean>

    <bean id="analyzer" class="org.apache.lucene.analysis.standard.StandardAnalyzer">
        <constructor-arg ref="luceneVersion"/>
    </bean>

    <bean id="specialDomainObjectIndexWriter" class="org.apache.lucene.index.IndexWriter">
        <constructor-arg ref="specialDomainObjectDirectory"/>
        <constructor-arg>
            <bean class="org.apache.lucene.index.IndexWriterConfig">
                <constructor-arg ref="luceneVersion"/>
                <constructor-arg ref="analyzer"/>
                <property name="openMode" value="CREATE_OR_APPEND"/>
            </bean>
        </constructor-arg>
    </bean>
Indexing is done by a scheduled task:
    @Component
    public class ScheduledSpecialDomainObjectIndexCreationTask implements ScheduledIndexCreationTask {

        private static final Logger logger = LoggerFactory.getLogger(ScheduledSpecialDomainObjectIndexCreationTask.class);

        @Autowired
        private IndexOperator specialDomainObjectIndexOperator;

        // Rebuilds the index one hour after the previous run has finished
        @Scheduled(fixedDelay = 3600 * 1000)
        @Override
        public void createIndex() {
            Date indexCreationStartDate = new Date();
            try {
                logger.info("Updating complete special domain object index...");
                specialDomainObjectIndexOperator.createIndex();
                if (logger.isDebugEnabled()) {
                    Date indexCreationEndDate = new Date();
                    logger.debug("Index creation duration: {} ms", indexCreationEndDate.getTime() - indexCreationStartDate.getTime());
                }
            } catch (IOException e) {
                logger.error("Could not update complete special domain object index.", e);
            }
        }
    }
createIndex() is implemented as follows:
    @Override
    public void createIndex() throws IOException {
        logger.trace("Preparing index generation...");
        IndexWriter indexWriter = getIndexWriter();
        Date start = new Date();
        logger.trace("Deleting documents from index...");
        indexWriter.deleteAll();
        logger.trace("Starting index generation...");
        long numberOfProcessedObjects = fillIndex();
        logger.debug("Index written in " + (new Date().getTime() - start.getTime()) + " milliseconds.");
        logger.debug("Number of processed objects: {}", numberOfProcessedObjects);
        logger.debug("Number of documents in index: {}", indexWriter.numDocs());
        indexWriter.commit();
        indexWriter.forceMerge(1);
    }

    @Override
    protected long fillIndex() throws IOException {
        Page<SpecialDomainObject> specialDomainObjectsPage = specialDomainObjectRepository.findAll(new PageRequest(0, MAXIMUM_PAGE_ELEMENTS));
        // Index page by page until the repository reports no further page
        while (true) {
            addToDirectory(specialDomainObjectsPage);
            if (specialDomainObjectsPage.hasNextPage()) {
                specialDomainObjectsPage = specialDomainObjectRepository.findAll(new PageRequest(specialDomainObjectsPage.getNumber() + 1, specialDomainObjectsPage.getSize()));
            } else {
                break;
            }
        }
        return specialDomainObjectsPage.getTotalElements();
    }
There are 2000 SpecialDomainObject instances, and 80 of them aren't written to the index (we checked with Luke).
Is there a known cause for the missing documents?
We found the problem: the default encoding of the operating system was not set to UTF-8.
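The post does not say where the platform default encoding entered the indexing path, so the following is only a minimal sketch of how one might check for this kind of problem. It assumes that text is converted from bytes somewhere before it reaches Lucene. The class `EncodingCheck` and its methods are hypothetical names introduced for illustration. It prints the charset the JVM picked up from the operating system and contrasts decoding with an explicit UTF-8 charset against decoding with the platform default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the original post.
final class EncodingCheck {

    // Decodes bytes as UTF-8 regardless of the platform default charset.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Decodes bytes with the platform default charset,
    // which is what new String(bytes) does implicitly.
    static String decodeWithDefault(byte[] bytes) {
        return new String(bytes, Charset.defaultCharset());
    }

    public static void main(String[] args) {
        // Reports the charset the JVM derived from the operating system settings.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // A name containing a u-umlaut, written as a Unicode escape
        // so that the encoding of this source file does not matter.
        byte[] utf8Bytes = "M\u00fcller".getBytes(StandardCharsets.UTF_8);

        // These two lines differ whenever the default charset is not UTF-8 compatible.
        System.out.println("Explicit UTF-8:   " + decodeUtf8(utf8Bytes));
        System.out.println("Platform default: " + decodeWithDefault(utf8Bytes));
    }
}
```

If the two decoded lines differ, some code path is likely relying on the platform default. Passing an explicit charset wherever bytes become text avoids depending on the operating system configuration. Starting the JVM with `-Dfile.encoding=UTF-8` is a commonly used alternative when the code cannot be changed.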