❶ solr 不設置commit within為什麼建不了索引
三種solr提交索引的方式
1. commit
通過api直接commit,這樣性能比較差,在我測試下,平均每條commit600ms
HttpSolrServer solrServer = new HttpSolrServer("h//localhost:8080/solr/dtrace");
SolrInputDocument doc1 = new SolrInputDocument();
doc1.addField("id", i);
solrServer.add(doc1);
solrServer.commit();
2. AutoCommit
❷ 如何將solrconfig.xml中row行設置最大
本節詳細講解solrconfig.xml
1.如果配置文件配置錯誤,是否提示。true要報錯,false不報錯。
<abortOnConfigurationError>${solr.abortOnConfigurationError:true}</abortOnConfigurationError>
2.solr版本。
<信培luceneMatchVersion>LUCENE_31</luceneMatchVersion>
3. 索引文件目錄,建索引的目錄和查詢的目錄都是它。
<dataDir>${solr.data.dir:./solr/db/data}</dataDir>
4.一些基礎配置
4.1多少個document進行合並
<mergeFactor>10</mergeFactor>
<!-- Sets the amount of RAM that may be used by Lucene indexing
for buffering added documents and deletions before they are
flushed to the Directory. -->
4.2 緩存大小
<ramBufferSizeMB>32</ramBufferSizeMB>
多少個文檔自動合並
<mergeFactor>10</mergeFactor>
(回去了,下次再更新。。)
(接著上次的更新)
4.3.
設置域的最大長度
<maxFieldLength>10000</maxFieldLength>
設置寫鎖的延遲時間
<writeLockTimeout>1000</writeLockTimeout>
設置提交鎖的延遲
<commitLockTimeout>10000</commitLockTimeout>
4.4
直接更新的方法:滑賀唯即調用solr默認的url訪問。
<updateHandler class="solr.DirectUpdateHandler2">
自動提交的最大文檔數,最大時間
<autoCommit>
<maxDocs>10000</maxDocs>
<maxTime>1000</maxTime>
</autoCommit>拍跡
4.5包含所有查詢的參數設置 <query>
設置lru緩存
<filterCache class="solr.FastLRUCache"
size="16384"
initialSize="4096"
autowarmCount="4096"/>
設置查詢結果緩存
<queryResultCache class="solr.LRUCache"
size="16384"
initialSize="4096"
autowarmCount="1024"/>
設置文檔緩存
<documentCache class="solr.LRUCache"
size="16384"
initialSize="16384"/>
是否延遲載入索引域
<enableLazyFieldLoading>true</enableLazyFieldLoading>
設置查詢的最大doc數
<queryResultMaxDocsCached>500</queryResultMaxDocsCached>
這個參數暫時未用
<maxWarmingSearchers>2</maxWarmingSearchers>
假如用dataimport這solr自帶的導入數據命令時,的參數,即與資料庫對應的文件的位置
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">C:\solr-tomcat\solr\db\conf\db-data-config.xml</str>
</lst>
</requestHandler>
這個標簽是用來控制主索引伺服器,與從索引伺服器分發索引快照的所有屬性的
<requestHandler name="/replication" class="solr.ReplicationHandler" >
<lst name="master">
<str name="replicateAfter">commit</str>
<str name="replicateAfter">startup</str>
<str name="confFiles">schema.xml,stopwords.txt,elevate.xml</str>
<str name="commitReserveDuration">00:00:60</str>
<str name="httpBasicAuthUser">345</str>
<str name="httpBasicAuthPassword">345</str>
</lst>
</requestHandler>
這個標簽和他的名字是一樣的,表示用於集群的組件所有參數
<searchComponent name="clustering"
enable="${solr.clustering.enabled:false}"
class="solr.clustering.ClusteringComponent" >
<!-- Declare an engine -->
<lst name="engine">
<!-- The name, only one can be named "default" -->
<str name="name">default</str>
<!-- Class name of Carrot2 clustering algorithm.
Currently available algorithms are:
* org.carrot2.clustering.lingo.LingoClusteringAlgorithm
* org.carrot2.clustering.stc.STCClusteringAlgorithm
See http://project.carrot2.org/algorithms.html for the
algorithm's characteristics.
-->
<str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
<!-- Overriding values for Carrot2 default algorithm attributes.
For a description of all available attributes, see:
http://download.carrot2.org/stable/manual/#chapter.components.
Use attribute key as name attribute of str elements
below. These can be further overridden for indivial
requests by specifying attribute key as request parameter
name and attribute value as parameter value.
-->
<str name="LingoClusteringAlgorithm.desiredClusterCountBase">20</str>
<!-- The language to assume for the documents.
For a list of allowed values, see:
http://download.carrot2.org/stable/manual/#section.attribute.lingo.MultilingualClustering.defaultLanguage
-->
<str name="MultilingualClustering.defaultLanguage">ENGLISH</str>
</lst>
<lst name="engine">
<str name="name">stc</str>
<str name="carrot.algorithm">org.carrot2.clustering.stc.STCClusteringAlgorithm</str>
</lst>
</searchComponent>
當發生集群命令時,對應的相應參數。表示是否開啟集群等。
<requestHandler name="/clustering"
startup="lazy"
enable="${solr.clustering.enabled:false}"
class="solr.SearchHandler">
<lst name="defaults">
<bool name="clustering">true</bool>
<str name="clustering.engine">default</str>
<bool name="clustering.results">true</bool>
<!-- The title field -->
<str name="carrot.title">name</str>
<str name="carrot.url">id</str>
<!-- The field to cluster on -->
<str name="carrot.snippet">features</str>
<!-- proce summaries -->
<bool name="carrot.proceSummary">true</bool>
<!-- the maximum number of labels per cluster -->
<!--<int name="carrot.numDescriptions">5</int>-->
<!-- proce sub clusters -->
<bool name="carrot.outputSubClusters">false</bool>
<str name="defType">edismax</str>
<str name="qf">
text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
</str>
<str name="q.alt">*:*</str>
<str name="rows">10</str>
<str name="fl">*,score</str>
</lst>
<arr name="last-components">
<str>clustering</str>
</arr>
</requestHandler>
默認查詢條件
<admin>
<defaultQuery>*:*</defaultQuery>
<!-- configure a healthcheck file for servers behind a
loadbalancer
-->
<!--
<healthcheck type="file">server-enabled</healthcheck>
-->
</admin>
❸ solr的索引數據可以存放到資料庫嗎
在solr與tomcat整合文章中,我用的索引庫是,現在就以這個為例。
首先要准備jar包:solr-dataimporthandler-4.8.1.jar、solr-dataimporthandler-extras-4.8.1.jar和mysql-connector-java-5.0.7-bin.jar這三個包到solr的tomcat的webapps\solr\WEB-INF\lib下
在這個文件夾的conf下配置兩個文件,添加一個文件。先配置solrconfig.xml。
在該文件下添加一個新節點。
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">data-config.xml</str>
</lst>
</requestHandler>
在solrconfig.xml的同目錄下創建data-config.xml。
配置:
復制代碼
<dataConfig>
<dataSource type="JdbcDataSource"
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://localhost:3306/courseman"
user="root"
password="mysql" />
<document>
<entity name="student"
query="SELECT * FROM student">
<field column="id" name="id" />
<field column="name" name="name" />
<field column="gender" name="gender" />
<field column="major" name="major" />
<field column="grade" name="grade" />
</entity>
</document>
</dataConfig>
復制代碼
schemal.xml的配置
復制代碼
<?xml version="1.0" ?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding right ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a of the License at
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<schema name="example core one" version="1.1">
<fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<!-- general -->
<field name="id" type="int" indexed="true" stored="true" />
<field name="gender" type="string" indexed="true" stored="true" />
<field name="name" type="string" indexed="true" stored="true" />
<field name="major" type="string" indexed="true" stored="true" />
<field name="grade" type="string" indexed="true" stored="true" />
<field name="_version_" type="long" indexed="true" stored="true"/>
<!-- field to use to determine and enforce document uniqueness. -->
<uniqueKey>id</uniqueKey>
<!-- field for the QueryParser to use when an explicit fieldname is absent -->
<defaultSearchField>name</defaultSearchField>
<!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
<solrQueryParser defaultOperator="OR"/>
</schema>
復制代碼
默認的文件不是這樣的,稍微改動了一下。
field 的type類型是根據fieldtype 的name定義的。class是solr自定義的不能更改。
shcema.xml文件的field欄位的屬性介紹:
(1)name:欄位名稱
(2)type:欄位類型(此處type不是java類型,而是下面定義的fieldType)
(3)indexed:是否索引看true--solr會對這個欄位進行索引,只有經過索引的欄位才能被搜索、排序等;false--不索引
(4)stored:是否存儲看true--存儲,當我們需要在頁面顯示此欄位時,應設為true,否則false。
(5)required:是否必須看true--此欄位為必需,如果此欄位的內容為空,會報異常;false--不是必需
(6)multiValued:此欄位是否可以保存多個值看
(7)omitNorms:是否對此欄位進行解析看有時候我們想通過某個欄位的完全匹配來查詢信息,那麼設置 indexed="true"、omitNorms="true"。
(8)default:設置默認值
有這樣一個FieldType描述:
<fieldType name="text_general" positionIncrementGap="100">
<analyzer type="index">
<tokenizer/>
<filter ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
<filter/>
</analyzer>
<analyzer type="query">
<tokenizer/>
<filter ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
<filter synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter/>
</analyzer>
</fieldType>
屬性說明:
(1)name:類型名稱,<field>中的type引用的就是這個name
(2)class:solr自定義的類型
(3)<analyzer type="index">定義建立索引時使用的分詞器及過濾器
(4)<analyzer type="query">定義搜索時所使用的分詞器及過濾器
(5)<tokenizer/>定義分詞器
(6)<filter/>定義過濾器
uniqueKey屬性
<uniqueKey>id</uniqueKey>
類似於數據表數據的id,solr索引庫中最好定義一個用於標示document唯一性的欄位,此欄位主要用於刪除document。
defaultSearchField屬性
就是你在做query搜尋時若不指定特定欄位做檢索時, Solr就會只查這個欄位.
<defaultSearchField>default</defaultSearchField>
Field屬性
是用來復制你一個欄位里的值到另一欄位用. 如你可以將name里的東西到major里, 這樣solr做檢索時也會檢索到name里的東西.
<Field source="name" dest="major"/>
現在可以將資料庫的數據導入solr了。
點擊Execute就可以了。