Installing and Using Coreseek for Chinese Full-Text Search
Published: 2016-11-30 18:48:02
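While developing a policy-library system recently, I needed full-text search over the policy documents, and used Coreseek for it.
Coreseek is essentially a Chinese-language edition of Sphinx (it adds Chinese word segmentation through MMSeg). For details, see http://www.coreseek.cn.
Below are some notes from installing and using it.
The system already had basic MySQL + PHP search in place.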
1. Build and install Coreseek. For the various problems you may run into along the way, see the FAQ on the Coreseek website :)
a>$ wget http://www.coreseek.cn/uploads/csft/4.0/coreseek-4.1-beta.tar.gz
b>$ tar xzvf coreseek-4.1-beta.tar.gz
c>$ cd coreseek-4.1-beta
## install mmseg
d>$ cd mmseg-3.2.14
e>$ ./bootstrap
f>$ ./configure --prefix=/usr/local/mmseg
g>$ make && make install
h>$ cd ..
## install coreseek
i>$ cd csft-4.1/
j>$ sh buildconf.sh
k>$ ./configure --prefix=/usr/local/coreseek --without-python --with-mysql=/usr/local/mysql --with-mmseg=/usr/local/mmseg --with-mmseg-includes=/usr/local/mmseg/include/mmseg/ --with-mmseg-libs=/usr/local/mmseg/lib/
l>$ make && make install
2. Configure Coreseek
a>$ cp /usr/local/coreseek/etc/sphinx.conf.dist /usr/local/coreseek/etc/sphinx.conf
b>$ vim /usr/local/coreseek/etc/sphinx.conf
source content
{
type = mysql
sql_host = localhost
sql_user = DB_USER
sql_pass = DB_PASSWORD
sql_db = DB_NAME
sql_port = 3306
sql_query_pre = SET NAMES utf8
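sql_query = \
SELECT a.id, group_id, date_added, a.title, b.content FROM `news` a \
INNER JOIN `newscontent` b ON a.id=b.id
sql_attr_uint = group_id
# sql_attr_timestamp expects a UNIX timestamp; if date_added is a DATETIME column,
# select UNIX_TIMESTAMP(a.date_added) AS date_added in sql_query instead
sql_attr_timestamp = date_added
sql_query_info = SELECT * FROM `news` WHERE id=$id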
}
index content
{
source = content
path = /usr/local/coreseek/var/data/content
docinfo = extern
charset_dictpath = /usr/local/mmseg/etc/
charset_type = zh_cn.utf-8
ngram_len = 0
}
indexer
{
mem_limit = 32M
}
searchd
{
port = 9312
log = /usr/local/coreseek/var/log/searchd.log
query_log = /usr/local/coreseek/var/log/query.log
read_timeout = 5
max_children = 30
pid_file = /usr/local/coreseek/var/log/searchd.pid
max_matches= 1000
seamless_rotate = 1
preopen_indexes= 1
unlink_old = 1
}
c> MySQL's default character set must also be utf8; add the following to the [mysqld] section of my.cnf:
character_set_server=utf8
3. Update the index with a scheduled task
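A. In the Coreseek directory (/usr/local/coreseek), create three shell scripts for convenience:
a> stop.sh   ## stop the service
#!/bin/bash
/usr/local/coreseek/bin/searchd -c /usr/local/coreseek/etc/sphinx.conf --stop
b> build.sh  ## build the index
#!/bin/bash
/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/sphinx.conf --all --rotate
c> start.sh  ## start the service
#!/bin/bash
/usr/local/coreseek/bin/searchd -c /usr/local/coreseek/etc/sphinx.conf
B. Make them executable:
chmod +x start.sh
chmod +x stop.sh
chmod +x build.sh
C. Add a cron job (rebuilds the index every day at 02:00):
$ crontab -e
0 2 * * * sh /usr/local/coreseek/build.sh >/dev/null 2>&1
D. Once start.sh is running, cron executes build.sh on schedule and keeps the index up to date. Note that searchd needs an existing index to serve, so on the very first run build the index once without --rotate (/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/sphinx.conf --all) before running start.sh. After the index has been updated, you can run:
$ /usr/local/coreseek/bin/search -c /usr/local/coreseek/etc/sphinx.conf -a 国家标准化管理委员会
to see the results (-a matches any of the query words).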
4. Update the search code
The PHP client file sphinxapi.php ships in /usr/local/src/coreseek-4.1-beta/csft-4.1/api.
It contains the SphinxClient class; copy it into your web directory. The PHP code is as follows:
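<?php
require("sphinxapi.php");
// $page (1-based page number) and $pageSize are assumed to be set elsewhere
$s = new SphinxClient();
$s->SetServer("localhost", 9312);
// The @field operator is part of the extended query syntax, so switch the match mode
$s->SetMatchMode(SPH_MATCH_EXTENDED2);
// SetLimits() takes an offset, not a page number
$s->SetLimits(($page - 1) * $pageSize, $pageSize);
$result = $s->Query('@title (测试) @content (网络)', "*");
if ($result === false) {
    die('Search failed: ' . $s->GetLastError());
}
/*** $result['matches'] holds the matches keyed by document ID; $result['total'] is the
     number of matches retrieved, $result['total_found'] the total found in the index ***/
$resultid = array();
if (isset($result['matches']) && !empty($result['matches'])) {
    echo '<pre>';
    print_r($result['matches']);
    echo '</pre>';
    foreach ($result['matches'] as $k => $v) {
        $resultid[] = (int)$k;
    }
    $resultid = implode(',', $resultid);
    /*** Run the SQL below to fetch the matching rows ***/
    $sql = "SELECT * FROM `news` WHERE id IN ({$resultid})";
    //...
}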
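Two practical gaps in the search code above are user input and result order: characters such as @ ( ) | - ! have special meaning in Sphinx's extended query syntax, and MySQL's WHERE id IN (...) does not return rows in the relevance order Sphinx produced. The sketch below shows one way to handle both. The helper names escapeSphinxQuery, buildFieldQuery and buildNewsSql are hypothetical (they are not part of sphinxapi.php), the escaped character list is an assumption modeled on SphinxClient::EscapeString, and the ordering relies on MySQL's FIELD() function.

```php
<?php
// Hypothetical helpers, not part of sphinxapi.php.

// Escape characters that are operators in Sphinx extended query syntax.
// The backslash is replaced first, so backslashes added later are not doubled.
function escapeSphinxQuery(string $text): string
{
    $special = ['\\', '(', ')', '|', '-', '!', '@', '~', '"', '&', '/', '^', '$', '='];
    $escaped = [];
    foreach ($special as $ch) {
        $escaped[] = '\\' . $ch;
    }
    return str_replace($special, $escaped, $text);
}

// Build a query such as "@title (...) @content (...)" from field => user input.
// Field names come from code and are trusted; empty terms are skipped.
function buildFieldQuery(array $fieldTerms): string
{
    $parts = [];
    foreach ($fieldTerms as $field => $term) {
        $term = trim($term);
        if ($term === '') {
            continue;
        }
        $parts[] = '@' . $field . ' (' . escapeSphinxQuery($term) . ')';
    }
    return implode(' ', $parts);
}

// Turn $result['matches'] (document ID => match info) into SQL that returns
// the rows in Sphinx's order. Returns null when there are no matches.
function buildNewsSql(array $matches): ?string
{
    $ids = array_map('intval', array_keys($matches));
    if (count($ids) === 0) {
        return null;
    }
    $list = implode(',', $ids);
    return "SELECT * FROM `news` WHERE id IN ({$list}) ORDER BY FIELD(id, {$list})";
}
```

Usage, still assuming SetMatchMode(SPH_MATCH_EXTENDED2) has been called: pass buildFieldQuery(['title' => $titleInput, 'content' => $contentInput]) to $s->Query(), then run buildNewsSql($result['matches']) only if it does not return null.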