1. Install Elasticsearch; see this guide: http://www.fecmall.com/topic/672
2. Download the matching IK Analysis release. The guide above installs Elasticsearch 6.1.3, so the IK Analysis Chinese analyzer must also be version 6.1.3.
Download link: https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.1.3/elasticsearch-analysis-ik-6.1.3.zip
If you run a different Elasticsearch version, change both version strings in the URL (v6.1.3 and elasticsearch-analysis-ik-6.1.3) to your version before downloading.
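Since the version appears twice in the URL, it helps to derive both occurrences from a single variable so they cannot drift apart (a small sketch; 6.1.3 stands in for whatever version your Elasticsearch reports):

```shell
# Build the IK download URL from a single Elasticsearch version string.
ES_VERSION="6.1.3"   # replace with your Elasticsearch version
IK_URL="https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v${ES_VERSION}/elasticsearch-analysis-ik-${ES_VERSION}.zip"
echo "$IK_URL"
```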
3. Install. Change into the Elasticsearch installation directory.
3.1 Download the zip package:
cd ./plugins
wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.1.3/elasticsearch-analysis-ik-6.1.3.zip
You can also upload it via FTP instead. Either way, you end up with the archive elasticsearch-analysis-ik-6.1.3.zip.
3.2 Unzip the package, delete the archive, and rename the folder.
Unzip: unzip elasticsearch-analysis-ik-6.1.3.zip
Delete the archive (this is mandatory; otherwise Elasticsearch reports an error on startup): rm -f elasticsearch-analysis-ik-6.1.3.zip
Rename the extracted folder elasticsearch to ik.
Every one of these steps is required; skipping any of them makes Elasticsearch fail to start.
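The whole of step 3 can be sketched as one script. ES_HOME and ES_VERSION are assumptions here; adjust them to your own Elasticsearch install path and version:

```shell
# Sketch of step 3 (download, unzip, delete archive, rename folder).
# ES_HOME and ES_VERSION are assumptions -- adjust to your setup.
ES_HOME="${ES_HOME:-/usr/local/elasticsearch}"
ES_VERSION="6.1.3"
PKG="elasticsearch-analysis-ik-${ES_VERSION}.zip"
IK_URL="https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v${ES_VERSION}/${PKG}"

if [ -d "$ES_HOME/plugins" ]; then
    cd "$ES_HOME/plugins"
    wget "$IK_URL"
    unzip "$PKG"
    rm -f "$PKG"            # must be deleted, or Elasticsearch fails to start
    mv elasticsearch ik     # rename the extracted folder to ik
else
    echo "plugins directory not found -- set ES_HOME to your Elasticsearch path"
fi
```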
3.3 Restart Elasticsearch:
ps -ef | grep elastic
Kill the running process, then start Elasticsearch again with:
su elasticsearch -c "/usr/local/elasticsearch/bin/elasticsearch -d"
If it starts cleanly, the installation succeeded.
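To confirm the plugin actually loaded, you can query Elasticsearch's _cat/plugins API (this assumes the node listens on 127.0.0.1:9200, as in the examples below):

```shell
# List installed plugins; a successful install shows an "analysis-ik" row.
PLUGINS=$(curl -s 'http://127.0.0.1:9200/_cat/plugins?v' \
    || echo "Elasticsearch not reachable -- check its startup log")
echo "$PLUGINS"
```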
4. Verify
4.1 Using the default analyzer:
curl -XGET http://127.0.0.1:9200/_analyze?pretty -H 'Content-Type:application/json' -d'
{
"text": "听说看这篇博客的哥们最帅、姑娘最美"
}'
Result:
[root@iZ942k2d5ezZ plugins]# curl -XGET http://127.0.0.1:9200/_analyze?pretty -H 'Content-Type:application/json' -d'
> {
> "text": "听说看这篇博客的哥们最帅、姑娘最美"
> }'
{
"tokens" : [
{
"token" : "听",
"start_offset" : 0,
"end_offset" : 1,
"type" : "<IDEOGRAPHIC>",
"position" : 0
},
{
"token" : "说",
"start_offset" : 1,
"end_offset" : 2,
"type" : "<IDEOGRAPHIC>",
"position" : 1
},
{
"token" : "看",
"start_offset" : 2,
"end_offset" : 3,
"type" : "<IDEOGRAPHIC>",
"position" : 2
},
{
"token" : "这",
"start_offset" : 3,
"end_offset" : 4,
"type" : "<IDEOGRAPHIC>",
"position" : 3
},
{
"token" : "篇",
"start_offset" : 4,
"end_offset" : 5,
"type" : "<IDEOGRAPHIC>",
"position" : 4
},
{
"token" : "博",
"start_offset" : 5,
"end_offset" : 6,
"type" : "<IDEOGRAPHIC>",
"position" : 5
},
{
"token" : "客",
"start_offset" : 6,
"end_offset" : 7,
"type" : "<IDEOGRAPHIC>",
"position" : 6
},
{
"token" : "的",
"start_offset" : 7,
"end_offset" : 8,
"type" : "<IDEOGRAPHIC>",
"position" : 7
},
{
"token" : "哥",
"start_offset" : 8,
"end_offset" : 9,
"type" : "<IDEOGRAPHIC>",
"position" : 8
},
{
"token" : "们",
"start_offset" : 9,
"end_offset" : 10,
"type" : "<IDEOGRAPHIC>",
"position" : 9
},
{
"token" : "最",
"start_offset" : 10,
"end_offset" : 11,
"type" : "<IDEOGRAPHIC>",
"position" : 10
},
{
"token" : "帅",
"start_offset" : 11,
"end_offset" : 12,
"type" : "<IDEOGRAPHIC>",
"position" : 11
},
{
"token" : "姑",
"start_offset" : 13,
"end_offset" : 14,
"type" : "<IDEOGRAPHIC>",
"position" : 12
},
{
"token" : "娘",
"start_offset" : 14,
"end_offset" : 15,
"type" : "<IDEOGRAPHIC>",
"position" : 13
},
{
"token" : "最",
"start_offset" : 15,
"end_offset" : 16,
"type" : "<IDEOGRAPHIC>",
"position" : 14
},
{
"token" : "美",
"start_offset" : 16,
"end_offset" : 17,
"type" : "<IDEOGRAPHIC>",
"position" : 15
}
]
}
4.2 Using the ik_smart analyzer:
curl -XGET http://127.0.0.1:9200/_analyze?pretty -H 'Content-Type:application/json' -d'
{
"analyzer": "ik_smart",
"text": "听说看这篇博客的哥们最帅、姑娘最美"
}'
Result:
[root@iZ942k2d5ezZ plugins]# curl -XGET http://127.0.0.1:9200/_analyze?pretty -H 'Content-Type:application/json' -d'
> {
> "analyzer": "ik_smart",
> "text": "听说看这篇博客的哥们最帅、姑娘最美"
> }'
{
"tokens" : [
{
"token" : "听说",
"start_offset" : 0,
"end_offset" : 2,
"type" : "CN_WORD",
"position" : 0
},
{
"token" : "看",
"start_offset" : 2,
"end_offset" : 3,
"type" : "CN_CHAR",
"position" : 1
},
{
"token" : "这篇",
"start_offset" : 3,
"end_offset" : 5,
"type" : "CN_WORD",
"position" : 2
},
{
"token" : "博客",
"start_offset" : 5,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 3
},
{
"token" : "的",
"start_offset" : 7,
"end_offset" : 8,
"type" : "CN_CHAR",
"position" : 4
},
{
"token" : "哥们",
"start_offset" : 8,
"end_offset" : 10,
"type" : "CN_WORD",
"position" : 5
},
{
"token" : "最",
"start_offset" : 10,
"end_offset" : 11,
"type" : "CN_CHAR",
"position" : 6
},
{
"token" : "帅",
"start_offset" : 11,
"end_offset" : 12,
"type" : "CN_CHAR",
"position" : 7
},
{
"token" : "姑娘",
"start_offset" : 13,
"end_offset" : 15,
"type" : "CN_WORD",
"position" : 8
},
{
"token" : "最美",
"start_offset" : 15,
"end_offset" : 17,
"type" : "CN_WORD",
"position" : 9
}
]
}
4.3 The difference between ik_max_word and ik_smart
ik_max_word: splits the text at the finest granularity. For example, 中华人民共和国国歌 is split into 中华人民共和国, 中华人民, 中华, 华人, 人民共和国, 人民, 人, 民, 共和国, 共和, 和, 国国, 国歌; it exhausts every possible combination.
ik_smart: splits the text at the coarsest granularity. For example, 中华人民共和国国歌 is split into 中华人民共和国 and 国歌.
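You can see the contrast for yourself by re-running the _analyze request from step 4.2 with ik_max_word on the example phrase above (assumes Elasticsearch on 127.0.0.1:9200):

```shell
# Same _analyze call as step 4.2, but with the fine-grained ik_max_word analyzer.
BODY='{"analyzer": "ik_max_word", "text": "中华人民共和国国歌"}'
curl -XGET 'http://127.0.0.1:9200/_analyze?pretty' \
    -H 'Content-Type: application/json' -d "$BODY" \
    || echo "Elasticsearch not reachable"
```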
5. For the application extension http://addons.fecmall.com/44669378:
To use the Chinese analyzer, open
@fecelastic\models\elasticSearch\Product
and in
public static $langAnalysis = [
'zh' => 'cjk', // Chinese
'kr' => 'cjk', // Korean
'jp' => 'cjk', // Japanese
'en' => 'english',
'fr' => 'french',
'de' => 'german',
'it' => 'italian',
'pt' => 'portuguese',
'es' => 'spanish',
'ru' => 'russian',
'nl' => 'dutch',
'br' => 'brazilian',
];
change the entry 'zh' => 'cjk' to 'zh' => 'ik_smart',
then run, from the application root:
./yii elasticsearch/clean
./yii elasticsearch/updatemapping
Then run the full data sync:
cd ./vendor/fancyecommerce/fecshop/shell/search
sh fullSearchSync.sh
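After updatemapping, you can spot-check that the new mapping really references ik_smart. The exact index name depends on your fecmall setup, so this sketch greps across all mappings:

```shell
# Look for ik_smart anywhere in the index mappings (assumes ES on 127.0.0.1:9200).
HITS=$(curl -s 'http://127.0.0.1:9200/_mapping?pretty' | grep 'ik_smart' \
    || echo "ik_smart not found -- re-check the Product config or the ES connection")
echo "$HITS"
```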
Note: after I installed this Chinese analysis plugin, Elasticsearch became very sluggish (I am not sure whether the cause was low memory or something else), so I have not verified the end result myself; please verify it on your own setup.