1. Install Elasticsearch; see this guide: http://www.fecmall.com/topic/672
2. Download the matching IK Analysis release. The guide above installs Elasticsearch 6.1.3, so the IK Analysis Chinese analyzer must also be version 6.1.3.
Download link: https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.1.3/elasticsearch-analysis-ik-6.1.3.zip
If you run a different Elasticsearch version, change both version strings in the URL (v6.1.3 and elasticsearch-analysis-ik-6.1.3) to your version before downloading.
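Since the version appears twice in the URL, it helps to derive both occurrences from a single variable so they cannot drift apart (a small sketch; 6.1.3 stands in for whatever version your Elasticsearch reports):

```shell
# Build the IK download URL from a single Elasticsearch version string.
ES_VERSION="6.1.3"   # replace with your Elasticsearch version
IK_URL="https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v${ES_VERSION}/elasticsearch-analysis-ik-${ES_VERSION}.zip"
echo "$IK_URL"
```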
3. Install. Change into the Elasticsearch installation directory.
3.1 Download the zip package:
cd ./plugins
wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.1.3/elasticsearch-analysis-ik-6.1.3.zip
You can also upload it via FTP instead. Either way, you end up with the archive elasticsearch-analysis-ik-6.1.3.zip.
3.2 Unzip the package, delete the archive, and rename the folder.
Unzip: unzip elasticsearch-analysis-ik-6.1.3.zip
Delete the archive (this is mandatory; otherwise Elasticsearch reports an error on startup): rm -f elasticsearch-analysis-ik-6.1.3.zip
Rename the extracted folder elasticsearch to ik.
Every one of these steps is required; skipping any of them makes Elasticsearch fail to start.
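The whole of step 3 can be sketched as one script. ES_HOME and ES_VERSION are assumptions here; adjust them to your own Elasticsearch install path and version:

```shell
# Sketch of step 3 (download, unzip, delete archive, rename folder).
# ES_HOME and ES_VERSION are assumptions -- adjust to your setup.
ES_HOME="${ES_HOME:-/usr/local/elasticsearch}"
ES_VERSION="6.1.3"
PKG="elasticsearch-analysis-ik-${ES_VERSION}.zip"
IK_URL="https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v${ES_VERSION}/${PKG}"

if [ -d "$ES_HOME/plugins" ]; then
    cd "$ES_HOME/plugins"
    wget "$IK_URL"
    unzip "$PKG"
    rm -f "$PKG"            # must be deleted, or Elasticsearch fails to start
    mv elasticsearch ik     # rename the extracted folder to ik
else
    echo "plugins directory not found -- set ES_HOME to your Elasticsearch path"
fi
```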
3.3 Restart Elasticsearch:
ps -ef | grep elastic
Kill the running process, then start Elasticsearch again with:
su elasticsearch -c "/usr/local/elasticsearch/bin/elasticsearch -d"
If it starts cleanly, the installation succeeded.
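To confirm the plugin actually loaded, you can query Elasticsearch's _cat/plugins API (this assumes the node listens on 127.0.0.1:9200, as in the examples below):

```shell
# List installed plugins; a successful install shows an "analysis-ik" row.
PLUGINS=$(curl -s 'http://127.0.0.1:9200/_cat/plugins?v' \
    || echo "Elasticsearch not reachable -- check its startup log")
echo "$PLUGINS"
```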
4. Verify
4.1 Using the default analyzer:
curl -XGET http://127.0.0.1:9200/_analyze?pretty -H 'Content-Type:application/json' -d'
{
"text": "听说看这篇博客的哥们最帅、姑娘最美"
}'
Result:
[root@iZ942k2d5ezZ plugins]# curl -XGET http://127.0.0.1:9200/_analyze?pretty -H 'Content-Type:application/json' -d'
> {
> "text": "听说看这篇博客的哥们最帅、姑娘最美"
> }'
{
"tokens" : [
{
"token" : "听",
"start_offset" : 0,
"end_offset" : 1,
"type" : "<IDEOGRAPHIC>",
"position" : 0
},
{
"token" : "说",
"start_offset" : 1,
"end_offset" : 2,
"type" : "<IDEOGRAPHIC>",
"position" : 1
},
{
"token" : "看",
"start_offset" : 2,
"end_offset" : 3,
"type" : "<IDEOGRAPHIC>",
"position" : 2
},
{
"token" : "这",
"start_offset" : 3,
"end_offset" : 4,
"type" : "<IDEOGRAPHIC>",
"position" : 3
},
{
"token" : "篇",
"start_offset" : 4,
"end_offset" : 5,
"type" : "<IDEOGRAPHIC>",
"position" : 4
},
{
"token" : "博",
"start_offset" : 5,
"end_offset" : 6,
"type" : "<IDEOGRAPHIC>",
"position" : 5
},
{
"token" : "客",
"start_offset" : 6,
"end_offset" : 7,
"type" : "<IDEOGRAPHIC>",
"position" : 6
},
{
"token" : "的",
"start_offset" : 7,
"end_offset" : 8,
"type" : "<IDEOGRAPHIC>",
"position" : 7
},
{
"token" : "哥",
"start_offset" : 8,
"end_offset" : 9,
"type" : "<IDEOGRAPHIC>",
"position" : 8
},
{
"token" : "们",
"start_offset" : 9,
"end_offset" : 10,
"type" : "<IDEOGRAPHIC>",
"position" : 9
},
{
"token" : "最",
"start_offset" : 10,
"end_offset" : 11,
"type" : "<IDEOGRAPHIC>",
"position" : 10
},
{
"token" : "帅",
"start_offset" : 11,
"end_offset" : 12,
"type" : "<IDEOGRAPHIC>",
"position" : 11
},
{
"token" : "姑",
"start_offset" : 13,
"end_offset" : 14,
"type" : "<IDEOGRAPHIC>",
"position" : 12
},
{
"token" : "娘",
"start_offset" : 14,
"end_offset" : 15,
"type" : "<IDEOGRAPHIC>",
"position" : 13
},
{
"token" : "最",
"start_offset" : 15,
"end_offset" : 16,
"type" : "<IDEOGRAPHIC>",
"position" : 14
},
{
"token" : "美",
"start_offset" : 16,
"end_offset" : 17,
"type" : "<IDEOGRAPHIC>",
"position" : 15
}
]
}
4.2 Using the ik_smart analyzer:
curl -XGET http://127.0.0.1:9200/_analyze?pretty -H 'Content-Type:application/json' -d'
{
"analyzer": "ik_smart",
"text": "听说看这篇博客的哥们最帅、姑娘最美"
}'
Result:
[root@iZ942k2d5ezZ plugins]# curl -XGET http://127.0.0.1:9200/_analyze?pretty -H 'Content-Type:application/json' -d'
> {
> "analyzer": "ik_smart",
> "text": "听说看这篇博客的哥们最帅、姑娘最美"
> }'
{
"tokens" : [
{
"token" : "听说",
"start_offset" : 0,
"end_offset" : 2,
"type" : "CN_WORD",
"position" : 0
},
{
"token" : "看",
"start_offset" : 2,
"end_offset" : 3,
"type" : "CN_CHAR",
"position" : 1
},
{
"token" : "这篇",
"start_offset" : 3,
"end_offset" : 5,
"type" : "CN_WORD",
"position" : 2
},
{
"token" : "博客",
"start_offset" : 5,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 3
},
{
"token" : "的",
"start_offset" : 7,
"end_offset" : 8,
"type" : "CN_CHAR",
"position" : 4
},
{
"token" : "哥们",
"start_offset" : 8,
"end_offset" : 10,
"type" : "CN_WORD",
"position" : 5
},
{
"token" : "最",
"start_offset" : 10,
"end_offset" : 11,
"type" : "CN_CHAR",
"position" : 6
},
{
"token" : "帅",
"start_offset" : 11,
"end_offset" : 12,
"type" : "CN_CHAR",
"position" : 7
},
{
"token" : "姑娘",
"start_offset" : 13,
"end_offset" : 15,
"type" : "CN_WORD",
"position" : 8
},
{
"token" : "最美",
"start_offset" : 15,
"end_offset" : 17,
"type" : "CN_WORD",
"position" : 9
}
]
}
4.3 The difference between ik_max_word and ik_smart
ik_max_word: splits the text at the finest granularity. For example, 中华人民共和国国歌 is split into 中华人民共和国, 中华人民, 中华, 华人, 人民共和国, 人民, 人, 民, 共和国, 共和, 和, 国国, 国歌; it exhausts every possible combination.
ik_smart: splits the text at the coarsest granularity. For example, 中华人民共和国国歌 is split into 中华人民共和国 and 国歌.
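You can see the contrast for yourself by re-running the _analyze request from step 4.2 with ik_max_word on the example phrase above (assumes Elasticsearch on 127.0.0.1:9200):

```shell
# Same _analyze call as step 4.2, but with the fine-grained ik_max_word analyzer.
BODY='{"analyzer": "ik_max_word", "text": "中华人民共和国国歌"}'
curl -XGET 'http://127.0.0.1:9200/_analyze?pretty' \
    -H 'Content-Type: application/json' -d "$BODY" \
    || echo "Elasticsearch not reachable"
```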
5. For the application extension http://addons.fecmall.com/44669378:
To use the Chinese analyzer, open
@fecelastic\models\elasticSearch\Product
and in
public static $langAnalysis = [
'zh' => 'cjk', // Chinese
'kr' => 'cjk', // Korean
'jp' => 'cjk', // Japanese
'en' => 'english',
'fr' => 'french',
'de' => 'german',
'it' => 'italian',
'pt' => 'portuguese',
'es' => 'spanish',
'ru' => 'russian',
'nl' => 'dutch',
'br' => 'brazilian',
];
change the entry 'zh' => 'cjk' to 'zh' => 'ik_smart',
then run, from the application root:
./yii elasticsearch/clean
./yii elasticsearch/updatemapping
Then run the full data sync:
cd ./vendor/fancyecommerce/fecshop/shell/search
sh fullSearchSync.sh
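After updatemapping, you can spot-check that the new mapping really references ik_smart. The exact index name depends on your fecmall setup, so this sketch greps across all mappings:

```shell
# Look for ik_smart anywhere in the index mappings (assumes ES on 127.0.0.1:9200).
HITS=$(curl -s 'http://127.0.0.1:9200/_mapping?pretty' | grep 'ik_smart' \
    || echo "ik_smart not found -- re-check the Product config or the ES connection")
echo "$HITS"
```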
Note: after I installed this Chinese analysis plugin, Elasticsearch became very sluggish (I am not sure whether the cause was low memory or something else), so I have not verified the end result myself; please verify it on your own setup.