Elasticsearch (4) – Built-in Analyzers
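- Standard Analyzer – the default analyzer; splits text on word boundaries and lowercases tokens
- Simple Analyzer – splits on any non-letter character (symbols and digits are discarded) and lowercases tokens
- Stop Analyzer – lowercases tokens and removes stop words (the, a, is)
- Whitespace Analyzer – splits on whitespace only; does not lowercase
- Keyword Analyzer – no tokenization; the entire input is emitted as a single token
- Pattern Analyzer – splits using a regular expression, default \W+ (runs of non-word characters), and lowercases tokens
- Language – analyzers for more than 30 common languages
- Custom Analyzer – user-defined analyzers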
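To make the differences between these analyzers concrete, here is a rough Python sketch that imitates a few of them with regular expressions. It is only an approximation for ASCII input and does not call Elasticsearch. The function names are hypothetical, and STOP_WORDS holds just the three words from the list above, whereas the real default stop word set is larger. The Standard Analyzer is left out because its Unicode word segmentation is not easy to mimic in a few lines.

```python
import re

# Assumption: only the three stop words named in the list above.
STOP_WORDS = {"the", "a", "is"}


def whitespace_tokens(text):
    """Split on runs of whitespace; keep case and punctuation."""
    return text.split()


def simple_tokens(text):
    """Keep runs of ASCII letters only and lowercase them."""
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]


def stop_tokens(text):
    """Same as simple_tokens, then drop stop words."""
    return [t for t in simple_tokens(text) if t not in STOP_WORDS]


def keyword_tokens(text):
    """No tokenization: the whole input is one token."""
    return [text]


def pattern_tokens(text, pattern=r"\W+"):
    """Split on the regex (default: non-word characters) and lowercase."""
    return [t.lower() for t in re.split(pattern, text) if t]


if __name__ == "__main__":
    sample = "He is-a Boy2"
    for fn in (whitespace_tokens, simple_tokens, stop_tokens,
               keyword_tokens, pattern_tokens):
        print(fn.__name__, fn(sample))
```

With the input "He is-a Boy2", only the whitespace and keyword variants preserve the capital letters and the hyphen, and only the simple and stop variants drop the digit.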
# Testing the whitespace analyzer
GET _analyze
{
  "analyzer": "whitespace",
  "text": "he is-a boy"
}

# Response:
{
  "tokens" : [
    {
      "token" : "he",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "is-a",
      "start_offset" : 3,
      "end_offset" : 7,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "boy",
      "start_offset" : 8,
      "end_offset" : 11,
      "type" : "word",
      "position" : 2
    }
  ]
}
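The offsets and positions in this response can be reproduced offline. The sketch below is a local imitation of the whitespace analyzer's output shape, not a call to Elasticsearch; whitespace_analyze is a hypothetical helper name. It shows that start_offset and end_offset are character indices into the original text (end exclusive), and that position is the token's ordinal in the stream.

```python
import re


def whitespace_analyze(text):
    """Imitate the _analyze response for the whitespace analyzer."""
    tokens = []
    # Each run of non-whitespace characters becomes one token.
    for position, match in enumerate(re.finditer(r"\S+", text)):
        tokens.append({
            "token": match.group(),
            "start_offset": match.start(),  # index of the first character
            "end_offset": match.end(),      # index just past the last character
            "type": "word",
            "position": position,
        })
    return {"tokens": tokens}


if __name__ == "__main__":
    for tok in whitespace_analyze("he is-a boy")["tokens"]:
        print(tok)
```

Because the split happens on whitespace alone, "is-a" stays a single token spanning offsets 3 to 7.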