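What is data modeling

Data modeling is the process of defining and analyzing the data requirements needed to support business processes within the scope of an organization's information systems.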

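Data types

At index time, a field that appears for the first time is automatically detected and assigned a type.
Once a field's type has been set, it cannot be changed.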

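Basic data types

String: text, keyword
Numeric: long, integer, short, byte, double, float, half_float, scaled_float
Boolean: boolean
Date: date
Binary: binary
Range: integer_range, float_range, long_range, double_range, date_range

The difference between text and keyword is that text is analyzed (split into terms), while keyword is indexed as a single exact value. With the default dynamic mapping, a string field is usually mapped as text with an automatically created keyword sub-field.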
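To make the text/keyword distinction concrete, here is a minimal Python sketch that builds the kind of mapping fragment described above: a text field with a keyword sub-field. The field name `title` and the helper `text_with_keyword` are hypothetical, and `ignore_above: 256` is assumed to match what default dynamic mapping produces in 5.x and later.

```python
import json


def text_with_keyword(ignore_above=256):
    """Mapping for an analyzed field that also keeps an exact-value sub-field."""
    return {
        "type": "text",
        "fields": {
            # Reachable in queries and aggregations as "<field>.keyword".
            "keyword": {"type": "keyword", "ignore_above": ignore_above}
        },
    }


# "title" is a made-up field name used only for illustration.
mapping = {"properties": {"title": text_with_keyword()}}
print(json.dumps(mapping, indent=2))
```

Full-text queries would then target `title`, while sorting, aggregations, and exact matches would target `title.keyword`.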

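Date type

In ES, a date value can be any of the following:

A formatted date string, e.g. "2015-01-01" or "2015/01/01 12:10:30"
A long number of milliseconds since the epoch (milliseconds-since-the-epoch)
An integer number of seconds since the epoch (seconds-since-the-epoch)
A custom format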
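The representations listed above can be produced from one timestamp. The sketch below is illustrative only: it assumes the moment is in UTC, and the `format` string in `date_mapping` is an assumption about how several accepted formats might be declared together with `||`.

```python
from datetime import datetime, timezone

# One moment in time, expressed in the forms a date field can accept.
moment = datetime(2015, 1, 1, 12, 10, 30, tzinfo=timezone.utc)

as_string = moment.strftime("%Y/%m/%d %H:%M:%S")  # formatted date string
epoch_seconds = int(moment.timestamp())           # seconds-since-the-epoch
epoch_millis = epoch_seconds * 1000               # milliseconds-since-the-epoch

# Assumed mapping that accepts a custom string format, a plain date, or millis.
date_mapping = {
    "type": "date",
    "format": "yyyy/MM/dd HH:mm:ss||yyyy-MM-dd||epoch_millis",
}

print(as_string, epoch_seconds, epoch_millis)
```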

binary type

The binary type accepts a Base64-encoded string. By default the field is not stored and is not searchable.
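A short sketch of preparing a value for a binary field, using only the Python standard library. The field name `blob` and the payload are made up for illustration.

```python
import base64

raw = b"\x00\x01 arbitrary binary payload"

# Encode the bytes as Base64 text; b64encode emits no embedded newlines.
encoded = base64.b64encode(raw).decode("ascii")

# "blob" is a hypothetical field mapped as {"type": "binary"}.
document = {"blob": encoded}

# Decoding the stored string gives back the original bytes.
decoded = base64.b64decode(document["blob"])
```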

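Compound data types

Arrays

Elasticsearch has no dedicated array type. By default any field can contain one or more values, but all values in an array must be of the same type.

For example:

String array: [ "one", "two" ]
Integer array: [1,3]
Nested array: [1,[2,3]], which is equivalent to [1,2,3]
Object array: [ { "name": "Mary", "age": 12 }, { "name": "John", "age": 10 }]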
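The equivalence of [1,[2,3]] and [1,2,3] can be pictured as a recursive flatten. This is only a sketch of the idea in Python, not Elasticsearch's own code; `flatten` is a hypothetical helper.

```python
def flatten(values):
    """Collapse arbitrarily nested lists into one flat list, preserving order."""
    flat = []
    for value in values:
        if isinstance(value, list):
            flat.extend(flatten(value))
        else:
            flat.append(value)
    return flat


nested = [1, [2, 3]]
print(flatten(nested))  # [1, 2, 3]
```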

Object

{
  "user": {
    "age": 10,
    "name": "Joe"
  }
}

The mapping for the document above:

{
  "mappings": {
    "my_type": {
      "properties": {
        "user": {
          "properties": {
            "age": {
              "type": "integer"
            },
            "name": {
              "type": "text"
            }
          }
        }
      }
    }
  }
}

Or, equivalently, with dotted field names:

{
  "mappings": {
    "my_type": {
      "properties": {
        "user.age": {
          "type": "integer"
        },
        "user.name": {
          "type": "text"
        }
      }
    }
  }
}
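The two mappings above describe the same fields because an inner object can be read as a flat set of dotted paths. The following Python sketch illustrates that reading; `dotted_paths` is a hypothetical helper, not part of any Elasticsearch client.

```python
def dotted_paths(obj, prefix=""):
    """Flatten nested dicts into {"outer.inner": value} pairs."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend into inner objects, carrying the path so far.
            flat.update(dotted_paths(value, path))
        else:
            flat[path] = value
    return flat


document = {"user": {"age": 10, "name": "Joe"}}
print(dotted_paths(document))  # {'user.age': 10, 'user.name': 'Joe'}
```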

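Geo types

Geo types store the latitude and longitude of geographic locations.

Others

mapping

A mapping is similar to a table schema definition in MySQL.

Adding, deleting, modifying, and querying

Add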

curl -XPUT "http://elasticsearch:9200/{index_name}/_mapping/{type}" -H 'Content-Type: application/json' -d'
{
  "properties":{
    "{field_name1}":{
      "type":"keyword"
    },
    "{field_name2}":{
      "type":"text"
    }
  }
}'

Get the defined mapping

curl -XGET "http://elasticsearch:9200/{index_name}/_mapping/"

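Modify/delete

An existing mapping cannot be deleted or modified; the only option is to delete the index and rebuild it.

Index templates (Index Templates)

An index template specifies the mappings an index should use. It applies only to newly created indices; indices that already exist are not affected.

Before 6.0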

# "template" is the index pattern this template applies to
curl -XPOST "http://elasticsearch:9200/_template/{template_name}" -H 'Content-Type: application/json' -d'
{
  "template": "myindex*",
  "mappings": {
    "{type}": {
      "properties": {
        "{field_name}": {
          "type": "integer"
        }
      }
    }
  }
}'

6.0 changed the API

The template field was renamed to index_patterns, which accepts an array of patterns.
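As a rough illustration of the rename, the sketch below converts a pre-6.0 template body into the 6.0 shape. `to_index_patterns` is a hypothetical helper; it only moves the one field and assumes nothing else in the body needs to change.

```python
def to_index_patterns(legacy_body):
    """Return a copy of a template body with "template" replaced by "index_patterns"."""
    body = dict(legacy_body)  # shallow copy so the caller's dict is untouched
    if "template" in body:
        pattern = body.pop("template")
        # index_patterns is a list, so wrap a single pattern string.
        body["index_patterns"] = pattern if isinstance(pattern, list) else [pattern]
    return body


legacy = {"template": "myindex*", "mappings": {}}
print(to_index_patterns(legacy))
```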

View index templates

curl -XGET "http://elasticsearch:9200/{index_name}/_mapping"

Or fetch the templates themselves by name (wildcards are allowed):

curl -XGET "http://elasticsearch:9200/_template/{template_name_regex}"

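Dynamic mapping

The process of automatically detecting and adding new types and fields is called dynamic mapping, and its rules can be customized. ES infers the field type from the JSON type of each field value.

Supported field types

Dynamic templates

Common use: map every string field as not analyzed by default and enable analysis only on selected fields, which saves space.

Create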

# match_mapping_type selects the detected type that the template applies to
curl -XPUT "http://elasticsearch:9200/{index_name}" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "{type}": {
      "dynamic_templates": [
        {
          "{template_name}": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "keyword"
            }
          }
        }
      ]
    }
  }
}'

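Three kinds of matching are supported: by detected type (match_mapping_type), by field name (match, unmatch), and by path (path_match, path_unmatch).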
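The sketch below lists one dynamic template per matching style and simulates which template would be chosen for a new field. It is a rough approximation: the template names and patterns are made up, `fnmatchcase` stands in for Elasticsearch's simple wildcard matching, and the simulation assumes templates are tried in order with the first match winning.

```python
from fnmatch import fnmatchcase

# Hypothetical templates, most specific first.
dynamic_templates = [
    {"user_fields_as_text": {"path_match": "user.*", "mapping": {"type": "text"}}},
    {"ids_as_long": {"match": "*_id", "mapping": {"type": "long"}}},
    {"strings_as_keyword": {"match_mapping_type": "string", "mapping": {"type": "keyword"}}},
]


def first_match(path, detected_type):
    """Return the name of the first template whose conditions all hold, else None."""
    field_name = path.rsplit(".", 1)[-1]  # last segment of the dotted path
    for template in dynamic_templates:
        name, rule = next(iter(template.items()))
        if "match_mapping_type" in rule and rule["match_mapping_type"] != detected_type:
            continue
        if "match" in rule and not fnmatchcase(field_name, rule["match"]):
            continue
        if "path_match" in rule and not fnmatchcase(path, rule["path_match"]):
            continue
        return name
    return None


print(first_match("user.name", "string"))
print(first_match("order_id", "long"))
print(first_match("title", "string"))
```

Putting the catch-all string template last keeps it from shadowing the more specific name and path rules.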

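Dynamic field mapping

By default, when ES finds a new field it detects the datatype and adds the field to the mapping type. A few settings control how fields are mapped dynamically, including date detection, numeric detection, and custom date formats.

Example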

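# The three requests are independent alternatives for creating my_index

# Disable date detection
PUT my_index
{
  "mappings": {
    "my_type": {
      "date_detection": false
    }
  }
}

# Customize the date formats that are detected
PUT my_index
{
  "mappings": {
    "my_type": {
      "dynamic_date_formats": ["MM/dd/yyyy"]
    }
  }
}

# Enable numeric detection
PUT my_index
{
  "mappings": {
    "my_type": {
      "numeric_detection": true
    }
  }
}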

Fixing a wrong model

There are two main approaches: rebuild the index with reindex, or delete the index and re-import the data.

Using _reindex

Basic API

curl -XPOST "http://elasticsearch:9200/_reindex" -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "{index_name1}"
  },
  "dest": {
    "index": "{index_name2}"
  }
}'

Example: reindex combined with an index template

Scenario: a field in index1 has the wrong type.

Step 1. Create an index template

curl -X POST "http://elasticsearch:9200/_template/tmp1" -H 'Content-Type: application/json' -d'
{
  "template": "index*v2",
  "mappings": {
    "doc": {
      "properties": {
        "duration": {
          "type": "long"
        }
      }
    }
  }
}'

Step 2. Reindex

curl -X POST "http://elasticsearch:9200/_reindex" -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "index1"
  },
  "dest": {
    "index": "index1_v2"
  }
}'

Once it succeeds, delete the original index.

With the help of Java code

import com.alibaba.fastjson.JSON;
import org.apache.commons.io.IOUtils;
import org.apache.commons.lang3.time.DateFormatUtils;
import org.apache.commons.lang3.time.DateUtils;

import java.nio.charset.StandardCharsets;
import java.util.Calendar;
import java.util.Date;
import java.util.concurrent.TimeUnit;

public class ReindexDailyIndices {
    private static final String ES = "http://127.0.0.1:9200";

    // Runs a command and returns everything it wrote to stdout.
    private static String run(String... command) throws Exception {
        Process process = Runtime.getRuntime().exec(command);
        return IOUtils.toString(process.getInputStream(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        // Daily indices api-2018.06.02 .. api-2018.06.20 are copied to api-<date>_v2.
        Date date = DateUtils.addDays(DateUtils.parseDate("2018.06.01", "yyyy.MM.dd"), 1);
        Date end = DateUtils.parseDate("2018.06.20", "yyyy.MM.dd");
        while (DateUtils.truncatedCompareTo(end, date, Calendar.DATE) >= 0) {
            String time = DateFormatUtils.format(date, "yyyy.MM.dd");
            // The script rewrites the target index name to <source index>_v2.
            String body = "{\n" +
                    "  \"source\": {\"index\": \"api-" + time + "\"},\n" +
                    "  \"dest\": {\"index\": \"api-\"},\n" +
                    "  \"script\": {\n" +
                    "    \"lang\": \"painless\",\n" +
                    "    \"source\": \"ctx._index = 'api-' + (ctx._index.substring('api-'.length(), ctx._index.length())) + '_v2'\"\n" +
                    "  }\n" +
                    "}";
            // Start the reindex as a background task and read back its task id.
            String json = run("curl", "-X", "POST", ES + "/_reindex?wait_for_completion=false",
                    "-H", "Content-Type: application/json", "-d", body);
            System.out.println("json==" + json);
            String taskId = JSON.parseObject(json).getString("task");
            // Poll the task API every 10 seconds until the task reports completion.
            while (true) {
                String status = run("curl", "-X", "GET", ES + "/_tasks/" + taskId);
                System.out.println(status);
                if (JSON.parseObject(status).getBooleanValue("completed")) {
                    break;
                }
                TimeUnit.SECONDS.sleep(10);
            }
            System.out.println("----------------");
            // Delete the source index once its data has been copied.
            run("curl", "-X", "DELETE", ES + "/api-" + time);
            date = DateUtils.addDays(date, 1);
            TimeUnit.SECONDS.sleep(10);
        }
    }
}
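The date loop and the index rename performed by the Painless script can be checked offline before touching a cluster. The Python sketch below mirrors that logic under the same assumptions as the Java code (daily indices named `api-yyyy.MM.dd`, target suffix `_v2`); `daily_indices` and `renamed` are hypothetical helpers.

```python
from datetime import date, timedelta


def daily_indices(start, end, prefix="api-"):
    """Yield one index name per day from start to end, both inclusive."""
    day = start
    while day <= end:
        yield prefix + day.strftime("%Y.%m.%d")
        day += timedelta(days=1)


def renamed(index, prefix="api-", suffix="_v2"):
    """Same string manipulation as the Painless script: keep the date part, append the suffix."""
    return prefix + index[len(prefix):] + suffix


# (source index, target index) pairs for 2018.06.02 through 2018.06.20.
plan = [(name, renamed(name)) for name in daily_indices(date(2018, 6, 2), date(2018, 6, 20))]
for source, target in plan:
    print(source, "->", target)
```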

Example: reindex combined with an Ingest Node

Scenario: only the IP address was stored, without any geo information.

First install the official geoip plugin:

/usr/share/elasticsearch/bin/elasticsearch-plugin install ingest-geoip

Then create a pipeline:

PUT _ingest/pipeline/<custom-id>
{
    "description": "Convert ip to geo",
    "processors": [
        {
            "geoip": {
                "field": "<field that holds the ip>"
            }
        }
    ]
}

Finally, apply the pipeline when reindexing.
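A minimal sketch of what the reindex request body might look like with a pipeline attached, assuming the pipeline is referenced through `dest.pipeline`. The index names and the pipeline id `ip-to-geo` are placeholders, and `reindex_with_pipeline` is a hypothetical helper.

```python
import json


def reindex_with_pipeline(source_index, dest_index, pipeline_id):
    """Body for POST _reindex that runs every copied document through an ingest pipeline."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index, "pipeline": pipeline_id},
    }


# Placeholder names; substitute the real indices and the id used when creating the pipeline.
body = reindex_with_pipeline("index1", "index1_v2", "ip-to-geo")
print(json.dumps(body, indent=2))
```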

Update By Query

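Scenario: a field is missing, but other fields contain the information needed to fill it.

See the official documentation for usage: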

https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update-by-query.html
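For the missing-field scenario, a request body could look roughly like the sketch below, which would be sent with POST {index_name}/_update_by_query. The field names `full_name`, `first_name`, and `last_name` are invented for illustration, and `fill_missing_field` is a hypothetical helper; the linked documentation remains the authority on the available options.

```python
import json


def fill_missing_field(target, painless_source):
    """Body that updates only the documents where `target` does not exist yet."""
    return {
        "query": {"bool": {"must_not": {"exists": {"field": target}}}},
        "script": {"lang": "painless", "source": painless_source},
    }


# Hypothetical fields: derive full_name from two fields that are already present.
body = fill_missing_field(
    "full_name",
    "ctx._source.full_name = ctx._source.first_name + ' ' + ctx._source.last_name",
)
print(json.dumps(body, indent=2))
```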

Re-importing data

If you re-import the data, the source data should ideally carry a timestamp.
