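In Elasticsearch, every document has a document ID that uniquely identifies it within its index. Because the ID is central to how documents are addressed, it is worth knowing the limit on its length.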

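An Elasticsearch document ID can be at most 512 bytes long. The length is counted in bytes of the UTF-8 encoded ID, not in characters. If you try to create a document whose ID exceeds 512 bytes, Elasticsearch rejects the request with an error. The limit exists to protect index performance and stability.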

The examples below demonstrate the limit.

Elasticsearch version: 7.1.0
All API calls are run in Kibana Dev Tools.

First, create an index named test_idx:

PUT test_idx
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "properties": {
      "nickname": {
        "type": "keyword"
      }
    }
  }
}

Next, index a document whose ID is 678 bytes long:
POST test_idx/_doc/oooooooooojirgrjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjpppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppp999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999
{
  "nickname": "nick"
}

The request fails with Validation Failed: 1: id is too long, must be no longer than 512 bytes but was: 678. The full response is:
{
  "error": {
    "root_cause": [
      {
        "type": "action_request_validation_exception",
        "reason": "Validation Failed: 1: id is too long, must be no longer than 512 bytes but was: 678;"
      }
    ],
    "type": "action_request_validation_exception",
    "reason": "Validation Failed: 1: id is too long, must be no longer than 512 bytes but was: 678;"
  },
  "status": 400
}
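A client that wants to react to this specific failure could pull the two byte counts out of the response. The following is only a sketch: the function name `parse_id_too_long` and the regular expression are assumptions based on the single sample response above, not a documented message format.

```python
import json
import re

# Pattern inferred from the sample response; treat it as an assumption.
_ID_TOO_LONG = re.compile(r"must be no longer than (\d+) bytes but was: (\d+)")


def parse_id_too_long(response_body: str):
    """Return (limit, actual) in bytes for an ID-too-long error, else None."""
    error = json.loads(response_body).get("error")
    if not isinstance(error, dict):
        return None
    if error.get("type") != "action_request_validation_exception":
        return None
    match = _ID_TOO_LONG.search(error.get("reason", ""))
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


sample = """
{
  "error": {
    "type": "action_request_validation_exception",
    "reason": "Validation Failed: 1: id is too long, must be no longer than 512 bytes but was: 678;"
  },
  "status": 400
}
"""
print(parse_id_too_long(sample))  # (512, 678)
```

Matching only on the "must be no longer than ... but was: ..." part keeps the sketch tolerant of small wording changes at the start of the message.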

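If the document ID is written in Chinese, it can contain at most 170 characters. Most Chinese characters take 3 bytes in UTF-8, so the maximum length of an all-Chinese ID is 512 / 3 ≈ 170.7, rounded down to 170 characters.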
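The arithmetic can be checked with a few lines of Python. This is a minimal sketch; the helper names `utf8_len` and `max_chars` are made up for illustration.

```python
LIMIT_BYTES = 512  # the limit quoted in the error message


def utf8_len(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def max_chars(bytes_per_char: int, limit: int = LIMIT_BYTES) -> int:
    """Largest whole number of equally sized characters that fits in the limit."""
    return limit // bytes_per_char


print(utf8_len("a"))          # 1 byte for an ASCII letter
print(utf8_len("舒"))         # 3 bytes for this Chinese character
print(max_chars(3))           # 170, since 170 * 3 = 510 <= 512
print(utf8_len("舒" * 171))   # 513, one byte over the limit
```

Characters that need 4 bytes in UTF-8 lower the ceiling further: `max_chars(4)` gives 128.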

For example, try to index a document whose ID is _舒客泡感牙膏专业系列防蛀美白含氟清新口气清洁口腔旗舰店家庭装舒客泡感牙膏专业系列防蛀美白含氟清新口气清洁口腔旗舰店家庭装舒客泡感牙膏专业系列防蛀美白含氟清新口气清洁口腔旗舰店家庭装舒客泡感牙膏专业系列防蛀美白含氟清新口气清洁口腔旗舰店家庭装舒客泡感牙膏专业系列防蛀美白含氟清新口气清洁口腔旗舰店家庭装舒客泡感牙膏专业系列防蛀美白含氟清新口气清洁口腔旗舰店家庭装_. This ID is 180 characters long, which is 540 bytes. (Note: a Chinese document ID must be URL-encoded before it is used in the API call.)

POST test_idx/_doc/%E8%88%92%E5%AE%A2%E6%B3%A1%E6%84%9F%E7%89%99%E8%86%8F%E4%B8%93%E4%B8%9A%E7%B3%BB%E5%88%97%E9%98%B2%E8%9B%80%E7%BE%8E%E7%99%BD%E5%90%AB%E6%B0%9F%E6%B8%85%E6%96%B0%E5%8F%A3%E6%B0%94%E6%B8%85%E6%B4%81%E5%8F%A3%E8%85%94%E6%97%97%E8%88%B0%E5%BA%97%E5%AE%B6%E5%BA%AD%E8%A3%85%E8%88%92%E5%AE%A2%E6%B3%A1%E6%84%9F%E7%89%99%E8%86%8F%E4%B8%93%E4%B8%9A%E7%B3%BB%E5%88%97%E9%98%B2%E8%9B%80%E7%BE%8E%E7%99%BD%E5%90%AB%E6%B0%9F%E6%B8%85%E6%96%B0%E5%8F%A3%E6%B0%94%E6%B8%85%E6%B4%81%E5%8F%A3%E8%85%94%E6%97%97%E8%88%B0%E5%BA%97%E5%AE%B6%E5%BA%AD%E8%A3%85%E8%88%92%E5%AE%A2%E6%B3%A1%E6%84%9F%E7%89%99%E8%86%8F%E4%B8%93%E4%B8%9A%E7%B3%BB%E5%88%97%E9%98%B2%E8%9B%80%E7%BE%8E%E7%99%BD%E5%90%AB%E6%B0%9F%E6%B8%85%E6%96%B0%E5%8F%A3%E6%B0%94%E6%B8%85%E6%B4%81%E5%8F%A3%E8%85%94%E6%97%97%E8%88%B0%E5%BA%97%E5%AE%B6%E5%BA%AD%E8%A3%85%E8%88%92%E5%AE%A2%E6%B3%A1%E6%84%9F%E7%89%99%E8%86%8F%E4%B8%93%E4%B8%9A%E7%B3%BB%E5%88%97%E9%98%B2%E8%9B%80%E7%BE%8E%E7%99%BD%E5%90%AB%E6%B0%9F%E6%B8%85%E6%96%B0%E5%8F%A3%E6%B0%94%E6%B8%85%E6%B4%81%E5%8F%A3%E8%85%94%E6%97%97%E8%88%B0%E5%BA%97%E5%AE%B6%E5%BA%AD%E8%A3%85%E8%88%92%E5%AE%A2%E6%B3%A1%E6%84%9F%E7%89%99%E8%86%8F%E4%B8%93%E4%B8%9A%E7%B3%BB%E5%88%97%E9%98%B2%E8%9B%80%E7%BE%8E%E7%99%BD%E5%90%AB%E6%B0%9F%E6%B8%85%E6%96%B0%E5%8F%A3%E6%B0%94%E6%B8%85%E6%B4%81%E5%8F%A3%E8%85%94%E6%97%97%E8%88%B0%E5%BA%97%E5%AE%B6%E5%BA%AD%E8%A3%85%E8%88%92%E5%AE%A2%E6%B3%A1%E6%84%9F%E7%89%99%E8%86%8F%E4%B8%93%E4%B8%9A%E7%B3%BB%E5%88%97%E9%98%B2%E8%9B%80%E7%BE%8E%E7%99%BD%E5%90%AB%E6%B0%9F%E6%B8%85%E6%96%B0%E5%8F%A3%E6%B0%94%E6%B8%85%E6%B4%81%E5%8F%A3%E8%85%94%E6%97%97%E8%88%B0%E5%BA%97%E5%AE%B6%E5%BA%AD%E8%A3%85
{
  "nickname": "nick"
}
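The URL-encoded path in this request can be produced with Python's standard library. A minimal sketch, assuming the ID is the 30-character phrase repeated six times (which matches the stated length of 180 characters):

```python
from urllib.parse import quote, unquote

segment = "舒客泡感牙膏专业系列防蛀美白含氟清新口气清洁口腔旗舰店家庭装"
doc_id = segment * 6  # 30 characters repeated 6 times

# safe="" also escapes "/", which would otherwise be read as a path separator.
encoded = quote(doc_id, safe="")
path = "test_idx/_doc/" + encoded

print(len(doc_id))                  # 180 characters
print(len(doc_id.encode("utf-8")))  # 540 bytes
print(len(encoded))                 # 1620 characters after percent-encoding
print(encoded[:27])                 # %E8%88%92%E5%AE%A2%E6%B3%A1
print(unquote(encoded) == doc_id)   # True
```

Judging by the byte count the server reports for this request, the limit is applied to the decoded ID (540 bytes), not to the much longer percent-encoded path.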

The response is:
{
  "error": {
    "root_cause": [
      {
        "type": "action_request_validation_exception",
        "reason": "Validation Failed: 1: id is too long, must be no longer than 512 bytes but was: 540;"
      }
    ],
    "type": "action_request_validation_exception",
    "reason": "Validation Failed: 1: id is too long, must be no longer than 512 bytes but was: 540;"
  },
  "status": 400
}

The same error is returned: id is too long, must be no longer than 512 bytes.

The limit can also be found in the Elasticsearch source code:

/**
 * Maximal allowed length (in bytes) of the document ID.
 */
public static final int MAX_DOCUMENT_ID_LENGTH_IN_BYTES = 512;

if (id != null && id.getBytes(StandardCharsets.UTF_8).length > MAX_DOCUMENT_ID_LENGTH_IN_BYTES) {
    validationException = addValidationError(
        "id ["
            + id
            + "] is too long, must be no longer than "
            + MAX_DOCUMENT_ID_LENGTH_IN_BYTES
            + " bytes but was: "
            + id.getBytes(StandardCharsets.UTF_8).length,
        validationException
    );
}

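The full source is in the IndexRequest class (the link points to the main branch, so the message wording differs slightly from the 7.1.0 responses above):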
https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/action/index/IndexRequest.java
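The same check can be run on the client before a request is sent. The following is a rough Python counterpart of the Java condition above, not Elasticsearch code; the function name `validate_doc_id` is hypothetical.

```python
MAX_DOCUMENT_ID_LENGTH_IN_BYTES = 512  # same value as the Java constant


def validate_doc_id(doc_id):
    """Return an error message if the ID is too long, else None.

    Like the Java condition, a missing ID (None) passes, and the length
    is measured in UTF-8 bytes rather than characters.
    """
    if doc_id is None:
        return None
    size = len(doc_id.encode("utf-8"))
    if size > MAX_DOCUMENT_ID_LENGTH_IN_BYTES:
        return (
            "id is too long, must be no longer than "
            f"{MAX_DOCUMENT_ID_LENGTH_IN_BYTES} bytes but was: {size}"
        )
    return None


print(validate_doc_id("o" * 512))  # None: exactly at the limit is allowed
print(validate_doc_id("o" * 678))  # ... but was: 678
```

Checking locally avoids a round trip for IDs that are certain to be rejected; the server remains the authority on the actual limit.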

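Why limit the length of document IDs? Because the ID is involved in nearly every document-level operation: it is used to look up, update, delete, and sort documents. Longer IDs increase the storage needed for each document, which degrades index performance. They also raise memory and disk usage across the cluster, which lowers the performance of the cluster as a whole.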

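To stay well under the limit, use document IDs that are short and unique. UUIDs and numeric IDs are common choices. If the ID needs to carry meaning, restrict it to a limited character set and use a suitable encoding to keep it short.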
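One way to get short IDs from long natural keys is to hash them. Note that hashing is a substitute technique chosen for this sketch, not something prescribed above, and the function names `hashed_doc_id` and `random_doc_id` are hypothetical.

```python
import hashlib
import uuid


def hashed_doc_id(natural_key: str) -> str:
    """Derive a fixed-length ID from an arbitrarily long key.

    The SHA-256 hex digest is always 64 ASCII characters (64 bytes),
    and the same key always yields the same ID. It is not reversible.
    """
    return hashlib.sha256(natural_key.encode("utf-8")).hexdigest()


def random_doc_id() -> str:
    """A random UUID rendered as 32 hex characters."""
    return uuid.uuid4().hex


long_key = "舒客泡感牙膏专业系列防蛀美白含氟清新口气清洁口腔旗舰店家庭装" * 6
print(len(long_key.encode("utf-8")))   # 540 bytes, too long to use directly
print(len(hashed_doc_id(long_key)))    # 64
print(len(random_doc_id()))            # 32
```

Because the digest cannot be turned back into the original key, the key itself would have to be stored in a field of the document if it is still needed for display or search.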

Knowing the document ID length limit matters for keeping an Elasticsearch index fast and stable. If you develop against Elasticsearch, respect the limit and keep document IDs as short and unique as you can.
