Bulk import of data into Elasticsearch with Python: an example
- 2020-10-23 20:10:57
- OfStack
ES was introduced in earlier posts and exposes many interfaces. This article shows how to use Python for bulk imports. The ES website has extensive documentation, so with careful reading and some searching it should not be hard to use.
Here is the code first:
#coding=utf-8
from elasticsearch import Elasticsearch
from elasticsearch import helpers

es = Elasticsearch()  # no arguments: the client's default host is used
actions = []
i = 1  # document ID, incremented once per input line
with open('index.txt') as f:
    for line in f:
        # each line holds 12 space-separated fields
        line = line.strip().split(' ')
        action = {
            "_index": "image",
            "_type": "imagetable",
            "_id": i,
            "_source": {
                u"picture name": line[0].decode('utf8'),
                u"source": line[1].decode('utf8'),
                u"authority": line[2].decode('utf8'),
                u"size": line[3].decode('utf8'),
                u"quality": line[4].decode('utf8'),
                u"category": line[5].decode('utf8'),
                u"model": line[6].decode('utf8'),
                u"country": line[7].decode('utf8'),
                u"collector": line[8].decode('utf8'),
                u"department": line[9].decode('utf8'),
                u"keywords": line[10].decode('utf8'),
                u"access permissions": line[11].decode('utf8')
            }
        }
        i += 1
        actions.append(action)
        # send a batch every 500 actions, then empty the list
        if len(actions) == 500:
            helpers.bulk(es, actions)
            del actions[:]
# send whatever is left after the loop
if len(actions) > 0:
    helpers.bulk(es, actions)
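First of all, index.txt is encoded in UTF-8, so each field read from it has to be converted to a unicode object with decode('utf8'). The field names, such as "picture name", also need the u prefix; otherwise ES reports an error. Note that this applies to Python 2, where strings read from a file are byte strings.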
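For readers on Python 3, the decoding step described above is not needed in the same form, because a file opened in text mode with an explicit encoding already yields Unicode strings. The following is only a sketch under assumptions: `FIELD_NAMES`, `build_actions` and `load_actions` are names made up for illustration, the field labels are tidied English stand-ins rather than the real index fields, and `_type` is left out because newer Elasticsearch versions no longer accept mapping types.

```python
# Python 3 sketch. FIELD_NAMES are illustrative labels (an assumption);
# substitute the real field names of your index.
FIELD_NAMES = [
    "picture name", "source", "authority", "size", "quality", "category",
    "model", "country", "collector", "department", "keywords",
    "access permissions",
]


def build_actions(lines, index="image"):
    """Yield one bulk action per space-separated input line."""
    for doc_id, line in enumerate(lines, start=1):
        values = line.strip().split(" ")
        yield {
            "_index": index,
            "_id": doc_id,
            # zip pairs each field name with the value in the same column;
            # it stops at the shorter sequence, so short lines lose fields
            "_source": dict(zip(FIELD_NAMES, values)),
        }


def load_actions(path="index.txt"):
    """Read a UTF-8 file; text mode already returns Unicode str objects."""
    with open(path, encoding="utf-8") as f:
        yield from build_actions(f)
```

Because `helpers.bulk` accepts any iterable of actions, a generator like this could be passed to it directly, for example `helpers.bulk(es, load_actions())`, instead of building the list by hand.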
The import is quite fast, at more than 2,000 records per second.
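To check a throughput figure like this on your own data, the batching and timing can be separated from the Elasticsearch call. The sketch below assumes Python 3; `chunks` and `import_in_batches` are hypothetical helpers, and `send` stands for whatever callable delivers one batch, so the example runs without a cluster.

```python
import time


def chunks(items, size=500):
    """Group any iterable into lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # leftover items that did not fill a whole batch
        yield batch


def import_in_batches(actions, send, size=500):
    """Call send(batch) for each batch; return (total sent, seconds taken)."""
    start = time.perf_counter()
    total = 0
    for batch in chunks(actions, size):
        send(batch)
        total += len(batch)
    return total, time.perf_counter() - start
```

With a real client, `send` could be `lambda batch: helpers.bulk(es, batch)`, and dividing the returned total by the elapsed seconds gives records per second. `helpers.bulk` also takes a `chunk_size` argument that does the grouping internally, so the manual batching is mainly useful when you want to time or log each batch yourself.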