Bulk importing data into Elasticsearch with Python

2020-10-23 20:10:57
OfStack

Earlier posts introduced Elasticsearch (ES), which exposes many interfaces. This article shows how to do a bulk import with Python. The ES website has extensive documentation, so with some careful reading and the help of a search engine it should not be hard to use.

The code comes first:


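#coding=utf-8
from elasticsearch import Elasticsearch
from elasticsearch import helpers
# With no arguments the client connects to the default local node
es = Elasticsearch()
# Actions waiting to be sent in the next bulk request
actions = []
f = open('index.txt')
# Document ID counter
i = 1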
for line in f:
 line = line.strip().split(' ')
 action={
 "_index":"image",
 "_type":"imagetable",
 "_id":i,
 "_source":{
  u"picture name":line[0].decode('utf8'),
  u"source":line[1].decode('utf8'),
  u"authority":line[2].decode('utf8'),
  u"size":line[3].decode('utf8'),
  u"quality":line[4].decode('utf8'),
  u"category":line[5].decode('utf8'),
  u"model":line[6].decode('utf8'),
  u"country":line[7].decode('utf8'),
  u"collector":line[8].decode('utf8'),
  u"department":line[9].decode('utf8'),
  u"keywords":line[10].decode('utf8'),
  u"access permissions":line[11].decode('utf8')
  }
 }
 i+=1
 actions.append(action)
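 # Send a bulk request every 500 actions, then empty the list
 if len(actions) == 500:
  helpers.bulk(es, actions)
  del actions[:]
# Send whatever is left after the loop ends
if len(actions) > 0:
 helpers.bulk(es, actions)
f.close()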
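
The batching pattern in the script (collect 500 actions, send them, clear the list, then send the remainder) can be isolated into a small generator. The following is a minimal Python 3 sketch rather than the author's code: `chunked` is a hypothetical helper name, and plain integers stand in for the action dictionaries. Note that `helpers.bulk` also accepts a `chunk_size` argument (500 by default) and splits the actions itself, so manual batching mainly helps keep the in-memory list small.

```python
from itertools import islice


def chunked(iterable, size=500):
    """Yield lists of at most `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Plain integers stand in for bulk actions here.
batch_sizes = [len(batch) for batch in chunked(range(1201), 500)]
print(batch_sizes)  # [500, 500, 201]
```

Each batch could then be passed to `helpers.bulk(es, batch)` in place of the manual append-and-clear logic.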

Two points about the code, which targets Python 2: index.txt is UTF-8 encoded, so each value is converted to a unicode object with decode('utf8'), and field names such as "picture name" need the u prefix when they contain non-ASCII characters. Otherwise ES reports an error.
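
On Python 3 the explicit decode step is not needed, because `open()` with `encoding="utf-8"` already yields unicode `str` lines and ordinary string literals are unicode. The sketch below shows one way the same actions might be built under that assumption. `FIELDS`, `build_actions`, and `bulk_import` are illustrative names, the field names are assumed English renderings of the article's fields, the sample line is made up, and `_type` is left out because Elasticsearch 8 removed mapping types.

```python
FIELDS = (
    "picture name", "source", "authority", "size", "quality", "category",
    "model", "country", "collector", "department", "keywords",
    "access permissions",
)


def build_actions(lines, index_name="image"):
    """Turn space-separated text lines into bulk action dicts."""
    for doc_id, line in enumerate(lines, start=1):
        values = line.strip().split(" ")
        yield {
            "_index": index_name,
            "_id": doc_id,
            "_source": dict(zip(FIELDS, values)),
        }


def bulk_import(es, path="index.txt"):
    """Stream a UTF-8 file into ES; `es` is an Elasticsearch client."""
    # Imported lazily so the parsing code works without the package installed.
    from elasticsearch import helpers

    with open(path, encoding="utf-8") as f:
        # helpers.bulk consumes the generator in chunks of `chunk_size`.
        return helpers.bulk(es, build_actions(f), chunk_size=500)


# Made-up sample line with twelve space-separated values.
sample = ["a.jpg web high 2MB good nature X100 China Li media sky,blue public"]
first = next(build_actions(sample))
print(first["_id"], first["_source"]["picture name"])  # 1 a.jpg
```

Because `build_actions` is a generator, the file is never held in memory as a whole; only the current chunk is.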

The import is fast: more than 2,000 records per second.
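
A throughput figure like this can be checked by timing the import and dividing the record count by the elapsed seconds. A minimal Python 3 sketch, assuming a hypothetical `measure_rate` helper and a stand-in workload in place of the real bulk call:

```python
import time


def measure_rate(import_fn, record_count):
    """Run import_fn() and return the rate in records per second."""
    start = time.perf_counter()
    import_fn()
    elapsed = time.perf_counter() - start
    return record_count / elapsed


def fake_import():
    # Stand-in for the real bulk import; sleeps instead of calling ES.
    time.sleep(0.05)


rate = measure_rate(fake_import, 1000)
print("%.0f records/s" % rate)
```

In a real run, `import_fn` would wrap the bulk import and `record_count` would be the number of lines in index.txt.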