Script sharing of Python batch deletion of tens of millions of large amounts of data in mysql

  • 2021-08-12 03:08:49
  • OfStack

Scene description

There is a table in the online mysql database with daily statistical results, with more than 10 million pieces every day, which is unexpected to us. Why are there so many statistical results? Operation and maintenance found it, and the disk accounted for 200G. Finally, I asked the operation, and only the last 3 days can be kept, and the previous data can only be deleted. Delete, how to delete?
Because this is an online database, there are many other data tables stored in it. If the data of this table is deleted directly, it will definitely not work, and it may affect other tables. Try to delete only 1 day of data at a time, or Catton's powerful, no way, write an Python script to delete it in batches.
The specific ideas are:

Delete only 1 day of data at a time; Delete data for 1 day, deleting 50,000 pieces at a time; After deleting the data for 1 day, start deleting the data for the next day;

Python code


# -*-coding:utf-8 -*-

import sys

#  This is our internal package Python Module
sys.path.append('/var/lib/hadoop-hdfs/scripts/python_module2')
import keguang.commons as commons
import keguang.timedef as timedef
import keguang.sql.mysqlclient as mysql

def run(starttime, endtime, regx):
 tb_name = 'statistic_ad_image_final_count'
 days = timedef.getDays(starttime,endtime,regx)
 #  Traversing to delete data for all days 
 for day in days:
  print '%s  Data deletion start '%(day)
  mclient = getConn()
  sql = '''
  select 1 from %s where date = '%s' limit 1
  '''%(tb_name, day)
  print sql
  result = mclient.query(sql)
  #  If you find this 1 Days of data, continue to delete 
  while result is not ():
   sql = 'delete from %s where date = "%s" limit 50000'%(tb_name, day)
   print sql
   mclient.execute(sql)
   sql = '''
   select 1 from %s where date = '%s' limit 1
   '''%(tb_name, day)
   print sql
   result = mclient.query(sql)
  print '%s  Data Deletion Complete '%(day)
  mclient.close()

#  Return mysql  Connect 
def getConn():
 return mysql.MysqlClient(host = '0.0.0.0', user = 'test', passwd = 'test', db= 'statistic')

if __name__ == '__main__':
 regx = '%Y-%m-%d'
 yesday = timedef.getYes(regx, -1)
 starttime = '2019-08-17'
 endtime ='2019-08-30'
 run(starttime, endtime, regx)

The above is Python batch deletion of mysql tens of millions of massive data script details, more information about python deletion of MySQL data please pay attention to other related articles on this site!


Related articles: