Script sharing of Python batch deletion of tens of millions of large amounts of data in mysql
- 2021-08-12 03:08:49
- OfStack
Scene description
There is a table in the online mysql database with daily statistical results, with more than 10 million pieces every day, which is unexpected to us. Why are there so many statistical results? Operation and maintenance found it, and the disk accounted for 200G. Finally, I asked the operation, and only the last 3 days can be kept, and the previous data can only be deleted. Delete, how to delete?
Because this is an online database, there are many other data tables stored in it. If the data of this table is deleted directly, it will definitely not work, and it may affect other tables. Try to delete only 1 day of data at a time, or Catton's powerful, no way, write an Python script to delete it in batches.
The specific ideas are:
Python code
# -*-coding:utf-8 -*-
import sys
# This is our internal package Python Module
sys.path.append('/var/lib/hadoop-hdfs/scripts/python_module2')
import keguang.commons as commons
import keguang.timedef as timedef
import keguang.sql.mysqlclient as mysql
def run(starttime, endtime, regx):
tb_name = 'statistic_ad_image_final_count'
days = timedef.getDays(starttime,endtime,regx)
# Traversing to delete data for all days
for day in days:
print '%s Data deletion start '%(day)
mclient = getConn()
sql = '''
select 1 from %s where date = '%s' limit 1
'''%(tb_name, day)
print sql
result = mclient.query(sql)
# If you find this 1 Days of data, continue to delete
while result is not ():
sql = 'delete from %s where date = "%s" limit 50000'%(tb_name, day)
print sql
mclient.execute(sql)
sql = '''
select 1 from %s where date = '%s' limit 1
'''%(tb_name, day)
print sql
result = mclient.query(sql)
print '%s Data Deletion Complete '%(day)
mclient.close()
# Return mysql Connect
def getConn():
return mysql.MysqlClient(host = '0.0.0.0', user = 'test', passwd = 'test', db= 'statistic')
if __name__ == '__main__':
regx = '%Y-%m-%d'
yesday = timedef.getYes(regx, -1)
starttime = '2019-08-17'
endtime ='2019-08-30'
run(starttime, endtime, regx)
The above is Python batch deletion of mysql tens of millions of massive data script details, more information about python deletion of MySQL data please pay attention to other related articles on this site!