Solve a pit where pandas executes fuzzy query sql
- 2021-09-16 07:25:55
- OfStack
The query engine uses presto, and fuzzy query is used in sql.
engine = create_engine(presto_url,encoding='utf-8')
sql_exe ="""select id,title,tags from source.base.table where tags like '% Hehe %' """
df = pd.read_sql_query(sql_exe,engine)
1 Direct error reporting:
unsupported format character
Solutions
Number 1:
sql_exe ="""select id,title,tags from source.base.table where tags like '%% Hehe %%' """
Number 2:
sql_exe ="""select id,title,tags from source.base.table where tags like %s """
df = pd.read_sql_query(sql_exe,engine,params=("% Hehe %",))
Add: pd.read_sql () It is enough to know this
As follows:
pandas.read_sql(sql, con, index_col=None, coerce_float=True, params=None, parse_dates=None, columns=None, chunksize=None)
Meaning of each parameter
sql: SQL Command String
con: engine that connects to the sql database, 1 can be established with packages such as SQLalchemy or pymysql
index_col: Select a 1 column as index
coerce_float: Very useful for reading numeric strings directly into float
parse_dates: Converts a 1-column date string to datetime data, similar to the pd.to_datetime function. You can either directly provide the column name to be converted to the default date format, or you can provide the column name and converted date format in dictionary format, such as {column_name: ES60string} (format string: "% Y:% m:% H:% M:% S").
columns: The column to select. 1 is useless, because in the sql command, 1 specifies the column to be selected
chunksize: If 1 integer value is supplied, then 1 generator is returned, and the number of rows per output is the size of the supplied value.
Set the parameter can > Two Ways to Create Database Links
Constructing database link with sqlalchemy
import pandas as pd
import sqlalchemy
from sqlalchemy import create_engine
# Use sqlalchemy Build database links engine
connect_info = 'mysql+pymysql://{}:{}@{}:{}/{}?charset=utf8'
engine = create_engine(connect_info)
# sql Command
sql_cmd = "SELECT * FROM table"
df = pd.read_sql(sql=sql_cmd, con=engine)
Constructing database link with DBAPI
import pandas as pd
import pymysql
# sql Command
sql_cmd = "SELECT * FROM table"
# Use DBAPI Build database links engine
con = pymysql.connect(host=localhost, user=username, password=password, database=dbname, charset='utf8', use_unicode=True)
df = pd.read_sql(sql_cmd, con)
read_sql and read_sql_table, read_sql_query
read_sql is essentially a unified mode of read_sql_table and read_sql_query.
All 3 returned DataFrame from return.
1.
read_sql_table
Read SQL database table into a DataFrame.
2.
read_sql_query
Read SQL query into a DataFrame.