Solve a pit where pandas executes fuzzy query sql

  • 2021-09-16 07:25:55
  • OfStack

The query engine uses presto, and fuzzy query is used in sql.


engine = create_engine(presto_url,encoding='utf-8')
sql_exe ="""select id,title,tags from source.base.table where tags like '% Hehe %' """
df = pd.read_sql_query(sql_exe,engine)

1 Direct error reporting:

unsupported format character

Solutions

Number 1:


sql_exe ="""select id,title,tags from source.base.table where tags like '%% Hehe %%' """

Number 2:


sql_exe ="""select id,title,tags from source.base.table where tags like %s """
 df = pd.read_sql_query(sql_exe,engine,params=("% Hehe %",))

Add: pd.read_sql () It is enough to know this

As follows:


pandas.read_sql(sql, con, index_col=None, coerce_float=True, params=None, parse_dates=None, columns=None, chunksize=None)

Meaning of each parameter

sql: SQL Command String

con: engine that connects to the sql database, 1 can be established with packages such as SQLalchemy or pymysql

index_col: Select a 1 column as index

coerce_float: Very useful for reading numeric strings directly into float

parse_dates: Converts a 1-column date string to datetime data, similar to the pd.to_datetime function. You can either directly provide the column name to be converted to the default date format, or you can provide the column name and converted date format in dictionary format, such as {column_name: ES60string} (format string: "% Y:% m:% H:% M:% S").

columns: The column to select. 1 is useless, because in the sql command, 1 specifies the column to be selected

chunksize: If 1 integer value is supplied, then 1 generator is returned, and the number of rows per output is the size of the supplied value.

Set the parameter can > Two Ways to Create Database Links

Constructing database link with sqlalchemy


import pandas as pd
import sqlalchemy
from sqlalchemy import create_engine
#  Use sqlalchemy Build database links engine
connect_info = 'mysql+pymysql://{}:{}@{}:{}/{}?charset=utf8'
engine = create_engine(connect_info)
# sql  Command 
sql_cmd = "SELECT * FROM table"
df = pd.read_sql(sql=sql_cmd, con=engine)

Constructing database link with DBAPI


import pandas as pd
import pymysql
# sql  Command 
sql_cmd = "SELECT * FROM table"
#  Use DBAPI Build database links engine
con = pymysql.connect(host=localhost, user=username, password=password, database=dbname, charset='utf8', use_unicode=True)
df = pd.read_sql(sql_cmd, con)

read_sql and read_sql_table, read_sql_query

read_sql is essentially a unified mode of read_sql_table and read_sql_query.

All 3 returned DataFrame from return.

1. read_sql_table

Read SQL database table into a DataFrame.

2. read_sql_query

Read SQL query into a DataFrame.


Related articles: