Detailed Explanation of Time Series Processing in Basic Analysis with the Python pandas Library
- 2021-07-18 08:09:55
- OfStack
Preface
When analyzing data with Python, we often need to process and convert dates and times, especially when mining time-related data. Quantitative trading, for example, looks for patterns in stock price movements in historical data. Python handles time through the datetime module, and the NumPy library provides corresponding methods as well. As a data analysis library for Python, pandas offers powerful date handling and is a sharp tool for working with time series.
1. Generate a date sequence
pandas mainly provides pd.date_range() and pd.period_range(). Their parameters include the start time, the end time, the number of periods to generate, and the time frequency (freq='M' for month, 'D' for day, 'W' for week, 'Y' for year, and so on).
The main difference between the two is that pd.date_range() generates a date sequence as a DatetimeIndex, while pd.period_range() generates one as a PeriodIndex.
The following compares the two by generating monthly, weekly, and hourly time series:
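To make the start and end parameters concrete, here is a minimal sketch that bounds each range by both endpoints instead of a period count. The dates are arbitrary illustrative values, and the variable names are only for this example.

```python
import pandas as pd

# Daily timestamps bounded by a start and an end (both endpoints are included)
days = pd.date_range(start='2019-01-01', end='2019-01-10', freq='D')
print(len(days))  # 10

# Monthly periods bounded the same way
months = pd.period_range(start='2019-01', end='2019-06', freq='M')
print(len(months))  # 6

# The element types differ: points in time versus spans of time
print(type(days[0]).__name__)    # Timestamp
print(type(months[0]).__name__)  # Period
```

A DatetimeIndex holds individual instants, while each element of a PeriodIndex stands for a whole span such as a month.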
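import pandas as pd
# Note: pandas 2.2+ deprecates the 'M' alias in date_range in favor of 'ME' (month end)
date_rng = pd.date_range('2019-01-01', freq='M', periods=12)
print(f'month date_range() :\n{date_rng}')
"""
month date_range() :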
DatetimeIndex(['2019-01-31', '2019-02-28', '2019-03-31', '2019-04-30',
'2019-05-31', '2019-06-30', '2019-07-31', '2019-08-31',
'2019-09-30', '2019-10-31', '2019-11-30', '2019-12-31'],
dtype='datetime64[ns]', freq='M')
"""
period_rng = pd.period_range('2019/01/01', freq='M', periods=12)
print(f'month period_range() :\n{period_rng}')
"""
month period_range() :
PeriodIndex(['2019-01', '2019-02', '2019-03', '2019-04', '2019-05', '2019-06',
'2019-07', '2019-08', '2019-09', '2019-10', '2019-11', '2019-12'],
dtype='period[M]', freq='M')
"""
date_rng = pd.date_range('2019-01-01', freq='W-SUN', periods=12)
print(f'week date_range() :\n{date_rng}')
"""
week date_range() :
DatetimeIndex(['2019-01-06', '2019-01-13', '2019-01-20', '2019-01-27',
'2019-02-03', '2019-02-10', '2019-02-17', '2019-02-24',
'2019-03-03', '2019-03-10', '2019-03-17', '2019-03-24'],
dtype='datetime64[ns]', freq='W-SUN')
"""
period_rng = pd.period_range('2019-01-01', freq='W-SUN', periods=12)
print(f'week period_range() :\n{period_rng}')
"""
week period_range() :
PeriodIndex(['2018-12-31/2019-01-06', '2019-01-07/2019-01-13',
'2019-01-14/2019-01-20', '2019-01-21/2019-01-27',
'2019-01-28/2019-02-03', '2019-02-04/2019-02-10',
'2019-02-11/2019-02-17', '2019-02-18/2019-02-24',
'2019-02-25/2019-03-03', '2019-03-04/2019-03-10',
'2019-03-11/2019-03-17', '2019-03-18/2019-03-24'],
dtype='period[W-SUN]', freq='W-SUN')
"""
# Note: pandas 2.2+ deprecates the 'H' alias in favor of lowercase 'h'
date_rng = pd.date_range('2019-01-01 00:00:00', freq='H', periods=12)
print(f'hour date_range() :\n{date_rng}')
"""
hour date_range() :
DatetimeIndex(['2019-01-01 00:00:00', '2019-01-01 01:00:00',
'2019-01-01 02:00:00', '2019-01-01 03:00:00',
'2019-01-01 04:00:00', '2019-01-01 05:00:00',
'2019-01-01 06:00:00', '2019-01-01 07:00:00',
'2019-01-01 08:00:00', '2019-01-01 09:00:00',
'2019-01-01 10:00:00', '2019-01-01 11:00:00'],
dtype='datetime64[ns]', freq='H')
"""
period_rng = pd.period_range('2019-01-01 00:00:00', freq='H', periods=12)
print(f'hour period_range() :\n{period_rng}')
"""
hour period_range() :
PeriodIndex(['2019-01-01 00:00', '2019-01-01 01:00', '2019-01-01 02:00',
'2019-01-01 03:00', '2019-01-01 04:00', '2019-01-01 05:00',
'2019-01-01 06:00', '2019-01-01 07:00', '2019-01-01 08:00',
'2019-01-01 09:00', '2019-01-01 10:00', '2019-01-01 11:00'],
dtype='period[H]', freq='H')
"""
2. Generate a Timestamp object and convert it
A Timestamp object can be created with either pd.Timestamp() or pd.to_datetime(), as shown below:
from datetime import datetime as dt  # dt builds standard-library datetimes below
# From year, month, day integers
ts=pd.Timestamp(2019,1,1)
print(f'pd.Timestamp()-1 : {ts}')
#pd.Timestamp()-1 : 2019-01-01 00:00:00
# From a datetime object
ts=pd.Timestamp(dt(2019,1,1,hour=0,minute=1,second=1))
print(f'pd.Timestamp()-2 : {ts}')
#pd.Timestamp()-2 : 2019-01-01 00:01:01
# From a string
ts=pd.Timestamp("2019-1-1 0:1:1")
print(f'pd.Timestamp()-3 : {ts}')
#pd.Timestamp()-3 : 2019-01-01 00:01:01
print(f'pd.Timestamp()-type : {type(ts)}')
#pd.Timestamp()-type : <class 'pandas._libs.tslibs.timestamps.Timestamp'>
# pd.to_datetime(2019,1,1) is not supported: unlike pd.Timestamp, it does not accept year, month, day as separate positional arguments
# The results are named ts2 so that the dt alias imported above is not shadowed
ts2=pd.to_datetime(dt(2019,1,1,hour=0,minute=1,second=1))
print(f'pd.to_datetime()-1 : {ts2}')
#pd.to_datetime()-1 : 2019-01-01 00:01:01
ts2=pd.to_datetime("2019-1-1 0:1:1")
print(f'pd.to_datetime()-2 : {ts2}')
#pd.to_datetime()-2 : 2019-01-01 00:01:01
print(f'pd.to_datetime()-type : {type(ts2)}')
#pd.to_datetime()-type : <class 'pandas._libs.tslibs.timestamps.Timestamp'>
# pd.to_datetime can also build a custom time series from a list of strings
dtlist=pd.to_datetime(["2019-1-1 0:1:1", "2019-3-1 0:1:1"])
print(f'pd.to_datetime()-list : {dtlist}')
#pd.to_datetime()-list : DatetimeIndex(['2019-01-01 00:01:01', '2019-03-01 00:01:01'], dtype='datetime64[ns]', freq=None)
# Convert the timestamp to a monthly Period
pr = ts.to_period('M')
print(f'ts.to_period() : {pr}')
#ts.to_period() : 2019-01
print(f'ts.to_period()-type : {type(pr)}')
#ts.to_period()-type : <class 'pandas._libs.tslibs.period.Period'>
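When the input strings follow a known layout, pd.to_datetime() also accepts an explicit format, and errors='coerce' turns unparseable entries into NaT instead of raising. The sketch below assumes a day/month/year layout; the sample strings are made up for illustration.

```python
import pandas as pd

# Hypothetical raw strings in day/month/year order, plus one invalid entry
raw = ['01/02/2019', '15/02/2019', 'not a date']

# format= states the layout explicitly; errors='coerce' maps failures to NaT
parsed = pd.to_datetime(raw, format='%d/%m/%Y', errors='coerce')
print(parsed)

# Convert the whole index to monthly periods in one call
months = parsed.to_period('M')
print(months[0])  # 2019-02
```

Stating the format avoids the ambiguity between day-first and month-first strings such as 01/02/2019.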
3. Generate a Period object and convert it
# Define a Period
per=pd.Period('2019')
print(f'pd.Period() : {per}')
#pd.Period() : 2019
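per_del=pd.Period('2019')-pd.Period('2018')
# Older pandas returned the integer 1 here; since pandas 0.24 the result is a DateOffset whose count is stored in .n
years=getattr(per_del,'n',per_del)
print(f'2019 and 2018 are {years} year apart')# An integer can also be added or subtracted directly (in units of years)
#2019 and 2018 are 1 year apart
# Convert period to timestamp
print(per.to_timestamp(how='end'))#2019-12-31 23:59:59.999999999 in pandas 0.24+ (older versions printed 2019-12-31 00:00:00)
print(per.to_timestamp(how='start'))#2019-01-01 00:00:00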
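The comment above mentions that an integer can be added to or subtracted from a Period directly. A minimal sketch of that arithmetic follows, together with the start_time and end_time attributes, which give the boundaries of a period without calling to_timestamp(). The variable names are illustrative only.

```python
import pandas as pd

per = pd.Period('2019')               # annual period
print(per + 1)                        # 2020
print(per - 2)                        # 2017

month = pd.Period('2019-01', freq='M')
print(month + 3)                      # 2019-04 (the integer counts months here)

# Boundaries of the period as Timestamps
print(per.start_time)                 # 2019-01-01 00:00:00
print(per.end_time.date())            # 2019-12-31
```

The unit of the integer always follows the frequency of the Period: years for an annual period, months for a monthly one.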
4. Generate a Timedelta time interval
# Create a Timedelta interval
print(pd.Timedelta(days=5, minutes=50, seconds=20, milliseconds=10, microseconds=10, nanoseconds=10))
#5 days 00:50:20.010010010
# Get the current time (pd.datetime was removed in pandas 2.0; pd.Timestamp.now() replaces it)
now=pd.Timestamp.now()
# Compute the date 50 days from now
later=now+pd.Timedelta(days=50)
print(f'The current time is {now}, and the time 50 days later is {later}')
# The current time is 2019-06-08 17:59:31.726065, and the time 50 days later is 2019-07-28 17:59:31.726065
# Show only the year, month, and day
print(later.strftime('%Y-%m-%d'))#2019-07-28
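A Timedelta also arises naturally as the difference between two timestamps. The sketch below uses fixed timestamps (chosen to mirror the sample output above) so that the result is reproducible; the names are illustrative.

```python
import pandas as pd

start = pd.Timestamp('2019-06-08 17:59:31')
end = pd.Timestamp('2019-07-28 17:59:31')

# Subtracting two Timestamps yields a Timedelta
delta = end - start
print(delta)                  # 50 days 00:00:00
print(delta.days)             # 50
print(delta.total_seconds())  # 4320000.0

# A Timedelta can also be built from a string
half_day = pd.Timedelta('12 hours')
print(start + half_day)       # 2019-06-09 05:59:31
```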
5. Resampling and frequency conversion
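Resampling regroups a time-indexed series into coarser buckets and aggregates each bucket. As a minimal sketch, assuming a daily series of the integers 0 to 19 starting on Monday 2018-01-01, the following sums the values by week and averages them by month. The 'W' and 'MS' aliases (weeks ending on Sunday, and month start) are chosen only for this illustration.

```python
import pandas as pd

# 20 daily observations starting on a Monday
date = pd.date_range('2018-01-01', periods=20, freq='D')
series = pd.Series(range(20), index=date)

# Weekly buckets, each labeled by the Sunday that closes it
weekly = series.resample('W').sum()
print(weekly.tolist())           # [21, 70, 99]

# Monthly buckets labeled by the first day of the month
monthly = series.resample('MS').mean()
print(monthly.iloc[0])           # 9.5
```

The last weekly bucket covers only six days, because the data stops on Saturday 2018-01-20.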
# asfreq: display the index values by quarter
# A DatetimeIndex has no asfreq method, so convert the series to a PeriodIndex first
date=pd.date_range('1/1/2018', periods=20, freq='D')
tsdat_series=pd.Series(range(20),index=date)
tsp_series=tsdat_series.to_period('D')
print(tsp_series.index.asfreq('Q'))
date=pd.period_range('1/1/2018', periods=20, freq='D')
tsper_series=pd.Series(range(20),index=date)
print(tsper_series.index.asfreq('Q'))
"""
PeriodIndex(['2018Q1', '2018Q1', '2018Q1', '2018Q1', '2018Q1', '2018Q1',
'2018Q1', '2018Q1', '2018Q1', '2018Q1', '2018Q1', '2018Q1',
'2018Q1', '2018Q1', '2018Q1', '2018Q1', '2018Q1', '2018Q1',
'2018Q1', '2018Q1'],
dtype='period[Q-DEC]', freq='Q-DEC')
"""
# resample: aggregate and display by quarter (pandas 2.2+ prefers 'QE' over 'Q' in resample)
print(tsdat_series.resample('Q').sum().to_period('Q'))
"""
2018Q1 190
Freq: Q-DEC, dtype: int64
"""
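# groupby: group by day of the week (0 = Monday) and average
# The index attribute is used directly because Timestamp.weekday is a method, not a property
print(tsdat_series.groupby(tsdat_series.index.weekday).mean())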
"""
0 7.0
1 8.0
2 9.0
3 10.0
4 11.0
5 12.0
6 9.5
dtype: float64
"""