Detailed Explanation of Time Series Processing in Basic Analysis with the Python Pandas Library

  • 2021-07-18 08:09:55
  • OfStack

Preface

When analyzing data in Python, we often need to parse and convert dates and times, especially when mining time-dependent data. Quantitative trading, for example, looks for patterns in how stock prices change across historical data. Python handles time through modules such as datetime, and the NumPy library provides corresponding methods as well. Pandas, as the data analysis library of the Python ecosystem, offers extensive date handling and is a powerful tool for processing time series.

1. Generate a date sequence

Pandas mainly provides pd.date_range() and pd.period_range() for this. Their parameters include the start time, the end time, the number of periods to generate, and the frequency (freq='M' for month, 'D' for day, 'W' for week, 'Y' for year, and so on).

The main difference between the two is that pd.date_range() returns a DatetimeIndex, a sequence of points in time, while pd.period_range() returns a PeriodIndex, a sequence of time spans.

The following compares the two by generating monthly, weekly, and hourly sequences:


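import pandas as pd

# 'M' means month end; recent pandas releases may warn that 'ME' is preferred for date_range
date_rng = pd.date_range('2019-01-01', freq='M', periods=12)
print(f'month date_range() : \n{date_rng}')
"""
month date_range() : 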
DatetimeIndex(['2019-01-31', '2019-02-28', '2019-03-31', '2019-04-30',
 '2019-05-31', '2019-06-30', '2019-07-31', '2019-08-31',
 '2019-09-30', '2019-10-31', '2019-11-30', '2019-12-31'],
 dtype='datetime64[ns]', freq='M')
"""
period_rng = pd.period_range('2019/01/01', freq='M', periods=12)
print(f'month period_range() : \n{period_rng}')
"""
month period_range() : 
PeriodIndex(['2019-01', '2019-02', '2019-03', '2019-04', '2019-05', '2019-06',
 '2019-07', '2019-08', '2019-09', '2019-10', '2019-11', '2019-12'],
 dtype='period[M]', freq='M')
"""
# 'W-SUN' means weekly, with each week ending on Sunday
date_rng = pd.date_range('2019-01-01', freq='W-SUN', periods=12)
print(f'week date_range() : \n{date_rng}')
"""
week date_range() : 
DatetimeIndex(['2019-01-06', '2019-01-13', '2019-01-20', '2019-01-27',
 '2019-02-03', '2019-02-10', '2019-02-17', '2019-02-24',
 '2019-03-03', '2019-03-10', '2019-03-17', '2019-03-24'],
 dtype='datetime64[ns]', freq='W-SUN')
"""
period_rng = pd.period_range('2019-01-01', freq='W-SUN', periods=12)
print(f'week period_range() : \n{period_rng}')
"""
week period_range() : 
PeriodIndex(['2018-12-31/2019-01-06', '2019-01-07/2019-01-13',
 '2019-01-14/2019-01-20', '2019-01-21/2019-01-27',
 '2019-01-28/2019-02-03', '2019-02-04/2019-02-10',
 '2019-02-11/2019-02-17', '2019-02-18/2019-02-24',
 '2019-02-25/2019-03-03', '2019-03-04/2019-03-10',
 '2019-03-11/2019-03-17', '2019-03-18/2019-03-24'],
 dtype='period[W-SUN]', freq='W-SUN')
"""
# 'H' means hourly; recent pandas releases may warn that lowercase 'h' is preferred
date_rng = pd.date_range('2019-01-01 00:00:00', freq='H', periods=12)
print(f'hour date_range() : \n{date_rng}')
"""
hour date_range() : 
DatetimeIndex(['2019-01-01 00:00:00', '2019-01-01 01:00:00',
 '2019-01-01 02:00:00', '2019-01-01 03:00:00',
 '2019-01-01 04:00:00', '2019-01-01 05:00:00',
 '2019-01-01 06:00:00', '2019-01-01 07:00:00',
 '2019-01-01 08:00:00', '2019-01-01 09:00:00',
 '2019-01-01 10:00:00', '2019-01-01 11:00:00'],
 dtype='datetime64[ns]', freq='H')
"""
period_rng = pd.period_range('2019-01-01 00:00:00', freq='H', periods=12)
print(f'hour period_range() : \n{period_rng}')
"""
hour period_range() : 
PeriodIndex(['2019-01-01 00:00', '2019-01-01 01:00', '2019-01-01 02:00',
 '2019-01-01 03:00', '2019-01-01 04:00', '2019-01-01 05:00',
 '2019-01-01 06:00', '2019-01-01 07:00', '2019-01-01 08:00',
 '2019-01-01 09:00', '2019-01-01 10:00', '2019-01-01 11:00'],
 dtype='period[H]', freq='H')
"""

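The examples above always pass the number of periods. Since the end time is also listed as a parameter, here is a minimal sketch of the other common call, assuming a start and an end are given and pandas is left to work out the length. The variable names days and day_periods are only illustrative.

```python
import pandas as pd

# Both endpoints are given, so pandas derives the number of periods itself.
days = pd.date_range(start='2019-01-01', end='2019-01-10', freq='D')
print(len(days))            # 10, because both endpoints are included
print(days[0], days[-1])

# period_range accepts the same start/end pair but returns a PeriodIndex.
day_periods = pd.period_range(start='2019-01-01', end='2019-01-10', freq='D')
print(type(days).__name__, type(day_periods).__name__)
```

Giving start and end is convenient when the boundaries of the data are known in advance; periods is the better fit when only the length matters.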
2. Generate a Timestamp object and transform it

A Timestamp object can be created with either pd.Timestamp() or pd.to_datetime(), as shown below:


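from datetime import datetime as dt  # dt is the standard-library datetime class

# Build a Timestamp from year, month, and day integers
ts = pd.Timestamp(2019, 1, 1)
print(f'pd.Timestamp()-1 : {ts}')
#pd.Timestamp()-1 : 2019-01-01 00:00:00
# Build a Timestamp from a datetime object
ts = pd.Timestamp(dt(2019, 1, 1, hour=0, minute=1, second=1))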
print(f'pd.Timestamp()-2 : {ts}')
#pd.Timestamp()-2 : 2019-01-01 00:01:01
ts=pd.Timestamp("2019-1-1 0:1:1")
print(f'pd.Timestamp()-3 : {ts}')
#pd.Timestamp()-3 : 2019-01-01 00:01:01
print(f'pd.Timestamp()-type : {type(ts)}')
#pd.Timestamp()-type : <class 'pandas._libs.tslibs.timestamps.Timestamp'>
# pd.to_datetime(2019, 1, 1) is not supported: unlike pd.Timestamp(), it does not accept separate year, month, and day arguments
# The result is named stamp so that it does not overwrite the dt name used to build it
stamp = pd.to_datetime(dt(2019, 1, 1, hour=0, minute=1, second=1))
print(f'pd.to_datetime()-1 : {stamp}')
#pd.to_datetime()-1 : 2019-01-01 00:01:01
stamp = pd.to_datetime("2019-1-1 0:1:1")
print(f'pd.to_datetime()-2 : {stamp}')
#pd.to_datetime()-2 : 2019-01-01 00:01:01
print(f'pd.to_datetime()-type : {type(stamp)}')
#pd.to_datetime()-type : <class 'pandas._libs.tslibs.timestamps.Timestamp'>
# Passing a list to pd.to_datetime() generates a custom DatetimeIndex
dtlist=pd.to_datetime(["2019-1-1 0:1:1", "2019-3-1 0:1:1"])
print(f'pd.to_datetime()-list : {dtlist}')
#pd.to_datetime()-list : DatetimeIndex(['2019-01-01 00:01:01', '2019-03-01 00:01:01'], dtype='datetime64[ns]', freq=None)
# Convert the Timestamp to a monthly Period
pr = ts.to_period('M')
print(f'ts.to_period() : {pr}')
#ts.to_period() : 2019-01
print(f'pd.to_period()-type : {type(pr)}')
#pd.to_period()-type : <class 'pandas._libs.tslibs.period.Period'>
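The same conversion also works on a whole index rather than a single Timestamp. The following is a minimal sketch, assuming the two-element list from the example above; the names stamps, months, and back are illustrative only.

```python
import pandas as pd

# A DatetimeIndex built from strings, as in the pd.to_datetime() list example.
stamps = pd.to_datetime(["2019-1-1 0:1:1", "2019-3-1 0:1:1"])

# Convert every timestamp to the month that contains it.
months = stamps.to_period('M')
print(months)

# Convert back; by default each period maps to its start time,
# so the original time of day is not recovered.
back = months.to_timestamp()
print(back)
```

Note that the round trip is lossy: a period only records which month a timestamp fell in.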

3. Generate a Period object and transform it


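# Define a Period; the string '2019' is read as an annual period
per = pd.Period('2019')
print(f'pd.Period() : {per}')
#pd.Period() : 2019
# Subtracting two Periods returns an offset object in recent pandas versions; its .n attribute holds the count
per_del = (pd.Period('2019') - pd.Period('2018')).n
print(f'The interval between 2019 and 2018 is {per_del} year')  # An integer (a number of years here) can also be added or subtracted directly
#The interval between 2019 and 2018 is 1 year
# Convert the Period to a Timestamp
print(per.to_timestamp(how='end'))  #2019-12-31 23:59:59.999999999 on recent pandas versions (older versions print 2019-12-31 00:00:00)
print(per.to_timestamp(how='start'))  #2019-01-01 00:00:00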
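The comment above mentions that an integer can be added to or subtracted from a Period directly. A minimal sketch of that arithmetic follows; the monthly Period and the variable names are additions for illustration and do not appear in the original listing.

```python
import pandas as pd

# An annual period: adding or subtracting an integer shifts it by whole years.
per = pd.Period('2019')
next_year = per + 1
two_back = per - 2
print(next_year, two_back)   # 2020 2017

# The unit follows the period's frequency, so a monthly period shifts by months.
month = pd.Period('2019-11', freq='M')
shifted = month + 3
print(shifted)               # 2020-02
```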

4. Generate time interval Timedelta


# Create a Timedelta interval
print(pd.Timedelta(days=5, minutes=50, seconds=20, milliseconds=10, microseconds=10, nanoseconds=10))
#5 days 00:50:20.010010
# Get the current time (pd.datetime is not available in recent pandas versions; pd.Timestamp.now() is used instead)
now = pd.Timestamp.now()
# Calculate the date 50 days after the current time
later = now + pd.Timedelta(days=50)
print(f'The current time is {now}, the time 50 days later is {later}')
# The current time is 2019-06-08 17:59:31.726065, the time 50 days later is 2019-07-28 17:59:31.726065
# Show only year, month and day
print(later.strftime('%Y-%m-%d'))  #2019-07-28
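A Timedelta is also what you get back when one Timestamp is subtracted from another. As a minimal sketch, the two fixed dates below are taken from the sample output above so that the result is reproducible; the names start, end, and gap are illustrative.

```python
import pandas as pd

# Fixed timestamps, so the result does not depend on when the code runs.
start = pd.Timestamp('2019-06-08 17:59:31')
end = pd.Timestamp('2019-07-28 17:59:31')

# Subtracting two Timestamps yields a Timedelta.
gap = end - start
print(type(gap).__name__)    # Timedelta
print(gap.days)              # 50
print(gap.total_seconds())   # 4320000.0
```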

5. Resampling and frequency conversion


# asfreq: display the index values by quarter
# A DatetimeIndex has no asfreq method, so the Series is first converted to a PeriodIndex with to_period
date=pd.date_range('1/1/2018', periods=20, freq='D')
tsdat_series=pd.Series(range(20),index=date)
tsp_series=tsdat_series.to_period('D')
print(tsp_series.index.asfreq('Q'))
date=pd.period_range('1/1/2018', periods=20, freq='D')
tsper_series=pd.Series(range(20),index=date)
print(tsper_series.index.asfreq('Q'))
"""
PeriodIndex(['2018Q1', '2018Q1', '2018Q1', '2018Q1', '2018Q1', '2018Q1',
 '2018Q1', '2018Q1', '2018Q1', '2018Q1', '2018Q1', '2018Q1',
 '2018Q1', '2018Q1', '2018Q1', '2018Q1', '2018Q1', '2018Q1',
 '2018Q1', '2018Q1'],
 dtype='period[Q-DEC]', freq='Q-DEC')
"""
# resample: sum the values by quarter and display the result with a quarterly PeriodIndex
print(tsdat_series.resample('Q').sum().to_period('Q'))
"""
2018Q1 190
Freq: Q-DEC, dtype: int64
"""
# groupby: group by day of the week (0 = Monday) and take the mean of each group
print(tsdat_series.groupby(tsdat_series.index.weekday).mean())
"""
0 7.0
1 8.0
2 9.0
3 10.0
4 11.0
5 12.0
6 9.5
dtype: float64
"""

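Grouping by weekday pools all Mondays together, all Tuesdays together, and so on. To aggregate calendar week by calendar week instead, resample can be used with a weekly frequency. The following minimal sketch rebuilds the same 20-day series; the name weekly is illustrative.

```python
import pandas as pd

# The same 20-day series as above: values 0..19 starting on Monday 2018-01-01.
date = pd.date_range('1/1/2018', periods=20, freq='D')
tsdat_series = pd.Series(range(20), index=date)

# 'W' bins end on Sunday, so each bin here covers Monday through Sunday
# and is labeled with the Sunday that closes it.
weekly = tsdat_series.resample('W').sum()
print(weekly)
# The three sums are 21, 70 and 99, labeled 2018-01-07, 2018-01-14 and 2018-01-21.
```

The last bin is incomplete (it holds only six days), which is worth keeping in mind when comparing weekly totals.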