The subscript problem of Sequence section in Python is explained in detail

2020-06-03 07:11:38
OfStack

preface

In python, a slice is a common syntax, whether it's tuples, lists, or strings. The general syntax is:

sequence[ilow:ihigh:step] # ihigh ,step can be empty; For the sake of brevity, exclude step for now

Let's start with a quick demonstration of how to use it


sequence = [1,2,3,4,5]
sequence [ilow:ihigh] #  from ilow Start to ihigh-1 The end of the 
sequence [ilow:]  #  from ilow Start to end 
sequence [:ihigh]  #  Start at the head and go all the way up ihigh The end of the 
sequence [:]   #  Copy the entire list

The grammar is simple and easy to understand. It is simple and easy to use in our daily use. However, I believe that we will follow the following rules when we use the sliced grammar:

Both ilow and ihigh are smaller than the length of sequece ilow < ihigh

Because in most cases, only by following the above rules can we get the results we expect! But what if I don't? What happens to slices ?

Whether we're using tuples, lists, or strings, when we want to take 1 element in, we use the following syntax:


sequence = [1,2,3,4,5]
print sequence[1] #  The output 2
print sequence[2] #  The output 3

The 1,2 that appears above is called a subscript. Whether it is a tuple, a list, or a string, we can extract the corresponding value by subscript, but if the subscript exceeds the length of the object, an index exception will be raised (IndexError).


sequence = [1,2,3,4,5]
print sequence[15] 

###  The output  ###
Traceback (most recent call last):
 File "test.py", line 2, in <module>
 print a[20]
IndexError: list index out of range

What about slices ? The two grammars are very similar. Suppose I have ilow and ihigh which are 10 and 20 respectively

Scene again


# version: python2.7

a = [1, 2, 3, 5]
print a[10:20] #  Will the results be abnormal ?

Seeing that 10 and 20 are completely beyond the length of the sequence a, because of the previous code, or previous experience, we always feel that this will definitely lead to an IndexError, so let's open the terminal and experiment:


>>> a = [1, 2, 3, 5]
>>> print a[10:20]
[]

It turns out to be: [], which is kind of interesting.


>>> s = '23123123123'
>>> print s[400:2000]
''
>>> t = (1, 2, 3,4)
>>> print t[200: 1000]
()

The result is similar to that of the list, and returns an empty result of its own.

When we saw the result, we cried, instead of returning 1 IndexError, we directly returned the empty space, which made us think that in fact, the syntax is similar, but the things behind it must be different. Let's try to explain the result in the following 1

The principle of analysis

Before unfolding, we need to find out how python handles this slice, which can be assisted by the dis module:


#############  slice  ################
[root@iZ23pynfq19Z ~]# cat test.py
a = [11,2,3,4]
print a[20:30]

# The results of :
[root@iZ23pynfq19Z ~]# python -m dis test.py 
 1   0 LOAD_CONST    0 (11)
    3 LOAD_CONST    1 (2)
    6 LOAD_CONST    2 (3)
    9 LOAD_CONST    3 (4)
    12 BUILD_LIST    4
    15 STORE_NAME    0 (a)

 2   18 LOAD_NAME    0 (a)
    21 LOAD_CONST    4 (20)
    24 LOAD_CONST    5 (30)
    27 SLICE+3    
    28 PRINT_ITEM   
    29 PRINT_NEWLINE  
    30 LOAD_CONST    6 (None)
    33 RETURN_VALUE 

#############  Student: Single index  ################
[root@gitlab ~]# cat test2.py
a = [11,2,3,4]
print a[20]

# The results of :
[root@gitlab ~]# python -m dis test2.py
 1   0 LOAD_CONST    0 (11)
    3 LOAD_CONST    1 (2)
    6 LOAD_CONST    2 (3)
    9 LOAD_CONST    3 (4)
    12 BUILD_LIST    4
    15 STORE_NAME    0 (a)

 2   18 LOAD_NAME    0 (a)
    21 LOAD_CONST    4 (20)
    24 BINARY_SUBSCR  
    25 PRINT_ITEM   
    26 PRINT_NEWLINE  
    27 LOAD_CONST    5 (None)
    30 RETURN_VALUE

dis module under the simple introduction, experienced older drivers know that python explains script, there was a compilation process, compiled as a result, we often see pyc file, inside this codeobject object of bytecode, these bytecode and dis is displayed in a respectable manner, let us see the execution process, below is the output columns dis explanation:

Column 1 is the number that is the line number of the original source code. Column 2 is the bytecode offset: LOAD_CONST at line 0, and so on. The third column is the bytecode human-readable name. They are for programmers Column 4 represents the parameters of the instruction Column 5 is the actual calculated parameters

It is the process of reading constant storage variables. The main difference is: test py slice is using bytecode SLICE + 3, and test2. py single subscript values mainly through the bytecode BINARY_SUBSCR, as we guess 1 kind, grammar is similar to that of different code. Because we want to discuss is sliced (SLICE + 3), so it is no longer a BINARY_SUBSCR, interested in children's shoes can view the relevant source code to understand the specific implementation, location: python object/ceval c

So let's talk about SLICE+3


/* Taken from the : python2.7 python/ceval.c */

//  The first 1 step : 
PyEval_EvalFrameEx(PyFrameObject *f, int throwflag)
{
  .... //  omit n Lines of code 
  TARGET_WITH_IMPL_NOARG(SLICE, _slice)
  TARGET_WITH_IMPL_NOARG(SLICE_1, _slice)
  TARGET_WITH_IMPL_NOARG(SLICE_2, _slice)
  TARGET_WITH_IMPL_NOARG(SLICE_3, _slice)
  _slice:
  {
   if ((opcode-SLICE) & 2)
    w = POP();
   else
    w = NULL;
   if ((opcode-SLICE) & 1)
    v = POP();
   else
    v = NULL;
   u = TOP();
   x = apply_slice(u, v, w); //  Take out the v: ilow, w: ihigh,  And then call apply_slice
   Py_DECREF(u);
   Py_XDECREF(v);
   Py_XDECREF(w);
   SET_TOP(x);
   if (x != NULL) DISPATCH();
   break;
  }

 .... //  omit n Lines of code 
}

//  The first 2 step :
apply_slice(PyObject *u, PyObject *v, PyObject *w) /* return u[v:w] */
{
 PyTypeObject *tp = u->ob_type;  
 PySequenceMethods *sq = tp->tp_as_sequence;

 if (sq && sq->sq_slice && ISINDEX(v) && ISINDEX(w)) { // v,w Type check of , To integer / Long integer objects 
  Py_ssize_t ilow = 0, ihigh = PY_SSIZE_T_MAX;
  if (!_PyEval_SliceIndex(v, &ilow))    //  will v The object is checked again ,  And convert it to its value , Save to ilow
   return NULL;
  if (!_PyEval_SliceIndex(w, &ihigh))    //  Same as above 
   return NULL;
  return PySequence_GetSlice(u, ilow, ihigh);  //  To obtain u The corresponding slice function of the object 
 }
 else {
  PyObject *slice = PySlice_New(v, w, NULL);
  if (slice != NULL) {
   PyObject *res = PyObject_GetItem(u, slice);
   Py_DECREF(slice);
   return res;
  }
  else
   return NULL;
 }

//  The first 3 step :
PySequence_GetSlice(PyObject *s, Py_ssize_t i1, Py_ssize_t i2)
{
 PySequenceMethods *m;
 PyMappingMethods *mp;

 if (!s) return null_error();

 m = s->ob_type->tp_as_sequence;
 if (m && m->sq_slice) {
  if (i1 < 0 || i2 < 0) {
   if (m->sq_length) {
    //  So let's do a simple initialization ,  If the left and right table is less than ,  To add sequence The length makes it zero 0
    Py_ssize_t l = (*m->sq_length)(s);
    if (l < 0)
     return NULL;
    if (i1 < 0)
     i1 += l;
    if (i2 < 0)
     i2 += l;
   }
  }
  //  To actually call the object sq_slice function ,  To perform the slicing operation 
  return m->sq_slice(s, i1, i2);
 } else if ((mp = s->ob_type->tp_as_mapping) && mp->mp_subscript) {
  PyObject *res;
  PyObject *slice = _PySlice_FromIndices(i1, i2);
  if (!slice)
   return NULL;
  res = mp->mp_subscript(s, slice);
  Py_DECREF(slice);
  return res;
 }

 return type_error("'%.200s' object is unsliceable", s);

Although the above code is a bit long, the key points are commented out, and it's enough to focus on those points m->sq_slice(s, i1, i2) ", but this sq_slice is a bit special because it corresponds to different functions for different objects. Here are the corresponding functions:


//  String object 
StringObject.c: (ssizessizeargfunc)string_slice, /*sq_slice*/

//  A list of objects 
ListObject.c: (ssizessizeargfunc)list_slice,  /* sq_slice */

//  tuples 
TupleObject.c: (ssizessizeargfunc)tupleslice,  /* sq_slice */

Since the implementation of the three functions is roughly the same, we can only analyze one of them. The following is the section function analysis of the list:


/*  Taken from the ListObject.c */
static PyObject *
list_slice(PyListObject *a, Py_ssize_t ilow, Py_ssize_t ihigh)
{
 PyListObject *np;
 PyObject **src, **dest;
 Py_ssize_t i, len;
 if (ilow < 0)
  ilow = 0;
 else if (ilow > Py_SIZE(a))    //  if ilow Is greater than a The length of the ,  So re-assign it to theta a The length of the 
  ilow = Py_SIZE(a);
 if (ihigh < ilow)  
  ihigh = ilow;
 else if (ihigh > Py_SIZE(a))    //  if ihigh Is greater than a The length of the ,  So re-assign it to theta a The length of the  
  ihigh = Py_SIZE(a);
 len = ihigh - ilow;
 np = (PyListObject *) PyList_New(len); //  create 1 a ihigh - ilow Is a new list object 
 if (np == NULL)
  return NULL;

 src = a->ob_item + ilow;
 dest = np->ob_item;
 for (i = 0; i < len; i++) {    //  will a A member in that range ,  Add to new list object 
  PyObject *v = src[i];
  Py_INCREF(v);
  dest[i] = v;
 }
 return (PyObject *)np;
}

conclusion

It can be seen from the section function corresponding to sq_slice above that, if the left and right subscripts are larger than the length of sequence when using the section, they will be re-assigned to the length of sequence. Therefore, the section we started with 1: print a[10:20] , which actually runs: print a4:4 Through this analysis, in the future when the index is larger than the length of the slice, should not be muddlehead ~