The subscript problem of Sequence section in Python is explained in detail
- 2020-06-03 07:11:38
- OfStack
preface
In python, a slice is a common syntax, whether it's tuples, lists, or strings. The general syntax is:
sequence[ilow:ihigh:step] # ihigh
,step can be empty; For the sake of brevity, exclude step for now
Let's start with a quick demonstration of how to use it
sequence = [1,2,3,4,5]
sequence [ilow:ihigh] # from ilow Start to ihigh-1 The end of the
sequence [ilow:] # from ilow Start to end
sequence [:ihigh] # Start at the head and go all the way up ihigh The end of the
sequence [:] # Copy the entire list
The grammar is simple and easy to understand. It is simple and easy to use in our daily use. However, I believe that we will follow the following rules when we use the sliced grammar:
Both ilow and ihigh are smaller than the length of sequece ilow < ihighBecause in most cases, only by following the above rules can we get the results we expect! But what if I don't? What happens to slices ?
Whether we're using tuples, lists, or strings, when we want to take 1 element in, we use the following syntax:
sequence = [1,2,3,4,5]
print sequence[1] # The output 2
print sequence[2] # The output 3
The 1,2 that appears above is called a subscript. Whether it is a tuple, a list, or a string, we can extract the corresponding value by subscript, but if the subscript exceeds the length of the object, an index exception will be raised (IndexError).
sequence = [1,2,3,4,5]
print sequence[15]
### The output ###
Traceback (most recent call last):
File "test.py", line 2, in <module>
print a[20]
IndexError: list index out of range
What about slices ? The two grammars are very similar. Suppose I have ilow and ihigh which are 10 and 20 respectively
Scene again
# version: python2.7
a = [1, 2, 3, 5]
print a[10:20] # Will the results be abnormal ?
Seeing that 10 and 20 are completely beyond the length of the sequence a, because of the previous code, or previous experience, we always feel that this will definitely lead to an IndexError, so let's open the terminal and experiment:
>>> a = [1, 2, 3, 5]
>>> print a[10:20]
[]
It turns out to be: [], which is kind of interesting.
>>> s = '23123123123'
>>> print s[400:2000]
''
>>> t = (1, 2, 3,4)
>>> print t[200: 1000]
()
The result is similar to that of the list, and returns an empty result of its own.
When we saw the result, we cried, instead of returning 1 IndexError, we directly returned the empty space, which made us think that in fact, the syntax is similar, but the things behind it must be different. Let's try to explain the result in the following 1
The principle of analysis
Before unfolding, we need to find out how python handles this slice, which can be assisted by the dis module:
############# slice ################
[root@iZ23pynfq19Z ~]# cat test.py
a = [11,2,3,4]
print a[20:30]
# The results of :
[root@iZ23pynfq19Z ~]# python -m dis test.py
1 0 LOAD_CONST 0 (11)
3 LOAD_CONST 1 (2)
6 LOAD_CONST 2 (3)
9 LOAD_CONST 3 (4)
12 BUILD_LIST 4
15 STORE_NAME 0 (a)
2 18 LOAD_NAME 0 (a)
21 LOAD_CONST 4 (20)
24 LOAD_CONST 5 (30)
27 SLICE+3
28 PRINT_ITEM
29 PRINT_NEWLINE
30 LOAD_CONST 6 (None)
33 RETURN_VALUE
############# Student: Single index ################
[root@gitlab ~]# cat test2.py
a = [11,2,3,4]
print a[20]
# The results of :
[root@gitlab ~]# python -m dis test2.py
1 0 LOAD_CONST 0 (11)
3 LOAD_CONST 1 (2)
6 LOAD_CONST 2 (3)
9 LOAD_CONST 3 (4)
12 BUILD_LIST 4
15 STORE_NAME 0 (a)
2 18 LOAD_NAME 0 (a)
21 LOAD_CONST 4 (20)
24 BINARY_SUBSCR
25 PRINT_ITEM
26 PRINT_NEWLINE
27 LOAD_CONST 5 (None)
30 RETURN_VALUE
dis module under the simple introduction, experienced older drivers know that python explains script, there was a compilation process, compiled as a result, we often see pyc file, inside this codeobject object of bytecode, these bytecode and dis is displayed in a respectable manner, let us see the execution process, below is the output columns dis explanation:
Column 1 is the number that is the line number of the original source code. Column 2 is the bytecode offset: LOAD_CONST at line 0, and so on. The third column is the bytecode human-readable name. They are for programmers Column 4 represents the parameters of the instruction Column 5 is the actual calculated parametersIt is the process of reading constant storage variables. The main difference is: test py slice is using bytecode SLICE + 3, and test2. py single subscript values mainly through the bytecode BINARY_SUBSCR, as we guess 1 kind, grammar is similar to that of different code. Because we want to discuss is sliced (SLICE + 3), so it is no longer a BINARY_SUBSCR, interested in children's shoes can view the relevant source code to understand the specific implementation, location: python object/ceval c
So let's talk about SLICE+3
/* Taken from the : python2.7 python/ceval.c */
// The first 1 step :
PyEval_EvalFrameEx(PyFrameObject *f, int throwflag)
{
.... // omit n Lines of code
TARGET_WITH_IMPL_NOARG(SLICE, _slice)
TARGET_WITH_IMPL_NOARG(SLICE_1, _slice)
TARGET_WITH_IMPL_NOARG(SLICE_2, _slice)
TARGET_WITH_IMPL_NOARG(SLICE_3, _slice)
_slice:
{
if ((opcode-SLICE) & 2)
w = POP();
else
w = NULL;
if ((opcode-SLICE) & 1)
v = POP();
else
v = NULL;
u = TOP();
x = apply_slice(u, v, w); // Take out the v: ilow, w: ihigh, And then call apply_slice
Py_DECREF(u);
Py_XDECREF(v);
Py_XDECREF(w);
SET_TOP(x);
if (x != NULL) DISPATCH();
break;
}
.... // omit n Lines of code
}
// The first 2 step :
apply_slice(PyObject *u, PyObject *v, PyObject *w) /* return u[v:w] */
{
PyTypeObject *tp = u->ob_type;
PySequenceMethods *sq = tp->tp_as_sequence;
if (sq && sq->sq_slice && ISINDEX(v) && ISINDEX(w)) { // v,w Type check of , To integer / Long integer objects
Py_ssize_t ilow = 0, ihigh = PY_SSIZE_T_MAX;
if (!_PyEval_SliceIndex(v, &ilow)) // will v The object is checked again , And convert it to its value , Save to ilow
return NULL;
if (!_PyEval_SliceIndex(w, &ihigh)) // Same as above
return NULL;
return PySequence_GetSlice(u, ilow, ihigh); // To obtain u The corresponding slice function of the object
}
else {
PyObject *slice = PySlice_New(v, w, NULL);
if (slice != NULL) {
PyObject *res = PyObject_GetItem(u, slice);
Py_DECREF(slice);
return res;
}
else
return NULL;
}
// The first 3 step :
PySequence_GetSlice(PyObject *s, Py_ssize_t i1, Py_ssize_t i2)
{
PySequenceMethods *m;
PyMappingMethods *mp;
if (!s) return null_error();
m = s->ob_type->tp_as_sequence;
if (m && m->sq_slice) {
if (i1 < 0 || i2 < 0) {
if (m->sq_length) {
// So let's do a simple initialization , If the left and right table is less than , To add sequence The length makes it zero 0
Py_ssize_t l = (*m->sq_length)(s);
if (l < 0)
return NULL;
if (i1 < 0)
i1 += l;
if (i2 < 0)
i2 += l;
}
}
// To actually call the object sq_slice function , To perform the slicing operation
return m->sq_slice(s, i1, i2);
} else if ((mp = s->ob_type->tp_as_mapping) && mp->mp_subscript) {
PyObject *res;
PyObject *slice = _PySlice_FromIndices(i1, i2);
if (!slice)
return NULL;
res = mp->mp_subscript(s, slice);
Py_DECREF(slice);
return res;
}
return type_error("'%.200s' object is unsliceable", s);
Although the above code is a bit long, the key points are commented out, and it's enough to focus on those points
m->sq_slice(s, i1, i2)
", but this sq_slice is a bit special because it corresponds to different functions for different objects. Here are the corresponding functions:
// String object
StringObject.c: (ssizessizeargfunc)string_slice, /*sq_slice*/
// A list of objects
ListObject.c: (ssizessizeargfunc)list_slice, /* sq_slice */
// tuples
TupleObject.c: (ssizessizeargfunc)tupleslice, /* sq_slice */
Since the implementation of the three functions is roughly the same, we can only analyze one of them. The following is the section function analysis of the list:
/* Taken from the ListObject.c */
static PyObject *
list_slice(PyListObject *a, Py_ssize_t ilow, Py_ssize_t ihigh)
{
PyListObject *np;
PyObject **src, **dest;
Py_ssize_t i, len;
if (ilow < 0)
ilow = 0;
else if (ilow > Py_SIZE(a)) // if ilow Is greater than a The length of the , So re-assign it to theta a The length of the
ilow = Py_SIZE(a);
if (ihigh < ilow)
ihigh = ilow;
else if (ihigh > Py_SIZE(a)) // if ihigh Is greater than a The length of the , So re-assign it to theta a The length of the
ihigh = Py_SIZE(a);
len = ihigh - ilow;
np = (PyListObject *) PyList_New(len); // create 1 a ihigh - ilow Is a new list object
if (np == NULL)
return NULL;
src = a->ob_item + ilow;
dest = np->ob_item;
for (i = 0; i < len; i++) { // will a A member in that range , Add to new list object
PyObject *v = src[i];
Py_INCREF(v);
dest[i] = v;
}
return (PyObject *)np;
}
conclusion
It can be seen from the section function corresponding to sq_slice above that, if the left and right subscripts are larger than the length of sequence when using the section, they will be re-assigned to the length of sequence. Therefore, the section we started with 1:
print a[10:20]
, which actually runs:
print a4:4
Through this analysis, in the future when the index is larger than the length of the slice, should not be muddlehead ~