Seven ways to concatenate strings in Python

  • 2021-01-18 06:34:21
  • OfStack

Preface

I can't remember where I saw this programmer joke: programmers do only two things every day, and one of them is processing strings. I believe many of you will feel the same way.

In Python, as in almost every programming language, strings are one of the most basic and indispensable data types, so we often need to concatenate them. Concatenating strings is a necessary skill. Today, let's learn seven ways to concatenate strings in Python.

Without further ado, let's take a look at each one in detail.

1. The % style, from the C language


print('%s %s' % ('Hello', 'world'))
>>> Hello world

The % style of string formatting is inherited from the old C language, and many programming languages have similar implementations. The %s in the example above is a placeholder that stands for a string; it is not the actual content being concatenated. The actual content is placed in a tuple after a lone % sign.

Similar placeholders include %d (for an integer), %f (for a floating-point number), %x (for a hexadecimal number), and so on. The % placeholders are both the defining feature and the limitation of this style: each placeholder has a specific meaning, which makes it cumbersome to use in practice.
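
As a minimal sketch of those placeholders (the variable names and values are made up for illustration and are not from the original article):

```python
# Illustrative values only; the names are hypothetical.
count_text = '%d items' % 42                  # integer placeholder
price_text = '%.2f' % 3.14159                 # float, rounded to 2 decimals
hex_text = '%x' % 255                         # lowercase hexadecimal
mixed_text = '%s has %d items' % ('cart', 3)  # several values go in a tuple

print(count_text)   # 42 items
print(price_text)   # 3.14
print(hex_text)     # ff
print(mixed_text)   # cart has 3 items
```

Passing a value that does not match its placeholder (for example a string for %d) raises a TypeError, which is part of what makes this style fiddly.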

2. Concatenating with format()


#  Concise version
s1 = 'Hello {}! My name is {}.'.format('World', 'Python Cat')
print(s1)
>>>Hello World! My name is Python Cat.

#  Indexed and named versions
s2 = 'Hello {0}! My name is {1}.'.format('World', 'Python Cat')
s3 = 'Hello {name1}! My name is {name2}.'.format(name1='World', name2='Python Cat')
print(s2)
>>>Hello World! My name is Python Cat.
print(s3)
>>>Hello World! My name is Python Cat.

In this style, the curly braces {} serve as placeholders, and the actual values are passed in to the format() method. It is easy to see that this is essentially an improvement on the % style. It was introduced in Python 2.6.

In the example above, the concise version leaves the braces empty; its drawback is that it is easy to get the order wrong. The explicit versions come in two main forms: one passes positional indices, the other uses keyword arguments. In practice, we recommend the latter: not only is it impossible to miscount the order, it is also more intuitive and readable.
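
To illustrate why explicit placeholders are more robust, here is a small sketch (the template strings are hypothetical): an index can be reused or reordered, and keyword arguments can be passed in any order.

```python
# Hypothetical templates for illustration.
by_index = '{1}, {0}! {1} again.'.format('world', 'Hello')
by_name = '{greeting}, {target}!'.format(target='world', greeting='Hello')

print(by_index)  # Hello, world! Hello again.
print(by_name)   # Hello, world!
```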

3. The tuple-like () style


s_tuple = ('Hello', ' ', 'world')
s_like_tuple = ('Hello' ' ' 'world')

print(s_tuple) 
>>>('Hello', ' ', 'world')
print(s_like_tuple) 
>>>Hello world

print(type(s_like_tuple))
>>><class 'str'>

Note that in the example above, s_like_tuple is not a tuple, because there are no commas between the elements; the literals may be separated by whitespace or by nothing at all. Checking with type() shows that it is a str. The reason is that Python joins adjacent string literals into a single string at compile time (implicit string literal concatenation); the parentheses only group the expression.

This may look quick, but every element inside the parentheses must be an actual string literal, and variables cannot be mixed in, so it is not flexible enough.


#  With multiple elements, variables are not supported
str_1 = 'Hello'
str_2 = (str_1 'world')
>>> SyntaxError: invalid syntax
str_3 = (str_1 str_1)
>>> SyntaxError: invalid syntax
#  But a single variable in parentheses does not raise an error
str_4 = (str_1)
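
A short sketch of how this behaves in practice (the variable names are illustrative): adjacent literals are joined with or without parentheses, the parentheses mainly help to split a long literal across lines, and a variable needs an explicit operator.

```python
# Adjacent string literals are joined; parentheses only group.
no_parens = 'Hello' ' ' 'world'
multi_line = (
    'Hello'
    ' '
    'world'
)

# To mix in a variable, an explicit + is required.
greeting = 'Hello'
with_variable = greeting + ' ' 'world'

print(no_parens)      # Hello world
print(multi_line)     # Hello world
print(with_variable)  # Hello world
```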

4. Object-oriented Template concatenation


from string import Template
s = Template('${s1} ${s2}!') 
print(s.safe_substitute(s1='Hello',s2='world')) 
>>> Hello world!
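
One detail about the snippet above, shown as a minimal sketch: safe_substitute() leaves an unknown placeholder in place, whereas substitute() raises a KeyError when a value is missing. The variable names partial and missing are illustrative only.

```python
from string import Template

t = Template('${s1} ${s2}!')

# safe_substitute leaves placeholders without a value untouched.
partial = t.safe_substitute(s1='Hello')
print(partial)  # Hello ${s2}!

# substitute raises KeyError when a placeholder has no value.
try:
    t.substitute(s1='Hello')
    missing = None
except KeyError as exc:
    missing = exc.args[0]
print(missing)  # s2
```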

To be honest, I don't like this implementation. It reeks of having been poisoned by object-oriented thinking.

I won't go into details.

5. The commonly used + operator


str_1 = 'Hello world !  ' 
str_2 = 'My name is Python Cat.'
print(str_1 + str_2)
>>>Hello world !  My name is Python Cat.
print(str_1)
>>>Hello world !  

This approach is the most common, intuitive, and easy to understand; it is the entry-level implementation. However, it has two pitfalls.

First, novice programmers make a mistake here: they don't know that strings are immutable, so each new string occupies a new block of memory while the original strings stay unchanged. In the example above, there are two strings before concatenation and three strings after it.
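
The immutability point can be checked with a small sketch (the names original and alias are illustrative): concatenating builds a new object and rebinds the name, while any other reference still sees the old string.

```python
original = 'Hello'
alias = original          # both names refer to the same string object
original += ' world'      # builds a new string and rebinds `original`

print(original)           # Hello world
print(alias)              # Hello
print(alias is original)  # False
```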

Second, some experienced programmers also get it wrong: they believe that when fewer than three strings are concatenated, the + operator is faster than the other methods (PS: many Python tutorials suggest this), yet no sound justification is ever given.

In fact, when short literals are concatenated, CPython's constant folding merges them into a single literal: for example, 'a'+'b'+'c' becomes 'abc', and 'hello'+'world' becomes 'helloworld'. This conversion happens at compile time, so no concatenation takes place at run time, which speeds up the overall computation.

Constant folding has a limit: the length of the concatenated result must not exceed 20. So, when the final string built from literals is no longer than 20 characters, the + operator is much faster than, say, the join() method described later, no matter how many times + is used. Note that this limit is a CPython implementation detail and has varied between versions.
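
One way to observe constant folding, as a sketch that assumes the CPython interpreter (other implementations may behave differently): compile a statement that concatenates literals and inspect the constants stored in the resulting code object.

```python
import dis

# Compile a tiny module whose only statement concatenates literals.
code = compile("s = 'a' + 'b' + 'c'", '<demo>', 'exec')

# On CPython the folded result is stored directly among the constants.
folded = 'abc' in code.co_consts
print(folded)

# The disassembly shows what is actually executed at run time.
dis.dis(code)
```

The exact bytecode printed by dis.dis() depends on the Python version, so no expected listing is given here.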

As an aside: does the number 20 sound familiar? That's right: in my earlier article, What is a "privileged race" in Python?, the privileges of the string class were also limited to 20. That article also included an example of the difference between compile time and run time, which I suggest you go back and review.

6. Concatenating with join()


str_list = ['Hello', 'world']
str_join1 = ' '.join(str_list)
str_join2 = '-'.join(str_list)
print(str_join1)
>>>Hello world
print(str_join2)
>>>Hello-world

The join() method of a str object takes one iterable argument and concatenates its elements. Elements that are not strings must first be converted to strings. As you can see, this approach works well for concatenating the elements of a sequence object (such as a list) with a uniform separator.
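
The conversion requirement can be sketched as follows (the list numbers is a made-up example): join() raises a TypeError for non-string elements, so they are converted with str() first.

```python
numbers = [1, 2, 3]

# Joining non-string elements directly fails.
try:
    ', '.join(numbers)
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

# Convert each element to str before joining.
joined = ', '.join(str(n) for n in numbers)

print(error_name)  # TypeError
print(joined)      # 1, 2, 3
```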

When the result is longer than 20 characters, this approach is generally preferred. Its drawback is that it is not suited to piecemeal concatenation of elements that are not already gathered in a sequence.

7. f-string


name = 'world'
myname = 'python_cat'
words = f'Hello {name}. My name is {myname}.'
print(words)
>>> Hello world. My name is python_cat.

The f-string comes from PEP 498 (Literal String Interpolation) and was introduced in Python 3.6. It is marked by the f prefix in front of the string, with curly braces {} in the middle wrapping the variables to be inserted.
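
Beyond plain variables, the braces can hold expressions and format specifications. A brief sketch, with illustrative names and values:

```python
name = 'world'
price = 3.14159

expression = f'{2 + 3} items for {name.upper()}'  # expressions are evaluated
formatted = f'{price:.2f}'                        # format spec after the colon

print(expression)  # 5 items for WORLD
print(formatted)   # 3.14
```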

This approach beats format() in readability and is comparable to join() when concatenating long strings.

However, this approach is still less elegant than in some other programming languages, because it needs the f prefix. Some other languages can be more concise, such as shell:


name="world"
myname="python_cat"
words="Hello ${name}. My name is ${myname}."
echo $words
>>>Hello world. My name is python_cat.

To summarize: when we talk about "string concatenation", we are really looking at it in terms of the result. In terms of how they work, these methods fall into three types:

Formatting: %, format(), Template

Concatenation: +, (), join()

Interpolation: f-string

When handling a sequence such as a list of strings, use join(). When the result is no longer than 20 characters, use the + operator. When it is longer than 20, use f-strings on newer Python versions, or format() or join() on older ones.
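
Performance claims like these are best checked on your own interpreter. The following is only a rough timing sketch using the standard timeit module; the helper functions and the sample list are hypothetical, and the numbers will vary with Python version and machine.

```python
import timeit

# Hypothetical sample data for the comparison.
parts = ['Hello', 'world', 'from', 'Python']

def with_plus():
    return parts[0] + ' ' + parts[1] + ' ' + parts[2] + ' ' + parts[3]

def with_join():
    return ' '.join(parts)

def with_fstring():
    return f'{parts[0]} {parts[1]} {parts[2]} {parts[3]}'

# Time each variant; absolute values depend on the environment.
for func in (with_plus, with_join, with_fstring):
    seconds = timeit.timeit(func, number=100_000)
    print(f'{func.__name__}: {seconds:.4f}s')
```

All three functions build the same string, so any difference measured is purely in how the result is assembled.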

One more thing:

You think this is about to end?

Too naive! That's not my style!

My style is divergent thinking, systems thinking, and philosophies of programming.

Recently I have been reading Hackers and Painters, in which Paul Graham raises this point:

Semantically, a string can be understood more or less as a subset of lists in which every element is a character. So why do we need strings as a separate data type?

I was blown away by the author's view that "strings in programming languages seem to be an example of premature optimization"! The seven concatenation methods above instantly turned paper-thin, as if they would tear at a touch.

But the author thinks that's not enough. He has an even more surprising idea:

There are even more startling prospects. Logically, there is no need for a separate representation of integers, since they too can be treated as lists: the integer n can be represented as a list of n elements. ... Could programming languages go so far as to abandon integers as one of the basic data types?

I don't know what you think after reading this passage. Even with the context in front of me, I was astonished.

The next installment of my book series will be on "Hackers and Painters", and there will be a lucky draw to give away one copy of the book. Stay tuned.

Related PEPs:

https://www.python.org/dev/peps/pep-0215/
https://www.python.org/dev/peps/pep-0292/
https://www.python.org/dev/peps/pep-3101/
https://www.python.org/dev/peps/pep-0498/
