Strings
Python strings are just pieces of text.
>>> our_string = "Hello World!"
>>> our_string
'Hello World!'
>>>
So far we know how to add them together.
>>> "I said: " + our_string
'I said: Hello World!'
>>>
We also know how to repeat them multiple times.
>>> our_string * 3
'Hello World!Hello World!Hello World!'
>>>
Python strings are immutable.
That's just a fancy way to say that they cannot be changed in-place, and we need to create a new string to change them. Even some_string += another_string
creates a new string.
Python will treat that as some_string = some_string + another_string
, so it creates a new string but it puts it back to the same variable.
+
and *
are nice, but what else can we do with strings?
Slicing
Slicing is really simple. It just means getting a part of the string.
For example, to get all characters between the second place between the characters and the fifth place between the characters, we can do this:
>>> our_string[2:5]
'llo'
>>>
So the syntax is like some_string[start:end]
.
But what happens if we slice with negative values?
>>> our_string[-5:-2]
'orl'
>>>
It turns out that slicing with negative values simply starts counting from the end of the string.
If we don't specify the beginning it defaults to 0, and if we do not specify the end it defaults to the length of the string. For example, we can get everything except the first or last character like this:
>>> our_string[1:]
'ello World!'
>>> our_string[:-1]
'Hello World'
>>>
Remember that strings can't be changed in-place.
>>> our_string[:5] = 'Howdy'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
>>>
There's also a step argument we can give to our slices, but I am not going to talk about it now.
Indexing
So now we know how slicing works. But what happens if we forget the :
?
>>> our_string[1]
'e'
>>>
That is interesting. We got a string that is only one character long. But the first character of Hello World!
should be H
, not e
, so why did we get an e?
Programming starts at zero. Indexing strings also starts at zero. The first character is our_string[0]
, the second character is our_string[1]
, and so on.
>>> our_string[0]
'H'
>>> our_string[1]
'e'
>>> our_string[2]
'l'
>>> our_string[3]
'l'
>>> our_string[4]
'o'
>>>
How about negative values?
>>> our_string[-1]
'!'
>>>
We got the last character.
But why didn't that start at zero? our_string[-1]
is the last character, but our_string[1]
is not the first character!
That's because 0 and -0 are equal, so indexing with -0 would do the same thing as indexing with 0.
Indexing with negative values works like this:
String methods
Python's strings have many useful methods.
The official documentation covers them all, but I'm going to just show some of the most commonly used ones briefly.
Python also comes with built-in documentation about the string methods and we can run help(str)
to read it. We can also get help about one string method at a time, like help(str.upper)
.
Again, nothing can modify strings in-place. Most string methods return a new string, but things like our_string = our_string.upper()
still work because the new string is assigned to the old variable.
Also note that all of these methods are used like our_string.stuff()
, not like stuff(our_string)
. The idea with that is that our string knows how to do all these things, like our_string.stuff()
, we don't need a separate function that does these things like stuff(our_string)
.
Here's an example with some of the most commonly used string methods:
>>> our_string.upper()
'HELLO WORLD!'
>>> our_string.lower()
'hello world!'
>>> our_string.startswith('Hello')
True
>>> our_string.endswith('World!')
True
>>> our_string.endswith('world!') # Python is case-sensitive
False
>>> our_string.replace('World', 'there')
'Hello there!'
>>> our_string.replace('o', '@', 1) # only replace one o
'Hell@ World!'
>>> ' hello 123 '.lstrip() # left strip
'hello 123 '
>>> ' hello 123 '.rstrip() # right strip
' hello 123'
>>> ' hello 123 '.strip() # strip from both sides
'hello 123'
>>> ' hello abc'.rstrip('cb') # strip c's and b's from right
' hello a'
>>> our_string.ljust(30, '-')
'Hello World!------------------'
>>> our_string.rjust(30, '-')
'------------------Hello World!'
>>> our_string.center(30, '-')
'---------Hello World!---------'
>>> our_string.count('o') # it contains two o's
2
>>> our_string.index('o') # the first o is our_string[4]
4
>>> our_string.rindex('o') # the last o is our_string[7]
7
>>> '-'.join(['hello', 'world', 'test'])
'hello-world-test'
>>> 'hello-world-test'.split('-')
['hello', 'world', 'test']
>>> our_string.upper()[3:].startswith('LO WOR') # combining multiple things
True
>>>
The things in square brackets that the split method gave us and we gave to the join method were lists.
String formatting
To add a string in the middle of another string, we can do something like this:
>>> name = 'Akuli'
>>> 'My name is ' + name + '.'
'My name is Akuli.'
>>>
But that gets complicated if we have many things to add.
>>> channel = '##learnpython'
>>> network = 'freenode'
>>> "My name is " + name + " and I'm on the " + channel + " channel on " + network + "."
"My name is Akuli and I'm on the ##learnpython channel on freenode."
>>>
Instead it's recommended to use string formatting. It means putting other things in the middle of a string.
Python has multiple ways to format strings. One is not necessarily better than others, they are just different. Here's a few ways to solve our problem:
-
.format()
-formatting, also known as new-style formatting. This
formatting style has a lot of features, but it's a little bit more
typing than%s
-formatting.>>> "Hello {}.".format(name) 'Hello Akuli.' >>> "My name is {} and I'm on the {} channel on {}.".format(name, channel, network) "My name is Akuli and I'm on the ##learnpython channel on freenode." >>>
-
%s
-formatting, also known as old-style formatting. This has less
features than.format()
-formatting, but'Hello %s.' % name
is
shorter and faster to type than'Hello {}.'.format(name)
. I like
to use%s
formatting for simple things and.format
when I need
more powerful features.>>> "Hello %s." % name 'Hello Akuli.' >>> "My name is %s and I'm on the %s channel on %s." % (name, channel, network) "My name is Akuli and I'm on the ##learnpython channel on freenode." >>>
In the second example we had
(name, channel, network)
on the right side of the%
sign. It was a tuple.If we have a variable that may be a tuple we need to wrap it in another tuple when formatting:
>>> thestuff = (1, 2, 3) >>> "we have %s" % thestuff Traceback (most recent call last): File "
", line 1, in TypeError: not all arguments converted during string formatting >>> "we have %s and %s" % ("hello", thestuff) 'we have hello and (1, 2, 3)' >>> "we have %s" % (thestuff,) 'we have (1, 2, 3)' >>> Here
(thestuff,)
was a tuple that contained nothing butthestuff
. -
f-strings are even less typing, but new in Python 3.6. Use this only if
you know that nobody will need to run your code on Python versions older
than 3.6. Here the f is short for "format", and the content of the
string is same as it would be with.format()
but we can use variables
directly.>>> f"My name is {name} and I'm on the {channel} channel on {network}." "My name is Akuli and I'm on the ##learnpython channel on freenode." >>>
All of these formatting styles have many other features also:
>>> 'Three zeros and number one: {:04d}'.format(1)
'Three zeros and number one: 0001'
>>> 'Three zeros and number one: %04d' % 1
'Three zeros and number one: 0001'
>>>
Other things
We can use in
and not in
to check if a string contains another string.
>>> our_string = "Hello World!"
>>> "Hello" in our_string
True
>>> "Python" in our_string
False
>>> "Python" not in our_string
True
>>>
We can get the length of a string with the len
function. The name len
is short for "length".
>>> len(our_string) # 12 characters
12
>>> len('') # no characters
0
>>> len('\n') # python thinks of \n as one character
1
>>>
We can convert strings, integers and floats with each other with str
, int
and float
. They are not actually functions, but they behave a lot like functions.
>>> str(3.14)
'3.14'
>>> float('3.14')
3.14
>>> str(123)
'123'
>>> int('123')
123
>>>
Giving an invalid string to int
or float
produces an error message.
>>> int('lol')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: 'lol'
>>> float('hello')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: could not convert string to float: 'hello'
>>>
Summary
- Slicing returns a copy of a string with indexes from one index to another index.
- Indexing returns one character of a string. Remember that we don't need a
:
with indexing. - Python has many string methods. Use the documentation or
help(str)
when you don't rememeber something about them. - String formatting means adding other things to the middle of a string. There are multiple ways to do this in Python. You should know how to use at least one of these ways.
- The
in
keyword can be used for checking if a string contains another string. len(string)
returns string's length.- We can use
str
,int
andfloat
to convert values to different
types.