Python 3 Standard Library Overview
Operating System Interface
The os module provides many functions for interacting with the operating system:
>>> import os
>>> os.getcwd() # Returns the current working directory
'C:\\Python34'
>>> os.chdir('/server/accesslogs') # Changes the current working directory
>>> os.system('mkdir today') # Executes the system command mkdir
0
Be sure to use the "import os" style rather than "from os import *". This keeps os.open() from shadowing the built-in open() function, which operates very differently.
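To illustrate how different the two functions are, here is a minimal sketch. The scratch file name is a hypothetical placeholder, and the temporary directory exists only so the example cleans up after itself.

```python
import os
import tempfile

# Hypothetical scratch file used only for this illustration.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")

    # The built-in open() returns a file object with read/write methods.
    with open(path, "w") as f:
        f.write("hello")
        builtin_result_type = type(f).__name__

    # os.open() takes flag constants and returns a raw integer file descriptor.
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.read(fd, 5)
    finally:
        os.close(fd)

print(builtin_result_type, type(fd).__name__, data)
```

With "from os import *", the name open would refer to the second, lower-level function, and code written for the built-in would break.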
When using large modules like os, the built-in dir() and help() functions are very useful as interactive aids:
>>> import os
>>> dir(os)
<returns a list of all module functions>
>>> help(os)
<returns an extensive manual page created from the module's docstrings>
For everyday file and directory management tasks, the shutil module provides an easy-to-use high-level interface:
>>> import shutil
>>> shutil.copyfile('data.db', 'archive.db')
>>> shutil.move('/build/executables', 'installdir')
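The two calls above can be tried safely inside a temporary directory. This is a sketch only; every file and directory name in it is a placeholder chosen for the illustration.

```python
import os
import shutil
import tempfile

# All file and directory names below are hypothetical placeholders.
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "data.db")
    with open(source, "wb") as f:
        f.write(b"example bytes")

    # copyfile() copies the file contents and returns the destination path.
    backup = shutil.copyfile(source, os.path.join(tmp, "archive.db"))

    # move() into an existing directory places the file inside it
    # and returns the new path.
    target_dir = os.path.join(tmp, "installdir")
    os.mkdir(target_dir)
    moved = shutil.move(source, target_dir)

    with open(backup, "rb") as f:
        backup_bytes = f.read()
    source_still_exists = os.path.exists(source)
    moved_name = os.path.basename(moved)

print(backup_bytes, source_still_exists, moved_name)
```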
File Wildcards
The glob module provides a function to generate a list of files from directory wildcard searches:
>>> import glob
>>> glob.glob('*.py')
['primes.py', 'random.py', 'quote.py']
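A self-contained sketch of the same search, run against a temporary directory populated with hypothetical file names. glob.glob() makes no promise about ordering, so the result is sorted here before use.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create a few empty placeholder files (names are illustrative only).
    for name in ("primes.py", "quote.py", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()

    # Match only the .py files; glob returns full paths for a full pattern.
    pattern = os.path.join(tmp, "*.py")
    matches = sorted(os.path.basename(p) for p in glob.glob(pattern))

print(matches)
```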
Command Line Arguments
Common utility scripts often need to process command line arguments. These arguments are stored as a list in the argv attribute of the sys module. For example, executing "python demo.py one two three" at the command line yields the following output:
>>> import sys
>>> print(sys.argv)
['demo.py', 'one', 'two', 'three']
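As a sketch of how a script might consume that list, the helper below (describe_args is a hypothetical name, not part of sys) separates the script name from the remaining arguments. It is fed a literal list here so the example runs the same way everywhere; a real script would pass sys.argv.

```python
import sys


def describe_args(argv):
    """Split an argv-style list into the script name and its arguments."""
    script, *args = argv
    return script, args


# A real script would call describe_args(sys.argv).
script, args = describe_args(['demo.py', 'one', 'two', 'three'])
print(script, args)
```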
Error Output Redirection and Program Termination
The sys module also has attributes for stdin, stdout, and stderr. The latter is useful for emitting warnings and error messages so that they remain visible even when stdout has been redirected:
>>> sys.stderr.write('Warning, log file not found starting a new one\n')
Warning, log file not found starting a new one
The most direct way to terminate a script is to call "sys.exit()".
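The sketch below shows both ideas under controlled conditions. It uses contextlib to capture the two streams in memory (an assumption made so the example is observable; a shell redirection would behave analogously), and it catches the SystemExit exception that sys.exit() raises so the interpreter keeps running.

```python
import contextlib
import io
import sys

captured_out = io.StringIO()
captured_err = io.StringIO()

# Redirect both streams to in-memory buffers to show they are independent.
with contextlib.redirect_stdout(captured_out), contextlib.redirect_stderr(captured_err):
    print("normal output")
    print("Warning, log file not found", file=sys.stderr)

# sys.exit() works by raising SystemExit; the argument becomes the exit status.
try:
    sys.exit(2)
except SystemExit as exc:
    exit_code = exc.code

print(repr(captured_out.getvalue()), repr(captured_err.getvalue()), exit_code)
```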
String Pattern Matching
The re module provides regular expression tools for advanced string processing. For complex matches and manipulations, regular expressions offer a concise and optimized solution:
>>> import re
>>> re.findall(r'\bf[a-z]*', 'which foot or hand fell fastest')
['foot', 'fell', 'fastest']
>>> re.sub(r'(\b[a-z]+) \1', r'\1', 'cat in the the hat')
'cat in the hat'
When only simple capabilities are needed, string methods are preferred because they are easier to read and debug:
>>> 'tea for too'.replace('too', 'two')
'tea for two'
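A short sketch of where the two approaches diverge. The sample sentence is made up for the illustration: str.replace() substitutes every occurrence of the substring, while a regular expression with word boundaries touches only whole words.

```python
import re

text = 'tea for too, tools too'

# Plain substring replacement also rewrites the start of "tools".
simple = text.replace('too', 'two')

# \b anchors the match to word boundaries, so "tools" is left alone.
precise = re.sub(r'\btoo\b', 'two', text)

print(simple)
print(precise)
```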
Mathematics
The math module provides access to the underlying C library functions for floating point operations:
>>> import math
>>> math.cos(math.pi / 4)
0.7071067811865476
>>> math.log(1024, 2)
10.0
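Because these functions return floating point values, comparisons are usually safer with a tolerance than with ==. The following sketch checks the cosine result against its closed form using math.isclose() and shows math.log2() as an alternative to the two-argument log call.

```python
import math

angle = math.pi / 4

# cos(pi/4) should equal sqrt(2)/2 up to floating point rounding.
cos_matches = math.isclose(math.cos(angle), math.sqrt(2) / 2)

# Two ways to take a base-2 logarithm.
log_base_2 = math.log(1024, 2)
log2_value = math.log2(1024)

print(cos_matches, log_base_2, log2_value)
```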
The random module provides tools for making random selections:
>>> import random
>>> random.choice(['apple', 'pear', 'banana'])
'apple'
>>> random.sample(range(100), 10) # sampling without replacement
[30, 83, 16, 4, 8, 81, 41, 50, 18, 33]
>>> random.random() # random float
0.17970987693706186
>>> random.randrange(6) # random integer chosen from range(6)
4
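The outputs above differ from run to run. When repeatable results are needed, for example in tests, one option is a dedicated random.Random instance with a fixed seed. This is a sketch; the seed value 42 is arbitrary.

```python
import random

# Two generators seeded identically produce identical sequences.
rng_a = random.Random(42)
rng_b = random.Random(42)

picks_a = [rng_a.randrange(6) for _ in range(5)]
picks_b = [rng_b.randrange(6) for _ in range(5)]

# sample() draws without replacement, so all ten values are distinct.
sample = rng_a.sample(range(100), 10)

print(picks_a == picks_b, len(set(sample)))
```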
Internet Access
The urllib.request module provides tools for retrieving data from URLs:
>>> from urllib.request import urlopen
>>> with urlopen('http://tycho.usno.navy.mil/cgi-bin/timer.pl') as response:
...     for line in response:
...         line = line.decode('utf-8')  # Decoding the binary data to text
...         if 'EST' in line or 'EDT' in line:  # Look for Eastern Time
...             print(line)
The smtplib module provides tools for sending emails:
>>> import smtplib
>>> server = smtplib.SMTP('localhost')
>>> server.sendmail('[email protected]', '[email protected]',
... """To: [email protected]
... From: [email protected]
...
... Beware the Ides of March.
... """)
>>> server.quit()
Note that the second example requires a mail server running on localhost.
Dates and Times
The datetime module provides classes for manipulating dates and times in both simple and complex ways. While date and time arithmetic is supported, the focus of the implementation is on efficient attribute extraction for output formatting and manipulation. The module also supports timezone handling.
>>> from datetime import date
>>> now = date.today()
>>> now
datetime.date(2023, 10, 5)
>>> now.strftime("%m-%d-%y. %d %b %Y is a %A on the %d day of %B.")
'10-05-23. 05 Oct 2023 is a Thursday on the 05 day of October.'
>>> birthday = date(1964, 7, 31)
>>> age = now - birthday
>>> age.days
21615
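The arithmetic can be checked with fixed dates, so the result does not depend on when the code is run. A minimal sketch, reusing the dates from the session above:

```python
from datetime import date, timedelta

today = date(2023, 10, 5)       # fixed so the result is reproducible
birthday = date(1964, 7, 31)

# Subtracting two dates yields a timedelta.
age = today - birthday

# Adding a timedelta to a date yields another date.
next_week = today + timedelta(days=7)

# isoformat() gives a locale-independent YYYY-MM-DD string.
iso_text = today.isoformat()

print(age.days, next_week, iso_text)
```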
Data Compression
Common data archiving and compression formats are directly supported by modules including zlib, gzip, bz2, lzma, zipfile, and tarfile:
>>> import zlib
>>> s = b'witch which has which witches wrist watch'
>>> len(s)
41
>>> t = zlib.compress(s)
>>> len(t)
37
>>> zlib.decompress(t)
b'witch which has which witches wrist watch'
>>> zlib.crc32(s)
226805979
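Short inputs like the one above compress only slightly. The sketch below repeats the phrase (an arbitrary choice to make the data redundant) and round-trips it through both zlib and gzip to confirm that decompression restores the original bytes.

```python
import gzip
import zlib

# Repetition makes the payload highly compressible.
payload = b'witch which has which witches wrist watch ' * 50

zlib_packed = zlib.compress(payload)
gzip_packed = gzip.compress(payload)

# Both formats must restore the input exactly.
zlib_restored = zlib.decompress(zlib_packed)
gzip_restored = gzip.decompress(gzip_packed)

# A checksum offers a cheap integrity check on the restored data.
checksums_match = zlib.crc32(zlib_restored) == zlib.crc32(payload)

print(len(payload), len(zlib_packed), len(gzip_packed), checksums_match)
```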
Performance Measurement
Some Python users develop a deep interest in knowing the relative performance of different approaches to the same problem. Python provides a measurement tool that answers those questions immediately.
For example, it might be tempting to use the tuple packing and unpacking feature instead of the traditional approach to swapping arguments. The timeit module quickly demonstrates a modest performance advantage:
>>> from timeit import Timer
>>> Timer('t=a; a=b; b=t', 'a=1; b=2').timeit()
0.02523441100000001
>>> Timer('a,b = b,a', 'a=1; b=2').timeit()
0.02395027999999999
In contrast to timeit's fine level of granularity, the profile and pstats modules provide tools for identifying time-critical sections in larger blocks of code.
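Timings vary between machines and runs, so no particular numbers should be expected. The sketch below uses the module-level timeit.timeit() and timeit.repeat() functions with an explicit iteration count (100,000 is an arbitrary choice) and takes the minimum of several repeats, a common way to reduce noise.

```python
import timeit

# Time each swap approach over a fixed number of iterations.
swap_with_temp = timeit.timeit('t=a; a=b; b=t', setup='a=1; b=2', number=100_000)
swap_with_tuple = timeit.timeit('a,b = b,a', setup='a=1; b=2', number=100_000)

# repeat() returns one total per run; the minimum is the least disturbed run.
best_of_five = min(
    timeit.repeat('a,b = b,a', setup='a=1; b=2', repeat=5, number=100_000)
)

print(swap_with_temp, swap_with_tuple, best_of_five)
```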
Quality Control
One approach for developing high-quality software is to write tests for each function as it is developed and to run those tests frequently during the development process.
The doctest module provides a tool for scanning a module and validating tests embedded in a program's docstrings. Test construction is as simple as cutting-and-pasting a typical call along with its results into the docstring. This improves the documentation by providing the user with an example and it allows the doctest module to make sure the code remains true to the documentation:
def average(values):
    """Computes the arithmetic mean of a list of numbers.

    >>> print(average([20, 30, 70]))
    40.0
    """
    return sum(values) / len(values)

import doctest
doctest.testmod()  # Automatically validate the embedded tests
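doctest.testmod() checks the whole module it is called from. To check a single function and inspect the outcome in code, one possibility is the lower-level DocTestFinder and DocTestRunner classes. This is a sketch of that approach, not the only way to do it; module=False simply tells the finder not to look up the defining module.

```python
import doctest


def average(values):
    """Computes the arithmetic mean of a list of numbers.

    >>> print(average([20, 30, 70]))
    40.0
    """
    return sum(values) / len(values)


finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)

# Collect the examples from average's docstring and run each of them.
for test in finder.find(average, name="average", module=False,
                        globs={"average": average}):
    runner.run(test)

# summarize() returns a (failed, attempted) named tuple.
results = runner.summarize(verbose=False)
print(results.failed, results.attempted)
```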
The unittest module is not as effortless as the doctest module, but it allows a more comprehensive set of tests to be maintained in a separate file:
import unittest

class TestStatisticalFunctions(unittest.TestCase):

    def test_average(self):
        self.assertEqual(average([20, 30, 70]), 40.0)
        self.assertEqual(round(average([1, 5, 7]), 1), 4.3)
        with self.assertRaises(ZeroDivisionError):
            average([])
        with self.assertRaises(TypeError):
            average(20, 30, 70)

unittest.main()  # Calling from the command line invokes all tests
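unittest.main() exits the interpreter when it finishes, which is inconvenient in an interactive session. As a sketch of an alternative, the tests can be loaded and run explicitly so the result object can be inspected. The average function is repeated here so the example is self-contained, and the report is sent to an in-memory buffer purely to keep the output quiet.

```python
import io
import unittest


def average(values):
    """Computes the arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


class TestStatisticalFunctions(unittest.TestCase):

    def test_average(self):
        self.assertEqual(average([20, 30, 70]), 40.0)
        self.assertEqual(round(average([1, 5, 7]), 1), 4.3)
        with self.assertRaises(ZeroDivisionError):
            average([])
        with self.assertRaises(TypeError):
            average(20, 30, 70)


# Load the test case and run it without calling sys.exit().
suite = unittest.TestLoader().loadTestsFromTestCase(TestStatisticalFunctions)
report = io.StringIO()
result = unittest.TextTestRunner(stream=report, verbosity=0).run(suite)

print(result.testsRun, result.wasSuccessful())
```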
Batteries Included
Python has a "batteries included" philosophy. This is best exemplified by the comprehensive functionality of its larger packages. For example:
- The xmlrpc.client and xmlrpc.server modules make implementing remote procedure calls an almost trivial task. Despite the modules' names, no direct knowledge or handling of XML is needed.
- The email package is a library for managing email messages, including MIME and other RFC 2822-based message documents. Unlike smtplib and poplib, which actually send and receive messages, the email package has a complete toolset for building or decoding complex message structures (including attachments) and for implementing internet encoding and header protocols.
- The json package provides robust support for parsing this popular data interchange format.
- The csv module supports direct reading and writing of files in Comma-Separated Value format, commonly supported by databases and spreadsheets. XML processing is supported by the xml.etree.ElementTree, xml.dom and xml.sax packages.
- The sqlite3 module is a wrapper for the SQLite database library, providing a persistent database that can be updated and accessed using slightly nonstandard SQL syntax.
- Internationalization is supported by a number of modules including gettext, locale, and the codecs package.
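As a brief illustration of the json, csv, and sqlite3 modules, the sketch below pushes the same small data set through each of them. The records, field names, and table name are invented for the example, and everything stays in memory so nothing is written to disk.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample records used only for this illustration.
records = [
    {"name": "primes.py", "lines": 120},
    {"name": "quote.py", "lines": 45},
]

# json: serialize to text and parse it back.
restored = json.loads(json.dumps(records))

# csv: write rows to an in-memory buffer, then read them back as dicts.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "lines"])
writer.writeheader()
writer.writerows(records)
buffer.seek(0)
csv_rows = list(csv.DictReader(buffer))  # csv values come back as strings

# sqlite3: store the records in an in-memory database and query them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (name TEXT, lines INTEGER)")
conn.executemany("INSERT INTO files VALUES (:name, :lines)", records)
total_lines = conn.execute("SELECT SUM(lines) FROM files").fetchone()[0]
conn.close()

print(restored == records, csv_rows[0], total_lines)
```

Note that json preserves the integer type while csv returns every field as text, which is worth keeping in mind when choosing between the two formats.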