We will provide a suggested roadmap for developing an understanding of concurrency in general and its implementation in Python.
The following questions will be addressed:
- What is concurrency?
- What are the different flavors of concurrency?
- How does Python implement concurrency?
The roadmap will provide brief descriptions of key concepts and how they fit together. For those who want to delve deeper into any concept, links will be provided.
Motivation for Blog Creation
I was "volunteered" to give a presentation on Python concurrency at the Pittsburgh Python users group.
One of the questions that comes up for people is whether or not to attend the presentation. It is hard to decide from a brief description. So, my solution is to create a blog post for the presentation ahead of time.
Another question that comes up is how to manage the quantity of material presented in a short period of time. This will be especially true in the case of Python concurrency. So, having the blog post ahead of time will allow people to focus on their interactions with others and spend less time taking notes.
Also, some will take advantage of the blog post to prepare questions and comments for the presentation.
Definitions
Compared to computers, humans perceive events slowly. This human slowness allows us to play tricks when implementing software. Some of the tricks center around the ability of the software to switch between "stuff" without the human ever noticing. The "stuff" goes by names like threads and tasks. These names will be explicitly defined later in the blog.
Number of Cores: Many - Multiprocessing
One flavor of concurrency is multiprocessing. Let's begin to create a definition for multiprocessing with a picture.
Here we have individuals juggling independently of each other. There is no interaction between them. Likewise, a process is a unit of execution that shares no resources with any other process. The operating system (OS) assigns a process its resources and that is that. Depending on the computer load, the OS might suspend the process and assign those resources to another process. Note that there is no interaction between the suspended process and the new process. If more information is desired regarding multiprocessing, refer to the
Wikipedia multiprocessing article.
Working Only with Python Code - Use Python's Multiprocessing Module
If more information is desired regarding Python's implementation of multiprocessing when only Python code is involved, refer to its
multiprocessing module. It provides queues, events, worker pools, and other marshaling paradigms.
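As a minimal illustration (not taken from the blog), here is a sketch of a worker pool from the multiprocessing module; the function and values are placeholders chosen for demonstration:

```python
import multiprocessing

def square(n):
    # Runs in a separate worker process with its own memory.
    return n * n

if __name__ == "__main__":
    # A pool of worker processes; map() distributes the work and
    # collects the results in the original order.
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(square, range(5))
    print(results)  # [0, 1, 4, 9, 16]
```

The `if __name__ == "__main__":` guard matters here: worker processes may re-import the module, and the guard keeps them from recursively spawning more workers.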
Working with Non-Python Code - Use Python's Subprocess Module
Use cases exist where we have to concurrently invoke non-Python code. In such circumstances, the
subprocess module should be used. Unfortunately, it does not provide queues, events, worker pools, and other marshaling paradigms. This is to be expected because each external interface can be different.
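As a hedged sketch (not from the blog), here is the basic pattern for invoking a non-Python program with the subprocess module; the `echo` command is assumed to exist on the system:

```python
import subprocess

# Run an external program and capture its output. stdout=subprocess.PIPE
# works on Python 3.6 (capture_output= was added in 3.7).
result = subprocess.run(["echo", "hello"], stdout=subprocess.PIPE)
print(result.stdout.decode())  # hello
print(result.returncode)       # 0 on success
```

Note that any coordination (queues, pools of workers, and so on) has to be built by hand around calls like this, which is the complexity trade-off mentioned above.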
Number of Cores: 1 - Multitasking
This other flavor of concurrency is called multitasking. In turn, multitasking has 2 sub-flavors: pre-emptive multitasking and cooperative multitasking.
Pre-Emptive Multitasking (Threading)
As before, we will begin to create a definition with a picture.
If you notice, there is one individual who can, at will, pick up a ball and place it in another location. This is done without examining the state of the ball. Pre-emptive multitasking is the same thing except that the individual is replaced by the OS and the balls are replaced by
threads.
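As a minimal illustration (not from the blog), a sketch of creating and joining threads with the threading module; the worker function and values are placeholders:

```python
import threading

def worker(name, results):
    # Each thread runs this function. The OS decides when to switch
    # between threads (pre-emption); the code has no say in it.
    results[name] = name * 2

results = {}
threads = [threading.Thread(target=worker, args=(i, results)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for every thread to finish before reading results
print(results)  # {0: 0, 1: 2, 2: 4}
```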
Below are the standard threading questions:
- Is the code thread safe? For further details, refer to "Wikipedia: Thread safety."
- Another way of asking the above question, using different words: can multiple threads touch the same data at the same time and leave it in an inconsistent state?
- How many threads to create? Experimentation is usually required to answer this question.
Please note that the above is not "just 3 questions." Each question by itself involves a lot of complexity.
Cooperative Multitasking (Asyncio)
For the third time, we will begin to create a definition with a picture.
If you notice the 3 individuals have to cooperate very closely in order to successfully juggle. This cooperation is the key to cooperative multitasking. Regrettably, that is as far as the analogy will take us this time.
In the case of
asyncio, the entities doing work are called
tasks. Tasks cannot be interrupted at arbitrary points; they give up control only when they choose to. This means they do not need the usual thread-safety machinery, which keeps them simple. Also, each task can be in a "ready to run" or "waiting" state. The coordination of the tasks is accomplished via the simple concept of an
Event Loop. This again reduces the complexity.
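As a minimal illustration of tasks and the event loop (my own sketch, not the blog's code; labels and delays are arbitrary):

```python
import asyncio

async def say(label, delay, out):
    # "await" is the point where this task voluntarily hands control
    # back to the event loop so another task can run.
    await asyncio.sleep(delay)
    out.append(label)

async def main(out):
    # Schedule two tasks; the event loop interleaves them.
    await asyncio.gather(say("slow", 0.2, out), say("fast", 0.1, out))

out = []
loop = asyncio.new_event_loop()
loop.run_until_complete(main(out))
loop.close()
print(out)  # ['fast', 'slow'] -- tasks finish in delay order, not start order
```

Even though "slow" was scheduled first, "fast" finishes first: the event loop ran it while "slow" was in its waiting state.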
The advantage of cooperative multitasking is that you don't have to worry about thread safety. The con is the added complexity of writing the software to make sure that everything cooperates.
Use Cases
Now that we have discussed the concepts and tools, let's determine when they should be used.
All code was executed using Python 3.6.7.
Use Case 1: I/O Bound
The I/O bound condition is created by the download of internet web pages. In this situation, the CPU spends a large portion of its time waiting for data to arrive over the internet.
Non-Concurrent Code
To go to the non-concurrent code from Jim Anderson's blog, click
here. The code requires that the
requests module is installed.
Pro: Code simplicity
Con: Slow compared to the other approaches.
Use Threading
To go to the code that uses threading from Jim Anderson's blog, click
here.
Notice the change in the download_all_sites function between the non-concurrent code and the code that uses threading. The corresponding code snippets are below.
Non-Concurrent Code
def download_all_sites(sites):
    with requests.Session() as session:
        for url in sites:
            download_site(url, session)
Threading: Code Snippet 1
def download_all_sites(sites):
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        executor.map(download_site, sites)
Let's examine
ThreadPoolExecutor. As the name implies, you get a thread, a pool and an
executor. In this particular case, you get a pool of 5 threads. The executor is the part that does the coordination.
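To see the executor in isolation, here is a self-contained sketch (my own illustration, not the blog's code; the function just stands in for an I/O-bound call):

```python
import concurrent.futures

def fetch_length(word):
    # Stand-in for an I/O-bound operation such as downloading a page.
    return len(word)

words = ["alpha", "beta", "gamma"]
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # map() farms each item out to one of the pool's threads and
    # returns the results in the original input order.
    lengths = list(executor.map(fetch_length, words))
print(lengths)  # [5, 4, 5]
```

The `with` block also takes care of shutting the pool down and waiting for all threads to finish.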
Thread safety is addressed by using
Thread-Local Data in the get_session function. The associated code snippet is below. Notice that the thread_local object is created once, not once for each thread.
Threading: Code Snippet 2
thread_local = threading.local()

def get_session():
    if not getattr(thread_local, "session", None):
        thread_local.session = requests.Session()
    return thread_local.session
Please note that
requests.Session() is
not thread-safe.
Pro: Faster than non-concurrent code
On my laptop, the code that used threading was about 0.2 seconds faster than the non-concurrent code. This may not seem like much, but at scale it makes a difference. Note that elapsed time is a good way to start assessing performance. If more precise measurements are needed, consider using
timeit with
process_time().
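As a quick illustration of the difference between elapsed time and CPU time (my own sketch; the sleep stands in for waiting on I/O):

```python
import time

start_wall = time.perf_counter()   # wall-clock (elapsed) time
start_cpu = time.process_time()    # CPU time only; waiting does not count

time.sleep(0.1)  # stand-in for waiting on data to arrive over the network

elapsed = time.perf_counter() - start_wall
cpu = time.process_time() - start_cpu
print("elapsed: %.3fs, CPU: %.3fs" % (elapsed, cpu))
```

Elapsed time comes out around 0.1 seconds while CPU time stays near zero: the process was waiting, not computing. That gap is the signature of an I/O-bound workload.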
Con: Dramatic increase in complexity
We now have to deal with
- Executors
- Pools
- Thread Safety
- Threads
Use Asyncio
To go to the code that uses asyncio from Jim Anderson's blog, click
here. The code requires that the
aiohttp module is installed.
Once again, let's focus on the download_all_sites function.
Asyncio Code
async def download_all_sites(sites):
    async with aiohttp.ClientSession() as session:
        tasks = []
        for url in sites:
            task = asyncio.ensure_future(download_site(session, url))
            tasks.append(task)
        await asyncio.gather(*tasks, return_exceptions=True)
Let's start by noticing that the keyword async prefixes the function definition. This turns the function into a
coroutine. Coroutines are code blocks that have the ability to yield execution to other code blocks.
Next, notice that Session is created only once and shared among all the tasks. This is possible because a task can't be interrupted and leave the Session in a bad state. So, the tasks can share the Session.
The
await keyword is how the task hands control back to the event loop.
Now that we have the tasks, we need to start the event loop and give it the tasks. The code that does this lives in __main__.
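A sketch of that startup pattern (the site list and the echoing placeholder coroutine are my own stand-ins, not the blog's code):

```python
import asyncio

async def download_all_sites(sites):
    # Placeholder for the coroutine discussed above; it just echoes the
    # sites so the startup pattern can be run end to end.
    return list(sites)

if __name__ == "__main__":
    # Hypothetical site list; the real code downloads each URL.
    sites = ["https://example.com/a", "https://example.com/b"]
    # Python 3.6 style: obtain an event loop and run the top-level
    # coroutine to completion. (new_event_loop is used here so the sketch
    # also runs cleanly on newer Pythons; 3.6-era code typically called
    # asyncio.get_event_loop() instead.)
    loop = asyncio.new_event_loop()
    result = loop.run_until_complete(download_all_sites(sites))
    loop.close()
    print(result)
```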
If you are using Python 3.7 or higher, you can use
asyncio.run() instead.
Pro: Noticeably faster than using threads
On my laptop, the code that used asyncio was about 5 seconds faster than the code using threads. Notice that going from non-concurrent code to threading only improved performance by about 0.2 seconds, while going from threading to asyncio improved it by about 5 seconds. The second improvement is more than an order of magnitude larger!
Con: More complex syntax
The code becomes littered with "async." You need to create and start tasks. You then need to make sure that all the tasks complete before the session is terminated. Last but not least, you have to take care of the event loop.
Use asyncio when you can - Threading when you must
The following is a direct quote from Jim Anderson's article
- For I/O-bound problems, there’s a general rule of thumb in the Python community: “Use asyncio when you can, threading when you must.”
This is such a critical rule of thumb that it deserves its own section.
Use Multiprocessing
To go to the code that uses multiprocessing from Jim Anderson's blog, click
here.
As in the previous cases, let's focus on the download_all_sites function.
Multiprocessing Code Snippet 1
def download_all_sites(sites):
    with multiprocessing.Pool(initializer=set_global_session) as pool:
        pool.map(download_site, sites)
Let's start by noticing
multiprocessing.Pool(). It creates a pool of worker processes to which jobs can be submitted. Since the processes parameter was not specified, it defaults to the number of CPUs in the system. Picking the number of worker processes is just as hard as picking the number of threads. The other thing to notice is that it has a
parallel map implementation. So pool.map(download_site, sites) maps each URL to the function download_site.
Pro: Can take advantage of all the cores on the computer.
Con: Complexity of trying to align Session and processes results in the weirdness of "global session." See code snippet below.
Multiprocessing Code Snippet 2
session = None

def set_global_session():
    global session
    if not session:
        session = requests.Session()
Since every process has its own memory, they cannot share a Session object. At the other extreme, you don't want to create a Session each time the function is called. The goal is to create one Session per process. This is accomplished by using the initializer parameter as shown in "Multiprocessing Code Snippet 1."
In terms of coding complexity, multiprocessing code is on par with the code that uses threading. In terms of performance, at least on my laptop, it was slower than asyncio but faster than the other 2 approaches. Below is a table of the run times.
Approach Run Time (seconds)
-------------- ------------------
No Concurrency 7.4
Threading 7.3
Multiprocessing 5.8
Asyncio 2.2
Notice how much faster asyncio is.
Use Case 2: CPU Bound
We will continue to use the code snippets from Jim Anderson's blog. These code snippets compute the sum of the square of each number from 1 to a passed-in value to create a CPU bound condition.
Non-Concurrent Code
To go to the non-concurrent code from Jim Anderson's blog, click
here. Nothing much to say. We should be able to do better using multiple cores.
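For readers who don't follow the link, a sketch of the kind of function involved (names follow the snippets later in the post; the return values are added here for illustration):

```python
def cpu_bound(number):
    # Sum of squares from 0 up to number: pure computation, no waiting.
    return sum(i * i for i in range(number))

def find_sums(numbers):
    # Plain loop, one number after another, on a single core.
    return [cpu_bound(n) for n in numbers]

print(find_sums([3, 4]))  # [5, 14]
```

Unlike the I/O bound case, the CPU is busy the whole time here, which is why waiting-based tricks (threads, asyncio) won't help and multiple cores will.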
Use Threading
To go to the code that uses threading from Jim Anderson's blog, click
here. Again, nothing much to say. As expected, the threading approach takes longer than the non-concurrent approach.
Use Multiprocessing
To go to the code that uses multiprocessing from Jim Anderson's blog, click
here.
Just like in the I/O bound multiprocessing case, multiprocessing.Pool() and pool.map() are used.
Multiprocessing Code Snippet
def find_sums(numbers):
    with multiprocessing.Pool() as pool:
        pool.map(cpu_bound, numbers)
As expected, multiprocessing shines on CPU bound problems. It was over 3 times faster than the non-concurrent version of the code.
Pro: Significant performance improvement with little added code complexity.
Code complexity stays low because nothing is shared between the processes.
Con: Splitting up the problem so that each processor can work independently can be difficult.
This issue did not show up in the trivial example that we did. Sometimes, problems are not
embarrassingly parallel.
Use Case 3: Python Concurrent Interaction with Non-Python Code
To Do: Put together code that cycles through directories that invokes Java to convert
DocBook XML to HTML.
As an aside, the above points to the fact that you can use Python instead of bash.
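Until that code is put together, here is a hedged sketch of the shape it might take. The jar name and its command line are hypothetical placeholders, not a real tool's interface:

```python
import pathlib
import subprocess

def build_commands(root):
    # One command per DocBook XML file found under root.
    # "docbook-converter.jar" is a made-up placeholder name.
    return [["java", "-jar", "docbook-converter.jar", str(p)]
            for p in sorted(pathlib.Path(root).rglob("*.xml"))]

def convert_docbook_tree(root):
    # Invoke the external Java converter once per file; check=True
    # raises if the converter exits with a non-zero status.
    for cmd in build_commands(root):
        subprocess.run(cmd, check=True)
```

Splitting command construction from execution keeps the directory-walking logic testable without a Java installation, and is also the natural seam for adding concurrency later (e.g. handing the commands to a pool).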
Alternatives
Perhaps Concurrency Cost is Not Worth It?
As the preceding material indicates, concurrency is extremely complex with a lot of nuances. Which form of concurrency do I use? Is my code thread safe? Did I make sure that all my tasks completed prior to the Session terminating? And so on.
One option might be to just wait. Suppose the business people want to be able to run a report several times during the day but aren't willing to allocate resources to make that happen. In that case, they will have to wait for the overnight run to complete.
Another option is to just apply brute force. Buy a bigger machine or faster disks. There is nothing wrong with brute force.
Let me be very blunt: DO NOT USE CONCURRENCY UNTIL YOU HAVE EXHAUSTED ALL OTHER OPTIONS!
Summary
Concurrency is a big topic. Python's implementation of concurrency is an even bigger topic. The goal of the preceding material was to provide a roadmap for learning about concurrency and Python's implementation of it at a high level. The roadmap has embedded links which provide jumping off points for further detailed learning.
References