Python Multiprocessing

Multiprocessing is a programming technique that allows the concurrent execution of multiple processes, making efficient use of system resources and improving performance, especially for CPU-bound tasks. In Python it is the standard way to achieve true parallelism, since it sidesteps the Global Interpreter Lock discussed below.

Key Concepts of Multiprocessing

Processes
A process is an independent program running in its own memory space. Each process has its own data and state, which avoids issues related to shared memory and threading.

Concurrency
Multiprocessing enables concurrent execution, allowing multiple processes to run simultaneously, taking full advantage of multi-core processors.

Inter-Process Communication (IPC)
Since processes do not share memory, they need a way to communicate. Common IPC mechanisms include:

Pipes: Connect two processes; Python's multiprocessing.Pipe() returns a pair of connection objects that are two-way (duplex) by default (see the sketch after this list).

Queues: A thread- and process-safe way to pass messages between processes.

Shared Memory: A way for processes to access a common memory space.
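
As a minimal sketch of pipe-based IPC, the following uses the documented multiprocessing.Pipe() API to send a message from a child process back to the parent; the function name send_greeting is illustrative:
import multiprocessing

def send_greeting(conn):
    conn.send("Hello from the child process")  # write into the pipe
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = multiprocessing.Pipe()  # duplex by default
    p = multiprocessing.Process(target=send_greeting, args=(child_conn,))
    p.start()
    print(parent_conn.recv())  # blocks until the child sends a message
    p.join()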

Load Balancing
Multiprocessing can help balance workloads across multiple CPU cores, improving performance for tasks that can be parallelized.

Task Distribution
Tasks can be divided among multiple processes, each handling a part of the workload, as the sketch below illustrates.
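
A minimal sketch of manual task distribution, assuming a simple interleaved chunking scheme (the names process_chunk and n_workers are illustrative):
import multiprocessing

def process_chunk(chunk):
    # Each worker handles its own slice of the workload.
    print("Processing:", chunk)

if __name__ == "__main__":
    data = list(range(8))
    n_workers = 2
    # Split the data into one chunk per worker.
    chunks = [data[i::n_workers] for i in range(n_workers)]

    workers = [multiprocessing.Process(target=process_chunk, args=(c,))
               for c in chunks]
    for w in workers:
        w.start()
    for w in workers:
        w.join()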

Advantages of Multiprocessing

True Parallelism: Unlike threading, multiprocessing can achieve true parallelism, utilizing multiple CPU cores effectively.

Avoiding Global Interpreter Lock (GIL): In Python, the GIL can limit the performance of multi-threaded programs; multiprocessing bypasses this limitation.
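
A rough way to observe this is to time a CPU-bound function run sequentially and then through a process pool; the function below and the input sizes are illustrative, and the measured speedup depends on the machine's core count:
import multiprocessing
import time

def cpu_bound(n):
    # Busy arithmetic loop; threads could not run these in parallel under the GIL.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    inputs = [2_000_000] * 4

    start = time.perf_counter()
    [cpu_bound(n) for n in inputs]
    print(f"Sequential: {time.perf_counter() - start:.2f}s")

    start = time.perf_counter()
    with multiprocessing.Pool() as pool:  # one worker per CPU core by default
        pool.map(cpu_bound, inputs)
    print(f"Pool:       {time.perf_counter() - start:.2f}s")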

Disadvantages

Overhead: Creating and managing multiple processes can introduce overhead compared to threads.

Memory Consumption: Each process has its own memory space, which can lead to higher memory usage compared to threads sharing the same memory space.

Complexity: Managing communication and data sharing between processes can be more complex than in a multi-threaded environment.

Use Cases

Data Processing: Handling large datasets or computations that can be broken into smaller tasks.

Web Servers: Serving multiple client requests simultaneously.

Parallel Computing: Performing tasks that require heavy computation.

Multiprocessing is a powerful approach for maximizing resource utilization and improving performance in computationally intensive applications!

The multiprocessing module in Python enables the creation and management of multiple processes, allowing for true parallelism. This is especially useful for CPU-bound tasks, as each process can run on a separate CPU core, unlike multithreading, which is constrained by the Global Interpreter Lock (GIL) in Python.

1. Importing the multiprocessing Module

Start by importing the multiprocessing module, which provides classes and functions for creating and managing processes.
import multiprocessing

2. Creating and Starting a Process

To create a new process, define a target function that the process will execute. Then, instantiate a Process object, specifying the target function, and use start() to launch the process.
import multiprocessing
import time

def print_numbers():
    for i in range(1, 6):
        print(f"Number: {i}")
        time.sleep(1)

if __name__ == "__main__":
    # The __main__ guard keeps the process-creation code from being
    # re-executed when child interpreters import this module.
    # Create and start the process
    number_process = multiprocessing.Process(target=print_numbers)
    number_process.start()

    # Wait for the process to complete
    number_process.join()
    print("Main process finished execution.")

Output:

Number: 1
Number: 2
Number: 3
Number: 4
Number: 5
Main process finished execution.
Explanation: The print_numbers() function runs in a separate process; the main process continues independently until join() blocks, waiting for the child to finish. The if __name__ == "__main__": guard is required on platforms that start child processes by spawning a fresh interpreter (Windows, and macOS by default), so it appears in all of the examples that follow.

3. Creating Multiple Processes

As with threads, multiple processes can be created and run in parallel. Each process executes independently and can run a different function.
import multiprocessing
import time

def task_1():
    for i in range(3):
        print("Task 1 - Step", i + 1)
        time.sleep(1)

def task_2():
    for i in range(3):
        print("Task 2 - Step", i + 1)
        time.sleep(1)

if __name__ == "__main__":
    # Create processes for both tasks
    process_1 = multiprocessing.Process(target=task_1)
    process_2 = multiprocessing.Process(target=task_2)

    # Start both processes
    process_1.start()
    process_2.start()

    # Wait for both processes to complete
    process_1.join()
    process_2.join()
    print("Both processes completed.")

Output:

Task 1 - Step 1
Task 2 - Step 1
Task 1 - Step 2
Task 2 - Step 2
Task 1 - Step 3
Task 2 - Step 3
Both processes completed.
Explanation: task_1 and task_2 execute in parallel. Output order may vary based on process scheduling.

4. Using Process Subclasses

A custom process class can be created by subclassing Process and overriding its run() method. This approach provides more control over process behavior.
import multiprocessing
import time

class CustomProcess(multiprocessing.Process):
    def run(self):
        # run() is the method executed when the process starts.
        for i in range(3):
            print(f"{self.name} - Step {i + 1}")
            time.sleep(1)

if __name__ == "__main__":
    # Create and start two custom processes
    process_a = CustomProcess(name="Process A")
    process_b = CustomProcess(name="Process B")

    process_a.start()
    process_b.start()

    process_a.join()
    process_b.join()
    print("Custom processes finished.")

Output:

Process A - Step 1
Process B - Step 1
Process A - Step 2
Process B - Step 2
Process A - Step 3
Process B - Step 3
Custom processes finished.
Explanation: Each CustomProcess instance runs independently; the name passed to the constructor distinguishes their output.

5. Sharing Data Between Processes Using Queue

Since each process has its own memory space, a Queue is used to share data between processes. Queue provides a FIFO structure for data transfer.
import multiprocessing

def calculate_square(numbers, queue):
    for n in numbers:
        queue.put(n * n)  # results are transferred through the queue

if __name__ == "__main__":
    numbers = [1, 2, 3, 4]
    queue = multiprocessing.Queue()

    # Create process
    p = multiprocessing.Process(target=calculate_square, args=(numbers, queue))
    p.start()
    p.join()

    # Retrieve results from the queue
    squares = []
    while not queue.empty():
        squares.append(queue.get())

    print("Squares:", squares)

Output:

Squares: [1, 4, 9, 16]
Explanation: calculate_square() puts each square on the queue, and the main process retrieves the results after p.join(). (For larger payloads, drain the queue before joining: a child process does not exit until everything it has put on a queue has been consumed.)

6. Using Pool for Parallel Task Management

The Pool class manages multiple worker processes to execute tasks concurrently. This is helpful for distributing tasks among processes automatically.
import multiprocessing

def square(n):
    return n * n

if __name__ == "__main__":
    # A pool of 4 worker processes shares the work of map()
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(square, range(1, 6))

    print("Squares:", results)

Output:

Squares: [1, 4, 9, 16, 25]
Explanation: The map() method splits the input range among the pool's 4 worker processes and returns the results in input order, handling task distribution automatically.
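
For target functions that take several arguments, Pool.starmap() unpacks each tuple of arguments; a short sketch (the power function is illustrative):
import multiprocessing

def power(base, exponent):
    return base ** exponent

if __name__ == "__main__":
    with multiprocessing.Pool(processes=2) as pool:
        # Each tuple is unpacked into power(base, exponent)
        results = pool.starmap(power, [(2, 3), (3, 2), (4, 2)])
    print(results)  # [8, 9, 16]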

7. Synchronizing Processes with Lock

A Lock prevents race conditions when multiple processes access a shared resource simultaneously. Only one process can acquire the lock at a time.
import multiprocessing

def withdraw(balance, lock, amount):
    # The lock and the shared balance must be passed to the child;
    # ordinary global variables are not shared between processes.
    with lock:
        if balance.value >= amount:
            balance.value -= amount
            print(f"Withdrew {amount}. Remaining balance: {balance.value}")
        else:
            print("Insufficient funds")

if __name__ == "__main__":
    balance = multiprocessing.Value('i', 100)  # shared integer balance
    lock = multiprocessing.Lock()

    # Creating two processes that attempt to withdraw money simultaneously
    p1 = multiprocessing.Process(target=withdraw, args=(balance, lock, 50))
    p2 = multiprocessing.Process(target=withdraw, args=(balance, lock, 80))

    p1.start()
    p2.start()

    p1.join()
    p2.join()
    print("Final balance:", balance.value)

Output (assuming p1 acquires the lock first; the order may vary):

Withdrew 50. Remaining balance: 50
Insufficient funds
Final balance: 50
Explanation: with lock ensures that only one process modifies the balance at a time. The balance itself must be a shared Value; a plain global variable would be copied into each child process, so changes made there would never reach the parent.

8. Sharing Data Using Value and Array

Value and Array objects provide shared-memory storage accessible by multiple processes: Value holds a single value of a given C type, and Array holds a fixed-size sequence of one (an Array sketch follows the example below).
import multiprocessing

def increment(shared_counter):
    for _ in range(1000):
        # += on a Value is not atomic, so take its built-in lock
        with shared_counter.get_lock():
            shared_counter.value += 1

if __name__ == "__main__":
    counter = multiprocessing.Value('i', 0)  # Shared integer with initial value 0

    # Create and start two processes
    p1 = multiprocessing.Process(target=increment, args=(counter,))
    p2 = multiprocessing.Process(target=increment, args=(counter,))

    p1.start()
    p2.start()

    p1.join()
    p2.join()

    print("Final counter value:", counter.value)

Output:

Final counter value: 2000
Explanation: The Value object counter lives in shared memory. Because += on a Value is not atomic, each increment is wrapped in the object's built-in lock via get_lock(); without it, concurrent updates could be lost and the final count would fall short of 2000.
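
Array works the same way for a fixed-size sequence of a single C type; a minimal sketch (the double_all function is illustrative):
import multiprocessing

def double_all(shared_array):
    for i in range(len(shared_array)):
        shared_array[i] *= 2  # writes are visible to the parent

if __name__ == "__main__":
    arr = multiprocessing.Array('i', [1, 2, 3, 4])  # shared integer array
    p = multiprocessing.Process(target=double_all, args=(arr,))
    p.start()
    p.join()
    print(list(arr))  # [2, 4, 6, 8]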

9. Managing Process Lifetime with join() and is_alive()

The join() method makes the main process wait for other processes to complete. is_alive() checks if a process is still running.
import multiprocessing
import time

def task():
    time.sleep(1)
    print("Task complete.")

if __name__ == "__main__":
    # Start a process
    p = multiprocessing.Process(target=task)
    p.start()

    # Check if the process is alive
    print("Process is alive:", p.is_alive())

    # Wait for the process to complete
    p.join()
    print("Process is alive after join:", p.is_alive())

Output:

Process is alive: True
Task complete.
Process is alive after join: False
Explanation: is_alive() shows the process status, returning True when active and False after completion.

10. Using Manager for Shared Data Structures

The Manager object provides shared dictionaries, lists, and other data structures for use across processes.
import multiprocessing
import time

def add_to_list(shared_list):
    for i in range(5):
        shared_list.append(i)
        time.sleep(0.1)

if __name__ == "__main__":
    # A Manager provides proxy objects that live in a server process
    with multiprocessing.Manager() as manager:
        shared_list = manager.list()

        # Create and start two processes
        p1 = multiprocessing.Process(target=add_to_list, args=(shared_list,))
        p2 = multiprocessing.Process(target=add_to_list, args=(shared_list,))

        p1.start()
        p2.start()

        p1.join()
        p2.join()

        print("Final shared list:", list(shared_list))

Output (element order may vary as the two processes interleave):

Final shared list: [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
Explanation: The manager.list() allows both processes to append to the same list, which is shared across processes.
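
The same pattern works with manager.dict() for shared key-value data; a short sketch (the record_square function is illustrative):
import multiprocessing

def record_square(shared_dict, n):
    shared_dict[n] = n * n

if __name__ == "__main__":
    with multiprocessing.Manager() as manager:
        shared_dict = manager.dict()
        workers = [multiprocessing.Process(target=record_square,
                                           args=(shared_dict, n))
                   for n in range(1, 4)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        print(dict(shared_dict))  # {1: 1, 2: 4, 3: 9} (key order may vary)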

Summary

The multiprocessing module in Python provides robust tools for process management, allowing for true parallelism by utilizing separate CPU cores. By sharing data carefully and managing processes efficiently, it enables effective concurrency for CPU-bound tasks.
