Python Multiprocessing
Multiprocessing is a programming technique that allows for the concurrent execution of multiple processes, enabling efficient use of system resources and improved performance, especially for CPU-bound tasks. It is particularly useful in Python and other programming languages that support this model.
Key Concepts of Multiprocessing
Processes
A process is an independent program running in its own memory space. Each process has its own data and state, which avoids issues related to shared memory and threading.
Concurrency
Multiprocessing enables concurrent execution, allowing multiple processes to run simultaneously, taking full advantage of multi-core processors.
Inter-Process Communication (IPC)
Since processes do not share memory, they need a way to communicate. Common IPC mechanisms include:
Pipes: Allow one-way communication between processes.
Queues: A thread- and process-safe way to pass messages between processes.
Shared Memory: A way for processes to access a common memory space.
Load Balancing
Multiprocessing can help balance workloads across multiple CPU cores, improving performance for tasks that can be parallelized.
Task Distribution
Tasks can be divided among multiple processes, each handling a part of the workload.
Advantages of Multiprocessing
True Parallelism: Unlike threading, multiprocessing can achieve true parallelism, utilizing multiple CPU cores effectively.
Avoiding the Global Interpreter Lock (GIL): In Python, the GIL can limit the performance of multi-threaded programs; multiprocessing bypasses this limitation.
Disadvantages
Overhead: Creating and managing multiple processes introduces more overhead than threads.
Memory Consumption: Each process has its own memory space, which can lead to higher memory usage than threads sharing one address space.
Complexity: Managing communication and data sharing between processes can be more complex than in a multi-threaded environment.
Use Cases
Data Processing: Handling large datasets or computations that can be broken into smaller tasks.
Web Servers: Serving multiple client requests simultaneously.
Parallel Computing: Performing tasks that require heavy computation.
Multiprocessing is a powerful approach for maximizing resource utilization and improving performance in computationally intensive applications!
The multiprocessing module in Python enables the creation and management of multiple processes, allowing for true parallelism. This is especially useful for CPU-bound tasks, as each process can run on a separate CPU core, unlike multithreading, which is constrained by the Global Interpreter Lock (GIL) in Python. Note that on platforms that start processes with the spawn method (such as Windows and macOS), process-creating code must be placed under an if __name__ == "__main__": guard to avoid recursive process creation; the examples below omit the guard for brevity.
1. Importing the multiprocessing Module
Start by importing the multiprocessing module, which provides classes and functions for creating and managing processes.
import multiprocessing
2. Creating and Starting a Process
To create a new process, define a target function that the process will execute. Then, instantiate a Process object, specifying the target function, and use start() to launch the process.
import multiprocessing
import time
def print_numbers():
    for i in range(1, 6):
        print(f"Number: {i}")
        time.sleep(1)
# Create and start the process
number_process = multiprocessing.Process(target=print_numbers)
number_process.start()
# Wait for the process to complete
number_process.join()
print("Main process finished execution.")
Output:
Number: 1
Number: 2
Number: 3
Number: 4
Number: 5
Main process finished execution.
Explanation: The print_numbers() function runs in a separate process, allowing the main process to continue independently until the join() call waits for it to finish.
3. Creating Multiple Processes
Like threading, multiple processes can be created and executed in parallel. Each process runs independently and can execute different functions.
import multiprocessing
import time
def task_1():
    for i in range(3):
        print("Task 1 - Step", i + 1)
        time.sleep(1)
def task_2():
    for i in range(3):
        print("Task 2 - Step", i + 1)
        time.sleep(1)
# Create processes for both tasks
process_1 = multiprocessing.Process(target=task_1)
process_2 = multiprocessing.Process(target=task_2)
# Start both processes
process_1.start()
process_2.start()
# Wait for both processes to complete
process_1.join()
process_2.join()
print("Both processes completed.")
Output:
Task 1 - Step 1
Task 2 - Step 1
Task 1 - Step 2
Task 2 - Step 2
Task 1 - Step 3
Task 2 - Step 3
Both processes completed.
Explanation: task_1 and task_2 execute in parallel. Output order may vary based on process scheduling.
4. Using Process Subclasses
A custom process class can be created by subclassing Process and overriding its run() method. This approach provides more control over process behavior.
class CustomProcess(multiprocessing.Process):
    def run(self):
        for i in range(3):
            print(f"{self.name} - Step {i + 1}")
            time.sleep(1)
# Create and start two custom processes
process_a = CustomProcess(name="Process A")
process_b = CustomProcess(name="Process B")
process_a.start()
process_b.start()
process_a.join()
process_b.join()
print("Custom processes finished.")
Output:
Process A - Step 1
Process B - Step 1
Process A - Step 2
Process B - Step 2
Process A - Step 3
Process B - Step 3
Custom processes finished.
Explanation: Each CustomProcess instance runs independently, with unique names for distinguishing their output.
5. Sharing Data Between Processes Using Queue
Since each process has its own memory space, a Queue is used to share data between processes. Queue provides a FIFO structure for data transfer.
def calculate_square(numbers, queue):
    for n in numbers:
        queue.put(n * n)
numbers = [1, 2, 3, 4]
queue = multiprocessing.Queue()
# Create process
p = multiprocessing.Process(target=calculate_square, args=(numbers, queue))
p.start()
p.join()
# Retrieve results from the queue
squares = []
while not queue.empty():
squares.append(queue.get())
print("Squares:", squares)
Output:
Squares: [1, 4, 9, 16]
Explanation: calculate_square() adds each square to the queue, allowing the main process to retrieve the results after p.join().
6. Using Pool for Parallel Task Management
The Pool class manages multiple worker processes to execute tasks concurrently. This is helpful for distributing tasks among processes automatically.
def square(n):
    return n * n
with multiprocessing.Pool(processes=4) as pool:
    results = pool.map(square, range(1, 6))
    print("Squares:", results)
Output:
Squares: [1, 4, 9, 16, 25]
Explanation: The map() method distributes the square calculations across the Pool of 4 worker processes, handling task management automatically.
7. Synchronizing Processes with Lock
A Lock prevents race conditions when multiple processes access a shared resource simultaneously. Only one process can acquire the lock at a time. Keep in mind that ordinary global variables are not shared across processes; truly shared state must use a mechanism such as Value or a Manager.
balance = multiprocessing.Value('i', 100)  # Shared integer; a plain global int would not be shared
lock = multiprocessing.Lock()
def withdraw(balance, lock, amount):
    with lock:
        if balance.value >= amount:
            balance.value -= amount
            print(f"Withdrew {amount}. Remaining balance: {balance.value}")
        else:
            print("Insufficient funds")
# Creating two processes that attempt to withdraw money simultaneously
p1 = multiprocessing.Process(target=withdraw, args=(balance, lock, 50))
p2 = multiprocessing.Process(target=withdraw, args=(balance, lock, 80))
p1.start()
p2.start()
p1.join()
p2.join()
print("Final balance:", balance.value)
Output:
Withdrew 50. Remaining balance: 50
Insufficient funds
Final balance: 50
Explanation: with lock ensures that only one process updates the balance at a time, preventing data corruption. Which withdrawal succeeds depends on which process acquires the lock first.
8. Sharing Data Using Value and Array
Value and Array objects allow for shared data storage accessible by multiple processes.
def increment(shared_counter):
    for _ in range(1000):
        # Hold the Value's built-in lock: += on a shared Value is not atomic
        with shared_counter.get_lock():
            shared_counter.value += 1
counter = multiprocessing.Value('i', 0)  # Shared integer with initial value 0
# Create and start two processes
p1 = multiprocessing.Process(target=increment, args=(counter,))
p2 = multiprocessing.Process(target=increment, args=(counter,))
p1.start()
p2.start()
p1.join()
p2.join()
print("Final counter value:", counter.value)
Output:
Final counter value: 2000
Explanation: The Value object counter is shared, allowing both processes to increment it without losing data.
9. Managing Process Lifetime with join() and is_alive()
The join() method makes the main process wait for other processes to complete. is_alive() checks whether a process is still running.
def task():
    time.sleep(1)
    print("Task complete.")
# Start a process
p = multiprocessing.Process(target=task)
p.start()
# Check if the process is alive
print("Process is alive:", p.is_alive())
# Wait for the process to complete
p.join()
print("Process is alive after join:", p.is_alive())
Output:
Process is alive: True
Task complete.
Process is alive after join: False
Explanation: is_alive() reports the process status, returning True while the process is active and False after it has completed.
10. Using Manager for Shared Data Structures
The Manager object provides shared dictionaries, lists, and other data structures for use across processes.
def add_to_list(shared_list):
    for i in range(5):
        shared_list.append(i)
        time.sleep(0.1)
# Create a Manager list
with multiprocessing.Manager() as manager:
    shared_list = manager.list()
    # Create and start two processes
    p1 = multiprocessing.Process(target=add_to_list, args=(shared_list,))
    p2 = multiprocessing.Process(target=add_to_list, args=(shared_list,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    print("Final shared list:", list(shared_list))
Output (element order may vary):
Final shared list: [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
Explanation: manager.list() allows both processes to append to the same list, which is shared across processes. Because the two processes run concurrently, their appends may interleave in any order.
Summary
The multiprocessing module in Python provides robust tools for process management, allowing for true parallelism by utilizing separate CPU cores. By sharing data carefully and managing processes efficiently, it enables effective concurrency for CPU-bound tasks.