Threading Methodology: Principles and Practices – Intel® Software Network

Today’s operating systems strive to make the most efficient use of a computer’s resources. Most of this efficiency is gained by sharing the machine’s resources among several tasks (multi-tasking). This “large-grained” resource sharing is provided by all operating systems without any additional information from the applications themselves. Newer operating systems, however, also provide mechanisms that allow an application to control and share machine resources at a finer granularity: threads. This document discusses how the use of threads can improve application performance, responsiveness, and throughput, and it presents a methodology that enables a developer to thread a serial application. Like most optimization techniques, the primary goal of threading is to take the best possible advantage of the system’s resources.
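To make the idea of threading a serial application concrete, the sketch below splits a summation across several threads. This is a minimal Python sketch, not the methodology the paper describes: the function names are illustrative, and note that CPython's global interpreter lock limits the speedup of CPU-bound threads, so real performance gains typically come from native threading APIs or I/O-bound workloads.

```python
import threading

def partial_sum(data, start, end, results, index):
    # Each thread computes a partial sum over its own slice of the input,
    # writing into a distinct slot so no locking is needed.
    results[index] = sum(data[start:end])

def threaded_sum(data, num_threads=4):
    chunk = len(data) // num_threads
    results = [0] * num_threads
    threads = []
    for i in range(num_threads):
        start = i * chunk
        # The last thread picks up any remainder.
        end = len(data) if i == num_threads - 1 else (i + 1) * chunk
        t = threading.Thread(target=partial_sum,
                             args=(data, start, end, results, i))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()  # wait for all workers before combining results
    return sum(results)
```

The decomposition (divide, compute in parallel, combine) is the same shape regardless of whether the threads are Python threads, POSIX threads, or hardware threads on a processor with HT Technology.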

This white paper contains several references to the Intel® Pentium® 4 processor. When used in conjunction with Hyper-Threading Technology, the correct terminology is the Intel® Pentium® 4 Processor with HT Technology¹.

¹ Hyper-Threading Technology requires a computer system with an Intel® Pentium® 4 processor at 3.06 GHz or higher, a chipset and BIOS that utilize this technology, and an operating system that includes optimizations for this technology. Performance will vary depending on the specific hardware and software you use.

High Scalability – 8 Commonly Used Scalable System Design Patterns

Ricky Ho, in Scalable System Design Patterns, has created a great list of scalability patterns along with very well done explanatory graphics. A summary of the patterns is:

  1. Load Balancer – a dispatcher determines which worker instance will handle a request based on different policies.
  2. Scatter and Gather – a dispatcher multicasts a request to all workers in a pool. Each worker computes a local result and sends it back to the dispatcher, who consolidates them into a single response and sends it back to the client.
  3. Result Cache – a dispatcher first looks up whether the request has been made before and tries to return the previous result, in order to save the actual execution.
  4. Shared Space – all workers monitor information from the shared space and contribute partial knowledge back to the blackboard. The information is continuously enriched until a solution is reached.
  5. Pipe and Filter – all workers are connected by pipes through which data flows.
  6. MapReduce – targets batch jobs where disk I/O is the major bottleneck. It uses a distributed file system so that disk I/O can be done in parallel.
  7. Bulk Synchronous Parallel – a lock-step execution across all workers, coordinated by a master.
  8. Execution Orchestrator – an intelligent scheduler/orchestrator schedules ready-to-run tasks (based on a dependency graph) across a cluster of dumb workers.
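The Load Balancer pattern can be sketched in a few lines. This is a hypothetical in-process model (the `LoadBalancer` class and worker callables are illustrative, not from Ho's post), using round-robin as one example of the "different policies" a dispatcher might apply:

```python
import itertools

class LoadBalancer:
    """Dispatcher that hands each request to the next worker in rotation."""
    def __init__(self, workers):
        self._cycle = itertools.cycle(workers)  # round-robin policy

    def handle(self, request):
        worker = next(self._cycle)  # pick the next worker instance
        return worker(request)

# Hypothetical workers: each tags the request with its own name.
workers = [lambda req, n=n: f"worker{n}:{req}" for n in range(3)]
lb = LoadBalancer(workers)
```

Swapping the policy (least-loaded, hash-of-client, etc.) only changes how `handle` selects a worker; the dispatcher interface stays the same.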
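Scatter and Gather, likewise, reduces to "multicast, compute locally, consolidate." A minimal sketch, assuming each worker searches its own data shard (the `scatter_gather` function and shard setup are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def scatter_gather(request, workers, consolidate):
    # Scatter: multicast the request to every worker in the pool.
    with ThreadPoolExecutor(max_workers=len(workers)) as pool:
        partials = list(pool.map(lambda w: w(request), workers))
    # Gather: consolidate the partial results into a single response.
    return consolidate(partials)

# Hypothetical shards, each owned by one worker.
shards = [["apple", "banana"], ["cherry", "apricot"]]
workers = [lambda q, shard=s: [x for x in shard if q in x] for s in shards]
```

This is the shape of a sharded search: each worker only knows its shard, and the dispatcher merges the per-shard hits.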
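The Result Cache pattern is essentially memoization at the dispatcher. A minimal sketch (the `CachingDispatcher` class is a hypothetical name; a `misses` counter is added to show when the actual execution is saved):

```python
class CachingDispatcher:
    """Dispatcher that returns a stored result when the request was seen before."""
    def __init__(self, worker):
        self._worker = worker
        self._cache = {}
        self.misses = 0  # counts how many requests actually ran

    def handle(self, request):
        if request not in self._cache:
            # Cache miss: do the real work and remember the result.
            self.misses += 1
            self._cache[request] = self._worker(request)
        # Cache hit (or freshly stored result): skip re-execution.
        return self._cache[request]
```

A production version would add expiry and bounded size (e.g. LRU eviction), but the dispatch logic is the same.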
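MapReduce's map/shuffle/reduce phases can be shown in miniature. This single-process sketch (the `map_reduce` function name is illustrative) omits the distributed file system and parallel disk I/O that make the real pattern scale, but keeps the data flow:

```python
from collections import defaultdict
from itertools import chain

def map_reduce(inputs, mapper, reducer):
    # Map phase: each input record independently yields (key, value) pairs,
    # which is what lets real frameworks run mappers in parallel.
    pairs = chain.from_iterable(mapper(record) for record in inputs)
    # Shuffle phase: group all values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Reduce phase: combine each key's values into one result.
    return {key: reducer(key, values) for key, values in groups.items()}
```

The classic word-count job fits directly: the mapper emits `(word, 1)` per word and the reducer sums the counts.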