Massively Parallel Processing
Massively Parallel Processing [MPP] in database
management systems refers to a single system with many independent
microprocessors running in parallel, typically used for decision support.
In contrast, a distributed system uses a large number of separate
computers to solve a single problem.
Massively parallel architectures are cost-effective, particularly
in memory-intensive applications such as analytics and high-definition
video processing.
Data warehouses that employ MPP include:
- Teradata - a pioneer of the parallel-processing DBMS and the vendor
of the largest customer database [as of 1999], holding 130 terabytes
of data on 176 nodes.
In a data warehouse environment, MPP spreads the work across many
processors at once. With enough parallelism, this efficiency means the
elapsed time for creating 100 records can be about the same as that
for creating 100,000 records.
Running Aggregated Queries
To run aggregated queries faster on a partitioned topology:
- Each query is parallelized so that it runs against every partition.
This leverages the CPU and memory capacity of each partition to
truly parallelize the request. The client issuing the queries
needs no awareness of the physical separation of the partitions.
It receives aggregated results as if the query had been run against
a single gigantic data store, except that it gets them much, much
faster.
- The query can run collocated with the data in each partition, so
complex tasks that would otherwise traverse large amounts of data
can run without moving that data back and forth. Again, this speeds
up data access.
- Each partition contains a smaller data set, so contention on the
data is reduced and queries within each partition run more efficiently.
- The data is stored in memory. In-memory data storage is far
more efficient than disk, especially under concurrent access.
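The steps above amount to a scatter-gather pattern: the same partial query runs next to the data in every partition, and only the small partial results travel back to be merged. The following is a minimal sketch, assuming each partition is an in-memory Python list and using threads to stand in for separate nodes; the names `partitions`, `partial_aggregate`, `aggregate` and the `amount` field are hypothetical and do not come from any particular product.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory partitions: each list stands in for the data
# held by one node. Field names are illustrative only.
partitions = [
    [{"customer_id": 1, "amount": 10.0}, {"customer_id": 4, "amount": 5.0}],
    [{"customer_id": 2, "amount": 7.5}],
    [{"customer_id": 3, "amount": 2.5}, {"customer_id": 6, "amount": 15.0}],
]


def partial_aggregate(partition):
    """Runs next to the data: returns (sum, count) for one partition only."""
    total = sum(row["amount"] for row in partition)
    return total, len(partition)


def aggregate(partitions):
    """Scatter the query to every partition in parallel, then gather and merge.

    The caller sees one result, as if it had queried a single data store.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        partials = list(pool.map(partial_aggregate, partitions))
    # Merge step: combine partial sums and counts. The average is derived
    # from the merged totals, not by averaging per-partition averages.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return {"sum": total, "count": count, "avg": total / count if count else None}


print(aggregate(partitions))  # {'sum': 40.0, 'count': 5, 'avg': 8.0}
```

Returning (sum, count) pairs rather than per-partition averages is the design choice that keeps the merge correct when partitions hold different numbers of rows.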
Affinity-Key Limitation
The limiting factor in this kind of scalability is the affinity key.
Data affinity determines the level of granularity at which
data can be partitioned: data is partitioned only as finely as
necessary. The key may be a customer-id, a session-id
or some other criterion.
The key is application-specific and must be defined
during the analysis and design phase, according to how the application
uses the data.
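To illustrate how an affinity key drives partitioning, here is a minimal sketch, assuming a simple hash-modulo scheme over a fixed number of partitions; `NUM_PARTITIONS`, `partition_for`, `partition_records` and the sample orders are hypothetical. Every record with the same customer-id is routed to the same partition, which is what lets a query over one customer run collocated with that customer's data.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical cluster size


def partition_for(affinity_key, num_partitions=NUM_PARTITIONS):
    """Map an affinity key (e.g. a customer-id) to a partition number.

    zlib.crc32 is used instead of the built-in hash() because hash() of
    strings is randomized per process, so different nodes could disagree
    on where a key lives.
    """
    return zlib.crc32(str(affinity_key).encode("utf-8")) % num_partitions


def partition_records(records, key_field):
    """Group records so all rows sharing an affinity key land together."""
    partitions = defaultdict(list)
    for record in records:
        partitions[partition_for(record[key_field])].append(record)
    return dict(partitions)


orders = [
    {"customer_id": "C1", "order_id": 1},
    {"customer_id": "C2", "order_id": 2},
    {"customer_id": "C1", "order_id": 3},
    {"customer_id": "C3", "order_id": 4},
]

by_customer = partition_records(orders, "customer_id")
for partition, rows in sorted(by_customer.items()):
    print(partition, [r["order_id"] for r in rows])
```

The sketch also shows the limitation described above: if the key were something coarse with only a few distinct values (say, a region), at most that many partitions could ever receive data, no matter how many nodes were available.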