
Massively Parallel Processing


Massively Parallel Processing [MPP] in database management systems refers to a single system in which many independent microprocessors run in parallel, typically dedicated to decision support.

In contrast, a distributed system uses a large number of separate computers to solve a single problem.

Massively parallel architectures are cost-effective, particularly in memory-intensive applications such as analytics and high-definition video processing.

Data warehouses that employ MPP include:

Teradata - the pioneer of parallel-processing DBMS and vendor of the largest customer database [as of 1999]: 130 terabytes of data on 176 nodes.

 

The effect of using MPP in a data warehouse environment is that, because the work is spread across all nodes, parallel efficiency makes the effort of creating 100,000 records roughly the same as that of creating 100 records.
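A back-of-the-envelope sketch of that claim, assuming a perfectly even spread of work across nodes and a hypothetical per-node insert rate (both numbers are invented for illustration):

```python
def load_time_seconds(records, nodes, per_node_rate=10_000):
    """Elapsed load time when records are spread evenly across nodes,
    each node inserting per_node_rate records per second."""
    return (records / nodes) / per_node_rate

# 100 records on one node take the same elapsed time as
# 100,000 records spread across 1,000 nodes:
print(load_time_seconds(100, 1))          # 0.01 s
print(load_time_seconds(100_000, 1_000))  # 0.01 s
```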

 

Running Aggregated Queries

To run aggregated queries faster on a partitioned topology (a short sketch follows this list):

  • Each query is parallelized so that it runs against every partition. This leverages the CPU and memory capacity of each partition to truly parallelize the request. The client issuing the query needs no awareness of the physical separation of the partitions: it receives aggregated results as if the query had run against a single gigantic data store, except that it gets them much, much faster.
  • The query can run collocated with the data in each partition, allowing very complex tasks that would otherwise traverse a lot of data to run without moving that data back and forth. Again, this speeds up data access.
  • Each partition contains a smaller data set, so contention on the data is reduced, making per-partition queries more efficient.
  • The data is stored in-memory. In-memory storage is far more efficient than disk, especially under concurrent access.
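A minimal scatter-gather sketch of the pattern described above, using Python threads to stand in for independent in-memory partitions; the two-partition layout and the "orders" rows are invented for illustration, not drawn from any particular product:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory partitions, each holding a slice of an orders table.
partitions = [
    [{"customer": "A", "amount": 120.0}, {"customer": "B", "amount": 80.0}],
    [{"customer": "A", "amount": 40.0},  {"customer": "C", "amount": 200.0}],
]

def partial_sum(partition):
    """Runs collocated with one partition's data: aggregate locally so only
    the small partial result, not the raw rows, has to move."""
    totals = {}
    for row in partition:
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
    return totals

def aggregated_query(partitions):
    """Scatter the query to every partition in parallel, then merge the
    partial aggregates; the caller never sees the physical partitioning."""
    merged = {}
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        for partials in pool.map(partial_sum, partitions):
            for customer, total in partials.items():
                merged[customer] = merged.get(customer, 0.0) + total
    return merged

print(aggregated_query(partitions))  # {'A': 160.0, 'B': 80.0, 'C': 200.0}
```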

 

Affinity-Key Limitation

The limiting factor in this kind of scalability is the affinity key.

Data affinity determines the level of granularity at which data can be partitioned. Data is partitioned only to the level of granularity necessary; this may be determined by customer ID, session ID, or some other criterion.

The definition of the key is application specific and must be settled during the analysis and design phase, according to how the application uses the data.
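A minimal sketch of how an affinity key drives partitioning, assuming customer ID is the chosen key; the four-partition topology and the key format are hypothetical:

```python
import zlib

NUM_PARTITIONS = 4  # hypothetical topology

def partition_for(affinity_key):
    """Deterministically map an affinity key to a partition, so every
    record sharing the key lands on the same node."""
    return zlib.crc32(affinity_key.encode()) % NUM_PARTITIONS

# All of one customer's records colocate, whatever the session:
print(partition_for("cust-42") == partition_for("cust-42"))  # True
```

Because every record for a given key colocates, queries scoped to that key stay within a single partition; the trade-off is that the key fixes the finest granularity at which load can be spread.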

Next: Shared Nothing Architecture
