The Dameng massively parallel processing cluster software for analytics (DMMPP) is a peer-to-peer, shared-nothing parallel cluster component built on Dameng's database management system. It organizes multiple DM8 nodes into a parallel computing network that provides a unified database service. A cluster supports up to 1,024 nodes and terabyte- to petabyte-scale data storage and analysis. It also provides high availability and dynamic expansion, making it a cost-effective general-purpose solution for very large database applications. DMMPP spreads the load across multiple database server hosts to store and process large-scale data. In its peer-to-peer, shared-nothing architecture, each database server is called an EP, and each EP is an independent database. All EP nodes have exactly the same functions, so users can connect to any EP node in the DMMPP system and operate on the data.
DMMPP adopts a peer-to-peer, shared-nothing architecture in which every DM database server instance is an execution node, so a client can connect to any node and operate on the data. Each node reads and writes only its own portion of the data, and the execution plan runs in parallel across all nodes. Data is transmitted between nodes only when necessary, over a high-speed inter-node mail (messaging) system, which lets the cluster take full advantage of massively parallel processing. Moreover, as the system grows in scale, performance improves linearly.
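The shared-nothing execution model described above can be illustrated with a small, self-contained simulation. This is only a conceptual sketch, not DMMPP's implementation or API: the cluster size, the hash-based distribution rule, the sample rows, and every function name here are assumptions made for illustration, and threads stand in for separate EP nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import zlib

NUM_EPS = 4  # hypothetical cluster size, chosen only for this sketch


def ep_for_key(key: str, num_eps: int = NUM_EPS) -> int:
    """Pick the EP that owns a row by hashing its distribution key (assumed rule)."""
    return zlib.crc32(key.encode("utf-8")) % num_eps


def distribute(rows, num_eps: int = NUM_EPS):
    """Split rows into per-EP partitions; each EP stores only its own slice."""
    partitions = [[] for _ in range(num_eps)]
    for region, amount in rows:
        partitions[ep_for_key(region, num_eps)].append((region, amount))
    return partitions


def local_sum(partition):
    """Work done on a single EP: aggregate only the rows stored locally."""
    totals = defaultdict(int)
    for region, amount in partition:
        totals[region] += amount
    return dict(totals)


def parallel_sum(partitions):
    """Run the local step on every EP in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(local_sum, partitions))
    # Only the small per-EP summaries cross node boundaries, not the raw rows.
    merged = defaultdict(int)
    for partial in partials:
        for region, total in partial.items():
            merged[region] += total
    return dict(merged)


rows = [("north", 10), ("south", 5), ("north", 7), ("east", 3), ("south", 1)]
partitions = distribute(rows)
result = parallel_sum(partitions)
print(sorted(result.items()))  # [('east', 3), ('north', 17), ('south', 6)]
```

In this sketch the raw rows never leave the partition that owns them; only the per-partition totals are exchanged in the merge step, which mirrors the idea of transmitting data between nodes only when necessary.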
Large-scale Data Analysis Requirement Scenarios
Large-scale data analysis consists primarily of complex statistical queries. Concurrency is relatively low and response times are relatively long, usually minutes and in some scenarios hours. These scenarios nevertheless involve vast volumes of data, from terabytes to petabytes, and that data must be copied and backed up.