Big Data Analytics with the Microsoft Analytics Platform System Appliance

Author: Basit A. Farooq


A data warehouse is the central repository for all relevant data, internal or external, that feeds an enterprise’s information and decision support systems. It provides high-quality information that is subject-oriented, integrated, time-variant and non-updatable, so that data analysis can support the enterprise’s management processes and business decisions. However, in today’s world of interconnected devices (smartphones, televisions, watches, laptops, tablets, desktops, gaming consoles such as Xbox and PlayStation, and many more) and wide access to data from sources such as Twitter, Facebook, LinkedIn, flat files, blogs, websites, system logs and sensors, data growth is one of the main challenges companies face today.

As a result of this exponential data growth, traditional data warehouses have reached a critical point that requires business-driven changes to the systems currently in place. Traditional data warehousing systems cannot meet the increasing data volumes that organizations expect over the next five years unless they make major investments in hardware, tuning, support and maintenance. They are also not optimized for low-latency, large-volume data loads or for high-throughput, complex analytical workloads, so their performance falls well below user expectations, both now and in the future. End users want results in real time to meet the demands of Big Data, but traditional data warehousing systems are typically designed to be updated only once every 24 hours and are not fast enough to provide near real-time analytics.

Microsoft Analytics Platform System

To meet these enterprise demands and help organizations move to a modern data warehousing system that is optimized for low-latency, large-volume data loads and high-throughput, complex analytical workloads, Microsoft introduced the Analytics Platform System (APS), the evolution of its Parallel Data Warehouse (PDW) appliance, in April 2014. Microsoft Analytics Platform System is a high-performance, scalable, parallel processing appliance built for modern data warehousing needs, which lets organizations run the enterprise’s traditional data warehouse workloads on a massive scale. It improves data load and query response times by up to and beyond 50x over legacy data warehousing solutions, so queries run in minutes instead of hours, and in seconds instead of minutes.

Microsoft Analytics Platform System runs on a certified hardware platform that integrates SQL Server Parallel Data Warehouse (PDW) software (a massively parallel processing version of SQL Server that is designed to run within Microsoft Analytics Platform System) and an optional HDInsight Hadoop solution (a Microsoft offering of Hadoop for Windows based on the Hortonworks Data Platform) in the same appliance. The Big Data capabilities in Microsoft Analytics Platform System, through the included PolyBase, let you use standard SQL queries to access Hadoop data and join it with relational data. Microsoft Analytics Platform System also supports new scenarios for using Power BI modeling, visualization, and collaboration tools over on-premises data sets.

Microsoft co-engineered Microsoft Analytics Platform System with Dell, HP, and Quanta, and Microsoft is your single point of contact for hardware and software support. Compared with data warehouse appliances from other vendors, Microsoft Analytics Platform System offers the lowest price per terabyte of user-available (compressed) storage. Key features of Microsoft Analytics Platform System include:

Enterprise-ready big data – Big Data capability in Microsoft Analytics Platform System, with the included PolyBase, provides a fundamental breakthrough in data processing by enabling seamless integration between traditional data warehouses and “big data” deployments.

  • A high performance MPP relational data warehouse, SQL Server PDW, and the HDInsight Hadoop solution together in the same appliance
  • Standard SQL queries to access and join Hadoop data with relational data
  • Query Hadoop data without having to pre-load data first into the warehouse
  • Native Microsoft BI integration allowing analysis of relational and non-relational data with familiar tools like Excel
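
The idea of querying external data in place and joining it with relational data can be illustrated with a small conceptual sketch. This is not PolyBase and does not use any Microsoft API; the `customers` table, the `CLICK_LOG` text and the `join_external` function are hypothetical stand-ins, with an in-memory dictionary playing the role of a warehouse table and a delimited text blob playing the role of a file stored in Hadoop.

```python
import csv
import io

# Hypothetical relational dimension table, standing in for warehouse data.
customers = {1: "Contoso", 2: "Fabrikam"}

# Hypothetical delimited file, standing in for data that lives in Hadoop.
CLICK_LOG = "customer_id,page\n1,/home\n2,/pricing\n1,/checkout\n"


def join_external(log_text, dimension):
    """Stream external rows and join each one to the dimension by key.

    The external data is read row by row where it sits; it is never
    copied into the 'warehouse' structure first.
    """
    reader = csv.DictReader(io.StringIO(log_text))
    for row in reader:
        key = int(row["customer_id"])
        if key in dimension:  # inner join: drop rows without a match
            yield dimension[key], row["page"]


joined = list(join_external(CLICK_LOG, customers))
for name, page in joined:
    print(name, page)
```

In the appliance itself the equivalent work is expressed as a standard SQL query, as the list above describes; the sketch only shows why no pre-load step is needed for such a join.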

Next-generation performance at scale – Microsoft Analytics Platform System is a massively parallel processing appliance that can handle even your largest mission-critical workloads.

  • Up to 100x faster than legacy warehouses with in-memory and updateable columnstore indexes
  • Massively parallel processing architecture that parallelizes and distributes computing for high query concurrency and complexity
  • Built-in hardware redundancies for fault tolerance
  • Microsoft as a single point of contact for hardware and software support
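
As a rough illustration of how a massively parallel processing architecture distributes and parallelizes work, the following sketch splits rows across several workers, lets each worker aggregate only its own rows, and merges the partial results. It is a toy model under stated assumptions, not the appliance's implementation: the number of nodes, the hash-based placement in `node_for`, and the use of threads in place of separate compute nodes are all choices made for the example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import zlib

NUM_NODES = 4  # hypothetical number of compute nodes


def node_for(key: str) -> int:
    """Pick a node using a stable hash of the distribution key."""
    return zlib.crc32(key.encode("utf-8")) % NUM_NODES


def distribute(rows):
    """Split rows across nodes by hashing the distribution key."""
    shards = [[] for _ in range(NUM_NODES)]
    for region, amount in rows:
        shards[node_for(region)].append((region, amount))
    return shards


def partial_sum(shard):
    """Work done on one node: aggregate only the rows stored locally."""
    totals = defaultdict(int)
    for region, amount in shard:
        totals[region] += amount
    return dict(totals)


def parallel_total(rows):
    """Coordinator role: fan the query out, then merge partial results."""
    shards = distribute(rows)
    with ThreadPoolExecutor(max_workers=NUM_NODES) as pool:
        partials = list(pool.map(partial_sum, shards))
    merged = defaultdict(int)
    for part in partials:
        for region, total in part.items():
            merged[region] += total
    return dict(merged)


sales = [("east", 10), ("west", 5), ("east", 7), ("north", 3), ("west", 1)]
totals = parallel_total(sales)
print(totals)
```

Because every row with the same key lands on the same worker, each partial result is already complete for its keys, and the final merge stays cheap however many rows are involved.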

Engineered for optimal value – Unlike other vendors in the data warehousing space, who deliver a high-end appliance at a high price, Microsoft engineered Microsoft Analytics Platform System for optimal value through software innovations that lower the cost of the appliance.

  • Resilient, scalable, and high-performance storage features built into software, which lowers hardware costs
  • Instead of over-acquiring capacity, you can start with a quarter rack, size the appliance correctly, and scale it out at a later date with the same tools that you normally use to scale out traditional SQL Server systems
  • Reduced data center and management costs by combining a relational data warehouse with Hadoop in one appliance
  • Data compression up to 15x with the in-memory updateable columnstore, saving up to 70% of storage requirements
  • Start small with a quarter rack allowing you to right-size the appliance rather than over-acquiring capacity
  • Use the same tools and knowledge as SQL Server for scale-out data warehouse or big data
  • Co-engineered with hardware partners offering the highest level of product integration, and shipped to your door offering fastest time-to-value
  • The lowest price per terabyte for a data warehouse appliance
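
The relationship between a compression ratio and the storage it saves is simple arithmetic, sketched below. The function names are hypothetical. Under this arithmetic a 15x ratio on a given data set corresponds to roughly 93% savings, while a 70% saving corresponds to a ratio of about 3.3x, so the two "up to" figures in the list above are presumably best cases quoted for different data sets; that reading is an assumption, not something the source states.

```python
def storage_saved(compression_ratio: float) -> float:
    """Fraction of storage saved when data shrinks by the given ratio."""
    if compression_ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return 1.0 - 1.0 / compression_ratio


def ratio_for_saving(saved_fraction: float) -> float:
    """Compression ratio needed to save the given fraction of storage."""
    if not 0 <= saved_fraction < 1:
        raise ValueError("saved fraction must be in [0, 1)")
    return 1.0 / (1.0 - saved_fraction)


print(f"15x compression saves {storage_saved(15):.1%} of storage")
print(f"70% savings needs a ratio of {ratio_for_saving(0.70):.2f}x")
```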

Microsoft Analytics Platform System hardware components


As mentioned earlier, Microsoft Analytics Platform System is a pre-built data warehouse appliance that runs a massively parallel processing version of SQL Server, SQL Server Parallel Data Warehouse (PDW), on third-party server hardware and networking components, together with the optional HDInsight Hadoop solution, a Microsoft offering of Hadoop for Windows based on the Hortonworks Data Platform. Physically, Microsoft Analytics Platform System is a rack: it is a complete system in itself, not a region within a larger system. Within the rack there are two types of network, served by Ethernet switches and InfiniBand switches. InfiniBand is a 56 gigabit-per-second network that moves data quickly between nodes. Whether you are loading data, backing up data, or the system has to shuffle data around for a query, InfiniBand is the pipe that handles it. Ethernet, on the other hand, handles low-volume traffic such as user queries and the management chatter that passes back and forth between the servers.

The rack also contains the power distribution units (PDUs), which are designed to power a fully populated rack. Next comes the PDW region, which contains the active and passive orchestration hosts (the control node and its failover node) and a data scale unit. The data scale unit is the workhorse of the appliance: it contains the compute nodes (master and failover nodes) and economical disk storage for the HDInsight region and the PDW region.
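
To get a feel for why bulk data movement is kept on InfiniBand, the following back-of-the-envelope sketch estimates how long it takes to move one terabyte over each network. It assumes the 56 figure refers to the 56 gigabits per second signalling rate of FDR InfiniBand, and it assumes a 10 gigabit-per-second Ethernet link purely for comparison; the Ethernet speed is not given in the text, and protocol overhead is ignored.

```python
def transfer_seconds(size_bytes: int, link_gbps: float) -> float:
    """Ideal time to move size_bytes over a link rated in gigabits/second."""
    if link_gbps <= 0:
        raise ValueError("link speed must be positive")
    bits = size_bytes * 8
    return bits / (link_gbps * 1e9)


ONE_TB = 10**12  # one terabyte, decimal

infiniband_s = transfer_seconds(ONE_TB, 56)  # assumed 56 Gb/s InfiniBand
ethernet_s = transfer_seconds(ONE_TB, 10)    # assumed 10 Gb/s Ethernet

print(f"InfiniBand: {infiniband_s:.0f} s, Ethernet: {ethernet_s:.0f} s")
```

Under these assumptions the same terabyte takes a little under two and a half minutes on InfiniBand and more than thirteen minutes on Ethernet, which is why loads, backups and query-time data shuffles use the faster network.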