Loading...
In-database analytics improves business performance, efficiency and more.

Tech2Tech

Tech Knowledge

Inside Job

In-database analytics improves business performance, efficiency and more.

In the face of ever-growing data volumes and complexity, in-database analytics is the enterprise data ware­house (EDW) response to support high-throughput, operational analytics for such needs as fraud detection, credit scoring, risk management and online commerce.

Analytical computing often involves large numbers of data records and fields, carried out repeatedly. In-database execution avoids data movement and takes advantage of the power of modern, parallel database architectures. Users see dramatically improved performance and increased processing throughput.

Definition

In-database analytics describes moving analytical computations—key performance indicator (KPI) calculations, data aggregations, model scoring, predictive analytics, geospatial analysis and other functions—into the database system.

While the most visible advantage of database-embedded computations is higher performance, organizations that take this approach also gain greater manageability and reliability as byproducts of central­izing analytical processing. Further, such centralization facilitates version control and reduces duplication of data. It also extends the database’s management, security and auditing infrastructure to analytical data and functions.

How It Works

Execution of complex or compute-intensive calculations is shifted from a desktop program or application server to the database management system. The computations are coded in a programming language, either by the end user or a third-party provider, and made available through a database query lan­guage such as SQL. Possibilities go far beyond computations that can be written directly in industry-standard versions of SQL.

Database-embedded functions are avail­able to any user or program with access permission. They are invoked by SQL que­ries or through any front-end application or user interface that accesses the data­base. End users may not even know their analytical software is running calculations in the database, except indirectly through improved performance.

Must-Have Capability

With the growing popularity of ana­lytical database systems, in-database analytics has become a must-have capabil­ity. Not every system, however, will support it. A database that can be extended with custom computations is required. Although many vendors can extend their own systems, you do not want to be limited to vendor-created or vendor-sanctioned exten­sions. For generalized, robust in-database analytics, your best option is a framework that supports database management system (DBMS) extensions created by specialists, whether from your own company or third-party providers.

Whether they are user-defined functions or specialized routines such as Teradata input modules (INMODs), extensions become part of the database and part of the EDW. The DBMS automatically allows for these extensions within the scope of utilization, performance monitoring, user management and auditing tools. The DBMS will also cover them within the query execution planning, scheduling and optimization facilities.

Abundant Advantages

The business benefits of in-database analytics boil down to cost savings and improved execution through higher efficiency, security, competitive advantage, usability and manageability.

  • Efficiency. The performance boost means more bang for the hardware buck, and it means delivering faster results to users. The analytical development and scoring process can be streamlined, allowing businesses to better use their personnel: Instead of performing the iterative cycle of extract, consolidate, integrate, and analyze or score, staff can simply analyze or score directly in the database.
  • Security and control. The central­ization of analytical processing in the database eliminates the security vulnerability associated with data movement and adds the capability to control, monitor and audit the use of analytics.
  • Competitive advantage. The pos­sibility of extending the database system with proprietary or special­ized analytical methods creates a competitive advantage. These extensions may be created by the organization itself or through vendor partnerships with special­ized, industry-leading vendors that make proven methods available without forcing organizations to program the routines themselves.
  • Usability. Analytics routines become available to end users and developers in the form of extensions to industry-standard SQL and via common applica­tion programming interfaces. Users are shielded from expo­sure to complex algorithms without movement.
  • Manageability. Moving routines into the database eliminates duplicate programming and maintenance of analytics rou­tines and eases version control.

—S.G.

If you’re already using analytics tools and packages such as those from SAS, you’re well positioned to take advantage of new capabilities. Such vendors have been working with database companies, includ­ing Teradata, to move analytical processing from the application layer into the database. SAS has rewritten many basic procedures and functions, and complete applications, to run inside the Teradata Database.

These database-embedded functions sup­port data preparation, modeling and analysis tasks and are used automatically by some SAS user-facing applications. While SAS has created connectors for many database management systems, they do not cause SAS procedures or functions to be executed in the database.

Significant Gains

In-database analytics technology builds on end-user programming facilities that first emerged almost 20 years ago. In the mid-1990s, Teradata offered one of the first commercial systems to support end-user extensions for in-database mining. Object-relational database systems from other companies also provided extensions, but they did not focus on analytical processing or provide for analytics parallelization.

Parallelization decomposes a large workload into smaller tasks that are carried out simul­taneously. Running a single task in multiple locations, with distinct data subsets, speeds up the overall execution of large tasks and work­loads. Data-related tasks that can be parallel­ized include database loading, index building, queries over large data sets and many analyti­cal computations. The Teradata system is an example of a parallel database. Complemented by Teradata Parallel Transporter, it provides scalable, high-speed, parallel data extraction, loading and updating.

Performance gains due to eliminating data movement will be in proportion to data volume. The possibility of sig­nificant gains exists where calculations are concerned. That’s because modern databases have powerful query optimiza­tion and caching mechanisms, as well as the ability to parallelize operations not typically available to, or present with, analytics options other than in-database.

As a rule of thumb, expect improve­ments of 10 times to 100 times for big data computations that can be parallelized. For example, a Teradata customer had spent 312 hours to run five models against 250 million rows, extracting the data onto an analytic server. After the system was converted into SQL models leveraging in-database mining techniques, all five analyses ran in just 12 hours—a 97% savings in time.

Developing Database-Embedded Code

Problems Solved Using In-Database Analytics

In-database analytics is great for big data challenges:

  1. Data-intensive calculations. Not moving the data saves computing and network resources, offers greater security and takes less time.
  2. Recurrent calculations. These calculations repeatedly touch the same data tables and value. Moving small amounts of data repetitively is as costly as moving big data less frequently.
  3. Cross-functional analytics. These are applications that touch both geospatial and transactional or customer-profile information. With databases hosting a variety of analytics, joining them across applications becomes easy.

—S.G.

Most modern software development is conducted using a standard integrated development environment (IDE) that encap­sulates a continuous-integration methodology. However, it is almost always possible to develop and deploy code the old-fashioned way—using a text editor and text commands.

Teradata and other vendors support database-application development through the Eclipse open-source IDE, an open devel­opment platform composed of extensible frameworks, tools and runtimes for building, deploying and managing software across the life cycle. Database vendors usually provide a free, downloadable Eclipse plug-in that uses the Eclipse plug-in framework. Teradata offers a free plug-in that can be accessed from the Teradata Tool Kit on Teradata.com.

Expanding Possibilities

The value of in-database analytics is in addressing diverse business challenges. However, the technology can be applied wherever there is a large volume of data and a critical need to minimize data movement. It can also be used to reduce processing and response times, and when a desire to move toward managed enterprise analytics exists. The move in-database is not an all-or-nothing proposition. Organizations can exploit the technology for individual applica­tions or functions based on business priorities and on the availability of database-optimized versions of software from analytics providers. Migration can be gradual and should provide measurable performance and analytics-manageability return on investment (ROI) as well as enhanced analytics capabilities: the possibility of doing more with your data.


Your Comment:
  
Your Rating:

Comments