The functionality of R combined with a Teradata Database provides an innovative solution for advanced analytics.

Tech2Tech

Applied Solutions

Plug in With Muscle

The “R project,” commonly known as “R,” is a powerful tool for implementing analytic methods in business applications such as churn, cross-selling and credit risk analysis. With it, users can carry out all of the steps required to prepare, run and interpret a statistical analysis.

Teradata supports R as a cost-effective option for companies. Like Linux, Apache and Firefox, it is an open-source program—free for anyone to use and modify—encouraging organizations to explore analytic techniques and experiment with analytic applications without procurement and licensing of software.

Leverage In-Database Functions

Ordinarily, analysts must move data into the R environment, which can be a challenge depending on the volume and the data source. To address this, Teradata developed an add-on that enables users to push key analytic tasks directly into the database for processing. This eliminates the need to move information from the data warehouse into an R data frame. The R add-on allows users to connect to the Teradata Database, map data frames to tables within the database, and call more than 45 in-database analytic functions from R.

Definition

Data frames are R's matrix-like or table-like structures in which the columns can be of different types and each row represents an observation. R data frames closely resemble a SAS or SPSS data set.
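
As a small illustration of this definition, the sketch below builds an ordinary in-memory data frame in base R. The column names and values are made up for the example and do not come from the article's data.

```r
# Hypothetical example data: three telephone lines, one row per observation.
lines_df <- data.frame(
  line_id        = c(101L, 102L, 103L),                   # integer column
  plan           = c("prepaid", "postpaid", "postpaid"),  # character column
  minutes_of_use = c(320.5, 45.0, 210.25),                # numeric column
  churn_event    = c(0L, 1L, 0L),                         # 1 = line has churned
  stringsAsFactors = FALSE
)

# Show the structure: columns of different types, rows as observations.
str(lines_df)
```

Unlike this local object, the add-on's data frames described in the article point to tables that stay in the database.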

The add-on takes a unique approach to data frames by establishing a pointer (virtual table) to Teradata Database tables, which eliminates the need to move the entire table into the R environment.

The add-on also provides the programmer with the opportunity to leverage the processing power of the Teradata Database with the R interface. The advantages of using R in-database include:

  • Keeping data movement to a minimum
  • Supporting big data processing
  • Executing R process steps in parallel

R at Work

The end-to-end process of analytical modeling starts with business specifications. The process addresses statistical data preparation, the actual modeling, and preparation of (recurrent) scoring after the modeling.

To be usable by statistical methods, information mostly needs to be organized as a data set or in matrix form. In the case of churn prediction, for example, the most basic element of information is the line-level Customer Analytic Record (CAR). During data preparation, behavioral details, such as the number of calls or minutes of use, are aggregated to a weekly or monthly level. Finally, information about whether the individual line has churned is attached. Typically, a CAR covers 100 to 300 attributes per line.
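
The aggregation step can be sketched in base R on a tiny, made-up call table. This is only a local illustration of the idea: the table, column names and values are assumptions, and with real volumes this preparation would take place in the database.

```r
# Hypothetical call-detail records: one row per call.
calls <- data.frame(
  line_id = c(101, 101, 101, 102, 102),
  month   = c("2011-10", "2011-10", "2011-11", "2011-10", "2011-11"),
  minutes = c(12.5, 3.0, 20.0, 7.5, 1.0)
)

# Aggregate behavioral details to one row per line and month.
minutes_of_use <- aggregate(minutes ~ line_id + month, data = calls, FUN = sum)
names(minutes_of_use)[3] <- "minutes_of_use"

number_of_calls <- aggregate(minutes ~ line_id + month, data = calls, FUN = length)
names(number_of_calls)[3] <- "number_of_calls"

car <- merge(minutes_of_use, number_of_calls, by = c("line_id", "month"))

# Attach the churn flag per line (hypothetical values).
churn_flags <- data.frame(line_id = c(101, 102), churn_event = c(0L, 1L))
car <- merge(car, churn_flags, by = "line_id")

print(car)
```

A real CAR would carry many more derived attributes per line than the two shown here.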

To prepare for regression analysis, which is usually the preferred method for analyzing churn, a certain number of churned lines are combined with a number of lines that are still active. The resulting sample of records is called an analytic data set (ADS).
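
A minimal sketch of assembling such a sample in base R follows. The data are synthetic, and the ratio of three active lines per churned line is an arbitrary assumption chosen for the illustration, not a recommendation from the article.

```r
set.seed(42)  # make the random sample reproducible

# Synthetic CAR: 1,000 lines, of which 50 have churned.
car <- data.frame(
  line_id     = 1:1000,
  churn_event = c(rep(1L, 50), rep(0L, 950))
)

churned <- car[car$churn_event == 1L, ]
active  <- car[car$churn_event == 0L, ]

# Assumed sampling ratio: three active lines for every churned line.
n_active      <- 3 * nrow(churned)
active_sample <- active[sample(nrow(active), n_active), ]

# The analytic data set combines all churned lines with the active sample.
ads <- rbind(churned, active_sample)
table(ads$churn_event)
```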

An essential part of the modeling process is the preparation of the ADS. It is recommended that data preparation take place completely in-database; the best practice is to use Teradata ADS Generator. To initiate modeling with R, users can create a Teradata data frame in R that points to the modeling ADS.

If the model will be used for scoring, one option is to export the regression model from R as Predictive Model Markup Language (PMML) and import it into the Teradata Database. Other options include parameter handover using the coefficients command, plus code parsing, or scoring with R.
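
The parameter handover option can be illustrated with base R alone. The sketch below fits a logistic regression on synthetic data, extracts the coefficients, and applies them by hand, which is the same arithmetic a scoring routine outside R would have to reproduce. The data, the column names and the two new lines are assumptions made up for the example.

```r
set.seed(1)  # reproducible synthetic data

# Synthetic ADS with two attributes and a churn flag.
n   <- 500
ads <- data.frame(
  minutes_of_use = runif(n, 0, 600),
  age            = sample(18:80, n, replace = TRUE)
)
ads$churn_event <- rbinom(n, 1, plogis(1 - 0.01 * ads$minutes_of_use + 0.02 * ads$age))

model <- glm(churn_event ~ minutes_of_use + age, family = binomial, data = ads)

# Hand the estimated parameters over as a plain named vector.
b <- coef(model)

# Score two hypothetical new lines manually: inverse logit of the linear predictor.
new_lines <- data.frame(minutes_of_use = c(50, 400), age = c(25, 60))
manual_score <- plogis(b[["(Intercept)"]] +
                       b[["minutes_of_use"]] * new_lines$minutes_of_use +
                       b[["age"]] * new_lines$age)

# Cross-check against R's own scoring.
r_score <- predict(model, newdata = new_lines, type = "response")
print(cbind(manual_score, r_score))
```

Because the manual score depends only on the coefficient vector and the attribute values, the same expression can be rewritten in another environment once the coefficients have been handed over.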

Get Results

This code example shows the syntax used to generate histograms (see figure) for a churn analysis and to create the results in table 1:

Figure: Histograms for a Churn Analysis

Table 1: Summary of Analysis

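# Load the Teradata add-on package for R
library(teradataR)
# Connect using the data source named "Teradata"
tdConnect("Teradata")
# Create a data frame that points to table CHURN_ADS in database CHURN_SOURCE_DB
tdf <- td.data.frame("CHURN_ADS", "CHURN_SOURCE_DB")
# Plot a histogram for each attribute, then summarize both
td.hist(tdf, "minutes_of_use")
td.hist(tdf, "age")
summary(tdf[c("minutes_of_use", "age")])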

Table 1 provides an analysis summary. The following code fits the churn model and prints the corresponding R output:

A_churn_model <- glm(formula = churn_event ~ minutes_of_use +
    age, family = binomial, data = tdf)
summary(A_churn_model)


Table 2: Attribute Coefficients

Table 2 shows the estimated regression coefficients for minutes of use and age attributes. A regression analysis typically uses many more types of attributes.

Increased Benefits, Reduced Costs

The functionality of R and its free access to numerous statistical techniques give users of Teradata systems a powerful environment for advanced analytics. Teradata’s add-on package allows users to capitalize on the benefits of R and leverage in-database processing for analytical experimentation, prototyping and development, then deploy models using commercial tools. This reduces development costs, delivers emerging analytic techniques and accelerates delivery with reduced risk.


Comments
 
Karl Krycha's approach to adding advanced analytics is truly inspiring and should be of high value to many companies. This approach using varied BI tools and DW data should provide enhanced contributions to business and governments for previously unknown decision-making data, selection criteria, complex queries, predictions, enhanced use of meaningful visualization, and also decisions regarding a company's resource allocations. The hallmark of Dr. Krycha's approach is the flexibility and in-depth opportunity of using the Teradata Data Warehouses for resolving BIG DATA issues, multi-variant equations, and previously superficial or static outputs. Multiple iterations, historical and behavioral analysis, and suggestive/predictive outputs will enhance this approach to new levels. This contribution should be patented and Krycha's group should be listened to much more by the CRM, DCM, and Vertical-Industry Consultants and Developers... in order to provide greater VALUE TO THEIR CUSTOMERS. It is a pleasure to see s

11/4/2011 3:44:52 PM
— Anonymous