Sunday, December 13, 2015

Perorming SQL Selects on R Data frames


You can write SQL queries to manipulate data frames in R with sqldf package.
Its like treating the Dataframe as if they were on Relation Database.

So you can import the Spotfire Data table into R as Data frame and on top of that you can perform SQL Queries.


Example:

library(sqldf)
otable<-sqldf('select * from itable')


Below are the packages needs to be installed on R server.

  • chron_2.3-47
  • DBI_0.3.1
  • gsubfn_0.6-6
  • proto_0.3-10
  • RSQLite_1.0.0
  • sqldf_0.4-10

RODBC connection R - Spotfire



Below is the R Script for connecting to ODBC using R language:-

library("RODBC")
ch<-odbcConnect("smart", uid = "user_id", pwd = "pwd",believeNRows = FALSE)
sqlTables(ch)
odbcGetInfo(ch)
sqlStr<-"select * from `Test`"
sqlQuery(ch, paste(sqlStr))

Now you can export this data frame into Sptofire Data Table and can be used for Reporting.

Fuzzy Search Join DataTables in Spotfire - Unstructured Data


Lets assume we have Two different sets of data from two different sources which are imported into Spotfire and would like to make join between these data sets. Assuming one data set is Structured data and the other is unstructured data (data from Web).

Well, we know that to combine the two different sets, would need to identify the matching key. Since we are working with Structured and Unstructured data, in most of the cases KEY will not match exactly.The only thing you can combine the two sets is by trying to match based on similarity of the column values.

For example:-
You have a company name called "Google Inc Company" on one source and 
"Google Inc Co." on other hand... You know its the same company name but how can you combine the data in Spotfire?

How to solve?
- Manually Fix the data and prepare Mapping Table
- Making use of Regex in Spotfire which we would be difficult to identify the pattern and come up with expression as the data is unstructured.
- Fuzzy String Matching using R: Easy to implement

Fuzzy String Search:
Fuzzy string approach is basically comparing the two strings based on the similarity.

Using agrep function in R, we can combine the data.

Fuzzy string matching doesn't know anything about your data but you might do.
Based on your data, we would need to set the values for the attributes like ignore.case , max distance, etc.

Details for these Attributes can be found at below location:-
https://stat.ethz.ch/R-manual/R-devel/library/base/html/agrep.html




Below is the R Script to join Two Data Tables based on Fuzzy Search String.


sec<-Table1     # Data Set1
dwh<-Table2 # Data Set2
dsecurity<-dwh[,6]  # Key column in Data Set1
ciknames<-sec[,2] # Key column in Data Set2
len<-length(dsecurity) # Number of Rows
for(i in 1:len) # For loop to search for each string
{
matches<- agrep(dsecurity[i],ciknames,max=0.2, value=TRUE, ignore.case = TRUE) # Capturing the Strings which are similar by using agrep function
matcount<-length(matches) # Count of matching strings from Data Set2 
concat<- paste(matches,collapse="|") # Concatenating the matchings Strings from Data Set 2
dwh$SECLOOKUP[i]<-concat # Adding a column to the data frame
for (j in 1:matcount)  # For loop to implement the logic for Left Outer Join
    {
        row <- matches[j]
        if(j==1)
        {
        dwh$SECLOOKUP[i]<-row
        }
        else
        {
         newRow<-dwh[i,]
         newRow[8]<-matches[j]
         dwh<-rbind(dwh,newRow)       
        }
    }
}
Otable<-dwh  # Final Data Frame with left outer join