Lets assume we have Two different sets of data from two different sources which are imported into Spotfire and would like to make join between these data sets. Assuming one data set is Structured data and the other is unstructured data (data from Web).
Well, we know that to combine the two different sets, would need to identify the matching key. Since we are working with Structured and Unstructured data, in most of the cases KEY will not match exactly.The only thing you can combine the two sets is by trying to match based on similarity of the column values.
For example:-
You have a company name called "Google Inc Company" on one source and
"Google Inc Co." on other hand... You know its the same company name but how can you combine the data in Spotfire?
How to solve?
- Manually Fix the data and prepare Mapping Table
- Making use of Regex in Spotfire which we would be difficult to identify the pattern and come up with expression as the data is unstructured.
- Fuzzy String Matching using R: Easy to implement
Fuzzy String Search:
Fuzzy string approach is basically comparing the two strings based on the similarity.
Using agrep function in R, we can combine the data.
Fuzzy string matching doesn't know anything about your data but you might do.
Based on your data, we would need to set the values for the attributes like ignore.case , max distance, etc.
Details for these Attributes can be found at below location:-
https://stat.ethz.ch/R-manual/R-devel/library/base/html/agrep.html
Below is the R Script to join Two Data Tables based on Fuzzy
Search String.
sec<-Table1 # Data Set1
dwh<-Table2 # Data Set2
dsecurity<-dwh[,6] # Key column in Data Set1
ciknames<-sec[,2] # Key column in Data Set2
len<-length(dsecurity) # Number of Rows
for(i in 1:len) # For loop to search for each string
{
matches<- agrep(dsecurity[i],ciknames,max=0.2, value=TRUE, ignore.case = TRUE) # Capturing the Strings which are similar by using agrep function
matcount<-length(matches) # Count of matching strings from Data Set2
concat<- paste(matches,collapse="|") # Concatenating the matchings Strings from Data Set 2
dwh$SECLOOKUP[i]<-concat # Adding a column to the data frame
for (j in 1:matcount) # For loop to implement the logic for Left Outer Join
{
row <- matches[j]
if(j==1)
{
dwh$SECLOOKUP[i]<-row
}
else
{
newRow<-dwh[i,]
newRow[8]<-matches[j]
dwh<-rbind(dwh,newRow)
}
}
}
Otable<-dwh # Final Data Frame with left outer join
How did you get this to work with Spotfire? I see agrep as unsupported in TERR.
ReplyDeletehttps://docs.tibco.com/pub/enterprise-runtime-for-R/4.0.2/doc/html/TIB_terr_Unavailable-Functions-list/GUID-4530EA87-B49C-4C8C-BF5C-C74C71F66A80.html
agrep is supported in newer TERR versions
ReplyDeleteI appreciate you for the time you spent to write this whole piece of article. If you want to become pro in Tibco Spotfire and establish your carrer, You can reach here
ReplyDelete