Answered: Determining If Common Values Exist Between Two Pieces of Data
I have an interesting problem that NEEDS an interesting solution. For this example, assume that I have two sets of data that contains words separated by spaces. Set 1 contains 'green yellow orange red silver' set 2 contains 'black blue red white pink green'. I need to find out if any of the values in set 1 are contained in the data in set 2 - in this example, the subset I am looking for contains 'green' and 'red' as these two individual values are present in both set 1 and set 2. The number of exact matches themselves are not important for this project, all I need to know is if there are ANY common values between the two data sets (I think the proper terminology is 'intersection of sets').
I can do this by breaking down one of the sets into its individual components and scripting to check if this information is contained in the other set. For example the scripting logic would be like this 'If [$set 1] Contains 'black'... do this' yet I would have to use a 'For Each... Next' command to loop back a total of six times to check each of the individual values in set 2 via establishing a variable and using 'Regular Expressions' ('... blue red white pink green').
I know that this method will work but it is slow and I have thousands of set 1 data that must be compared to set 2 data (I just scripted this somewhat convoluted logic this morning... it works but it takes some time!). I am curious if any of the EXCEPTIONALLY bright people here know of a more efficient method to compare two sets of data together to see if any common values are present. All I need to know is that either 'Yes' (1) or 'No' (0) concerning the existence of common values between the two sets. If so, how can I go about doing this in Brilliant Database?
The information contained within each data set changes with each comparison (I failed to include that vital piece of information). Because of that dynamic nature, I do not believe that a query would work.
Unfortunately, Brilliant Database will NOT allow you to drop out of a loop when a match is made - that ability would work out extremely nice but once you start a loop, it will run through its entirety regardless of any 'Break', 'Goto... Labels', or usage of 'If' statements. I have tried every trick I know for exiting a 'For Each.. Next' loop prematurely/early and that ability is not present. If you have been able to make this work, please share!
Both sets of data change - there is never a constant or repeatable amount of data. Set 1 may have only one word whereas set 2 may have 12 words. The next comparison could have eight words in set 1 and three words in set two. Because of this dynamic nature of the data, I do not think that a query could be used successfully for this analysis. Looping via a 'For Each... Next' does work but it takes too long for my liking.
If you make a simple 'For Each... Next' loop and step through the script line by line I suspect that you will see that it will loop the whole original number of loops regardless of any attempts at exiting the loop early. I have tried all options that I am aware of and NOTHING would change that. Changing the value of the variable controlling the number of loops did nothing, 'Goto Labels' was ignored, and various attempts at using VBScripts (which Brilliant Database claims to support) had no affect.
I would love to see this ability actually work within Brilliant Database but it does not. I reported this bug to Brilliant Database last year and received a rely that they, too, were able to replicate the problem and said that it would be corrected in future releases (this was prior to V10 coming out) yet it still exists. I have not tried it in v10.54... maybe it has been fixed but I know it does not work in v10.5. If you were able to make it work in the current release version I would greatly appreciate knowing what you did!
For argument's sake, assume 12 words as a maximum for set 1 and set 2. In actual practice, 12 words should be the maximum for one set while the other set could have hundreds (limited only to the number that can be held in a Recordset). That would be a massive query (if that is what you are thinking). Yes, I realize that this is an odd problem but that is what I am trying to tackle! Given the constraints of the problem, I do not think that it can be dealt with in a simpler fashion.
I have tried that and it does not work. I will post an example later to demonstrate that once you start a 'For Each... Next' loop in Brilliant Database that it cannot be stopped early regardless of what instructions/scripting are included instructing it to do otherwise. I have fought this issue for a very long time.
I seem to have completely misunderstood what you are trying to do.
I assumed you have a record selected that contains a field which could contain upwards of 12 words, and you wanted to run a search to see if any other records from a different folder contained any of the 12 values from the selected record.
Sounds to me now like your trying to compare a batch of records against another batch of records.
The information I need to know is whether or not both data sets contain any of the same 'pieces' (in this example, 2851 and 2768 appear in both variables). There are slightly more than 105K records that will be compared to the second variable (note that the second variable may change each time the analysis is run so it, too, is dynamic). I am able to get correct answers now using my method but it is slow. I am going to test out MacHeuS's suggestion shortly.