So I've got some very large (a few thousand elements long) arrays of numbers. Each element in the array is a number from 1 to 6. I then have a lot of smaller arrays (maybe 20 or 30 elements long) which also consist of the numbers 1 to 6. For each of these smaller arrays I have to find the location on the larger array that is similar enough to pass my test if there is one.
The similarity for any one of the smaller arrays is found as follows:
1. Tally up the differences between each pair of numbers: one from the smaller array, and one from the larger, going for the whole length of the smaller array.
2. Sum the total difference between all pairs together.
3. Raise it to a high power because I need the results to follow a curve. (e.g. 40/40 has a similarity of 1, but 39/40 is only 0.81 similar, and 35/40 is 0.21 similar)
4. Advance the offset of the larger array by one. Starting over a 1, but this time the first element of the smaller array is compared to the second element of the larger array, and so on.
Unfortunately this means that for each smaller array I have to do an exponentiation operation for each element of the larger array. (e.g. to compare a smaller array to an array with 1000 characters, I have to call the 'pow' method over 1000 times) I also have find the difference between an amount of pairs of numbers equal to the number of elements in the smaller array multiplied by the number of elements in the larger array for each of the smaller arrays. (e.g. I have to find out if a 20 element smaller array matches elements 0-19 of the larger array, then elements 1-20 of the larger array, then elements 2-21 ... until the end of the larger array)
Can anyone suggest some optimizations I can do here?
Here's a snippet of code as an example:
void activator::calculateFree()
{
if(attachSize < 1)
return;
//attach size is the size of the binding domain of the activator
//e.g. the size of the 'smaller array'
calculateLigand();
if(!boundLigand)
{
//parent->mainDNA->size is the size of the 'larger array'
//a is the offset as we work our way through the larger array
for(unsigned int a = 0; a<=parent->mainDNA->size-attachSize; a++)
{
float totalDifference = 0;
float actualDifference = 0;
bool inhibited = false;
for(unsigned int b = 0; b<attachSize; b++)
{
totalDifference += 6;
//parent->mainDNA->code is the larger array
//attachBinding is the smaller array
actualDifference += abs(parent->mainDNA->code[a+b]-attachBinding[b]);
if(parent->mainDNA->inhibitedCode[a+b])
{
inhibited = true;
break;
}
}
if(inhibited)
continue;
//here (totalDifference-actualDifference)/totalDifference is the raw similarity between the two arrays
geneAffinity = (totalDifference-actualDifference)/totalDifference;
//try and avoid doing a pow if we don't have to
if(geneAffinity > 0.61324)
{
geneAffinity = pow(geneAffinity,8);
//For example, if we have a smaller array length of 20
//And at a particular offset 19 of its elements are equal to the larger arrays elements
//and the last element is only 1 different from the corresponding larger array element
//that grants a similarity of 119/120
//(totalDifference-actualDifference)/totalDifference in this case would be 0.99166
//raised to the 8th power, that's 0.935
//here we do a check to see if the similarity is close enough to do something with
float bindingAffinity = 1-geneAffinity;
//geneAffinity > 0.02
if(bindingAffinity < 0.98)
{
unsigned int randMax = 1+(bindingAffinity*1000);
if(rand() % randMax == 0)
{
boundDNA = parent->mainDNA;
boundPosition = a;
for(unsigned int b = 0; b<parent->unboundActivators.size(); b++)
{
if(parent->unboundActivators[b] == this)
{
parent->unboundActivators.erase(parent->unboundActivators.begin() + b);
break;
}
}
boundDNA->activators.push_back(this);
break;
}
}
}
}
}
}