Recipe 2.5 Finding the Location of All Occurrences of a String Within Another String
Problem
You need to search a string for every
occurrence of a specific string. In addition, the case-sensitivity,
or insensitivity, of the search needs to be controlled.
Solution
Using IndexOf or
IndexOfAny in a loop, we can determine how many
occurrences of a character or string exist as well as their locations
within the string. To find each occurrence of a case-sensitive string
in another string, use the following code:
using System;
using System.Collections;

public static int[] FindAll(string matchStr, string searchedStr, int startPos)
{
    int foundPos = -1;   // -1 represents not found
    int count = 0;
    ArrayList foundItems = new ArrayList();

    do
    {
        foundPos = searchedStr.IndexOf(matchStr, startPos);
        if (foundPos > -1)
        {
            // Step past the start of this match and record its position
            startPos = foundPos + 1;
            count++;
            foundItems.Add(foundPos);
            Console.WriteLine("Found item at position: " + foundPos.ToString());
        }
    } while (foundPos > -1 && startPos < searchedStr.Length);

    return ((int[])foundItems.ToArray(typeof(int)));
}
If the FindAll method is called with the following
parameters:
int[] allOccurrences = FindAll("Red", "BlueTealRedredGreenRedYellow", 0);
the string "Red" is found at locations 8 and 19 in
the string searchedStr. This code uses the
IndexOf method inside a loop to iterate through
each found matchStr string in the
searchStr string.
To find a case-sensitive character in a string, use the following
code:
public static int[] FindAll(char matchChar, string searchedStr, int startPos)
{
    int foundPos = -1;   // -1 represents not found
    int count = 0;
    ArrayList foundItems = new ArrayList();

    do
    {
        foundPos = searchedStr.IndexOf(matchChar, startPos);
        if (foundPos > -1)
        {
            // Step past this match and record its position
            startPos = foundPos + 1;
            count++;
            foundItems.Add(foundPos);
            Console.WriteLine("Found item at position: " + foundPos.ToString());
        }
    } while (foundPos > -1 && startPos < searchedStr.Length);

    return ((int[])foundItems.ToArray(typeof(int)));
}
If the FindAll method is called with the following
parameters:
int[] allOccurrences = FindAll('r', "BlueTealRedredGreenRedYellow", 0);
the character 'r' is found at locations 11 and 15
in the string searchedStr. This code uses the
IndexOf method inside a do loop
to iterate through each found matchChar character
in the searchedStr string. Overloading the
FindAll method to accept either a
char or string type avoids the
overhead of converting the char to a
string before searching.
To find each case-insensitive occurrence of a string in another
string, use the following code:
public static int[] FindAny(string matchStr, string searchedStr, int startPos)
{
    int foundPos = -1;   // -1 represents not found
    int count = 0;
    ArrayList foundItems = new ArrayList();

    // Factor out case-sensitivity
    searchedStr = searchedStr.ToUpper();
    matchStr = matchStr.ToUpper();

    do
    {
        foundPos = searchedStr.IndexOf(matchStr, startPos);
        if (foundPos > -1)
        {
            startPos = foundPos + 1;
            count++;
            foundItems.Add(foundPos);
            Console.WriteLine("Found item at position: " + foundPos.ToString());
        }
    } while (foundPos > -1 && startPos < searchedStr.Length);

    return ((int[])foundItems.ToArray(typeof(int)));
}
If the FindAny method is called with the following
parameters:
int[] allOccurrences = FindAny("Red", "BlueTealRedredGreenRedYellow", 0);
the string "Red" is found at locations 8, 11, and
19 in the string searchedStr. This code uses the
IndexOf method inside a loop to iterate through
each found matchStr string in the
searchedStr string. The search is rendered
case-insensitive by using the ToUpper method on
both the searchedStr and the
matchStr strings.
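As an alternative to upper-casing both strings, the IndexOf overload
that accepts a StringComparison value can perform the case-insensitive
comparison directly. The following is a minimal sketch, assuming .NET 2.0
or later. The class name CaseInsensitiveSearch and the method name
FindAllIgnoreCase are hypothetical and not part of the recipe's code, and
List<int> stands in for ArrayList. The guard against an empty matchStr is
also an addition.

```csharp
using System;
using System.Collections.Generic;

public static class CaseInsensitiveSearch
{
    // Hypothetical helper: returns the start index of every match of
    // matchStr in searchedStr, ignoring case, without creating
    // upper-cased copies of either string.
    public static int[] FindAllIgnoreCase(string matchStr, string searchedStr, int startPos)
    {
        List<int> foundItems = new List<int>();

        // Assumption: an empty search string is treated as "no matches".
        if (matchStr.Length == 0)
        {
            return foundItems.ToArray();
        }

        while (startPos < searchedStr.Length)
        {
            int foundPos = searchedStr.IndexOf(
                matchStr, startPos, StringComparison.OrdinalIgnoreCase);
            if (foundPos < 0)
            {
                break;   // no further matches
            }

            foundItems.Add(foundPos);
            startPos = foundPos + 1;   // step past the start of this match
        }

        return foundItems.ToArray();
    }
}
```

Called as CaseInsensitiveSearch.FindAllIgnoreCase("Red",
"BlueTealRedredGreenRedYellow", 0), this sketch reports positions 8, 11,
and 19, the same positions as the ToUpper-based FindAny method.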
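To find each occurrence of any one of a set of characters in a string
(for example, both the upper- and lowercase forms of a letter), use
the following code: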
public static int[] FindAny(char[] matchCharArray, string searchedStr, int startPos)
{
    int foundPos = -1;   // -1 represents not found
    int count = 0;
    ArrayList foundItems = new ArrayList();

    do
    {
        foundPos = searchedStr.IndexOfAny(matchCharArray, startPos);
        if (foundPos > -1)
        {
            startPos = foundPos + 1;
            count++;
            foundItems.Add(foundPos);
            Console.WriteLine("Found item at position: " + foundPos.ToString());
        }
    } while (foundPos > -1 && startPos < searchedStr.Length);

    return ((int[])foundItems.ToArray(typeof(int)));
}
If the FindAny method is called with the following
parameters:
int[] allOccurrences = FindAny(new char[] {'R', 'r'},
    "BlueTealRedredGreenRedYellow", 0);
the characters 'r' or 'R' are
found at locations 8, 11, 15, and 19 in the string
searchedStr. This code uses the
IndexOfAny method inside a loop to iterate through
each occurrence of any character from the supplied char array in the
searchedStr string. The search is rendered
case-insensitive by using an array of char
containing all characters, both upper- and lowercase, to be searched
for.
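Rather than writing out both cases by hand, the array can be built from
a single character. The following is a rough sketch, assuming that the
invariant-culture case mappings are acceptable for the characters being
searched. The class AnyCaseCharSearch and the method FindAnyCase are
hypothetical names introduced for illustration, and List<int> stands in
for ArrayList.

```csharp
using System;
using System.Collections.Generic;

public static class AnyCaseCharSearch
{
    // Hypothetical helper: finds every position of matchChar in
    // searchedStr, matching both its upper- and lowercase forms.
    public static int[] FindAnyCase(char matchChar, string searchedStr, int startPos)
    {
        // Build the two-element search array from the single character.
        char[] variants =
        {
            char.ToUpperInvariant(matchChar),
            char.ToLowerInvariant(matchChar)
        };

        List<int> foundItems = new List<int>();
        int foundPos;

        do
        {
            foundPos = searchedStr.IndexOfAny(variants, startPos);
            if (foundPos > -1)
            {
                foundItems.Add(foundPos);
                startPos = foundPos + 1;   // step past this match
            }
        } while (foundPos > -1 && startPos < searchedStr.Length);

        return foundItems.ToArray();
    }
}
```

Calling AnyCaseCharSearch.FindAnyCase('r', "BlueTealRedredGreenRedYellow", 0)
yields positions 8, 11, 15, and 19, matching the result of passing
{'R', 'r'} explicitly.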
Discussion
In the example code, the foundPos variable
contains the location of the found character/string within the
searchedStr string. The
startPos variable contains the next position in
which to start the search.
The IndexOf or
IndexOfAny method is used to perform the actual
searching. The count variable simply counts the
number of times the character/string was found in the
searchedStr string.
The example used a do loop so that the
IndexOf or IndexOfAny operation
would be executed at least one time before the check in the
while clause is performed to determine whether
there are any more character/string matches to be found in the
searchedStr string. This loop terminates when
foundPos returns -1 (meaning
that no more character/strings can be found in the
searchedStr string) or when startPos reaches the
end of the searchedStr string. When foundPos equals
-1, there are no more instances of the match value
in the searchedStr string; therefore, we can exit
the loop. When startPos reaches the length of the
searchedStr string, nothing is left to search, so the loop
also exits. The IndexOf and IndexOfAny methods throw an
ArgumentOutOfRangeException if the starting position is negative
or greater than the length of the string being searched. To
prevent this, always check to make sure that any positioning
variables that are modified inside of the loop, such as the
startPos variable, are within their intended
bounds. Note that the do loop performs its first search before
this check runs, so a startPos passed in by the caller that is
already out of range still causes an exception.
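The boundary behavior can be checked directly. The following is a small
sketch, assuming the documented behavior of String.IndexOf(string, int):
a start position equal to the string's length is accepted and yields -1
for a non-empty search string, while a start position greater than the
length throws ArgumentOutOfRangeException. The class BoundsDemo and the
method Throws are hypothetical names used only for this illustration.

```csharp
using System;

public static class BoundsDemo
{
    // Returns true if IndexOf rejects startPos with an
    // ArgumentOutOfRangeException, false if the call completes.
    public static bool Throws(string searchedStr, string matchStr, int startPos)
    {
        try
        {
            searchedStr.IndexOf(matchStr, startPos);
            return false;
        }
        catch (ArgumentOutOfRangeException)
        {
            return true;
        }
    }
}
```

For the three-character string "abc", BoundsDemo.Throws("abc", "a", 3)
is false, whereas BoundsDemo.Throws("abc", "a", 4) is true.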
Once a match is found by the IndexOf or
IndexOfAny method, the body of the if
statement increments the count variable by one to
record the match and moves startPos past the start of
that match: startPos is set to the starting position of
the match just found plus 1. Adding
1 is necessary so that we do not keep matching the
same character/string that was previously matched; without it, the
code would loop forever as soon as one match was found
in the searchedStr string. To see this behavior,
remove the +1 from the code.
There is one potential problem with this code. Consider the case
where:
searchedStr = "aaaa";
matchStr = "aa";
The code contained in this recipe would match "aa"
three times.
(aa)aa
a(aa)a
aa(aa)
This situation may be fine for some applications, but not if you need
it to return only the following matches:
(aa)aa
aa(aa)
To do this, change the following line in the do
loop:
startPos = foundPos + 1;
to this:
startPos = foundPos + matchStr.Length;
This code moves the startPos pointer beyond the
first matched string, disallowing any internal matches.
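The effect of the two step sizes can be compared side by side. The
following is a minimal sketch; the class OverlapDemo, the method
FindAllMatches, and its allowOverlap parameter are hypothetical names
introduced for illustration. The Console.WriteLine calls from the
recipe are omitted, and the guard against an empty matchStr is an
addition.

```csharp
using System;
using System.Collections.Generic;

public static class OverlapDemo
{
    // Hypothetical helper: allowOverlap selects between stepping by 1
    // (overlapping matches) and by matchStr.Length (non-overlapping).
    public static int[] FindAllMatches(string matchStr, string searchedStr, bool allowOverlap)
    {
        List<int> foundItems = new List<int>();

        // Assumption: an empty search string yields no matches; this also
        // guarantees that the step below is at least 1.
        if (matchStr.Length == 0)
        {
            return foundItems.ToArray();
        }

        int step = allowOverlap ? 1 : matchStr.Length;
        int startPos = 0;
        int foundPos;

        do
        {
            foundPos = searchedStr.IndexOf(matchStr, startPos);
            if (foundPos > -1)
            {
                foundItems.Add(foundPos);
                startPos = foundPos + step;
            }
        } while (foundPos > -1 && startPos < searchedStr.Length);

        return foundItems.ToArray();
    }
}
```

Searching "aaaa" for "aa" with allowOverlap set to true returns
positions 0, 1, and 2; with allowOverlap set to false it returns
positions 0 and 2.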
To convert this code to use a while loop rather
than a do loop, the foundPos
variable must be initialized to 0 and the
while loop expression should be as follows:
while (foundPos >= 0 && startPos < searchedStr.Length)
{
    foundPos = searchedStr.IndexOf(matchChar, startPos);
    if (foundPos > -1)
    {
        startPos = foundPos + 1;
        count++;
    }
}
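Put together as a complete method, the while version might look like
the following sketch. FindAllWhile and the enclosing class
WhileLoopSearch are hypothetical names, and List<int> is used in place
of ArrayList.

```csharp
using System;
using System.Collections.Generic;

public static class WhileLoopSearch
{
    // Hypothetical while-loop variant of the char-based FindAll method.
    public static int[] FindAllWhile(char matchChar, string searchedStr, int startPos)
    {
        // Must start at a non-negative value so the loop body is entered.
        int foundPos = 0;
        List<int> foundItems = new List<int>();

        while (foundPos >= 0 && startPos < searchedStr.Length)
        {
            foundPos = searchedStr.IndexOf(matchChar, startPos);
            if (foundPos > -1)
            {
                foundItems.Add(foundPos);
                startPos = foundPos + 1;   // step past this match
            }
        }

        return foundItems.ToArray();
    }
}
```

One difference from the do version is that the bounds check runs before
the first search, so a startPos at or beyond the end of searchedStr
produces an empty result instead of an exception. A negative startPos
still throws.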
See Also
See the "String.IndexOf Method" and
"String.IndexOfAny Method" topics
in the MSDN documentation.