Recipe 8.2 Extracting Groups from a MatchCollection
Problem
You have a regular expression that
contains one or more named groups, such as the following:
\\\\(?<TheServer>\w*)\\(?<TheService>\w*)\\
where the named group TheServer will match any
server name within a UNC string, and TheService
will match any service name within a UNC string.
You need to store
the groups that are returned by this regular expression in a keyed
collection (such as a Hashtable) in which the key
is the group name.
Solution
The
RegExUtilities class contains a method,
ExtractGroupings, that obtains a set of
Group objects keyed by their matching group name:
using System;
using System.Collections;
using System.Text.RegularExpressions;

public class RegExUtilities
{
    // Returns one Hashtable per match; each Hashtable maps a group name
    // to the Group object captured for that name.
    public static ArrayList ExtractGroupings(string source,
                                             string matchPattern,
                                             bool wantInitialMatch)
    {
        ArrayList keyedMatches = new ArrayList( );

        // Group 0 is the complete match; skip it unless the caller wants it
        int startingElement = 1;
        if (wantInitialMatch)
        {
            startingElement = 0;
        }

        Regex RE = new Regex(matchPattern, RegexOptions.Multiline);
        MatchCollection theMatches = RE.Matches(source);

        foreach(Match m in theMatches)
        {
            Hashtable groupings = new Hashtable( );

            for (int counter = startingElement;
                 counter < m.Groups.Count; counter++)
            {
                // If we had just returned the MatchCollection directly, the
                // GroupNameFromNumber method would not be available to use
                groupings.Add(RE.GroupNameFromNumber(counter),
                              m.Groups[counter]);
            }

            keyedMatches.Add(groupings);
        }

        return (keyedMatches);
    }
}
The ExtractGroupings method can be used in the
following manner to extract named groups and organize them by name:
public static void TestExtractGroupings( )
{
    string source = @"Path = ""\\MyServer\MyService\MyPath;
\\MyServer2\MyService2\MyPath2\""";
    string matchPattern = @"\\\\(?<TheServer>\w*)\\(?<TheService>\w*)\\";

    foreach (Hashtable grouping in
             ExtractGroupings(source, matchPattern, true))
    {
        foreach (DictionaryEntry DE in grouping)
            Console.WriteLine("Key / Value = " + DE.Key + " / " +
                              DE.Value);
        Console.WriteLine("");
    }
}
This test method creates a source string and a
regular expression pattern in the matchPattern
variable. The regular expression contains two named groupings,
(?<TheServer>\w*) and (?<TheService>\w*):
string matchPattern = @"\\\\(?<TheServer>\w*)\\(?<TheService>\w*)\\";
The names for these two groups are: TheServer and
TheService. Text that matches either of these
groupings can be accessed through these group names.
The source and matchPattern
variables are passed in to the ExtractGroupings
method, along with a Boolean value, which we will discuss shortly.
This method returns an ArrayList containing
Hashtable objects. These Hashtable objects
contain the matches for each of the named groups in the regular
expression, keyed by their group name.
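Because the Hashtable values are stored as object, callers must cast each value back to Group before using it. As a rough sketch only, assuming generic collections and a C# compiler that supports var are available, the same logic could return strongly typed dictionaries instead. The names TypedGroupings and ExtractGroupingsTyped are hypothetical and are not part of the recipe's RegExUtilities class:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TypedGroupings   // hypothetical class name
{
    // Same logic as ExtractGroupings, but using generic collections
    // so that each value is already typed as Group.
    public static List<Dictionary<string, Group>> ExtractGroupingsTyped(
        string source, string matchPattern, bool wantInitialMatch)
    {
        var keyedMatches = new List<Dictionary<string, Group>>();

        // Group 0 is the complete match; skip it unless requested
        int startingElement = wantInitialMatch ? 0 : 1;
        var re = new Regex(matchPattern, RegexOptions.Multiline);

        foreach (Match m in re.Matches(source))
        {
            var groupings = new Dictionary<string, Group>();
            for (int counter = startingElement;
                 counter < m.Groups.Count; counter++)
            {
                groupings.Add(re.GroupNameFromNumber(counter),
                              m.Groups[counter]);
            }
            keyedMatches.Add(groupings);
        }
        return keyedMatches;
    }

    public static void Main()
    {
        string source =
            @"\\MyServer\MyService\MyPath; \\MyServer2\MyService2\MyPath2\";
        string matchPattern =
            @"\\\\(?<TheServer>\w*)\\(?<TheService>\w*)\\";

        foreach (var grouping in
                 ExtractGroupingsTyped(source, matchPattern, false))
        {
            // No cast is needed to reach the Value property
            Console.WriteLine(grouping["TheServer"].Value + " / " +
                              grouping["TheService"].Value);
        }
    }
}
```

With this variant, a lookup such as grouping["TheServer"] yields a Group directly, whereas the Hashtable version requires (Group)grouping["TheServer"].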
This test method, TestExtractGroupings, displays
the following:
Key / Value = 0 / \\MyServer\MyService\
Key / Value = TheService / MyService
Key / Value = TheServer / MyServer
Key / Value = 0 / \\MyServer2\MyService2\
Key / Value = TheService / MyService2
Key / Value = TheServer / MyServer2
If the last parameter to the ExtractGroupings
method were to be changed to false, the following
output would result:
Key / Value = TheService / MyService
Key / Value = TheServer / MyServer
Key / Value = TheService / MyService2
Key / Value = TheServer / MyServer2
The only difference between these two outputs is that the first
grouping (the one with key 0) is not displayed when the last parameter to
ExtractGroupings is changed to
false. The first grouping is always the complete
match of the regular expression.
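The key 0 in the first output comes from GroupNameFromNumber, which returns the string "0" for the group that holds the complete match. The following is a small illustrative sketch of that behavior. The class name InitialMatchDemo and the helper DescribeGroups are hypothetical names introduced here for demonstration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class InitialMatchDemo   // hypothetical class name
{
    // Returns one "name = value" line per group of the first match,
    // starting with group 0 (the complete match).
    public static string[] DescribeGroups(string source, string matchPattern)
    {
        Regex re = new Regex(matchPattern);
        Match m = re.Match(source);
        if (!m.Success)
        {
            return new string[0];
        }

        string[] lines = new string[m.Groups.Count];
        for (int i = 0; i < m.Groups.Count; i++)
        {
            lines[i] = re.GroupNameFromNumber(i) + " = " + m.Groups[i].Value;
        }
        return lines;
    }

    public static void Main()
    {
        string source = @"\\MyServer\MyService\MyPath";
        string matchPattern = @"\\\\(?<TheServer>\w*)\\(?<TheService>\w*)\\";

        foreach (string line in DescribeGroups(source, matchPattern))
        {
            Console.WriteLine(line);
        }
        // Prints:
        // 0 = \\MyServer\MyService\
        // TheServer = MyServer
        // TheService = MyService
    }
}
```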
Discussion
Groups within a regular expression can be defined in one of two ways.
The first way is to add parentheses around the subpattern that you
wish to define as a grouping. This type of grouping is sometimes
labeled as unnamed. This
grouping can later be easily extracted from the final text in each
Match object returned by running the regular
expression. The regular expression for this recipe could be modified,
as follows, to use two simple unnamed groups:
string matchPattern = @"\\\\(\w*)\\(\w*)\\";
After running the regular expression, you can access these groups
using an integer index starting with 1; index 0 always holds the complete match.
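As a minimal sketch of numeric access, the following applies the unnamed pattern to a sample UNC string and reads each group by its index. The class name UnnamedGroupDemo, the helper GetGroupValues, and the sample string are assumptions made for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnnamedGroupDemo   // hypothetical class name
{
    // Returns the values of groups 1 and 2 from the first match,
    // or null when the pattern does not match.
    public static string[] GetGroupValues(string source)
    {
        string matchPattern = @"\\\\(\w*)\\(\w*)\\";
        Match m = Regex.Match(source, matchPattern);
        if (!m.Success)
        {
            return null;
        }
        return new string[] { m.Groups[1].Value, m.Groups[2].Value };
    }

    public static void Main()
    {
        string[] values = GetGroupValues(@"\\MyServer\MyService\MyPath");
        Console.WriteLine("Group 1: " + values[0]);   // Group 1: MyServer
        Console.WriteLine("Group 2: " + values[1]);   // Group 2: MyService
    }
}
```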
The second way to define a group within a regular expression is to
use one or more named groups. A
named group is defined by adding parentheses around the subpattern
that you wish to define as a grouping and, additionally, adding a
named value to each grouping, using the following syntax:
(?<Name>\w*)
The Name portion of this syntax is the
name you specify for this group. After executing this regular
expression, you can access this group by the name
Name.
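A named group still receives a number, so it can be reached either way. The sketch below, which uses the hypothetical names NamedGroupDemo and ServerByNameAndNumber, looks up the TheServer group by name and then by the number that GroupNumberFromName reports for it:

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupDemo   // hypothetical class name
{
    // Returns the TheServer capture twice: once fetched by name and
    // once fetched by the number assigned to that name.
    public static string[] ServerByNameAndNumber(string source)
    {
        Regex re = new Regex(@"\\\\(?<TheServer>\w*)\\(?<TheService>\w*)\\");
        Match m = re.Match(source);
        if (!m.Success)
        {
            return null;
        }

        int number = re.GroupNumberFromName("TheServer");
        return new string[]
        {
            m.Groups["TheServer"].Value,
            m.Groups[number].Value
        };
    }

    public static void Main()
    {
        string[] result = ServerByNameAndNumber(@"\\MyServer\MyService\MyPath");
        Console.WriteLine("By name:   " + result[0]);   // By name:   MyServer
        Console.WriteLine("By number: " + result[1]);   // By number: MyServer
    }
}
```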
To access each group, you must first use a loop to iterate over each
Match object in the
MatchCollection. For each Match
object, you use the indexer of the
GroupCollection returned by its Groups property, using
the following unnamed syntax:
string group1 = m.Groups[1].Value;
string group2 = m.Groups[2].Value;
or the following named syntax where m is the
Match object:
string group1 = m.Groups["Group1_Name"].Value;
string group2 = m.Groups["Group2_Name"].Value;
If the Match method was used to return a single
Match object instead of the
MatchCollection, use the following syntax to
access each group:
// Unnamed syntax
string group1 = theMatch.Groups[1].Value;
string group2 = theMatch.Groups[2].Value;
// Named syntax (an alternative to the unnamed form above)
string namedGroup1 = theMatch.Groups["Group1_Name"].Value;
string namedGroup2 = theMatch.Groups["Group2_Name"].Value;
where theMatch is the Match
object returned by the Match method.
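When only a single Match is requested, it is worth checking its Success property before reading any groups, because a failed match yields empty group values. The following is a hedged sketch of that check. The names SingleMatchDemo and ServerOrDefault, and the fallback text, are hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

public static class SingleMatchDemo   // hypothetical class name
{
    // Returns the TheServer capture from the first match in source,
    // or the supplied fallback when the pattern does not match.
    public static string ServerOrDefault(string source, string fallback)
    {
        Match theMatch = Regex.Match(
            source, @"\\\\(?<TheServer>\w*)\\(?<TheService>\w*)\\");

        if (!theMatch.Success)
        {
            return fallback;
        }
        return theMatch.Groups["TheServer"].Value;
    }

    public static void Main()
    {
        // Prints "MyServer"
        Console.WriteLine(
            ServerOrDefault(@"\\MyServer\MyService\MyPath", "(none)"));

        // Prints "(none)" because the text contains no UNC path
        Console.WriteLine(ServerOrDefault("no UNC path here", "(none)"));
    }
}
```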
See Also
See the ".NET Framework Regular
Expressions" and "Hashtable
Class" topics in the MSDN documentation.