Recipe 2.22 Improving StringBuilder Performance
Problem
In an attempt to
improve string-handling performance, you have converted your code to
use the StringBuilder class. However, this change
has not improved performance as much as you had hoped.
Solution
The chief advantage of a StringBuilder object over a string object is that it preallocates a default initial amount of memory in an internal buffer in which a string value can expand and contract. When that memory is used up, however, .NET must allocate new memory for this internal buffer.
You can reduce how often this occurs by explicitly specifying the size of the internal buffer, using either of two techniques. The first approach is to set this value when the StringBuilder class constructor is called. For example, the code:
StringBuilder sb = new StringBuilder(200);
specifies that a StringBuilder object can hold
200 characters before new memory must be
allocated.
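A minimal C# sketch (top-level statements, C# 9 or later) comparing the default constructor with the capacity constructor. The variable names are illustrative only. Appending exactly 200 characters to the preallocated builder should leave its capacity unchanged, because no growth step is needed.

```csharp
using System;
using System.Text;

// Default constructor: starts with the default capacity.
var defaultSb = new StringBuilder();

// Capacity constructor: room for 200 characters before the buffer must grow.
var sizedSb = new StringBuilder(200);

Console.WriteLine($"default capacity: {defaultSb.Capacity}");
Console.WriteLine($"sized capacity:   {sizedSb.Capacity}");

// 200 characters fit in the preallocated buffer, so no growth occurs.
sizedSb.Append('x', 200);
Console.WriteLine($"after 200 chars:  {sizedSb.Capacity}");
```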
The second approach is to change the value after the StringBuilder object has been created, using either the Capacity property or the EnsureCapacity method of the StringBuilder object:
sb.Capacity = 200;
sb.EnsureCapacity(200);
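The two post-construction options can be sketched as follows (C# top-level statements; the sizes 200 and 500 are arbitrary). The sketch assumes only the documented behavior that EnsureCapacity guarantees *at least* the requested capacity, so it does not rely on an exact result.

```csharp
using System;
using System.Text;

var sb = new StringBuilder();      // starts at the default capacity

// Option 1: assign the Capacity property directly.
sb.Capacity = 200;
int afterSet = sb.Capacity;

// Option 2: EnsureCapacity grows the buffer only if it is currently
// smaller than the requested size, then returns the resulting capacity.
int afterEnsure = sb.EnsureCapacity(500);

Console.WriteLine($"after Capacity = 200:      {afterSet}");
Console.WriteLine($"after EnsureCapacity(500): {afterEnsure}");
```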
Discussion
As noted in previous recipes in this chapter, the string class is immutable; once a string object has been created, its contents cannot be changed in any way. So changing the contents of a string variable entails the creation of a new string containing the modified text. The reference variable of type string must then be changed to reference this newly created string object. The old string object, if nothing else references it, becomes eligible for garbage collection, and its memory will eventually be freed. Because of all this behind-the-scenes work, code that performs intensive string manipulations using the string class suffers greatly from having to create a new string object for each modification, and it places greater pressure on the garbage collector to remove unused objects from memory more frequently.
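A small C# illustration (top-level statements) of the immutability described above: "modifying" a string variable really creates a new string object and repoints the variable, while any other reference still sees the old, unchanged object.

```csharp
using System;

string original = "abc";
string alias = original;        // a second reference to the same object

// += builds a brand-new string and repoints 'original' at it;
// the object that 'alias' refers to is left untouched.
original += "def";

Console.WriteLine(original);                                // abcdef
Console.WriteLine(alias);                                   // abc
Console.WriteLine(object.ReferenceEquals(original, alias)); // False

// In a loop, each += allocates another new string, and the previous
// one becomes garbage for the collector to reclaim.
string s = "";
for (int i = 0; i < 5; i++)
{
    s += i.ToString();
}
Console.WriteLine(s);                                       // 01234
```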
The StringBuilder class solves this problem by preallocating an internal buffer to hold a string. The contents of this string buffer are manipulated directly. Modifying a StringBuilder object does not carry the performance penalty of creating a whole new string or StringBuilder object each time and, consequently, does not fill up the managed heap with many unused objects.
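For contrast with the string example, this C# sketch (top-level statements) shows that StringBuilder edit methods modify the same object and return that instance, which is also why the calls can be chained.

```csharp
using System;
using System.Text;

var sb = new StringBuilder("abc");
StringBuilder alias = sb;

// Append, Insert, and Replace modify this same StringBuilder object
// (growing its internal buffer if needed) and return that instance.
sb.Append("def").Insert(0, ">> ").Replace("c", "C");

Console.WriteLine(sb.ToString());                      // >> abCdef
Console.WriteLine(object.ReferenceEquals(sb, alias));  // True
```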
There is one caveat with using the StringBuilder class, which, if not heeded, can impede performance. The StringBuilder class uses a default initial capacity to contain the characters of a string, unless you change this default initial capacity through one of the StringBuilder constructors. Once this space is exceeded (by appending characters, for instance), a new string buffer double the size of the original buffer is allocated. For example, a StringBuilder object with an initial size of 20 characters would be increased to 40 characters, then to 80 characters, and so on. The string contained in the original internal string buffer is then copied to this newly allocated internal string buffer along with any appended or inserted characters. (Starting with .NET Framework 4, StringBuilder instead grows by linking an additional buffer chunk rather than copying the existing characters, but each growth step still allocates new memory.)
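To watch the growth pattern on your own runtime, the following C# sketch (top-level statements) records every distinct Capacity value while characters are appended one at a time from an initial capacity of 20. The exact sequence is an implementation detail, so the sketch prints whatever the runtime reports rather than assuming a particular series.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

var sb = new StringBuilder(20);
var capacities = new List<int> { sb.Capacity };

// Append one character at a time and note each time the capacity changes.
for (int i = 0; i < 200; i++)
{
    sb.Append('x');
    if (sb.Capacity != capacities[capacities.Count - 1])
        capacities.Add(sb.Capacity);
}

// With a pure doubling strategy this list would read 20, 40, 80, 160, 320.
Console.WriteLine(string.Join(", ", capacities));
```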
The default capacity for a StringBuilder object is 16 characters; in many cases, this is much too small. To increase this size upon object creation, the StringBuilder class has an overloaded constructor that accepts an integer value to use as the starting size of the preallocated buffer. Choosing an initial size that is neither too large (thereby allocating too much unused space) nor too small (thereby incurring a performance penalty for repeatedly growing the internal buffer) may seem like more of an art than a science. However, determining the optimal size may prove invaluable when your application is tested for performance.
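When the pieces being combined are known in advance, the initial size can sometimes be computed exactly. The following C# sketch (top-level statements) uses a hypothetical helper, `JoinWithComma`, that sums the lengths of its inputs plus one separator per gap and passes that total to the capacity constructor, so the builder should not need to grow while the pieces are appended.

```csharp
using System;
using System.Text;

// Hypothetical helper: size the builder exactly from the known inputs.
static string JoinWithComma(string[] parts)
{
    int size = 0;
    foreach (string p in parts)
        size += p.Length;
    size += Math.Max(0, parts.Length - 1);   // one comma between each pair

    var sb = new StringBuilder(size);
    for (int i = 0; i < parts.Length; i++)
    {
        if (i > 0) sb.Append(',');
        sb.Append(parts[i]);
    }
    return sb.ToString();
}

Console.WriteLine(JoinWithComma(new[] { "red", "green", "blue" }));  // red,green,blue
```

The same idea applies whenever an upper bound on the final length is cheap to compute; when it is not, the measurement approach described next is the fallback.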
In cases where good values for the initial size of a StringBuilder object cannot be calculated, try running the application under a constant load while varying the initial StringBuilder size. When a good initial size is found, try varying the load while keeping this size value constant. You may discover that this value needs to be tweaked to get better performance. Keeping good records of each run and plotting them on a graph will be invaluable in determining the appropriate number to choose. In addition, using PerfMon (Administrative Tools > Performance Monitor) to detect and graph the number of garbage collections that occur may provide useful information in determining whether your StringBuilder initial size is causing too many reallocations of your StringBuilder objects' internal buffers.
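A rough in-process version of this experiment can be sketched in C# (top-level statements). The workload `BuildReport`, the candidate sizes, and the iteration counts are all hypothetical placeholders, and `GC.CollectionCount(0)` (the number of generation-0 collections so far in this process) stands in for the PerfMon counters. Treat it as a starting harness, not a rigorous benchmark: run it repeatedly and under realistic load before trusting the numbers.

```csharp
using System;
using System.Diagnostics;
using System.Text;

// Hypothetical workload: build a string of targetLength characters from
// short appends, starting from the given initial capacity.
static string BuildReport(int initialCapacity, int targetLength)
{
    var sb = new StringBuilder(initialCapacity);
    while (sb.Length < targetLength)
        sb.Append("item;");
    return sb.ToString();
}

const int Iterations = 10_000;
const int TargetLength = 2_000;

foreach (int capacity in new[] { 16, 256, 1_024, 2_048, 4_096 })
{
    int gcBefore = GC.CollectionCount(0);
    var timer = Stopwatch.StartNew();

    for (int i = 0; i < Iterations; i++)
        BuildReport(capacity, TargetLength);

    timer.Stop();
    int gcCount = GC.CollectionCount(0) - gcBefore;
    Console.WriteLine($"capacity {capacity,5}: {timer.ElapsedMilliseconds,5} ms, gen-0 GCs: {gcCount}");
}
```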
The most efficient method of setting the capacity of the
StringBuilder object is to set it in the call to
its constructor. The overloaded constructors of a
StringBuilder object that accept a capacity value
are defined as follows:
public StringBuilder(int capacity)
public StringBuilder(string str, int capacity)
public StringBuilder(int capacity, int maxCapacity)
public StringBuilder(string str, int startPos, int length, int capacity)
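Each of these overloads can be exercised as follows (C# top-level statements; the strings and sizes are arbitrary examples). Note that the last overload copies only the requested substring of its string argument.

```csharp
using System;
using System.Text;

// Empty builder with room for 100 characters.
var a = new StringBuilder(100);

// Starts with "Hello" and room for 100 characters.
var b = new StringBuilder("Hello", 100);

// Room for 100 characters, but never allowed to exceed 500.
var c = new StringBuilder(100, 500);

// Starts with the 5-character substring at index 6 ("World"), room for 100.
var d = new StringBuilder("Hello World", 6, 5, 100);

Console.WriteLine($"{a.Capacity} {b} {c.MaxCapacity} {d}");  // 100 Hello 500 World
```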
In addition to the constructor parameters, one property of the StringBuilder object allows its capacity to be increased (or decreased). The Capacity property gets or sets the maximum number of characters that can be contained in the memory currently allocated by this instance of a StringBuilder object. Note that the Capacity property cannot be set to a value less than the Length property; attempting to do so throws an ArgumentOutOfRangeException.
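The Length constraint on Capacity can be seen in this short C# sketch (top-level statements): growing the capacity or shrinking it down to the current Length succeeds, while going below Length is rejected and leaves the builder unchanged.

```csharp
using System;
using System.Text;

var sb = new StringBuilder("Hello, world");   // Length is 12

sb.Capacity = 100;      // grow: allowed
sb.Capacity = 12;       // shrink to exactly Length: allowed

bool rejected = false;
try
{
    sb.Capacity = 5;    // below Length: throws
}
catch (ArgumentOutOfRangeException)
{
    rejected = true;
    Console.WriteLine("Capacity cannot be less than Length.");
}
```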
A second way to change the capacity is through the EnsureCapacity method, which is defined as follows:
public int EnsureCapacity(int capacity)
This method returns the new capacity for this object. If the capacity of the existing object already equals or exceeds the value in the capacity parameter, the current capacity is retained, and that value is returned instead.
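The "no change" case can be checked directly (C# top-level statements): requesting a capacity no larger than the current one leaves the buffer alone and simply reports the existing capacity.

```csharp
using System;
using System.Text;

var sb = new StringBuilder(100);

// Both requests are at or below the current capacity, so nothing is
// reallocated and the existing capacity is returned each time.
int smaller = sb.EnsureCapacity(50);
int equal = sb.EnsureCapacity(100);

Console.WriteLine(smaller);        // 100
Console.WriteLine(equal);          // 100
Console.WriteLine(sb.Capacity);    // 100
```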
There is one problem with using these last two members. If either of them increases the capacity of the StringBuilder object by even a single character, a new internal buffer must be allocated to store the string. Reducing the capacity, by contrast, does not force the allocation of a new, larger internal buffer. These members are therefore best reserved for exceptional cases in which the StringBuilder capacity needs an extra boost, so that fewer reallocations are performed in the long run.
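One such exceptional case is sketched below in C# (top-level statements): a builder sized for typical traffic occasionally receives a large batch. The helper `AppendBatch` and the batch contents are hypothetical. It computes the exact space the batch needs and calls EnsureCapacity once, so the buffer grows in a single step (if at all) instead of several times during the appends.

```csharp
using System;
using System.Text;

// Hypothetical helper: reserve room for a large batch before appending it.
static void AppendBatch(StringBuilder sb, string[] lines)
{
    int needed = sb.Length;
    foreach (string line in lines)
        needed += line.Length + Environment.NewLine.Length;

    sb.EnsureCapacity(needed);      // at most one growth step here

    foreach (string line in lines)
        sb.AppendLine(line);        // every line now fits without growing
}

var log = new StringBuilder(256);   // sized for typical messages
var batch = new string[1000];
for (int i = 0; i < batch.Length; i++)
    batch[i] = "entry " + i;

AppendBatch(log, batch);
Console.WriteLine($"{log.Length} chars, capacity {log.Capacity}");
```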
The StringBuilder object also contains a Length property. If Length is increased, the end of the existing string is padded with Unicode null characters (U+0000), not spaces; if Length is decreased, characters are truncated from the end of the StringBuilder object's string. Increasing the Length property can increase the Capacity property, but only as a side effect: if Length is increased beyond the current Capacity, the capacity grows to hold at least the new Length. Because of this, setting Length can be used much like setting Capacity, although it also changes the contents of the string:
sb.Length = 200;
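The padding and truncation behavior of Length looks like this in practice (C# top-level statements). The padding characters are `'\0'`, which matters if the string is later displayed or compared.

```csharp
using System;
using System.Text;

var sb = new StringBuilder("Hello");

// Lengthening pads the end with U+0000 (null) characters.
sb.Length = 8;
Console.WriteLine(sb.Length);             // 8
Console.WriteLine(sb[7] == '\0');         // True

// Shortening truncates the string.
sb.Length = 4;
Console.WriteLine(sb.ToString());         // Hell

// Growing Length past Capacity forces the capacity up as a side effect.
sb.Length = 200;
Console.WriteLine(sb.Capacity >= 200);    // True
```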
The string and StringBuilder objects are considered nonblittable, which means that the interop marshaler must convert them whenever they cross a managed/unmanaged boundary in your code. The reason is that strings have multiple ways of being represented in unmanaged code, and there is no one-to-one correlation between these representations in unmanaged and managed code. In contrast, types such as byte,
sbyte, short,
ushort, int,
uint, long,
ulong, IntPtr, and
UIntPtr are blittable types
and do not require conversion between managed and unmanaged code.
One-dimensional arrays of these blittable types, as well as structures (and classes marshaled as formatted types) that contain only blittable fields, are also considered blittable and do not need extra conversion when passed between managed and unmanaged code.
The string and StringBuilder
objects take more time to marshal, due to conversion between managed
and unmanaged types. Performance will be improved when calling
unmanaged code through P/Invoke methods if only blittable types are
used. Consider using a byte array instead of a
string or StringBuilder object,
if at all possible.
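A minimal C# sketch (top-level statements) of preparing such a byte array. It assumes a hypothetical native function that expects a null-terminated ASCII `char*`; that function is not declared or called here, and `ToNullTerminatedAscii` is likewise a hypothetical helper. The idea is to encode the text once, so a P/Invoke method declared with a `byte[]` parameter can have the array pinned rather than converting a string on every call.

```csharp
using System;
using System.Text;

// Hypothetical helper: encode text once into a null-terminated byte array
// suitable for a native function expecting a char* (assumption).
static byte[] ToNullTerminatedAscii(string text)
{
    byte[] bytes = new byte[Encoding.ASCII.GetByteCount(text) + 1];
    Encoding.ASCII.GetBytes(text, 0, text.Length, bytes, 0);
    // The last element is already 0 because new arrays are zero-filled.
    return bytes;
}

byte[] message = ToNullTerminatedAscii("hello");
Console.WriteLine(message.Length);   // 6
```

Reusing one such buffer across many calls avoids repeating the string conversion; whether that pays off depends on how often the native function is called.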
See Also
See the "StringBuilder Class" topic
in the MSDN documentation.