Find a Substring

This function finds a substring within a string. It returns a tuple with two elements:

The first element is the part of the outer string before the substring.
The second element is the rest of the outer string (beginning with the substring).

If the substring is not found, the first element of the tuple is the entire original string and the second element is an empty string.

Note

The standard Python string library has a function, find(), that does essentially the same operation. (It returns the index of the first occurrence of the substring, not a tuple.) This book doesn't use that here.

Source Code


 1.     def findSubstring ( outerString, subString ):

 2.

 3.       """Finds the first occurrence of subString within

 4.          outerString.

 5.

 6.          Returns a tuple of the part of the string

 7.          before the first occurrence of subString,

 8.          and the part starting at the first occurrence

 9.          of subString. If not found, the second tuple

10.          is empty.

11.

12.       """

13.

14.       outerLen = len(outerString)

15.       subLen = len(subString)

16.       flag = 1

17.

18.       for i in range(outerLen):

19.

20.           for j in range(subLen):

21.

22.                 if outerString[i+j] != subString[j]:

23.                    break

24.

25.           else:

26.

27.               # wind up here if for j loop terminates

28.               # naturally.

29.

30.               flag = 0

31.               break    # out of for i loop

32.

33.       return \

34.           ( outerString[:(i+flag)], outerString[(i+flag):] )

Suggestions

Consider the variable flag, which has a fairly unhelpful name. Determine the goal of flag and a better name for it.
What are the various situations that the code could encounter (for example, a substring that is not found at all)?
Look at the places where the code indexes into outerString and subString. What are the restrictions on indexing into those strings, and what does that imply about how the variables used need to be restricted?

Hints

Walk through the code with the following inputs:

Part of the substring matches, but not all:
outerString == "Hello", subString == "Hi"
The substring is found in the outer string:
outerString == "blue", subString == "l"
The beginning of the substring matches the end of the outer string:
outerString = "ball", subString == "llama"

Explanation of the Bug

The problem occurs on line 22:


if outerString[i+j] != subString[j]:

Because i is indexing through a range that goes up to (but not including) the length of outerString, adding j to the index might make it invalid, which is a D.index error. This happens in the situation in the third hint, where the first N characters of subString match the last N characters of outerString, and subString then has at least one more character in it. When i is 2, the if on line 22 will be false (outerString[i+j] is equal to subString[j]) when j is 0 and 1 (the last two l's in "ball" matching the first two l's in "llama"). Because of this the inner loop will iterate again with j equal to 2. At that point, i+j is 4 and the expression outerString[i+j] will result in an "index out of range" error.

This could be fixed in several ways. One way would be to fix the range() used in the inner for loop. A more obvious fix would be to directly guard against the subscript access by checking for it just before line 22, adding a couple of lines:


if (i+j) >= outerLen:

    break

Table of Contents