Alphabetize Words
This function is passed a string buffer. It splits the string into words and alphabetizes them. The return value is a list containing the words in alphabetical order.
For the purposes of alphabetizing them, strings are compared with the Python < operator, which compares strings using the built-in ord() function that winds up comparing them based on ASCII values. Thus, upper- and lowercase letters are unequal; the uppercase letters are considered "earlier" than the lowercase ones.
Source Code
1. def alphabetize ( buffer ):
2.
3. """Split buffer into words and return a list of them
4. in alphabetical order.
5.
6.
7. """
8.
9. alphaList = [ ]
10. wordBegin = 0
11.
12. """ NOTE: The code does not check the indexing into
13. buffer to prevent it from pointing past the end;
14. instead it catches the IndexError exception
15. that is raised when it does.
16. """
17.
18. while True:
19.
20. """ Find start of next word.
21. """
22.
23. try:
24. while buffer[wordBegin] == " ":
25. wordBegin = wordBegin + 1
26. except IndexError:
27. break
28.
29. """ Find end of the word.
30. """
31.
32. wordEnd = wordBegin+1
33. try:
34. while buffer[wordEnd] != " ":
35. wordEnd = wordEnd + 1
36. except IndexError:
37. pass
38.
39. newWord = buffer[wordBegin:wordEnd-1]
40.
41. alphaLen = len(alphaList)
42. for j in range(alphaLen):
43. if alphaList[j] > newWord:
44. alphaList[j:j] = [ newWord ]
45. break;
46. else:
47. alphaList[alphaLen:] = [ newWord ]
48.
49. wordBegin = wordEnd + 1
50.
51.
52. return alphaList
Suggestions
What are the empty and trivial inputs for this function? Is there the equivalent of an already solved input? What is the goal of alphaList and wordBegin after one iteration through the while loop? The blocks on lines 23-27 and lines 33-37 are similar, except for a couple of differences. Because code like this might have been pasted and modified, ensure that the differences are correct. Because a while True loop is infinite, unless a break statement is executed, verify that this function terminates on all inputs.
Hints
Walk through the function with the following parameters:
The buffer has blanks at the beginning: buffer == " Help" The buffer has blanks at the end: buffer == "one two " The buffer has one-character words with no extra spaces: buffer == "A B C D"
Explanation of the Bug
The bug is in the calculation of newWord on line 39:
newWord = buffer[wordBegin:wordEnd-1]
It is true that wordEnd points to the space after the word. However, in the slice notation in Python, the final index is not included in the slice, so the code has an A.off-by-one error. The code should read as follows:
newWord = buffer[wordBegin:wordEnd]
wordEnd itself is calculated correctly, so wordBegin is assigned to point to the right place on line 49, and the search for the next word begins at the correct spot. The only problem is that every word in alphaList has its last character truncated. (This happens even before alphaList is scanned to find the proper place for the new word, so the list is alphabetized correctly, just with words that are too short.) Using the input from any of the three hints shows this effect. The third hint shows the effect most dramatically because all the words are only one character long to begin with, and thus, all get truncated to empty strings.
|