Check if Two Words Are Anagrams
This function checks if two words are anagrams (which means that they have the same letters, but possibly in a different order). It ignores case differences in letters.
The program counts the letters in a string by allocating a 256-byte array and storing, in index n of the array, the number of times that the letter with an ASCII value of n appears (with an adjustment for case insensitivity-all letters are upper-cased first). This means that the program can't handle more than 256 of any one letter-that is not the bug. After the letter totals for the two words are counted, they are compared to see if they match, in which case the two words are anagrams.
The main procedure is called check_anagram. It calls a procedure do_count to count the letters in a single word. The address of the two words (strings terminated with a '\0' character) are pushed on the stack as parameters to check_anagram.
The procedures are not responsible for popping their parameters off the stack. (When check_anagram calls do_count, it is the responsibility of check_anagram, as the caller, to clean up the stack.) Because do_count is only called by check_anagram and, at a point where no registers need to be saved, do_count does not preserve passed-in registers, but check_anagram does.
The program uses the following string instructions:
stosd.
Store eax into dword ptr [edi]
scasb.
Compare al to byte ptr [edi]
lodsb.
Load byte ptr [esi] into al
cmpsd.
Compare dword ptr [esi] to dword ptr [edi]
Keep in mind how the rep prefix works on each iteration. If ecx is non-zero, it performs the primitive string instruction, decrements ecx without changing flags, and then (for cmps and scas) checks the zero flag result from the primitive operation.
The following instruction
sete al
sets al to 1 if the zero flag is 1; otherwise, it sets al to 0. (There is an entire collection of setXX instructions that match up with the jXX status-flag-based conditional jump instructions: setne, setl, and so on. They all set a byte to 1 if the condition is true, and 0 otherwise.)
Source Code
1. ; helper procedure called by check_anagram
2. do_count:
3.
4. ; do_count expects word, then count array
5. ; to be pushed
6. mov edi, [esp + 4] ; count array
7. mov ecx, 64
8. mov eax, 0
9. rep stosd ; clear count array
10.
11. ; find the length of word
12. mov edi, [esp + 8] ; word
13. mov ecx, -1 ; start it at -1 so it won't cause
14. ; the repne scasb to end
15. repne scasb ; terminates when [edi] points to a 0
16. ; ecx will be -(length of word + 2)
17. not ecx ; ecx is now (length of word + 2)
18. sub ecx, 2 ; ecx is now length of word
19.
20. jecxz countdone ; if length is zero, skip word
21.
22. mov edi, [esp + 4] ; count array
23. mov esi, [esp + 8] ; word
24.
25. startcount:
26. lodsb ; put next character in al
27. cmp al, 'a' ; if less than 'a'...
28. jl dontupper
29. cmp al, 'z' ; ...or greater than 'z'
30. jg dontupper ; ...don't uppercase it
31. sub al, 'a'
32. add al, 'A'
33.
34. dontupper:
35. inc byte ptr [edi+eax] ; update count array
36. loop startcount
37. countdone:
38. ret
39.
40. ; the main procedure
41. check_anagram:
42.
43. push ebp
44. mov ebp, esp
45. push ebx
46. push ecx
47. push esi
48. push edi
49. sub esp, 512 ; allocate count arrays
50.
51. mov ebx, [ebp + 8] ; first word
52. push ebx
53. lea ebx, [ebp - 528]
54. push ebx
55. call do_count
56. add esp, 8 ; clean parameters off stack
57.
58. mov ebx, [ebp + 12] ; second word
59. push ebx
60. lea ebx, [ebp - 272]
61. push ebx
62. call do_count
63. add esp, 8 ; clean parameters off stack
64.
65. mov eax, 0
66. lea esi, [ebp - 528]
67. lea edi, [ebp - 272]
68. mov ecx, 64
69. rep cmpsd
70.
71. sete al ; eax will be 1 if it matched, else 0
72.
73. add esp, 512 ; remove local variables
74. pop edi
75. pop esi
76. pop ecx
77. pop ebx
78. pop ebp
79. ret
Suggestions
It's best to focus on the do_count procedure first because it is called by check_anagram. Think of a trivial input to do_count, as well as an input that exercises all the code paths. The statement on line 35 is indexing into the count array that was passed as a parameter to do_count. What ensures that this will not index past the end of the array? Check that ecx is used correctly in all the rep instructions. Check that the count arrays allocated as local variables to check_anagram on line 49 are correctly addressed.
Hints
Walk through the code with the following parameters to check_anagram:
Anagram, case differs: "a" "A" Anagram, more than one character: "123" "321" Not an anagram, one string is blank: "" "hello" Not an anagram, differ only in count of one letter: "abca" "abc"
Explanation of the Bug
The calculation of the length of the word on lines 13-18 is incorrect:
mov ecx, -1 ; start it at -1 so it won't cause
; the repne scasb to end
repne scasb ; terminates when [edi] points to a 0
; ecx will be -(length of word + 2)
not ecx ; ecx is now (length of word + 2)
sub ecx, 2 ; ecx is now length of word
After the repne scasb, ecx will indeed be -(length of word + 2), as the comment states. The easiest word to check this against is an empty one, where byte ptr [edi] is 0 the first time the scasb instruction on line 15 executes. In this case, based on the rules of how the repne prefix works, it still decrements ecx once, so after line 16 ecx ends up as -2 for a zero-length string.
The bug is on line 17. The not instruction inverts every bit in ecx, which is not the same as negating it. To negate a number in two's complement, you invert every bit and then add one. Since the "add one" step is missing, ecx winds up as one less than it should be.
This could be classified as an A.off-by-one or a D.limit error because the behavior is that the code skips counting the last character in the word-unless the string is zero-length, in which case, ecx winds up as -1 and the startcount loop on lines 23-34 iterates (almost) forever (until it accesses bad memory).
The fix is to change line 17 to use the correct opcode to negate ecx, as follows:
neg ecx ; ecx is now (length of word + 2)
|