Parse a String into Substrings
This function parses a string into substrings delimited by a specified character. The function takes as input a string, a delimiter, and an index. It splits the string into substrings that are separated by the delimiter, and returns the appropriate substring based on the index, where the first substring would be indexed as 1, the second as 2, and so on.
For example, with an input string "This#is#a#test" and a delimiter of '#', index 2 would point to the substring "is". (C strings are just arrays of type char; normally, the end of a string is marked by having a character with value 0, also written as '\0', but in this case, to avoid having to copy the data or modify the input string, the substring is returned by indicating its starting point and length.)
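Because the result is a pointer into the original string plus a length, it is not terminated by '\0' and cannot be handed directly to functions that expect an ordinary C string. The following is a minimal sketch of how a caller might consume such a result; the helper name copy_substring and its parameters are assumptions made for illustration and are not part of the original listing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the original listing): copy a
   (start, length) substring into a '\0'-terminated buffer.
   Returns the number of characters actually copied. */
size_t copy_substring(char *dst, size_t dst_size,
                      const char *start, int length)
{
    assert(dst != NULL && dst_size > 0);

    size_t n = 0;
    if (start != NULL && length > 0) {
        n = (size_t)length;
        if (n > dst_size - 1)
            n = dst_size - 1;   /* truncate to fit the buffer */
        memcpy(dst, start, n);
    }
    dst[n] = '\0';              /* always terminate the copy */
    return n;
}
```

With a start pointer at the 'i' of "This#is#a#test" and a length of 2, the buffer receives "is". When no copy is needed, printf("%.*s", length, start) prints the same range in place.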
Source Code
1.  char *
2.  parse_string(
3.      char * input_string,
4.      char delimiter,
5.      int index,
6.      int * return_length) {
7.
8.      char * curr_substring = input_string;
9.      char * curr_character = input_string;
10.     int found_so_far = 0;
11.
12.     while (1) {
13.
14.         ++found_so_far;
15.
16.         // At end of input string, exit the loop.
17.
18.         if (*curr_substring == '\0')
19.             break;
20.
21.         // If at delimiter, no need to loop through.
22.
23.         if (*curr_substring != delimiter) {
24.             curr_character = curr_substring;
25.
26.             // Not at delimiter, so loop to find it.
27.
28.             while ((*curr_character != '\0') &&
29.                    (*curr_character != delimiter)) {
30.                 ++curr_character;
31.             }
32.         }
33.
34.         if ((found_so_far == index) ||
35.             (*curr_character == '\0'))
36.             break;
37.
38.         // skip over the delimiter character
39.
40.         curr_substring = curr_character+1;
41.     }
42.
43.     if (found_so_far == index) {
44.         *return_length =
45.             (curr_character - curr_substring);
46.         return curr_substring;
47.     } else {
48.         return NULL;
49.     }
50. }
Suggestions
At line 33, what are the possible values for *curr_character (the character pointed to by curr_character)?
Because the while (1) loop starting on line 12 has a condition that is always true, it depends on other variables changing for it to exit. What comparisons cause the loop to exit? Are the variables being tested guaranteed to change each time through the loop?
What is the "implied else" of the if statement on line 23?
found_so_far is incremented on line 14 but is not tested until line 34. Is this correct?
Hints
Walk through the code with the following parameters to the function:
Normal operation that finds a substring: the input string is "ab/cd/ef", the delimiter is '/', and the index is 2.
The index is well past the last substring: the input string is "t-u-v", the delimiter is '-', and the index is 5.
The index is just after the last substring: the input string is "hello$", the delimiter is '$', and the index is 2.
Explanation of the Bug
The code on line 24 to initialize curr_character
curr_character = curr_substring;
has an F.init error; it will not execute if the if statement on line 18 is true or the if statement on line 23 is false, yet curr_character can still be used in those situations.
The solution is to move line 24 up to around line 15, so that curr_character is set on every iteration before any path that can exit the while loop or reach line 40.
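As a sketch of that change, the function below is the listing with the assignment moved to the top of the loop body and the line numbers removed. The name parse_string_fixed, the precondition check, and the cast on the length are assumptions added here for illustration; everything else follows the original code.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the listing with the initialization of curr_character
   moved to the top of the loop. The name and the assert are
   illustrative additions, not part of the original. */
char *
parse_string_fixed(
    char * input_string,
    char delimiter,
    int index,
    int * return_length) {

    char * curr_substring = input_string;
    char * curr_character = input_string;
    int found_so_far = 0;

    assert(input_string != NULL && return_length != NULL);

    while (1) {

        ++found_so_far;

        // Moved up: reset on every iteration, before any exit path.
        curr_character = curr_substring;

        // At end of input string, exit the loop.
        if (*curr_substring == '\0')
            break;

        // Not at delimiter, so loop to find it (or the end).
        if (*curr_substring != delimiter) {
            while ((*curr_character != '\0') &&
                   (*curr_character != delimiter)) {
                ++curr_character;
            }
        }

        if ((found_so_far == index) ||
            (*curr_character == '\0'))
            break;

        // skip over the delimiter character
        curr_substring = curr_character + 1;
    }

    if (found_so_far == index) {
        *return_length = (int)(curr_character - curr_substring);
        return curr_substring;
    } else {
        return NULL;
    }
}
```

With this version, the third hint ("hello$", '$', index 2) yields an empty substring of length 0 rather than a negative length, and the second hint ("t-u-v", '-', index 5) still returns NULL.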
The mistake can cause two different errors:
If the if on line 18 is true and the while loop exits because of the break statement on line 19, curr_character keeps the value it had at the end of the previous iteration of the while loop (normally from the increment on line 30), which points at the final delimiter, 1 byte before curr_substring; thus, the calculation of return_length on line 44 evaluates to -1. This happens when the input string ends in a delimiter character and the index asks for the substring that goes just after that final delimiter (which is the case in the third hint).
If the if on line 23 is false, curr_character remains unchanged for that iteration of the while loop. This means that the assignment on line 40 sets curr_substring to the value it already has, so the while loop keeps looping until the if on line 34 is true because found_so_far == index (the other part of that if expression, *curr_character == '\0', won't ever be true because curr_character continues to point 1 byte before curr_substring; thus not to a '\0' character). After the while loop exits, the calculation of return_length on line 44 again equals -1. This happens if the input string has two delimiter characters in a row.
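Both failures can be reproduced with a direct transcription of the listing. In the sketch below the line numbers are removed, and the names parse_string_buggy and reported_length, the cast on the length, and the test string "a##b" are assumptions introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Transcription of the original listing, bug included. */
char *
parse_string_buggy(
    char * input_string,
    char delimiter,
    int index,
    int * return_length) {

    char * curr_substring = input_string;
    char * curr_character = input_string;
    int found_so_far = 0;

    while (1) {
        ++found_so_far;

        if (*curr_substring == '\0')
            break;

        if (*curr_substring != delimiter) {
            curr_character = curr_substring;  /* the misplaced line */
            while ((*curr_character != '\0') &&
                   (*curr_character != delimiter)) {
                ++curr_character;
            }
        }

        if ((found_so_far == index) ||
            (*curr_character == '\0'))
            break;

        curr_substring = curr_character + 1;
    }

    if (found_so_far == index) {
        *return_length = (int)(curr_character - curr_substring);
        return curr_substring;
    } else {
        return NULL;
    }
}

/* Hypothetical wrapper: returns the length reported for one call
   that is expected to produce a non-NULL result. */
int reported_length(char *s, char delimiter, int index)
{
    int length = 0;
    char *sub = parse_string_buggy(s, delimiter, index, &length);
    assert(sub != NULL);
    return length;
}
```

Calling reported_length on "hello$" with '$' and index 2 exercises the first error, and calling it on "a##b" with '#' and index 2 exercises the second; both report a length of -1. With "a##b" and an index of 5, the loop spins until found_so_far reaches 5 and the function still returns a non-NULL pointer with length -1, where NULL would be expected.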