How to isolate text between two HTML tags


I have a TRichEdit.Lines (TStrings) from which I want to extract a substring and copy it to another string. I use ScanF to find the beginning of the substring, which is ''. Then I need to find either the next '<' or the end of the line. Once I have done all this, how do I extract the substring and copy it to another string?


See the Copy function. Perhaps the following routine can be of use to you. It uses the various PChar-based string functions instead of the standard string functions Pos and Copy, basically because it is a bit easier to work with pointers in this case.

procedure IsolateTextBetweenTags(const S: string; Tag1, Tag2: string; list: TStrings);
var
  pScan, pEnd, pTag1, pTag2: PChar;
  foundText: string;
  searchtext: string;
begin
  {Set up the pointers we need for the search. HTML is not case sensitive, so
  we need to perform the search on an uppercased copy of S}
  searchtext := Uppercase(S);
  Tag1 := Uppercase(Tag1);
  Tag2 := Uppercase(Tag2);
  pTag1 := PChar(Tag1);
  pTag2 := PChar(Tag2);
  pScan := PChar(searchtext);
  repeat
    {Search for the next occurrence of Tag1}
    pScan := StrPos(pScan, pTag1);
    if pScan <> nil then
    begin
      {Found one, hop over it, then search from that position forward for the
      next occurrence of Tag2}
      Inc(pScan, Length(Tag1));
      pEnd := StrPos(pScan, pTag2);
      if pEnd <> nil then
      begin
        {Found start and end tag, isolate the text between them and add it to
        the list. We need to get the text from the original S, however, since
        we want the un-uppercased version!}
        SetString(foundText, PChar(S) + (pScan - PChar(searchtext)), pEnd - pScan);
        list.Add(foundText);
        {Continue the next search after the found end tag}
        pScan := pEnd + Length(Tag2);
      end
      else
        {Error, no end tag found for start tag, abort}
        pScan := nil;
    end;
  until pScan = nil;
end;

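procedure TForm1.Button1Click(Sender: TObject);
begin
  with opendialog1 do
  begin
    filter := 'HTML files|*.HTM; *.HTML';
    if execute then
    begin
      richedit1.PlainText := true;
      {Load the selected file so there is text to scan}
      richedit1.lines.loadfromfile(filename);
      {The second and third arguments are the start tag and the end tag to search between}
      IsolateTextBetweenTags(richedit1.text, '', '', memo2.lines);
    end;
  end;
end;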
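
For the simpler case in the question (take everything after one start tag up to the next '<' or the end of the line), the standard Pos and Copy functions mentioned in the answer are probably enough. The following is only a minimal sketch under that assumption: TextAfterTag is a hypothetical helper name, and the '<td>' and '<b>' tags in the demo are example values, not the tags from the original question.

```pascal
program CopyBetweenTags;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

{ Hypothetical helper: returns the text that follows the first occurrence of
  StartTag in Line, up to the next '<' or up to the end of the line. }
function TextAfterTag(const Line, StartTag: string): string;
var
  StartPos, EndPos: Integer;
  Rest: string;
begin
  Result := '';
  { Compare uppercased copies, since HTML tag names are not case sensitive }
  StartPos := Pos(UpperCase(StartTag), UpperCase(Line));
  if StartPos = 0 then
    Exit; { start tag not found }
  { Everything after the start tag, taken from the original line }
  Rest := Copy(Line, StartPos + Length(StartTag), MaxInt);
  EndPos := Pos('<', Rest);
  if EndPos = 0 then
    Result := Rest { no further tag: take the rest of the line }
  else
    Result := Copy(Rest, 1, EndPos - 1);
end;

begin
  WriteLn(TextAfterTag('<td>Hello</td>', '<TD>'));    { prints: Hello }
  WriteLn(TextAfterTag('<b>to end of line', '<b>')); { prints: to end of line }
end.
```

To process a TRichEdit, the helper could be called once per entry of Lines. It only finds the first start tag in each line, so the PChar-based routine above remains the better fit when a line contains several tag pairs.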
