Splitting Files (Small Code & Blazing Fast)


Problem/Question/Abstract:

How can I split a file into smaller pieces of a specified size while keeping the source code simple?

Answer:

Now why would one want to split up files? A reason could be that a file is too large to be transferred reliably to another computer. Hence you chop it up into smaller, manageable pieces, transfer the pieces and re-assemble them on the target computer. Here is a very small, simple and very fast function for splitting a specified file into smaller files of a specified size (in bytes). The function uses streams and is more or less self-explanatory. Error handling is currently minimal and can be extended. The function does not modify the original file in any manner; it merely creates new files in the same directory as the original file, with sequenced extensions (.001, .002, ...).

What's the use of splitting if you cannot put them together again? To join up the split files, you can use the command line:

Copy /B File1 + File2 + File3 ... TargetFile
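
If you would rather re-assemble the pieces in code instead of at the command line, a minimal sketch along the same lines could look like the following. JoinFiles is my own hypothetical helper, not part of the original unit; it assumes the .001, .002, ... naming produced by SplitFile below and that Classes and SysUtils are in the uses clause:

procedure JoinFiles(const pBaseName, pTargetName: string);
var
  vOutFl, vInpFl: TFileStream;
  vCtr: Integer;
  vPieceName: string;
begin
  vOutFl := TFileStream.Create(pTargetName, fmCreate);
  try
    vCtr := 1;
    vPieceName := pBaseName + '.' + FormatFloat('000', vCtr);
    while FileExists(vPieceName) do
    begin
      vInpFl := TFileStream.Create(vPieceName, fmOpenRead);
      try
        vOutFl.CopyFrom(vInpFl, 0);  {Count = 0 copies the whole piece}
      finally
        vInpFl.Free;
      end;
      Inc(vCtr);
      vPieceName := pBaseName + '.' + FormatFloat('000', vCtr);
    end;
  finally
    vOutFl.Free;
  end;
end;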

Save the following code to a file named "SplitFl.pas", add it to your source with a "uses SplitFl" clause, and you are ready to split (hopefully not with laughter)!

{******************************************************}
{* Description: Splits a specified file into pieces   *}
{*              of specified size.                    *}
{******************************************************}
{* Last Modified : 12-Mar-2001                        *}
{* Author        : Paramjeet Reen                     *}
{******************************************************}
{* I do not guarantee the fitness of this program.    *}
{* Please use it at your own risk.                    *}
{******************************************************}
{* Category :Freeware.                                *}
{******************************************************}

unit SplitFl;

interface

procedure SplitFile(const pFileName: AnsiString; const pSplitSize: LongInt);

implementation

uses
  Classes, SysUtils, Dialogs;

function Smaller(const a, b: LongInt): LongInt;
begin
  if (a < b) then
  begin
    Result := a;
  end
  else if (b > 0) then
  begin
    Result := b
  end
  else
    Result := 0;
end;

procedure SplitFile(const pFileName: AnsiString; const pSplitSize: LongInt);
var
  vInpFl: TFileStream;
  vOutFl: TFileStream;
  vCtr: Integer;
begin
  vInpFl := TFileStream.Create(pFileName, fmOpenRead);

  if (vInpFl.Size > pSplitSize) then
  begin
    vCtr := 0;
    while (vInpFl.Position < vInpFl.Size) do
    begin
      Inc(vCtr);
      vOutFl := TFileStream.Create(pFileName + '.' + FormatFloat('000', vCtr),
        fmCreate);
      vOutFl.CopyFrom(vInpFl, Smaller(pSplitSize, vInpFl.Size - vInpFl.Position));
      vOutFl.Free;
    end;
  end
  else
    MessageDlg('File too small to split!', mtInformation, [mbOk], 0);

  vInpFl.Free;
end;

end.
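
Using the unit is then a one-line call. For example, a hypothetical snippet (the file name and piece size below are only illustrative) might look like this:

uses
  SplitFl;

procedure SplitMyBackup;
begin
  {Creates C:\Temp\MyBackup.zip.001, .002, ... of at most 1,457,664 bytes
   each - roughly the free space on a formatted 1.44 MB floppy}
  SplitFile('C:\Temp\MyBackup.zip', 1457664);
end;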

= = = = = = = = = = = = = = File Split: Act I, Scene II = = = = = = = = = = = = = =

The story so far: I believed that I had made a decent file-splitting function that was both small and fast. However, it was pointed out that it is not fast when it comes to handling HUGE files. I then discovered the $F000 limit on the intermediate memory buffer (used by TStream.CopyFrom) and thought it to be the cause. There was also a suggestion that opening the input and output files with the FILE_FLAG_SEQUENTIAL_SCAN flag would yield performance benefits. Keeping all of the above in mind, I re-worked my original code into the version given below. Surprisingly, however, there is no appreciable speed benefit! Perhaps someone can tell me why and suggest improvements...

unit SplitFl;

interface

procedure SplitFile(const pFileName: AnsiString; const pSplitSize: LongInt);

implementation

uses
  Classes, SysUtils, Dialogs, Windows;

function Smaller(const a, b: LongInt): LongInt;
begin
  if (a < b) then
  begin
    Result := a;
  end
  else if (b > 0) then
  begin
    Result := b
  end
  else
    Result := 0;
end;

procedure SplitFile(const pFileName: AnsiString; const pSplitSize: LongInt);
var
  vInpFlHandle: Integer;
  vOutFlHandle: Integer;
  vInpBytesLft: Integer;
  vOutBytesLft: Integer;
  vBufferSize: Integer;
  vBytesDone: Integer;
  vBuffer: Pointer;
  vCtr: Integer;
begin

  //Use one of the following options to open the file.
  //vInpFlHandle := Integer(CreateFile(PChar(pFileName),GENERIC_READ,FILE_SHARE_READ,nil,OPEN_EXISTING,FILE_ATTRIBUTE_NORMAL,FILE_FLAG_SEQUENTIAL_SCAN));
  vInpFlHandle := FileOpen(pFileName, 0);

  vInpBytesLft := FileSeek(vInpFlHandle, 0, 2);

  if (vInpBytesLft > pSplitSize) then
  begin
    vBufferSize := Smaller(GetHeapStatus.TotalUncommitted, pSplitSize);
    GetMem(vBuffer, vBufferSize);

    FileSeek(vInpFlHandle, 0, 0);
    vCtr := 0;

    while (vInpBytesLft > 0) do
    begin
      Inc(vCtr);

      //Use one of the following options to open the file.
      //vOutFlHandle := Integer(CreateFile(PChar(pFileName + '.' + FormatFloat('000', vCtr)),GENERIC_READ or GENERIC_WRITE,0,nil,CREATE_ALWAYS,FILE_ATTRIBUTE_NORMAL,FILE_FLAG_SEQUENTIAL_SCAN));
      vOutFlHandle := FileCreate(pFileName + '.' + FormatFloat('000', vCtr));

      vOutBytesLft := Smaller(vInpBytesLft, pSplitSize);

      while (vOutBytesLft > 0) do
      begin
        vBytesDone := FileRead(vInpFlHandle, vBuffer^, Smaller(vOutBytesLft,
          vBufferSize));
        FileWrite(vOutFlHandle, vBuffer^, vBytesDone);
        Dec(vInpBytesLft, vBytesDone);
        Dec(vOutBytesLft, vBytesDone);
      end;

      FileClose(vOutFlHandle);
    end;

    FreeMem(vBuffer);
  end
  else
    MessageDlg('File too small to split!', mtInformation, [mbOk], 0);

  FileClose(vInpFlHandle);
end;

end.

TFileStream.Create calls FileCreate in SysUtils. I've had some success by creating a separate TFileStream constructor, TFileStream.CreateSeqScan, that calls the SeqScanFileCreate function below instead of FileCreate, thereby adding FILE_FLAG_SEQUENTIAL_SCAN to the Windows API CreateFile call:

function SeqScanFileCreate(const FileName: string): Integer;
begin
  Result := Integer(CreateFile(PChar(FileName), GENERIC_READ or GENERIC_WRITE,
    0, nil, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL or FILE_FLAG_SEQUENTIAL_SCAN, 0));
end;

This allows the operating system to read ahead as much as memory allows and to write in larger chunks than the $F000 bytes that TStream.CopyFrom uses for its internal buffer.
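
One minimal way to make use of such a handle without modifying Classes.pas itself is to wrap it in a small THandleStream descendant. The class below is only a sketch of that idea (the name TSeqScanFileStream is mine, not from the original post); it assumes the SeqScanFileCreate function above and the Classes, SysUtils and Windows units:

type
  {Stream over a file handle opened with FILE_FLAG_SEQUENTIAL_SCAN;
   unlike THandleStream, it owns the handle and closes it on destruction}
  TSeqScanFileStream = class(THandleStream)
  public
    constructor Create(const FileName: string);
    destructor Destroy; override;
  end;

constructor TSeqScanFileStream.Create(const FileName: string);
begin
  inherited Create(SeqScanFileCreate(FileName));
  if Handle = Integer(INVALID_HANDLE_VALUE) then
    raise Exception.CreateFmt('Cannot create file %s', [FileName]);
end;

destructor TSeqScanFileStream.Destroy;
begin
  if Handle <> Integer(INVALID_HANDLE_VALUE) then
    FileClose(Handle);
  inherited Destroy;
end;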
