Pre-allocating disk space?

rekenaar

Retired Team Member
  • Premium Supporter
  • December 17, 2006
    4,421
    805
    Home Country
    South Africa
    Nice post, And-81

    Will it be possible to calculate the size depending on the scheduled length + pre + post time?
    Not sure, but I recall my PC uses just over a gig per 30 minutes. Is this standard or does it change from PC to PC?
     

    and-81

    Retired Team Member
  • Premium Supporter
  • March 7, 2005
    2,257
    183
    Melbourne
    Home Country
    Australia
    The problem is that it all depends on your TV system ... here in Australia we have not only analogue TV (the bitrate will depend on your TV card/quality settings) but also SD and HD digital TV (DVB-T MPEG-2).

    Adding to that problem, different channels run at slightly different bitrates. All up, the differences are enough that an estimated size could be accurate for one recording but ridiculously far out for another.

    I think incrementally stepping up the size of the recording file (512mb at a time or a similar amount) is probably the best way to go ...

    I might try implementing a proof of concept ... something that just fills a file full of rubbish but steps its size up by predefined increments ... multi-threaded so that it can have many of these fake recordings occurring at the same time. This would let me see how the OS handles the files and fragmentation.
     

    and-81

    Retired Team Member
  • Premium Supporter
  • March 7, 2005
    2,257
    183
    Melbourne
    Home Country
    Australia
    OK, I've done a little proof of concept.

    I ran it on my HTPC with its highly fragmented hard disk.

    It is a multithreaded Windows app that writes to temporary files 1 byte at a time. Every time a file grows past the defined "chunk size", its length is set one chunk larger. E.g. when the file gets larger than 100 megabytes, it jumps in size by another 100 megabytes.

    Here is the code for the thread:

    Code:
      // Needs: using System.IO; (and System.Threading for the commented-out Sleep).
      // Body of one worker thread: write a dummy file one byte at a time and
      // pre-allocate it in 100-megabyte chunks as it grows.
      string filename = Path.Combine(@"D:\My Videos", Path.GetRandomFileName());

      FileStream fileStream = new FileStream(filename, FileMode.CreateNew);

      const int MegaByte = 1024 * 1024;

      int chunkSize = 100 * MegaByte;

      try
      {
        long size = 0;
        int multi = 0;
        int lastMulti = 0;

        while (size <= 1000 * MegaByte)   // stop after roughly 1 GB of data
        {
          fileStream.WriteByte((byte)(size % 255));
          size++;

          // Which chunk should currently be allocated for this much data?
          multi = (int)(size / chunkSize) + 1;

          if (multi > lastMulti)
          {
            // Grow the file by one whole chunk, ahead of the data being written.
            fileStream.SetLength(multi * chunkSize);
            lastMulti = multi;
          }

          //Thread.Sleep(10);   // optional throttle
        }
      }
      catch
      {
        // Quick-and-dirty test: ignore any I/O errors.
      }
      finally
      {
        fileStream.Close();
        //File.Delete(filename);
      }


    I ran 5 threads at the same time, started about 1 second apart. By the time they got to 600 megabytes I decided to terminate them...
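    The driver isn't shown above; it's roughly just the following, assuming the thread code is wrapped in a method (here called WriteFakeRecording):

    Code:
      // Rough sketch of the driver: start five workers about a second apart,
      // each running the chunk-growing thread body shown above (assumed to be
      // wrapped in a method called WriteFakeRecording).
      // Needs: using System.Threading;
      for (int i = 0; i < 5; i++)
      {
        new Thread(new ThreadStart(WriteFakeRecording)).Start();
        Thread.Sleep(1000);
      }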

    Here is the report from Defrag:

    Code:
    Fragments       File Size       Most fragmented files
    2,466           3.53 GB         \My Videos\Recorded TV\Seconds From Disaster - 2008-4-16.ts
    2,196           6.55 GB         \My Videos\Recorded TV\The Sum of All Fears - 2008-4-13.ts
    1,542           3.54 GB         \My Videos\Recorded TV\Lost - 2008-4-24.ts
    1,491           3.47 GB         \My Videos\Recorded TV\Lost - 2008-4-17.ts
    1,459           3.09 GB         \My Videos\Recorded TV\Medium - 2008-4-24.ts
    1,342           2.42 GB         \My Videos\Recorded TV\Police Ten 7 - 2008-4-25.ts
    1,332           2.42 GB         \My Videos\Recorded TV\Scrapheap Challenge - 2008-4-26.ts
    973             2.70 GB         \My Videos\Recorded TV\MythBusters - 2008-4-12.ts
    747             2.45 GB         \My Videos\Recorded TV\Scrapheap Challenge - 2008-4-23.ts
    602             2.43 GB         \My Videos\Recorded TV\Scrapheap Challenge - 2008-4-17.ts
    404             2.48 GB         \My Videos\Recorded TV\Airline - 2008-4-25.ts
    230             2.42 GB         \My Videos\Recorded TV\Scrapheap Challenge - 2008-4-14.ts
    213             2.47 GB         \My Videos\Recorded TV\Scrapheap Challenge - 2008-4-21.ts
    180             2.42 GB         \My Videos\Recorded TV\Scrapheap Challenge - 2008-4-12.ts
    7               600 MB          \My Videos\amtkky3d.lff
    7               600 MB          \My Videos\tk5at2i5.4lq
    7               600 MB          \My Videos\buloxeom.uo3
    7               600 MB          \My Videos\idaeiolr.1kg
    7               600 MB          \My Videos\nlmn2v2n.v0n

    Notice the last 5 files? Looks good to me :)

    It seems that Windows is smart enough to allocate the new chunks in contiguous space where available.

    This code could be pretty easily transferred (it would need some minor changes) into MediaPortal.

    Basically, all that would need to be added is to resize the file back down to the actual recorded size at the end.
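    Something like this at the end of the recording would do it (a sketch only, using the size counter from the test code above):

    Code:
      // When the recording stops, trim the pre-allocated file back down to
      // the number of bytes actually written (the "size" counter above).
      fileStream.SetLength(size);
      fileStream.Close();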

    I'll bring this up with the other devs and see if they want to try it out.

    Cheers,
     

    grubi

    Portal Pro
    June 16, 2007
    1,216
    80
    127.0.0.1
    Home Country
    Germany
    gxtracker:

    Sorry but I have to completely disagree with you. Pre-allocation is a best-practice approach in development which many popular applications use. As I said, you have to give the OS more information about what you intend to do. How should the OS know that you intend to write a file which will end up quite large? This has nothing to do with fixing OS-layer issues at the application layer.

    and-81:
    Interesting test results. IMHO I would try to make the chunks even larger. If you want to make the process even smoother you would have to use the "SetFileValidData" API call. Otherwise, when expanding a file, the new space gets completely overwritten with zeroes for security reasons. This could be prevented by using the API call I mentioned (beware that this call needs admin rights, which should not be a problem as it would run in the context of the service account).
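    For reference, a rough and untested C# sketch of how that call might be wired up (the P/Invoke declaration is my own assumption, the SE_MANAGE_VOLUME_NAME privilege must be enabled for the call to succeed, and chunkSize/multi refer to the test code above):

    Code:
      // Sketch only: P/Invoke declaration for SetFileValidData (kernel32).
      // Needs: using System.Runtime.InteropServices; using Microsoft.Win32.SafeHandles;
      [DllImport("kernel32.dll", SetLastError = true)]
      static extern bool SetFileValidData(SafeFileHandle hFile, long validDataLength);

      // After pre-allocating a chunk, mark the new space as "valid" so NTFS
      // does not zero-fill it. Only works if the SE_MANAGE_VOLUME_NAME
      // privilege is enabled for the process (i.e. admin / service account).
      long newLength = (long)multi * chunkSize;
      fileStream.SetLength(newLength);
      SetFileValidData(fileStream.SafeFileHandle, newLength);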

    grubi.
     

    and-81

    Retired Team Member
  • Premium Supporter
  • March 7, 2005
    2,257
    183
    Melbourne
    Home Country
    Australia
    @and-81:
    Interesting test results. IMHO I would try to make the chunks even larger. If you want to make the process even smoother you would have to use the "SetFileValidData" API call. Otherwise, when expanding a file, the new space gets completely overwritten with zeroes for security reasons. This could be prevented by using the API call I mentioned (beware that this call needs admin rights, which should not be a problem as it would run in the context of the service account).

    Yep, it was just a proof of concept. I would look at using a step size of 512mb and would use the API calls. But the concept has been proven with this simple test.

    Cheers,
     

    grubi

    Portal Pro
    June 16, 2007
    1,216
    80
    127.0.0.1
    Home Country
    Germany
    @and-81:
    Interesting test results. IMHO I would try to make the chunks even larger. If you want to make the process even smoother you would have to use the "SetFileValidData" API call. Otherwise, when expanding a file, the new space gets completely overwritten with zeroes for security reasons. This could be prevented by using the API call I mentioned (beware that this call needs admin rights, which should not be a problem as it would run in the context of the service account).

    Yep, it was just a proof of concept. I would look at using a step size of 512mb and would use the API calls. But the concept has been proven with this simple test.

    Cheers,

    Great. What is quite interesting is that in my environment, even with non-parallel recording, the drive gets fragmented rather quickly although I have a separate partition only for the media files. So this change would be interesting to everyone, I think.

    grubi.
     

    gxtracker

    Retired Team Member
  • Premium Supporter
  • July 25, 2005
    316
    2
    Home Country
    Canada
    Sorry but I have to completely disagree with you. Pre-allocation is a best-practice approach in development which many popular applications use.

    I understand your point and I promise I'm not trying to argue with anyone. :) I just don't see pre-allocation of disk space as the best solution to the problem. It certainly is a solution though, which Aaron has wonderfully demonstrated.

    A DBMS can pre-allocate disk space for storage, but many of the commercial or enterprise DBMSs you may be thinking about do not run on top of a somewhat self-repairing I/O filesystem like NTFS.

    Bittorrent applications do cache their downloads, but many of them do their caching in memory, not directly onto the disk. Once the memory cache is full, its contents are dumped to the hard drive. This method still allows caching of data, while allowing the I/O system to perform as it was designed by the operating system - as an instant read/write system. Remember that with hard drives, burst traffic is still quicker than sustained read/write performance.

    Thinking further about this, memory caching of recordings in the TV server is something I would agree more with! With most HTPC systems sporting more than 1GB of memory today, allocating storage space in 256MB blocks of memory at a time, for example, then writing to the disk drive would greatly reduce fragmentation.
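    Just to illustrate the idea (a made-up sketch, not MediaPortal code): a large in-memory buffer that is flushed to the recording file in one go whenever it fills up.

    Code:
      // Illustration only (hypothetical class): buffer incoming data in memory
      // and write it to disk in large blocks, so the file grows in big
      // sequential chunks. Needs: using System; using System.IO;
      class BufferedRecorder
      {
        FileStream _file;
        byte[] _buffer;
        int _used;

        public BufferedRecorder(string path, int bufferSize)
        {
          _file = new FileStream(path, FileMode.CreateNew);
          _buffer = new byte[bufferSize];   // e.g. 256 * 1024 * 1024
        }

        public void Write(byte[] data, int offset, int count)
        {
          while (count > 0)
          {
            int copy = Math.Min(count, _buffer.Length - _used);
            Array.Copy(data, offset, _buffer, _used, copy);
            _used += copy;
            offset += copy;
            count -= copy;

            if (_used == _buffer.Length)
              Flush();                      // cache full: dump it to disk
          }
        }

        public void Flush()
        {
          _file.Write(_buffer, 0, _used);
          _used = 0;
        }

        public void Close()
        {
          Flush();
          _file.Close();
        }
      }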

    Forgive my C# ignorance, but when you set fileStream.SetLength(ChunkSize); is the filestream being stored in memory? Or is the data being directly written to the drive?
     

    grubi

    Portal Pro
    June 16, 2007
    1,216
    80
    127.0.0.1
    Home Country
    Germany
    >>A DBMS can pre-allocate disk space for storage, but many of the commercial or enterprise DBMSs you may be
    >>thinking about do not run on top of a somewhat self-repairing I/O filesystem like NTFS.

    Oracle, Firebird, MSSQL and DB2 on Windows Server run on top of NTFS. It's up to you if you call them enterprise DBMSs. I would tend to do so. However, you could argue that most of them (except MSSQL of course) will run under Unix/Linux in production environments.

    >>Bittorrent applications do cache their downloads, but many of them do their caching in memory, not directly onto
    >>the disk. Once the memory cache is full, its contents are dumped to the hard drive.

    That's not true. They do exactly what "and-81" did. They pre-allocate the disk space at the estimated size to prevent fragmentation. And the Windows copy command does the same.

    >>Thinking further about this, memory caching of recordings in the TV server is something I would agree more with! With most HTPC systems sporting more than 1GB of memory today, allocating storage space in 256MB blocks of memory at a time, for example, then writing to the disk drive would greatly reduce fragmentation.

    But this could introduce new problems. While I agree with you that writing whole chunks of 256 MB to disk in one run is overall the most economical way to do this, it could cause new problems if done in multithreaded/multi-process realtime applications where the different parts have to stay in sync with each other. There it would be better to write the data in smaller chunks to make the thread more responsive.

    >>Forgive my C# ignorance, but when you set fileStream.SetLength(ChunkSize); is the filestream being stored in memory? Or is the data being directly written to the drive?

    I'm not that familiar with .NET as I'm a C++ WinAPI developer, so I have to leave this question to the guys who know. I guess the stream is simply expanded / truncated to the given size.
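    A quick way to check it yourself (untested sketch; the temp path is just an example) would be something like:

    Code:
      // SetLength changes the size of the file on disk straight away; the
      // pre-allocated space is not kept in memory.
      // Needs: using System; using System.IO;
      string path = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
      using (FileStream fs = new FileStream(path, FileMode.CreateNew))
      {
        fs.SetLength(100 * 1024 * 1024);              // grow to 100 MB
      }
      Console.WriteLine(new FileInfo(path).Length);   // prints 104857600
      File.Delete(path);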

    grubi.
     

    gxtracker

    Retired Team Member
  • Premium Supporter
  • July 25, 2005
    316
    2
    Home Country
    Canada
    However, you could argue that most of them (except MSSQL of course) will run under Unix/Linux in production environments.

    Which is outside the scope of this discussion.

    That's not true. They do exactly what "and-81" did. They pre-allocate the disk space at the estimated size to prevent fragmentation. And the Windows copy command does the same.

    No, I'm afraid they don't - not all of them, at least. uTorrent, for example, creates a memory cache to store the contents of its downloads, then dumps the contents of the cache to file. See the attached screenshot below - notice the write statistics. Data is stored in a 32MB block of memory, then purged in pieces to disk. It's a dynamic system that also measures file I/O, so if the hard drive is being accessed a lot, more data will be stored in the cache and dumped at less frequent intervals. This is a much more elegant solution than pre-allocating disk space and caching directly to disk. Please show me a torrent application that pre-allocates hard drive space so I can avoid it like the plague.

    The Windows copy command is optimized to deal with smaller files - hardly a fair comparison when you consider the size of MPEG-2 video data.

    But this could introduce new problems. While I agree with you that writing whole chunks of 256 MB to disk in one run is overall the most economical way to do this, it could cause new problems if done in multithreaded/multi-process realtime applications where the different parts have to stay in sync with each other.

    Problems such as?

    You have to remember that you're still dealing with a data stream. Whether it's stored in memory or on disk, it's still a stream. So timeshifting and other features can still make use of the stream. In fact, they may even see a performance improvement from it.

    There it would be better to write the data in smaller chunks to make the thread more responsive.

    The 256MB cache was an example, but IMO if you go much smaller on the cache you might as well not cache the data at all and simply defrag the volume on a regular basis - which goes back to my original post.
     

    tourettes

    Retired Team Member
  • Premium Supporter
  • January 7, 2005
    17,301
    4,800
    You have to remember that you're still dealing with a data stream. Whether it's stored in memory or on disk, it's still a stream. So timeshifting and other features can still make use of the stream. In fact, they may even see a performance improvement from it.

    Reading from disk is many times easier and less error-prone than reading directly from another process's memory space.

    And as for the actual topic, pre-allocating the files, I think it's a really good way to prevent fragmentation. It's the easiest and safest way to handle it.
     
