Working with large files
Written by: Margriet Bruggeman, Nikander Bruggeman.
June 10, 2008
One of our customers had issues working with Microsoft Office SharePoint Server 2007 and large files. They received time-out errors as well as general errors when working with large files (in this case, files larger than 25 MB). A small percentage of the files they work with are really large, close to 150MB in size. After trying out several solutions from a Knowledge Base article and Joel Oleson’s blog post "File Name, Length, Size and Invalid Character Restrictions and Recommendations", we were still not very happy with the results. At that point, we decided to call MS Support to find out whether what we were trying to achieve was even feasible. This resulted in a lot of useful information about working with large files in MOSS. Mind you, we did not come up with most of this info ourselves; we have to thank MS Support and MS documentation for that (and most specifically Denes Theisz of MS Support). We decided to blog about it anyway because we’re convinced it contains valuable information that can help out other people as well.
Average file sizes
First of all, what qualifies as a large file? The default maximum file upload size is 50MB. This limit applies to a single item, although it also seems to apply to the total size of a batch of files uploaded via the Upload > Upload Multiple Documents option in the user interface of a list.
In his (very interesting and useful) blog post “File Name, Length, Size and Invalid Character Restrictions and Recommendations”, Joel Oleson mentions the default max file size of 50 MB is also the one that is recommended for the best experience, that Microsoft IT limits the max file upload size to 100 MB, that large files of 200-500 MB could be supported in a LAN, and that the absolute limit of the max file size is 2GB (which is a limit imposed by SQL Server). All of these numbers are, at least in our experience, quite optimistic, but we’ll get back to that later.
Let’s quote Joel Oleson one more time... Joel has also created an interesting presentation called "Is the File Server dead?" that expresses that MOSS shouldn’t be used as a substitute for the Windows file system. In this presentation, he describes a sampling of files that can be found in a SharePoint farm that contains around 2.5 million files (which equates to a size of roughly 2 TB). After analyzing the contents of this farm, Joel Oleson concludes that most files used in that MOSS environment consist of Office files (which we assume will be true for most MOSS implementations, although PDF files will typically make up a large part of the contents as well), 32% of them being .doc files with average file sizes of 500KB. In this specific farm, the largest files are up to 5MB.
These findings match our own experiences, except for cases in which a company scans content and places that content in MOSS or works with large drawings. In such cases, file sizes exceed these numbers considerably, and a document size somewhere between 100MB and 150MB is certainly possible.
How to facilitate working with large files?
All in all, working with large files in a MOSS environment is less than ideal. Having said that, there are a couple of things you can do to make working with large files easier.
Network bandwidth
First of all, you need to optimize the available network bandwidth. If you’re planning to work with large files in MOSS, your network bandwidth should be at least 100 Mbit full duplex. You can use the MOSS Multiple Document Upload tool to provide an estimate of the network speed currently available to you (although this estimate generally seems to be too pessimistic). There are two problems associated with having a slow network:
- It takes a while before the end user is able to open or save a document (this is an obvious one).
- The document (or at least parts of the document) gets loaded in the memory of the MOSS web front end (WFE) handling the request, and the SQL Server containing the document. The longer it takes the end user to open a document, the longer server resources are tied up. Memory usage suffers the most from this. So, please note, having a slow network can have a severe impact on the performance of your MOSS farm.
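To get a feel for how long server resources can stay tied up, the following minimal sketch estimates the best-case transfer time of a file over a given link. The function name and the efficiency parameter are our own illustration; real transfers will be slower because of protocol overhead and competing traffic.

```python
def transfer_seconds(file_size_mb: float, bandwidth_mbit: float,
                     efficiency: float = 1.0) -> float:
    """Best-case number of seconds needed to move a file across a link.

    efficiency is a hypothetical fudge factor (0 < efficiency <= 1) for
    protocol overhead; 1.0 means the full nominal bandwidth is available.
    """
    if bandwidth_mbit <= 0 or not 0 < efficiency <= 1:
        raise ValueError("bandwidth must be positive and efficiency in (0, 1]")
    file_size_mbit = file_size_mb * 8  # 1 byte = 8 bits
    return file_size_mbit / (bandwidth_mbit * efficiency)


# Compare a 10 Mbit link with the recommended 100 Mbit link.
for size_mb in (25, 50, 150):
    for link_mbit in (10, 100):
        seconds = transfer_seconds(size_mb, link_mbit)
        print(f"{size_mb} MB over {link_mbit} Mbit: at least {seconds:.0f} s")
```

Even in this idealized calculation, a 150MB file keeps a request busy ten times longer on a 10 Mbit link than on a 100 Mbit link.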
SharePoint configurations and installation
There are a couple of SharePoint configurations that make it easier to work with large files in MOSS. They are:
- Adjust the maximum upload size in SharePoint Central Administration (SCA).
- Adjust the IIS connection time-out settings.
- Adjust various ASP.NET time-out settings for a specific SharePoint web application.
- Adjust the default chunk size.
- Use 64-bit web front-ends.
All the configuration changes are described in detail in the KB article and Oleson’s blog post mentioned earlier in this article, so there’s no need to discuss the how-to’s in this article all over again.
Chunk size bug in MOSS
The chunk size determines the amount of data that the client retrieves in one go when opening a document. For example, if a client tries to open a document of 50MB and the chunk size is 10MB, the document is divided and retrieved in 5 chunks. Each chunk will be loaded into the memory of both the WFE handling the request and SQL Server. The default chunk size is 5MB and you can adjust the chunk size by issuing the following stsadm command:
Stsadm.exe -o setproperty -pn large-file-chunk-size -pv [size in bytes]
As you’ve seen, you can change the chunk size setting. Raising it could benefit the client because a big chunk of 50MB loads faster than 10 chunks of 5MB. Then again, this also means that 50MB gets loaded into the memory of the WFE and SQL Server (instead of only 5MB at a time), thus increasing the load on your MOSS farm. This memory will only be released when the client has finished loading the entire document. Whether raising the chunk size is a viable alternative for you depends on your farm and the number of users accessing it. You can only answer this question truthfully by trying it and monitoring your server farm intensively.
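The arithmetic behind the chunk size can be sketched as follows. This is only an illustration: the helper names are ours, and we assume that 1MB equals 1,048,576 bytes when converting to the byte value that the -pv parameter expects.

```python
import math

BYTES_PER_MB = 1024 * 1024  # assumption: 1 MB = 1,048,576 bytes


def chunk_count(file_size_mb: int, chunk_size_mb: int) -> int:
    """Number of chunks needed to retrieve a file of the given size."""
    return math.ceil(file_size_mb / chunk_size_mb)


def stsadm_command(chunk_size_mb: int) -> str:
    """Build the stsadm command line shown above for a chunk size in MB."""
    size_in_bytes = chunk_size_mb * BYTES_PER_MB
    return ("stsadm.exe -o setproperty -pn large-file-chunk-size "
            f"-pv {size_in_bytes}")


print(chunk_count(50, 10))   # a 50MB document with 10MB chunks
print(chunk_count(50, 5))    # the same document with the default 5MB chunks
print(stsadm_command(10))
```

The trade-off described above is visible in the numbers: fewer chunks mean fewer round trips for the client, but each chunk occupies more memory on the WFE and SQL Server.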
Now, let’s discuss the problem associated with chunk size. Because of a bug in MOSS, the chunk size setting doesn’t work within Explorer view (or any other method that uses WebDAV as the underlying protocol). This bug has only been discovered recently. Currently there is nothing you can do about this problem. MS expects that a bug fix rollup, currently scheduled for around August 2008, will address the chunk size bug.
As a result, opening a file of 50MB via Explorer view places a much higher load on the server (50MB is loaded into the memory of the WFE for this document) compared to performing the same action via the standard SharePoint user interface (by default, at most 5MB is loaded into memory for this document).
32-bit versus 64-bit WFEs
MS strongly advises installing the 64-bit version of MOSS on WFEs, unless you have a very good reason not to do this (this advice can be found in the MS best practices document "Planning and Deploying Service Pack 1 for MOSS 2007 in a multi-server environment").
As discussed, if you use WebDAV, chunking is ignored. In such cases, accessing a large file uses a lot of the available memory on the WFE. On a 32-bit WFE, the W3WP.exe process hosting a SharePoint web application will not be able to consume more than around 800MB of memory. After that, you run the risk that the process is restarted automatically, thus crashing the worker process and resulting in the failure of the download. It is easy to see that multiple users working with large files can lead to lots of worker process recycles, thus potentially obliterating server performance and end user experience.
By the way, as a point of interest, directly after such a crash occurs, you actually have a higher chance of successfully downloading the document, as the freshly restarted worker process will typically have enough memory to spare.
64-bit WFEs hold up better when working with large files, because the worker processes won’t recycle, but will only use a lot of memory (and eventually the swap file will be used a lot). Of course, this could lead to a disk bottleneck and slow performance, but we prefer this situation over the other one.
Client settings
You’ve seen that there are a couple of steps you can take to improve the ability to work with large files in a MOSS environment by changing the server environment. There are also some steps that need to be taken on the client. We’ll discuss them in this section.
Make sure that the Internet Explorer cache (located in the Temporary Internet Files folder) of end users has a size of at least 50 MB. End users that regularly work with large files should have a cache size that is even larger. Failing to do so will increase the load on the WFEs considerably.
End users working with large files (files that are 25MB+ in size) should first save them locally (btw, Office 2007 makes this a lot easier), edit them and then upload them again to MOSS. Failing to do so increases the network traffic and memory usage on the WFEs.
Download a document by right-clicking it and choosing Save Target As (Windows functionality) instead of Send To > Download a copy (MOSS functionality). The former performs better, since the latter causes the complete document to be loaded in the memory of the WFE. By the way, uploading large files works best and fastest using the Explorer view.
End users shouldn’t use the Explorer view when browsing files and folders. Placing the mouse cursor over a file or folder within the Explorer view leads to the retrieval of all metadata of all files in the folder that is currently viewed. In some scenarios, browsing files via the Explorer view might even lead to the full retrieval of those files. Failing to avoid the Explorer view when browsing files and folders increases the server load considerably. Instead, you should use a SharePoint view (for instance, the All Documents view) for browsing.
As a general rule, if possible, when using the SharePoint user interface, don’t work with files larger than 25MB. When working with WebDAV (i.e. via a file share), don’t work with files larger than 10MB. Please note that this info is only based on tests performed on a couple of test environments, so you might find different results yourself. Having said that, we do believe most environments will find that end users shouldn’t work with files in MOSS that are much larger than the sizes mentioned.
Large files that are often read by end users, but seldom updated, should be kept outside MOSS and stored on a file share that is indexed by MOSS.
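As a rough aid for applying the size guidelines above, the following sketch flags files in a local folder that exceed them before they are uploaded. The function names and the dictionary keys are hypothetical, and the 25MB and 10MB limits are simply the rules of thumb from our tests, not hard product limits.

```python
from pathlib import Path

BYTES_PER_MB = 1024 * 1024  # assumption: 1 MB = 1,048,576 bytes

# Rules of thumb from this article, in MB, per access method.
GUIDELINE_MB = {"ui": 25, "webdav": 10}


def exceeds_guideline(size_bytes: int, access_method: str) -> bool:
    """True if a file of this size is above the guideline for the method."""
    return size_bytes > GUIDELINE_MB[access_method] * BYTES_PER_MB


def files_over_guideline(folder: str, access_method: str = "ui"):
    """Yield (path, size in MB) for each file under folder above the limit."""
    for path in Path(folder).rglob("*"):
        if path.is_file():
            size = path.stat().st_size
            if exceeds_guideline(size, access_method):
                yield path, size / BYTES_PER_MB


# Example: list candidates for a file share instead of MOSS.
for path, size_mb in files_over_guideline(".", "webdav"):
    print(f"{path}: {size_mb:.1f} MB")
```

Files reported by such a scan are candidates for the file share approach described above, provided they are read often and seldom updated.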
Monitor working with large files
If you plan to work with large files, monitoring your MOSS environment should be a part of your implementation strategy. In this section, we’ll discuss what we’ve done to gain insight into our server farm when end users start working with large files.
Please note: In our analysis, we’ve focused primarily on the WFEs, since we had indications that the bottleneck was to be found there. In most cases, you should also take an extensive look at the SQL Server acting as the data repository in your MOSS farm.
Analyzing performance counters
First of all, we’ve used perfmon and created a counter log for the following performance objects:
- LogicalDisk
- Memory
- Network interface
- Paging file
- PhysicalDisk
- Process
- Processor
Use a sample interval of 5 seconds and save the output to a binary file. Then, start perfmon manually and simulate one or more clients working with large files. When finished, you can open the binary file at a later time using the following procedure:
- Open a command prompt and type perfmon. This opens the Performance dialog window.
- In the toolbar, click the View Log Data (Ctrl+L, the fourth icon on the left, the one that looks like a database) button. This opens the System Monitor Properties window.
- In the Data source section, click the Log files radio button.
- Click the Add button and locate the blg file containing the performance counters you're interested in.
- Click OK.
- Click the Add button (Ctrl+I, the + sign).
- Make sure the Select counters from computer radio button is selected and choose the server you're interested in.
After that, you're ready to analyse any performance counter you want. Alternatively, you can convert the binary log file to a .csv file and analyse that. The next procedure explains how to do this:
- Open a command prompt.
- Navigate to the folder containing the binary log file (.blg).
- Type: relog [name log file].blg -f CSV -o [name csv file].csv
At this point, you could, for instance, open the .csv file in Excel.
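If you prefer a script over Excel, the converted file can also be summarized with a few lines of Python. The sketch below is a minimal example under assumptions: it expects the first row to contain counter paths such as \\SERVER\Memory\Available MBytes, it skips blank samples, and the server name WFE01 and the sample values are made up for illustration. Check the header of your own .csv file before relying on it.

```python
import csv
import io
from statistics import mean


def summarize_counter(csv_file, counter_suffix: str) -> dict:
    """Return sample count, min, mean and max for one counter column.

    counter_suffix is matched against the end of each header cell, so the
    server name does not have to be known in advance.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    matches = [i for i, name in enumerate(header) if name.endswith(counter_suffix)]
    if not matches:
        raise KeyError(f"no column ends with {counter_suffix!r}")
    column = matches[0]
    values = []
    for row in reader:
        if column >= len(row):
            continue  # truncated row
        try:
            values.append(float(row[column]))
        except ValueError:
            continue  # blank or non-numeric sample
    if not values:
        raise ValueError("no numeric samples found")
    return {"samples": len(values), "min": min(values),
            "mean": mean(values), "max": max(values)}


# Made-up sample in the assumed layout; replace with open("yourlog.csv", newline="").
SAMPLE_LINES = [
    r'"(PDH-CSV 4.0) (W. Europe Standard Time)(-60)","\\WFE01\Memory\Available MBytes"',
    r'"06/10/2008 10:00:00.000","1500"',
    r'"06/10/2008 10:00:05.000"," "',
    r'"06/10/2008 10:00:10.000","1100"',
]
summary = summarize_counter(io.StringIO("\n".join(SAMPLE_LINES)),
                            r"\Memory\Available MBytes")
print(summary)
```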
When you’re ready to analyze the performance counters, you should at least consider taking a look at the following information:
LogicalDisk
Check the Current Disk Queue Length performance counter. If it’s low, system requests don’t have to wait for disk access (for instance, for reading and writing files), indicating that the hard disk is fast enough. You can also use the LogicalDisk performance object to determine if there is enough free space left on the server hard disks.
Memory
Microsoft Office SharePoint Server 2007 servers need memory – and lots of it. Use the MS Capacity Planner tool if you need specific guidelines. In general, a WFE needs at least 4GB. You should take a look at the Available MBytes performance counter to obtain specifics.
Network interface
If you want to work with Microsoft Office SharePoint Server 2007 and large files, you need to make sure that the server is able to process all the network traffic. Take a look at the Bytes Total/sec counter. You should also take a look at the Output Queue Length counter, which reports the number of queued outbound packets and should remain below 2. You could also check the Packets Received Errors counter to determine if any network errors have occurred. If there is a problem, you should replace the current network adapter with a faster one, or add a new one.
Paging file
Check the % Usage counter. If it’s high, this might mean a lot of swapping is going on, which can be very detrimental to the performance of your server. In such cases, you need to add more memory.
Processor
Look at the % Processor Time counter to establish how busy the server processor is. Various guidelines exist; we use the rule that if you’re consistently using 75% or more of the processor time, things become critical and your server is too busy. In such cases, you might replace the processor with a faster one, add new processors or implement some kind of scale-out strategy.
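The 75% rule can be checked against a series of logged samples. How long "consistently" lasts is not defined above, so the sketch below assumes it means an unbroken run of samples at or above the threshold; the default of 12 samples corresponds to one minute at a 5 second sample interval and is purely our own choice.

```python
def sustained_high_cpu(samples, threshold: float = 75.0, window: int = 12) -> bool:
    """True if `window` consecutive samples are at or above `threshold`.

    samples: % Processor Time values in chronological order.
    window:  assumed definition of "consistently" (12 samples = 1 minute
             at a 5 second sample interval).
    """
    run = 0
    for value in samples:
        run = run + 1 if value >= threshold else 0
        if run >= window:
            return True
    return False


busy = [80.0] * 12                      # one full minute above the threshold
spiky = [80.0] * 11 + [50.0] + [80.0] * 11  # a dip resets the run
print(sustained_high_cpu(busy))
print(sustained_high_cpu(spiky))
```

A short spike is ignored by this check, while a sustained plateau is reported, which matches the intent of the rule better than looking at a single maximum value.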
Process
Microsoft Office SharePoint Server 2007 web applications are hosted within their own application pools, which are in turn hosted within their own worker process (an instance of w3wp.exe). So, you should definitely take a look at the % Processor Time corresponding to the various instances of w3wp.exe on your server. Also, take a look at the Private Bytes counter to see how much memory is allocated per worker process. You can expect to see this number steadily climbing when the server has more difficulty processing end user requests for large files. Eventually, such behavior can result (on 32-bit WFEs) in automatic recycles of the w3wp processes.
In addition to keeping track of performance counters, you might want to open a command prompt and execute iisapp.vbs to gain insight into the various application pools that run on the WFE (and the w3wp process instances that host the application pools). At a later stage, we will look at Process performance counters that are related to instances of w3wp.exe processes that host SharePoint web applications, so it might be useful to have more knowledge about the available app pools. Unfortunately, you can’t use the output of iisapp.vbs to map a specific application pool to a specific process performance counter of a w3wp process later. Btw, we have yet to find a way to do this automatically.
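To judge how close a worker process is to a recycle, you can extrapolate the Private Bytes samples. The following sketch assumes simple linear growth between the first and last sample, a 5 second sample interval and the limit of around 800MB mentioned earlier for 32-bit WFEs; real memory usage is rarely this regular, so treat the result as a rough indication only.

```python
def seconds_until_limit(private_mb, interval_s: float = 5.0,
                        limit_mb: float = 800.0):
    """Estimate the seconds left before Private Bytes reaches limit_mb.

    private_mb: Private Bytes samples in MB, in chronological order.
    Returns None when memory usage is flat or shrinking, and 0.0 when the
    limit has already been reached.
    """
    if len(private_mb) < 2:
        raise ValueError("need at least two samples")
    # Average growth per sample interval, assuming a linear trend.
    growth = (private_mb[-1] - private_mb[0]) / (len(private_mb) - 1)
    if growth <= 0:
        return None
    remaining_mb = limit_mb - private_mb[-1]
    if remaining_mb <= 0:
        return 0.0
    return remaining_mb / growth * interval_s


# Hypothetical samples for one w3wp.exe instance, 5 seconds apart.
print(seconds_until_limit([400.0, 450.0, 500.0, 550.0]))
```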
Other interesting log files
To get a complete picture of the status of your server, you should also collect and analyze the following:
- Event viewer application log
- Event viewer system log
- IIS log for the MOSS web application
- HTTPERR log (if it exists)
- ULS (the LOGS folder in the 12 hive).
Conclusion
Of course, when working with large files and Microsoft Office SharePoint Server 2007, the speed of your network is very important. It may be less obvious that a slow network can also have a severe impact on the performance of your WFEs as well as the SQL Server containing your content databases. In addition, MOSS isn’t very suitable for working with large files (files larger than 25MB) and is most suitable for working with Office files (which are seldom larger than 5MB). Be careful when working with the Explorer view; it might tax your servers more than you realize (although this will be addressed in a bug fix that has yet to be released).
If you want to work with large files in a MOSS environment, there are various factors you can optimize:
- Optimize network speed.
- Use 64-bit web front-ends.
- If you’re using 32-bit WFEs, you can influence the recycle behaviour (http://support.microsoft.com/kb/332088).
- Try to avoid working with files larger than 25MB (this may be considerably less when using WebDAV).
- Try to store large files on a file share (not in MOSS) and have them indexed by MOSS.
- Optimize the chunk size, but realize that this (because of a bug) currently does not influence anything that uses WebDAV (such as the Explorer View).
- Optimize client settings.
- Teach end users how to work with MOSS, thus preventing expensive actions (such as browsing for files using the Explorer View).
- Install the August 2008 (or whenever it will actually be released) bug fix rollup as soon as you can.
In this section, we’ve summarized the lessons we’ve learned so far when it comes to working with large files in a MOSS environment. Our conclusion is the following: if you want to do it, work lies ahead of you!