Vision

Binary Large OBjects: Externalizing BLOB storage via RBS

Written by: Margriet Bruggeman, Nikander Bruggeman.

April 30, 2010

Binary Large OBjects: Externalizing BLOB storage via RBS

Before the 2008 release of SQL Server, you could choose whether a custom application stored large pieces of data (Binary Large Objects, or BLOBs) inside the database or maintained references to BLOBs on the file system. The FILESTREAM feature of SQL Server 2008 changed all that by offering the best of both worlds.

The drawback of storing BLOBs in data columns (with an upper limit of 2GB) is that it typically results in lesser performance because streaming BLOB data from a database is not nearly as efficient as streaming BLOB data from the file system, since the file system is optimized for streaming unstructured data. The SQL Server 2008 FILESTREAM feature allows you to configure a database so that it looks as though BLOB data is stored in the database (inline with the table data), whereas in reality the BLOB data is stored externally on the file system. This leads to several good things: BLOBs still benefit from the features offered by the database server such as transaction support, SQL Server manages links to BLOBs on the file system automatically, and automatic inclusion in backup/restore scenario's, while at the same time also gaining the performance benefits offered by the file system.

SharePoint implementations have two interesting traits that make externalizing BLOB storage an extremely interesting topic:

  • SharePoint implementations contain loads of pieces of BLOBs. According to Microsoft, this typically accounts for 60-70% of the total content.
  • SharePoint isn't very good in handling large files. Although the usual Office files are seldomly larger than 10MB, you probably will encounter issues once you cross the 25MB border. Some files, such as media files, advanced drawings, and scanned documents, are way larger than that and that is a sure way to ask for trouble. It depends on a lot of factors if working with large files will result in issues, if you want to learn more we have explored this topic in detail in our blog post "Working with large files" which can be found at http://www.lcbridge.nl/vision/2008/largefiles.htm.

As a side note, high-end document management systems don't usually store documents in a database. Instead, they store them on the file system or in a custom repository that is optimized for handling large files and large amounts of files. At one time, we were hired by a company that builds a high-end document management system. During our stay, the company tried to build in support for WinFS, but unfortunately WinFS itself died a slow death (again), so they had to abandon that idea.

We guessed the SharePoint 2010 product team acknowledged these facts which lead to the introduction of the External BLOB Storage (EBS) feature in SharePoint 2007 sp1, which allowed developers to create custom external BLOB store providers that allowed the storage of BLOBs in the external repository of your choice.

In case you are wondering, EBS is deprecated in SharePoint 2010 and you don't need it anymore because SharePoint 2010 introduces the Remote BLOB Storage (RBS) feature, which is the actual topic of this section.

Note. Other than providing a historical context, there is a reason why we brought up the SQL Server 2008 FILESTREAM at the beginning of this section. As you will see later, you can use RBS in conjunction with the FILESTREAM feature.

The following components are of interest when it comes to understanding RBS:

  • The SharePoint object model interacts with SQL Server configuration and content databases.
  • RBS introduces a new SQL RBS Client Library that allows the SharePoint 2010 object model to store BLOB data externally.
  • An RBS Provider API allows storage vendors to create an external storage system for SharePoint 2010. You should note that the adoption of a provider model makes it extremely easy to switch to another provider, which ultimately comes down to choosing another repository for storing BLOBs.
  • An RBS installation requires one or more BLOB store providers implementing the Provider API. Please note that you can associate one active BLOB store provider for a specific content database.
  • Every BLOB store provider uses it's own specialized type of BLOB store. BLOB stores are only accessed through their custom providers.
  • RBS Maintainer. The RBS Mainainer runs as a Windows scheduled task that is responsible for handling maintenance tasks and doing garbage collection. It can either be installed on a web front-end (WFE) or on the database server itself.

Note. Currently, several vendors (EMC², Open Text, NetApp, AvePoint, and CommVault) are developing external storage systems for SharePoint 2010. Typically, you can expect that these systems will be very good at storing and managing large BLOBs, but will also have advanced features such as support for granular disaster recovery scenarios, BLOB immutability (which causes a BLOB image to stay the same, newer versions of the same document will be stored as separate BLOB images), expunging capabilities (which refers to really deleting BLOB data, or better said: obliterating data beyond recovery, for safety reasons), data de-duplication, and the possibility to enforce guaranteed retention and deletion policies. Such capabilities transcend BLOB storage to a whole new, high-end level. As a side note, you can also create your own BLOB storage system, but we guess you should never walk down that road unless you want to become a BLOB storage vendor yourself.

Architecturally, RBS looks like this:

If you enable RBS and store a document, the following happens:

  1. The client issues a save request.
  2. The SharePoint object model enforces business logic such as the checking of credentials, extracting metadata and so on.
  3. The SharePoint object model calls the RBS client library on the WFE and asks it to save the BLOB.
  4. The RBS client library locates the appropriate BLOB store provider.
  5. The BLOB store provider stores the BLOB in it's associated BLOB store and returns a unique BLOB id. Through the RBS client library, this id is eventually returned to the SharePoint object model.
  6. The SharePoint object model saves metadata and the BLOB id in the SharePoint content database.

As you would expect, this process is entirely transparent to the end user. At the front-end, there is no way to determine if a BLOB was stored in either a custom BLOB store or the SharePoint content database.

If you save a document and RBS is enabled, the following happens:

  1. The client issues a request to open a document.
  2. Again, the SharePoint object model enforces business logic such as the checking of credentials.
  3. The SharePoint object model calls the SharePoint content dabase and retrieves the BLOB id and associated metadata. It may do so in advanced ways, such as requesting multiple BLOB id's for multiple documents at the same time (although you can't influence such implementation details).
  4. The SharePoint object model passes the BLOB id to the RBS client library.
  5. The RBS client library calls the appropriate BLOB store provider.
  6. The BLOB store provider reads the BLOB data and returns it to the RBS client library, which returns it to the SharePoint object model, which returns it to the ultimate destination: the client.

RBS offers the following advantages:

  • Custom BLOB stores are better suited for handling BLOB data.
  • The most simple form of a BLOB store, the file system, is cheaper than SQL storage (such as SAN storage).
  • It allows you to manage document metadata and the document BLOBs themselves separately. This allows you to set up more cost effective backup/restore scenario's such as separate back up schedules (for instance, in backup/restore scenario's).
  • It may give you, depending on the BLOB store you choose, advanced storage capabilities. Please refer to the previous side note about BLOB store provider vendors for more information.
  • Better support for handling hierarchical data.

RBS is a separate downloadable component that is a part of the SQL Server 2008 R2 feature pack. In order to find it, just search for "download rbs sharepoint" in your favorite search engine to find the latest location on http://go.microsoft.com. You can also check the SQL Remote Blob Storage team blog at http://blogs.msdn.com/sqlrbs. At the time of writing, the most recent version of RBS is called RBS 2008 R2 with FILESTREAM provider, which is supported on both SQL Server 2008 and SQL Server 2008 R2. Luckily, as it's name implies, RBS is shipped with a BLOB store provider for the file system, so you don't need to purchase a 3rd party vendor BLOB store provider in order to test RBS.

Note. Do you remember that we promised that RBS was able to leverage the FILESTREAM feature of SQL Server 2008? Here it is, in order to use the BLOB store provider for the file system, you need to enable the FILESTREAM feature on SQL Server 2008. Because of this, all BLOBs are stored on the local file system of the SQL Server.

If you want to install RBS and use the file system BLOB store provider that is available out-of-the-box, you need to do the following:

  1. Make sure you are using the Enterprise Edition of SQL Server 2008.
  2. Enable the SQL Server 2008 FILESTREAM feature.
  3. Alter your SharePoint content database so that it supports the FILESTREAM feature as well.
  4. First install RBS on SQL Server.
  5. Install RBS on every WFE.
  6. Enable RBS using PowerShell.

Please note that you can also use PowerShell to migrate BLOBs that are currently stored in the SharePoint content database to the BLOB store of the RBS provider that is currently active for the content database.

Enabling the FILESTREAM feature on SQL Server 2008

You need to enable the FILESTREAM server for a specific SQL Server instance first. Please note: In this example we are using SQL Server 2008 R2. The following procedure explains how to do this:

  1. Click Start > All Programs > Administrative Tools > Microsoft SQL Server 2008 R2 > Configuration Tools > SQL Server Configuration Manager. This opens Sql Server Configuration Manager.
  2. Click on the SQL Server Services node. This opens a list of all available SQL Server services.
  3. Right-click the instance of SQL Server (MSSQLSERVER) where you need support for the FILESTREAM feature and choose Properties. This opens the SQL Server (MSSQLSERVER) Properties window.
  4. Click the FILESTREAM tab.
  5. Click the Enable FILESTREAM for Transact-SQL access checkbox.
  6. Click the Enable FILESTREAM for file I/O streaming access checkbox.
  7. If your working on a development machine, go ahead and click the Allow remote clients to have streaming access to FILESTREAM data as well.
  8. Click OK.

This doesn't finish the process. You still need to enable the FILESTREAM feature for the SQL Server instance. You can do this in two ways. The next procedure explains how to enable the FILESTREAM feature by calling a stored procedure.

  1. Open SQL Server Management Studio, select the SQL Server instance you want to manage and click Connect.
  2. Click New Query.
  3. Issue the following command: EXEC_sp_configure filestream_access_level, 2.
  4. Issue the following command: RECONFIGURE.

If you don't enable FILESTREAM features often, you are probably better of using the next method, as it's easier to remember in the long run and makes it more clear to see what you are doing. The next procedure explains how to enable the FILESTREAM feature via the user interface:

  1. Open SQL Server Management Studio, select the SQL Server instance you want to manage and click Connect.
  2. Right-click the [name of SQL Server instance] node and choose Properties.
  3. Click Advanced.
  4. In the Filestream section, set the FileStream Access Level drop down list to Full access enabled.
  5. Click OK. Aninformational message appears stating that you need to restart SQL Server if you want your configuration changes to take effect.
  6. Click OK.
  7. Right-click the SQL Server node and choose Restart.

Enable FILESTREAM for your SharePoint 2010 content database

In this section, we will discuss how to enable FILESTREAM support for your SharePoint 2010 content database. First., you will need to determine the name of the SharePoint 2010 content database associated to your site collection.

Please note: In this example, we're expecting that your site collection is only associated to a single content database. In more advanced scenarios, this may not be so. In such a case, you need to iterate through the entire collection of content databases and locate the ones you want to change. At a later stage this issue pops up again, because soon you will be enabling RBS per content database via PowerShell scripts.

  1. Click Start > All Programs > Microsoft SharePoint 2010 Products > SharePoint 2010 Management Shell. This opens the Administrator: SharePoint 2010 Management Shell.
  2. Issue the following command: $site = get-spsite "http://[URL of site collection]", for instance $site = get-spsite "http://sharepoint2010.
  3. Issue the following command: $site.ContentDatabase.

Take a note of the name of your SharePoint content database. The name of ours is WSS_Content, and we'll use this name throughout the example. Then, you need to open up a query window in SQL Server Management Studio and issue three queries. First, you need to create a unique key.

use [WSS_Content]
if not exists (select * from sys.symmetric_keys where name = N'##MS_DatabaseMasterKey##')create master key encryption by password = N'[enter a password here]'

Then, you need to create a new file group for the FILESTREAM feature by entering the following command:

use [WSS_Content]
if not exists (select groupname from sysfilegroups where groupname=N'RBSFilestreamProvider')alter database [WSS_Content]
add filegroup RBSFilestreamProvider contains filestream

Finally, you need to add a new FILESTREAM file to the file group by issuing the following command:

use [WSS_Content]
alter database [WSS_Content] add file (name = RBSFilestreamFile, filename = 'c:\Blobstore') to filegroup RBSFilestreamProvider

This causes BLOB files to be stored on the following location of SQL Server: c:\Blobstore.

Note. You can copy the commands from the TechNet article "Install and configure Remote BLOB Storage (SharePoint 2010).

Install RBS on SQL Server

You can't directly click on RBS_X64.msi and follow the instructions of the installation wizard because the wizard sets default values that are not recommended for SharePoint 2010. Instead, do the following:

  1. Click Start and locate the Command Prompt icon.
  2. Right-click the Command Prompt icon and choose Run as administrator. This opens the Administrator: Command Prompt window.
  3. Issue the following command (and replace the values for both DBNAME and DBINSTANCE with the names of your SharePoint content database and SQL Server instance name respectively): msiexec /qn /lvx* rbs_install_log.txt /i RBS_X64.msi DBNAME="WSS_Content" DBINSTANCE="DBInstanceName" ADDLOCAL="Client,Docs,Maintainer,ServerScript,FilestreamClient,FilestreamServer"

Although the command prompt returns immediately, the msi file is installed in quiet mode. If you want to be informed explicictly, change the /qn parameter to /qn+, otherwise, you need to wait a couple of minutes and check if the msiexec process is still running via Task Manager.

After installation, the folder containing the RBS binaries should contain a file called rbs_install_log.txt. The last 20 lines should include the message that "Installation completed successfully".

Install RBS on WFEs

After installing RBS on SQL Server, you will also need to install RBS on all WFEs. You can do this by issung the following command from the Administrator Command Prompt window:

msiexec /qn /lvx* rbs_install_log.txt /i RBS_X64.msi DBNAME="WSS_Content" DBINSTANCE="DBInstanceName" ADDLOCAL="Client,Docs,Maintainer,ServerScript,FilestreamClient,FilestreamServer"

After installation, the folder containing the RBS binaries should contain a file called rbs_install_log.txt. The last 20 lines should include the message that "Installation completed successfully". You may also note that the SharePoint content database now contains several new tables that are prefixed with "mssqlrbs". You can check this using the following SQL command from SQL Server Management Studio within the context of the SharePoint content database:

select * from dbo.sysobjects where name like 'mssqlrbs%'

PowerShell time: enable a provider for a content database

Choose any given WFE within the SharePoint farm and follow the next procedure to enable the file system BLOB store provider that is available by default for a given content database.

  1. Click Start > All Programs > Microsoft SharePoint 2010 Products > SharePoint 2010 Management Shell. This opens the Administrator: SharePoint 2010 Management Shell.
  2. Issue the following command: $cdb = Get-SPContentDatabase –WebApplication http:/[name of site collection]
  3. Alternatively, if your site collection is associated to multiple content databases, issue the following command: $cdb = Get-SPContentDatabase –WebApplication http://sitename
  4. Then, issue the following commands to finish the configuration: $rbss = $cdb.RemoteBlobStorageSettings
    $rbss.Enable()
    $rbss.SetActiveProviderName($rbss.GetProviderNames()[0])
    $rbss

Note. If you need more information for working with PowerShell, and working with collections in PowerShell in particular, please refer to section "PowerShell: Retrieve information about SharePoint in the fastest way possible".

At this point, the file system BLOB store provider is enabled for your SharePoint 2010 content database. There is one quite cool feature that you still need to consider. You can set the minimum BLOB storage size. This allows you to determine that all files smaller than the specified threshold are still stored in the SharePoint content database itself, but all files larger than the threshold will be stored in the BLOB store. You can do this by issuing the following commands via PowerShell:

$cdb = Get-SPContentDatabase –WebApplication http:/[name of site collection]
$cdb.RemoteBlobStorageSettings.MinimumBlobStorageSize = [number of bytes]
#cdb.Update()

If you start adding files to SharePoint that surpass this threshold, in the user interface you will notice exactly… nothing! If you want to check that the document was indeed added to the BLOB store you can open SQL Server Management Studio, open the SharePoint content database and issue the following command from the query window:

SELECT siteid, content, rbsid FROM alldocstreams WHERE rbsid IS NOT NULL

Normally, the content column contains the BLOB. For documents stored in the BLOB store, this column is NULL and the column rbsid will contain a BLOB id that the BLOB provider uses to locate the correct BLBO file. So, if the previous queries returns results, you know that you have successfully added a file to the BLOB store. Alternatively, you could also check the contents of c:\Blobstore and see if it contains any entries.

Concluding the RBS section

To conclude the discussion of RBS. In general, RBS has one minor disadvantage: it makes a solution more complex to manage, because you need to maintain two stores (SQL Server and the BLOB store) instead of one. This particularly becomes obvious in back-up/restore scenario's. In complex scenario's where you really do need the benefits offered by RBS, we don't think this should be a factor in your decisioning making.

Other possible disadvantages are completely dependent upon the specific BLOB store provider you are considering. The file system BLOB store provider that is shipped out-of-the box is only able to store BLOBs on the local file system of the SQL Server instance containing the SharePoint 2010 content dabase. This makes it unsuitable for larger deployments. We don't have a lot of performance data about the file system provider yet, but it wouldn't surprise us if it needs to mature first in order to really become useful in production environments. As it stands now, we would definitely consider using the file system provider in intranet scenario's where a small number of users working in locations close to each other are working with large files (> 25 MB) on a regular basis. In other scenario's, probably not, although we're the first ones to admit that we haven't collected enough data to back this statement up.

As for the benefits: they depend completely on the choice of BLOB store provider too, although you can safely expect that this is the part where RBS really shines. Once vendors start delivering features such as performance optimizations for large BLOBs, expunging capabilities, and retention policies, RBS can lift SharePoint 2010 to a whole new dimension. We bet that RBS will become a huge factor in the high-end market for SharePoint 2010. Other implementations can safely ignore it and live happily ever after storing BLOBs in the SharePoint 2010 content database.

« back to overview page