All about storing binary blobs in the database.

Posted on 7/7/2006 @ 9:17 PM in #Vanilla .NET by | Feedback | 1226 views

This is another one of those topics where you have to be aware of the plusses and minuses of storing binaries inside the database, or outside with only a file path reference in the database.

Here are my 2 cents on this topic -

Advantages of putting binaries in the database -

1. Never have to worry about duplicate image filenames etc.
2. Never have to worry about access permissions.
3. Never have to worry about security outside of your ASP.NET app.
4. Easy porting because it's all in one place (the db).
5. Transactions dude ... :) that is a nice nice thing !! 

Downsides of putting binaries in the database -

1. First of all consider the cost of serializing the document/image into the blob.
2. Second think about timeouts and broken streams when retreiving larger datas that aren't quite one shot.
3. Third, your larger files will go from d/b to web server as 4 kb chunks and each chunk will require handshaking - too much overhead - you decide.
4. Fourth, What really guarantees that you will never run across an image > 50KB or document > 1 MB
5. Fifth, Sql server is gonna break the image/document up so it fits inside it's small small pages, and obviously it'll have to peice it together in a select query.

If I had to, I'd say keep small binaries in the database, and large binaries outside.

And for all you non-believers, who insist that binaries in a database are *always* a bad idea, remember, sharepoint stores binaries in the db. Also, check this out (in short, terraserver.microsoft.com stored images in Sql Server - and that was a LOT of data).

IF you _HAD_ to store binaries in the database, what can you do to alleviate the bad effects (to some extent). Here is a short list -

1. Use DataReader with CommandBehavior.SequentialAccess - this way you read the big blob only if you have to.
2. Contrary to popular belief, command builder will simply ignore the binary columns - so don't count on that :-)
3. Split the table vertically using a one to one relationship, index the PK/FK, and store the binary column on a seperate tablespace on a seperate disk altogether.
4. Don't use binary columns in where clauses :-)
5. Don't use binary columns in FOR XML queries - except FOR XML EXPLICIT where you can specify the xmltext directive to put stuff in cdata.
6. Conserve Resources when writing large blobs to the database.

Anyone have anything else to add? :-)

Sound off but keep it civil:

Older comments..


On 7/7/2006 8:55:11 PM John D. said ..
the only thing I have to say is your RSS feed is now broken with tags.


On 7/7/2006 9:02:57 PM Sahil Malik said ..
Good comment. I'll fix it immediately. Tell ya what, I'm lovin' the flexibility of having my own C# code running my blog, but it's sure a lot of work :)


On 7/7/2006 9:09:18 PM Sahil Malik said ..
Okay, how'z it look now? :)

If you have any more suggestions on how to make this a kickass blog engine, please lemme know.


On 7/8/2006 7:35:48 AM Jim L. said ..
Sahil,


Would making a new filegroup and associating that blob table with that group help performance as well? (referring to #3). I'm thinking in terms of I/O.

I've never really been a database geek so I'm having to play a little catch up here.

Thanks :)


On 7/8/2006 7:44:16 AM Sahil Malik said ..
Jim,

I may be mistaken about this, but I don't think you can change the cluster size in SQL Server. In Oracle you can however. So if you could change it to something other than 4K, then yes, it would help :)

SM


On 7/12/2006 9:29:11 AM Adam Hill said ..
A small correction "...terraserver.microsoft.com *stores* images..."

Terraserver has been happily serving up multi megabytes to the WorldWind community (http://worldwind.arc.nasa.gov) for the past two and a half years.