Storing Images in DB - Yea or Nay

Question

So I m using an app that stores images heavily in the DB  What s your outlook on this  I m more of a type to store the location in the filesystem  than store it directly in the DB   What do you think are the pros cons

User · Answer

If you need to store lots of images on the file system a couple of things to think about include    Backup and restore  How do you keep the images in sync  Filesystem performance  Depends on what you are doing and the filesystem  but you may want to implement a hashing mechanism so that you don t have a single directory with billions of files  Replication  Do you need to keep the files in sync between multiple servers

User · Answer

I would go with the file system approach  primarily due to its better flexibility  Consider that if the number of images gets huge  one database may not be able to handle it  With file system  you can simple add more file servers  assuming that you re using NFS or kind   Another advantage the file system approach has is to be able to do some fancy stuffs  such as you can use Amazon S3 as the primary storage  save the url in the database instead of file path   In case of outages happen to S3  you fall back to your file server  may be another database entry containing the file path   Some voodoo to apply to Apache or whatever web server you re using

User · Answer

Storing an image in the database still means that the image data ends up somewhere in the file system but obscured so that you cannot access it directly     ves    database integrity its easy to manage since you don t have to worry about keeping the filesystem in sync when an image is added or deleted   -ves    performance penalty -- a database lookup is usually slower that a filesystem lookup you cannot edit the image directly  crop  resize    Both methods are common and practiced  Have a look at the advantages and disadvantages  Either way  you ll have to think about how to overcome the disadvantages  Storing in database usually means tweaking database parameters and implement some kind of caching  Using filesystem requires you to find some way of keeping filesystem database in sync

User · Answer

File store   Facebook engineers had a great talk about it   One take away was to know the practical limit of files in a directory   Needle in a Haystack  Efficient Storage of Billions of Photos

User · Answer

Here s an interesting white paper on the topic     To BLOB or Not To BLOB  Large Object Storage in a Database or a Filesystem  The answer is  It depends    Certainly it would depend upon the database server and its approach to blob storage   It also depends on the type of data being stored in blobs  as well as how that data is to be accessed   Smaller sized files can be efficiently stored and delivered using the database as the storage mechanism   Larger files would probably be best stored using the file system  especially if they will be modified updated often    blob fragmentation becomes an issue in regards to performance    Here s an additional point to keep in mind   One of the reasons supporting the use of a database to store the blobs is ACID compliance   However  the approach that the testers used in the white paper   Bulk Logged option of SQL Server   which doubled SQL Server throughput  effectively changed the  D  in ACID to a  d   as the blob data was not logged with the initial writes for the transaction    Therefore  if full ACID compliance is an important requirement for your system  halve the SQL Server throughput figures for database writes when comparing file I O to database blob I O

User · Answer

Small static images  not more than a couple of megs  that are not frequently edited  should be stored in the database  This method has several benefits including easier portability  images are transferred with the database   easier backup restore  images are backed up with the database  and better scalability  a file system folder with thousands of little thumbnail files sounds like a scalability nightmare to me    Serving up images from a database is easy  just implement an http handler that serves the byte array returned from the DB server as a binary stream

User · Answer

Database for data  Filesystem for files

User · Answer

It depends on the number of images you are going to store and also their sizes  I have used databases to store images in the past and my experience has been fairly good    IMO  Pros of using database to store images are   A  You don t need FS structure to hold your images B  Database indexes perform better than FS trees when more number of items are to be stored C  Smartly tuned database perform good job at caching the query results D  Backups are simple  It also works well if you have replication set up and content is delivered from a server near to user  In such cases  explicit synchronization is not required   If your images are going to be small  say  lt  64k  and the storage engine of your db supports inline  in record  BLOBs  it improves performance further as no indirection is required  Locality of reference is achieved     Storing images may be a bad idea when you are dealing with small number of huge sized images  Another problem with storing images in db is that  metadata like creation  modification dates must handled by your application

User · Answer

The problem with storing only filepaths to images in a database is that the database s integrity can no longer be forced   If the actual image pointed to by the filepath becomes unavailable  the database unwittingly has an integrity error   Given that the images are the actual data being sought after  and that they can be managed easier  the images won t suddenly disappear  in one integrated database rather than having to interface with some kind of filesystem  if the filesystem is independently accessed  the images MIGHT suddenly  disappear    I d go for storing them directly as a BLOB or such

User · Answer

At a company where I used to work we stored 155 million images in an Oracle 8i  then 9i  database  7 5TB worth

User · Answer

In places where you MUST guarantee referential integrity and ACID compliance  storing images in the database is required   You cannot transactionaly guarantee that the image and the meta-data about that image stored in the database refer to the same file  In other words  it is impossible to guarantee that the file on the filesystem is only ever altered at the same time and in the same transaction as the metadata

User · Answer

If you are planning a public facing web site then you should not go with either option  Your should use a Content Delivery Network  CDN   There are price  scalability and speed advantages to a CDN when delivering a large amount of static content over the internet

User · Answer

File paths in the DB is definitely the way to go - I ve heard story after story from customers with TB of images that it became a nightmare trying to store any significant amount of images in a DB - the performance hit alone is too much

User · Answer

I have recently created a PHP MySQL app which stores PDFs Word files in a MySQL table  as big as 40MB per file so far    Pros    Uploaded files are replicated to backup server along with everything else  no separate backup strategy is needed  peace of mind   Setting up the web server is slightly simpler because I don t need to have an uploads  folder and tell all my applications where it is  I get to use transactions for edits to improve data integrity - I don t have to worry about orphaned and missing files   Cons    mysqldump now takes a looooong time because there is 500MB of file data in one of the tables  Overall not very memory cpu efficient when compared to filesystem   I d call my implementation a success  it takes care of backup requirements and simplifies the layout of the project  The performance is fine for the 20-30 people who use the app

User · Answer

No  due to page splits  You re essentially defining rows that can be 1KB - n MB so your database will have a lot of empty spaces in its pages which is bad for performance

User · Answer

As others have said SQL 2008 comes with a Filestream type that allows you to store a filename or identifier as a pointer in the db and automatically stores the image on your filesystem which is a great scenario   If you re on an older database  then I d say that if you re storing it as blob data  then you re really not going to get anything out of the database in the way of searching features  so it s probably best to store an address on a filesystem  and store the image that way   That way you also save space on your filesystem  as you are only going to save the exact amount of space  or even compacted space on the filesystem   Also  you could decide to save with some structure or elements that allow you to browse the raw images in your filesystem without any db hits  or transfer the files in bulk to another system  hard drive  S3 or another scenario - updating the location in your program  but keep the structure  again without much of a hit trying to bring the images out of your db when trying to increase storage   Probably  it would also allow you to throw some caching element  based on commonly hit image urls into your web engine program  so you re saving yourself there as well

User · Answer

The only reason we store images in our tables is because each table  or set of tables per range of work  is temporary and dropped at the end of the workflow   If there was any sort of long term storage we d definitely opt for storing file paths     It should also be noted that we work with a client server application internally so there s no web interface to worry about

User · Answer

Normally  I m storngly against taking the most expensive and hardest to scale part of your infrastructure  the database  and putting all load into it  On the other hand  It greatly simplifies backup strategy  especially when you have multiple web servers and need to somehow keep the data synchronized   Like most other things  It depends on the expected size and Budget

User · Answer

Another benefit of storing the images in the file system is that you don t have to do anything special to have the client cache them        unless of course the image isn t accessible via the document root  e g  authentication barrier   in which case you ll need to check the cache-control headers your code is sending

User · Answer

I m the lead developer on an enterprise document management system in which some customers store hundreds of gigabytes of documents  Terabytes in the not too distant future  We use the file system approach for many of the reasons mentioned on this page plus another  archiving    Many of our customers must conform to industry specific archival rules  such as storage to optical disk or storage in a non-proprietary format  Plus  you have the flexibility of simply adding more disks to a NAS device  If you have your files stored in your database  even with SQL Server 2008 s file stream data type  your archival options just became a whole lot narrower

User · Answer

I have worked with many digital storage systems and they all store digital objects on the file system  They tend to use a branch approach  so there will be an archive tree on the file system  often starting with year of entry e g  2009  subdirectory will be month e g  8 for August  next directory will be day e g  11 and sometimes they will use hour as well  the file will then be named with the records persistent ID  Using BLOBS has its advantages and I have heard of it being used often in the IT parts of the chemical industry for storing thousands or millions of photographs and diagrams  It can provide more granular security  a single method of backup  potentially better data integrity and improved inter media searching  Oracle has many features for this within the package they used to call Intermedia  I think it is called something else now   The file system can also have granular security provided through a system such as XACML or another XML type security object  See D Space of Fedora Object Store for examples

User · Answer

One thing you need to keep in mind is the size of your data set   I believe that Dillie-O was the only one who even remotely hit the point   If you have a small  single user  consumer app then I would say DB   I have a DVD management app that uses the file system  in Program Files at that  and it is a PIA to backup   I wish EVERY time that they would store them in a db  and let me choose where to save that file   For a larger commercial application then I would start to change my thinking   I used to work for a company that developed the county clerks information management application  We would store the images on disk  in an encoded format  to deal with FS problems with large numbers of files  based on the county assigned instrument number   This was useful on another front as the image could exist before the DB record  due to their workflow    As with most things   It depends on what you are doing

User · Answer

In my little application I have at least a million files weighing in at about 200GB at last count  All the files are sitting in an XFS file system mounted on a linux server over iscsi  The paths are stored in the database  use some kind of intelligent naming convention for your file paths and file names   IMHO  use the file system for what it was meant to do - store files  Databases generally do not offer you any advantage over a standard file system in storing binary data

User · Answer

I would go with the file system approach   As noted by a few others  most web servers are built to send images from a file path   You ll have much higher performance if you don t have to write or stream out BLOB fields from the database   Having filesystem storage for the images makes it easier to setup static pages when the content isn t changing or you want limit the load on the database

User · Answer

File system  for sure  Then you get to use all of the OS functionality to deal with these images - back ups  webserver  even just scripting batch changes using tools like imagemagic  If you store them in the DB then you ll need to write your own code to solve these problems

User · Answer

In a previous project i stored images on the filesystem  and that caused a lot of headaches with backups  replication  and the filesystem getting out of sync with the database   In my latest project i m storing images in the database  and caching them on the filesystem  and it works really well  I ve had no problems so far

User · Answer

In my current application  I m doing both   When the user identifies an image to attach to a record  I use ImageMagick to resize it to an appropriate size for display on screen  about 300x300 for my application  and store that in the database for ease of access  but then also copy the user s original file to a network share so that it s available for applications that require higher resolution  like printing     There are a couple other factors involved as well  Navision will only display BMPs  so when I resize it I also convert to BMP for storage  and the database is replicated to remote sites where it s useful to be able to display the image   Printing is only done at the head office  so I don t need to replicate the original file

User · Answer

Something nobody has mentioned is that the DB guarantees atomic actions  transactional integrity and deals with concurrency  Even referentially integrity is out of the window with a filesystem - so how do you know your file names are really still correct   If you have your images in a file-system and someone is reading the file as you re writing a new version or even deleting the file - what happens   We use blobs because they re easier to manage  backup  replication  transfer  too  They work well for us

User · Answer

If this is web-based application then there could be advantages to storing the images on a third-party storage delivery network  such as Amazon s S3 or the Nirvanix platform

User · Answer

I m not sure how much of a  real world  example this is  but I currently have an application out there that stores details for a trading card game  including the images for the cards  Granted the record count for the database is only 2851 records to date  but given the fact that certain cards have are released multiple times and have alternate artwork  it was actually more efficient sizewise to scan the  primary square  of the artwork and then dynamically generate the border and miscellaneous effects for the card when requested   The original creator of this image library created a data access class that renders the image based on the request  and it does it quite fast for viewing and individual card   This also eases deployment updates when new cards are released  instead of zipping up an entire folder of images and sending those down the pipe and ensuring the proper folder structure is created  I simply update the database and have the user download it again  This currently sizes up to 56MB  which isn t great  but I m working on an incremental update feature for future releases  In addition  there is a  no images  version of the application that allows those over dial-up to get the application without the download delay   This solution has worked great to date since the application itself is targeted as a single instance on the desktop  There is a web site where all of this data is archived for online access  but I would in no way use the same solution for this  I agree the file access would be preferable because it would scale better to the frequency and volume of requests being made for the images   Hopefully this isn t too much babble  but I saw the topic and wanted to provide some my insights from a relatively successful small medium scale application

User · Answer

Assumption  Application is web enabled web based  I m surprised no one has really mentioned this     delegate it out to others who are specialists -  use a 3rd party image file hosting provider   Store your files on a paid online service like   Amazon S3 Moso Cloud Storage   Another StackOverflow threads talking about this here   This thread explains why you should use a 3rd party hosting provider    It s so worth it  They store it efficiently  No bandwith getting uploaded from your servers to client requests  etc

User · Answer

Attempting to mimic a file system using SQL is generally a bad plan  You ultimately write less code with equal or better results if you stick with the file system for external storage

User · Answer

The word on the street is that unless you are a database vendor trying to prove that your database can do it  like  let s say Microsoft boasting about Terraserver storing a bajillion images in SQL Server  it s not a very good idea  When the alternative - storing images on file servers and paths in the database is so much easier  why bother  Blob fields are kind of like the off-road capabilities of SUVs - most people don t use them  those who do usually get in trouble  and then there are those who do  but only for the fun of it

User · Answer

Images on a file store are the best bet  and supplement this with storing the meta data in a database   From a web server perspective  the fast way to serve stuff up is to point to it directly   If it s in the database - ala Sharepoint - you have the overhead of ADO Net to pull it out  stream it  etc     Documentum - while bloated and complicated - has it right in that the files are out on the share and available for you to determine how to store them - disk on the server  SAN  NAS  whatever   The Documentum strategy is to store the files a tree structure by encoding the folders and file names according to their primary key in the DB   The DB becomes the resource for knowing what files are what and for enforcing security   For high volume systems this type of approach is a good way to go   Also consider this when dealing with metadata   should you ever need to update the attributes of your meta data corpus  the DB is your friend as you can quickly perform the updates with SQL   With other tagging systems you do not have the easy data manipulation tools at hand

User · Answer

As with most issues  it s not as simple as it sounds  There are cases where it would make sense to store the images in the database     You are storing images that are changing dynamically  say invoices and you wanted to get an invoice as it was on 1 Jan 2007  The government wants you to maintain 6 years of history Images stored in the database do not require a different backup strategy  Images stored on filesystem do It is easier to control access to the images if they are in a database  Idle admins can access any folder on disk  It takes a really determined admin to go snooping in a database to extract the images   On the other hand there are problems associated   Require additional code to extract and stream the images  Latency may be     slower than direct file access     Heavier load on the database server

User · Answer

I would go with the file system approach  No need to create or maintain a DB with images  it will save you some major headaches in the long run

User · Answer

I will go for both solution  I mean   I will develop a litle component  EJB  that store the images in a DB plus the path of this image into the server  This DB only will be updated if we have a new image or the original image it s updated  Then I will also store the path in the business DB   From an application point of view  I will always user the file system  retrieving the path from th business DB  and by this way we will fix the backup issue  and also avoid possible performance issues    The only weakness is that we will store the same image 2 times   the good point is that the memory is cheap  come on

User · Answer

I d almost never store them in the DB  The best approach is usually to store your images in a path controlled by a central configuration variable and name the images according to the DB table and primary key  if possible   This gives you the following advantages    Move your images to another partition or server just by updating the global config  Find the record matching the image by searching on its primary key  Your images are accessable to processing tools like imagemagick  In web-apps your images can be handled by your webserver directly  saving processing   CMS tools and web languages like Coldfusion can handle uploading natively

User · Answer

If you re not on SQL Server 2008 and you have some solid reasons for putting specific image files in the database  then you could take the  both  approach and use the file system as a temporary cache and use the database as the master repository    For example  your business logic can check if an image file exists on disc before serving it up  retrieving from the database when necessary  This buys you the capability of multiple web servers and fewer sync issues

User · Answer

The trick here is to not become a zealot    One thing to note here is that no one in the pro file system camp has listed a particular file system  Does this mean that everything from FAT16 to ZFS handily beats every database   No    The truth is that many databases beat many files systems  even when we re only talking about raw speed   The correct course of action is to make the right decision for your precise scenario  and to do that  you ll need some numbers and some use case estimates

User · Answer

I prefer to store image paths in the DB and images on the filesystem  with rsync between servers to keep everything reasonably current    However  some of the content-management-system stuff I do needs the images in the CMS for several reasons- visibility control  so the asset is held back until the press release goes out   versioning  reformatting  some CMS s will dynamically resize for thumbnails  and ease of use for linking the images into the WYSIWYG pages   So the rule of thumb for me is to always stash application stuff on the filesystem  unless it s CMS driven

User · Answer

If you are on Teradata  then Teradata Developer Exchange has a detailed article on loading and retrieving lobs and blobs    http   developer teradata com applications articles large-objects-part-1-loading

User · Answer

I would personally store the large data outside of the database   Pros  Stores everything in one please  easy access to data files  easy baskup Cons  Decreases database performance  many page splits  possible database coruption

User · Answer

I once worked on an image processing application  We stored the uploaded images in a directory that was something like  images  today s date   id number   But we also extracted the metadata  exif data  from the images and stored that in the database  along with a timestamp and such

User · Answer

Second the recommendation on file paths  I ve worked on a couple of projects that needed to manage large-ish asset collections  and any attempts to store things directly in the DB resulted in pain and frustration long-term   The only real  pro  I can think of regarding storing them in the DB is the potential for easy of individual image assets  If there are no file paths to use  and all images are streamed straight out of the DB  there s no danger of a user finding files they shouldn t have access to   That seems like it would be better solved with an intermediary script pulling data from a web-inaccessible file store  though  So the DB storage isn t REALLY necessary

User · Answer

Pulling loads of binary data out of your DB over the wire is going to cause huge latency issues and won t scale well   Store paths in the DB and let your webserver take the load - it s what it was designed for

User · Answer

SQL Server 2008 offers a solution that has the best of both worlds   The filestream data type   Manage it like a regular table and have the performance of the file system

User · Answer

In my experience  sometimes the simplest solution is to name the images according to the primary key  So it s easy to find the image that belongs to a particular record  and vice versa  But at the same time you re not storing anything about the image in the database

User · Answer

Your web-server  I m assuming you are using one  is designed to handle images while a database is not  Thus I would vote heavily on the nay side   Store just the path  and maybe file info too  in the database

User · Answer

We have implemented a document imaging system that stores all it s images in SQL2005 blob fields   There are several hundred GB at the moment and we are seeing excellent response times and little or no performance degradation  In addition  fr regulatory compliance  we have a middleware layer that archives newly posted documents to an optical jukebox system which exposes them as a standard NTFS file system     We ve been very pleased with the results  particularly with respect to    Ease of Replication and Backup Ability to easily implement a document versioning system

User · Answer

One thing that I haven t seen anyone mention yet but is definitely worth noting is that there are issues associated with storing large amounts of images in most filesystems too   For example if you take the approach mentioned above and name each image file after the primary key  on most filesystems you will run into issues if you try to put all of the images in one big directory once you reach a very large number of images  e g  in the hundreds of thousands or millions    Once common solution to this is to hash them out into a balanced tree of subdirectories

User · Answer

I m in charge of some applications that manage many TB of images   We ve found that storing file paths in the database to be best   There are a couple of issues    database storage is usually more expensive than file system storage you can super-accelerate file system access with standard off the shelf products   for example  many web servers use the operating system s sendfile   system call to asynchronously send a file directly from the file system to the network interface  Images stored in a database don t benefit from this optimization   things like web servers  etc  need no special coding or processing to access images in the file system databases win out where transactional integrity between the image and metadata are important    it is more complex to manage integrity between db metadata and file system data it is difficult  within the context of a web application  to guarantee data has been flushed to disk on the filesystem

User · Answer

As someone mentioned already   it depends   If storage in a database is supposed to be a 1-to-1 fancy replacement for filesystem  it may not be quite a best option   However  if a database backend will provide additional values  not only a serialization and storage of a blob  then it may make a real sense   You may take a look at WKT Raster which is a project aiming at developing raster support in PostGIS which in turn serves as a geospatial extension for PostgreSQL database system  Idea behind the WKT Raster is not only to define a format for raster serialization and storage  using PostgreSQL system   but  what s much more important than storage  is to specify database-side efficient image processing accessible from SQL  Long story short  the idea is to move the operational weight from client to database backend  so it take places as close to storage itself as possible  The WKT Raster  as PostGIS  is dedicate to applications of specific domain  GIS   For more complete overview  check the website and presentation  PDF  of the system

User · Answer

For a large number of small images  the database might be better   I had an application with many small thumbnails  2Kb each   When I put them on the filesystem  they each consumed 8kb  due to the filesystem s blocksize  A 400  increase in space   See this post for more information on block size   What is the block size of the iphone filesystem

User · Answer

This might be a bit of a long shot  but if you re using  or planning on using  SQL Server 2008 I d recommend having a look at the new FileStream data type   FileStream solves most of the problems around storing the files in the DB    The Blobs are actually stored as files in a folder  The Blobs can be accessed using either a database connection or over the filesystem  Backups are integrated  Migration  just works     However SQL s  Transparent Data Encryption  does not encrypt FileStream objects  so if that is a consideration  you may be better off just storing them as varbinary   From the MSDN Article      Transact-SQL statements can insert  update  query  search  and back up FILESTREAM data  Win32 file system interfaces provide streaming access to the data    FILESTREAM uses the NT system cache for caching file data  This helps reduce any effect that FILESTREAM data might have on Database Engine performance  The SQL Server buffer pool is not used  therefore  this memory is available for query processing

User · Answer

Im my experience I had to manage both situations  images stored in database and images on the file system with path stored in db   The first solution  images in database  is somewhat  cleaner  as your data access layer will have to deal only with database objects  but this is good only when you have to deal with low numbers    Obviously database access performance when you deal with binary large objects is degrading  and the database dimensions will grow a lot  causing again performance loss    and normally database space is much more expensive than file system space   On the other hand having large binary objects stored in file system will cause you to have backup plans that have to consider both database and file system  and this can be an issue for some systems    Another reason to go for file system is when you have to share your images data  or sounds  video  whatever  with third party access  in this days I m developing a web app that uses images that have to be accessed from  outside  my web farm in such a way that a database access to retrieve binary data is simply impossible  So sometimes there are also design considerations that will drive you to a choice   Consider also  when making this choice  if you have to deal with permission and authentication when accessing binary objects  these requisites normally can be solved in an easier way when data are stored in db

[database] Storing Images in DB - Yea or Nay?

Examples related to database

Examples related to image

Examples related to theory

Examples related to storage

Examples related to blob