Artificial intelligence (AI) relies on vast amounts of data.

Enterprises that take on AI projects, especially for large language models (LLMs) and generative AI (GenAI), need to capture large volumes of data for model training, as well as to store outputs from AI-enabled systems.

That data, however, is unlikely to be in a single system or location. Customers will draw on multiple data sources, including structured data in databases and, often, unstructured data. Some of these information sources will be on-premise and others in the cloud.

To deal with AI's hunger for data, system architects need to look at storage across storage area networks (SAN), network-attached storage (NAS) and, potentially, object storage.

In this article, we look at the pros and cons of block, file and object storage for AI projects, and the challenges of finding the right blend for organisations.

AI's data mountain

The current generation of AI projects is rarely, if ever, characterised by a single source of data. Instead, general AI models draw on a wide range of data, much of it unstructured. This includes documents, images, audio, video and computer code, to name a few.

Everything about generative AI is about understanding relationships. You have the source data still in your unstructured data, either file or object, and your vectorised data sitting on block

Patrick Smith, Pure Storage

When it comes to training LLMs, the more data sources the better. But, at the same time, enterprises link LLMs to their own data sources, either directly or through retrieval augmented generation (RAG), which improves the relevance of results. That data might be documents, but can also include enterprise applications that hold data in a relational database.
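The retrieval step at the heart of RAG can be sketched in a few lines: source content is stored as embedding vectors, and a query's embedding is compared against them to find the most relevant material to feed the model. The sketch below is illustrative only; the toy three-dimensional vectors and document names are invented for demonstration, and in practice the embeddings would come from a model and live in a vector database.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "vector store": in a real deployment these embeddings are produced
# by a model and typically sit in a vector database on block storage.
documents = {
    "invoice policy": [0.9, 0.1, 0.0],
    "holiday policy": [0.1, 0.9, 0.2],
    "security policy": [0.0, 0.2, 0.9],
}

def retrieve(query_embedding, top_k=1):
    """Return the top_k document names most similar to the query."""
    scored = sorted(
        documents.items(),
        key=lambda item: cosine_similarity(query_embedding, item[1]),
        reverse=True,
    )
    return [name for name, _ in scored[:top_k]]

# A query whose embedding lands closest to "invoice policy"
print(retrieve([0.8, 0.2, 0.1]))  # -> ['invoice policy']
```

The retrieved documents are then passed to the LLM as context alongside the user's question, which is what ties an enterprise's own data sources to a general-purpose model.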

“A lot of AI is driven by unstructured data, so applications point at files, images, video, audio – all unstructured data,” says Patrick Smith, field technology officer for EMEA at storage supplier Pure Storage. “But people also look at their production datasets and want to tie them to their generative AI projects.”

This, he adds, includes adding vectorisation to databases, which is commonly supported by the main relational database suppliers, such as Oracle.

NAS and SAN

For system architects who support AI projects, this raises the question of where best to store data. The simplest option would be to leave data sources as they are, but this is not always possible.

This could be because data needs further processing, the AI application needs to be isolated from production systems, or current storage systems lack the throughput the AI application demands.

In addition, vectorisation usually leads to large increases in data volumes – a 10 times increase is not untypical – and this puts more demands on production storage.
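A back-of-envelope calculation shows where that growth comes from: source text is split into chunks, and each chunk becomes a dense vector of floating-point numbers. The chunk size, 1,536-dimension embeddings and float32 precision below are illustrative assumptions, not fixed requirements, but with small chunks the vectors alone can exceed the source data by an order of magnitude.

```python
# Back-of-envelope estimate of how vectorisation inflates storage.
# Chunk size, embedding width and precision are illustrative assumptions.

SOURCE_BYTES = 1_000_000_000          # 1 GB of raw text
CHUNK_BYTES = 500                     # ~500 bytes of text per chunk
EMBEDDING_DIMS = 1_536                # a common embedding width
BYTES_PER_FLOAT = 4                   # float32

num_chunks = SOURCE_BYTES // CHUNK_BYTES
embedding_bytes = num_chunks * EMBEDDING_DIMS * BYTES_PER_FLOAT

print(f"chunks: {num_chunks:,}")
print(f"embedding store: {embedding_bytes / 1e9:.1f} GB")
print(f"growth factor vs source: {embedding_bytes / SOURCE_BYTES:.1f}x")
```

On these assumptions, 1 GB of text becomes roughly 12 GB of embeddings before indexes and metadata are counted, which is broadly consistent with the 10x figure above.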

This means that storage needs to be flexible and able to scale, and that an AI project's data-handling requirements differ at each stage. Training demands large volumes of raw data; inference – running the model in production – might not require as much data, but needs higher throughput and minimal latency.

Enterprises tend to keep the bulk of their unstructured data on file-access NAS storage. NAS has the advantages of being relatively low cost, and easier to manage and scale than alternatives such as direct-attached storage (DAS) or block-access SAN storage.

Structured data is more likely to be on block storage. Usually this will be on a SAN, although direct-attached storage might be sufficient for smaller AI projects.

Here, achieving the best performance – in terms of IOPS and throughput from the storage array – offsets the greater complexity of the SAN. Enterprise production systems, such as enterprise resource planning (ERP) and customer relationship management (CRM), will use SAN or DAS to store their data. So, in practice, data for AI is likely to be drawn from both SAN and NAS environments.

“AI data can be stored either in NAS or SAN. It’s all about the way the AI tools want or need to access the data,” says Bruce Kornfeld, chief product officer at StorMagic. “You can store AI data on a SAN, but AI tools won’t typically read the blocks.”

It is not necessarily the case that one protocol will be better than the other. It depends very much on the nature of the data sources and on the output of the AI system.

For a primarily document- or image-based AI system, NAS might be fast enough. For an application such as autonomous driving or surveillance, systems might use a SAN or even high-speed local storage.

Again, data architects will also need to weigh the costs against the performance benefits, especially in training.

Enter object storage

This has led some organisations to look at object storage as a way of unifying data sources for AI. Object storage is increasingly in use with enterprises, and not just in the cloud – on-premise object stores are gaining market share too.

Object has some advantages for AI, not least its flat structure and global namespace, (relatively) low management overheads, ease of expansion and low cost.

Performance, however, has not been a strength for object storage. This has tended to make it more suited to tasks such as archiving than to applications that demand low latency and high levels of data throughput.

Suppliers are working to close the performance gap, however. Pure Storage and NetApp sell storage systems that can handle file and object and, in some cases, block too. These include Pure’s FlashBlade, and hardware that runs NetApp’s ONTAP storage operating system. These technologies give storage managers the flexibility to use the best data formats, without creating silos tied to specific hardware.

Others, such as Hammerspace with its Hyperscale NAS, aim to squeeze additional performance out of equipment that runs the Network File System (NFS). This, they argue, prevents bottlenecks where storage fails to keep up with data-hungry graphics processing units (GPUs).

Ticking all the boxes

But until better-performing object storage systems become more widely available, or more enterprises move to universal storage platforms, AI is likely to use NAS, SAN, object and even DAS in combination.

That said, the balance between the elements is likely to change over the lifetime of an AI project, and as AI tools and their applications evolve.

At Pure, Smith has seen requests for new hardware for unstructured data, while block and vector database requirements are being met for most customers on existing hardware.

“Everything about generative AI is about understanding relationships,” he says. “You have the source data still in your unstructured data, either file or object, and your vectorised data sitting on block.”
