Big data has recently become one of the most widely touted technology initiatives, as improved analytics technologies are coupled with new sources of digital data and expanded enterprise storage capacities. As more data becomes relevant to analytics programs, however, companies need to consider how their archiving strategies will be affected.
Archival data can be a valuable source of information for analytics, but keeping information readily accessible on disk drives can complicate tiered storage strategies. To adapt archiving plans for a modern analytics environment, organizations may want to work with a managed IT services provider experienced in creating advanced storage architectures.
Part of the challenge with big data storage is that traditional archiving practices are not necessarily suited to the realities of data use in an analytics age, IT analyst George Crump noted in a recent InformationWeek column. Typically, information is archived according to age, but this approach does not reflect how data use may vary depending on type and intended purpose. Some data may be able to go into archives right away, while some very old data may retain ongoing importance for analytics.
“We need better metrics to help us decide what data should be on primary storage and what should be on archive storage,” Crump wrote. “A key criteria is going to be what data, if it needs to be accessed, will need to be delivered instantly – in other words, something that may need to be analyzed in the future. This data should probably not go to an archive no matter how old it gets since it could have a statistical probability of value.”
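To make the contrast concrete, here is a minimal Python sketch comparing an age-based placement rule with a use-based one along the lines Crump describes. It is an illustration only: the `Dataset` fields, the `reuse_probability` estimate, the thresholds, and the example records are all hypothetical assumptions, not metrics taken from the column or from any storage product.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a data set awaiting a placement decision."""
    name: str
    age_days: int
    needs_instant_access: bool  # if queried, must results come back immediately?
    reuse_probability: float    # assumed estimate (0.0 to 1.0) of future analysis


def place_by_age(ds: Dataset, max_age_days: int = 365) -> str:
    """Traditional rule: anything older than the cutoff goes to the archive."""
    return "archive" if ds.age_days > max_age_days else "primary"


def place_by_use(ds: Dataset, reuse_threshold: float = 0.2) -> str:
    """Use-based rule: keep data on primary storage when it would need
    instant delivery and has a meaningful chance of being analyzed again,
    regardless of how old it is. The threshold is an arbitrary assumption."""
    if ds.needs_instant_access and ds.reuse_probability >= reuse_threshold:
        return "primary"
    return "archive"


# Illustrative records (invented for this sketch).
old_sensor_data = Dataset("sensor-history", age_days=2000,
                          needs_instant_access=True, reuse_probability=0.4)
new_audit_log = Dataset("audit-log", age_days=10,
                        needs_instant_access=False, reuse_probability=0.01)

for ds in (old_sensor_data, new_audit_log):
    print(ds.name, "| by age:", place_by_age(ds), "| by use:", place_by_use(ds))
```

Under these assumptions the two rules disagree on both records: the old but analytically useful data stays on primary storage, while the new but rarely queried data can be archived right away.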
Making the most of the archive
As organizations consider how they might draw their archives into their big data programs, one of the most important factors to consider is query time. Archive format needs can be determined by evaluating what types of data users would want to access immediately and what types they could afford to wait a few minutes for, Crump noted. Fortunately, there are plenty of queryable archiving tools available to an organization with the right planning process, InfoWorld’s James Kobielus wrote in a recent column.
“Archives may in fact be the first database in your organization that achieves big data status, in terms of growing to petabytes and storing heterogeneous information from a wide variety of sources,” he explained. “The fact that the archive’s purpose is to persist historical data for as-needed retrieval and analysis means it needs to be optimized for fast query, search, and reporting.”
Retrieval time planning should also take into account the tools an organization might use to run queries on its archive, Kobielus added. Companies may even choose to create multiple archives, each built around the platform they plan to use to analyze the data it holds.
One option for enterprises to consider is EMC Isilon. A core challenge businesses face when first embarking on a big data analytics initiative is the increasing variety of the information being stored and analyzed. As unstructured data becomes more prevalent in analytics, organizations may want to leverage Isilon technology to ensure that their enterprise storage solution can effectively store and handle a growing volume of all types of data.
As they develop a storage plan to turn their archives into a big data resource, organizations may want to work with a managed IT services provider such as FlexITy. With proven expertise in designing storage management solutions tailored to modern data growth challenges and analytics environments, FlexITy’s solution architects can help implement a data archiving strategy that turns a business’s information into an asset rather than a cost centre.