Why More Tech Giants Opt for IBM’s Tape Data Storage in the Cloud
On the 20th anniversary of the open LTO tape standard, more companies are turning to the reinvented digital tape-in-the-cloud technology.
Remember the days of the Walkman, with your favorite tape stuck inside? The era of audio cassettes may be long gone, but quietly, tape technology lives on in our digital world.
And alongside tomorrow’s quantum computers and today’s artificial intelligence, IBM continues reinventing tape, supplying a growing number of tech giants, Microsoft among them, with good ol’ cartridges. Microsoft now has tape libraries deployed in at least 18 data centers around the world.
Soon, companies may start adding really futuristic tape tech; IBM, for one, is now developing the first-ever quantum computing-safe tape drive.
Once quantum computers surpass traditional computers, possibly within the next decade, they will likely be able to break the asymmetric (public/private key) encryption that is widely used today. The new drive, though, relies on quantum-safe algorithms from the CRYSTALS (Cryptographic Suite for Algebraic Lattices) suite, so your data should stay safe.
But that’s still to come.
Today, the reinvention of tape is all about blurring, more than ever, the border between the physical cassette and the abstract digital space. The way tape is made is still similar to what it was nearly a century ago, when German-Austrian engineer Fritz Pfleumer invented magnetic tape to record sound in 1928. One major difference is that back then tape was analogue. The first digital tape drive came out in 1951, built for the UNIVAC computer. IBM followed, releasing its own tape drive in 1952.
What we have today is still that same digital tape tech — but on steroids. It’s much faster, offers more storage space than ever before — and most of all, the 21st century tape has gone into the cloud.
For decades, tape had been the best option for backup and recovery of data, but the market started shrinking in the early 2000s because of stiff competition from HDD technology combined with data deduplication techniques. Hard disk drives aren’t new, although they are not as old as tape: IBM made the first HDD, called the IBM Model 350 Disk File, in 1956. It was huge, with fifty 24-inch disks inside a cabinet the size of a cupboard, and could store 5MB of data.
Just like the tape technology, HDD tech has kept evolving. It reached the storage capacity of 1 terabyte (TB, or 1,000GB) in 2007, then hit 16TB in 2019 for the largest commercially available HDDs — all while shrinking to just a few inches in size. A 20TB HDD is likely to be launched in 2020/21; Western Digital, one of the leading HDD manufacturers, demoed its 20TB datacenter hard drive in June 2019.
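As a rough back-of-the-envelope check, the two capacity figures above imply a compound annual growth rate. The sketch below assumes smooth exponential growth between 2007 and 2019, which is a simplification, and the function name is purely illustrative.

```python
def annual_growth_rate(start_tb: float, end_tb: float, years: int) -> float:
    """Compound annual growth rate implied by two capacity data points."""
    return (end_tb / start_tb) ** (1 / years) - 1


# Figures quoted in the article: 1 TB in 2007, 16 TB in 2019.
rate = annual_growth_rate(1, 16, 2019 - 2007)
print(f"Implied HDD capacity growth: {rate:.0%} per year")
```

Under that assumption, the numbers work out to roughly 26 percent a year, or a doubling of capacity about every three years.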
Tape to the Rescue
The world has kept producing more and more data, but for years it was possible to keep storing it on a constant storage budget as drive capacities kept growing. That can’t continue indefinitely: there’s a fundamental physical limit to how many bits can be squeezed onto a hard disk.
“Data is growing even faster than in the past, because of the new kinds of AI and analytics, of all the applications that are driven by data,” says IBM physicist Mark Lantz. I meet him at a small lab at IBM Research in Zurich. The air is permeated by the low-frequency hum of turning reels. From top to bottom, the room is filled with multiple generations of tape drives, with hundreds of cassettes either inside or neatly stacked on shelves.
The bits written on tape are about 100 times larger than those on an HDD, which leaves room to keep scaling tape at the same rate for at least two more decades. Thanks to the backwards compatibility of Linear Tape-Open (LTO), which is celebrating its 20th anniversary this year, and of enterprise tape drives, it’s easy to migrate old tapes to new ones.
The latest digital IBM tapes have a capacity of 20TB, on a par with the largest hard disk drives. The highest areal density of magnetic tape demonstrated to date is about 201 billion bits per square inch, so a tape cartridge the size of a typical hard drive could store about 330 terabytes of information. “We’re at the renaissance of tape technology,” says Lantz.
One key advantage of tape is that for large amounts of data, tape systems are much more cost-effective to purchase and operate. They are about eight times cheaper, as one tape drive can be used with many cartridges. Operational costs are low because a tape system consumes much less power, and when tapes aren’t in use, they require no power at all.
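The 330-terabyte figure can be sanity-checked with a little arithmetic. The sketch below is only an estimate under stated assumptions: decimal terabytes, half-inch-wide tape (the width used by LTO cartridges), and no allowance for formatting overhead such as error correction or servo tracks, so a real cartridge would need more tape than this. The function and constant names are illustrative.

```python
BITS_PER_BYTE = 8
INCH_IN_METERS = 0.0254


def tape_length_m(capacity_tb: float,
                  areal_density_gbit_per_in2: float,
                  tape_width_in: float = 0.5) -> float:
    """Length of tape needed to hold capacity_tb at the given areal density.

    Assumes the full tape width stores data (no overhead), which is optimistic.
    """
    bits = capacity_tb * 1e12 * BITS_PER_BYTE
    area_in2 = bits / (areal_density_gbit_per_in2 * 1e9)
    length_in = area_in2 / tape_width_in
    return length_in * INCH_IN_METERS


# Figures quoted in the article: 330 TB at about 201 billion bits per square inch.
length = tape_length_m(330, 201)
print(f"Tape needed: about {length:.0f} m")
```

With these inputs the estimate comes to roughly 670 metres of tape, which is a length that can plausibly be wound into a single cartridge.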
Tape can also play a vital role in a company’s modern data protection plan. This ‘vintage’ technology is a very effective tool against cyberthreats. Unlike with other storage media, it’s possible to easily pull a tape cartridge offline and simply store it on a shelf, creating a physical barrier or “air gap” between hackers and your data. The air gap is a security measure critical to preventing more sophisticated ransomware and malware that could otherwise corrupt the data.
The first tech giant to start using digital tape tech was Google. While the company kept its tape storage under wraps at first, the news got out when, during a software update for Gmail, about 1 percent of all Gmail accounts were deleted unintentionally. Google has a redundant system of data centers around the world with multiple copies of data, but the software is replicated across all of the data centers, so the same error happened everywhere. The data was lost in all the data centers, but because Google also had a copy on tape, it was possible to rebuild the lost accounts.
But is it really the same type of tech as the cassette many of us had in our Walkman in the 1980s and 90s? Fundamentally, the technology is not very different. It all comes down to an electromagnet: a piece of magnetic material with a coil wrapped around it. It generates a magnetic field to write data onto the surface of the tape, and a sensor then reads it back again.
Over time, this technology has been shrinking, with the bits of information written on the magnetic surface becoming smaller and smaller. At the same time, the data rate has been getting faster: today, a tape drive transfers data at 400 megabytes per second. That is much faster than early tape, and modern tape systems are also highly automated. Tape drives are kept in tape libraries where robots are in charge: a robotic system takes a cartridge and loads it into the drive. Automation increases reliability, with no more sloppy humans dropping reels while mounting them. Tape is still there, but people have been taken out of the loop.
Soaring to the Cloud, and Beyond
And then there’s the impact of the cloud. In the past, most of the data in the cloud was on hard disk. But hard disks are expensive and power-hungry, as they are constantly spinning. As they spin, they generate heat and need to be cooled, driving power costs up. A tape cartridge, by contrast, just sits in a slot and doesn’t consume any power, meaning the operating expenses (OPEX) and cost of ownership are much lower. Because of these advantages, hyperscale cloud providers are now starting to introduce tape into their infrastructure as a low-cost storage solution, says Robert Haas, the head of the Cloud and AI Systems Research department at IBM Research in Zurich.
There is a trade-off, though: with an HDD, one can access data in a few tens of milliseconds. In a tape system, the robot has to fetch the cartridge, load it into a drive and fast-forward it to the right place, and that whole process takes tens of seconds at least.
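To see how this trade-off plays out, retrieval time can be modeled as a fixed delay before the first byte plus the time to stream the data. The sketch below is illustrative only. The 400 MB/s tape rate is the figure quoted earlier in the article; the 30-second and 30-millisecond delays are assumed values picked from the "tens of seconds" and "tens of milliseconds" ranges, and the 250 MB/s HDD streaming rate is an assumption that does not come from the article.

```python
def retrieval_time_s(size_bytes: float,
                     first_byte_latency_s: float,
                     throughput_bytes_per_s: float) -> float:
    """Simple model: fixed access delay plus sequential transfer time."""
    return first_byte_latency_s + size_bytes / throughput_bytes_per_s


TAPE_LATENCY_S = 30.0   # assumed: robot fetch, load and seek ("tens of seconds")
HDD_LATENCY_S = 0.03    # assumed: "a few tens of milliseconds"
TAPE_RATE = 400e6       # 400 MB/s, as quoted in the article
HDD_RATE = 250e6        # assumed HDD streaming rate, not from the article

for label, size in [("1 MB", 1e6), ("100 GB", 100e9)]:
    tape = retrieval_time_s(size, TAPE_LATENCY_S, TAPE_RATE)
    hdd = retrieval_time_s(size, HDD_LATENCY_S, HDD_RATE)
    print(f"{label}: tape {tape:.1f} s, HDD {hdd:.1f} s")
```

Under these assumptions, a small 1 MB read is dominated by the tape's mount-and-seek delay, while for a 100 GB sequential read the tape's higher streaming rate more than makes up for it (about 280 seconds versus about 400). That is one reason the delay matters little for bulk backup and archive workloads.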
For many applications, particularly backup and archival needs, that is not an issue. Data for such applications is ‘cold’ data, meaning it hasn’t been accessed in months and just sits in the cloud or in data centers. “On social media, when somebody first posts something, it’s beautiful. It’s known as hot data. But a week later, nobody looks at it and it turns into cold data. As data ages, it becomes less frequently accessed,” says Lantz.
And in the future, this access latency (the time it takes to get to the cold data) may be resolved anyway with the help of artificial intelligence. At IBM, researchers are working on combining tape and AI to predict accesses to data. This way, data would be brought back from tape even before a user wants to read that touching social media post from last month or watch a year-old cute puppy video.
Today, social media giants have huge data centers with hard disk drives used for backup of primary data. To make this more cost-effective, the power is turned off for 94 percent of the disks at any given time, and to access a piece of data, it’s necessary to wait until the disks where it is stored are powered up. As a result, it can take several hours before a piece of data can be accessed, while with tape, it only takes tens of seconds. “Because of that, there is a lot of interest in extending the adoption of tape for backups and more generally for all archival data, creating an active archive,” says Lantz. “There’s a large demand for that in the cloud from hyperscalers.”
So if you do come across an old Michael Jackson tape from 1980, don’t chuck it in the bin. Who knows what technology we will have in a decade or two that may suddenly be able to play it.