Petabyte
It wasn’t that long ago that data storage devices were measured in gigabytes and terabyte hard drives seemed exotic. Now that terabytes are commonplace, the next unit of data measurement you’re likely to hear about is the petabyte.
In data storage, terms have been defined out to the yottabyte (10^24 bytes), or the yobibyte (2^80 bytes) if you prefer binary to decimal. With those terms defined, data storage capacities are now working through them one by one. Even PCs have gone from the kilobyte through the megabyte and now feature gigabytes of RAM and terabytes of storage.
Enterprise storage systems are starting to leave the terabyte behind, moving into petabytes and toward the exabyte stage. A petabyte (PB) is 10^15 bytes of data: 1,000 terabytes (TB) or 1,000,000 gigabytes (GB).
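The decimal relationships above are plain powers of ten. As a minimal sketch (the constant and function names are illustrative and not taken from any library), the conversions can be checked in Python:

```python
# Decimal (SI) byte units as powers of ten; the names are illustrative only.
GB = 10**9
TB = 10**12
PB = 10**15

def petabytes_to(unit_size: int, petabytes: float = 1) -> float:
    """Express a number of petabytes in a smaller decimal unit."""
    return petabytes * PB / unit_size

print(petabytes_to(TB))  # 1000.0
print(petabytes_to(GB))  # 1000000.0
```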
Petabytes and Pebibytes
A petabyte is sometimes defined as 2^50 bytes. This is technically incorrect. There are two different systems of measuring bytes: the base-10 SI system and the base-2 IEC system. While 2^50 bytes is close to 10^15 bytes, it is larger and has a different name: pebibyte (PiB). Computing is based on the binary system rather than decimal, which is why the binary units exist. At small scales, the SI and IEC terms are, in practical terms, interchangeable. 10^3 equals 1,000 and 2^10 equals 1,024, so there is only slightly more than a 2% difference between one kilobyte (1,000 bytes) and one kibibyte (1,024 bytes). And as storage systems are not normally run to within 2% of their capacity, there is no operational difference between the two.
There is greater divergence, however, as we get into higher orders of magnitude. When you get up to the scale of one petabyte, it does make a difference, with a pebibyte (2^50 bytes) being 12.6% larger than a petabyte. Here is a full rundown of the two systems:
| SI decimal prefix | Value (bytes) | IEC binary prefix | Value (bytes) | Percentage difference IEC/SI |
|---|---|---|---|---|
| kilobyte (kB) | 10^3 | kibibyte (KiB) | 2^10 | 2.4% |
| megabyte (MB) | 10^6 | mebibyte (MiB) | 2^20 | 4.9% |
| gigabyte (GB) | 10^9 | gibibyte (GiB) | 2^30 | 7.4% |
| terabyte (TB) | 10^12 | tebibyte (TiB) | 2^40 | 10.0% |
| petabyte (PB) | 10^15 | pebibyte (PiB) | 2^50 | 12.6% |
| exabyte (EB) | 10^18 | exbibyte (EiB) | 2^60 | 15.3% |
| zettabyte (ZB) | 10^21 | zebibyte (ZiB) | 2^70 | 18.1% |
| yottabyte (YB) | 10^24 | yobibyte (YiB) | 2^80 | 20.9% |
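The percentage column can be reproduced directly from the two definitions: each step up the scale multiplies the SI unit by 10^3 and the IEC unit by 2^10. The following Python sketch recomputes the differences; the list and function names are illustrative assumptions, not part of any standard library.

```python
# Prefix pairs in ascending order: (SI name, IEC name).
PREFIXES = [
    ("kilobyte", "kibibyte"),
    ("megabyte", "mebibyte"),
    ("gigabyte", "gibibyte"),
    ("terabyte", "tebibyte"),
    ("petabyte", "pebibyte"),
    ("exabyte", "exbibyte"),
    ("zettabyte", "zebibyte"),
    ("yottabyte", "yobibyte"),
]

def percent_difference(step: int) -> float:
    """Percent by which the IEC unit exceeds the SI unit (step 1 = kilo/kibi)."""
    si = 10 ** (3 * step)    # e.g. 10^15 for step 5
    iec = 2 ** (10 * step)   # e.g. 2^50 for step 5
    return (iec / si - 1) * 100

for step, (si_name, iec_name) in enumerate(PREFIXES, start=1):
    print(f"{iec_name} vs {si_name}: {percent_difference(step):.1f}%")
```

Because the ratio 1.024 is compounded at every step, the gap grows from 2.4% at the kilobyte level to 20.9% at the yottabyte level.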
When specifying the capacity of a large-scale system, therefore, it is important not to conflate the two different systems of counting data storage space. Also, be sure to determine exactly how a particular vendor is counting that space. In enterprise systems, raw disk space is not the sole determining factor. How the drives are partitioned, the sector size, the level of RAID (Redundant Array of Independent Disks) or RAIN (Redundant Array of Independent Nodes), any compression or deduplication technologies used, and other factors must be considered and compared to the end use case to see how efficiently that storage capacity lines up with the usage scenarios.
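As a rough illustration of how raw and usable capacity can diverge, the Python sketch below works through one hypothetical configuration. It assumes drives labeled in decimal terabytes and RAID 6 groups, where two drives' worth of capacity per group is consumed by parity. The drive counts and sizes are invented for the example, and the sketch ignores partitioning, sector size, compression, and deduplication.

```python
TB = 10**12   # decimal terabyte (assumed unit on the drive label)
TiB = 2**40   # binary tebibyte

def raid6_usable_bytes(drives_per_group: int, groups: int, drive_bytes: int) -> int:
    """Usable bytes across RAID 6 groups: two drives per group hold parity."""
    if drives_per_group < 4:
        raise ValueError("RAID 6 needs at least 4 drives per group")
    return (drives_per_group - 2) * groups * drive_bytes

# Hypothetical system: 10 groups of 12 drives, each labeled 2 TB.
raw = 12 * 10 * 2 * TB
usable = raid6_usable_bytes(12, 10, 2 * TB)

print(f"raw:    {raw / TB:.0f} TB")                                # raw:    240 TB
print(f"usable: {usable / TB:.0f} TB = {usable / TiB:.1f} TiB")   # usable: 200 TB = 181.9 TiB
```

In this made-up case, 240 TB of raw disk becomes 200 TB after parity, and a tool that reports in binary units would show that same space as roughly 181.9 TiB, which is why it matters to ask which counting system a capacity figure uses.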
Petabyte storage vendors
Petabytes per se are not sold, but there are vendors who sell massive storage systems, including:
Teradata Inc. The Teradata Database 13.10 is a data warehousing database designed to scale from fewer than 10GB up to multiple PB of data. At the top end, the database can have 186 PB of data and 4096 nodes. The company also has data appliances, private cloud services, MapReduce appliances for big data analytics, and other products and services.
IBM Scale Out Network Attached Storage (SONAS) supports up to 256 file systems, with billions of files and up to 21PB of storage in a single file system.
EMC Isilon Network Attached Storage can store up to 15PB of data and a trillion files in a single namespace.
Hewlett-Packard IBRIX X9000 Storage: Modular architecture that grows up to 16PB in a single namespace.
Hitachi NAS Platform (HNAS) works with Hitachi’s SAN storage. Each file system can have up to 256 TB and 16 million objects, with up to 16PB total usable capacity.
Panasas ActiveStor: rackmount appliances containing 20 2TB or 3TB SATA drives. Can scale up to 6.6PB and 4 billion objects per file system.
Petabyte technology
Apache Hadoop is a project developing open-source software that is commonly used for handling Big Data (http://hadoop.apache.org/).
The Petascale Data Storage Institute hosts an annual Parallel Data Storage Workshop (http://www.pdsi-scidac.org/), held in conjunction with the IEEE’s fall Supercomputing workshop.
Next steps
Looking Behind the Big Data Storage Buzz
How Facebook Is Handling All That Really Big Data
IBM Storage Buying Guide
Panasas Storage Buyer’s Guide
In-Memory Analytics Buyer’s Guide: Oracle Big Data/Exalytics Appliances vs. SAP HANA
Big Data Buyer’s Guide, Part Two: IBM, SAS, Pentaho and More