Would You Believe It?

The following article was printed in Computer Shopper, June 1992 issue (page 152). Commentary follows.

The Big Squeeze

Compression Scheme Shatters Storage Logjam

Todd Daniel believes he has found a way to revolutionize data storage as we know it.

DataFiles/16, a zip-style data-compression product released by Wider Electronic Bandwidth Technologies (WEB), allows users to compress data at a 16-to-1 ratio. That’s a major advance when you compare it with Stacker 2.0’s 2-to-1 ratio.

DataFiles/16 relies purely on mathematical algorithms; it works with almost any binary file. Because of the math involved, the ratio of compression is directly proportional to the size of the uncompressed file. In order to get the full effect of 16-to-1 compression, the original file must be at least 64K.

During a demonstration at our offices, Daniel, the company’s vice president of research and development, compressed a 2.5Mb file down to about 500 bytes using four levels of DataFiles/16. Because successive levels compress at a lower ratio as the volume of the file decreases, DataFiles/16 directly zips and unzips files to the chosen level.

After compressing a file, users can compress the data another eight times with DataFiles/16. This gives DataFiles/16 the potential to compress most files to under 1,024 bytes, whether the original file is 64K or 2.6 gigabytes. By comparison, SuperStor 2.0’s new compression technique can be performed only once.

By June, WEB plans to release its first hardware packages utilizing the same method. The two new device-driver cards will operate impeccably, compressing and decompressing data on the fly at ratios of 8-to-1 and 16-to-1, respectively.

A standard defragmentation program will optimize data arrangement, while an optional disk cache will speed access time. Both cards will come in DOS, Macintosh, and Unix versions. The DOS version is scheduled for a July release, and the company says the others will follow shortly.

The implications of WEB’s data-compression technique in the communications field have yet to be calculated, but Daniel says a 16-to-1 ratio could save certain companies up to 5 percent of their storage costs. If DataFiles/16 lives up to its early promise, data compression will have taken a quantum leap forward. – Jim O’Brien

Oh Really?

So much for Computer Shopper. Why have you (most likely) never heard of DataFiles/16? Because it was a scam, of course. And since it wasn’t published in the April issue, it was presumably not a hoax by Computer Shopper itself but rather by the company behind it.

The article perhaps highlights the terrible fate of journalists: writing about things they don’t understand. A computer scientist, or really anyone with a passing familiarity with information theory, would immediately recognize the claims as impossible and preposterous, if enticing. The only question about the article isn’t whether the whole thing was a scam, but how many people were in on it.
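
To make the counting argument concrete: there are vastly more possible large files than possible small files, so no lossless scheme can shrink them all. A minimal sketch in Python (purely illustrative; the helper names are mine and nothing here is specific to DataFiles/16):

    # Pigeonhole argument: a lossless compressor cannot shrink every input,
    # because there are more n-byte files than there are shorter files.

    def count_inputs(n_bytes):
        return 256 ** n_bytes                         # distinct files of exactly n bytes

    def count_shorter_outputs(n_bytes):
        return sum(256 ** k for k in range(n_bytes))  # all files shorter than n bytes

    n = 2
    print(count_inputs(n), count_shorter_outputs(n))  # 65536 vs. 257
    # With 65,536 possible 2-byte inputs and only 257 possible shorter outputs,
    # at least two inputs must share one compressed form, so at least one of
    # them cannot be decompressed correctly. At 64K the imbalance is astronomical.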

DataFiles/16, like other similar scams, was most likely an attempt to defraud investors rather than scam end users. Such compression could never work, so the question is only whether the software failed to achieve anything like the claimed compression ratios, or if it did… and could never decompress to anything resembling the original files.

These days it may be more difficult to set up compression scams, but the hucksters certainly didn’t disappear; they just moved elsewhere.


65 Responses to Would You Believe It?

  1. zeurkous says:

    “and could never decompress to anything resembling the original files.”

    Unless the original was, say, filled with zeroes. Perhaps that’s the
    “demonstration” they pulled?

  2. Michal Necasek says:

    That’s entirely possible. The link references a BYTE article excerpt where they tested the software and found that it compressed files but couldn’t decompress them (well, not without destroying the contents).
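
    For what it’s worth, a claim like that is trivial to check with a round-trip test. A minimal sketch (Python, with the standard zlib module standing in for whatever compressor is under test; not what BYTE actually ran):

        import os, zlib

        def round_trip(data, compress, decompress):
            """Return (lossless, ratio) for the given compressor pair."""
            packed = compress(data)
            return decompress(packed) == data, len(data) / max(len(packed), 1)

        sample = os.urandom(64 * 1024)   # 64K of random data, the size the article quotes
        print(round_trip(sample, zlib.compress, zlib.decompress))
        # Prints (True, ~1.0): the data survives the round trip, but random data
        # does not compress. A claimed 16-to-1 ratio on arbitrary data must fail
        # one of the two checks.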

  3. rasz_pl says:

    >These days it may be more difficult to set up compression scams

    Nah, scams just migrated to video compression.
    Too many links to paste my whole slashdot post, so just gonna link to it directly: https://slashdot.org/comments.pl?sid=10717819&cid=54582687

    Best modern one I could find is V-Nova Perseus – promise 3x smaller files than h.264, but somewhat independent tests show 20% bigger files at same quality 🙂 and the real kicker – Perseus is really just reencapsulated h.264 video with resize filter on top 😀 multi million dollar scam, they even scored one Sat TV network contract.

  4. Michal Necasek says:

    I stand corrected (but not surprised).

    People want to believe, and extraordinary claims rarely get extraordinary scrutiny.

  5. zeurkous says:

    Mepersonally tends to interpret statements like ‘I can compress this
    infinitely’ in a not unsimilar manner to ‘I just saw the face of Jesus in
    a pancake’…

  6. zeurkous says:

    Or perhaps more like, ‘I can make pancakes with Jesus’ face on them!’.

    YMMV.

  7. Yuhong Bao says:

    https://trixter.oldskool.org/2007/08/19/the-truth-about-netopsystems/
    This feature was introduced with Acrobat Reader 6.0, when modems were still common. The fun thing is that we now have 3G connections with low data caps too.

  8. Paranoid Survivor says:

    This is a magazine that, IIRC, also had articles on how to implement Huffman encoders, and LZ or LZW, somewhere in that same era. (Maybe it was after this occurred, as a penance?) I don’t remember seeing this article at the time, but I would give the perpetrators a star if their intent was to fool the magazine (rather than investors).

    I have read that one trick-of-the-trade back then was to delete the original file, but be able to undelete it for decompression. And another trick was just to hide the source file, and then reveal it for decompression. It would be interesting to see if Computer Shopper ever ran a retraction or correction on this particular instance.

  9. MiaM says:

    Pied Piper! 😀

    As a sidenote, the file formats generally in use in many fields at that time were really crap. Draw a simple painting, save it as BMP, and be amazed at how efficiently even grand old ZIP can compress your file. At that time, IIRC, GIF was the only format that had any good compression built in. As I understood it, people sent (by then) gigantic amounts of uncompressed data on removable media for printing newspapers, magazines, etc.

  10. Richard Wells says:

    It didn’t make much sense for BMP files to have advanced compression; the system has to work with the file in its uncompressed state. It would have been a waste of memory to keep both the small copy and the big working copy, and rather slow on the systems of the day. If it took 5 minutes to decompress the splash screen on a 286, no one would buy that software.

    RLE compression was very common in early monochrome paint formats. Lots of compression for very little CPU. Not quite as effective a technique for color pictures.
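
    For illustration, here is a minimal run-length encoder/decoder of the kind those early formats relied on (a Python sketch of the general technique, not any particular file format):

        # Run-length encoding: store (count, value) pairs instead of repeated bytes.
        def rle_encode(data, max_run=255):
            out, i = bytearray(), 0
            while i < len(data):
                run = 1
                while i + run < len(data) and data[i + run] == data[i] and run < max_run:
                    run += 1
                out += bytes([run, data[i]])
                i += run
            return bytes(out)

        def rle_decode(packed):
            out = bytearray()
            for i in range(0, len(packed), 2):
                out += bytes([packed[i + 1]]) * packed[i]
            return bytes(out)

        row = b"\x00" * 70 + b"\xff" * 10      # a mostly-blank monochrome scanline
        assert rle_decode(rle_encode(row)) == row
        print(len(row), len(rle_encode(row)))  # 80 -> 4 bytes; color data fares far worse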

    Plenty of trade-offs in compression design. One of the best in terms of size for video files that could run on 486s was Vivoactive, but it lacked an index and key frames, so one was stuck watching from beginning to end. When it got bought out by Real and turned into RM files, the files became about 25% larger. Even then, Real did a better job than a lot of competitors in implementing variations on the early video standards. Of course, Real killed all the benefits of an optimized video format by drowning the Player in ineptly coded advertising.

    Some of these scams had interesting concepts that were useful in special cases but were marketed to investors for general purpose applications. The one that promised incredibly small video files thanks to use of a gigantic dictionary* was probably unworkable even if every system included the necessary gigabyte plus dictionary. Might have been useful for a CD media presentation focusing on the works of Andy Warhol and that was about it.

    * One of the more common legitimate tricks. Have the compression result in multiple files or hide large amounts of information in file system metadata areas. The total size of all the stored compressed data is the same or larger than with mainstream compression, but the file on disk is a lot smaller.

  11. Michal Necasek says:

    Yes, I thought of Pied Piper too. But that’s fiction/satire; it’s not meant to be believed 🙂

    The reality is that, as Richard Wells pointed out, compression has its costs. And disk space was never that tight; people who needed lots of storage bought bigger/more disks. Where compression really became ubiquitous was BBSes and communications generally, because it very obviously saved both time and money. It often made the difference between practical/affordable and impractical/unaffordable.

    It hasn’t really changed much. Photography or audio professionals still work with uncompressed data, and just buy bigger disks. But for distribution, files get compressed.

  12. zeurkous says:

    @Miam:
    Me’s not that familiar w/ fairy tales, but it’s a plausible
    analogy.

    Don’t forget that not compressing something can be a conscious choice,
    for simplicity or generality, as opposed to something dictated by
    unavailable processing power.

  13. zeurkous says:

    @Necasek: even so, there’s still SneakerNet 🙂

    Even today, sometimes, sending something by post is the right solution.

  14. Michal Necasek says:

    The reference was to this Pied Piper. See also here.

  15. zeurkous says:

    Alright, me’ll bite. Have things degraded so much in
    {pee-cee,dumbPDA,…} land that that is considered funny?

  16. John H says:

    Wow, I remember that exact article. I read it as a kid hoping it wasn’t false, as it was what my BBS needed :).

    Now I’m curious if lossless compression limits have a mathematical proof…
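
    (They do; the two standard results are short to state. A sketch in LaTeX notation, kept informal rather than quoting any particular textbook:)

        % Pigeonhole: no injective code can map every n-bit file to a shorter one,
        % because there are strictly fewer shorter strings:
        \[ \bigl|\{0,1\}^{<n}\bigr| = 2^{n} - 1 < 2^{n} = \bigl|\{0,1\}^{n}\bigr| \]
        % Shannon's source coding theorem: for a source X, any uniquely decodable
        % code has expected length at least the entropy (in bits):
        \[ \mathbb{E}[\ell(X)] \ge H(X) \]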

  17. MiaM says:

    @Michal Necasek

    Oh, I thought that at least photography professionals used lossless compression formats by now? (AFAIK, for example, Canon’s raw CR2 files have some kind of lossless compression. They sure have different sizes for different pictures of the same resolution.)

    By today’s standards audio consumes so little disk space that it’s probably worth having uncompressed data to speed up jumping to a specific time point in a music project or similar. But for storage FLAC seems to be the right choice; I don’t know if that is actually used by the professionals or not.

    (And today there is the big question of “who is a professional” in the fields of photography, audio, etc. Would one play on a broadcast radio station make a musician count as a professional? 🙂 )

  18. zeurkous says:

    @MiaM:
    If you’ll forgive me for riding one of me hobby horses, mehas come to
    firmly believe that it makes more sense to treat audio samples
    (and pixels) as ‘text’ characters, space and time permitting. At least
    that way there’d be consistent handling of text, audio, and video, w/o
    the need for funny container formats.

    But then again, me’s a bit of a UNIX weenie

  19. zeurkous says:

    Hm, yeah, what’s a professional? ‘I get paid for this’?

    Or, me personal definition, ‘I take this work very seriously’?

  20. Michal Necasek says:

    Professional as in “I know what I’m doing”.

  21. Michal Necasek says:

    That doesn’t work. Processing audio or graphics in text form is not possible; it needs to be converted into binary data. So why waste the time and space on a redundant conversion?

    And unless you can hear audio or see RGB images when you read ASCII text… what’s the point of storing it that way?

  22. zeurkous says:

    You’re forgetting that text is also binary data. Medoesn’t propose we
    convert the binary data to text — instead meproposes to make a/v data
    valid text by, say, applying a prefix. That would prolly require a
    variable-length code, a la UTF-8 (though mehas something simpler in
    mind), but that’s not rocket science anymore.

    Me’s been trying to ‘modernize’ UNIX for some time now. This is one
    of me proposals (think terminal handling, too).

  23. zeurkous says:

    “I know what I’m doing.”

    That’s a rather philosophical matter. How many times in life have we
    been forced to challenge our prior knowledge?

    Perhaps being capable of such challenge is one of the prerequisites of
    being a ‘professional’?

  24. zeurkous says:

    That brings me to the thought that perhaps, like security,
    professionalism is a journey, not a destination…

  25. Michal Necasek says:

    Yes, RAW files tend to have some lossless compression, but it’s almost irrelevant. The compression doesn’t buy you that much, maybe 30-40%, meaning that you don’t save even half of the storage space. A good thing for sure, but it’s not in the same ballpark as JPEG, not even close.

    For audio storage, FLAC is definitely used. For work files, not as much. Same problem as with video, the lossless compression is definitely worth it for storage and transfer, but it doesn’t do that much. Again, not the same league as MP3 or similar.

    I wouldn’t say audio consumes little disk space. Depends what you’re doing, but 1 hour of 24-bit, 192 kHz stereo audio still takes a bit of disk space.
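
    (For a rough number, assuming plain PCM at 3 bytes per sample per channel, the uncompressed rate works out to:)

        \[ 192{,}000 \times 3 \times 2 = 1{,}152{,}000\ \text{bytes/s}, \qquad 1{,}152{,}000 \times 3600\ \text{s} \approx 4.15\ \text{GB per hour} \]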

  26. zeurkous says:

    Depends on what you call ‘little’. To us old farts (me’s been around for
    a while, too, by now) a 1G disk can seem huge, even if it’s tiny by
    today’s general standards.

    The question is, as always, would it be too much of a bother to store
    audio uncompressed? Today, in most cases, me’d have to say ‘no’ for a
    general-purpose machine.

    Then there’s filesystem or even block-level compression…

  27. Richard Wells says:

    An hour of 24-bit, 192 kHz stereo audio would require about 4GB. Minimal built-in format lossless compression or OS compression could drop that in half. Lossy MP3 at roughly the same apparent quality would be closer to 300 MB, much better for internet transfers but saving a whole incredible 4 cents of disk space.

    In terms of truly optimistic compression ratios, all the all-flash backup appliances I get make their predicted storage levels based on a 10-to-1 dedupe. Only with fresh installs on a thousand workstations can that be achieved.

  28. zeurkous says:

    Granted, we’re not entirely there yet, and uncompressed storage of
    video (as opposed to still images) is quite far off still, but the
    day will come.

    And then we’ll rejoice that we’re no longer dependent on all that
    black magic that takes up increasingly insane amounts of processor
    time, and core. Kind of like when DVD playback became possible on
    pee-cees w/o a hardware MPEG2 decoder.

    Me’s not one for hype, but the future invariably arrives sooner
    than we think.

  29. zeurkous says:

    Oh, and as for ‘appliances’, that reminds me of tape drive
    manufacturers’ claims. ‘x can store y!’ Yeah, right, how
    about y/2, or even y/3?

  30. zeurkous says:

    Or the ongoing hdd capacity scam.

  31. random lurker says:

    If you actually think that the fact that HDD manufacturers use decimal prefixes is a scam, then you are quite deluded. Get out there in the real world and see if it is actually a problem.

    (Well, it sort of is, but only because Microsoft refuses to switch to decimal prefixes 😉)

    ((To be blunt, I think this “binary prefixes are more correct because” meme is just that – an almost cargo cult-ish belief by people who think that they are privy to some awesome secret that lets them feel superior over the common masses. And I feel sorry for anybody who cannot critically and rationally process their beliefs and change their thinking if necessary.))

  32. Michal Necasek says:

    Can you buy a 4.295GB stick of RAM yet?
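
    (The number is just 4 GiB expressed with a decimal prefix:)

        \[ 4 \times 2^{30}\ \text{bytes} = 4{,}294{,}967{,}296\ \text{bytes} \approx 4.295 \times 10^{9}\ \text{bytes} = 4.295\ \text{GB} \]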

  33. random lurker says:

    No, but you can buy 4 GiB just fine. If you’re lazy, you can even drop the ‘i’, because everybody knows what we’re talking about.

    I maintain that there is no need to pretend a confusion exists over HDD sizes, network speeds, etc. being presented in decimal.

  34. vbdasc says:

    @rasz_pl
    Are you sure about the V-Nova Perseus being a scam, found to encapsulate x264 etc.? I was unable to find any sources supporting that (except your Slashdot post).

  35. vbdasc says:

    Oops, I meant H.264

  36. zeurkous says:

    @random lurker: It’s a scam ’cause they, *with intent to deceive*, fail
    to observe long-standing convention.

    As for the GiB/GB stuff, me’d argue that it’s a matter of taste,
    though it wasn’t very smart of SI to attempt to redefine the
    entire computing world like that.

    But then again, like just about all committees, SI ain’t smart 🙂

  37. zeurkous says:

    As for link speeds, well, me’d argue that’s a bit of a border case as
    the telco world is deeply involved.

  38. random lurker says:

    In the telco world (be it bits or bytes per second or Hz of frequency) it is absolutely, 100% of the time, decimal prefixes. There is no room for argument. The fact that you imply otherwise makes me quite sad about your condition.

    Please, do take a good look at yourself and consider the possibility that your perception is clouded by a misguided belief. The world is not out to get you, and there are enough actual scams going on that you do not need to go looking for them in places where they do not exist.

  39. Paranoid Survivor says:

    Random Lurker is correct. Storage has NEVER been sold in base 2, going back decades, excepting a couple models of Quantum hard drives in the mid/late 1980s. Anybody who wants to disprove this statement is welcome to do so, but do provide PDFs or other historical documentation.

    One of the earliest instances of the base-2 KB for storage is the Mac Finder from 1984.

  40. zeurkous says:

    Sorry? Where didmesay exactly that medidn’t accept the use of
    decimal prefixes for link speeds?

    It’s a ‘border case’ only ’cause both the computing and telco worlds are
    involved, worlds which have historically been markedly different (and, to
    a great extent, still are).

    Whether me’s ‘sad’ or not is of course open to interpretation, but
    mehopes you’ll forgive me if mewon’t let you be the sole judge of
    that 😉

    Oh, and you wouldn’t know the crap me’s gone through, hell, medoubts you
    could even imagine it. But that doesn’t mean that the world is out
    to get me, no. It’s just people like you who are out to get me, people
    who have this need to tell other people how to behave, how to act, how
    to think. (In deference to Necasek me’ll refrain from posting a nasty
    flame here.)

  41. zeurkous says:

    @Paranoid Survivor: really? Then why does merecall SCSI drives often
    being labeled with the correct capacity, hm? Perhaps ’cause companies,
    as opposed to individuals, might be more inclined to sue them for being
    cheated?

    Unless the labels did not differ wherever you lived at the time, but
    that would just show how opportunistic the makers are (or at least
    were).

  42. zeurkous says:

    Oh, you’re talking about the consumer market only. Sorry, didn’t
    realize, but that’s indeed a very diff (twisted and warped, me’d
    argue, but feh) reality.

  43. Richard Wells says:

    Even in business, early drive sizes were defined with a decimal base.
    DEC RP04 disk pack: 20.48 million words.
    DEC RX-01 8″ disk drive: 256k actual capacity 256,256 bytes.
    DEC RX-02 8″ disk drive: 512k actual capacity 509,184 bytes (double density except the first track was single density)
    Purchasers were expected to read the specifications and not care about the short number marketing used.

    Binary-defined capacities became more common during the 80s because they understated capacity to make up for potential bad sectors. My ST-4096 was officially rated at 80 MB, stored 84MB or about 88 million bytes. Quantum stole a march on the industry and switched back to decimal notation to make their drives seem a better value. Flash drive makers just round the capacity to the nearest whole number; I think every 4GB drive I have has a different capacity.

  44. zeurkous says:

    RW to the rescue with some hard data!

    You’re right, of course, that it was never so clear-cut. Nevertheless,
    I still don’t agree with our random lurker that it’s all M$’ fault for
    ‘misreporting’ the capacity.

    There’s more than just M$ out there. And then there are people…

  45. Even back in the day, STAC did license their stuff, and then provided various ‘demo’ programs to see their API in action. Google “LZS221.ZIP” for one of these, which is an MS-DOS version of v2.21-86, so you can see how ‘awesome’ and fast it is.

    There was a time when STAC had their cards for PCs, networking equipment, and tape drives. Probably a whole lot of other things too. But CPUs got faster, disks got much cheaper, and tapes are so damned slow that the processor can run far more intense compression along with encryption. Much like SSL offload, we’ve been trained to be scared of compression, although it is coming back in the database space, where so many DBs are padded with fluff, and if you get a 1.5:1 ratio you’ve just greatly sped up disk access by pretty much doing nothing.

    Stacker compression is buried as MTF_LZS221 for tapes, LZS-DCP for PPP, and even in PGP as IPCP-LZS.

  46. zeurkous says:

    So many db’s contain nothing but fluff!

    But yeah, makes me wonder just how (not) diff M$’ clone was…

  47. Paranoid Survivor says:

    OK, the ST-4096 has been cited.

    I am reading ftp://ftp.seagate.com/techsuppt/mfm/st4096.txt which says that drive had 156,672 sectors, 512 bytes each. That would work out to 76.5MiB before FAT formatting, while the same spec sheet gives a formatted capacity of 80.2MB (with 17 sectors per track), which is clearly base 10, and an unformatted capacity of 96.0MB. So that would not be evidence of binary marketing.
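
    (Worked out explicitly, since the MB/MiB distinction is the whole point here:)

        \[ 156{,}672 \times 512 = 80{,}216{,}064\ \text{bytes}, \qquad \frac{80{,}216{,}064}{2^{20}} \approx 76.5\ \text{MiB}, \qquad \frac{80{,}216{,}064}{10^{6}} \approx 80.2\ \text{MB} \]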

    I also want to point out, to anybody who doesn’t know, that ‘formatting’ in this ancient context doesn’t refer to HFS or FAT. It refers to low-level formatting the media to define the tracks and sectors per track, and the interleave of the sectors. Drives now handle this internally. Computers were so slow that sectors would be stored on the drive out of order, so that the next sector wouldn’t have already spun past the head before the controller was ready to receive it. Example: 0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11.
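
    A tiny sketch reproduces that ordering (Python; it assumes a simple place-then-skip rule that bumps forward on collision, which is one way controllers laid out an interleave, not necessarily how any specific controller did it):

        def interleave_layout(sectors, factor):
            """Physical slot -> logical sector for a given interleave factor."""
            physical = [None] * sectors
            slot = 0
            for logical in range(sectors):
                while physical[slot] is not None:    # skip slots already taken
                    slot = (slot + 1) % sectors
                physical[slot] = logical
                slot = (slot + factor) % sectors
            return physical

        print(interleave_layout(12, 3))
        # [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11] -- the 3:1 interleave above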

    @zeurkous: Provide a link to a product sheet or manual if you think you saw storage sold in base-2. If you look, you’ll find you are misremembering. Although I don’t subscribe to the theory, this may be an instance of the ‘Mandela Effect,’ where many people remember something that was never true. It is just very interesting that everybody believes a switch happened, but can never show when, how, and by whom. The fault lies with any and every programmer who divided by 1024 instead of 1000 when showing directory listings in units other than bytes. Apple was doing this with the first Mac in 1984; Microsoft was not in Windows 1.01 in 1985. http://www.pcjs.org/disks/pcx86/windows/1.01/cga/

  48. zeurkous says:

    Actually, merecalls it fairly clearly. It was Seagate, 1990s and early
    2000s. SCSI drives only. Mewas living in .nl at the time.

    Memight attempt to find documentation, later.

    And medoesn’t consider /1024 a fault. Instead me’d argue it’s a matter
    of culture. And hasn’t UNIX (me’s a UNIX-weenie) always done /1024,
    anyway? Me’ll look that up, too, if and when mehas time. The source is
    available.

    Either way, most hdds have been formatted into 512-byte blocks (and more
    recently 4096-byte ones, mebelieves). Core, on the i386 et al, is mapped
    in 4096-byte pages. AFAIK m68k uses 2048, SPARC uses 8192. Treating 1K
    as 1024 is not at all unnatural in the world of computing.

    Which leaves me to suggest that, if one wants a system that both humans
    and other machines can handle well, one should adopt ternary.

  49. Richard Wells says:

    It isn’t exact but SCSI drives tended to have higher capacities than the marketing number showed. The Seagate ST32272W was sold as a 4 GB drive but had a decimal formatted capacity of 4.55 GB*, which is slightly higher than the 4294967296 bytes the binary value 4 GB translates to. Conversely, the IDE Maxtor Diamond Max 85120A was sold as a 2.6 GB drive but its total formatted capacity was 2559836160 bytes. I think both drives were from 1997; other drives from other manufacturers in different years may have defined marketing capacity differently.

    * Documentation gives a total sector count of 87A25Bh which should mean 4551128576 bytes. I do not know if that includes the roughly 60 MB of spares.

  50. Paranoid Survivor says:

    On the Barracuda 9 SCSI, I notice in the manual that it mentions they can be commanded to support a sector size from something like 140 to 4096. That flexibility may factor in to how they ended up at a particular capacity. Spares have never been counted as ‘formatted’ capacity, and in unformatted capacity, it is of course up to the operator what they want to do about bad sectors. (You may remember that drives could arrive with a printout of bad locations, which were then supposed to be provided to the file system to mark as bad blocks. FAT has such a concept, but there wasn’t a way to inform it AFAIK.) Drive-managed spares are per-track, I believe, not a pool for the entire drive. Of course, it is up to the drive firmware, and they don’t usually tell.
