So just how new and innovative is in-memory BI?
The surprising answer is that in-memory BI came before disk-based BI,
probably because in-memory programs are easier to write. Programmers would
rather not have to write code to constantly move active data back and forth
between disk and RAM.
The first multidimensional tool was APL, whose origins were in the
1960s, based on a book called A Programming Language published in 1962.
The first usable implementation of APL was in 1967, on the IBM 1130 computer.
It is ironic that IBM, which pioneered in-memory multidimensional BI in the
1960s, had to buy TM1 (through its Cognos acquisition) and Cognos Planning
(whose origins before Adaytum were in an IBM APL-based product called Frango,
developed in the early 1980s) to re-enter this market segment 40 years later.
Analyst, the oldest component of Cognos Planning, is still written in APL, and
even the newer components use an APL-like language.
Most other early modeling (which we would now call BI) tools also used
in-memory architectures. Indeed, I built very complex oil tax models in the
late 1970s using a now almost-forgotten in-memory financial modeling product
called FCS. Back then, only 320 KB RAM was available for both the FCS software
and the 30+ year tax models on the timesharing mainframe (which served numerous
concurrent users with much less CPU power and RAM than a modern cell phone). Of
course, we would now have to call it Software as a Service (SaaS) in-memory
performance management (PM) - except that I doubt that any of the modern BI
tools could handle the complexity of the tax rules I had to model more than 30
years ago. And don't let anyone tell you that application development is faster
in modern tools: I could respond to the frequent changes of the tax laws within
a day or two.
So why did in-memory BI go away for so long?
It didn't. By far the most widely used tool for BI applications today is
Microsoft Excel, which has always had an in-memory architecture. So did Lotus
1-2-3, the product it defeated in the market in the 1990s, and VisiCalc before
that. And Lotus Improv was a short-lived, in-memory multidimensional
spreadsheet that would clearly be described as a BI tool if it were still on
sale today.
Of course, spreadsheets are not marketed primarily as BI tools, but
conventional BI products like Cognos PowerPlay (first released in 1990) also
started out as in-memory tools. This is a short extract of the PowerPlay review
from the original edition of The OLAP Report from 1995:
"PowerPlay is usually installed as a stand-alone PC product, with a
memory resident database loaded from pre-prepared files. It is this somewhat
simpler architectural option that makes PowerPlay so easy to deploy in large
numbers; this same architecture also limits its capacity."
Exactly the same could be said of any of the modern in-memory BI tools,
though of course that capacity limit is far higher today than in the 1990s.
Later, PowerPlay, like most other long-lived BI products, moved to a disk-based
architecture in order to handle more data. This wasn't mainly because it hit
addressable RAM limits, but simply because, by modern standards, RAM was so
expensive in the 1990s that the trade-off of speed vs capacity favored
disk-based solutions for even medium-sized applications.
However, TM1 (first released in the mid-1980s) has always been, and
remains, a pure in-memory OLAP engine, as are the several similar products
inspired by it (Alea, now Infor PM OLAP; PowerOLAP; proCube; and Palo). In
fact, TM1 is not only the longest-established in-memory BI product currently
available, but also the longest-surviving BI product of any type - so in-memory
architectures do seem to have a sustained advantage.
Will in-memory solutions kill off disk-based BI?
Products don't have to stick rigidly with one architecture or the other.
There is nothing to stop designers of disk-based products from also including
an in-memory option, or simply using large caches to optimize disk performance
(just as disk controllers do). Indeed, CPUs also include small, high-speed
memory caches to minimize wasted cycles. However, products designed and
optimized for pure in-memory architectures will outperform disk-based products
that simply take advantage of available disk caches, because the disk-based
products will still shunt data around unnecessarily and maintain redundant
indexes.
For example, back in the 1990s, Holos had both in-memory and disk-based
structures that could be freely mixed in a single application, and
MicroStrategy now has similar capabilities. Microsoft's new PowerPivot cubes
offer similar capabilities when promoted to Analysis Services.
In any case, disk-based products automatically take advantage of
RAM-based disk caches, while in-memory products automatically take advantage of
(disk-based) virtual memory if they run low on real RAM. Large, low-cost flash
memory - which offers non-volatile storage just like disk drives, but is much
faster - further confuses the picture. So, with today's sophisticated
computers, there is little real distinction between the two architectures, and
both will continue to co-exist, often in the same product and even in the same
application.
In other words, the venerable 40-year-old in-memory BI architecture was
not defeated by the new generation of disk-based BI architectures that came
along a few years later, but nor did it triumph. One approach is not 'better'
than the other; both are useful tools in the software designer's armory. But
the growing availability of masses of low-cost RAM will swing the pendulum
towards in-memory BI.
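The earlier point about cached disk-based designs still shunting data around can be illustrated with a toy sketch. It is not taken from any real BI product: the class names, the page size and the use of pickle as a stand-in for an on-disk format are all assumptions made for illustration. The cached store has to map each row to a page, decode that page on a cache miss and evict older pages, while the in-memory store simply indexes a list.

```python
import pickle
from collections import OrderedDict

PAGE_SIZE = 4  # rows per page; hypothetical and deliberately tiny


class CachedDiskStore:
    """Toy disk-style store: rows are kept as serialized pages,
    with a small LRU cache of decoded pages in front of them."""

    def __init__(self, rows, cache_pages=2):
        # Serialized pages stand in for blocks on disk (an assumption).
        self.pages = [
            pickle.dumps(rows[i:i + PAGE_SIZE])
            for i in range(0, len(rows), PAGE_SIZE)
        ]
        self.cache = OrderedDict()  # page_id -> decoded rows, in LRU order
        self.cache_pages = cache_pages
        self.decodes = 0  # counts how often a page had to be decoded

    def get(self, row_id):
        page_id, offset = divmod(row_id, PAGE_SIZE)
        if page_id in self.cache:
            self.cache.move_to_end(page_id)  # mark as most recently used
        else:
            self.cache[page_id] = pickle.loads(self.pages[page_id])
            self.decodes += 1
            if len(self.cache) > self.cache_pages:
                self.cache.popitem(last=False)  # evict least recently used
        return self.cache[page_id][offset]


class InMemoryStore:
    """Toy in-memory store: rows stay decoded, with no paging layer."""

    def __init__(self, rows):
        self.rows = list(rows)

    def get(self, row_id):
        return self.rows[row_id]
```

Both stores return the same values, but an access pattern that touches more pages than the cache holds makes the cached store decode the same page repeatedly, which is the kind of unnecessary data movement described above.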
The next breakthrough?
In-memory BI eliminates the traditional disk input/output bottleneck, so
to gain yet more speed, attention must switch to the next bottleneck: overloaded
CPUs. Processor clock speed acceleration has ground to a halt, because of
overheating - modern CPUs have more cores than ever, but the individual cores
are not much faster than their predecessors.
So overall CPU throughput continues to improve, but individual tasks are
no longer speeding up very much. At the very least, in-memory BI tools need to
be designed to take maximum advantage of multi-core CPUs - 12 or more cores per
CPU will soon be available, and high-end BI applications should automatically
exploit them all concurrently, so that even individual tasks should be
multi-threaded.
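As a rough illustration of what multi-threading a single task means, the sketch below splits one aggregation over a fact table into chunks, totals each chunk in a separate worker and then merges the partial results. The function names and the (region, amount) row layout are hypothetical. Note also that standard CPython threads do not run pure Python code on several cores at once, so this shows only the partition, aggregate and merge pattern; a real engine would use native threads or separate processes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def aggregate_chunk(rows):
    """Total the amounts per region for one slice of the fact table."""
    totals = Counter()
    for region, amount in rows:
        totals[region] += amount
    return totals


def parallel_totals(rows, workers=4):
    """Split one aggregation into chunks, total each chunk in a worker,
    then merge the partial totals into a single result."""
    if not rows:
        return {}
    chunk = -(-len(rows) // workers)  # ceiling division
    chunks = [rows[i:i + chunk] for i in range(0, len(rows), chunk)]
    merged = Counter()
    # CPython threads illustrate the pattern only; they do not give
    # true multi-core speed-up for pure Python code.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(aggregate_chunk, chunks):
            merged.update(partial)  # adds the partial totals together
    return dict(merged)
```

The merge step works here because sums can be combined in any order; calculations that depend on row order or on other cells' results are harder to split this way.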
But most desktop computers and workstations, as well as some servers, also
include at least one other, much faster processor: the graphics processing unit
(GPU). GPU speeds tens or even
hundreds of times faster than conventional CPUs have been reported for some
scientific computations, so could exploitation of GPU accelerators be the next
BI performance optimizer? Research projects in this area have been underway for
some time, and commercial products are imminent. This could lead to some very
dramatic acceleration of calculation-intensive BI applications, possibly
allowing OLAP engines to be used for entirely new classes of business problems,
such as large econometric models.
From bi-verdict.com

