Scanning a conventional relational database of that size for the answers to a complex analytical query could take days. That's because these databases were designed primarily to process transactions, and only secondarily to process a limited variety of queries. The lingua franca of relational databases is a programming language called SQL, pronounced "sequel,'' for structured query language, and as the name implies, it is very effective if a query adheres to a certain structure.
But analytical queries are rarely defined in advance; indeed, the first question an analyst asks is rarely the right one to get the answer desired. Processing a string of ad hoc queries, without structure, forces a conventional database to perform multiple scans of all of its records, a time-consuming process. To optimize performance in such applications, data warehouses structure the data, rather than the query, with various indexing schemes, so that the system can respond rapidly to unforeseeable queries.
Data warehouses can be optimized for analytical tasks precisely because they are not called upon to process transactions, and need not maintain the absolute accuracy at any moment in time that O.L.T.P. systems must. Because the warehouse is separated from the O.L.T.P. system, complex queries do not slow transactions and security is less of a concern. Users ranging from clerical staff to programmers reach the level of access they each require with the help of a wide variety of querying tools.
"The reason you can do this safely is because a data warehouse is 'read-only,'" meaning the data can be accessed but not altered by the end user, said Aaron Zornes, a database specialist in Burlingame, Calif., with the Meta Group, a market research firm. "The end users can do whatever they want with the data and it won't screw up the production system because it's a separate database." Among Meta's clients, which include 800 of the 2,000 largest companies in the United States, 90 percent are developing data warehouses, Mr. Zornes said.
Warehouses also pull together data from multiple sources, overcoming the usual roadblock posed by a legacy of incompatible database systems. For companies that have a multitude of databases, each created with different software and running on different hardware, this is a major gain.
For HCIA Inc., a Baltimore-based healthcare information provider formed by the merger of four companies, the data warehouse was key to making the combined organization work. The component companies of HCIA had been collecting databases since the 1950's -- tracking individual patients at hundreds of hospitals, or recording the outcomes of specific prescription drugs or surgical procedures. The data were invaluable to insurers, health maintenance organizations and pharmaceutical manufacturers. But the databases were a heterogeneous mess, accessible only to highly trained individuals within the organization.
"Our goal was to make everything in our humongous databases, with millions and millions of records, operate as quickly and easily as our internal cost reports,'' said Jean Chenowyth, HCIA's senior vice president. By combining the records in one data warehouse of 2 terabytes, using Informix software, the company was able to make them available to all employees and, by dial-up service, to many customers as well.
"We were able to do more and more things for our customers and do it quicker, too,'' Ms. Chenowyth said. "It not only transformed our business, it pushed us ahead of our competitors.''
Sometimes the transformation of a process is so complete that it creates a new business. As analysts at Bear, Stearns & Company, the big financial services concern, Jonathan Raiff and Philip Millman found that performing the complex queries needed to evaluate a mortgage-backed security could take weeks of mainframe computer time. But with a Red Brick data warehouse running on an inexpensive Sun Microsystems server, the same process could be done in 10 minutes. Mr. Raiff and Mr. Millman left Bear, Stearns a couple of years ago to form the Mortgage Research Group Inc. in Jersey City, using this technology to serve investors.
"Everything we do is predicated upon the data,'' Mr. Raiff said. "It is not a decision support tool for us, it is the building blocks of the business. If someone needs to bid on a security, they need to scan several thousand loans against our database; getting back to them in two days isn't fast enough. It's not that the data warehouse makes it faster, it makes it feasible.''
Yet deploying a data warehouse is rarely quick or easy. Few installations take under six months, and many take two years or more. In many cases, the most difficult part is gathering all the data from multiple databases, and "cleansing'' or "scrubbing'' the information so that any conflicts in the recordkeeping are resolved and the data can coexist homogeneously. Although this process has been greatly simplified by data extraction tools, such as those produced by Prism Solutions Inc., it remains a common stumbling block.
"The hardest part is finding out where your data are; it's not like it's sitting on one mainframe,'' said Marianne Elkholy, Informix's director of data warehousing. "Data extraction can take 60 percent of your time.''
Once the data are gathered, access becomes the big issue. Again, this process has been simplified by the proliferation of user, rather than programmer-oriented, querying tools from companies like Business Objects, Cognos, Andyne and Brio. But matching the right tool to the right set of users is still critical. "Without understanding the business requirements, the data warehouse is not going to pay off,'' Ms. Elkholy said.
The warehouses also run the risk of overwhelming their target audiences with information, becoming part of the noise of modern corporate life. Sometimes, less can be more.
"Most companies really need not warehouses, but data deli's, data convenience stores,'' said Mr. Davenport, the University of Texas professor, who has studied the sociology of computer systems.
Actually, the term the industry prefers is "data mart,'' but there is a strong trend toward implementing multiple smaller data warehouses that can be focused very directly on key users. Organizations often build a series of data marts over time, treating them as test beds before larger deployment when they are linked via an enterprise-wide
The Huntington Bank Corporation fits into this trend. Rather than building a corporate data warehouse containing every possible piece of financial information, the $250 million Ohio-based bank set up a data mart for its general ledger system. Using Oracle's O.L.A.P. software, the data mart gets the ledger system's functional information to the bank's financial analysts and budget coordinators much more quickly than before. The bank is exploring the idea of setting up other data marts for similar focused tasks.
"Look at where there is a problem,'' said Mr. Lane of Oracle, when considering a data mart. Can you buy a prepackaged application that will handle that problem? If you can, he says, "you get an early win.'' But if you start broad-based, he added, "you can make it into an endless data-scrubbing exercise.''
Other experts disagree. The incremental approach, they say, risks leaving out the very data that might produce unexpected insights into your organization. This becomes particularly critical when deploying data mining, a new set of software tools that uses neural networks and other esoteric technologies to seek trends and patterns in the data that might evade human analysis.
"Make sure the data are there and accessible,'' said Ralph Kimball, the founder of Red Brick, who is now an independent data warehouse design consultant. "The technology has gotten good enough that we can take the base transactions and put them all into this form. At that point, you can ask any question of that data."
Another big area of contention concerns which type of database management program to use as the foundation for the warehouse. The choice is between a product designed specifically for this purpose or a standard relational database system.
Red Brick, which offers only a specialized program, is of course in the former camp. So is Sybase, alone among the major database companies; it offers Sybase IQ, a specialized product that runs alongside its standard offering, System 11.
"One size does not fit all,'' said Dennis McEvoy, Sybase's president of the enterprise business group. "Data warehousing has blessed the concept of a separate database for decision support. If you're going to do that, why compromise its performance?''
The other major database vendors, Oracle and Informix, as well as I.B.M., take the path of offering a number of specialized analysis tools that run atop their conventional relational database systems. "Most customers want to use the database they're familiar with,'' said Neil Mendelson, Oracle's director of data warehousing. "The object isn't to bring in the coolest technology; this is a business solution.''
Mr. Kimball, despite what would be his presumed allegiance to specialized products, has a more pragmatic suggestion: insist on a demonstration. "There are just a few issues: Is it simple for your users and your administrators to deal with? Does it run fast? What does it cost?'' he said. "The proof is in the pudding.''
Potentially more difficult than the technological decisions are the political and cultural issues raised by warehousing. Traditionally, the corporate database was off limits to all but a few professionals in the information science group. Although data warehouses make it safe for end users to perform their own searches, they still require service and support from I.S. The new system can be a heavy burden for these professionals.
That's if the data warehouse gets a lot of use. What if it doesn't? Again, in the traditional corporate structure, relatively few people other than statisticians and controllers actually worked with data. This is changing, since most executives at least have familiarity with spreadsheets. But many marketing and management decisions are still based more on relationships and intuition than on analytical processes.
"How do you change the culture of a company, to go from being reactive and gut-oriented to being analytical and proactive?'' asked Chris M. Grejtak, Red Brick's vice president for marketing. "How do you get to the customers? You own the data, you own the information. You know exactly what they're doing.''
The prize is worth pursuing. Mr. Schroer, of Booz-Allen, said data warehouses can be one of the most powerful tools a company employs, both for improving efficiencies between suppliers and retailers and for relationship marketing. He cites American Express's Platinum Club as a masterful use of a data warehouse to offer key customers goods and services and a sense of belonging that go far beyond holding a particular charge card.
"If you have the right information about your customer and you use it right, you can literally market your brand with these tools as effectively, or more so, than with radio, TV and other mass marketing,'' Mr. Schroer said. "Relationship marketing, using that database, can simulate what the Fuller brush man used to do with you, one-on-one.''
The down side, though, is the effort it will take to make the cultural shifts that a warehouse requires. "It's different, it's a lot of work, it's analytical and it requires numbers skills,'' Mr. Schroer said. "It takes some proof, some learning and, in some cases, new people to convince yourself that this fairly painful process is worth it.''
Most companies are somewhere in the middle of that learning curve, still finding the right new people. It will take time before they come to rely as completely on online analytical processing to shape strategy as they now rely on online transaction processing to run operations. In short, only a few have learned how to harness the power of warehouses to make their data truly strategic.
"Every single customer does some data warehousing today,'' said Robert Epstein, executive vice president and co-founder of Sybase. But most, he said, are using it to look backward, to analyze past performance. Managers are just now beginning to let warehousing come into its own, to help them drive business forward.
Of Data Warehouses And the Net
The simultaneous explosion of interest in the Internet and in data warehousing is almost entirely coincidental; either one could have existed credibly and productively without the other. But as it turns out, there is a large amount of serendipitous synergy in the coevolution of these two powerful technologies.
After decades of serving as a simple, but nuclear-war-proof communications conduit for scientists, the Internet has blossomed in the last few years with the development of the World Wide Web. With millions of potential customers "surfing'' the Web, many corporations have found this multimedia portion of the Internet a compelling place to advertise their goods, offer information about their services and even fill orders. Federal Express, for example, gives customers access to its internal package tracking system via the Web, saving untold dollars on telephone service calls.
Even when companies do not trust the Internet with their sensitive internal information, many are creating private "intranets,'' which use Web-based technology to deliver information within an organization or to key suppliers and customers. Indeed, Netscape Communications Inc., the leading Internet software producer, says more than half its revenues now come from applications that are not on the Internet. According to a report by Forrester Research, Visa expects to eliminate 2 million pieces of paper per day by letting its 19,000 member banks into its internal network.
Whether by Internet or intranet, when companies start letting customers and suppliers gain access to their corporate data, a data warehouse becomes a must. Clearly, letting outsiders, or even unmonitored employees, touch the operational database would be a recipe for disaster, inviting huge losses in performance if not outright theft of data. In addition, if untrained users are to get anything of value from a database, they need the ad hoc querying capabilities only a data warehouse can provide.
"If you think about publishing data on the Web, there's simply no way you're going to be able to control the users, to limit the kinds of queries they do,'' said Dennis McEvoy, president of the enterprise business group at Sybase, the big database software company.
Actually, until recently the technology has limited the types of queries possible, and limited the utility of Web-based data warehouses. Browsing tools like Netscape's Navigator were not designed for analytical purposes, and are really only adept at finding records that have been encoded in HTML, or hypertext markup language, the lingua franca of the Web. Most of the search engines on the Web are brute force tools, likely to fetch far too much, rather than to home in on a specific piece of data.
But these limitations are passing with the hyper-rapid development of the Internet, and as the data warehouse and analytical software producers find innovative ways to link their technology with the Web. The links are aided by the fact that both data warehouses and the Internet are based on a computer systems architecture called client/server, in which tasks are split between small desktop client computers and larger servers spread across a network.
The Arbor Software Corporation has produced a version of its Essbase program that operates within Netscape Navigator or other browsers to import records from a database to a simple, spreadsheet-like tool for online analytical processing. "The Web enables deployment to wide users, independent of client hardware,'' said Dan Druker, manager of product marketing. "There is zero marginal cost to add another user to the system, and an easy learning curve.''
Sybase has demonstrated a version of its Sybase IQ data warehouse operating over the Web. "We think ultimately all data warehouses will wind up on intranets,'' said Robert Epstein, Sybase's executive vice president. "Graphics viewers, spreadsheet viewers, text, will all work as plug-ins to browsers.''
Further advances in Internet technology will make it possible for users to adopt these plug-ins on an as-needed basis. Using the Java programming language from Sun Microsystems Inc., companies could send the appropriate analytical tool across the Internet or intranet; it would then operate in the user's client machine. While such processes are still in the demonstration stage, the rapid pace of the Internet insures they will reach deployment rapidly.
Web-based tools "will provide information to a larger set of people in the decision-making process,'' said Neil Mendelson, Oracle's director of data warehousing. "The real future of the Web is to enable everyone to get access.''
Reprint No. 96308