The data warehouse is now coming into its own as a management system driving major changes in business practices. What sets a data warehouse apart from other databases is that the information it contains is not used for operational purposes, but for analytical tasks - everything from identifying new market segments to corporate brainstorming. Its adoption has been spurred by the use of standardized hardware and software, putting the average cost below $3 million and, from its earliest applications in the consumer packaged goods and retail industries, it is now spreading into telecommunications, banking and other rapidly-changing business sectors.
As author Lawrence Fisher explains, companies are finding myriad uses for the data warehouse. Some use it to build relationships with their most important customers, by aggregating information about individual and group buying patterns. Some use it to rationalize inventory and supply, to the extent of driving production cycles at their key suppliers. Still others have discovered that the timely access to complex data can be a new business in itself.
However, running a data warehouse is neither a simple nor a predictable proposition, Mr. Fisher points out, and deploying it in the first place is rarely quick or easy. Few installations take less than six months, and many take two years or more. Another problematical area is over the type of database management program to use - a product designed specifically for the purpose or a standard relational database system. And potentially more difficult than the technological decisions are the political and cultural issues raised by warehousing. Harnessing the power of warehouses to make them truly strategic tools will require time and internal change, but most companies are finding the effort worthwhile.
When market needs and technological progress converge, they drive major changes in business practices. In the past two years, the adoption of data warehouses has helped many companies respond to an ever-shifting competitive landscape. In some cases, the effect has been quite dramatic, resulting in the wholesale transformation of business models or even the creation of entirely new enterprises.
Simply put, a data warehouse is just another database. What sets it apart is that the information it contains is not used for operational purposes, but rather for analytical tasks -- everything from identifying new market segments to corporate brainstorming. It is not a new device; the first decision-support systems, as they were then known, appeared in the early 1970's. But those systems were fiercely expensive, difficult to use and narrowly deployed. And most industries were more stable then, leaving companies with little incentive to pour resources into a system whose main purpose was to improve understanding.
There is plenty of incentive now. Deregulation, consolidation and the relentless globalization of competition have combined to put intense pressure on managers to better understand their businesses and their customers. At the same time, sweeping technological advances have reduced the cost of implementing a data warehouse to a tenth or less of the expense of the old days, while vastly increasing its ease of use.
For those reasons, most large companies have installed data warehouses, or are in the process of doing so. And even though the transformative power of this management tool has only begun to be felt, companies that have taken an aggressive approach to developing its potential are finding plenty of ways to make the warehouse pay off. Some use it to build relationships with their most important customers, by aggregating information about individual and group buying patterns. Some use it to rationalize inventory and supply, to the extent of driving production cycles at their key suppliers. Still others have discovered that the timely access to complex data can be a new business in itself.
"If you look at the uses of all our technology, I put this at the top of the list in business value,'' said Ray Lane, president of the Oracle Corporation, the leading producer of database software. "Usually our technology is used to improve efficiency, but the data warehouse is, without question, a top line builder.''
Consider one success story, that of Longs Drug Stores, a retail chain based in Walnut Creek, Calif.
Prior to installing a data warehouse, category management was extremely difficult for Longs. With a decentralized business model, in which individual store managers served as buyers, the chain as a whole lacked data on specific product sales, making it impossible to measure the success of a promotion in a timely way. Now, all Longs stores feed information nightly to a corporate data warehouse that runs on programming supplied by Red Brick Systems Inc., a pioneering software maker in the field. That means promotions can be measured, and altered, on a daily basis.
"Before this, it would not have been overstated to say we didn't know what was sold, as a corporate entity, although our store managers knew,'' said Brian Kilcourse, Longs' vice president and chief information officer. "We weren't getting any economies of scale.'' But since the warehouse opened, "our stores have the ability to see the impact of their merchandising decisions within a day,'' he said. Longs is now working on an expanded warehouse that will combine its internal data with syndicated information from suppliers like A.C. Nielsen for a broader perspective.
Still, despite all the technological gains, running a data warehouse is neither a simple nor a predictable proposition. And as much as costs have come down, a lot more than pocket change is needed to make it work.
"It's easy enough to create these things and put all kinds of stuff in, but it's harder to get it out,'' said Thomas H. Davenport, a professor at the University of Texas at Austin who is the director of the school's Information Systems Management Program. "The term itself, data warehouse, illustrates what goes on; it's not a user or customer-friendly environment.''
Data warehouses represent the latest great paradigm of database management. The earliest data management systems were hierarchical, run on massive mainframes, and were used primarily for archival purposes. The first big change came in the early 1980's, with the adoption of relational database systems, which have primarily operational applications. These systems, typically run on minicomputers, are used for online transaction processing, or O.L.T.P., to operate networks of automated teller machines, for example. Now come data warehouses, commonly run on client/server networks of personal computers and more powerful server machines. These latest systems are used for online analytical processing, or O.L.A.P., an essentially strategic application.
Put another way, traditional database systems are good at recording and reporting what happened. A data warehouse shows why. When business managers gain this understanding, often for the first time, the returns can be exponential.
Indeed, a recent study of 45 major companies by the International Data Corporation found an average three-year return on investment in data warehouse systems of 401 percent.
Impressive though that number is, perhaps more instructive is the very wide range of returns reported by the companies, from a gain of 16,000 percent to minus 1,857 percent.
Although International Data said the negative results could be explained by unusually expensive systems, low usage or large projects requiring longer than three years for payback, the stark difference in results shows that managers must proceed with care. The simple fact is that not every warehouse bears fruit.
That is because setting one up involves a host of ticklish technological and cultural issues. The technological questions alone can make a manager's head spin: Should the warehouse be built on a traditional relational database, which is more familiar to information systems staff, or on a specialized foundation, which may offer higher performance for analytical tasks? Should you make or buy such software helpers as data extraction tools, query tools and user interfaces? Should the system be broadly deployed or confined to specific business processes?
On the cultural side, data warehouses push marketing and other management professionals into new roles, and often force changes in the relationships between vendors and buyers.
For the technologist, "it's not only uncharted territory, it's uncomfortable territory,'' said Larry Greenfield, president of LGI Systems Inc., a data warehouse consulting firm based in Northfield, Ill. "There's a lot of ambiguity in the roles of the information systems people when you start giving end users data warehouse tools. Are you a developer, a cheerleader or a tutor? It's a bit of a puzzle.''
For the business manager, "the upfront cost for developing the data warehouse can be huge, with an unknown return,'' said Jim Schroer, a lead partner in Booz-Allen & Hamilton's consumer products and retail group. And when two organizations must cooperate on a warehouse project, "there's a lot of fear that the benefits will accrue to the other party.''
It sounds self-evident, but it's worth noting that data warehouse returns will vary depending on the purpose for which a system was created. The warehouses can be used to improve the effectiveness of an existing business process, to change a process or to foster a new process. In many cases, traditional measures such as return on investment may not be meaningful.
The earliest adopters of data warehouse technology were in consumer packaged goods and retail, driven by the narrow profit margins and fierce competition that have always characterized those businesses. Companies like Procter & Gamble, Wal-Mart and PepsiCo's Frito-Lay began implementing data warehouses before the term existed, often with highly specialized systems costing in the tens of millions of dollars. Now that data warehouses can be built with standard hardware and software, at an average cost below $3 million, their use is spreading to many more packaged goods and retail organizations. The most common use, as at the Longs drug store chain, is in category management, which is the fine-tuning of product placement, purchasing and promotion to maximize sales and profits.
Retailers like Longs have always had to manage myriad products in a highly competitive environment, but that's a relatively recent phenomenon for industries like telecommunications and financial services. Telco's now want to sell Internet access or paging along with basic long-distance services, and banks want to augment the meager returns on checking or passbook accounts by offering mutual funds and financial planning. Both groups need to work harder than ever to retain customers. To better understand their customers, and to pitch the right additional products to the most receptive potential buyers, these businesses are rapidly turning to data warehouses.
"When customers buy more than one product, the churn rate goes way down,'' said Lance Boxer, senior vice president and chief information officer at the MCI Communications Corporation, the second-largest long-distance provider, which began building a data warehouse in 1993 using software from the Informix Corporation. "We keep very detailed records on our buyers, so that we can sell them contiguous products,'' Mr. Boxer said. Thus, if analysis shows that customers with pagers are most likely also to want Internet service, MCI's sales representatives will concentrate calls on those buyers.
The system also allows MCI to respond quickly to competitive threats, according to Mr. Boxer. "If AT&T one morning were to announce a campaign for international callers, we could go in very quickly and start scoring people in our database who were likely candidates, and the marketing people could put together a response very quickly,'' he said. The data warehouse lets MCI avoid pitching customers who change carriers often, and who thus generate little profit, and instead target those likely to buy special services, who generate a lot. The system stores nearly 3 terabytes of data, or 3 billion bytes, enough to hold more than 100 million customer records.
Scanning a conventional relational database of that size for the answers to a complex analytical query could take days. That's because these databases were designed primarily to process transactions, and only secondarily to process a limited variety of queries. The lingua franca of relational databases is a programming language called SQL, pronounced "sequel,'' for structured query language, and as the name implies, it is very effective if a query adheres to a certain structure.
But analytical queries are rarely defined in advance; indeed, the first question an analyst asks is rarely the right one to get the answer desired. Processing a string of ad hoc queries, without structure, forces a conventional database to perform multiple scans of all of its records, a time-consuming process. To optimize performance in such applications, data warehouses structure the data, rather than the query, with various indexing schemes, so that the system can respond rapidly to unforeseeable queries.
Data warehouses can be optimized for analytical tasks precisely because they are not called upon to process transactions, and need not maintain the absolute accuracy at any moment in time that O.L.T.P. systems must. Because the warehouse is separated from the O.L.T.P. system, complex queries do not slow transactions and security is less of a concern. Users ranging from clerical staff to programmers reach the level of access they each require with the help of a wide variety of querying tools.
"The reason you can do this safely is because a data warehouse is 'read-only,'" meaning the data can be accessed but not altered by the end user, said Aaron Zornes, a database specialist in Burlingame, Calif., with the Meta Group, a market research firm. "The end users can do whatever they want with the data and it won't screw up the production system because it's a separate database." Among Meta's clients, which include 800 of the 2,000 largest companies in the United States, 90 percent are developing data warehouses, Mr. Zornes said.
Warehouses also pull together data from multiple sources, overcoming the usual roadblock posed by a legacy of incompatible database systems. For companies that have a multitude of databases, each created with different software and running on different hardware, this is a major gain.
For HCIA Inc., a Baltimore-based healthcare information provider formed by the merger of four companies, the data warehouse was key to making the combined organization work. The component companies of HCIA had been collecting databases since the 1950's -- tracking individual patients at hundreds of hospitals, or recording the outcomes of specific prescription drugs or surgical procedures. The data were invaluable to insurers, health maintenance organizations and pharmaceutical manufacturers. But the databases were a heterogeneous mess, accessible only to highly trained individuals within the organization.
"Our goal was to make everything in our humongous databases, with millions and millions of records, operate as quickly and easily as our internal cost reports,'' said Jean Chenowyth, HCIA's senior vice president. By combining the records in one data warehouse of 2 terabytes, using Informix software, the company was able to make them available to all employees and, by dial-up service, to many customers as well.
"We were able to do more and more things for our customers and do it quicker, too,'' Ms. Chenowyth said. "It not only transformed our business, it pushed us ahead of our competitors.''
Sometimes the transformation of a process is so complete that it creates a new business. As analysts at Bear, Stearns & Company, the big financial services concern, Jonathan Raiff and Philip Millman found that performing the complex queries needed to evaluate a mortgage-backed security could take weeks of mainframe computer time. But with a Red Brick data warehouse running on an inexpensive Sun Microsystems server, the same process could be done in 10 minutes. Mr. Raiff and Mr. Millman left Bear, Stearns a couple of years ago to form the Mortgage Research Group Inc. in Jersey City, using this technology to serve investors.
"Everything we do is predicated upon the data,'' Mr. Raiff said. "It is not a decision support tool for us, it is the building blocks of the business. If someone needs to bid on a security, they need to scan several thousand loans against our database; getting back to them in two days isn't fast enough. It's not that the data warehouse makes it faster, it makes it feasible.''
Yet deploying a data warehouse is rarely quick or easy. Few installations take under six months, and many take two years or more. In many cases, the most difficult part is gathering all the data from multiple databases, and "cleansing'' or "scrubbing'' the information so that any conflicts in the recordkeeping are resolved and the data can coexist homogeneously. Although this process has been greatly simplified by data extraction tools, such as those produced by Prism Solutions Inc., it remains a common stumbling block.
"The hardest part is finding out where your data are; it's not like it's sitting on one mainframe,'' said Marianne Elkholy, Informix's director of data warehousing. "Data extraction can take 60 percent of your time.''
Once the data are gathered, access becomes the big issue. Again, this process has been simplified by the proliferation of user, rather than programmer-oriented, querying tools from companies like Business Objects, Cognos, Andyne and Brio. But matching the right tool to the right set of users is still critical. "Without understanding the business requirements, the data warehouse is not going to pay off,'' Ms. Elkholy said.
The warehouses also run the risk of overwhelming their target audiences with information, becoming part of the noise of modern corporate life. Sometimes, less can be more.
"Most companies really need not warehouses, but data deli's, data convenience stores,'' said Mr. Davenport, the University of Texas professor, who has studied the sociology of computer systems.
Actually, the term the industry prefers is "data mart,'' but there is a strong trend toward implementing multiple smaller data warehouses that can be focused very directly on key users. Organizations often build a series of data marts over time, treating them as test beds before larger deployment when they are linked via an enterprise-wide
The Huntington Bank Corporation fits into this trend. Rather than building a corporate data warehouse containing every possible piece of financial information, the $250 million Ohio-based bank set up a data mart for its general ledger system. Using Oracle's O.L.A.P. software, the data mart gets the ledger system's functional information to the bank's financial analysts and budget coordinators much more quickly than before. The bank is exploring the idea of setting up other data marts for similar focused tasks.
"Look at where there is a problem,'' said Mr. Lane of Oracle, when considering a data mart. Can you buy a prepackaged application that will handle that problem? If you can, he says, "you get an early win.'' But if you start broad-based, he added, "you can make it into an endless data-scrubbing exercise.''
Other experts disagree. The incremental approach, they say, risks leaving out the very data that might produce unexpected insights into your organization. This becomes particularly critical when deploying data mining, a new set of software tools that uses neural networks and other esoteric technologies to seek trends and patterns in the data that might evade human analysis.
"Make sure the data are there and accessible,'' said Ralph Kimball, the founder of Red Brick, who is now an independent data warehouse design consultant. "The technology has gotten good enough that we can take the base transactions and put them all into this form. At that point, you can ask any question of that data."
Another big area of contention concerns which type of database management program to use as the foundation for the warehouse. The choice is between a product designed specifically for this purpose or a standard relational database system.
Red Brick, which offers only a specialized program, is of course in the former camp. So is Sybase, alone among the major database companies; it offers Sybase IQ, a specialized product that runs alongside its standard offering, System 11.
"One size does not fit all,'' said Dennis McEvoy, Sybase's president of the enterprise business group. "Data warehousing has blessed the concept of a separate database for decision support. If you're going to do that, why compromise its performance?''
The other major database vendors, Oracle and Informix, as well as I.B.M., take the path of offering a number of specialized analysis tools that run atop their conventional relational database systems. "Most customers want to use the database they're familiar with,'' said Neil Mendelson, Oracle's director of data warehousing. "The object isn't to bring in the coolest technology; this is a business solution.''
Mr. Kimball, despite what would be his presumed allegiance to specialized products, has a more pragmatic suggestion: insist on a demonstration. "There are just a few issues: Is it simple for your users and your administrators to deal with? Does it run fast? What does it cost?'' he said. "The proof is in the pudding.''
Potentially more difficult than the technological decisions are the political and cultural issues raised by warehousing. Traditionally, the corporate database was off limits to all but a few professionals in the information science group. Although data warehouses make it safe for end users to perform their own searches, they still require service and support from I.S. The new system can be a heavy burden for these professionals.
That's if the data warehouse gets a lot of use. What if it doesn't? Again, in the traditional corporate structure, relatively few people other than statisticians and controllers actually worked with data. This is changing, since most executives at least have familiarity with spreadsheets. But many marketing and management decisions are still based more on relationships and intuition than on analytical processes.
"How do you change the culture of a company, to go from being reactive and gut-oriented to being analytical and proactive?'' asked Chris M. Grejtak, Red Brick's vice president for marketing. "How do you get to the customers? You own the data, you own the information. You know exactly what they're doing.''
The prize is worth pursuing. Mr. Schroer, of Booz-Allen, said data warehouses can be one of the most powerful tools a company employs, both for improving efficiencies between suppliers and retailers and for relationship marketing. He cites American Express's Platinum Club as a masterful use of a data warehouse to offer key customers goods and services and a sense of belonging that go far beyond holding a particular charge card.
"If you have the right information about your customer and you use it right, you can literally market your brand with these tools as effectively, or more so, than with radio, TV and other mass marketing,'' Mr. Schroer said. "Relationship marketing, using that database, can simulate what the Fuller brush man used to do with you, one-on-one.''
The down side, though, is the effort it will take to make the cultural shifts that a warehouse requires. "It's different, it's a lot of work, it's analytical and it requires numbers skills,'' Mr. Schroer said. "It takes some proof, some learning and, in some cases, new people to convince yourself that this fairly painful process is worth it.''
Most companies are somewhere in the middle of that learning curve, still finding the right new people. It will take time before they come to rely as completely on online analytical processing to shape strategy as they now rely on online transaction processing to run operations. In short, only a few have learned how to harness the power of warehouses to make their data truly strategic.
"Every single customer does some data warehousing today,'' said Robert Epstein, executive vice president and co-founder of Sybase. But most, he said, are using it to look backward, to analyze past performance. Managers are just now beginning to let warehousing come into its own, to help them drive business forward.
Of Data Warehouses And the Net
The simultaneous explosion of interest in the Internet and in data warehousing is almost entirely coincidental; either one could have existed credibly and productively without the other. But as it turns out, there is a large amount of serendipitous synergy in the coevolution of these two powerful technologies.
After decades of serving as a simple, but nuclear-war-proof communications conduit for scientists, the Internet has blossomed in the last few years with the development of the World Wide Web. With millions of potential customers "surfing'' the Web, many corporations have found this multimedia portion of the Internet a compelling place to advertise their goods, offer information about their services and even fill orders. Federal Express, for example, gives customers access to its internal package tracking system via the Web, saving untold dollars on telephone service calls.
Even when companies do not trust the Internet with their sensitive internal information, many are creating private "intranets,'' which use Web-based technology to deliver information within an organization or to key suppliers and customers. Indeed, Netscape Communications Inc., the leading Internet software producer, says more than half its revenues now come from applications that are not on the Internet. According to a report by Forrester Research, Visa expects to eliminate 2 million pieces of paper per day by letting its 19,000 member banks into its internal network.
Whether by Internet or intranet, when companies start letting customers and suppliers gain access to their corporate data, a data warehouse becomes a must. Clearly, letting outsiders, or even unmonitored employees, touch the operational database would be a recipe for disaster, inviting huge losses in performance if not outright theft of data. In addition, if untrained users are to get anything of value from a database, they need the ad hoc querying capabilities only a data warehouse can provide.
"If you think about publishing data on the Web, there's simply no way you're going to be able to control the users, to limit the kinds of queries they do,'' said Dennis McEvoy, president of the enterprise business group at Sybase, the big database software company.
Actually, until recently the technology has limited the types of queries possible, and limited the utility of Web-based data warehouses. Browsing tools like Netscape's Navigator were not designed for analytical purposes, and are really only adept at finding records that have been encoded in HTML, or hypertext markup language, the lingua franca of the Web. Most of the search engines on the Web are brute force tools, likely to fetch far too much, rather than to home in on a specific piece of data.
But these limitations are passing with the hyper-rapid development of the Internet, and as the data warehouse and analytical software producers find innovative ways to link their technology with the Web. The links are aided by the fact that both data warehouses and the Internet are based on a computer systems architecture called client/server, in which tasks are split between small desktop client computers and larger servers spread across a network.
The Arbor Software Corporation has produced a version of its Essbase program that operates within Netscape Navigator or other browsers to import records from a database to a simple, spreadsheet-like tool for online analytical processing. "The Web enables deployment to wide users, independent of client hardware,'' said Dan Druker, manager of product marketing. "There is zero marginal cost to add another user to the system, and an easy learning curve.''
Sybase has demonstrated a version of its Sybase IQ data warehouse operating over the Web. "We think ultimately all data warehouses will wind up on intranets,'' said Robert Epstein, Sybase's executive vice president. "Graphics viewers, spreadsheet viewers, text, will all work as plug-ins to browsers.''
Further advances in Internet technology will make it possible for users to adopt these plug-ins on an as-needed basis. Using the Java programming language from Sun Microsystems Inc., companies could send the appropriate analytical tool across the Internet or intranet; it would then operate in the user's client machine. While such processes are still in the demonstration stage, the rapid pace of the Internet insures they will reach deployment rapidly.
Web-based tools "will provide information to a larger set of people in the decision-making process,'' said Neil Mendelson, Oracle's director of data warehousing. "The real future of the Web is to enable everyone to get access.''
Reprint No. 96308