Scientific databases are usually large, distributed, and dynamically changing. We address the problem of efficient processing of queries in scientific databases, especially in very large numerical databases. Previous work has focused on how to store the database and on the design of index structures for efficient access to data. Recently, more and more statistical methods have been used in query optimization. Those methods essentially attempt to approximate the distribution of the attribute values in order to estimate the selectivity of query results. We introduce a new methodology that uses regression techniques to approximate the actual attribute values. Through analysis of the data, one derives a set of characteristic functions to form a "regression database," a compressed image of the original database. Based on these functions, approximate answers to queries may be provided within a pre-specified tolerable error, but without the expensive search overhead usually inherent in the use of indexing techniques. We propose a framework to build regression databases. An experimental prototype is implemented to evaluate the technique in terms of realizability, efficiency, and practicality. The results demonstrate that our approach is complementary to conventional approaches and to statistical methods.
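To make the idea of a "regression database" concrete, the following is a minimal illustrative sketch, not the framework proposed in the paper. It assumes a single numerical attribute indexed by a sorted key, uses greedy piecewise-linear least squares as a stand-in for the characteristic functions, and stores only the fitted coefficients. The function names (`fit_line`, `build_regression_db`, `approx_lookup`) and the segmentation strategy are hypothetical choices made for illustration.

```python
from bisect import bisect_right


def fit_line(xs, ys):
    """Ordinary least-squares line y = a*x + b through the given points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # A single point (or identical keys): fall back to a constant.
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def max_error(xs, ys, a, b):
    """Largest absolute deviation of the line from the stored values."""
    return max(abs(a * x + b - y) for x, y in zip(xs, ys))


def build_regression_db(xs, ys, tolerance):
    """Greedily grow segments while the fitted line stays within tolerance.

    Assumes xs is sorted ascending. Returns a list of (start_key, a, b)
    tuples, which is the compressed image of the original values.
    """
    segments = []
    start, n = 0, len(xs)
    while start < n:
        end = start + 1
        a, b = fit_line(xs[start:end], ys[start:end])
        while end < n:
            cand_a, cand_b = fit_line(xs[start:end + 1], ys[start:end + 1])
            err = max_error(xs[start:end + 1], ys[start:end + 1], cand_a, cand_b)
            if err > tolerance:
                break
            a, b = cand_a, cand_b
            end += 1
        segments.append((xs[start], a, b))
        start = end
    return segments


def approx_lookup(segments, x):
    """Answer a point query from the fitted functions, without the raw data."""
    starts = [seg[0] for seg in segments]
    i = max(bisect_right(starts, x) - 1, 0)
    _, a, b = segments[i]
    return a * x + b


if __name__ == "__main__":
    # Hypothetical data: two linear regimes in one attribute.
    keys = list(range(100))
    values = [2.0 * k + 1.0 if k < 50 else 200.0 - k for k in keys]
    db = build_regression_db(keys, values, tolerance=0.5)
    print(len(db), "segments stand in for", len(keys), "stored values")
    print("approximate value at key 73:", approx_lookup(db, 73))
```

Because every segment is accepted only after its maximum deviation has been checked, each stored value is reproduced within the chosen tolerance, which mirrors the "pre-specified tolerable error" described above. A real system would also need to handle updates, multiple attributes, and range or aggregate queries, none of which this sketch attempts.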