Sunday, October 19, 2008

Privacy-Preserving Data Integration and Mining: Schema Matching

To share data, sources must first establish semantic correspondences between their schemas. However, all current schema matching solutions assume that sources can freely share their data and schemas. We therefore have to develop schema matching solutions that do not expose the source data and schemas. Once two data sources S and T have adopted their privacy policies, they can start the process of data sharing. As the first step, the sources must cooperate to create semantic mappings among their schemas, to enable the exchange of queries and data. Such semantic mappings can be specified as SQL queries. For example, suppose S and T are data sources that list houses for sale; then a mapping for attribute list-price of source T is:
list-price = SELECT price * (1 + agent-fee-rate)
FROM HOUSES, AGENTS
WHERE (HOUSES.agent-id = AGENTS.id)
which specifies how to obtain data values for list-price from the tables HOUSES and AGENTS of source S.
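To make the mapping concrete, here is a minimal sketch that evaluates it with Python's built-in sqlite3 module. The table layouts, the sample rows, and the helper names build_source_s and list_prices are assumptions made for illustration; underscores replace the hyphens in the attribute names so they are valid unquoted SQL identifiers.

```python
import sqlite3


def build_source_s():
    """Create a toy in-memory version of source S (hypothetical layout and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE AGENTS (id INTEGER PRIMARY KEY, agent_fee_rate REAL);
        CREATE TABLE HOUSES (id INTEGER PRIMARY KEY, price REAL, agent_id INTEGER);
        INSERT INTO AGENTS VALUES (1, 0.5), (2, 0.25);
        INSERT INTO HOUSES VALUES (10, 200000, 1), (11, 100000, 2);
        """
    )
    return conn


def list_prices(conn):
    """Evaluate the list-price mapping: join each house to its agent, then add the fee."""
    query = """
        SELECT HOUSES.id, price * (1 + agent_fee_rate) AS list_price
        FROM HOUSES, AGENTS
        WHERE HOUSES.agent_id = AGENTS.id
        ORDER BY HOUSES.id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for house_id, list_price in list_prices(build_source_s()):
        print(house_id, list_price)
```

The fee rates here are deliberately unrealistic round values so the arithmetic is easy to check by hand.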
Creating mappings typically proceeds in two steps: finding matches, and elaborating matches into semantic mappings. In the first step, matches are found that specify how an attribute of one schema corresponds to an attribute or set of attributes in the other schema. Examples of matches include “address = location”, “name = concat(first-name, last-name)”, and “list-price = price * (1 + agent-fee-rate)”. Research on schema matching has developed many automated heuristic and learning-based methods to predict matches. These methods significantly reduce the human effort involved in creating matches.
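One way to picture a match is as a small function from a source record to a target attribute value. The sketch below encodes the three example matches that way; the MATCHES table, the apply_matches helper, the underscore attribute names, and the space inserted between first and last name are all assumptions for illustration, and the sketch does not show how matches are predicted.

```python
# Each entry says how one target attribute is derived from source attributes.
MATCHES = {
    # address = location (a one-to-one match)
    "address": lambda r: r["location"],
    # name = concat(first-name, last-name); the separating space is an assumption
    "name": lambda r: r["first_name"] + " " + r["last_name"],
    # list-price = price * (1 + agent-fee-rate)
    "list_price": lambda r: r["price"] * (1 + r["agent_fee_rate"]),
}


def apply_matches(record, matches=MATCHES):
    """Derive every target attribute from a single, already-joined source record."""
    return {target: derive(record) for target, derive in matches.items()}


if __name__ == "__main__":
    source_record = {
        "location": "12 Elm St",
        "first_name": "Ann",
        "last_name": "Lee",
        "price": 200000.0,
        "agent_fee_rate": 0.5,
    }
    print(apply_matches(source_record))
```

Note that the functions assume the needed source attributes already sit in one record. A match alone does not say which tables they come from or how those tables are joined.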
In the second step, a mapping tool elaborates the matches into semantic mappings. For example, the match “list-price = price * (1 + agent-fee-rate)” will be elaborated into the SQL query described earlier, which is the mapping for list-price. This mapping adds information to the match: the source tables involved and the condition used to join them. Typically, humans must verify the predicted matches. Furthermore, more recent work has argued that elaborating matches into mappings must also involve human effort.
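The elaboration step can be sketched as a function that combines a match with the extra information a mapping needs. The elaborate function and its parameters are hypothetical, not the interface of any real mapping tool; the tables and join condition are passed in explicitly because, as noted above, a person is expected to supply or confirm them.

```python
def elaborate(target, expression, tables, join_condition):
    """Turn a match (target = expression) into a SQL mapping string.

    tables and join_condition are the information the mapping adds to the
    match; in this sketch they are supplied by the caller, not inferred.
    """
    return (
        f"SELECT {expression} AS {target}\n"
        f"FROM {', '.join(tables)}\n"
        f"WHERE {join_condition}"
    )


if __name__ == "__main__":
    print(
        elaborate(
            target="list_price",
            expression="price * (1 + agent_fee_rate)",
            tables=["HOUSES", "AGENTS"],
            join_condition="HOUSES.agent_id = AGENTS.id",
        )
    )
```

Because the function only formats strings, it performs no validation; a real tool would also have to check that the expression and join condition refer to attributes that exist.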
Schema matching lies at the heart of virtually all data integration and sharing efforts. Consequently, numerous matching algorithms have been developed. All existing matching algorithms, however, assume that sources can freely share their data and schemas, and hence are unsuitable when that information must stay private. To develop matching algorithms that preserve privacy, first the following components need to be developed:
