Sunday, October 19, 2008

Privacy Preserving Data Integration And Mining: Match Prediction

Match prediction is done using a learning-based approach. In learning-based approaches, one or more classifiers (e.g., decision tree, Naive Bayes, SVM) are constructed at source S from the data instances and schema of S, then sent over to source T, where they are used to classify the data instances and schema of T. Similarly, classifiers can be constructed at source T and sent over to classify the data instances and schema of S. The classification results are used to construct a matrix that contains a similarity value for every pair of attributes s of S and t of T. This similarity matrix can then be used to find matches between S and T.
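To make the idea concrete, here is a minimal sketch of one direction of the exchange: a classifier is trained on the data instances of S (with S's attribute names as class labels), applied to the instances of T, and the averaged predictions fill the similarity matrix. Everything specific here is an assumption for illustration only: the character-trigram features, the small Naive Bayes class, the names NaiveBayesAttributeClassifier and similarity_matrix, and the toy data. It uses only data instances, not schema information, and it does nothing to protect privacy.

```python
import math
from collections import Counter


def trigrams(value):
    """Lowercase character trigrams of a value (a hypothetical feature choice)."""
    text = "^" + str(value).lower() + "$"
    return [text[i:i + 3] for i in range(max(1, len(text) - 2))]


class NaiveBayesAttributeClassifier:
    """Multinomial Naive Bayes whose classes are the attribute names of one source."""

    def fit(self, columns):
        # columns maps attribute name -> list of data instances for that attribute.
        self.counts = {a: Counter(g for v in vals for g in trigrams(v))
                       for a, vals in columns.items()}
        self.totals = {a: sum(c.values()) for a, c in self.counts.items()}
        self.vocab_size = len(set().union(*self.counts.values()))
        return self

    def predict_proba(self, value):
        # Log-likelihood per attribute with Laplace smoothing and a uniform prior,
        # normalized into probabilities over the source attributes.
        scores = {}
        for a, counts in self.counts.items():
            denom = self.totals[a] + self.vocab_size + 1
            scores[a] = sum(math.log((counts[g] + 1) / denom)
                            for g in trigrams(value))
        top = max(scores.values())
        exps = {a: math.exp(s - top) for a, s in scores.items()}
        norm = sum(exps.values())
        return {a: e / norm for a, e in exps.items()}


def similarity_matrix(classifier, target_columns):
    """sim[s][t]: mean probability that instances of t are classified as s."""
    sim = {s: {} for s in classifier.counts}
    for t, values in target_columns.items():
        probs = [classifier.predict_proba(v) for v in values]
        for s in sim:
            sim[s][t] = sum(p[s] for p in probs) / (len(probs) or 1)
    return sim


# Toy data: the classifier is built at source S, then applied at source T.
source_s = {"phone": ["555-1234", "555-9876", "555-4321"],
            "city": ["Seattle", "Boston", "Madison"]}
source_t = {"telephone": ["555-1111", "555-2222"],
            "town": ["Houston", "Jackson"]}

classifier = NaiveBayesAttributeClassifier().fit(source_s)
sim = similarity_matrix(classifier, source_t)

for s, row in sim.items():
    print(s, {t: round(score, 3) for t, score in row.items()})
```

In a full matcher the same step would also run in the other direction (train at T, classify S), and the two matrices would be combined before picking matches.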
Schema matching in this approach reduces to a series of classification problems that involve the data and schemas of the two input sources. As such, it is possible to leverage work in privacy-preserving distributed data mining, which has studied how to train and apply classifiers across disparate datasets without revealing the sensitive information they contain.
