Sunday, October 19, 2008

Privacy Preserving Data Integration And Mining : Abstract

Integrating data from multiple sources has been a long-standing challenge in the database community. Techniques such as privacy-preserving data mining promise privacy, but assume that data integration has already been accomplished; in other words, data mining still requires data integration to be done first. Data integration methods, in turn, are seriously hampered by the inability to share the data to be integrated. This paper lays out a privacy framework for data integration. Challenges for data integration in the context of this framework are discussed, in the light of existing accomplishments in data integration. Many of these challenges are opportunities for the data mining community.

Privacy Preserving Data Integration And Mining : Contents

1. INTRODUCTION
2. MOTIVATION
3. DATA INTEGRATION AND MINING
4. PRIVACY PRESERVATION CHALLENGES
5. SCHEMA MATCHING
6. OBJECT MATCHING AND CONSOLIDATION
7. QUERYING ACROSS SOURCES
8. QUANTIFYING PRIVACY DISCLOSURE
9. CONCLUSION
10. REFERENCES

Privacy Preserving Data Integration And Mining : Introduction

The goal of this paper is to identify potential research directions and challenges that need to be addressed to perform privacy-preserving data integration. Increasing privacy and security consciousness has led to increased research (and development) of methods that compute useful information in a secure fashion. Data integration and sharing have been a long-standing challenge for the database community. This need has become critical in numerous contexts, including integrating data on the Web and at enterprises, building e-commerce marketplaces, sharing data for scientific research, data exchange at government agencies, monitoring health crises, and improving homeland security.
Unfortunately, data integration and sharing are hampered by legitimate and widespread privacy concerns. Companies could exchange information to boost productivity, but are prevented by fear of being exploited by competitors or by antitrust concerns. Sharing healthcare data could improve scientific research, but the cost of obtaining consent to use individually identifiable information can be prohibitive. Sharing health care and consumer data enables early detection of disease outbreaks, but without provable privacy protection it is difficult to extend these surveillance measures nationally or internationally. Fire departments could share regulatory and defense plans to enhance their ability to fight terrorism and provide community defense, but fear that loss of privacy could lead to liability. The continued exponential growth of distributed personal data could further fuel data integration and sharing applications, but may also be stymied by a privacy backlash. It is critical to develop techniques that enable the integration and sharing of data without losing privacy. The need of the hour is to develop solutions that enable widespread integration and sharing of data, especially in domains of national priority, while allowing easy and effective privacy control by users. A comprehensive framework that handles the fundamental problems underlying privacy-preserving data integration and sharing is necessary. The framework should be validated by applying it to several important domains and evaluating the results.

Concurrently, various privacy-preserving distributed data mining methods have been developed which mine global data while protecting the privacy/security of the underlying data sites. However, all of these methods assume that data integration (including record linkage) has already been done. Note that while data integration is related to privacy-preserving data mining, it is still significantly different: privacy-preserving data mining deals with gaining knowledge after the integration problems are solved. A framework and methods for performing such integration are therefore required first.

Privacy Preserving Data Integration And Mining : Motivation

There are numerous real-world applications which require data integration while meeting specific privacy constraints. The following discusses some of the “motivating drivers”.
1. Sharing Scientific Research Data
Analyzing the prevalence, incidence, and risk factors of diseases is crucial
to understanding and treating them. Such analyses have significant impact on policy decisions. An obvious prerequisite to carrying out such studies is having the requisite data available. First, data needs to be collected from disparate health care providers and integrated while sanitizing privacy-sensitive information.
This process is extremely time consuming and labour intensive. A breach of privacy can lead to significant harm to individuals, both material and emotional. Another problem is the possibility of discrimination against various sub-groups based on seemingly conclusive statistical results. Similarly, health care providers themselves risk losses by leaking accurate data that reflects their performance and weaknesses.

Privacy is addressed today by preventing dissemination rather than by integrating privacy constraints into the data sharing process. Privacy-preserving integration and sharing of research data in the health sciences has become crucial to enabling scientific discovery.


2. Effective Public Safety
Integration and sharing between public agencies, and public and private organizations, can have a strong positive impact on public safety. But concerns over the privacy implications of such private/public sector sharing have
impacted uses of data mining in public safety:
For example, firefighting departments in Illinois routinely seek sample regulations and training materials from fellow departments (e.g., for handling a bio-hazard situation or an unknown emerging public safety threat). Such materials allow them to develop similar programs and to provide the most up-to-date, effective community defense. However, fellow departments are reluctant to share such materials for fear of liability if the programs are deemed inadequate. They would be happy to share the material if their identity (and thus liability exposure) were protected.

3. Monitoring Healthcare Crisis
Detecting and containing disease outbreaks early is key to preventing life-threatening infectious diseases, witness the successful eradication of smallpox. Outbreaks of infectious diseases such as West Nile, SARS, and bird flu, as well as threats of bio-terrorism, have made disease surveillance a national priority. Outbreak detection works best when a variety of data sources (human health care, animal health, consumer data) are integrated and evaluated in real time. For example, the Real-Time Outbreak Detection System at the University of Pittsburgh Medical Center uses data collected from regional healthcare providers and purchase records of over-the-counter drugs to determine outbreak patterns. This system forwards all regional data to a central data warehouse for evaluation.
Privacy laws typically do not cover government public health organizations, raising the spectre of systems with inadequate privacy protection. The concerns are similar to the risks noted above for healthcare research data: External attacks or insider misuse can damage individuals, healthcare providers, or groups within society. Protecting identity and liability exposure by effective privacy-preserving data integration and sharing techniques will enable advances in emergency preparedness and response, public safety, health care and homeland security that might otherwise be prevented due to privacy concerns.

4. Facilitating E-commerce
There are innumerable opportunities in e-commerce to enable beneficial collaboration, if privacy concerns can be met. Corporations will not (in some cases, cannot) share confidential data with each other, but are willing to engage in some process for mutual benefit. As an example, consider secure supply-chain management. An example scenario is two companies that use a common raw material. Knowing that they share this need, and coordinating their orders and production, would enable smoothing out the supply line and improving overall supply-chain efficiency. A prerequisite for this coordination is the ability to identify the common raw material, suppliers, customers, etc., without giving up competitive knowledge advantages or violating antitrust law. Standards for sharing logistics information cover such wide ground that ambiguity is inevitable (e.g., the ECCMA Open Technical Dictionary has over 30,000 standard attribute names).

Privacy Preserving Data Integration And Mining : Data Integration And Data Mining

Data integration and data mining are quite closely coupled. Integration is a necessary prerequisite to mining data collected from multiple sources. At the same time, data mining/machine learning techniques are used to enable automatic data integration. Several systems have been developed to implement automatic schema matching; they use machine learning/data mining tools to help automate the matching. SemInt uses neural networks to determine match candidates: clustering is done on similar attributes of the input schema, the signatures of the cluster centers are used as training data, and matching is done by feeding attributes from the second schema into the neural network. LSD also uses machine learning techniques for schema matching. LSD consists of several phases. First, mappings for several sources are manually specified. Then source data is extracted (into XML) and training data is created for each base learner. Finally the base learners and the meta-learner are trained, and further steps are carried out to refine the weights learned. The base learners used are a nearest-neighbor classification model as well as a Naïve Bayes learner. Again, there has been work on different privacy-preserving classification models that is applicable. Artemis is another schema integration tool that computes “affinities” in the range 0 to 1 between attributes; schema integration is done by clustering attributes based on those affinities. Clearly, a lot of work in both privacy-preserving data mining and cryptography is relevant to the problem of privacy-preserving schema integration.
Record linkage also uses various machine learning techniques.
Record linkage can be viewed as a pattern classification problem. In pattern classification, the goal is to correctly assign patterns to one of a finite number of classes; similarly, the goal of record linkage is to determine the matching status of a pair of records brought together for comparison. Machine learning methods such as decision tree induction, neural networks, instance-based learning, and clustering are widely used for pattern classification: given a set of patterns, a machine learning method builds a decision model that can be used to predict the class of each unclassified pattern. Again, prior privacy-preserving work is relevant. At the other end of the spectrum, privacy-preserving data mining assumes that data integration has already been done, which is clearly not a solved problem.

Privacy Preserving Data Integration And Mining : Privacy Preservation Challenges

The following describes the fundamental challenges in privacy-preserving data integration.

Privacy Preserving Data Integration And Mining : Privacy Framework

We have to develop a privacy framework for data integration that is flexible and clear to end users. This demands understandable and provably consistent definitions for building a privacy policy, as well as standards and mechanisms for enforcement.
Database security has generally focused on access control: users are explicitly (or perhaps implicitly) allowed certain types of access to a data item. This includes work in multilevel secure databases as well as statistical queries. Privacy is a more complex concept. Most privacy laws balance benefit and risk; access is allowed when there is adequate benefit resulting from it. An example is the European Community directive on data protection, which allows processing of private data in situations where specific conditions are met. The Health Insurance Portability and Accountability Act in the U.S. specifies similar conditions for the use of data. Individual organizations may define their own policies to address their customers’ needs. The problems are exacerbated in a federated environment. The task of data integration itself poses risks, as revealing even the presence of data items at a site may violate privacy.
Some of the privacy issues have been addressed for the case of a single database management system in Hippocratic Databases. Other privacy issues have been addressed for the case of a single interaction between a user and a Web site in the P3P standard. None of the current techniques address privacy concerns when data is exchanged between multiple organizations, and transformed and integrated with other data sources.
A framework is required for defining private data and privacy in the context of data integration and sharing. The notions of privacy views, privacy policies, and purpose statements are essential to such a framework. We illustrate them using the “Sharing Scientific Research Data” scenario.

Privacy Preserving Data Integration And Mining : Privacy views

The database administrator defines what is private data by specifying a set of privacy views, in a declarative language extending SQL. A privacy view defines a set of private attributes and an owner to whom they relate or belong.
For example, the database administrator in a health care organization might define the following three privacy views:

PRIVACY-VIEW patientAddressDob
OWNER Patient.pid
SELECT Patient.address, Patient.dob
FROM Patient


PRIVACY-VIEW zipDisease
OWNER Patient.pid
SELECT Patient.address.zip, Disease.name
FROM Patient, Treatment, Disease
WHERE Patient.pid = Treatment.pid and Treatment.did = Disease.did


PRIVACY-VIEW physicianDisease
OWNER Patient.pid
SELECT Physician.name, Disease.name
FROM Patient, Treatment, Disease, Physician
WHERE Patient.pid = Treatment.pid and Treatment.did = Disease.did and Physician.id = Treatment.id

The first privacy view specifies that a patient’s address and dob (date of birth) are considered private data when occurring together. Whenever these two attributes occur together in a piece of data, e.g., data to be exchanged with a partner or integrated with other data, they are private. Notice that dob is not private by itself (and similarly address: more below). Similar definitions can be given for patient name and other fields commonly referred to as “individually identifiable information”: sets of attributes that can be used to tie a tuple or a set of tuples in a data source to a specific real-world entity (e.g., a person). Alternatively, administrators may choose to define database IDs or tuple IDs as private data, both of which could be used to breach privacy over time. Database IDs identify which data source the data comes from; while not necessarily an individual privacy issue, protecting the data source may be a prerequisite for organizations to participate in sharing. Tuple IDs identify tuples within a source; while this may not inherently violate privacy, it may enable tracking of tuples that can violate privacy over time. In general, privacy views can be much more complex (e.g., by specifying associations between attributes from different tables).
The second privacy view, zipDisease, is more subtle: it says that the patient’s zip code and the disease together constitute private data. The zip code alone is not individually identifiable information; still, it is part of a person’s private data, and here the decision has been made to consider the association (zip, disease) as private. Notice also that the two attributes come from different tables. This example illustrates the power of privacy views: any combination of data can be declared private and given an owner.
The third privacy view specifies that even the association between physician names and diseases is to be considered private data, owned by the patient. This example illustrates the difficulty in defining ownership for private data. Suppose Dr. Johnson treats both patient “Smith” and patient “Brown” for diabetes. Who owns the association (“Dr. Johnson”, “Diabetes”), Smith or Brown? We address this by adopting bag semantics, i.e., we consider two occurrences of the tuple (“Dr. Johnson”, “Diabetes”), one owned by Smith and the other by Brown. Privacy views could be implemented by a privacy monitor that checks every data item being retrieved from the database and detects whether it contains items that have been defined as private. There are two approaches: compile-time (based on query containment) and run-time (based on materializing the privacy views and building indices on the private attributes). Both approaches need to be investigated and their tradeoffs evaluated.
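To make the run-time approach concrete, the following Python sketch (an illustration only, not an implementation proposed in the literature) models each materialized privacy view simply as the set of attributes that become private when they appear together, and flags any outgoing result whose attributes contain one of those sets. The view names mirror the examples above; everything else is assumed for illustration.

# Minimal sketch of a run-time privacy monitor (illustrative only).
# A "privacy view" is modeled as the set of attributes that are private
# when they occur together in one piece of outgoing data.
PRIVACY_VIEWS = {
    "patientAddressDob": {"Patient.address", "Patient.dob"},
    "zipDisease":        {"Patient.address.zip", "Disease.name"},
    "physicianDisease":  {"Physician.name", "Disease.name"},
}

def violated_views(result_attributes):
    """Return the privacy views whose private attribute set is fully
    contained in the attributes of a result about to leave the site."""
    attrs = set(result_attributes)
    return [name for name, private in PRIVACY_VIEWS.items() if private <= attrs]

# Example: a result that joins patient zip codes with disease names.
outgoing = ["Patient.address.zip", "Disease.name", "Treatment.date"]
print(violated_views(outgoing))   # ['zipDisease'] -> attach privacy metadata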

Privacy Preserving Data Integration And Mining : Privacy Policies

Along with privacy views, it is necessary to have a notion of privacy policies. The database administrator decides which policy applies to each view. The following are example privacy policies:

PRIVACY-POLICY individualData
ALLOW-ACCESS-TO y
FROM Consent x, patientAddressDob y
WHERE x.pid = y.owner and x.type = ’yes’
BENEFICIARY *

PRIVACY-POLICY defaultPolicy
ALLOW-ACCESS-TO x
FROM patientName x
BENEFICIARY x.owner
BENEFICIARY *

PRIVACY-POLICY militaryPersonellWaiver
ALLOW-ACCESS-TO x
FROM patientName x, Patient y
WHERE x.owner=y.pid and y.employer=’Military’
BENEFICIARY Government
The first privacy policy states that access to y can be allowed only if patient x has given explicit consent; that is, the private data patientAddressDob (defined above) can be released if the owner has given explicit consent, as registered in a Consent table.
The second is a default policy which allows access to patient names as long as the benefit accrues to the patient: any patient name can be released as long as the application using it runs on behalf of (for the benefit of) that patient.
The third says that the names of patients employed by the military can be released for use by the Government. As with privacy views, more complex privacy policies are also possible.
Privacy policies can be enforced by the server holding the data: data items will be shared only if the purpose statement of the requester (see below) satisfies the policy. In addition, every data item leaving the server should be annotated with privacy metadata expressing the privacy policies that have to be applied. These annotations travel with the data, and are preserved and perhaps modified when the data is integrated with data from other sources or transformed.
Query execution becomes much harder, since all privacy views and policies must result in a single piece of privacy metadata, and it is not obvious how to do that. Prior work addresses a similar but not identical challenge: how a set of access control policies results in a single, multiply encrypted data instance.

Privacy Preserving Data Integration And Mining : Purpose Statements

Finally, once data has been shared and integrated, it eventually reaches an application that uses it. Here, the privacy metadata needs to be compared with the application’s stated purpose. A flexible language is required in which applications can state the purpose of their actions and explicitly mention the beneficiary.
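As a rough sketch of how privacy metadata and purpose statements could meet, the following Python fragment (all names and structures are invented for illustration; a real system would evaluate the declarative policy language itself) checks an application's declared beneficiary and recorded consent against the policies attached to a data item.

# Illustrative check of a purpose statement against attached privacy metadata.
policy_metadata = [
    # e.g. attached to a patientAddressDob item: consent required, any beneficiary
    {"view": "patientAddressDob", "requires_consent": True,  "beneficiary": "*"},
    # e.g. attached to a patientName item: the owner must be the beneficiary
    {"view": "patientName",       "requires_consent": False, "beneficiary": "owner"},
]

purpose_statement = {
    "purpose": "cohort-study",
    "beneficiary": "owner",   # the application acts on behalf of the patient
    "owner_consent": True,    # consent as recorded in the Consent table
}

def access_allowed(policy, purpose):
    if policy["requires_consent"] and not purpose["owner_consent"]:
        return False
    return policy["beneficiary"] in ("*", purpose["beneficiary"])

for p in policy_metadata:
    print(p["view"], "->", access_allowed(p, purpose_statement))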

Privacy Preserving Data Integration And Mining : Schema Matching

To share data, sources must first establish semantic correspondences between their schemas. However, all current schema matching solutions assume sources can freely share their data and schemas. We have to develop schema matching solutions that do not expose the source data and schemas. Once two data sources S and T have adopted their privacy policies, they can start the process of data sharing. As the first step, the sources must cooperate to create semantic mappings among their schemas, to enable the exchange of queries and data. Such semantic mappings can be specified as SQL queries. For example, suppose S and T are data sources that list houses for sale; then a mapping for attribute list-price of source T is:
list-price = SELECT price * (1 + agent-fee-rate)
FROM HOUSES, AGENTS
WHERE (HOUSES.agent id = AGENTS.id)
which specifies how to obtain data values for list-price from the tables HOUSES and AGENTS of source S.
Creating mappings typically proceeds in two steps: finding matches, and elaborating matches into semantic mappings. In the first step, matches are found which specify how an attribute of one schema corresponds to an attribute or set of attributes in the other schema. Examples of matches include “address = location”, “name = concat(first name, last name)”, and “list-price = price * (1 + agent-fee-rate)”. Research on schema matching has developed a plethora of automated heuristic or learning-based methods to predict matches. These methods significantly reduce the human effort involved in creating matches.
In the second step, a mapping tool elaborates the matches into semantic mappings. For example, the match “list-price = price * (1 + agent-fee-rate)” will be elaborated into the SQL query described earlier, which is the mapping for list-price. This mapping adds information to the match. Typically, humans must verify the predicted matches. Furthermore, recent work has argued that elaborating matches into mappings must also involve human effort.
Schema matching lies at the heart of virtually all data integration and sharing efforts. Consequently, numerous matching algorithms have been developed. All existing matching algorithms, however, assume that sources can freely share their data and schemas, and hence are unsuitable. To develop matching algorithms that preserve privacy, the following components first need to be developed:

Privacy Preserving Data Integration And Mining : Match Prediction

Match prediction can be done using a learning-based approach. In learning-based approaches, one or more classifiers (e.g., decision tree, Naive Bayes, SVM, etc.) are constructed at source S, using the data instances and schema of S, and then sent over to source T. The classifiers are then used to classify the data instances and schema of T. Similarly, classifiers can be constructed at source T and sent over to classify the data instances and schema of S. The classification results are used to construct a matrix that contains a similarity value for every attribute s of S and t of T. This similarity matrix can then be utilized to find matches between S and T.
Schema matching in this approach reduces to a series of classification problems that involve the data and schemas of the two input sources. As such, it is possible to leverage work in privacy-preserving distributed data mining, which has studied how to train and apply classifiers across disparate datasets without revealing sensitive information about those datasets.
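As a rough illustration of the learning-based step itself (not of the privacy-preserving protocol around it), the sketch below trains a single Naive Bayes classifier over character n-grams of source S's instance values, labeled by S's attribute names, applies it to source T's values, and uses the averaged class probabilities as the similarity matrix. The column names and data are invented, and scikit-learn is assumed to be available.

# Sketch: learning-based match prediction via classification of instance values.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

S = {  # invented instance data at source S
    "address": ["12 Oak St, Lafayette", "3 Elm Ave, Chicago"],
    "dob":     ["1971-03-02", "1985-11-23"],
}
T = {  # invented instance data at source T
    "location":   ["77 Pine Rd, Urbana", "9 Lake Dr, Peoria"],
    "birth_date": ["1990-07-14", "1968-01-30"],
}

vec = CountVectorizer(analyzer="char_wb", ngram_range=(2, 3))
train_vals = [v for vals in S.values() for v in vals]
train_lbls = [a for a, vals in S.items() for _ in vals]
clf = MultinomialNB().fit(vec.fit_transform(train_vals), train_lbls)

# similarity[s][t] = mean probability that T.t's values "look like" S.s's values
similarity = {s: {} for s in S}
for t_attr, vals in T.items():
    probs = clf.predict_proba(vec.transform(vals)).mean(axis=0)
    for s_attr, p in zip(clf.classes_, probs):
        similarity[s_attr][t_attr] = round(float(p), 2)
print(similarity)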

Privacy Preserving Data Integration And Mining : Human Verification Of Matches

Suppose a match m has been found. Now humans at one or both of the sources S and T must examine m to verify its correctness. The goal is to give the humans enough information to verify matches while preserving privacy. One way to achieve this is to randomly select a few values for the attributes involved and show the user only those values. Since the two attributes have already been found to be similar, it can be argued that revealing a few sample values does not disclose much additional useful information about the underlying distribution.
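A trivially small sketch of that idea, with invented data and an arbitrary sample size: the verifier sees only a handful of randomly chosen values from each attribute involved in the candidate match.

# Sketch: reveal only a small random sample of values for human verification.
import random

def verification_sample(values, k=3, seed=7):
    """Return at most k randomly chosen attribute values to show the verifier."""
    return random.Random(seed).sample(values, min(k, len(values)))

s_zip    = ["60601", "61820", "47906", "60614", "61801"]   # S.zip (invented)
t_postal = ["60601", "61821", "47906", "60607"]            # T.postal (invented)
print(verification_sample(s_zip), verification_sample(t_postal))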

Privacy Preserving Data Integration And Mining : Mapping Creation

Once a match has been verified, humans can proceed to work in conjunction with a mapping tool to refine the match into a mapping. In this step, humans are typically shown examples of data, as generated by various mapping choices, and asked to select the correct one. It is necessary to ensure that people are shown data that allows generating mappings, but does not violate privacy.

Privacy Preserving Data Integration And Mining : Object Matching and Consolidation

Data received from multiple sources may contain duplicates that need to be removed. In many cases it is also important to be able to consolidate information about entities (e.g., to construct more comprehensive sets of scientific data). How can we match entities and consolidate information about them across sources, without revealing the origin of the sources or the real-world origin of the entities? Record linkage is the identification of records that refer to the same real-world entity. This is a key challenge to enabling data integration from heterogeneous data sources. What makes record linkage a problem in its own right (i.e., different from the duplicate elimination problem) is the fact that real-world data is “dirty”. In other words, if data were accurate, record linkage would be similar to duplicate elimination. Unfortunately, in real-world data, duplicate records may have different values in one or more fields (e.g., misspelling causes multiple records for the same person).
Record linkage techniques can also be used to breach data confidentiality. In particular, a privacy-aware corporation will use anonymization techniques to protect its own data before sharing it with other businesses, while a data intruder tries to identify as many concealed records as possible using an external database (many external databases are now publicly available). Anonymization techniques must therefore be aware of the capabilities of record linkage techniques in order to preserve the privacy of the data.
On the other hand, businesses need to integrate their databases to perform data mining and analysis. Such data integration requires privacy-preserving record linkage, that is, record linkage in the presence of a privacy framework that ensures the data confidentiality of each business. Thus, we need solutions for the following problems:

• Privacy-preserving record linkage: discovering the records that represent the same real world entity from two integrated databases each of which is protected (encrypted or anonymized). In other words, records are matched without having their identity revealed.

• Record linkage aware data protection: that is protecting the data, before sharing, using anonymization techniques that are aware of the possible use of record linkage, with public available data, to reveal the identity of the records.

• Online record linkage: linking records that arrive continuously in a stream. Real-time systems and sensor networks are two examples of applications that need online data analysis, cleaning, and mining.
Record linkage has been studied in various contexts and has been referred to by different names, such as the merge/purge problem. As noted earlier, it can be viewed as a pattern classification problem, and machine learning methods such as decision tree induction, neural networks, instance-based learning, and clustering are applicable. TAILOR, an interactive record linkage toolbox, uses three classification models for record linkage based on induction and clustering.
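One common building block for privacy-preserving record linkage is to compare keyed hashes of normalized quasi-identifiers rather than the raw values, so that matching records can be detected without either party revealing its non-matching records. The sketch below is a simplified, exact-match-only illustration of that idea (real systems also need approximate matching to cope with “dirty” data); the shared key and records are assumptions.

# Sketch: exact-match privacy-preserving record linkage with keyed hashing.
# Each party publishes only HMAC digests of its normalized quasi-identifiers,
# computed with a key shared by the two sources (but not by outsiders).
import hmac, hashlib

SHARED_KEY = b"key agreed upon by the two sources"   # assumption

def digest(record):
    normalized = "|".join(record[f].strip().lower() for f in ("name", "dob", "zip"))
    return hmac.new(SHARED_KEY, normalized.encode(), hashlib.sha256).hexdigest()

source_a = [{"name": "John Smith", "dob": "1971-03-02", "zip": "60601"},
            {"name": "Ann Brown",  "dob": "1985-11-23", "zip": "61820"}]
source_b = [{"name": "john smith", "dob": "1971-03-02", "zip": "60601"},
            {"name": "Bob Jones",  "dob": "1968-01-30", "zip": "47906"}]

digests_b = {digest(r) for r in source_b}
links = [r for r in source_a if digest(r) in digests_b]
print(links)   # only the record both sources hold is identified as a link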

Privacy Preserving Data Integration And Mining : Querying Across Sources

Once semantic correspondences have been established, it is possible to query (e.g., with SQL queries) across the sources. How do we ensure that query results do not violate privacy policy? How do we query the sources such that only the results are disclosed? How can we prevent the leaking of information from answering a set of queries? Only a few general techniques exist today for querying datasets while preserving privacy: statistical databases, privacy-preserving join computation, and privacy-preserving top-K queries. In statistical databases, the goal is to allow users to ask aggregate queries over the database while hiding individual data items. There is a rich literature on this topic. Unfortunately, the main results are negative: while it is possible to preserve privacy for a single query, ensuring that a sequence of query results cannot be combined to disclose individual data is not practical. Privacy-preserving joins and the more restricted privacy-preserving intersection-size computation have also been addressed. Here, each of the two parties learns only the query’s answer, and nothing else. These techniques, however, apply only to a specialized class of queries.
Privacy-preserving top-K queries have also recently been studied. Such a query returns just the closest K matches to a query without revealing anything about why those matches are close, what the values of the attributes of the close items are, or even which site the closest matches come from. This is accomplished efficiently through the use of an untrusted third party: a party that is not allowed to see private values, but is trusted not to collude with any site to violate privacy (see Figure 1).





In this method, each site finds its own top K and encrypts each result with the public key of the querying site. The parties then compare their top K with the top K of all other sites, except that the comparison used gives each site a random share of the result, so neither learns the outcome. The results from all sites are combined, scrambled, and given to the non-colluding untrusted site. This site can combine the random shares to get a comparison result for each pair, enabling it to sort and select the global top K. The encrypted results corresponding to these K are sent to the querying site. Each site learns nothing about the other sites (the comparison share it sees looks like a randomly chosen bit). The untrusted site sees K*n encrypted results; it is able to totally order them, but since it knows nothing about what each means or where it comes from, it learns nothing. The querying site sees only the final result.
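The following Python fragment is a much-simplified simulation of that flow, intended only to show what each party gets to see: the cryptographic pairwise comparison is stubbed out by a helper that hands back XOR shares of the comparison bit (in the real protocol the two sites would compute these shares jointly, without any party seeing both values), and “encryption” is represented by opaque tokens. All names and data are invented.

# Simplified simulation of the untrusted-third-party top-K idea (illustrative).
import random

def shared_comparison(a, b):
    """Stand-in for a secure comparison: the bit (a < b) is split into two
    XOR shares, so neither data site learns the result on its own."""
    bit = int(a < b)
    r = random.randint(0, 1)
    return r, bit ^ r

sites = {"S1": [91, 77, 64], "S2": [88, 70, 52]}   # each site's local top-K scores
k = 3

# Each site "encrypts" its results for the querier (opaque tokens here).
tokens = {(s, i): f"enc({s}:{i})" for s, scores in sites.items()
          for i in range(len(scores))}

# The untrusted site receives, per pair, only the combined shares (the comparison
# bit) and the tokens; it never sees scores, attribute values, or site contents.
items = [(s, i, v) for s, scores in sites.items() for i, v in enumerate(scores)]
wins = {key: 0 for key in tokens}
for s1, i1, v1 in items:
    for s2, i2, v2 in items:
        if (s1, i1) == (s2, i2):
            continue
        share_a, share_b = shared_comparison(v1, v2)
        if share_a ^ share_b == 0:       # v1 >= v2, so item (s1, i1) "wins" this pair
            wins[(s1, i1)] += 1

top = sorted(tokens, key=lambda key: wins[key], reverse=True)[:k]
print([tokens[key] for key in top])      # only these go back to the querying site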



For cohort studies, query criteria will combine attributes about individuals across data sources, and solutions have been developed for this. Privacy-preserving data mining also provides some building blocks. However, the issue of inference from multiple queries must still be resolved. Issues include categorizing types of queries with respect to privacy policy, ensuring that query processing does not disclose information, and guarding against leakage from a set of queries. While there has been work in this area, many practical challenges remain.
Another result is finding matches to a query without revealing the query. In this case, both the query and the data are private; the only thing that can be revealed is which items match. In addition, the method allows checking for “forbidden” queries: even though the query is not revealed, it can be checked against combinations of query criteria that are not permitted.

Privacy Preserving Data Integration And Mining : Quantifying Privacy Disclosure

In real life, any information disclosure entails some privacy loss. We need reliable metrics for quantifying that loss. Instead of simple 0-1 metrics (whether an item is revealed or not), we need to consider probabilistic notions of conditional loss, such as decreasing the range of values an item could have, or increasing the accuracy of an estimate. In general, a starting classification could measure the following: the probability of complete disclosure of all data, the probability of complete disclosure of a specific item, and the probability of complete disclosure of a random item. Privacy-preserving methods can be evaluated on the basis of their susceptibility under these metrics. Some existing measures can also be used in this direction. For example, one of the popular metrics used in database security is Infer(x → y): if H(y) is the entropy of y and H_x(y) is the conditional entropy of y given x, then the privacy loss due to the revelation of x is

Infer(x → y) = (H(y) − H_x(y)) / H(y)

Note that in the schema matching phase, what is revealed to the human for verification can be modeled as revealing x. Although this measure can be used in many different cases, the conditional entropies are hard to calculate. Therefore, there is a need to develop different privacy metrics.
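A small worked example of this metric, on an invented joint sample of a revealed attribute x (zip code) and a private attribute y (disease): the entropies are estimated from the empirical counts and plugged into the formula above.

# Worked example: Infer(x -> y) = (H(y) - H_x(y)) / H(y) on an invented sample.
from collections import Counter
from math import log2

pairs = [("60601", "flu"), ("60601", "flu"), ("60601", "diabetes"),
         ("61820", "diabetes"), ("61820", "diabetes"), ("61820", "flu")]

def H(counts):
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())

H_y = H(Counter(y for _, y in pairs))

# Conditional entropy H_x(y): entropy of y within each x-group, weighted by group size.
x_counts = Counter(x for x, _ in pairs)
H_y_given_x = sum((x_counts[x] / len(pairs)) *
                  H(Counter(y for xx, y in pairs if xx == x))
                  for x in x_counts)

print(round((H_y - H_y_given_x) / H_y, 3))   # fraction of uncertainty about y removed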

Privacy Preserving Data Integration And Mining : Conclusion

This paper presents potential research directions and challenges that need to be addressed in order to achieve privacy-preserving data integration. This also points out some plausible solution ideas. Though much work remains to be done, we believe that the full potential of privacy preserving data management can only be exploited if privacy is also maintained during data integration. Availability of such tools will also enable us to use distributed data while protecting privacy.

Privacy Preserving Data Integration And Mining : References

www.doi.acm.org
www.ieeexplore.ieee.org
www.niss.org

Multi Core Processors : Contents

Introduction
History
Intel Processors
A Fundamental Theorem of Multi-Core Processors
The Quad-Core Line Up
Intel® Core™2 Extreme Quad-Core Processor
Intel® Xeon 5300 Processor
Transitioning the Industry to Multi-Core Processing
AMD Processors
AMD Turion™ 64 Mobile Technology
AMD Opteron™ Processors
The Advanced Technologies in Opteron Processors
Quad-Core Upgradeability of Next-Generation AMD Opteron Processors
Comparison of Processors
Conclusion

Multi Core Processors : Introduction

One constant in computing is that the world’s hunger for faster performance is never satisfied. Every new performance advance in processors leads to another level of greater performance demands from businesses and consumers. Today these performance demands are not just for speed, but also for smaller, more powerful mobile devices, longer battery life, quieter desktop PCs, and—in the enterprise—better Price/performance per watt and lower cooling costs. People want improvements in productivity, security, multitasking (running multiple applications simultaneously on your computer), data protection, game performance, and many other capabilities. There’s also a growing demand for more convenient form factors for the home, office, data center, and on the go.

Through advances in silicon technology, microarchitecture, software, and platform technologies, Intel is on a fast-paced trajectory to continuously deliver new generations of multi-core processors with the superior performance and energy-efficiency necessary to meet these demands for years to come. A new cadence in the microarchitecture arena (see “A New Cadence for Technological Advancement” below), combined with Intel’s ability to continue to extend Moore’s Law, will enable Intel to bring new levels of performance, power savings, and computing capabilities year after year.

In mid-2006, we reached new levels of energy-efficient performance with our Intel® Core™2 Duo processors and Dual-Core Intel® Xeon® processor 5100 series, both produced with our latest 65-nanometer (nm) silicon technology and microarchitecture (Intel® Core™ microarchitecture). Now we’re ready to top that with the world’s first mainstream quad-core processors for both desktop and mainstream servers: Intel® Core™2 Quad processors, Intel® Core™2 Extreme quad-core processors, and others. This paper explains the advantages and challenges of multi-core processing, and provides a glimpse into the upcoming Intel quad-core processors and the direction in which Intel is taking multi-core processors in the future. We discuss many of the benefits you will see as we continue to increase processor performance, energy efficiency, and capabilities.

Intel Quad-Core Processors

Intel® Core™2 Extreme Quad-Core Processor
Intel® Xeon 5300 Processor
Intel® Core™2 Quad processor Q6600
Quad-Core Intel® Xeon® processor L5310
Quad-Core Intel® Xeon® processor 3200

Intel processors: History

For years, Intel customers came to expect a doubling of performance every 18-24 months in accordance with Moore’s Law. Most of these performance gains came from dramatic increases in frequency (from 5 MHz to 3 GHz in the years from 1983 to 2002) and through process technology advancements. Improvements also came from increases in instructions per cycle (IPC). By 2002, however, increasing power densities and the resultant heat began to reveal the limitations of relying predominantly on frequency as a way of improving performance. So, while Moore’s Law, frequency increases, and IPC improvements continue to play an important role in performance gains, new thinking is also required. The best example of this new thinking is multi-core processors. By putting multiple execution cores into a single processor (as well as continuing to increase clock frequency), Intel is able to provide even greater multiples of processing power.




Using multi-core processors, Intel can dramatically increase a computer’s capabilities and computing resources, providing better responsiveness, improving multithreaded throughput, and delivering the advantages of parallel computing to properly threaded mainstream applications.

A New Cadence for Technological Advancement

Building on the foundation of Intel Core microarchitecture (introduced in 2006), Intel is establishing a new cadence that will speed up the delivery of products featuring superior performance and energy-efficiency for years to come. Intel plans to deliver a new, optimized, energy-efficient performance microarchitecture approximately every two years that supports all its process technology advancements. For instance, in late 2007, Intel process technology will transition to 45 nm and effectively double the number of transistors in a given die size.

In 2008 Intel will follow this gain with a new microarchitecture codenamed “Nehalem” expected to deliver new capabilities and several percentage-point improvements in performance and energy-efficiency. This cycle will then move on to 32 nm and another new microarchitecture targeted for 2010.

A Fundamental Theorem of Multi-Core Processors

Multi-core processors take advantage of a fundamental relationship between power and frequency. By incorporating multiple cores, each core is able to run at a lower frequency, dividing among them the power normally given to a single core. The result is a significant performance increase over a single-core processor at comparable power. The following figures, based on our lab experiments with commonly used workloads, illustrate this key advantage.

Increasing clock frequency by 20 percent to a single core delivers a 13 percent performance gain, but requires 73 percent greater power. Conversely, decreasing clock frequency by 20 percent reduces power usage by 49 percent, but results in just a 13 percent performance loss.
Relative single-core frequency and voltage (from the illustration):
Over-clocked (+20%): 1.13x performance, 1.73x power
Max frequency (baseline): 1.00x performance, 1.00x power
Under-clocked (-20%): 0.87x performance, 0.51x power

Here we add a second core to the under-clocked example above. The result is a dual-core processor that, at 20 percent reduced clock frequency, effectively delivers 73 percent more performance while using approximately the same power as a single-core processor at maximum frequency: multi-core, energy-efficient performance.
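A quick back-of-the-envelope check of those figures (the numbers come from the illustration above and are workload-dependent, not a general law):

# Rough check of the dual-core under-clocking example from the text above.
perf_underclocked  = 0.87   # relative performance of one core at -20% frequency
power_underclocked = 0.51   # relative power of one core at -20% frequency
cores = 2
print(cores * perf_underclocked)    # ~1.74x performance ("73 percent more")
print(cores * power_underclocked)   # ~1.02x power (about the same as one core)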

The Quad-Core Line Up

First up are the Intel® Core™2 Extreme quad-core processor QX6700 and the new Quad-Core Intel® Xeon® 5300 processor for servers. Slated for introduction in late 2006, these 65 nm quad-core processors feature four complete execution cores within a single processor and are based upon the revolutionary and proven Intel® Core™ microarchitecture.

Intel® Core 2 Extreme Quad-Core Processor

World’s First Quad-Core for the Desktop. This quad-core desktop processor will be the ultimate gaming machine and multimedia processing engine for today’s growing list of threaded applications. In addition to being excellent for intensive multitasking, the Intel Core 2 Extreme quad-core processor will provide impressive gaming performance, offering plenty of headroom for tomorrow’s thread-intensive games. Gamers can expect a smoother, more exciting gaming experience through the distribution of artificial intelligence (AI), physics and rendering across four hardware threads. Ideal for processor-intensive, highly threaded applications, the Intel Core 2 Extreme quad-core processor will be the top choice for multimedia enthusiasts, gamers, and workers in demanding multitasking environments. It will feature 2.66 GHz core speed and 1066 MHz front side bus speed.

Intel® Xeon 5300 Processor

Breakthrough performance from the industry’s first quad-core standard high-volume processor. This new quad-core processor will enable server customers to boost their general-purpose servers with breakthrough energy-efficient performance, greater density, and fewer cooling challenges.

The Quad-Core Intel Xeon 5300 processor provides up to 50 percent better performance (SPECint_rate) than dual-core 2-way Intel Xeon processors on certain applications. The additional threads from quad-core technology and key Intel platform-level innovations deliver the most headroom for running multiple applications simultaneously and for virtualized environments on a two-way server.

The Quad-Core Intel Xeon 5300 series will feature core speeds from 1.60 GHz to 2.66 GHz, bus speeds of 1066 to 1333 MHz, and a 105-watt thermal design point (TDP). A low-power version (L5310) with a 50-watt TDP will be available in the first quarter of 2007. Another version will be available for single-processor servers and workstations in the same time frame.

Quad-Core Intel Xeon 5300 Processor Platforms: Advanced Capabilities

• Intel® Virtualization Technology (Intel® VT). This is the industry’s first hardware-assisted technology supporting today’s industry leading virtualization software.

• Fully-buffered DIMM Technology. The latest in memory technology, fully-buffered DIMM technology provides significantly greater performance and capacity while improving memory reliability.

• Intel® I/O Acceleration Technology. This unique Intel technology moves network data more efficiently through Intel Xeon processor-based servers for fast, scalable, and reliable networking.

Intel® Core™2 Quad processor Q6600

• Multimedia powerhouse for demanding entertainment applications
• Ideal choice for processor intensive, highly threaded applications
• 2.40 GHz core speed, 1066 MHz bus speed

Quad Core Intel® Xeon® processor L5310

• Low Power version of 5300
• 50 watt thermal design point
• 1.6 GHz core speed, 1066 MHz bus speed

Quad Core Intel® Xeon® processor 3200
• For single-processor servers and workstation systems

Transitioning the Industry to Multi-Core Processing

One immediate benefit of multi-core processors is how they improve an operating system’s ability to multitask applications. For instance, say you have a virus scan running in the background while you’re working on your word-processing application. This often degrades responsiveness so much that when you strike a key, there can be a delay before the letter actually appears on the screen. On multi-core processors, the operating system can schedule the tasks in different cores so that each task runs at full performance. Another major multi-core benefit comes from individual applications optimized for multi-core processors. These applications, when properly programmed, can split a task into multiple smaller tasks and run them in separate threads. For instance, a word processor can have “find and replace” run as a separate thread so doing a “find and replace” on a big document doesn’t have to keep you from continuing to write or edit.



In a game, a graphics algorithm needing extensive processing power could be one thread, rendering the next scene on the fly, while another thread responds to your commands for a character’s movements. The critical element in multi-core computing is the software. The throughput, energy efficiency, and multitasking performance of multi-core processors will all be more fully realized when application code is threaded and multi-core ready. Intel provides extensive partner programs with software developers, operating system vendors, ISVs, and academia to accelerate the delivery of dual-core and quad-core products. Intel has recently updated the Intel® Threading Building Blocks, Intel® Thread Profiler, and Intel® Thread Checker tools to support our quad-core products.
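As a minimal illustration of the threading idea (the “scan” and “typing” here are only stand-ins, and this sketch says nothing about any vendor’s tools), the fragment below runs a long background task in its own thread while the main thread keeps responding; on a multi-core processor the operating system is free to schedule the two threads on different cores.

# Minimal illustration: a long-running background task in its own thread,
# so the foreground work stays responsive.
import threading, time

def background_scan():
    for i in range(3):
        time.sleep(1)                     # stands in for scanning work
        print(f"  [scan] chunk {i} done")

worker = threading.Thread(target=background_scan, daemon=True)
worker.start()

for keystroke in "edit":                  # the foreground task keeps running
    print(f"typed '{keystroke}'")
    time.sleep(0.4)

worker.join()                             # wait for the scan to finish before exiting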

AMD Processors

1. AMD Turion™ 64 Mobile Technology
2. AMD Turion™ 64 X2 Dual-Core Mobile Technology

AMD Turion™ 64 Mobile Technology

AMD Turion™ 64 mobile technology is the most advanced family of simultaneous 32- and 64-bit Windows®-compatible processors made for mobility, uniquely optimized to deliver AMD64 performance in thinner and lighter notebook designs with longer battery life, enhanced security, and compatibility with the latest wireless and graphics technologies.

AMD Turion™ 64 Mobile Technology: Key Architectural Features

The AMD64 core provides leading-edge 32-bit performance, seamless 32- to 64-bit migration, and therefore investment protection.

AMD64 technology features uncompromising 32-bit performance and is ready for tomorrow’s software, today.
Vastly expands memory addressability with 40-bit physical addresses, 48-bit virtual addresses
Doubles the number of internal registers with eight additional (sixteen total) 64-bit integer registers and eight additional (sixteen total) 128-bit SSE/SSE2/SSE3 registers


Incorporating 3DNow!™ Professional technology, SSE2, and SSE3, AMD Turion 64 mobile technology is compatible with the largest installed base of multimedia-enhanced software.

Enhanced Virus Protection with Windows® XP SP2

Enhanced Virus Protection enabled by Windows XP Service Pack 2 is designed to prevent the spread of certain viruses, like MSBlaster and Slammer, significantly reducing the cost and down-time associated with similar viruses and improving the protection of computers and personal information against certain PC viruses. Enhanced Virus Protection will by default only protect the user's Windows operating system. Users must, at installation of Microsoft Windows® XP Service Pack 2 or when they first purchase their system, choose to enable the protection of their applications and associated files from memory buffer overrun attacks. AMD and Microsoft strongly recommend that users use third party anti-virus software as part of their security strategy.
High-bandwidth, low-latency integrated memory controller

Provides a performance boost by directly connecting the processor to the memory, thus dramatically reducing memory latency.
Supports industry-standard, widely available PC3200, PC2700, PC2100, or PC1600 DDR SDRAM memory
Unbuffered SO-DIMMs
Features ECC protection that enables increased system reliability, helping to ensure your systems run smoothly
Up to 3.2 GB/s memory bandwidth

HyperTransport™ technology for high-speed I/O communication

HyperTransport technology helps increase overall system performance by reducing traditional system bottlenecks, increasing I/O bandwidth, and reducing I/O latency, which translates into better overall performance.
One 16-bit link supporting up to 1600 MHz
Up to 6.4 GB/s peak HyperTransport™ I/O bandwidth

Large high performance, on-chip cache

64KB Level 1 instruction cache
64KB Level 1 data cache
Up to 1MB Level 2 cache
Improved branch prediction for greater accuracy in anticipating instruction calls
Enhanced TLB structures for better memory management of complex workloads

Made for mobility

AMD PowerNow!™ technology extends battery life by dynamically switching the performance states (processor core voltage and operating frequency) based on processor performance requirements
Additional C3 Deeper Sleep state reduces power consumption during idle moments



Wireless Compatibility

Compatible with currently available 802.11 a, b and g wireless solutions
AMD enables a choice of best-in-class wireless solutions
Packaging: 754-pin lidless micro PGA, organic package. Die size: approximately 114 million transistors on 115 mm². Processors are manufactured using AMD's state-of-the-art 90 nm SOI technology, optimized for mobile processors, at AMD's Fab 30 wafer fabrication facility in Dresden, Germany.

AMD Opteron™ Processors

Next-Generation AMD Opteron™ processors add important new features like quad-core upgradeability, AMD Virtualization™ (AMD-V™), and energy-efficient DDR2 memory to the proven technologies introduced in 2003 with first-generation AMD Opteron processors.
The advanced technologies in Opteron processors:

AMD64 technology
Runs existing installed base of 32-bit applications and operating systems at peak performance, while providing a 64-bit capable migration path
Designed to enable 64-bit computing while remaining compatible with the vast x86 software infrastructure
Enables a single architecture across 32- and 64-bit environments

Direct Connect Architecture
AMD’s revolutionary Direct Connect Architecture helps eliminate the bottlenecks inherent in traditional front-side bus architectures
Direct Connect Architecture connects the processors, integrated memory controller, and I/O directly to the CPU and communicates at CPU speed
HyperTransport™ technology provides a scalable bandwidth interconnect between processors, I/O subsystems, and other chipsets, with up to three coherent HyperTransport technology links providing up to 24.0 GB/s peak bandwidth per processor

Integrated Memory Controller - Integrated on-die DDR2 DRAM memory controller offers available memory bandwidth up to 10.7 GB/s (with DDR2-667) per processor
AMD Opteron™ processors with DDR2 memory are designed to offer a seamless upgrade path from dual-core to quad-core when they are available in 2007 in the same thermal envelope to help leverage existing investments
Offers a significant performance improvement while maintaining the same platform at the same power efficiency

AMD Virtualization™ (AMD-V™)

AMD-V provides hardware support for virtualization in the processor itself, helping virtualization software run multiple operating systems and applications on a single machine more efficiently.

Quad-Core Upgradeability of Next-Generation AMD Opteron™ Processors

AMD introduced the first multi-core technology for x86-based servers and workstations with the Dual-Core AMD Opteron™ processor launch in April 2005. Next-Generation AMD Opteron processors add DDR2 memory and AMD Virtualization™ to further enhance AMD’s industry leading virtualization capabilities. Quad-Core AMD Opteron processors, planned for 2007, represent the next milestone in our multi-core roadmap.

The upgrade path built into Next-Generation AMD Opteron processors is enabled by AMD64 technology with Direct Connect Architecture. We designed the AMD64 computing platform from the ground up to be optimized for multiple cores. At the heart of AMD64 is our innovative Direct Connect Architecture, which helps eliminate the bottlenecks inherent in traditional front-side bus architecture by directly connecting the processors, the memory controller, and the I/O to the central processing unit (CPU).

Multi Core Processors : Conclusion

Intel's quad-core processors and AMD's advanced processors are the processors of the future, offering superior computing capabilities to users. Each processor family has its own advantages in different areas.

Multi Core Processors : References

www.tomshardware.com
www.intel.com
www.amd.com
www.wikipedia.org
