Distributed computing

https://en.wikipedia.org/wiki/Distributed_computing

From Wikipedia, the free encyclopedia
"Distributed Information Processing" redirects here. For the computer company, see DIP Research.
Distributed computing is a field of computer science that studies distributed systems. A distributed system is a software system in which components located on networked computers communicate and coordinate their actions by passing messages.[1] The components interact with each other in order to achieve a common goal. Three significant characteristics of distributed systems are: concurrency of components, lack of a global clock, and independent failure of components.[1] Examples of distributed systems vary from SOA-based systems to massively multiplayer online games to peer-to-peer applications.
A computer program that runs in a distributed system is called a distributed program, and distributed programming is the process of writing such programs.[2] There are many alternatives for the message passing mechanism, including pure HTTP, RPC-like connectors and message queues.
A goal and challenge pursued by some computer scientists and practitioners in distributed systems is location transparency; however, this goal has fallen out of favour in industry, as distributed systems are different from conventional non-distributed systems, and the differences, such as network partitions, partial system failures, and partial upgrades, cannot simply be "papered over" by attempts at "transparency" (see CAP theorem).
Distributed computing also refers to the use of distributed systems to solve computational problems. In distributed computing, a problem is divided into many tasks, each of which is solved by one or more computers,[3] which communicate with each other by message passing.[4]
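Whatever the mechanism (pure HTTP, RPC-like connectors, message queues), message passing reduces to a send and a blocking receive. As a minimal sketch, not taken from the article, the following Python fragment passes messages between two operating-system processes over a pipe; the "node" name and the trivial acknowledgement protocol are illustrative assumptions only:

from multiprocessing import Process, Pipe

def node(conn):
    # A "node": it shares no state with the sender and reacts
    # only to the messages it receives.
    msg = conn.recv()             # blocking receive
    conn.send("ack: " + msg)      # reply message
    conn.close()

if __name__ == "__main__":
    parent_end, child_end = Pipe()
    p = Process(target=node, args=(child_end,))
    p.start()
    parent_end.send("hello")      # send a message to the other process
    print(parent_end.recv())      # prints: ack: hello
    p.join()

The same request/reply pattern carries over to real networks; only the transport (sockets, HTTP, a queueing service) changes.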

Introduction

The word distributed in terms such as "distributed system", "distributed programming", and "distributed algorithm" originally referred to computer networks where individual computers were physically distributed within some geographical area.[5] The terms are nowadays used in a much wider sense, even referring to autonomous processes that run on the same physical computer and interact with each other by message passing.[4] While there is no single definition of a distributed system,[6] the following defining properties are commonly used:
  • There are several autonomous computational entities, each of which has its own local memory.[7]
  • The entities communicate with each other by message passing.[8]
In this article, the computational entities are called computers or nodes.
A distributed system may have a common goal, such as solving a large computational problem.[9] Alternatively, each computer may have its own user with individual needs, and the purpose of the distributed system is to coordinate the use of shared resources or provide communication services to the users.[10]
Other typical properties of distributed systems include the following:
  • The system has to tolerate failures in individual computers.[11]
  • The structure of the system (network topology, network latency, number of computers) is not known in advance, the system may consist of different kinds of computers and network links, and the system may change during the execution of a distributed program.[12]
  • Each computer has only a limited, incomplete view of the system. Each computer may know only one part of the input.[13]
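The last property can be made concrete with a small sketch. In the hypothetical Python fragment below (not from the article; the shards and placeholder computation are invented), each node holds only its own shard of the input in local memory and reports a partial sum by message passing; no node ever sees the whole input:

from multiprocessing import Process, Queue

def node(node_id, shard, results):
    # Local memory only: this process sees just its own shard.
    results.put((node_id, sum(shard)))    # report by message passing

if __name__ == "__main__":
    shards = [[1, 2], [3, 4], [5, 6]]     # the global input, split up
    results = Queue()
    nodes = [Process(target=node, args=(i, s, results))
             for i, s in enumerate(shards)]
    for n in nodes:
        n.start()
    total = sum(results.get()[1] for _ in nodes)   # combine partial results
    for n in nodes:
        n.join()
    print(total)                          # prints: 21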

Parallel and distributed computing

[Figure] (a), (b): a distributed system. (c): a parallel system.
Distributed systems are groups of networked computers that share a common goal for their work. The terms "concurrent computing", "parallel computing", and "distributed computing" have a lot of overlap, and no clear distinction exists between them.[14] The same system may be characterized both as "parallel" and "distributed"; the processors in a typical distributed system run concurrently in parallel.[15] Parallel computing may be seen as a particularly tightly coupled form of distributed computing,[16] and distributed computing may be seen as a loosely coupled form of parallel computing.[6] Nevertheless, it is possible to roughly classify concurrent systems as "parallel" or "distributed" using the following criteria:
  • In parallel computing, all processors may have access to a shared memory to exchange information between processors.[17]
  • In distributed computing, each processor has its own private memory (distributed memory). Information is exchanged by passing messages between the processors.[18]
The figure captioned above illustrates the difference between distributed and parallel systems. Figure (a) is a schematic view of a typical distributed system; as usual, the system is represented as a network topology in which each node is a computer and each line connecting the nodes is a communication link. Figure (b) shows the same distributed system in more detail: each computer has its own local memory, and information can be exchanged only by passing messages from one node to another by using the available communication links. Figure (c) shows a parallel system in which each processor has direct access to a shared memory.
The situation is further complicated by the traditional uses of the terms parallel and distributed algorithm that do not quite match the above definitions of parallel and distributed systems (see below for more detailed discussion). Nevertheless, as a rule of thumb, high-performance parallel computation in a shared-memory multiprocessor uses parallel algorithms while the coordination of a large-scale distributed system uses distributed algorithms.
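This rule of thumb can be illustrated directly. In the sketch below (Python; an illustrative toy, not from the article), the first group of workers exchanges information through a shared counter, in the spirit of a shared-memory parallel system, while the second group keeps its data private and communicates purely by messages, in the spirit of a distributed system:

from multiprocessing import Process, Queue, Value

def shared_memory_worker(counter, n):
    # "Parallel" style: information is exchanged via shared memory.
    with counter.get_lock():              # the lock guards the shared word
        counter.value += n

def message_passing_worker(outbox, n):
    # "Distributed" style: private data, communicated as a message.
    outbox.put(n)

if __name__ == "__main__":
    counter = Value("i", 0)               # shared memory: a lock-guarded int
    ps = [Process(target=shared_memory_worker, args=(counter, k))
          for k in (1, 2, 3)]
    for p in ps:
        p.start()
    for p in ps:
        p.join()
    print(counter.value)                  # prints: 6, via shared memory

    outbox = Queue()                      # message channel, no shared state
    ps = [Process(target=message_passing_worker, args=(outbox, k))
          for k in (1, 2, 3)]
    for p in ps:
        p.start()
    print(sum(outbox.get() for _ in ps))  # prints: 6, via messages
    for p in ps:
        p.join()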

History

The use of concurrent processes that communicate by message-passing has its roots in operating system architectures studied in the 1960s.[19] The first widespread distributed systems were local-area networks such as Ethernet, which was invented in the 1970s.[20]
ARPANET, the predecessor of the Internet, was introduced in the late 1960s, and ARPANET e-mail was invented in the early 1970s. E-mail became the most successful application of ARPANET,[21] and it is probably the earliest example of a large-scale distributed application. In addition to ARPANET and its successor, the Internet, other early worldwide computer networks included Usenet and FidoNet from the 1980s, both of which were used to support distributed discussion systems.
The study of distributed computing became its own branch of computer science in the late 1970s and early 1980s. The first conference in the field, Symposium on Principles of Distributed Computing (PODC), dates back to 1982, and its European counterpart International Symposium on Distributed Computing (DISC) was first held in 1985.

Architectures

Various hardware and software architectures are used for distributed computing. At a lower level, it is necessary to interconnect multiple CPUs with some sort of network, regardless of whether that network is printed onto a circuit board or made up of loosely coupled devices and cables. At a higher level, it is necessary to interconnect processes running on those CPUs with some sort of communication system.
Distributed programming typically falls into one of several basic architectures: client–server, three-tier, n-tier, or peer-to-peer; or categories: loose coupling, or tight coupling.
  • Client–server: architectures where smart clients contact the server for data, then format and display it to the users. Input at the client is committed back to the server when it represents a permanent change. (A minimal sketch of this pattern follows the list below.)
  • Three-tier: architectures that move the client intelligence to a middle tier so that stateless clients can be used. This simplifies application deployment. Most web applications are three-tier.
  • n-tier: architectures that refer typically to web applications which further forward their requests to other enterprise services. This type of application is the one most responsible for the success of application servers.
  • Peer-to-peer: architectures where there are no special machines that provide a service or manage the network resources. Instead, all responsibilities are uniformly divided among all machines, known as peers. Peers can serve both as clients and as servers.
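As a minimal sketch of the client–server pattern referenced above (Python sockets; the loopback address, port, and trivial echo "service" are illustrative assumptions, not from the article):

import socket
from threading import Event, Thread

HOST, PORT = "127.0.0.1", 50007           # hypothetical address for the demo

def server(ready):
    # The server owns the service; clients contact it on demand.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((HOST, PORT))
        s.listen(1)
        ready.set()                       # tell the client the server is up
        conn, _ = s.accept()
        with conn:
            request = conn.recv(1024)            # request from the client
            conn.sendall(b"echo: " + request)    # response back to it

if __name__ == "__main__":
    ready = Event()
    Thread(target=server, args=(ready,), daemon=True).start()
    ready.wait()                          # avoid connecting before bind/listen
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as c:
        c.connect((HOST, PORT))
        c.sendall(b"hello")
        print(c.recv(1024))               # prints: b'echo: hello'

Here the server runs in a thread only to keep the demo self-contained; in a real deployment the two endpoints would be separate machines.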
Another basic aspect of distributed computing architecture is the method of communicating and coordinating work among concurrent processes. Through various message passing protocols, processes may communicate directly with one another, typically in a master/slave relationship. Alternatively, a "database-centric" architecture can enable distributed computing to be done without any form of direct inter-process communication, by utilizing a shared database.[22]
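A hedged sketch of the database-centric style: in the Python fragment below (the file name, schema, and placeholder computation are invented for illustration), workers coordinate only through a shared SQLite database, claiming pending tasks atomically instead of messaging each other directly:

import sqlite3

DB = "coordination.db"                    # stands in for a shared networked DB

def claim_and_run():
    # Coordinate via the database alone: atomically claim one pending
    # task and write the result back; no direct inter-process messages.
    con = sqlite3.connect(DB, isolation_level=None)
    con.execute("BEGIN IMMEDIATE")        # take the write lock: claims are atomic
    row = con.execute(
        "SELECT id FROM tasks WHERE status = 'pending' LIMIT 1").fetchone()
    if row is not None:
        con.execute("UPDATE tasks SET status = 'done', result = ? WHERE id = ?",
                    (row[0] * 10, row[0]))    # placeholder "work"
    con.execute("COMMIT")
    con.close()

if __name__ == "__main__":
    con = sqlite3.connect(DB)
    con.execute("CREATE TABLE IF NOT EXISTS tasks "
                "(id INTEGER PRIMARY KEY, status TEXT, result INTEGER)")
    con.execute("DELETE FROM tasks")
    con.executemany("INSERT INTO tasks (status) VALUES (?)", [("pending",)] * 3)
    con.commit()
    con.close()
    claim_and_run()                       # in practice, run from many workers
    claim_and_run()
    con = sqlite3.connect(DB)
    print(con.execute("SELECT id, status, result FROM tasks ORDER BY id").fetchall())
    con.close()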

Applications

Reasons for using distributed systems and distributed computing may include:
  1. The very nature of an application may require the use of a communication network that connects several computers: for example, data produced in one physical location and required in another location.
  2. There are many cases in which the use of a single computer would be possible in principle, but the use of a distributed system is beneficial for practical reasons. For example, it may be more cost-efficient to obtain the desired level of performance by using a cluster of several low-end computers, in comparison with a single high-end computer. A distributed system can provide more reliability than a non-distributed system, as there is no single point of failure. Moreover, a distributed system may be easier to expand and manage than a monolithic uniprocessor system.[23]
Ghaemi et al. define a distributed query as a query "that selects data from databases located at multiple sites in a network" and offer as an SQL example:[24]

SELECT e.name, d.name
FROM company.emp e, company.dept@sales.goods d
WHERE e.deptno = d.deptno
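Here the @sales.goods suffix appears to be an Oracle-style database link, marking company.dept as a table stored at a remote site; the query thus joins a local table (company.emp) with a remote one over the network.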
