DFS Architecture
The DFS layer is a network of significant peers (peers whose particular
identity matters) that possess file hierarchies (or namespaces), file
data and raw storage, and share them according to the authoritative
policy set by their owners.
Each peer may link foreign resources into its own resource hierarchy.
Such links aggregate distributed resources and make them accessible by
navigating the peer's own hierarchy, much like links in Web pages make
foreign content accessible from a single place.
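To make the linking idea concrete, the following Python sketch models a peer's hierarchy in which a directory entry may be a link to a path owned by another peer. All class and function names (`Directory`, `ForeignLink`, `resolve`) and the peer name `peer-B` are hypothetical. Resolution simply stops at the link and reports where the rest of the path continues, since actually fetching the foreign resource is out of scope for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass(frozen=True)
class ForeignLink:
    """Hypothetical link entry pointing into another peer's hierarchy."""
    peer: str  # identity of the peer that owns the linked resource
    path: str  # path inside that peer's hierarchy


@dataclass
class Directory:
    # Entry values: a subdirectory, a foreign link, or file content (bytes).
    entries: Dict[str, Union["Directory", ForeignLink, bytes]] = field(default_factory=dict)


def resolve(root: Directory, path: str):
    """Walk a local path; on reaching a foreign link, return a link to the remote target."""
    node = root
    parts = [p for p in path.split("/") if p]
    for i, name in enumerate(parts):
        if not isinstance(node, Directory):
            raise NotADirectoryError(name)
        node = node.entries[name]
        if isinstance(node, ForeignLink):
            # The rest of the path continues inside the foreign peer's hierarchy.
            rest = "/".join(parts[i + 1:])
            target = (node.path.rstrip("/") + ("/" + rest if rest else "")) or "/"
            return ForeignLink(node.peer, target)
    return node


home = Directory({
    "docs": Directory({"notes.txt": b"local notes"}),
    "shared": ForeignLink("peer-B", "/projects"),
})
print(resolve(home, "/docs/notes.txt"))     # b'local notes'
print(resolve(home, "/shared/report.txt"))  # ForeignLink(peer='peer-B', path='/projects/report.txt')
```

Local entries resolve to their content, while any path that crosses a link is handed off as a reference into the owning peer's hierarchy.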
The identity of a peer is defined cryptographically with a public-private
key pair. Peers can authenticate themselves and their communication.
They can also protect their data and their communication with
encryption. The owner of the identity may be a physical person, a
software entity, or a cooperating group of either. Owners set the access
policy for the peer's resources and the policy for interaction with other peers.
The DFS provides loose, eventual consistency semantics. Conflicting
concurrent accesses are not resolved by the DFS, but they can be detected.
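The text does not say how conflicts are detected. One standard technique, assumed here purely for illustration, is a version vector per resource: each replica counts the updates it has seen from every peer, and two versions conflict when neither vector covers the other. The names `bump`, `dominates` and `compare` are hypothetical.

```python
from typing import Dict

VersionVector = Dict[str, int]  # peer identity -> number of updates seen from that peer


def bump(vv: VersionVector, peer: str) -> VersionVector:
    """Return a copy of `vv` recording one more update made by `peer`."""
    out = dict(vv)
    out[peer] = out.get(peer, 0) + 1
    return out


def dominates(a: VersionVector, b: VersionVector) -> bool:
    """True if `a` has seen every update that `b` has seen."""
    return all(a.get(p, 0) >= n for p, n in b.items())


def compare(a: VersionVector, b: VersionVector) -> str:
    a_covers_b, b_covers_a = dominates(a, b), dominates(b, a)
    if a_covers_b and b_covers_a:
        return "equal"
    if a_covers_b:
        return "a newer"
    if b_covers_a:
        return "b newer"
    return "conflict"  # concurrent updates: detected here, but not resolved


base = bump({}, "peer-A")
on_a = bump(base, "peer-A")  # peer-A edits its replica
on_b = bump(base, "peer-B")  # peer-B edits concurrently
print(compare(on_a, base))   # a newer
print(compare(on_a, on_b))   # conflict
```

Detection only requires comparing two small maps, which suits the loose consistency model: nothing forces the peers to agree on an ordering, and a detected conflict is simply reported.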
Peers own and completely control the resources that are locally available
to them. These resources are globally identified by URIs constructed
from the identity of their owner peer together with a local identifier,
which can be a file path or a random string. These URIs are the
references used in peer-to-peer links and in any other resource
designation; this makes the identity of each peer significant and the
specific owner peer indispensable for reaching its resources.
Returning to the Web page analogy, owners can shape their peers'
resources as freely as one can edit one's own web page. Owners can also
link to foreign resources but, as on the Web, they cannot control the
resources they link to; this is an important consideration for the DFS
architecture. The model is particularly convenient for creating shared
workspaces, where everyone's resources are available for reading while
each owner writes only their own.
There are two types of resources: filesystem and storage. Filesystem
resources are hierarchical file namespaces including (logically) the
content of files. Storage resources refer to raw disk storage.
Note that a filesystem resource is a higher-level, content-based
resource, whereas storage is a low-level, consumable computing resource.
Access to resources is requested with Actions, which are well-defined
communication tokens. Actions always identify both the Authority, who
owns the resource to be accessed, and the Agent, who requests the
access, providing accountability throughout the network. Authorities
have a separate communication endpoint, called a Service, for each type
of resource; the destination Service for an Action is either a
Filesystem Service or a Storage Service.
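An Action can be pictured as an immutable record. The minimal sketch below assumes the field names, the `operation` verbs and the `Service` enumeration; it only enforces the property stated above, that every Action names both its Authority and its Agent and targets one of the two Service types.

```python
from dataclasses import dataclass
from enum import Enum


class Service(Enum):
    FILESYSTEM = "fs"
    STORAGE = "storage"


@dataclass(frozen=True)
class Action:
    authority: str    # identity of the peer that owns the resource
    agent: str        # identity of the peer requesting access
    service: Service  # which of the Authority's endpoints receives the Action
    operation: str    # hypothetical verb, e.g. "read" or "write"
    target: str       # Path (Filesystem Service) or Handle (Storage Service)

    def __post_init__(self) -> None:
        # Both parties are always named, so every request is attributable.
        if not self.authority or not self.agent:
            raise ValueError("an Action must identify both Authority and Agent")
        if not isinstance(self.service, Service):
            raise TypeError("service must be a Service")


read_notes = Action("peer-A", "peer-B", Service.FILESYSTEM, "read", "/docs/notes.txt")
```

Making the record frozen means an Action cannot be altered after it is issued, which keeps the accountability information intact as it travels between peers.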
A resource URI encodes the Cryptographic Identity of the Authority,
the Authority Service and a Path or Handle for local identification
(for Filesystem or Storage resources, respectively).
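The URI syntax is not specified here, so the sketch below assumes a hypothetical layout `dfs://<identity>/<service>/<path-or-handle>` and, also as an assumption, derives the identity as the SHA-256 fingerprint of the peer's public key. It checks that the three components survive a round trip through the string form.

```python
import hashlib
from typing import NamedTuple
from urllib.parse import quote, unquote

SCHEME = "dfs://"  # hypothetical scheme name


def identity_of(public_key: bytes) -> str:
    """Hypothetical: a peer identity as the SHA-256 fingerprint of its public key."""
    return hashlib.sha256(public_key).hexdigest()


class ResourceURI(NamedTuple):
    authority: str  # cryptographic identity of the owning peer
    service: str    # "fs" or "storage"
    local_id: str   # Path for "fs", Handle for "storage"

    def __str__(self) -> str:
        return f"{SCHEME}{self.authority}/{self.service}/{quote(self.local_id.lstrip('/'))}"


def parse_uri(uri: str) -> ResourceURI:
    if not uri.startswith(SCHEME):
        raise ValueError("not a DFS resource URI")
    authority, service, local = uri[len(SCHEME):].split("/", 2)
    if service not in ("fs", "storage"):
        raise ValueError(f"unknown service {service!r}")
    local = unquote(local)
    if service == "fs":
        local = "/" + local  # restore the absolute path stripped by __str__
    return ResourceURI(authority, service, local)


owner = identity_of(b"example public key bytes")
notes = ResourceURI(owner, "fs", "/docs/my notes.txt")
assert parse_uri(str(notes)) == notes
```

Because the authority component is derived from the key pair, any peer holding the URI knows which identity must answer for the resource, which is what makes the owner peer indispensable.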
Actions can be filtered through a pluggable external filter, a Policy
Enforcement Point (PEP), which communicates with its own policy service
to provide authorisation. Users may associate arbitrary data with
files, which can be included in the Action tokens given to the PEP;
the PEP can use this data as credentials.
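A PEP can be sketched as a pluggable predicate over Actions. In this hypothetical example the policy service is a plain dictionary mapping a project name to permitted Agents, and the user-associated file data carries a `project` tag that the PEP treats as a credential; none of these names come from the DFS itself.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Action:
    authority: str
    agent: str
    operation: str
    target: str
    # Hypothetical: user-defined data associated with the file, carried for the PEP.
    file_data: Dict[str, str] = field(default_factory=dict)


# A PEP is modelled as any callable returning True to permit the Action.
PEP = Callable[[Action], bool]


def project_pep(policy_service: Dict[str, Set[str]]) -> PEP:
    """Hypothetical PEP: permit Agents listed by the policy service for the file's project."""
    def check(action: Action) -> bool:
        project = action.file_data.get("project")
        return project is not None and action.agent in policy_service.get(project, set())
    return check


class FilteredService:
    """Service front end that runs every plugged-in PEP before serving an Action."""

    def __init__(self, peps: List[PEP]) -> None:
        self.peps = peps

    def handle(self, action: Action) -> str:
        if not all(pep(action) for pep in self.peps):
            return "denied"
        return f"{action.operation} {action.target} for {action.agent}"


svc = FilteredService([project_pep({"apollo": {"peer-B"}})])
tagged = {"project": "apollo"}
print(svc.handle(Action("peer-A", "peer-B", "read", "/docs/plan.txt", tagged)))  # read /docs/plan.txt for peer-B
print(svc.handle(Action("peer-A", "peer-C", "read", "/docs/plan.txt", tagged)))  # denied
```

Keeping the PEP outside the service, behind a simple callable interface, lets an owner swap authorisation schemes without touching the code that serves Actions.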
Aggregation by cross-peer Filesystem links logically brings remote files
into a local directory. Similarly, remote Storage resources can be
logically joined into a single aggregated virtual image, a Storage Pool,
which is managed by the VBS subsystem. Storage for new or expanding
files is allocated from such pools. Every Filesystem Service is
associated with a primary Storage Pool. A peer can offer storage to
specific peers for use in their filesystems by registering Storage
resources with those peers' primary Storage Pools.
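The sketch below illustrates pool aggregation and allocation under simple assumptions: each registered Storage resource offers a fixed byte capacity, and allocation fills members in registration order, returning extents as (resource URI, size) pairs. The VBS's actual allocation strategy is not described here, and the `dfs://` URIs and class names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


class PoolExhausted(Exception):
    pass


@dataclass
class StorageResource:
    uri: str       # a remote Storage resource offered to this pool
    capacity: int  # bytes offered
    used: int = 0


class StoragePool:
    """Hypothetical aggregated virtual image built from registered Storage resources."""

    def __init__(self) -> None:
        self.members: List[StorageResource] = []

    def register(self, resource: StorageResource) -> None:
        # A peer offers storage by registering it with another peer's primary pool.
        self.members.append(resource)

    def free(self) -> int:
        return sum(r.capacity - r.used for r in self.members)

    def allocate(self, size: int) -> List[Tuple[str, int]]:
        """Fill members in registration order; return (resource URI, bytes) extents."""
        if size > self.free():
            raise PoolExhausted(f"need {size} bytes, {self.free()} free")
        extents = []
        for r in self.members:
            if size == 0:
                break
            take = min(size, r.capacity - r.used)
            if take > 0:
                r.used += take
                extents.append((r.uri, take))
                size -= take
        return extents


primary = StoragePool()  # e.g. the primary pool of one Filesystem Service
primary.register(StorageResource("dfs://peer-B/storage/h1", 100))
primary.register(StorageResource("dfs://peer-C/storage/h7", 50))
print(primary.allocate(120))  # [('dfs://peer-B/storage/h1', 100), ('dfs://peer-C/storage/h7', 20)]
print(primary.free())         # 30
```

A single file allocation can thus span storage contributed by several peers, while the filesystem sees only one pool.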
Storage allocation is a function distinct from storage access. Allocation
is performed by the VBS, while access is requested by DFS Clients and
is, initially, also served by the VBS. The architecture allows pluggable
modules that implement allocation and access for different kinds of
storage servers, such as HTTP, FTP, BitTorrent, or Grid-specific ones.
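The separation of allocation from access, and the pluggable modules, can be sketched as an abstract interface plus a registry that picks a module by the URL scheme of a storage location. `StorageBackend`, `InMemoryBackend` and `BackendRegistry` are hypothetical; the in-memory module stands in for real HTTP, FTP or BitTorrent modules, which are not shown.

```python
from abc import ABC, abstractmethod
from typing import Dict
from urllib.parse import urlparse


class StorageBackend(ABC):
    """Hypothetical plug-in interface: allocation and access for one kind of server."""

    @abstractmethod
    def allocate(self, size: int) -> str:
        """Reserve `size` bytes and return a location URL."""

    @abstractmethod
    def write(self, location: str, offset: int, data: bytes) -> None: ...

    @abstractmethod
    def read(self, location: str, offset: int, length: int) -> bytes: ...


class InMemoryBackend(StorageBackend):
    """Stand-in for a real module (HTTP, FTP, BitTorrent, ...)."""

    def __init__(self, scheme: str = "mem") -> None:
        self.scheme = scheme
        self.blobs: Dict[str, bytearray] = {}

    def allocate(self, size: int) -> str:
        location = f"{self.scheme}://blob{len(self.blobs)}"
        self.blobs[location] = bytearray(size)
        return location

    def write(self, location: str, offset: int, data: bytes) -> None:
        self.blobs[location][offset:offset + len(data)] = data

    def read(self, location: str, offset: int, length: int) -> bytes:
        return bytes(self.blobs[location][offset:offset + length])


class BackendRegistry:
    """Selects the plug-in module by the URL scheme of a storage location."""

    def __init__(self) -> None:
        self.modules: Dict[str, StorageBackend] = {}

    def register(self, scheme: str, module: StorageBackend) -> None:
        self.modules[scheme] = module

    def backend_for(self, location: str) -> StorageBackend:
        scheme = urlparse(location).scheme
        if scheme not in self.modules:
            raise LookupError(f"no storage module for scheme {scheme!r}")
        return self.modules[scheme]


registry = BackendRegistry()
mem = InMemoryBackend()
registry.register("mem", mem)
loc = mem.allocate(11)                                   # allocation (the VBS's role)
registry.backend_for(loc).write(loc, 0, b"hello world")  # access on behalf of a client
print(registry.backend_for(loc).read(loc, 6, 5))         # b'world'
```

Because access is routed by location rather than by who allocated it, a different component could serve access later without changing how allocation works.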
As Client peers navigate through the network, they may cache remote
resources according to policy. Cached resources persist so that they
remain available during disconnected operation. Remote access attempts
can also be configured to be logged persistently, so that they can be
retried after network recovery or application restart.
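A persistent retry log could look like the following sketch, which assumes a JSON-lines file: each failed remote access is appended and flushed to disk, and after recovery the log is replayed, keeping only the attempts that fail again. The file format, the `send` callback and the record fields are all assumptions.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List


class RetryLog:
    """Hypothetical append-only log of remote accesses that could not complete."""

    def __init__(self, path: str) -> None:
        self.path = path

    def record(self, attempt: Dict) -> None:
        # One JSON object per line, fsync'd so it survives an application restart.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(attempt) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def pending(self) -> List[Dict]:
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def replay(self, send: Callable[[Dict], bool]) -> int:
        """Retry each logged attempt; keep only those that fail again."""
        remaining = [a for a in self.pending() if not send(a)]
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            for a in remaining:
                f.write(json.dumps(a) + "\n")
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)  # swap in the shortened log in one step
        return len(remaining)


log = RetryLog(os.path.join(tempfile.mkdtemp(), "retry.jsonl"))
log.record({"op": "write", "uri": "dfs://peer-B/fs/docs/a.txt"})
log.record({"op": "read", "uri": "dfs://peer-C/fs/b.txt"})
# After network recovery, suppose only peer-B is reachable again:
print(log.replay(lambda a: "peer-B" in a["uri"]))  # 1
print(log.pending())  # [{'op': 'read', 'uri': 'dfs://peer-C/fs/b.txt'}]
```

Writing the shortened log to a temporary file and then replacing the original avoids leaving a half-written log if the process stops during replay.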
DFS peering mechanisms include primitives for publish-subscribe
communication and remote event notification. These mechanisms handle
the significant asynchrony in the network, which disconnected operation
amplifies further. Applications can also access these primitives
through the DFS Client interface.
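The publish-subscribe primitives are not specified in detail, so the sketch below is only a local, in-process stand-in: subscribers register handlers for a topic (here, a hypothetical resource URI), and events published while disconnected are queued and delivered on reconnection. `EventBus` and its methods are hypothetical names.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List, Tuple

Handler = Callable[[str, Any], None]


class EventBus:
    """Hypothetical in-process stand-in for DFS publish-subscribe primitives."""

    def __init__(self) -> None:
        self.subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.queued: List[Tuple[str, Any]] = []  # events held while disconnected
        self.connected = True

    def subscribe(self, topic: str, handler: Handler) -> None:
        self.subscribers[topic].append(handler)

    def publish(self, topic: str, event: Any) -> None:
        if not self.connected:
            self.queued.append((topic, event))  # deliver after reconnection
            return
        for handler in self.subscribers[topic]:
            handler(topic, event)

    def disconnect(self) -> None:
        self.connected = False

    def reconnect(self) -> None:
        self.connected = True
        pending, self.queued = self.queued, []
        for topic, event in pending:
            self.publish(topic, event)


seen: List[str] = []
bus = EventBus()
bus.subscribe("dfs://peer-B/fs/docs", lambda topic, event: seen.append(event))
bus.disconnect()
bus.publish("dfs://peer-B/fs/docs", "report.txt modified")
print(seen)  # []
bus.reconnect()
print(seen)  # ['report.txt modified']
```

Decoupling publishers from subscribers in this way lets notifications outlive a disconnection instead of being lost, which is the kind of asynchrony the peering primitives are meant to absorb.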