Difference: Arch (1 vs. 10)

Revision 10 - 2010-04-26 - KonstantinosNikas

Line: 1 to 1

META TOPICPARENT	name="WebHome"

Computer Architecture

Line: 20 to 20

Added:

>
>

Transactional Memory

Revision 9 - 2008-04-01 - KonstantinosNikas

Line: 1 to 1

META TOPICPARENT	name="WebHome"

Computer Architecture

Line: 13 to 13

Simultaneous multithreading (SMT)
Cell Broadband Engine (Cell)
General-purpose computing on graphics processing units (GPGPU)

Added:

>
>

Caches for Chip Multiprocessor Architectures (CMPs)

Relevant Project Activities

Added:

>
>

Caches for CMPs

Revision 8 - 2008-03-13 - KorniliosKourtis

Line: 1 to 1

META TOPICPARENT	name="WebHome"

Computer Architecture

Added:

>
>

A major focus of our group's research in computer architecture is the exploration and evaluation of modern and emerging architecture designs. Recent developments in microprocessor technology indicate a paradigm shift that is likely to alter present programming methodologies (see DDJ article). We aim to explore these new architectures, focusing especially on multithreaded designs. Some examples of our involvement include:

Typical (Intel Core Duo / Opteron) and aggressive (Niagara) multicore designs
Simultaneous multithreading (SMT)
Cell Broadband Engine (Cell)
General-purpose computing on graphics processing units (GPGPU)

Relevant Project Activities

<--
 Links 
 
 wikipedia:SMT
  wikipedia:GPGPU
  wikipedia:Cell
  http://cag.csail.mit.edu/ps3/lectures.shtml
 
-->

META FILEATTACHMENT	attr="h" autoattached="1" comment="" date="1204649531" name="04227947.pdf" path="04227947.pdf" size="792734" user="Main.ArisSotiropoulos" version="1"
META FILEATTACHMENT	attr="h" autoattached="1" comment="" date="1204649356" name="01386058.pdf" path="01386058.pdf" size="312577" user="Main.ArisSotiropoulos" version="1"

Revision 7 - 2008-03-11 - KorniliosKourtis

Line: 1 to 1

META TOPICPARENT	name="WebHome"

Computer Architecture

Deleted:

<
<

Software Optimization

Previous research work has identified memory bandwidth as the main bottleneck of the ubiquitous Sparse Matrix-Vector Multiplication kernel. To attack this problem, we aim at reducing the overall data volume of the algorithm. Typical sparse matrix representation schemes store only the non-zero elements of the matrix and employ additional indexing information to properly iterate over these elements. In this paper we propose two distinct compression methods targeting index and numerical values respectively. We perform a set of experiments on a large real-world matrix set and demonstrate that the index compression method can be applied successfully to a wide range of matrices. Moreover, the value compression method is able to achieve impressive speedups in a more limited, yet important, class of sparse matrices that contain a small number of distinct values.
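As an illustration of the storage schemes discussed above, the following is a minimal sketch, assuming the common CSR (Compressed Sparse Row) layout as the baseline format. The function names, the example matrix, and the simple per-row delta encoding are hypothetical and chosen for illustration only; they are not the compression formats proposed in the paper. The sketch shows where the indexing information lives (row_ptr, col_idx), how column indices shrink when stored as differences, and how a value array with few distinct entries can be replaced by a small table plus indices.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A * x for a matrix A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        # Non-zeros of row i occupy positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y


def delta_encode_columns(row_ptr, col_idx):
    """Illustrative index compression: store each column index as the
    difference from the previous one in the same row (first one from 0)."""
    deltas = []
    for i in range(len(row_ptr) - 1):
        prev = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            deltas.append(col_idx[k] - prev)
            prev = col_idx[k]
    return deltas


def compress_values(values):
    """Illustrative value compression: a table of distinct values plus,
    for every non-zero, an index into that table."""
    table = sorted(set(values))
    position = {v: j for j, v in enumerate(table)}
    return table, [position[v] for v in values]


def spmv_csr_value_indexed(row_ptr, col_idx, val_idx, table, x):
    """Same product as spmv_csr, reading values through the table."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += table[val_idx[k]] * x[col_idx[k]]
    return y


# Hypothetical 3x6 matrix with only two distinct non-zero values:
# [[2, 0, 0, 0, 1, 2],
#  [0, 0, 0, 2, 0, 0],
#  [0, 1, 2, 0, 0, 0]]
row_ptr = [0, 3, 4, 6]
col_idx = [0, 4, 5, 3, 1, 2]
values = [2.0, 1.0, 2.0, 2.0, 1.0, 2.0]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

table, val_idx = compress_values(values)
y_plain = spmv_csr(row_ptr, col_idx, values, x)
y_indexed = spmv_csr_value_indexed(row_ptr, col_idx, val_idx, table, x)
```

Both products return the same vector; the value-indexed variant trades one extra table lookup per non-zero for a smaller value stream, which only pays off when the table is small enough to stay in cache.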

Operating Systems

Efficient sharing of block devices over an interconnection network is an important step in deploying a shared-disk parallel file system on a cluster of SMPs. We present gmblock, a client/server system for network sharing of storage devices over Myrinet, which uses an optimized data path in order to transfer data directly from the storage medium to the NIC, bypassing the host CPU and main memory bus. Its design enhances existing programming abstractions, combining the user-level networking characteristics of Myrinet with Linux's virtual memory infrastructure, in order to construct the data path in a way that is independent of the type of block device used. Experimental evaluation of a prototype system shows that remote I/O bandwidth can improve by up to 36%, compared to an RDMA-based implementation. Moreover, interference on the main memory bus of the host is minimized, leading to an improvement of up to 41% in the execution time of memory-intensive applications.

Providing scalable clustered storage in a cost-effective way depends on the availability of an efficient network block device (nbd) layer. To overcome the architectural limitation of a low number of outstanding requests in gmblock, we focus on overlapping read and network I/O for a single request, in order to improve throughput. To this end, we introduce the concept of synchronized send operations and present an implementation on Myrinet/GM, based on custom modifications to the NIC firmware and associated userspace library. Compared to a network block sharing system over standard GM and the base version of gmblock, our enhanced implementation supporting synchronized sends delivers 81% and 44% higher throughput for streaming block I/O, respectively.
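The benefit of overlapping the disk read and the network send of a single request can be illustrated with a simple two-stage pipeline model. This is a rough sketch under stated assumptions, not the mechanism implemented in the NIC firmware: the request is assumed to be split into equal chunks, each with a constant read time and send time, and the function names and numbers below are hypothetical.

```python
def serial_time(chunks, read_t, send_t):
    """Time when every chunk is fully read before it is sent,
    with no overlap between the two stages."""
    if chunks < 1:
        raise ValueError("chunks must be at least 1")
    return chunks * (read_t + send_t)


def overlapped_time(chunks, read_t, send_t):
    """Time for an ideal two-stage pipeline: the first chunk is read,
    the remaining chunks advance at the pace of the slower stage,
    and the last chunk is sent at the end."""
    if chunks < 1:
        raise ValueError("chunks must be at least 1")
    return read_t + send_t + (chunks - 1) * max(read_t, send_t)


# Hypothetical request: 8 chunks, 1.0 time unit to read and to send each.
no_overlap = serial_time(8, 1.0, 1.0)
with_overlap = overlapped_time(8, 1.0, 1.0)
speedup = no_overlap / with_overlap
```

In this idealized model the gain is largest when the read and send stages take about the same time, and it disappears for a request consisting of a single chunk.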

META FILEATTACHMENT	attr="h" autoattached="1" comment="" date="1204649531" name="04227947.pdf" path="04227947.pdf" size="792734" user="Main.ArisSotiropoulos" version="1"
META FILEATTACHMENT	attr="h" autoattached="1" comment="" date="1204649356" name="01386058.pdf" path="01386058.pdf" size="312577" user="Main.ArisSotiropoulos" version="1"

Revision 6 - 2008-03-11 - NikosAnastopoulos

Line: 1 to 1

META TOPICPARENT	name="WebHome"

Computer Architecture

Revision 5 - 2008-03-08 - AnastasiosNanos

Line: 1 to 1

META TOPICPARENT	name="WebHome"

Computer Architecture

Line: 8 to 8

Operating Systems

Changed:

<
<

Providing scalable clustered storage in a cost-effective way depends on the availability of an efficient network block device (nbd) layer. We study the performance of gmblock, an nbd server over Myrinet utilizing a direct disk-to-NIC data path which bypasses the CPU and main memory bus. To overcome the architectural limitation of a low number of outstanding requests, we focus on overlapping read and network I/O for a single request, in order to improve throughput. To this end, we introduce the concept of synchronized send operations and present an implementation on Myrinet/GM, based on custom modifications to the NIC firmware and associated userspace library. Compared to a network block sharing system over standard GM and the base version of gmblock, our enhanced implementation supporting synchronized sends delivers 81% and 44% higher throughput for streaming block I/O, respectively.

>
>

Added:

>
>

META FILEATTACHMENT	attr="h" autoattached="1" comment="" date="1204649531" name="04227947.pdf" path="04227947.pdf" size="792734" user="Main.ArisSotiropoulos" version="1"
META FILEATTACHMENT	attr="h" autoattached="1" comment="" date="1204649356" name="01386058.pdf" path="01386058.pdf" size="312577" user="Main.ArisSotiropoulos" version="1"

Revision 4 - 2008-03-07 - VasileiosKarakasis

Line: 1 to 1

META TOPICPARENT	name="WebHome"

Computer Architecture

Software Optimization

Changed:

<
<

Previous research work has identified memory bandwidth as the main bottleneck of the ubiquitous Sparse Matrix-Vector Multiplication kernel. To attack this problem, we aim at reducing the overall data volume of the algorithm. Typical sparse matrix representation schemes store only the non-zero elements of the matrix and employ additional indexing information to properly iterate over these elements. In this paper we propose two distinct compression methods targeting index and numerical values respectively. We perform a set of experiments on a large real-world matrix set and demonstrate that the index compression method can be applied successfully to a wide range of matrices. Moreover, the value compression method is able to achieve impressive speedups in a more limited yet important class of sparse matrix that contain a small number of distinct values.

>
>

Operating Systems

Revision 3 - 2008-03-06 - ArisSotiropoulos

Line: 1 to 1

META TOPICPARENT	name="WebHome"

Computer Architecture

Line: 11 to 11

Deleted:

<
<

Interconnects

Publications

M. Athanasaki, E. Koukis, N. Koziris, "Efficient Scheduling of Tiled Iteration Spaces onto a Fixed Size Parallel Architecture," 9th Panhellenic Conference in Informatics, pp. 178-192, Thessaloniki, Greece, November 21-23, 2003
M. Athanasaki, E. Koukis, N. Koziris, "Scheduling of Tiled Nested Loops onto a Cluster with a Fixed Number of SMP Nodes," 12th Euromicro Conference on Parallel, Distributed and Network-based Processing (PDP '04), pp. 424-433, A Coruna, Spain, February 11-13, 2004
E. Koukis and N. Koziris, "Memory Bandwidth Aware Scheduling for SMP Cluster Nodes," Proceedings of the 13th Euromicro Conference on Parallel, Distributed and Network-based Processing (PDP '05), pp. 187-196, Lugano, Switzerland, 6-11 Feb. 2005
E. Koukis and N. Koziris, "Memory and Network Bandwidth Aware Scheduling of Multiprogrammed Workloads on Clusters of SMPs," Proceedings of the 12th International Conference on Parallel and Distributed Systems (ICPADS 2006), pp. 345-354, Minneapolis, MN, USA, 12-15 July, 2006
E. Koukis and N. Koziris, "Efficient Block Device Sharing over Myrinet with Memory Bypass," Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), p. 29, Long Beach, CA, USA, 26-30 March, 2007
E. Koukis, A. Nanos and N. Koziris, "Synchronized Send Operations for Efficient Streaming Block I/O over Myrinet," Proceedings of the Workshop on Communication Architecture for Clusters (CAC 2008), held in conjunction with the 22nd International Parallel and Distributed Processing Symposium (IPDPS 2008), Miami, FL, USA, 14-18 April, 2008, to appear

META FILEATTACHMENT	attr="h" autoattached="1" comment="" date="1204649531" name="04227947.pdf" path="04227947.pdf" size="792734" user="Main.ArisSotiropoulos" version="1"
META FILEATTACHMENT	attr="h" autoattached="1" comment="" date="1204649356" name="01386058.pdf" path="01386058.pdf" size="312577" user="Main.ArisSotiropoulos" version="1"

Revision 2 - 2008-03-04 - ArisSotiropoulos

Line: 1 to 1

META TOPICPARENT	name="WebHome"

Computer Architecture

Line: 10 to 10

Deleted:

<
<

Publications

Deleted:

<
<

\ No newline at end of file

Added:

>
>

Interconnects

Publications

M. Athanasaki, E. Koukis, N. Koziris, "Efficient Scheduling of Tiled Iteration Spaces onto a Fixed Size Parallel Architecture," 9th Panhellenic Conference in Informatics, pp. 178-192, Thessaloniki, Greece, November 21-23, 2003
M. Athanasaki, E. Koukis, N. Koziris, "Scheduling of Tiled Nested Loops onto a Cluster with a Fixed Number of SMP Nodes," 12th Euromicro Conference on Parallel, Distributed and Network-based Processing (PDP '04), pp. 424-433, A Coruna, Spain, February 11-13, 2004
E. Koukis and N. Koziris, "Memory Bandwidth Aware Scheduling for SMP Cluster Nodes," Proceedings of the 13th Euromicro Conference on Parallel, Distributed and Network-based Processing (PDP '05), pp. 187-196, Lugano, Switzerland, 6-11 Feb. 2005
E. Koukis and N. Koziris, "Memory and Network Bandwidth Aware Scheduling of Multiprogrammed Workloads on Clusters of SMPs," Proceedings of the 12th International Conference on Parallel and Distributed Systems (ICPADS 2006), pp. 345-354, Minneapolis, MN, USA, 12-15 July, 2006
E. Koukis and N. Koziris, "Efficient Block Device Sharing over Myrinet with Memory Bypass," Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), p. 29, Long Beach, CA, USA, 26-30 March, 2007
E. Koukis, A. Nanos and N. Koziris, "Synchronized Send Operations for Efficient Streaming Block I/O over Myrinet," Proceedings of the Workshop on Communication Architecture for Clusters (CAC 2008), held in conjunction with the 22nd International Parallel and Distributed Processing Symposium (IPDPS 2008), Miami, FL, USA, 14-18 April, 2008, to appear

META FILEATTACHMENT	attr="h" autoattached="1" comment="" date="1204649531" name="04227947.pdf" path="04227947.pdf" size="792734" user="Main.ArisSotiropoulos" version="1"
META FILEATTACHMENT	attr="h" autoattached="1" comment="" date="1204649356" name="01386058.pdf" path="01386058.pdf" size="312577" user="Main.ArisSotiropoulos" version="1"
META FILEATTACHMENT	attr="h" autoattached="1" comment="" date="1204649449" name="01655680.pdf" path="01655680.pdf" size="237494" user="Main.ArisSotiropoulos" version="1"
META FILEATTACHMENT	attr="h" autoattached="1" comment="" date="1204649651" name="epy2003.pdf" path="epy2003.pdf" size="459614" user="Main.ArisSotiropoulos" version="1"
META FILEATTACHMENT	attr="h" autoattached="1" comment="Scheduling" date="1204648880" name="01271475.pdf" path="01271475.pdf" size="510858" user="Main.ArisSotiropoulos" version="1"

Revision 1 - 2008-03-03 - GiorgosVerigakis

Line: 1 to 1

Added:

>
>

META TOPICPARENT	name="WebHome"

Computer Architecture

Software Optimization

Previous research work has identified memory bandwidth as the main bottleneck of the ubiquitous Sparse Matrix-Vector Multiplication kernel. To attack this problem, we aim at reducing the overall data volume of the algorithm. Typical sparse matrix representation schemes store only the non-zero elements of the matrix and employ additional indexing information to properly iterate over these elements. In this paper we propose two distinct compression methods targeting index and numerical values respectively. We perform a set of experiments on a large real-world matrix set and demonstrate that the index compression method can be applied successfully to a wide range of matrices. Moreover, the value compression method is able to achieve impressive speedups in a more limited yet important class of sparse matrix that contain a small number of distinct values.

Operating Systems

Publications
