
Matrix Multiplication on Two Interconnected Processors


This paper presents a new partitioning algorithm to perform matrix multiplication on two interconnected heterogeneous processors. Data is partitioned in a way that minimizes the total volume of communication between the processors compared to more general partitionings, resulting in a lower total execution time whenever the power ratio between the processors is greater than 3:1. The algorithm has interesting and important applicability, particularly as the top-level partitioning in a hierarchical algorithm that performs matrix multiplication on two interconnected clusters of computers.
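The abstract does not spell out the partitioning itself. As a point of reference, a minimal sketch of the straightforward 1D row partitioning proportional to processor power, the kind of general partitioning such algorithms are compared against, might look as follows (all names and the NumPy-based setup are illustrative, not the paper's code):

```python
import numpy as np

def partition_rows(n, power_fast, power_slow):
    """Split n rows between two processors proportionally to their
    relative speeds (a simple 1D baseline; the paper's partitioning
    is more communication-efficient for ratios above 3:1)."""
    rows_fast = round(n * power_fast / (power_fast + power_slow))
    return rows_fast, n - rows_fast

def heterogeneous_matmul(A, B, power_fast, power_slow):
    """Each 'processor' computes its share of C = A @ B.
    Here both shares run locally; on a real system each block of A
    would live on one processor and B would be communicated."""
    n = A.shape[0]
    k, _ = partition_rows(n, power_fast, power_slow)
    C = np.empty((n, B.shape[1]))
    C[:k] = A[:k] @ B      # fast processor's share
    C[k:] = A[k:] @ B      # slow processor's share
    return C
```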

Analytical Network Modeling of Heterogeneous Large-Scale Cluster Systems


The study of communication networks for distributed systems is very important, since the overall performance of these systems often depends on the effectiveness of their communication networks. In this paper, we address the problem of network modeling for heterogeneous large-scale cluster systems. We consider large-scale cluster systems as a typical cluster-of-clusters system. Since heterogeneity is becoming common in such systems, the proposed model takes both network and cluster-size heterogeneity into account. To this end, we present an analytical network model and validate it through comprehensive simulation. The simulation results demonstrate that the proposed model exhibits a good degree of accuracy for various system organizations and under different working conditions.

Heterogeneous Computing in Remote Sensing Applications: Current Trends and Future Perspectives


Heterogeneous networks of computers (HNOCs) have rapidly become a very

promising commodity computing solution, expected to play a major role in the design of high

performance computing systems for many on-going and planned remote sensing missions.

Currently, only a few parallel processing strategies for remotely sensed image analysis are

available in the open literature, and most of them assume homogeneity in the underlying

computing platform. This paper develops several highly innovative heterogeneous parallel

algorithms for information extraction from high-dimensional remotely sensed images, with

particular emphasis on target detection and land-cover mapping applications. Analytical and

experimental results are presented in the context of a realistic application, using real data

collected by NASA’s Jet Propulsion Laboratory over the World Trade Center area in New York

after September 11th, 2001. Parallel performance of the proposed heterogeneous algorithms is

discussed using several (fully and partially) heterogeneous networks at the University of Maryland, and a massively parallel Beowulf cluster at NASA's Goddard Space Flight Center. Combined, these results offer a thoughtful perspective on the potential and emerging challenges of applying heterogeneous computing practices to remote sensing problems.

Open MPI: A High-Performance, Heterogeneous MPI


The growth in the number of generally available, distributed, heterogeneous computing systems places increasing importance on the development of user-friendly tools that enable application developers to use these systems efficiently. Open MPI provides support for several aspects of heterogeneity within a single, open-source MPI implementation. Through careful abstractions, heterogeneous support maintains efficient use of uniform computational platforms. We describe Open MPI's architecture for heterogeneous network and processor support. A key design feature of this implementation is its transparency to the application developer while maintaining very high levels of performance. This is demonstrated with the results of several numerical experiments.

A 2-Approximation Algorithm for Scheduling Independent Tasks onto a Uniform Parallel Machine and its Extension to a Computational Grid


First, this paper gives a very simple 2-approximation algorithm for scheduling n independent tasks onto a uniform parallel machine with m processors. The best known results so far are a (1+ε)-approximation algorithm (0 < ε ≤ 1) whose running time is exponential in 1/ε, and a 2-approximation algorithm based on an LP-rounding technique which runs in O((m+n)^3.5) time. In contrast, the proposed algorithm runs in O(n log n + mn) time. Next, this paper proves that, if the criterion of a schedule is the total computational power consumed by the schedule, the proposed algorithm is also a 2-approximation algorithm for a uniform parallel machine whose processor speeds vary over time. Such a parallel machine corresponds to a so-called desktop grid.
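The abstract does not describe the algorithm itself. To illustrate the general flavor of greedy scheduling on a uniform parallel machine in the stated O(n log n + mn) time budget, here is a generic list-scheduling sketch; it is not the paper's algorithm, and all names are illustrative:

```python
def list_schedule(task_sizes, speeds):
    """Greedy list scheduling on a uniform parallel machine:
    sort tasks longest-first, then repeatedly assign the next task
    to the processor that would finish it earliest.
    Runs in O(n log n + mn) time; returns per-processor finish
    times and the task-to-processor assignment."""
    tasks = sorted(range(len(task_sizes)), key=lambda i: -task_sizes[i])
    finish = [0.0] * len(speeds)
    assignment = {}
    for t in tasks:
        # pick the processor minimizing this task's completion time
        p = min(range(len(speeds)),
                key=lambda j: finish[j] + task_sizes[t] / speeds[j])
        finish[p] += task_sizes[t] / speeds[p]
        assignment[t] = p
    return finish, assignment
```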

JaceP2P: an Environment for Asynchronous Computations on Peer-to-Peer Networks


Using Peer-to-Peer (P2P) networks is a way to federate a large number of processors in order to solve large-scale scientific problems. Those networks are decentralized, highly dynamic, and composed of heterogeneous machines. The goal of our work is to compute large-scale scientific iterative applications on P2P networks. We propose JaceP2P, a multi-threaded Java-based library designed to build asynchronous parallel iterative applications. Using this library, it is possible to run such applications on a set of dynamic and heterogeneous machines organized in a decentralized, P2P fashion.

Virtual Structured P2P Network Topology for Distributed Computing


P2P and Grid computing are two paradigms used more and more in today's computing environments; their potential to provide better quality of service to users is very promising compared to the cost they involve. This paper presents a hierarchical virtual network topology, built on top of the real existing one, which is used to manage distributed resources in a Grid environment. The distributed resources are found on the Web following P2P techniques, and so they are very volatile. The virtual topology constructs an efficient and robust virtual machine which serves as a distributed computing platform. This topology is called TreeP, and it is exploited in DGET [7], a data-grid middleware environment. Here, we study this virtual topology both theoretically and experimentally. We show that this topology is very scalable, robust, load-balanced, and easy to construct and maintain.

A Parallel Algorithm for Solution of the Deconvolution Problem on Heterogeneous Networks


In this work we present a parallel algorithm for the solution of a least squares problem with structured matrices. This problem arises in many applications, mainly related to digital signal processing. The parallel algorithm is designed to speed up the sequential one on heterogeneous networks of computers. It follows the HeHo strategy and is implemented with the recently developed HeteroMPI programming environment. The results obtained validate HeteroMPI as a very useful tool for programming heterogeneous parallel algorithms.

Well balanced sparse matrix-vector multiplication on a parallel heterogeneous system


This paper discusses well-balanced implementations of sparse matrix-vector multiplication in heterogeneous environments. A new heuristic is proposed for balancing the computing load over the processors proportionally to their power. This is done by defining a redistribution model which splits the sparse matrix into k partitions in order to minimize the total execution time. An implementation of sparse matrix-vector multiplication in a heterogeneous environment, using the parallel object-oriented programming model POP-C++, shows that this 1D-partitioning heuristic greatly improves the performance of the product in comparison with block-row decomposition.
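The core idea of a power-proportional 1D partitioning can be sketched as follows: choose contiguous row blocks so that each processor's share of nonzeros is roughly proportional to its speed. This is only an illustration of the idea, under assumed interfaces, not the paper's heuristic or code:

```python
from itertools import accumulate

def balanced_row_partition(row_nnz, speeds):
    """1D row partitioning of a sparse matrix over heterogeneous
    processors: row_nnz[i] is the nonzero count of row i, speeds[p]
    the relative power of processor p. Returns block boundaries so
    that processor p owns rows bounds[p]:bounds[p+1], with nonzeros
    roughly proportional to speeds."""
    total = sum(row_nnz)
    total_speed = sum(speeds)
    targets = [c / total_speed * total for c in accumulate(speeds)]
    bounds, acc, p = [0], 0, 0
    for i, nnz in enumerate(row_nnz):
        acc += nnz
        # close the current block once its nonzero target is reached
        while p < len(speeds) - 1 and acc >= targets[p]:
            bounds.append(i + 1)
            p += 1
    bounds.extend([len(row_nnz)] * (len(speeds) + 1 - len(bounds)))
    return bounds
```

For a matrix with 5 nonzeros per row and two processors with a 3:1 power ratio, the faster processor receives three of every four rows.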

TGrid – Grid runtime support for hierarchically structured task-parallel programs


In this article we introduce a grid runtime system called TGrid, which is designed to run hierarchically structured task-parallel programs in heterogeneous environments and can also be used for common component-based grid programming. TGrid is a location-aware runtime system, which means the system keeps track of the placement of tasks on the grid. This enables better scheduling strategies and heuristics, since the system is able to determine each processor's position in the grid; exploiting this spatial locality leads to better performance due to less network overhead.

A Quadratic Self-Scheduling Algorithm for Heterogeneous Distributed Computing Systems


Scheduling algorithms play an important role in heterogeneous computing systems, and the development of new scheduling strategies is an active research field. In this context, we present a general formulation of the self-scheduling problem and derive a new, quadratic self-scheduling algorithm. Initial tests comparing the performance of the new algorithm against well-established ones are carried out: working at the application level, we allocate sets of several thousand tasks in an Internet-based Grid of computers that involves a transatlantic connection. In all the tests, the new algorithm performs better than the previous ones.
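For readers unfamiliar with self-scheduling, the mechanism can be sketched as follows: idle workers repeatedly grab chunks of the remaining work, with chunk sizes shrinking as work runs out. The sketch below uses the classic guided rule (remaining work divided by worker count) purely to illustrate the mechanism; the paper's contribution is a different, quadratic rule for the chunk sizes, which is not reproduced here:

```python
def self_schedule(total_tasks, n_workers, min_chunk=1):
    """Generic decreasing-chunk self-scheduling: returns the sequence
    of chunk sizes handed out as idle workers request work. Chunks
    shrink as the remaining work decreases (guided rule shown here;
    the paper derives a quadratic rule instead)."""
    remaining = total_tasks
    chunks = []
    while remaining > 0:
        chunk = max(min_chunk, remaining // n_workers)
        chunk = min(chunk, remaining)   # never over-allocate
        chunks.append(chunk)
        remaining -= chunk
    return chunks
```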

Self-Adapting Scheduling for Tasks with Dependencies in Stochastic Environments


This paper concerns dynamic load-balancing algorithms for non-dedicated heterogeneous clusters of workstations. We propose an algorithm called Self-Adapting Scheduling (SAS), targeted at nested loops with dependencies in a stochastic environment. This means that the load entering the system, not belonging to the parallel application under execution, follows an unpredictable pattern which can be modeled by a stochastic process. SAS takes into account the history of previous timing results and the load patterns in order to make accurate load-balancing decisions. We study the performance of SAS in comparison with DTSS, which we established in previous work to be the most efficient self-scheduling algorithm for loops with dependencies on heterogeneous clusters. We test our algorithm under the assumption that the interarrival times and lifetimes of incoming jobs are exponentially distributed. The experimental results show that SAS significantly outperforms DTSS, especially under rapidly varying loads.
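The stochastic background load assumed in those experiments is easy to simulate: jobs arrive with exponentially distributed interarrival times and run for exponentially distributed lifetimes. A minimal sketch (function and parameter names are illustrative, not from the paper):

```python
import random

def background_load(horizon, arrival_rate, mean_lifetime, seed=0):
    """Generate the external (non-application) load: jobs arrive as a
    Poisson process (exponential interarrival gaps at arrival_rate)
    and each runs for an exponentially distributed lifetime.
    Returns a list of (start, end) intervals within [0, horizon)."""
    rng = random.Random(seed)
    t, jobs = 0.0, []
    while True:
        t += rng.expovariate(arrival_rate)   # next interarrival gap
        if t >= horizon:
            break
        jobs.append((t, t + rng.expovariate(1.0 / mean_lifetime)))
    return jobs

def load_at(jobs, t):
    """Number of background jobs active at time t."""
    return sum(1 for s, e in jobs if s <= t < e)
```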

A Framework for Adaptive Communication Modeling on Heterogeneous Hierarchical Clusters


Today, due to the wide variety of existing parallel systems consisting of collections of heterogeneous machines, it is very difficult for a user to solve a target problem using a single algorithm, or to write portable programs that perform well on multiple computing platforms. The inherent heterogeneity and the diversity of networks in such environments make modeling the communications of high-performance computing applications a great challenge. Our objective in this work is to propose a generic framework based on communication models and adaptive techniques for predicting communication performance on hierarchical cluster-based platforms. Toward this goal, we introduce the concept of a poly-model of communications, i.e., a set of techniques for better modeling the communications in terms of the characteristics of the hardware resources of the target parallel system. We apply this methodology to collective communication operations and show that the framework yields significant performance gains by determining the best model-algorithm combination for the given problem and architecture parameters.