Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2015

paluri

In a system like this, how would the nodes work in conjunction with each other to determine which paths different messages should take in order to maximize the number of concurrent messages? Does it simply have to take a more greedy approach of trying to find a free path?

ChandlerBing

@paluri I think it would also depend on the routing scheme, and not just the topology.

oulgen

You can read http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=277793. This paper demonstrates and analyses how 2D Mesh interconnect networks do routing. It also does a cost analysis of different strategies.

yuel1

Can someone reiterate the argument in class as to why this is blocking?

oulgen

@yuel1 Draw an S in the middle starting from the bottom and going up. Nothing from the left can communicate with anything from the right.

yuel1

@oulgen I recall it had something to do with the bisection bandwidth not being nearly enough to satisfy conditions need to be non-blocking.

I wonder, is there a general approach to determining whether a topology is blocking or non-blocking?

Kapteyn

In class Michael mentioned that reassembling the packets to be in the order that they were sent can be a headache. One way to reassemble packets correctly is to have a header on each packet dictating the total number of packets in the message and this packet's order number. Then the receiving node can tell A) when it's done receiving packets from the sending node and B) the correct order of those packets.

kamladi

Why is the cost O(N)? For each node you add, you only have to add a constant number of links (max 4). Therefore, wouldn't the cost be O(1)?

yuel1

@kamladi we have O(N) nodes, therefore we have O(4*N) links thus O(N)

Kapteyn

@kamladi I believe the O(N) cost is referring to the cost of constructing a mesh with N nodes. So yes, the cost per node is a constant.

Kapteyn

Haha I guess I should have obtained a lock on this comment stream, to have avoided a duplicate response to @kamladi's question.

dumbo_

We place a node that connects a lot in the middle because:
1. more bandwidth is needed in the middle area.
2. this makes the average distance from everyone lower.

Mesh structure maps well to the chip because of its typical 2d nature.

Also path diversity is good because:
1.can increase effective bandwidth between nodes
but here, if the routing from node A -> node B is not deterministic, this may cause some implementation headache because you cannot assume packets' arrival order = send order =_=|||
2.if one link breaks, we can route around it >u<

rramo

How is the average latency computed to be O(sqrt(N))?

Kapteyn

@rramo

Since we want a big O value, let's look at the average latency of any corner node, since the corner nodes have the worst average latency to all other nodes in the mesh (we discussed in class how nodes that lie closer to the middle of the mesh have better average latency).

Say the width and height of the mesh is N so there are N^2 nodes. The sum of the path lengths from a corner node to all other nodes in the mesh is:

// sum of path lengths to all nodes in farthest row:
N-1 + [(N-1) + (N-2) + (N-3) + ... + 1 + 0]

// sum of path lengths to all nodes in second farthest row:
(N-2) + [(N-1) + (N-2) + (N-3) + ... + 1 + 0]

// sum of path lengths to all nodes in third farthest row:
(N-3) + [(N-1) + (N-2) + (N-3) + ... + 1 + 0]

... etc

// sum of path lengths to all nodes in the corner node's own row:
0 + [(N-1) + (N-2) + (N-3) + ... + 1 + 0]

The sum [(N-1) + (N-2) + (N-3) + ... + 1 + 0] is O(N^2) and if we sum the path lengths to all nodes in all rows, we see that this series occurs in the sum N+1 times so the sum of all path lengths to all nodes from the corner node is O(N^3). Divide this sum by the total number of nodes, N^2 to obtain O(N) as the average latency of the corner node. Since there are N^2 nodes, O(N) is big-O of the square root of the total number of nodes (note that I'm using N^2 as the total number of nodes here so the average latency is O(N) and not O(sqrt(N)).

toothless

I'm not sure about the exact definition of blocking, so I looked this up:

According to http://web.mit.edu/6.263/www/l10.pdf page 4, there are two types of (non-)blocking. Strictly non-blocking means that you're allowed to choose adversarial routes to block future routes. More formally, a network is NOT strictly non-blocking if there exists an integer k and k pairs of switches such that if you (adversarially) route the first k-1 pairs to each other, then the last pair becomes blocked.

In wide-sense (WS) non-blocking, you can no longer be the adversary. Formally, a network is WS non-blocking if for any integer k, for any k pairs of switches, there EXIST k edge-disjoint routes that connect each of the k pairs.

Obviously, strict non-blocking implies WS.

The 2D mesh above is not WS non-blocking, because if you select any 9 nodes from the upper half and any 9 nodes from the bottom half, it is impossible to route all 9 from the upper half to all 9 from the lower half, since the graph only has "width" 8. More rigorously, there is a cut of size 8 that cuts the upper graph from the lower graph, so by max-flow-min-cut, the max flow from the top 9 nodes to the bottom 9 nodes is at most 8. So one pair must always be left out.

Of course, since it is not WS non-blocking, it is not strict non-blocking either.

abist

@rramo the way I think about it is by thinking about the mesh as a square with N nodes on each side and average latency being the diagonal length (because it's symmetric) which is 2sqrt(N) or O(sqrt(N))