Slide View : Parallel Computer Architecture and Programming : 15-418/618 Spring 2015

kayvonf

Since this slide doesn't do justice to the animation shown in class, someone might want to summarize cut-through routing here. Even better, give a step-by-step account of how the packet moves through the network, assuming it takes four clocks for the packet to be transmitted over a link. How many total steps are required?

Step 1: Packet part 1 transmitted from 1st to 2nd node (parts 2, 3, 4 buffered in node 1; part 1 buffered in node 2).
Step 2: Packet part 2 transmitted from 1st to 2nd node (parts 3, 4 buffered in node 1; parts 1, 2 buffered in node 2); node 2 reads routing information from part 1.
Step 3: Packet part 3 transmitted from 1st to 2nd node. (WHAT ELSE HAPPENS IN THIS STEP?)
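Here is a rough way to generate the step-by-step account mechanically. It is a sketch under assumptions that are mine, not the slide's: a path of three links (source node 1, routers 2 and 3, destination node 4), four parts per packet with one part crossing a link per clock, one clock for an intermediate router to read the routing information in part 1, and no contention. The names `cut_through_arrivals` and `print_trace` are hypothetical.

```python
from collections import defaultdict


def cut_through_arrivals(num_links=3, num_parts=4, route_delay=1):
    """Return arrival[n][k]: the clock at which part k+1 has fully reached node index n.

    Hypothetical model: node index 0 is the source, index num_links the destination;
    each link carries one part per clock; an intermediate router spends route_delay
    clocks reading the header (part 1) before forwarding it; buffers are unbounded
    and there is no contention.
    """
    arrival = [[0] * num_parts for _ in range(num_links + 1)]
    for n in range(1, num_links + 1):
        for k in range(num_parts):
            ready = arrival[n - 1][k]  # part is present at the upstream node
            if k == 0 and n > 1:
                ready += route_delay  # router reads the routing info first
            if k > 0:
                ready = max(ready, arrival[n][k - 1])  # one part per link per clock
            arrival[n][k] = ready + 1  # crossing the link takes one clock
    return arrival


def print_trace(arrival, route_delay=1):
    """Print the events of each step, numbering nodes from 1 as in the thread."""
    events = defaultdict(list)
    for n in range(1, len(arrival)):
        # link n carries parts from node n to node n+1 (1-based numbering)
        if n > 1:
            start = arrival[n - 1][0] + 1
            for t in range(start, start + route_delay):
                events[t].append(f"node {n} reads routing info from part 1")
        for k, t in enumerate(arrival[n]):
            events[t].append(f"part {k + 1}: node {n} -> node {n + 1}")
    for step in sorted(events):
        print(f"Step {step}: " + "; ".join(events[step]))


arrival = cut_through_arrivals()
print_trace(arrival)
print("header reaches destination at step", arrival[-1][0])
print("last part reaches destination at step", arrival[-1][-1])
```

With these defaults the header reaches the destination at step 5 and the last part at step 8, so under these assumptions the answer would be 8 steps. The trace also shows one answer to the step 3 question: part 1 moves from node 2 to node 3 while part 3 is still crossing the first link.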

kayvonf

Question: Why does cut-through reduce to store-and-forward under heavy network contention?

Faust

Under heavy network contention, suppose the 3rd node is not ready to receive, but we have already started sending the packet. Then the entire packet will first accumulate in the 2nd node. If the contention then moves to the destination node, the entire packet will accumulate in the 3rd node. This is essentially store-and-forward, because the entire packet ends up moving one hop at a time.
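To see this degeneration concretely, here is a sketch (hypothetical model and names) that adds contention to a cut-through pipeline: link i may not start sending before `link_free_at[i]`. It assumes a 3-link path, a 4-part packet, one part per link per clock, free routing decisions, and buffers big enough for a whole packet, as cut-through requires.

```python
def arrivals_with_blocking(num_parts, link_free_at):
    """Cut-through pipeline where link i cannot start sending before link_free_at[i].

    Hypothetical model: node 0 is the source; each link carries one part per clock;
    routing decisions are free; every router can buffer a whole packet (as
    cut-through requires), so a blocked link just makes parts queue in front of it.
    """
    num_links = len(link_free_at)
    arrival = [[0] * num_parts for _ in range(num_links + 1)]
    for n in range(1, num_links + 1):
        for k in range(num_parts):
            ready = arrival[n - 1][k]  # part is present at the upstream node
            if k > 0:
                ready = max(ready, arrival[n][k - 1])  # one part per link per clock
            ready = max(ready, link_free_at[n - 1])  # contention: link busy until then
            arrival[n][k] = ready + 1
    return arrival


def forwards_only_whole_packets(arrival):
    """True if every router starts forwarding only after the last part arrived."""
    return all(
        arrival[n + 1][0] - 1 >= arrival[n][-1]  # clock the first part leaves node n
        for n in range(1, len(arrival) - 1)
    )


free = arrivals_with_blocking(4, [0, 0, 0])
busy = arrivals_with_blocking(4, [0, 4, 8])  # the 2nd and 3rd links blocked for a while
print(free[-1][-1], forwards_only_whole_packets(free))  # 6 False
print(busy[-1][-1], forwards_only_whole_packets(busy))  # 12 True
```

When each downstream link frees up only after the whole packet has piled into the router in front of it, every router forwards only complete packets and the tail arrives at clock 12, i.e. 3 links × 4 clocks, the store-and-forward latency for this path. Without contention the same packet arrives at clock 6.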

ankit1990

I am a little unsure about how we get 5 units of latency from source to destination for cut through routing. Looking at the definition, it appears that we should be transmitting a header as soon as we receive it. Shouldn't it be 3?

jazzbass

@ankit1990 I think we need an extra cycle on each intermediate node to process the header and make decisions about routing before sending the header on the next cycle. I'm not entirely sure though, it would be great if somebody else could confirm this. :)

BryceToTheCore

I agree with @jazzbass. I think the header is processed as follows:

Ops:

1. transmitted
2. read
3. transmitted
4. read
5. transmitted

Each op takes one cycle to complete, assuming the header is small enough to be transferred entirely in one cycle. Thus the total latency for the header is 5 clock cycles.
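Generalizing that count (my own notation, assuming no contention): with L links on the path, t_link clocks for one packet part to cross a link, t_route clocks for each of the L - 1 intermediate routers to read the header, and P packet parts, the 3-link case above works out as:

```latex
\begin{aligned}
T_{\text{header}} &= L\,t_{\text{link}} + (L-1)\,t_{\text{route}} = 3(1) + 2(1) = 5 \text{ cycles} \\
T_{\text{packet}} &= T_{\text{header}} + (P-1)\,t_{\text{link}} = 5 + 3(1) = 8 \text{ cycles} \quad (P = 4)
\end{aligned}
```

The second line assumes the remaining parts follow the header back to back, one per clock.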

Kapteyn

I don't understand why each router must wait until the receiver has enough buffer space to hold the entire packet to begin forwarding. If the point of forwarding is to begin sending without having to receive the entire packet, why is it required to have enough space for the entire packet before a router can begin receiving parts of the packet?

If a receiver senses that its buffer is full, can't it just tell the sender to hold off on sending the following parts of the packet until its buffer frees up? Each router should always try to send as much data as it can to the next router in the path to hide latency.

Transfer of data would still be occurring at the granularity of a packet (unlike wormhole flow control which allows interleaving of flits from different packets) but we would have more latency hiding because we don't have to wait for the buffer to clear up entirely to begin sending parts of the packet and as it clears we can send the remaining parts of the packet.
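The "tell the sender to hold off" signal described here is what link-level credit-based flow control provides: the receiver hands the sender one credit per free buffer slot, and the sender transmits a part only while it holds a credit. Below is a single-link sketch under simplifying assumptions (one part per clock, credits returned with zero delay, a downstream link that unblocks at a given clock); `credit_link` and its parameters are hypothetical.

```python
from collections import deque


def credit_link(num_parts, buffer_slots, drain_start):
    """Simulate one link under credit-based flow control (simplified sketch).

    Hypothetical model: the sender may send one part per clock, but only while it
    holds a credit; the receiver starts with buffer_slots credits (possibly fewer
    than num_parts); from clock drain_start on, the receiver forwards one buffered
    part per clock downstream and returns a credit immediately (real links have a
    credit round-trip delay, ignored here).
    Returns ([(part, clock it was forwarded)], peak receiver buffer occupancy).
    """
    if buffer_slots < 1:
        raise ValueError("need at least one buffer slot")
    credits = buffer_slots
    buffer = deque()
    forwarded = []
    sent = 0
    peak = 0
    clock = 0
    while len(forwarded) < num_parts:
        clock += 1
        # receiver: forward one part if the downstream link is free; return a credit
        if clock >= drain_start and buffer:
            forwarded.append((buffer.popleft(), clock))
            credits += 1
        # sender: transmit the next part (numbered from 1) only when holding a credit
        if sent < num_parts and credits > 0:
            sent += 1
            buffer.append(sent)
            credits -= 1
        peak = max(peak, len(buffer))
    return forwarded, peak


forwarded, peak = credit_link(num_parts=4, buffer_slots=2, drain_start=5)
print(forwarded, peak)  # [(1, 5), (2, 6), (3, 7), (4, 8)] 2
```

With `buffer_slots=2` the receiver never holds more than 2 of the 4 parts, yet in this model parts 1 to 4 still leave downstream at clocks 5 to 8, the same as with a 4-slot buffer. What the model leaves out is that parts 3 and 4 sit upstream in the meantime, so the packet occupies resources at more than one router; roughly speaking, that is the trade-off wormhole flow control makes.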

landuo

Under heavy network contention, a packet cannot be forwarded as soon as its header is received. However, the rest of the packet is still being sent to the switch that holds the header. As a result, the entire packet ends up stored at that switch, which is exactly how store-and-forward transfers packets.

ESINNG

Well, I'm still confused about the cut-through and wormhole methods. What's the difference between them?

From my understanding, one difference is that wormhole doesn't require the router to have enough space to store the entire packet, whereas cut-through does. Also, cut-through works at the granularity of packets and wormhole at the granularity of flits. But since cut-through also splits the packet into several parts, how is a part of a packet different from a flit as defined in wormhole? Cut-through seems really similar to wormhole, so what's the main difference between them?

amaliujia

@ESINNG I guess that because the packet is the granularity of transfer in cut-through, a packet isn't split into parts on purpose; maybe it's only because the buffer is made of fixed-size blocks, like 4-byte blocks. But for wormhole, it looks like the flit size can be controlled. Can someone confirm my guess?
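The difference that usually gets emphasized is where a blocked packet ends up (worth checking against the lecture slides). Cut-through allocates buffer space per packet, so if the head blocks, the rest of the packet keeps draining into the router where the head is stuck. Wormhole allocates buffers per flit, so a blocked packet stays stretched across several routers, holding every channel it occupies. The sketch below is a hypothetical snapshot model (`blocked_occupancy` is an illustrative name) that assumes all upstream buffers have already filled.

```python
def blocked_occupancy(num_flits, buffer_flits, head_router, mode):
    """Snapshot of where a packet's flits sit while its head is blocked.

    Hypothetical model: index 0 is the source, index i is router i along the path,
    and the head is stuck in router head_router. Backpressure is assumed to have
    settled, i.e. every buffer that can fill has filled.
    """
    if mode == "cut-through":
        # buffers are allocated per packet, so the whole packet must fit
        if buffer_flits < num_flits:
            raise ValueError("cut-through needs room for the whole packet")
        # the rest of the packet drains into the router holding the head
        return [0] * head_router + [num_flits]
    if mode == "wormhole":
        # buffers are allocated per flit; the packet stays spread along its path
        occupancy = [0] * (head_router + 1)
        remaining = num_flits
        for i in range(head_router, 0, -1):  # fill from the head backwards
            occupancy[i] = min(buffer_flits, remaining)
            remaining -= occupancy[i]
        occupancy[0] = remaining  # whatever does not fit waits at the source
        return occupancy
    raise ValueError(f"unknown mode: {mode}")


print(blocked_occupancy(4, 4, 2, "cut-through"))  # [0, 0, 4]
print(blocked_occupancy(4, 1, 2, "wormhole"))  # [2, 1, 1]
```

In the cut-through case the whole 4-flit packet collapses into router 2, which is why every router needs a packet-sized buffer; in the wormhole case 1-flit buffers suffice, but the packet now holds the links into routers 1 and 2 while it waits.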