Adventure42

the answer to the ultimate question of life, the universe, and everything

Traffic Flow Attributes

04 Aug 2022 » trafficClassification



Key terms from “Tools, Data, and Flow Attributes for Understanding Network Traffic without Payload” (Furlong, 2007)

traffic session: transfer of data from the perspective of the application

traffic flow: network traffic generated by a network session (focused more on the perspective of an observer intercepting that traffic)

Both traffic session and traffic flow refer to data sent across a network in the course of an application’s operation

latency: “normal” amount of time taken for a packet to reach node B from node A (mean of time measurements)

jitter: variation in the amount of time taken (standard deviation of the time measurements)
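
The two definitions above can be sketched in a few lines of Python. This is only an illustration: the function name `latency_and_jitter` and the sample delays are made up, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption, since the notes do not say which one is meant.

```python
import statistics

def latency_and_jitter(delays_ms):
    """Return (latency, jitter) for a list of one-way delay measurements.

    latency = mean of the measurements
    jitter  = standard deviation of the measurements
              (population form; an assumption, the notes do not specify)
    """
    latency = statistics.mean(delays_ms)
    jitter = statistics.pstdev(delays_ms)
    return latency, jitter

# Hypothetical delay measurements from node A to node B, in milliseconds.
samples = [20.0, 22.0, 18.0, 20.0]
latency, jitter = latency_and_jitter(samples)
print(f"latency={latency:.2f} ms, jitter={jitter:.2f} ms")
```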

client-server architecture: The applications that we consider employ the client-server architecture, in which one node, the server, advertises a service that it provides, and the other node, the client, connects to the server and uses the service.

client-side vs. server-side traffic: Client-side traffic is traffic sent by the client to the server, and server-side traffic is traffic sent by the server to the client

Networked applications:

  1. SMTP: Simple Mail Transfer Protocol

    is registered as using port TCP/25

  2. HTTP: HyperText Transfer Protocol

    is registered as using port TCP/80

  3. POP3: Post Office Protocol version 3

    designed to allow e-mail clients used directly by a human user to interface with mailhosts responsible for forwarding mail across an internetwork (via a protocol such as SMTP)

    is registered as using port TCP/110

  4. FTP: File Transfer Protocol (subdivided into FTP-control and FTP-data)

    FTP session: consists of two communication channels, a control channel and a data channel, which are used for different purposes and are expected to exhibit entirely different behaviours

    FTP-control is registered as using port TCP/21, and FTP-data as using port TCP/20

  5. Telnet

    provides an interface between terminal devices and processes, such as terminals slaved to a mainframe and the mainframe terminal process (e.g., on the Internet, it is often used for remote access to a command shell)

    is registered as using port TCP/23

flow attributes that are often used in traffic classification:

statistical values of inter-packet delays, inter-arrival delays, packet lengths, total data volume in bytes, duration, number of packets, and packet directions. These attributes can be grouped as follows:

  • timing-based attributes
  • measurements based on packet lengths - general measurements such as averages or heuristics based on packet lengths
  • measurements based on data and packet volumes
  • heuristics based on packet sizes and flags

sub-flow attributes used in traffic classification:

these attributes deal with individual application behaviours rather than with the application as a whole (e.g., using a Naive Bayes classifier to identify traffic from a 3D networked game based on a sliding window of the most recent packets)
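
As a rough illustration of the sliding-window idea, the sketch below computes features over the most recent packets only. The window size, the choice of features (mean and standard deviation of packet length), and the function name `sliding_window_features` are all assumptions made for this example; the classifier itself is not shown.

```python
from collections import deque
import statistics

def sliding_window_features(packet_lengths, window=4):
    """Yield (mean, std dev) of packet length over the most recent `window` packets.

    Each yielded tuple is one sub-flow feature vector that could be passed
    to a classifier. Nothing is yielded until the window is full.
    """
    recent = deque(maxlen=window)  # old packets fall out automatically
    for length in packet_lengths:
        recent.append(length)
        if len(recent) == window:
            yield (statistics.mean(recent), statistics.pstdev(recent))

# Hypothetical packet lengths in bytes, in arrival order.
features = list(sliding_window_features([100, 100, 100, 100, 500], window=4))
for mean_len, std_len in features:
    print(f"mean={mean_len:.1f} std={std_len:.1f}")
```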

how to aggregate packets into flows:

note: use network traffic to identify application behaviours, then use these behaviours to identify which application is in operation

applications are not necessarily homogeneous: a single application can exhibit different behaviours when performing different tasks

a classifier trained against the beginning of a traffic flow is better at identifying traffic from the beginning of another flow than from the middle of a flow

aggregate packets into a logically meaningful unit: a sequence of packets using the same transport-layer protocol and having the same two endpoints, where an endpoint is a network-layer address paired with a transport-layer port

flow is:

  • transmitted between port a on node A and port b on node B.

  • split into two half-flows (or directions), forward and reverse; attribute values often differ considerably between the two directions of the flow
  • delimited by a 64-second “timeout”: if no packets are seen for 64 seconds, any later packets between the same endpoints are treated as a separate flow
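
The flow definition above can be sketched as follows. The `Packet` record, the function names, and the sample packets are hypothetical. Two details are assumptions not stated in the notes: a gap strictly greater than 64 seconds starts a new flow, and the forward direction is taken to be the direction of the first packet seen.

```python
from collections import namedtuple

# Hypothetical packet record: arrival time (s), protocol, endpoints, length (bytes).
Packet = namedtuple("Packet", "time proto src sport dst dport length")

FLOW_TIMEOUT = 64.0  # seconds without packets that separates two flows

def flow_key(pkt):
    """Direction-independent key: protocol plus the two endpoints, sorted."""
    a = (pkt.src, pkt.sport)
    b = (pkt.dst, pkt.dport)
    return (pkt.proto,) + tuple(sorted([a, b]))

def aggregate_flows(packets, timeout=FLOW_TIMEOUT):
    """Group packets into flows, starting a new flow after an idle gap > timeout."""
    flows = []      # each flow is a list of packets in arrival order
    open_flow = {}  # key -> index in `flows` of the currently open flow
    for pkt in sorted(packets, key=lambda p: p.time):
        key = flow_key(pkt)
        idx = open_flow.get(key)
        if idx is None or pkt.time - flows[idx][-1].time > timeout:
            flows.append([])
            idx = len(flows) - 1
            open_flow[key] = idx
        flows[idx].append(pkt)
    return flows

def split_half_flows(flow):
    """Split a flow into (forward, reverse); forward = direction of first packet."""
    origin = (flow[0].src, flow[0].sport)
    fwd = [p for p in flow if (p.src, p.sport) == origin]
    rev = [p for p in flow if (p.src, p.sport) != origin]
    return fwd, rev

packets = [
    Packet(0.0, "TCP", "10.0.0.1", 40000, "10.0.0.2", 80, 60),
    Packet(0.1, "TCP", "10.0.0.2", 80, "10.0.0.1", 40000, 1500),
    Packet(100.0, "TCP", "10.0.0.1", 40000, "10.0.0.2", 80, 60),  # after the timeout
]
flows = aggregate_flows(packets)
fwd, rev = split_half_flows(flows[0])
print(len(flows), len(fwd), len(rev))
```

The third packet arrives 99.9 seconds after the second, so it opens a second flow even though the endpoints are unchanged.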

application behaviour distortions:

e.g., fragmentation of network data by transport-layer protocols such as the Transmission Control Protocol (TCP), which takes a stream of data given to it for transmission across a network and breaks it up into smaller chunks (often no more than 1500 bytes per packet, to accommodate Ethernet’s Maximum Transmission Unit (MTU)), often 1460 bytes of payload (allowing 40 bytes for the IP and TCP headers). This leads to sequences of consecutive packets each carrying 1460 bytes of payload; this is an effect of TCP, not of the underlying application behaviour
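
The arithmetic behind the 1460-byte figure can be worked through in a short sketch. The function name `segment_payload_sizes` and the 5000-byte message are made up for illustration, and the header sizes assume IPv4 and TCP headers without options.

```python
MTU = 1500        # Ethernet maximum transmission unit, in bytes
IP_HEADER = 20    # IPv4 header without options (assumption)
TCP_HEADER = 20   # TCP header without options (assumption)
MSS = MTU - IP_HEADER - TCP_HEADER  # 1460 bytes of payload per packet

def segment_payload_sizes(message_len, mss=MSS):
    """Return the payload size of each packet needed to carry `message_len` bytes."""
    full, rest = divmod(message_len, mss)
    return [mss] * full + ([rest] if rest else [])

# A hypothetical 5000-byte application message.
sizes = segment_payload_sizes(5000)
print(MSS, sizes)
```

The run of identical 1460-byte payloads says that the application wrote a lot of data at once, but the size of the original message can only be recovered approximately from the last, shorter packet.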

flow attributes:

  • list of attributes:

    • timing attributes

      • possible issues - distortion due to network latency and jitter (distortions are sometimes mentioned as a possible explanation for why ML algorithms tend to prefer non-timing-related flow attributes over timing-related ones.)

        1. duration of flow:

        2. arrival time of a packet : time that the monitor finished receiving the last byte of that packet

        3. inter-packet delays: amount of time passed from arrival of one packet to the arrival of the next packet in its natural context

          two types of inter-packet delays: “unidirectional” (where the natural context is the half-flow, i.e., “time_delta” is the length of time since the last packet in the same direction) and “bidirectional” (where the natural context is the flow, i.e., “time_delta” is the length of time since the last packet in either direction)

          considering only unidirectional delays may reveal more about the activities of the application generating the traffic, and these delays are not distorted as strongly by network congestion and similar effects

          one example would be “mean delay between packets”; note that this attribute is sensitive to outliers, since a single large delay between packets can drastically change the mean inter-packet delay and make a flow much more difficult to classify

        4. inter-packet delay variability: a measure of how widely the inter-packet delays vary within a flow (in previous work, this was used to distinguish between FTP-data and streaming media, using a metric based on the standard deviation of inter-packet delays divided by the mean inter-packet delay in the bidirectional flow)

    • packet lengths (in units of bytes)

      1. mean packet length

      2. mean payload length

      3. mean nonempty payload length (nonempty packets: packets that contain some application layer data)
      4. variability in packet lengths (e.g., standard deviation of packet size)
    • data volume: deals with the amount of data in a network flow and with data rates

      • possible issue: how do you know which bytes to count? It can sometimes be necessary to consider all of the data from the network layer up, particularly if some sort of encrypted tunneling mechanism is being used that obscures the transport-layer header and thus prevents the calculation of the payload length of the packet

      • two parameters to consider in defining data volume attributes: time granularity & data of interest

        time granularity - length of each interval over which the data rate is calculated (every second? or per 5 seconds?)

        data of interest - at what layer we are measuring the data (the data rate calculated at the IP layer will differ from the data rate calculated in terms of TCP payload, i.e., the amount of data in the payload portions of TCP packets)

        1. number of packets

        2. number of packets with payload

        3. total volume of data sent (number of bytes sent)

        4. total volume of payload sent (number of payload bytes sent)

        5. mean payload sent per unit time (mean data rate)
        6. directionality of data

        7. ratio of data sent in the forward direction to data sent in the reverse direction
    • packet proportion heuristic attributes:

      1. small packet heuristics (e.g. small packet heuristics are useful for identifying command-shell interactive traffic)

        this attribute indicates how much of the consecutive small-packet activity appears, judging by its timing, to be interactive, human-driven (rather than machine-driven) activity

        it distinguishes applications that send mostly small non-empty packets (such as interactive applications) from those that send many empty packets as well as small packets (such as some machine-driven applications)

        if there are neither empty nor small packets in the flow, this is also treated as a lack of evidence of interactivity

      2. large packet heuristics

        fragmentation is a notable issue with large packets, particularly at the transport layer. Fragmentation causes large chunks of data sent by an application to be split into many packets of the same maximum size. This may indicate that large chunks of data are being sent, but it makes it difficult to determine the sizes of the chunks actually sent by the application

      3. properties of packets with flag X

        TCP packet flags are used by TCP to signal control information to the TCP implementation at the remote end (could potentially reflect the way in which the application is using TCP)

        packets with specified flag (e.g., ACK, PSH, etc)

        e.g., a TCP packet’s header contains an octet, each bit of which has some significance; the bits of that octet are referred to as flags (for instance, the ACK, or acknowledgement, flag)

  • sometimes flow attributes are also affected by peripheral effects:

    • load on the host sending the traffic
    • congestion in the network, which can alter the timing characteristics of the traffic
    • fragmentation of large messages at the TCP layer, which can alter packet length characteristics
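
Several of the timing, packet-length, and data-volume attributes listed above can be computed together, as in the following sketch. The `Pkt` record, the attribute names in the returned dictionary, and the sample flow are hypothetical. The variability metric is the standard deviation of the bidirectional delays divided by their mean; using the population standard deviation is an assumption.

```python
import statistics
from collections import namedtuple

# Hypothetical record: arrival time (s), direction ("fwd" or "rev"),
# total packet length and payload length in bytes.
Pkt = namedtuple("Pkt", "time direction length payload")

def delays(times):
    """Inter-packet delays between consecutive arrival times."""
    return [later - earlier for earlier, later in zip(times, times[1:])]

def flow_attributes(flow):
    """Compute a few flow attributes for a non-empty list of Pkt records."""
    flow = sorted(flow, key=lambda p: p.time)
    times = [p.time for p in flow]
    fwd = [p for p in flow if p.direction == "fwd"]
    rev = [p for p in flow if p.direction == "rev"]
    bi = delays(times)                         # bidirectional delays
    uni_fwd = delays([p.time for p in fwd])    # unidirectional (forward) delays
    nonempty = [p.payload for p in flow if p.payload > 0]
    duration = times[-1] - times[0]
    total_payload = sum(p.payload for p in flow)
    fwd_bytes = sum(p.length for p in fwd)
    rev_bytes = sum(p.length for p in rev)
    mean_bi = statistics.mean(bi) if bi else 0.0
    return {
        "duration": duration,
        "mean_bidirectional_delay": mean_bi,
        "mean_forward_delay": statistics.mean(uni_fwd) if uni_fwd else 0.0,
        "delay_variability": statistics.pstdev(bi) / mean_bi if mean_bi > 0 else 0.0,
        "num_packets": len(flow),
        "num_packets_with_payload": len(nonempty),
        "mean_packet_length": statistics.mean([p.length for p in flow]),
        "mean_payload_length": statistics.mean([p.payload for p in flow]),
        "mean_nonempty_payload_length": statistics.mean(nonempty) if nonempty else 0.0,
        "std_packet_length": statistics.pstdev([p.length for p in flow]),
        "total_bytes": fwd_bytes + rev_bytes,
        "total_payload_bytes": total_payload,
        "mean_payload_rate": total_payload / duration if duration > 0 else 0.0,
        "fwd_to_rev_ratio": fwd_bytes / rev_bytes if rev_bytes else float("inf"),
    }

flow = [
    Pkt(0.0, "fwd", 60, 0),        # empty packet (e.g., a bare acknowledgement)
    Pkt(1.0, "rev", 1500, 1460),
    Pkt(2.0, "fwd", 100, 60),
    Pkt(4.0, "rev", 1500, 1460),
]
attrs = flow_attributes(flow)
for name, value in attrs.items():
    print(f"{name}: {value}")
```

Here the data rate is measured on payload bytes over the whole flow; choosing a different layer or a finer time granularity, as discussed above, would give different values.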

filtering to get packet aggregates

P{condition to filter packets from a packet aggregate}

Example: P{transport.len == 20} defines a packet aggregate P′ consisting of the packets in the packet aggregate P whose transport-layer packet is exactly 20 bytes long (such as a 20-byte TCP header with no options and no payload, or a UDP packet with its 8-byte header followed by 12 bytes of payload).
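
One way to read this notation is as a filter over a list of packets. In the sketch below, the dictionary keys (`proto`, `transport_len`) and the function name `filter_aggregate` are hypothetical and simply mirror the `transport.len` condition from the example.

```python
def filter_aggregate(packets, condition):
    """Return the packet aggregate P' of packets in P that satisfy `condition`."""
    return [pkt for pkt in packets if condition(pkt)]

# Hypothetical packet aggregate P; transport_len is the transport-layer
# packet length in bytes (header plus payload).
P = [
    {"proto": "TCP", "transport_len": 20},    # 20-byte header, no payload
    {"proto": "UDP", "transport_len": 20},    # 8-byte header + 12 bytes of payload
    {"proto": "TCP", "transport_len": 1480},  # 20-byte header + 1460 bytes of payload
]

# P{transport.len == 20}
P_prime = filter_aggregate(P, lambda pkt: pkt["transport_len"] == 20)
print([pkt["proto"] for pkt in P_prime])
```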



References

  1. “Tools, Data, and Flow Attributes for Understanding Network Traffic without Payload” (Furlong, 2007)