UNSW IoT traffic data with packets, flows, and protocols

Wannigama, Savindu¹; Sivanathan, Arunan²; Habibi Gharakheili, Hassan²

Published Aug 29, 2025 on Dryad. https://doi.org/10.5061/dryad.w0vt4b94b

Data files

Aug 29, 2025 version files (28.47 GB in total)

File	Size
device_pcap_summary.csv	1.84 KB
flows.zip	95.51 MB
pcaps.zip	13.92 GB
protocol_diversity_prevalence.csv	384 B
protocols.zip	186.96 MB
README.md	11.19 KB
scripts.zip	34.38 MB
UNSW-IoTraffic.zip	14.24 GB

Abstract

The UNSW IoT traffic data (UNSW-IoTraffic) is a dataset comprising (a) raw network packet traces with full headers and payloads, (b) flow-level metadata summarizing fine-grained bidirectional activity, and (c) protocol parameters describing network protocol characteristics. The dataset also includes scripts written in Java for flow extraction and protocol matching using protocol data models, along with data models for six dominant protocols (TLS, HTTP, DNS, DHCP, SSDP, and NTP). It contains 95.5 million packets of IoT communications captured over 203 days, organized into 27 per-device packet capture (PCAP) files. Derived flow data, grouped by the 5-tuple (source IP address, destination IP address, protocol number, source port number, destination port number), are provided as 27 per-device CSV files. Additionally, protocol-specific parameters for 70% of flows are extracted into a total of 450 CSV files across 27 device types, covering 25 protocols, each with request and response data. The dataset's three-level structure (packets, flows, and protocols) supports a diverse range of users, from students learning data networking concepts to experienced researchers and industry professionals. It enables behavioral analysis of consumer IoT devices, anomaly detection, and protocol validation.

Dataset DOI: 10.5061/dryad.w0vt4b94b

UNSW-IoTraffic is a multi-resolution network traffic dataset of consumer IoT devices captured from a lab testbed. It includes device-specific raw PCAPs (full headers and payloads), flow-level CSVs (bidirectional 5-tuple flows with statistics), and protocol-parameter CSVs (request/response attributes for selected protocols). The capture covers 27 devices, spans ≈203 days of operation (setup, idle, and interactions), and totals 95,543,405 packets, 4,944,041 flows, and ≈26.9 GB of PCAPs. All timestamps are recorded in UTC.

Description of the data and file structure

We provide five ZIP archives so you can fetch only what you need:

- UNSW-IoTraffic.zip (root): the entire dataset
- pcaps.zip: 27 device-specific raw packet traces (.pcap)
- flows.zip: 27 device-specific flow-level CSVs
- protocols.zip: protocol models/ and parameters/
- scripts.zip: reusable analysis utilities (each folder has its own README)

Download options: UNSW-IoTraffic.zip (root) contains the entire dataset, including pcaps/, flows/, protocols/, and scripts/. The individual archives—pcaps.zip, flows.zip, protocols.zip, and scripts.zip—are duplicate subsets provided for convenience so users can download only the parts they need.
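If you have already downloaded the full archive but only need one layer, you can extract that layer without unpacking everything. The sketch below uses Python's standard zipfile module and assumes the archive's top-level folder is UNSW-IoTraffic/, matching the directory tree shown in the structure section; the function name extract_layer is hypothetical, so list the archive members first if your copy is laid out differently.

```python
import zipfile

def extract_layer(archive_path, layer, dest_dir, root="UNSW-IoTraffic/"):
    """Extract only the files under <root><layer>/ (e.g. layer="flows") from the full archive.

    The default root prefix is an assumption based on the directory tree; adjust it
    if zipfile.ZipFile(archive_path).namelist() shows a different layout.
    """
    prefix = f"{root}{layer}/"
    extracted = []
    with zipfile.ZipFile(archive_path) as zf:
        for name in zf.namelist():
            # Skip directory entries; extract() recreates parent folders as needed.
            if name.startswith(prefix) and not name.endswith("/"):
                zf.extract(name, dest_dir)
                extracted.append(name)
    return extracted
```

For example, extract_layer("UNSW-IoTraffic.zip", "flows", "data") would write the flow CSVs under data/UNSW-IoTraffic/flows/.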

Structure of the UNSW-IoTraffic dataset:

UNSW-IoTraffic/
├─ pcaps/
│  ├─ AmazonEcho_44650d56ccd3.pcap
│  ├─ AwairAirQuality_70886b100fc6.pcap
│  └─ ...
├─ flows/
│  ├─ AmazonEcho_44650d56ccd3_flows.csv
│  ├─ AwairAirQuality_70886b100fc6_flows.csv
│  └─ ...
├─ protocols/
│  ├─ models/
│  │  ├─ DHCP_model.json
│  │  ├─ HTTP_model.json
│  │  └─ ...
│  └─ parameters/
│     └─ <DeviceName>_<DeviceMAC>/
│        ├─ requests/
│        │  ├─ dhcpAttributes.csv
│        │  ├─ httpAttributes.csv
│        │  └─ ...
│        └─ responses/
│           ├─ dhcpAttributes.csv
│           ├─ httpAttributes.csv
│           └─ ...
└─ scripts/
   ├─ random-forest-classifier/
   ├─ state-diagrams/
   └─ summaries/
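When scripting over all devices, the (device, MAC) pairs can be recovered from the PCAP filenames and reused to locate the matching flow CSV and parameter folders. This is a sketch assuming the layout above; the helper names are hypothetical, and splitting on the last underscore keeps any underscores inside a device name intact.

```python
from pathlib import Path

def list_devices(root):
    """Parse (device, MAC) pairs from pcaps/<DeviceName>_<DeviceMAC>.pcap filenames."""
    devices = []
    for p in sorted(Path(root, "pcaps").glob("*.pcap")):
        name, _, mac = p.stem.rpartition("_")  # the MAC is the last underscore-separated part
        devices.append((name, mac))
    return devices

def device_paths(root, device, mac):
    """Return the expected per-device paths, following the documented naming patterns."""
    root = Path(root)
    params = root / "protocols" / "parameters" / f"{device}_{mac}"
    return {
        "pcap": root / "pcaps" / f"{device}_{mac}.pcap",
        "flows": root / "flows" / f"{device}_{mac}_flows.csv",
        "requests": params / "requests",
        "responses": params / "responses",
    }
```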

Three data layers

Raw PCAPs (per device)
- Filename pattern: "<DeviceName>_<DeviceMAC>.pcap"
- Contents: full packet headers and payloads (multiple protocols).
- Use cases: packet-level inspection, custom parsing, reproducible derivations.

Traffic was captured pre-NAT on the LAN side of a TP-Link Archer C7 v2 (OpenWrt) gateway using tcpdump, with packet timestamps sourced from the gateway clock.
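For a quick sanity check on a trace without Wireshark or Scapy, the sketch below walks the record headers of a pcap file using only Python's standard library. It assumes the files use the classic libpcap format (what tcpdump writes by default) rather than pcapng; the helper names iter_pcap and pcap_summary are hypothetical, and the reader raises ValueError on any other format, in which case tshark or Scapy is the better tool.

```python
import struct

# Classic libpcap magic numbers as they appear on disk -> (byte order, timestamp divisor).
_MAGIC = {
    b"\xd4\xc3\xb2\xa1": ("<", 1_000_000),      # little-endian, microsecond timestamps
    b"\xa1\xb2\xc3\xd4": (">", 1_000_000),      # big-endian, microsecond timestamps
    b"\x4d\x3c\xb2\xa1": ("<", 1_000_000_000),  # little-endian, nanosecond timestamps
    b"\xa1\xb2\x3c\x4d": (">", 1_000_000_000),  # big-endian, nanosecond timestamps
}

def iter_pcap(path):
    """Yield (timestamp_seconds, captured_bytes) for each record of a classic pcap file."""
    with open(path, "rb") as f:
        header = f.read(24)  # global header
        if len(header) < 24 or header[:4] not in _MAGIC:
            raise ValueError("not a classic pcap file (pcapng is not handled here)")
        endian, divisor = _MAGIC[header[:4]]
        rec = struct.Struct(endian + "IIII")  # ts_sec, ts_frac, incl_len, orig_len
        while True:
            raw = f.read(rec.size)
            if len(raw) < rec.size:
                break  # end of file (or a truncated trailing record header)
            ts_sec, ts_frac, incl_len, _orig_len = rec.unpack(raw)
            yield ts_sec + ts_frac / divisor, f.read(incl_len)

def pcap_summary(path):
    """Return (packet_count, first_timestamp, last_timestamp) for one trace."""
    count, first, last = 0, None, None
    for ts, _data in iter_pcap(path):
        count += 1
        if first is None:
            first = ts
        last = ts
    return count, first, last
```

For example, pcap_summary("pcaps/AmazonEcho_44650d56ccd3.pcap") returns the packet count and the first and last timestamps (seconds since epoch, UTC), which can be compared against the corresponding row of device_pcap_summary.csv.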

Flow-level metadata (per-device CSV)
- Filename pattern: "<DeviceName>_<DeviceMAC>_flows.csv"
- Flows are bidirectional 5-tuple groupings (src/dst IP, protocol, src/dst port).
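- A packet joins an open flow if it arrives no more than 120 s after that flow's previous packet, counting packets in both directions; a longer gap starts a new flow for the same 5-tuple.
- The endpoint that sends the first packet of a flow is the initiator and the other endpoint is the responder; fields prefixed src* describe the initiator and fields prefixed dst* describe the responder.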
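To make the aggregation rule concrete, here is a minimal sketch of bidirectional flow grouping with a 120 s gap. It is not the authors' Java extractor: it assumes packets are already parsed and sorted by time as hypothetical (time, src_ip, src_port, dst_ip, dst_port, proto) tuples, and it ignores MAC addresses, ethType, and payload statistics.

```python
from dataclasses import dataclass

IDLE_TIMEOUT = 120.0  # seconds; the maximum gap documented for this dataset

@dataclass
class Flow:
    time: float        # timestamp of the flow's first packet
    proto: int         # IP protocol number
    src: tuple         # (ip, port) of the initiator
    dst: tuple         # (ip, port) of the responder
    last_seen: float   # timestamp of the most recent packet in either direction
    src_packets: int = 0
    dst_packets: int = 0

def build_flows(packets, timeout=IDLE_TIMEOUT):
    """Group time-ordered (time, src_ip, src_port, dst_ip, dst_port, proto) packets into bidirectional flows."""
    open_flows = {}  # direction-independent 5-tuple -> currently open Flow
    flows = []
    for t, sip, sport, dip, dport, proto in packets:
        a, b = (sip, sport), (dip, dport)
        key = (proto, min(a, b), max(a, b))  # identical key for both directions
        flow = open_flows.get(key)
        if flow is None or t - flow.last_seen > timeout:
            # First packet of a new flow: its sender becomes the initiator.
            flow = Flow(time=t, proto=proto, src=a, dst=b, last_seen=t)
            open_flows[key] = flow
            flows.append(flow)
        if a == flow.src:
            flow.src_packets += 1
        else:
            flow.dst_packets += 1
        flow.last_seen = t
    return flows
```

In this sketch, flowDuration would correspond to last_seen - time, and the two counters mirror srcNumPackets and dstNumPackets.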

Protocol parameters (per device / protocol / direction CSVs)
- Folder layout: protocols/parameters/<Device>_<MAC>/<requests|responses>/
- Filename per protocol: "<protocol>Attributes.csv"
- Each row corresponds to a single flow labeled with that protocol; request (initiator) and response (responder) parameters are stored separately.
- Join key back to the flow CSV:
  (srcMac, dstMac, ethType, srcIp, dstIp, ipProto, srcPort, dstPort, flowSeqNum).
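A join on this key needs nothing beyond the standard csv module. The sketch keeps every value as a string, so * and null placeholders and hex ethType values compare exactly, and it performs a left join, so flows without a parameter row are kept unchanged. The function names and the column prefix are hypothetical; time is left out of the key, as in the list above.

```python
import csv

JOIN_KEY = ("srcMac", "dstMac", "ethType", "srcIp", "dstIp",
            "ipProto", "srcPort", "dstPort", "flowSeqNum")

def load_rows(path):
    """Read a CSV into a list of dicts, keeping all values as strings."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def enrich_flows(flow_rows, param_rows, prefix):
    """Left-join protocol-parameter rows onto flow rows using JOIN_KEY."""
    params = {tuple(r[k] for k in JOIN_KEY): r for r in param_rows}  # one row per flow
    enriched = []
    for flow in flow_rows:
        out = dict(flow)
        match = params.get(tuple(flow[k] for k in JOIN_KEY))
        if match:
            for col, val in match.items():
                if col not in JOIN_KEY and col != "time":
                    out[prefix + col] = val  # e.g. a request attribute -> "req_<attribute>"
        enriched.append(out)
    return enriched
```

For example, enrich_flows(load_rows(flow_csv), load_rows(request_csv), "req_") adds each request attribute as a req_-prefixed column; running it again on the matching responses file with a different prefix adds the response side.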

Device-level packet capture summary

The device_pcap_summary.csv file provides an overview of per-device PCAP traces for 27 IoT device types. It contains six columns: Device Name, Device MAC Address, First Seen (date), Last Seen (date), Number of Packets, and Number of Flows.
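The summary file is small enough to total up directly. This sketch assumes the header row uses the column names listed above verbatim and tolerates thousands separators in the counts; for the complete file, the totals would be expected to match the figures stated earlier (27 devices, 95,543,405 packets, 4,944,041 flows). The function name is hypothetical.

```python
import csv

def summary_totals(path):
    """Return (devices, total_packets, total_flows) from device_pcap_summary.csv."""
    devices = packets = flows = 0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            devices += 1
            # Strip thousands separators in case counts are formatted like "1,250".
            packets += int(row["Number of Packets"].replace(",", "").strip())
            flows += int(row["Number of Flows"].replace(",", "").strip())
    return devices, packets, flows
```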

Protocol diversity and prevalence

The protocol_diversity_prevalence.csv file summarizes the prevalence of 25 application-layer protocols across 27 IoT device types. It contains three columns: Protocol, Number of Devices, and Number of Flows.
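A common use of this file is ranking protocols by how many flows they account for. In the sketch below (column names assumed to match the list above, function name hypothetical), each share is relative to the flows counted in this file, which need not equal every flow in the dataset.

```python
import csv

def rank_protocols(path):
    """Return (protocol, devices, flows, share_of_listed_flows) tuples sorted by flow count."""
    with open(path, newline="") as f:
        rows = [(r["Protocol"], int(r["Number of Devices"]), int(r["Number of Flows"]))
                for r in csv.DictReader(f)]
    total = sum(n for _, _, n in rows)
    ranked = sorted(rows, key=lambda r: r[2], reverse=True)
    return [(p, d, n, n / total if total else 0.0) for p, d, n in ranked]
```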

Field definitions in CSV files

Table 1 — Data fields in flow-level CSV files (per device).

Column	Description	Units / Type
`time`	Timestamp of the first packet in the flow (UTC).	seconds since epoch (float/int)
`srcMac`, `dstMac`	MAC addresses of initiator (`src`) and responder (`dst`).	string
`ethType`	Ethernet type (e.g., `0x0800` = IPv4).	hex string
`srcIp`, `dstIp`	IP addresses of initiator and responder.	string
`ipProto`	IP protocol number (e.g., `6`=TCP, `17`=UDP).	integer
`srcPort`, `dstPort`	Transport-layer ports (initiator/responder).	integer
`flowSeqNum`	Sequential index of the flow within the device capture.	integer
`srcNumPackets`, `dstNumPackets`	Packet counts by direction.	integer
`srcPayloadSize`, `dstPayloadSize`	Total payload bytes by direction.	bytes (integer)
`srcAvgPayloadSize`, `dstAvgPayloadSize`	Mean payload size by direction.	bytes (float)
`srcMaxPayloadSize`, `dstMaxPayloadSize`	Maximum payload size by direction.	bytes (integer)
`srcStdDevPayloadSize`, `dstStdDevPayloadSize`	Std. dev. of payload size by direction.	bytes (float)
`flowDuration`	Time between first and last packets in the flow.	seconds (float)
`srcAvgInterarrivalTime`, `dstAvgInterarrivalTime`, `avgInterarrivalTime`	Average inter-arrival times (per direction and overall).	seconds (float)
`srcStdDevInterarrivalTime`, `dstStdDevInterarrivalTime`, `stdDevInterarrivalTime`	Std. dev. of inter-arrival times (per direction and overall).	seconds (float)
`protocol`	Dominant application-layer protocol inferred for the flow.	string
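Several useful features can be derived from the Table 1 columns alone. The sketch below computes total payload, packet count, payload throughput, and the initiator's share of payload bytes for one row read with csv.DictReader; the derived names are hypothetical and are not columns in the dataset.

```python
def flow_metrics(row):
    """Derive simple per-flow metrics from Table 1 fields (values given as strings)."""
    up = float(row["srcPayloadSize"])      # initiator -> responder payload bytes
    down = float(row["dstPayloadSize"])    # responder -> initiator payload bytes
    duration = float(row["flowDuration"])  # seconds between first and last packet
    total = up + down
    return {
        "totalPayload": total,
        "packets": int(row["srcNumPackets"]) + int(row["dstNumPackets"]),
        # Single-packet flows have zero duration, so throughput is undefined there.
        "payloadBytesPerSec": total / duration if duration > 0 else None,
        # Fraction of payload bytes sent by the initiator (None for payload-free flows).
        "uploadShare": up / total if total else None,
    }
```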

Table 2 — Data fields in protocol-parameter CSV files (per device / protocol / direction).

Column	Description	Units / Type
`time`, `srcMac`, `dstMac`, `ethType`, `srcIp`, `dstIp`, `ipProto`, `srcPort`, `dstPort`, `flowSeqNum`	Identify and join back to the flow-level row.	(see above)
`<parameters>`	Protocol-specific attributes extracted from payloads for the given direction.	string

Special values and missing data

* (a literal asterisk) is used in the following fields:
- ipProto, srcIp, dstIp: * indicates the flow’s packets have no IP layer.
- srcPort, dstPort: * indicates the flow uses a protocol without transport-layer ports (the flow may or may not have an IP layer).
null (a literal string) is used in the following fields:
- srcMac, dstMac: null indicates the Ethernet header did not contain source/destination MAC addresses for packets in that flow.

Note: * and null are not wildcards; they are explicit placeholders to mark “not applicable / not present.”
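Because these placeholders share columns with numeric values, a generic CSV loader may treat them as missing or keep whole columns as text. A minimal, explicit alternative is to convert values field by field and map both placeholders to None. The field set and function name below are assumptions chosen for illustration, cover only some Table 1 columns, and are meant for the flow CSVs rather than the protocol-parameter files.

```python
PLACEHOLDERS = {"*", "null"}  # explicit "not applicable / not present" markers

# Illustrative subset of Table 1 integer columns (extend as needed).
INT_FIELDS = {"ipProto", "srcPort", "dstPort", "flowSeqNum", "srcNumPackets", "dstNumPackets"}

def parse_field(name, value):
    """Convert one flow-CSV value, mapping '*' and 'null' placeholders to None."""
    if value in PLACEHOLDERS:
        return None
    if name in INT_FIELDS:
        return int(value)
    if name == "ethType":
        return int(value, 16)  # hex string such as "0x0800" -> 2048
    return value  # MACs, IP addresses, protocol labels stay as strings
```

For example, {k: parse_field(k, v) for k, v in row.items()} converts one csv.DictReader row.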

Protocol labeling summary

Flows are labeled by matching payload/sequence characteristics against protocol data models covering 55 protocol types.
Across the dataset, 25 distinct protocols are detected in labeled flows.
The protocols/models/ folder includes JSON models used for parameter extraction (e.g., TLS, HTTP, DNS, DHCP, SSDP, NTP).

How a new user might get started

- PCAPs: open in Wireshark/tshark, or parse with Python (e.g., Scapy).
- Flow CSVs: load into your data environment of choice (e.g., pandas) and filter by device or protocol.
- Protocol-parameter CSVs: join to the corresponding flow rows using the join key shown above to enrich flows with request/response attributes.
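Filtering one device's flows by protocol label takes only a few lines with the standard library. The exact spelling of labels in the protocol column is not documented here, so the sketch compares case-insensitively; the function names are hypothetical.

```python
import csv
from collections import Counter

def protocol_counts(flow_csv):
    """Count flows per value of the `protocol` column in one device's flow CSV."""
    with open(flow_csv, newline="") as f:
        return Counter(row["protocol"] for row in csv.DictReader(f))

def flows_for_protocol(flow_csv, protocol):
    """Return flow rows whose `protocol` label matches `protocol`, ignoring case."""
    with open(flow_csv, newline="") as f:
        return [r for r in csv.DictReader(f) if r["protocol"].lower() == protocol.lower()]
```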

Traces include interactions with devices and autonomous background activities. No ground-truth annotations of events or interactions are provided.

Citation information

S. Wannigama, A. Sivanathan, and H. Habibi Gharakheili, "Descriptor: UNSW IoT Traffic Data with Packets, Flows, and Protocols (UNSW-IoTraffic)", IEEE Data Descriptions, Aug 2025. DOI: 10.1109/IEEEDATA.2025.3602010

Code/Software

Reusable analysis tools are included under scripts/; each subfolder contains its own README with details and usage:

- random-forest-classifier/: baseline device fingerprinting using flow-level features (results and plots are in that folder's results/ subfolder).
- state-diagrams/: notebook to generate per-device state-transition diagrams from flow sequences.
- summaries/: notebooks to compute device-level flow counts and per-protocol prevalence.

Typical tools for working with the data (non-exhaustive):

- PCAPs: Wireshark/tshark, or Python libraries (e.g., Scapy) for custom parsing.
- CSVs: any data-analysis environment (e.g., Python pandas).

For installation, usage steps, and outputs, refer to the READMEs inside each scripts/ subfolder.