Attack data for Network Flood for medium and heavy duty vehicles
Abstract
Dataset DOI: https://doi.org/10.5061/dryad.q2bvq83xj
1. Dataset Overview
This dataset contains Controller Area Network (CAN) traffic logs captured during a network flooding (Denial of Service, DoS) attack on an SAE J1939 bus in a medium-duty vehicle.
The dataset demonstrates how high-frequency message injection can dominate arbitration on the CAN bus, leading to degraded or disrupted communication between Electronic Control Units (ECUs).
The data is intended for:
- Intrusion detection system (IDS) development
- Forensic analysis of vehicular networks
- Reverse engineering of CAN/J1939 traffic
- Research in cyber-physical system security
2. File Structure
The dataset consists of the following file:
- dos.csv: Saleae Logic Analyzer decoded CAN traffic captured during a network flooding attack
The file is provided in comma-separated values (.csv) format.
3. Data Acquisition
- Hardware: Saleae Logic Pro
- Software: Saleae Logic Viewer (protocol analyzer)
- Vehicle Platform: 2014 Kenworth T270
- Network Protocol: SAE J1939 (29-bit extended CAN identifiers)
The Saleae software was used to capture and decode CAN frames into structured CSV format at the protocol-field level.
4. Column Definitions
Each row represents a single field within a CAN frame, not a complete frame.
| Column Name | Description |
|---|---|
| name | Protocol name identified by Saleae (typically "CAN") |
| type | Indicates the specific component of a CAN frame as decoded by Saleae. Possible values include: identifier_field, control_field, data_field, crc_field, and ack_field. |
| start_time | Timestamp at which the field begins, in seconds from start of capture |
| duration | Duration of the field, in seconds |
| identifier | CAN arbitration ID (29-bit for J1939), represented in hexadecimal (present only in identifier_field) |
| extended | Boolean indicating extended frame format (TRUE = 29-bit ID, FALSE = 11-bit ID) |
| num_data_bytes | Data Length Code (DLC), indicating payload size (0–8 bytes), present in control_field |
| data | Payload byte value in hexadecimal (one byte per data_field row) |
| crc | Cyclic Redundancy Check value for frame integrity (present in crc_field) |
| ack | Acknowledgment flag indicating whether the frame was acknowledged on the bus (present in ack_field) |
5. Units and Conventions
- Time values (start_time, duration) are expressed in seconds
- CAN identifiers are in hexadecimal format
- Payload data is represented as hexadecimal byte values
- Boolean values (extended, ack) are represented as TRUE or FALSE
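The conventions above can be turned into small parsing helpers. The following is a minimal sketch, not part of the dataset tooling: the helper names `parse_hex` and `parse_bool` are hypothetical, and whether hexadecimal cells carry a `0x` prefix is an assumption, so the sketch accepts both forms.

```python
def parse_hex(cell):
    """Return an int for a hexadecimal cell, or None for an empty cell.

    int(text, 16) accepts both "0x18FEF100" and "18FEF100", so the
    presence of a "0x" prefix (an assumption here) does not matter.
    """
    cell = cell.strip()
    return int(cell, 16) if cell else None


def parse_bool(cell):
    """Return True/False for TRUE/FALSE cells, or None for an empty cell."""
    cell = cell.strip().upper()
    if not cell:
        return None
    if cell not in ("TRUE", "FALSE"):
        raise ValueError(f"unexpected boolean cell: {cell!r}")
    return cell == "TRUE"


def parse_seconds(cell):
    """Return a float number of seconds, or None for an empty cell."""
    cell = cell.strip()
    return float(cell) if cell else None
```

Empty cells map to `None` so that structurally unused fields are not confused with zero or FALSE.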
6. Data Structure and Frame Reconstruction
Each CAN frame is decomposed into multiple rows, where each row corresponds to a specific field of the frame.
A complete CAN frame consists of:
- One identifier_field row (arbitration ID)
- One control_field row (data length code)
- Zero to eight data_field rows (one per payload byte)
- One crc_field row
- One ack_field row
To reconstruct a full CAN frame:
- Group consecutive rows starting from an identifier_field row through the subsequent ack_field row
- These grouped rows collectively represent a single CAN message
This structure reflects the internal decoding format of the Saleae Logic Analyzer and preserves protocol-level detail.
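The grouping steps above can be sketched with Python's standard csv module. This is an illustrative sketch under stated assumptions rather than the author's tooling: the function name `reconstruct_frames` is hypothetical, the sample rows are made up for demonstration (they are not taken from dos.csv), and the column order follows the Column Definitions table.

```python
import csv
import io

# Made-up sample rows for illustration only (not taken from dos.csv).
SAMPLE = """name,type,start_time,duration,identifier,extended,num_data_bytes,data,crc,ack
CAN,identifier_field,0.000100,0.000128,0x18FEF100,TRUE,,,,
CAN,control_field,0.000228,0.000024,,,2,,,
CAN,data_field,0.000252,0.000032,,,,0xAA,,
CAN,data_field,0.000284,0.000032,,,,0x55,,
CAN,crc_field,0.000316,0.000060,,,,,0x1234,
CAN,ack_field,0.000376,0.000008,,,,,,TRUE
"""


def reconstruct_frames(rows):
    """Group field-level rows into frames, from identifier_field to ack_field."""
    frames = []
    current = None
    for row in rows:
        kind = row["type"]
        if kind == "identifier_field":
            # A new identifier_field starts a new frame; an unfinished
            # previous frame (no ack_field seen) is discarded.
            current = {
                "start_time": float(row["start_time"]),
                "identifier": int(row["identifier"], 16),
                "dlc": None,
                "data": [],
                "crc": None,
                "ack": None,
            }
        elif current is None:
            continue  # rows seen before the first identifier_field
        elif kind == "control_field":
            # DLC is 0-8, so base-16 parsing also accepts a plain decimal digit.
            current["dlc"] = int(row["num_data_bytes"], 16)
        elif kind == "data_field":
            current["data"].append(int(row["data"], 16))
        elif kind == "crc_field":
            current["crc"] = int(row["crc"], 16)
        elif kind == "ack_field":
            current["ack"] = row["ack"].strip().upper() == "TRUE"
            frames.append(current)
            current = None
    return frames


frames = reconstruct_frames(csv.DictReader(io.StringIO(SAMPLE)))
print(frames)
```

To run it on the dataset, replace `io.StringIO(SAMPLE)` with an open file handle for dos.csv. Dropping frames that never reach an ack_field row is a design choice of this sketch; during a flood it may be preferable to keep and flag them instead.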
7. Interpretation Notes
- SAE J1939 uses 29-bit extended identifiers, which encode:
- Priority
- Parameter Group Number (PGN)
- Source Address
- Repeated identifiers and high-frequency transmissions in this dataset correspond to network flooding behavior, where a node dominates bus arbitration.
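As an illustration of how these fields are packed into the identifier, the sketch below splits a 29-bit value using the standard J1939 bit layout (3-bit priority, extended data page and data page bits, PDU format, PDU specific, source address). The function name `decode_j1939_id` is hypothetical, and the example identifiers are generic J1939 values, not values confirmed to appear in this dataset.

```python
def decode_j1939_id(can_id):
    """Split a 29-bit J1939 identifier into priority, PGN, and addresses."""
    priority = (can_id >> 26) & 0x7
    edp = (can_id >> 25) & 0x1        # extended data page bit
    dp = (can_id >> 24) & 0x1         # data page bit
    pdu_format = (can_id >> 16) & 0xFF
    pdu_specific = (can_id >> 8) & 0xFF
    source_address = can_id & 0xFF

    if pdu_format < 240:
        # PDU1 (destination-specific): PDU specific holds the destination
        # address and is not part of the PGN.
        pgn = (edp << 17) | (dp << 16) | (pdu_format << 8)
        destination = pdu_specific
    else:
        # PDU2 (broadcast): PDU specific is a group extension inside the PGN.
        pgn = (edp << 17) | (dp << 16) | (pdu_format << 8) | pdu_specific
        destination = None

    return {
        "priority": priority,
        "pgn": pgn,
        "source_address": source_address,
        "destination_address": destination,
    }


print(decode_j1939_id(0x18FEF100))
print(decode_j1939_id(0x18EAFF00))
```

Because lower identifier values win CAN arbitration, a numerically lower priority field means higher precedence on the bus, which is worth keeping in mind when examining the flooding identifiers.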
8. Missing and Empty Values
Some columns contain empty cells due to the structure of protocol-level decoding.
- Fields are only populated when relevant:
  - identifier appears only in identifier_field rows
  - data appears only in data_field rows
  - crc appears only in crc_field rows
  - ack appears only in ack_field rows
- Empty cells indicate that the field is not applicable for that row type; they do not indicate missing data or measurement error
- No placeholder values (e.g., "n/a") are used
Users should interpret empty cells as structurally unused fields rather than missing information.
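One way to check this rule while loading the file is a small validator that flags value columns populated on the wrong row type. This is a hedged sketch: the names `VALUE_COLUMN_FOR_TYPE` and `unexpected_columns` are hypothetical, and only the four columns listed above are checked, since where extended and num_data_bytes are populated is stated less strictly.

```python
# Row type -> the one value column expected to be populated on that row,
# as listed in this section.
VALUE_COLUMN_FOR_TYPE = {
    "identifier_field": "identifier",
    "data_field": "data",
    "crc_field": "crc",
    "ack_field": "ack",
}
EXCLUSIVE_COLUMNS = set(VALUE_COLUMN_FOR_TYPE.values())


def unexpected_columns(row):
    """Return exclusive columns that are populated but do not belong to this row type."""
    allowed = VALUE_COLUMN_FOR_TYPE.get(row["type"])
    return sorted(
        col for col in EXCLUSIVE_COLUMNS
        if row.get(col) and col != allowed  # empty string or None means unused
    )


# Made-up rows for illustration only.
good = {"type": "identifier_field", "identifier": "0x18FEF100",
        "data": "", "crc": "", "ack": ""}
odd = {"type": "data_field", "identifier": "",
       "data": "0xAA", "crc": "0x1234", "ack": ""}
print(unexpected_columns(good))
print(unexpected_columns(odd))
```

An empty result for every row is consistent with the structure described here; any non-empty result would merit a closer look at that row.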
9. Limitations
- Data represents a single experimental setup
- Saleae decoding provides protocol-level abstraction and may omit electrical-layer artifacts
- Frames must be reconstructed from multiple rows for higher-level analysis
10. Suggested Use
This dataset is suitable for:
- Time-series anomaly detection
- CAN traffic reconstruction
- Bus load and arbitration analysis
- Security evaluation of J1939 networks
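
As a starting point for bus load and arbitration analysis, the sketch below computes each identifier's share of traffic and the overall frame rate from reconstructed frames. It is a minimal sketch under stated assumptions: frames are given as (start_time, identifier) pairs, the function names `id_share` and `frame_rate` are hypothetical, and the sample values are synthetic rather than taken from dos.csv.

```python
from collections import Counter


def id_share(frames):
    """Return {identifier: fraction of all frames} for (start_time, identifier) pairs."""
    counts = Counter(identifier for _, identifier in frames)
    total = sum(counts.values())
    return {identifier: count / total for identifier, count in counts.items()}


def frame_rate(frames):
    """Return the mean frame rate in frames per second over the capture span."""
    times = [start_time for start_time, _ in frames]
    if len(times) < 2:
        return 0.0
    span = max(times) - min(times)
    # N frames span N - 1 inter-frame intervals.
    return (len(times) - 1) / span if span > 0 else 0.0


# Synthetic example: one identifier occupies most of the bus.
sample = [
    (0.000, 0x00000000),
    (0.001, 0x00000000),
    (0.002, 0x00000000),
    (0.003, 0x18FEF100),
    (0.004, 0x00000000),
]
print(id_share(sample))
print(frame_rate(sample))
```

A single identifier with a very large share, combined with a frame rate near the bus capacity, is the kind of pattern a flooding detector could look for. Suitable thresholds depend on the bus bit rate and normal traffic, neither of which is stated here.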
11. Contact
For questions or clarifications regarding the dataset, please contact the dataset author.
