Dataclass fields and generated behavior

You have written a class with __init__, __repr__, __eq__ boilerplate more times than you can count. @dataclass eliminates that busywork. The important technical questions are which methods it synthesizes, what storage model it implies, and what invariants it leaves entirely to you.

Think of a dataclass like a factory machine. You feed it a blueprint (annotations and field() declarations) and it stamps out a class with __init__, __repr__, __eq__, and optional ordering, all generated from the field list. The machine handles the repetitive welding. You still own the design.

Core answer

Use field() when a plain default is not enough: mutable defaults, hidden constructor fields, metadata, comparison control, or factories.

# [OLDER / 3.9, CURRENT - 3.10-3.14] Works on Python 3.9+ [PEP 585]
from dataclasses import dataclass, field
@dataclass
class Batch:
    rows: list[str] = field(default_factory=list)

default_factory runs at instance creation time, which is why it solves the shared-mutable-default problem instead of reproducing it.
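A minimal sketch of that difference, assuming two hypothetical classes: BadBatch tries a bare list default, which dataclasses reject at class-creation time, and Batch uses a factory so each instance gets its own list.

```python
from dataclasses import dataclass, field

# A bare mutable default is rejected when the class is created,
# before any instance exists.
rejected = ""
try:
    @dataclass
    class BadBatch:
        rows: list[str] = []
except ValueError as exc:
    rejected = str(exc)

@dataclass
class Batch:
    rows: list[str] = field(default_factory=list)

first = Batch()
second = Batch()
first.rows.append("a")

print(second.rows)                # []
print(first.rows is second.rows)  # False
```

The factory is called once per constructor call that omits the argument, so any zero-argument callable works in its place.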

Mechanism and generated methods

Dataclasses inspect class annotations and synthesize methods such as:

__init__
__repr__
__eq__
optionally ordering methods
optionally hash behavior depending on eq, frozen, and unsafe_hash

# [OLDER / 3.7-3.8, CURRENT - 3.10-3.14] Works on Python 3.7+
from dataclasses import dataclass
@dataclass(frozen=True, order=True)
class Version:
    major: int
    minor: int
    patch: int = 0
print(Version(3, 14) > Version(3, 10))

Ordering follows field order, not domain meaning. If field order does not encode business ordering, order=True can generate deceptively wrong semantics.
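To illustrate, here is a small sketch with hypothetical Release classes. The first one compares name before the version numbers simply because name is declared first. The second uses field(compare=False) to take name out of comparisons, which is one possible fix, with the trade-off that name is then ignored by == as well.

```python
from dataclasses import dataclass, field

@dataclass(order=True)
class Release:
    name: str   # compared first, because it is declared first
    major: int
    minor: int

a = Release("zeta", 1, 0)
b = Release("alpha", 2, 0)
print(a > b)  # True: "zeta" > "alpha" decides before versions are examined

@dataclass(order=True)
class VersionOrderedRelease:
    name: str = field(compare=False)  # excluded from == and from ordering
    major: int = 0
    minor: int = 0

c = VersionOrderedRelease("zeta", 1, 0)
d = VersionOrderedRelease("alpha", 2, 0)
print(c < d)  # True: only (major, minor) is compared
```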

frozen=True prevents ordinary attribute assignment through the generated API, but it is not deep immutability and it is not a security boundary.
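The limits are easy to see in a short sketch. Config is a hypothetical class: ordinary assignment raises FrozenInstanceError, but a list held by a field can still be mutated, and object.__setattr__ bypasses the generated guard entirely.

```python
from dataclasses import FrozenInstanceError, dataclass, field

@dataclass(frozen=True)
class Config:
    name: str
    tags: list[str] = field(default_factory=list)

cfg = Config("prod")

blocked = False
try:
    cfg.name = "dev"  # ordinary assignment is rejected
except FrozenInstanceError:
    blocked = True

cfg.tags.append("blue")                 # the nested list is still mutable
object.__setattr__(cfg, "name", "dev")  # the guard can be bypassed on purpose

print(blocked, cfg.tags, cfg.name)
```

If a frozen value object needs to be immutable all the way down, immutable field types such as tuple or frozenset are the usual choice.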

Storage model and slots

Without slots=True, a normal dataclass instance usually has an instance __dict__. With slots=True, the generated class defines __slots__, so each attribute lives in a fixed per-instance slot accessed through a descriptor, and the instance has no __dict__ unless you add one deliberately.

Measured locally on CPython 3.12.3, 64-bit Linux for a simple two-field dataclass:

plain dataclass instance: 48 bytes plus about 280 bytes for __dict__
slotted dataclass instance: 48 bytes and no instance __dict__

# [CURRENT - 3.10-3.14] Requires Python 3.10+
# Example byte counts below were measured on CPython 3.12.3, 64-bit Linux.
import sys
from dataclasses import dataclass
@dataclass
class B:
    x: int
    y: int
@dataclass(slots=True)
class A:
    x: int
    y: int
print(hasattr(B(1, 2), "__dict__"))
print(hasattr(A(1, 2), "__dict__"))
print(sys.getsizeof(B(1, 2)))
print(sys.getsizeof(B(1, 2).__dict__))
print(sys.getsizeof(A(1, 2)))

This is a CPython memory-layout concern, not a language guarantee, but it is one of the main reasons slots=True matters for large object populations.

Version context

Dataclasses were added in Python 3.7 (PEP 557). slots=True and match_args=True were added in Python 3.10. weakref_slot=True was added in Python 3.11, and the docs require it to be paired with slots=True. Current project guidance targets Python 3.10-3.14. Python 3.9 and below are End-of-Life.

# [CURRENT - 3.10-3.14] Requires Python 3.10+
from dataclasses import dataclass
@dataclass(slots=True)
class Point:
    x: int
    y: int

Dataclasses also generate __match_args__ by default in Python 3.10+, which makes them participate naturally in structural pattern matching. Keyword-only fields are excluded from __match_args__.

# [CURRENT - 3.10-3.14] Requires Python 3.10+
from dataclasses import dataclass
@dataclass
class Point:
    x: int
    y: int
print(Point.__match_args__)

Field categories

ClassVar marks class attributes that are not dataclass instance fields. InitVar creates an initialization-only parameter that is passed to __post_init__ but not stored as a field.

# [OLDER / 3.7-3.8, CURRENT - 3.10-3.14] Works on Python 3.7+
from dataclasses import InitVar, dataclass, field
from typing import ClassVar
@dataclass
class User:
    table: ClassVar[str] = "users"
    name: str
    raw_email: InitVar[str]
    email: str = field(init=False)
    def __post_init__(self, raw_email):
        self.email = raw_email.strip().casefold()

Those distinctions matter because dataclasses are driven by field classification, not just by visible attributes in the class body.
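One way to see the classification is to ask dataclasses.fields() what it considers a field. The sketch below repeats the User class from above so that it runs on its own. The ClassVar and the InitVar are both absent from the result.

```python
from dataclasses import InitVar, dataclass, field, fields
from typing import ClassVar

@dataclass
class User:
    table: ClassVar[str] = "users"
    name: str
    raw_email: InitVar[str]
    email: str = field(init=False)

    def __post_init__(self, raw_email):
        self.email = raw_email.strip().casefold()

u = User("Ada", "  Ada@Example.COM ")

print([f.name for f in fields(User)])  # ['name', 'email']
print(u)                               # User(name='Ada', email='ada@example.com')
print(hasattr(u, "raw_email"))         # False: InitVar values are not stored
```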

Edge cases and gotchas

Hash behavior is where many production bugs start. Mutable dataclasses should usually not be hashable. unsafe_hash=True exists, but the name is honest: it is only safe when the fields involved in hashing are effectively immutable.

Do not use unsafe_hash=True as a convenience toggle. If hashed instances can mutate, dict and set behavior can become incorrect after insertion.
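A short sketch of the failure mode, using hypothetical Mutable and Key classes. By default a mutable dataclass with eq=True is unhashable. Forcing a hash with unsafe_hash=True and then mutating the instance leaves it stranded inside the set.

```python
from dataclasses import dataclass

@dataclass
class Mutable:
    value: int

unhashable = False
try:
    hash(Mutable(1))  # eq=True without frozen=True sets __hash__ to None
except TypeError:
    unhashable = True

@dataclass(unsafe_hash=True)
class Key:
    value: int

k = Key(1)
seen = {k}
k.value = 2  # the hash changes after insertion

print(k in seen)       # False: looked up under the new hash
print(Key(1) in seen)  # False: old hash matches, but equality now fails
print(len(seen))       # 1: the entry is still there, just unreachable
```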

Field ordering still follows Python's rule that non-default parameters must come before defaulted ones. field(default_factory=...) counts as a defaulted field for that purpose.

Production usage

Use:

__post_init__ for validation and normalization
frozen=True for value objects
slots=True for many small instances after measuring
field(init=False) for derived stored fields

# [OLDER / 3.7-3.8, CURRENT - 3.10-3.14] Works on Python 3.7+
from dataclasses import dataclass
@dataclass(frozen=True)
class Port:
    value: int
    def __post_init__(self):
        if not 0 < self.value < 65536:
            raise ValueError("port out of range")

Field annotations feed static type checkers, but dataclasses do not enforce them at runtime, so runtime validation remains your responsibility.
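Several of these practices can be combined. The sketch below assumes a hypothetical Endpoint value object: it validates in __post_init__ and stores a derived field declared with field(init=False). Because the class is frozen, the derived value is set through object.__setattr__, since plain assignment would raise FrozenInstanceError.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Endpoint:
    host: str
    port: int
    address: str = field(init=False)  # derived, not a constructor parameter

    def __post_init__(self):
        if not 0 < self.port < 65536:
            raise ValueError("port out of range")
        # Frozen classes block self.address = ..., so set it explicitly.
        object.__setattr__(self, "address", f"{self.host}:{self.port}")

ep = Endpoint("localhost", 8080)
print(ep.address)  # localhost:8080
```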

Further depth

Python in Depth
