Wednesday, July 23, 2025

Pydantic or Dataclass or Namedtuple or Just a Class with Attributes

 

Introduction

While working for a client, I was given code with the following structure.

There was a SomeResults class

>>> class SomeResults:
...     topic_1: list[str] | None
...     topic_2: list[str] | None
...     topic_3: list[str] | None
...
>>>

There was a function that had to return SomeResults

>>> def some_fcn() -> SomeResults:
...     raise NotImplementedError()
...
>>>

My task was to create the implementation. There were clear instructions on what the implementation was to output and personal interactions were provided on how to get started.

However, I was puzzled as to how to structure the code. Do I access object attributes directly? Do I use a namedtuple? Do I use a dataclass? Do I use Pydantic? We will explore each of these below with their associated pros and cons.

Approach 1: Access Object's Attributes Directly

Let me begin by stating that approach 1 should never be used. It is an anti-pattern.

Approach 1 is included for completeness.

The final code would look something like the following

>>> def some_fcn() -> SomeResults:
...     # Stand-ins for the separately computed topics
...     topic_a = ["topic_1x", "topic_1y", "topic_1z"]
...     topic_b = ["topic_2x", "topic_2y", "topic_2z"]
...     topic_c = ["topic_3x", "topic_3y", "topic_3z"]
...     # Gather the topics by setting each attribute by hand
...     result = SomeResults()
...     result.topic_1 = topic_a
...     result.topic_2 = topic_b
...     result.topic_3 = topic_c
...     return result
...

>>> print(some_fcn().topic_1)
['topic_1x', 'topic_1y', 'topic_1z']

>>> print(some_fcn().topic_2)
['topic_2x', 'topic_2y', 'topic_2z']

>>> print(some_fcn().topic_3)
['topic_3x', 'topic_3y', 'topic_3z']

The computation of the topics is complex and so that would be done separately. This is mimicked by

topic_a = ["topic_1x", "topic_1y", "topic_1z"]

topic_b = ["topic_2x", "topic_2y", "topic_2z"]

topic_c = ["topic_3x", "topic_3y", "topic_3z"]

Once the topics are computed, they are gathered together to create some result.

result = SomeResults()

result.topic_1 = topic_a
result.topic_2 = topic_b
result.topic_3 = topic_c

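The advantage of the above approach is that it is quick: there is no need to manually implement methods like __init__().

The disadvantage of the above approach is that setting individual attributes directly is horrifying: nothing forces every attribute to be set, so a forgotten one only surfaces as an AttributeError when it is later read. Also, there is no __init__() constructor that accepts the data, which is what readers of a class universally expect.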
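To make the risk concrete, here is a minimal sketch of what happens when one assignment is forgotten. The helper name build_partial_result is hypothetical, and Optional[List[str]] is used as an equivalent spelling of list[str] | None so that the snippet also runs on older Python 3 versions.

```python
from typing import List, Optional


class SomeResults:
    # Annotations only: no class attributes and no custom __init__() exist.
    topic_1: Optional[List[str]]
    topic_2: Optional[List[str]]
    topic_3: Optional[List[str]]


def build_partial_result() -> SomeResults:
    # Hypothetical helper that forgets to assign topic_3.
    result = SomeResults()
    result.topic_1 = ["topic_1x"]
    result.topic_2 = ["topic_2x"]
    return result


partial = build_partial_result()

# Nothing complained when the object was built; the problem only
# shows up when the missing attribute is read.
try:
    partial.topic_3
    missing_attribute_detected = False
except AttributeError:
    missing_attribute_detected = True

print(missing_attribute_detected)  # True
```

The function's return annotation is satisfied, yet the object is incomplete, and that mistake is invisible until much later in the program.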

Let me wrap up approach 1 by repeating that it should never be used. It is an anti-pattern.

Approach 2: Use Namedtuple

The final code would look something like the following

>>> from collections import namedtuple

>>> SomeResults = namedtuple('SomeResults', ['topic_1', 'topic_2', 'topic_3'])

>>> def some_fcn() -> SomeResults:
...     topic_a = ["topic_1x", "topic_1y", "topic_1z"]
...     topic_b = ["topic_2x", "topic_2y", "topic_2z"]
...     topic_c = ["topic_3x", "topic_3y", "topic_3z"]
...     return SomeResults(
...         topic_1=topic_a,
...         topic_2=topic_b,
...         topic_3=topic_c
...     )
...

>>> print(some_fcn().topic_1)
['topic_1x', 'topic_1y', 'topic_1z']

>>> print(some_fcn().topic_2)
['topic_2x', 'topic_2y', 'topic_2z']

>>> print(some_fcn().topic_3)
['topic_3x', 'topic_3y', 'topic_3z']

>>>

The advantage of namedtuples is that they are immutable. This immutability is helpful because you want to combine the results of each of the topics once, at the end. You don't want to combine the results of each of the topics over and over.
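A short sketch of what that immutability does and does not cover, using shortened placeholder lists. Note that it is shallow: a field cannot be rebound, but a list stored in a field can still be changed in place.

```python
from collections import namedtuple

SomeResults = namedtuple("SomeResults", ["topic_1", "topic_2", "topic_3"])

result = SomeResults(
    topic_1=["topic_1x"],
    topic_2=["topic_2x"],
    topic_3=["topic_3x"],
)

# Rebinding a field on an existing namedtuple instance is rejected.
try:
    result.topic_1 = ["replacement"]
    rebinding_blocked = False
except AttributeError:
    rebinding_blocked = True

# The list object held by the field is still an ordinary mutable list.
result.topic_1.append("topic_1y")

print(rebinding_blocked)  # True
print(result.topic_1)     # ['topic_1x', 'topic_1y']
```

If deeper protection is wanted, storing tuples instead of lists in the fields is one option.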

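The disadvantage of namedtuples is that fields do not default to None out of the box: every field must be passed explicitly unless the namedtuple is created with the defaults argument (available since Python 3.7). This may seem trivial because you can just set each value to None. Unfortunately, this particular client had so many topics that it would be annoying to initially set them all to None.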
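For reference, a sketch of both behaviours, assuming Python 3.7 or later for the defaults argument. The names StrictResults, LenientResults, and FIELD_NAMES are made up for this illustration.

```python
from collections import namedtuple

FIELD_NAMES = ["topic_1", "topic_2", "topic_3"]

# Without defaults, leaving out a field raises a TypeError.
StrictResults = namedtuple("StrictResults", FIELD_NAMES)
try:
    StrictResults(topic_1=["topic_1x"])
    missing_fields_rejected = False
except TypeError:
    missing_fields_rejected = True

# With one None per field passed as defaults, omitted fields become None.
LenientResults = namedtuple(
    "LenientResults", FIELD_NAMES, defaults=[None] * len(FIELD_NAMES)
)
partial = LenientResults(topic_1=["topic_1x"])

print(missing_fields_rejected)  # True
print(partial)  # LenientResults(topic_1=['topic_1x'], topic_2=None, topic_3=None)
```

Whether this is less annoying than spelling out each None depends on how the field list is maintained. It still requires the defaults to be declared where the namedtuple is created.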

If you want to brush up on namedtuples, consider reading the article "Write Pythonic and Clean Code With namedtuple" by Leodanis Pozo Ramos.

Approach 3: Use Dataclass

The final code would look something like the following

>>> from dataclasses import dataclass

>>> from typing import List, Optional

>>> @dataclass
... class SomeResults:
...     topic_1: Optional[List[str]]
...     topic_2: Optional[List[str]]
...     topic_3: Optional[List[str]]
...

>>> def some_fcn() -> SomeResults:
...     topic_a = ["topic_1x", "topic_1y", "topic_1z"]
...     topic_b = ["topic_2x", "topic_2y", "topic_2z"]
...     topic_c = ["topic_3x", "topic_3y", "topic_3z"]
...     return SomeResults(
...         topic_1=topic_a,
...         topic_2=topic_b,
...         topic_3=topic_c
...     )
...

>>> print(some_fcn())
SomeResults(topic_1=['topic_1x', 'topic_1y', 'topic_1z'], topic_2=['topic_2x', 'topic_2y', 'topic_2z'], topic_3=['topic_3x', 'topic_3y', 'topic_3z'])

The advantage of using a dataclass is that it is a "natural" fit because SomeResults is a class primarily used for storing data. Also, it automatically generates boilerplate methods such as __init__(), __repr__(), and __eq__().

The disadvantage of dataclasses is that there is no runtime data validation: the type hints are not enforced when an instance is created.
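Both points can be seen in a small sketch that uses only the standard library. The invalid values ("not a list" and 42) are made up for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SomeResults:
    topic_1: Optional[List[str]]
    topic_2: Optional[List[str]]
    topic_3: Optional[List[str]]


first = SomeResults(topic_1=["topic_1x"], topic_2=None, topic_3=None)
second = SomeResults(topic_1=["topic_1x"], topic_2=None, topic_3=None)

# The generated __eq__() compares instances field by field.
same_data_is_equal = first == second

# The type hints are not checked: wrong types are stored unchanged.
unchecked = SomeResults(topic_1="not a list", topic_2=42, topic_3=None)

print(same_data_is_equal)  # True
print(unchecked)  # SomeResults(topic_1='not a list', topic_2=42, topic_3=None)
```

A static type checker would flag the last call, but nothing stops it at runtime.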

If you want to brush up on dataclasses, consider reading the article "Data Classes in Python 3.7+ (Guide)" by Geir Arne Hjelle.

Approach 4: Use Pydantic

The final code would look something like the following

>>> from pydantic import BaseModel

>>> from typing import List, Optional

>>> class SomeResults(BaseModel):
...     topic_1: Optional[List[str]]
...     topic_2: Optional[List[str]]
...     topic_3: Optional[List[str]]
...

>>> def some_fcn() -> SomeResults:
...     topic_a = ["topic_1x", "topic_1y", "topic_1z"]
...     topic_b = ["topic_2x", "topic_2y", "topic_2z"]
...     topic_c = ["topic_3x", "topic_3y", "topic_3z"]
...     return SomeResults(
...         topic_1=topic_a,
...         topic_2=topic_b,
...         topic_3=topic_c
...     )
...

>>> print(some_fcn())
topic_1=['topic_1x', 'topic_1y', 'topic_1z'] topic_2=['topic_2x', 'topic_2y', 'topic_2z'] topic_3=['topic_3x', 'topic_3y', 'topic_3z']

This particular client was processing web pages from the internet and so automatic runtime data validation was needed. This makes Pydantic a natural fit.
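As a rough sketch of what that validation looks like, assuming the pydantic package is installed: a value that is not a list is rejected when the model is created. The invalid input "not a list" is made up for this illustration.

```python
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class SomeResults(BaseModel):
    topic_1: Optional[List[str]]
    topic_2: Optional[List[str]]
    topic_3: Optional[List[str]]


# Well-formed data is accepted.
valid = SomeResults(topic_1=["topic_1x"], topic_2=None, topic_3=None)

# A plain string where a list of strings is expected is rejected at
# construction time instead of being stored silently.
try:
    SomeResults(topic_1="not a list", topic_2=None, topic_3=None)
    bad_input_rejected = False
except ValidationError:
    bad_input_rejected = True

print(valid.topic_1)       # ['topic_1x']
print(bad_input_rejected)  # True
```

All three fields are passed explicitly here, so the sketch does not depend on how a given Pydantic version treats Optional fields that have no default.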

A silly con to state would be that Pydantic introduces an external dependency. Actually, it is beyond silly because the point of Python is to have an ecosystem that provides choices. It is not realistic to build systems using just Python's Standard Library.

However, a real con is that there is a higher overhead that arises from the validation.

Also, notice that the original class SomeResults is modified to be a subclass of BaseModel. For this particular client, this is not just a con but a deal breaker. The original class SomeResults cannot be modified.

If you want to brush up on Pydantic, consider reading the article "Pydantic: Simplifying Data Validation in Python" by Harrison Hoffman.

Approach 5: Use Pydantic dataclass Decorator

The final code would look something like the following

>>> from pydantic.dataclasses import dataclass

>>> from typing import List, Optional

>>> @dataclass
... class SomeResults:
...     topic_1: Optional[List[str]] = None
...     topic_2: Optional[List[str]] = None
...     topic_3: Optional[List[str]] = None
...

>>> def some_fcn() -> SomeResults:
...     topic_a = ["topic_1x", "topic_1y", "topic_1z"]
...     topic_b = ["topic_2x", "topic_2y", "topic_2z"]
...     topic_c = ["topic_3x", "topic_3y", "topic_3z"]
...     return SomeResults(
...         topic_1=topic_a,
...         topic_2=topic_b,
...         topic_3=topic_c
...     )
...

>>> print(some_fcn().topic_1)
['topic_1x', 'topic_1y', 'topic_1z']

>>> print(some_fcn().topic_2)
['topic_2x', 'topic_2y', 'topic_2z']

>>> print(some_fcn().topic_3)
['topic_3x', 'topic_3y', 'topic_3z']

>>>

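The Pydantic dataclass decorator satisfies all the requirements of the client. It supports runtime data validation and lets the fields default to None. Also, SomeResults remains a plain class rather than a subclass of BaseModel; only the decorator and the None defaults are added.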
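A minimal sketch of both properties together, assuming the pydantic package is installed. The invalid input "not a list" is made up for this illustration.

```python
from typing import List, Optional

from pydantic import ValidationError
from pydantic.dataclasses import dataclass


@dataclass
class SomeResults:
    topic_1: Optional[List[str]] = None
    topic_2: Optional[List[str]] = None
    topic_3: Optional[List[str]] = None


# Omitted topics fall back to None, so only computed topics are passed.
partial = SomeResults(topic_1=["topic_1x"])

# Invalid data is rejected when the instance is created.
try:
    SomeResults(topic_1="not a list")
    bad_input_rejected = False
except ValidationError:
    bad_input_rejected = True

print(partial.topic_2)     # None
print(bad_input_rejected)  # True
```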

Summary

As shown above, there are many ways to ensure that a function returns a specific type of output. The 5 approaches are not an exhaustive list of all the possible approaches. However, they are illustrative, and there are length constraints imposed by people casually reading blog posts.

We started out with "Approach 1", which is the simplest. We then used namedtuples. Unfortunately, they were inconvenient for this particular client, who needed many values to default to None. This forced us to move on to dataclasses. However, this particular client needed runtime data validation and so Pydantic was needed. We still did not meet the client's requirements because we modified the original class SomeResults to be a subclass of BaseModel. We then used Pydantic's dataclass decorator so that SomeResults did not have to become a subclass.

As a side note, if you are ever in a situation where you can't modify the code but at the same time you have to modify the code, think decorators.
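The reason decorators help here is that the @ syntax is only shorthand for calling a function on the class, so the call can live in your own module. The sketch below uses the standard library's dataclass decorator rather than Pydantic's to stay dependency-free, and the SomeResults definition stands in for code you are not allowed to edit.

```python
from dataclasses import dataclass
from typing import List, Optional


class SomeResults:
    # Stand-in for the class whose source file must stay untouched.
    topic_1: Optional[List[str]]
    topic_2: Optional[List[str]]
    topic_3: Optional[List[str]]


# Writing "@dataclass" above a class is shorthand for this call,
# which can be made from your own module instead.
SomeResults = dataclass(SomeResults)

result = SomeResults(topic_1=["topic_1x"], topic_2=None, topic_3=None)

print(result)  # SomeResults(topic_1=['topic_1x'], topic_2=None, topic_3=None)
```

One caveat: the source file is left alone, but the standard library decorator adds the generated methods to the existing class object at runtime, so other code that imports the class sees the change too.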

I would like to end by reiterating that approach 1 should never be used. It is an anti-pattern.