Tuesday, April 13, 2021

Python: Nested Dictionaries vs Dictionary Plus a Dataclass

The use case is looking up user attributes such as name, status, and gender by the user's UUID. The data is written once and read many times, and all of it is assumed to fit in memory.

Nested Dictionaries

Creating the nested dictionary looks something like this:

nested_dict_approach = {
    '19acc7df-9c8b-11eb-9022-cc2f71aeb20c': {
        'name': 'fred',
        'status': 'active',
        'gender': 'M',
    },
    '19acc7df-9c8b-11eb-9022-cc2f71aeb20d': {
        'name': 'barney',
        'status': 'inactive',
        'gender': 'M',
    },
    '19acc7df-9c8b-11eb-9022-cc2f71aeb20e': {
        'name': 'wilma',
        'status': 'unknown',
        'gender': 'F',
    },
}

The constant repetition of the nested key names is annoying and results in verbose and error-prone code.
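
To make the "error-prone" point concrete, here is a minimal sketch. The misspelled key 'staus' and the variable names are made up for illustration. A dictionary accepts the typo silently, and the mistake only surfaces later, when the attribute is read.

```python
# Hypothetical record with a misspelled key ('staus' instead of 'status').
users = {
    '19acc7df-9c8b-11eb-9022-cc2f71aeb20c': {
        'name': 'fred',
        'staus': 'active',  # typo: the dictionary accepts it without complaint
        'gender': 'M',
    },
}

record = users['19acc7df-9c8b-11eb-9022-cc2f71aeb20c']

# The mistake only shows up when the attribute is read.
try:
    record['status']
    typo_detected_on_read = False
except KeyError:
    typo_detected_on_read = True

print(typo_detected_on_read)  # True
```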

To access a particular user via a UUID is straightforward.

nested_dict_approach['19acc7df-9c8b-11eb-9022-cc2f71aeb20c']

{'name': 'fred', 'status': 'active', 'gender': 'M'}

The downside of the above output is that it is just a plain dictionary of attributes: nothing names it as a user or ties the attributes together.

To access a particular attribute of a user via a UUID is also straightforward.

nested_dict_approach['19acc7df-9c8b-11eb-9022-cc2f71aeb20c']['name']

'fred'
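
The lookups above raise a KeyError if the UUID or the attribute name is missing. One possible way to handle that, sketched here with a hypothetical helper called get_attribute (it is not part of the original example), is to chain dict.get calls with a default.

```python
users = {
    '19acc7df-9c8b-11eb-9022-cc2f71aeb20c': {
        'name': 'fred',
        'status': 'active',
        'gender': 'M',
    },
}


def get_attribute(users, user_uuid, attribute, default=None):
    """Return one attribute of a user, or `default` if the UUID or key is absent."""
    # An unknown UUID yields an empty dict, so the second .get() falls back too.
    return users.get(user_uuid, {}).get(attribute, default)


known = get_attribute(users, '19acc7df-9c8b-11eb-9022-cc2f71aeb20c', 'name')
unknown = get_attribute(users, 'no-such-uuid', 'name', default='<missing>')

print(known)    # fred
print(unknown)  # <missing>
```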

Dictionary Plus a Dataclass

Creating the dictionary plus a dataclass looks something like this:

from dataclasses import dataclass

@dataclass
class User:
    name: str
    status: str
    gender: str

user_1 = User('fred', 'active', 'M')
user_2 = User('barney', 'inactive', 'M')
user_3 = User('wilma', 'unknown', 'F')

dict_plus_data_class_approach = {
    '19acc7df-9c8b-11eb-9022-cc2f71aeb20c': user_1,
    '19acc7df-9c8b-11eb-9022-cc2f71aeb20d': user_2,
    '19acc7df-9c8b-11eb-9022-cc2f71aeb20e': user_3,
}

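The constant repetition of the nested key names is eliminated. The other benefit is that we can annotate each attribute with its intended data type (Python does not enforce these annotations at runtime, but type checkers and editors can use them).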
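
A short sketch of what the dataclass does and does not check. The misspelled keyword 'staus' and the variable names are illustrative only. A misspelled field name is rejected as soon as the object is constructed, while a value of the wrong type is accepted.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    status: str
    gender: str


# A misspelled field name fails immediately, at construction time.
try:
    User(name='fred', staus='active', gender='M')
    typo_rejected = False
except TypeError:
    typo_rejected = True

# The annotations are hints only: a non-str status is accepted at runtime.
odd_user = User(name='fred', status=42, gender='M')

print(typo_rejected)    # True
print(odd_user.status)  # 42
```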

Accessing a particular user via a UUID is the same.

dict_plus_data_class_approach['19acc7df-9c8b-11eb-9022-cc2f71aeb20c']

User(name='fred', status='active', gender='M')

However, notice that the output is no longer just a loose collection of attributes. The attributes are organized into a User.

To access a particular attribute of a user, look the user up by UUID as before, then use a dot (".") instead of a second pair of square brackets ("[]").

dict_plus_data_class_approach['19acc7df-9c8b-11eb-9022-cc2f71aeb20c'].name

'fred'
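
Since the use case is write once and read many times, one option worth considering (it is not part of the example above) is to mark the dataclass as frozen, so that accidental writes after creation raise an error. The class name FrozenUser below is hypothetical.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class FrozenUser:
    name: str
    status: str
    gender: str


user = FrozenUser('fred', 'active', 'M')

# Assigning to a field of a frozen dataclass raises FrozenInstanceError.
try:
    user.name = 'barney'
    mutated = True
except FrozenInstanceError:
    mutated = False

print(mutated)    # False
print(user.name)  # fred
```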

Summary

In summary, if you are using nested dictionaries, stop and consider using a dictionary combined with a dataclass.
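
If the data already exists as nested dictionaries, the switch can be small. The following is a rough sketch, assuming every inner dictionary has exactly the keys that the dataclass defines. The variable names nested and converted are illustrative.

```python
from dataclasses import dataclass, asdict


@dataclass
class User:
    name: str
    status: str
    gender: str


nested = {
    '19acc7df-9c8b-11eb-9022-cc2f71aeb20c': {
        'name': 'fred',
        'status': 'active',
        'gender': 'M',
    },
    '19acc7df-9c8b-11eb-9022-cc2f71aeb20d': {
        'name': 'barney',
        'status': 'inactive',
        'gender': 'M',
    },
}

# Unpack each inner dictionary into the User constructor.
# A missing or unexpected key raises TypeError here rather than at read time.
converted = {user_uuid: User(**attrs) for user_uuid, attrs in nested.items()}

print(converted['19acc7df-9c8b-11eb-9022-cc2f71aeb20c'].name)  # fred

# asdict() converts a User back into a plain dictionary if one is needed.
print(asdict(converted['19acc7df-9c8b-11eb-9022-cc2f71aeb20d']))
```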

4 comments:

  1. Via a private email, it was suggested that a performance comparison be made of the two approaches. Unfortunately, it is not a priority for me at this point in time. Perhaps someone else would be interested in doing that?

  2. If you are interested in

    namedtuple vs Dictionary vs Data Class vs typing.NamedTuple

    check out the link below

    https://realpython.com/python-namedtuple/#using-namedtuple-vs-other-data-structures

  3. This Medium post does a performance analysis of namedtuples, objects, and data classes. Nested dicts are not included, though.

    https://medium.com/@jacktator/dataclass-vs-namedtuple-vs-object-for-performance-optimization-in-python-691e234253b9

  4. I believe the comparison becomes more specific once we consider particular use cases. For me, a dict is usually a good fit when we need constant-time lookup. So if there are many nested objects and we want fast lookup, it is a good option.
    For the nested object, if it also holds many items of the same type, then a nested dict might still be optimal; for the case here, where it contains heterogeneous items and each has a distinctly named property, a dataclass, namedtuple, or enum is a better choice. Two cents.
