The use case involves reading user attributes like name, status, and gender given the associated user UUID. This is a write-once, read-many scenario. It is also assumed that all the data will fit into memory.
Nested Dictionaries
The nested dictionary creation will look something like the following:
nested_dict_approach = {
    '19acc7df-9c8b-11eb-9022-cc2f71aeb20c': {
        'name': 'fred',
        'status': 'active',
        'gender': 'M',
    },
    '19acc7df-9c8b-11eb-9022-cc2f71aeb20d': {
        'name': 'barney',
        'status': 'inactive',
        'gender': 'M',
    },
    '19acc7df-9c8b-11eb-9022-cc2f71aeb20e': {
        'name': 'wilma',
        'status': 'unknown',
        'gender': 'F',
    },
}
The constant repetition of the nested key names is annoying and results in verbose and error-prone code.
Accessing a particular user via a UUID is straightforward.
>>> nested_dict_approach['19acc7df-9c8b-11eb-9022-cc2f71aeb20c']
{'name': 'fred', 'status': 'active', 'gender': 'M'}
The downside of the above output is that it is just a plain dictionary of attributes, with nothing that identifies it as a user.
Accessing a particular attribute of a user via a UUID is also straightforward.
>>> nested_dict_approach['19acc7df-9c8b-11eb-9022-cc2f71aeb20c']['name']
'fred'
Dictionary Plus a Dataclass
The dictionary plus a dataclass creation will look something like the following.
from dataclasses import dataclass

@dataclass
class User:
    name: str
    status: str
    gender: str

user_1 = User('fred', 'active', 'M')
user_2 = User('barney', 'inactive', 'M')
user_3 = User('wilma', 'unknown', 'F')

dict_plus_data_class_approach = {
    '19acc7df-9c8b-11eb-9022-cc2f71aeb20c': user_1,
    '19acc7df-9c8b-11eb-9022-cc2f71aeb20d': user_2,
    '19acc7df-9c8b-11eb-9022-cc2f71aeb20e': user_3,
}
The constant repetition of the nested key names is eliminated. The other benefit is that we can annotate the data type of each attribute. Python does not enforce these annotations at runtime, but type checkers and editors can use them.
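To make the "error-prone" point concrete, here is a minimal sketch. The misspelled key `staus` is a hypothetical typo introduced purely for illustration. A dictionary accepts the misspelled key silently, whereas the dataclass constructor rejects it immediately.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    status: str
    gender: str


# A typo in a nested key goes unnoticed when the dictionary is created ...
record = {'name': 'fred', 'staus': 'active', 'gender': 'M'}

# ... and only surfaces later, when the correct key is looked up.
try:
    record['status']
    dict_error = None
except KeyError as exc:
    dict_error = exc

# The same typo passed to the dataclass fails at construction time.
try:
    User(name='fred', staus='active', gender='M')
    dataclass_error = None
except TypeError as exc:
    dataclass_error = exc

print(type(dict_error).__name__)       # KeyError
print(type(dataclass_error).__name__)  # TypeError
```

With the dataclass, the mistake is reported where it is made rather than wherever the record happens to be read later.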
Accessing a particular user via a UUID is the same.
>>> dict_plus_data_class_approach['19acc7df-9c8b-11eb-9022-cc2f71aeb20c']
User(name='fred', status='active', gender='M')
However, notice that the output now is not just a list of attributes. The attributes are organized into a User.
To access a particular attribute of a user via a UUID just use a dot (".") as opposed to square brackets ("[]").
>>> dict_plus_data_class_approach['19acc7df-9c8b-11eb-9022-cc2f71aeb20c'].name
'fred'
Summary
In summary, if you are using nested dictionaries, stop and consider using a dictionary combined with a dataclass.
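If the data already lives in a nested dictionary, the switch can be made in one pass. The following is a minimal sketch, assuming every inner dictionary has exactly the keys `name`, `status`, and `gender`. The name `users_by_uuid` is hypothetical, and `frozen=True` is an optional extra (not part of the example above) that happens to suit a write-once, read-many use case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # optional: blocks attribute assignment after creation
class User:
    name: str
    status: str
    gender: str


nested_dict_approach = {
    '19acc7df-9c8b-11eb-9022-cc2f71aeb20c': {'name': 'fred', 'status': 'active', 'gender': 'M'},
    '19acc7df-9c8b-11eb-9022-cc2f71aeb20d': {'name': 'barney', 'status': 'inactive', 'gender': 'M'},
}

# Unpack each inner dictionary into the User constructor.
users_by_uuid = {
    uuid: User(**attributes)
    for uuid, attributes in nested_dict_approach.items()
}

print(users_by_uuid['19acc7df-9c8b-11eb-9022-cc2f71aeb20c'].name)  # fred
```

An inner dictionary with a missing or unexpected key would raise a `TypeError` during the conversion, which doubles as a check on the existing data.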
Via a private email, it was suggested that a performance comparison be made of the two approaches. Unfortunately, it is not a priority for me at this point in time. Perhaps someone else would be interested in doing that?
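For anyone who wants to try it, a rough starting point using the standard-library `timeit` module might look like the following. It is only a sketch: it measures read access for a single record, the variable names are hypothetical, and the numbers will vary by machine and Python version, so no results are claimed here.

```python
import timeit
from dataclasses import dataclass


@dataclass
class User:
    name: str
    status: str
    gender: str


uuid = '19acc7df-9c8b-11eb-9022-cc2f71aeb20c'
nested = {uuid: {'name': 'fred', 'status': 'active', 'gender': 'M'}}
with_dataclass = {uuid: User('fred', 'active', 'M')}

# Time one million attribute reads for each approach.
nested_seconds = timeit.timeit(lambda: nested[uuid]['name'], number=1_000_000)
dataclass_seconds = timeit.timeit(lambda: with_dataclass[uuid].name, number=1_000_000)

print(f'nested dict: {nested_seconds:.3f} s')
print(f'dataclass:   {dataclass_seconds:.3f} s')
```

A fuller comparison would also need to cover creation time and memory use across many records.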
If you are interested in namedtuple vs Dictionary vs Data Class vs typing.NamedTuple, check out the link below:
https://realpython.com/python-namedtuple/#using-namedtuple-vs-other-data-structures
This Medium post compares the performance of namedtuples, objects, and data classes. Nested dicts are not included, though.
https://medium.com/@jacktator/dataclass-vs-namedtuple-vs-object-for-performance-optimization-in-python-691e234253b9
I believe the comparison becomes more meaningful when we consider specific use cases. For me, a dict is usually a good fit when we need constant-time lookup. So if there are many nested objects and we want fast lookup, it is a good option.
For the nested object, if it also contains many items of the same type, then a nested dict might still be optimal; in the case here, where it contains heterogeneous items, each with a distinct name, a dataclass, namedtuple, or enum is a better choice. Two cents.