This page was generated from docs\source\user_guide.ipynb.

User Guide

The user guide walks you through the main concepts and possible pitfalls of the library.

[1]:
from sigmaepsilon.deepdict import DeepDict
from pprint import pprint

How to create a DeepDict?

You can simply create a DeepDict the same way you would create an ordinary dict:

[2]:
data = DeepDict(a=1, b=DeepDict(c=2))
data["d"] = 2

pprint(data)
DeepDict({'a': 1, 'b': DeepDict({'c': 2}), 'd': 2})

You can create the same dictionary like this:

[3]:
data = DeepDict(a=1, d=2)
data["b", "c"] = 2

pprint(data)
DeepDict({'a': 1, 'd': 2, 'b': DeepDict({'c': 2})})

The only difference is the order of the keys. This is no surprise, since the values were provided in a different order. It is easy to fix this:

[4]:
data = DeepDict(a=1)
data["b", "c"] = 2
data["d"] = 2

pprint(data)
DeepDict({'a': 1, 'b': DeepDict({'c': 2}), 'd': 2})

Now this is truly the same as the first one.
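To see what a tuple index like data["b", "c"] = 2 does, here is a minimal sketch with plain dicts. The helper set_by_address is hypothetical and only illustrates the idea of creating the missing intermediate levels; it is not how DeepDict is implemented.

```python
def set_by_address(d: dict, address, value) -> None:
    """Hypothetical helper: store value at a nested address, creating levels."""
    *parents, last = address
    for key in parents:
        # setdefault returns the existing sub-dict or inserts a new empty one
        d = d.setdefault(key, {})
    d[last] = value


data = {"a": 1}
set_by_address(data, ("b", "c"), 2)
data["d"] = 2
print(data)  # {'a': 1, 'b': {'c': 2}, 'd': 2}
```

Since plain dicts also preserve insertion order, the key order follows the order of assignment, exactly as in the DeepDict cells above.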

Wrapping

It is also possible to wrap conventional dictionaries:

[5]:
d = {
    "a" : {"aa" : 1},
    "b" : 2,
    "c" : {"cc" : {"ccc" : 3}},
}

DeepDict.wrap(d)["c", "cc", "ccc"]
[5]:
3

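Note that it is also possible to pass the dictionary directly to the constructor (remember, DeepDicts are dictionaries). However, the nested dictionaries are not converted in this case, so they cannot be accessed with a tuple of keys: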

[6]:
d = {
    "a" : {"aa" : 1},
    "b" : 2,
    "c" : {"cc" : {"ccc" : 3}},
}
try:
    DeepDict(d)["c", "cc", "ccc"]
except AttributeError as e:
    print(e)
'dict' object has no attribute '__missing__'

Instead, you need to index the nested levels one at a time, as you would with an ordinary dict:

[7]:
DeepDict(d)["c"]["cc"]["ccc"]
[7]:
3
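For comparison, the level-by-level lookup above can be expressed for any nested plain dict with functools.reduce. The function name get_by_address is a hypothetical illustration, not part of sigmaepsilon.deepdict.

```python
from functools import reduce
from operator import getitem

d = {
    "a": {"aa": 1},
    "b": 2,
    "c": {"cc": {"ccc": 3}},
}


def get_by_address(d: dict, address):
    # Apply getitem repeatedly: d["c"], then ["cc"], then ["ccc"]
    return reduce(getitem, address, d)


print(get_by_address(d, ("c", "cc", "ccc")))  # 3
```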

You can control how the values of the original dictionary are treated with the copy and deepcopy arguments; refer to the API reference for details.

Iterating over a DeepDict

Create a simple dictionary:

[8]:
data = DeepDict()
data['a', 'b', 'c', 'e'] = 1
data['a']['b']['d'] = 2
b = data['a', 'b']
b['e'] = 3
b['f'] = 1, 2, 3

pprint(data)
{'a': {'b': {'c': DeepDict({'e': 1}),
             'd': 2,
             'e': 3,
             'f': (1, 2, 3)}}}

A DeepDict instance works the same way as a simple dictionary would:

[9]:
for item in data.values():
    pprint(item)
{'b': DeepDict({'c': DeepDict({'e': 1}), 'd': 2, 'e': 3, 'f': (1, 2, 3)})}

Indeed, the outermost dictionary (the ‘data’ object) has only one value, and it is printed as expected. If you call values with deep=True, every value that is not itself a dictionary is returned, including the ones in the innermost dictionary:

[10]:
for v in data.values(deep=True):
    print(v)
1
2
3
(1, 2, 3)
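The idea behind deep=True can be sketched as a recursive generator over plain nested dicts. This is a hypothetical illustration that reproduces the order shown above (depth first, in insertion order), not the library's own code.

```python
from typing import Any, Iterator


def deep_values(d: dict) -> Iterator[Any]:
    """Hypothetical sketch: yield every non-dict value, depth first."""
    for value in d.values():
        if isinstance(value, dict):
            # Descend into the nested level instead of yielding it
            yield from deep_values(value)
        else:
            yield value


data = {"a": {"b": {"c": {"e": 1}, "d": 2, "e": 3, "f": (1, 2, 3)}}}
print(list(deep_values(data)))  # [1, 2, 3, (1, 2, 3)]
```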

The same applies to keys:

[11]:
for k in data.keys(deep=True):
    print(k)
e
d
e
f

As you can see, the result is a bit ambiguous, since the key ‘e’ was returned twice. In general, nothing prevents different subdictionaries from using identical keys. The other problem is that a key alone does not tell you where its value is stored. How would you get the value of the key ‘f’? In which subdictionary is it located? To overcome these issues, you can ask for addresses rather than keys:

[12]:
for addr in data.keys(deep=True, return_address=True):
    print(addr)
['a', 'b', 'c', 'e']
['a', 'b', 'd']
['a', 'b', 'e']
['a', 'b', 'f']
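An address is simply the list of keys leading from the outermost dictionary to a value. A hypothetical recursive sketch over plain dicts that collects these paths (mirroring the output above, but not the library's implementation) looks like this:

```python
from typing import Any, Iterator, List, Sequence, Tuple


def deep_items(d: dict, prefix: Sequence[Any] = ()) -> Iterator[Tuple[List[Any], Any]]:
    """Hypothetical sketch: yield (address, value) pairs for all non-dict values."""
    for key, value in d.items():
        address = [*prefix, key]  # full path from the outermost dict
        if isinstance(value, dict):
            yield from deep_items(value, address)
        else:
            yield address, value


data = {"a": {"b": {"c": {"e": 1}, "d": 2, "e": 3, "f": (1, 2, 3)}}}
for addr, v in deep_items(data):
    print(f"{addr} : {v}")
```

Because each address is unique, the two ‘e’ keys no longer collide: one is ['a', 'b', 'c', 'e'], the other ['a', 'b', 'e'].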

You can use addresses to get values:

[13]:
data[['a', 'b', 'f']]
[13]:
(1, 2, 3)

or simply

[14]:
data['a', 'b', 'f']
[14]:
(1, 2, 3)

The same applies to items:

[15]:
for addr, v in data.items(deep=True, return_address=True):
    print(f"{addr} : {v}")
['a', 'b', 'c', 'e'] : 1
['a', 'b', 'd'] : 2
['a', 'b', 'e'] : 3
['a', 'b', 'f'] : (1, 2, 3)

Iterating over sub-dictionaries

You can also loop over the inner dictionaries:

[16]:
for c in data.containers():
    pprint(c)
{'b': DeepDict({'c': DeepDict({'e': 1}), 'd': 2, 'e': 3, 'f': (1, 2, 3)})}
DeepDict({'c': DeepDict({'e': 1}), 'd': 2, 'e': 3, 'f': (1, 2, 3)})
DeepDict({'e': 1})

You may have noticed that ‘data’ itself was not printed. If you call containers with inclusive=True, the outermost container is also included:

[17]:
for c in data.containers(inclusive=True):
    pprint(c)
{'a': {'b': {'c': DeepDict({'e': 1}),
             'd': 2,
             'e': 3,
             'f': (1, 2, 3)}}}
{'b': DeepDict({'c': DeepDict({'e': 1}), 'd': 2, 'e': 3, 'f': (1, 2, 3)})}
DeepDict({'c': DeepDict({'e': 1}), 'd': 2, 'e': 3, 'f': (1, 2, 3)})
DeepDict({'e': 1})

If you only want to get the containers that have no subdictionaries, you can do this:

[18]:
list(filter(lambda d: d.is_leaf(), data.containers(inclusive=True)))
[18]:
[DeepDict({'e': 1})]

The containers method also accepts a deep argument, which is True by default.
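The container traversal and the leaf test can be sketched with plain dicts as follows. Both functions are hypothetical illustrations of the behaviour shown above (pre-order traversal, optional outermost container, a leaf being a container without nested dictionaries); they are not the library's methods.

```python
from typing import Iterator


def containers(d: dict, inclusive: bool = False) -> Iterator[dict]:
    """Hypothetical sketch: yield nested dicts depth first (pre-order)."""
    if inclusive:
        yield d
    for value in d.values():
        if isinstance(value, dict):
            # Nested containers are always yielded, so pass inclusive=True down
            yield from containers(value, inclusive=True)


def is_leaf(d: dict) -> bool:
    # A leaf container holds no nested dictionaries
    return not any(isinstance(v, dict) for v in d.values())


data = {"a": {"b": {"c": {"e": 1}, "d": 2, "e": 3, "f": (1, 2, 3)}}}
print(len(list(containers(data))))                  # 3
print(len(list(containers(data, inclusive=True))))  # 4
print([c for c in containers(data, inclusive=True) if is_leaf(c)])  # [{'e': 1}]
```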

Freezing the layout

You have already seen that a DeepDict instance can be populated like this:

[19]:
data = DeepDict()
data['a', 'b', 'c', 'e'] = 1

This raises some questions. Can a DeepDict instance raise a KeyError at all? The answer is that it depends. By default, it can’t: whenever a key is missing, a deeper level is created immediately. When you type data['a'] = 1, first a DeepDict is assigned to data with the key ‘a’, then it gets overwritten by the value 1. However, you can freeze the layout of a DeepDict once you have finished building your dataset.
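In plain Python, one common way to create missing levels on lookup is the __missing__ hook of dict subclasses, which dict.__getitem__ calls when a key is absent. The sketch below (the class name AutoDict is hypothetical) shows this auto-vivification technique; whether DeepDict uses exactly this mechanism is an assumption here.

```python
class AutoDict(dict):
    """Hypothetical sketch of auto-vivification, not the DeepDict implementation."""

    def __missing__(self, key):
        # Called by dict.__getitem__ when key is absent: create a nested level
        value = self[key] = AutoDict()
        return value


data = AutoDict()
data["a"]["b"]["c"]["e"] = 1
print(data)  # {'a': {'b': {'c': {'e': 1}}}}
```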

[20]:
data.lock()
data.locked
[20]:
True

Now adding a missing key would raise a KeyError.

[21]:
try:
    data["b"] = 1
except KeyError as e:
    print(e)
"Missing key 'b' and the object is locked!"

Of course, you can unlock (defrost) your DeepDict again:

[22]:
data.unlock()
data.locked
[22]:
False

Now you can add new data:

[23]:
data["b"] = 1

Locking your DeepDict is essential in some situations. While it is unlocked, a mistyped key silently creates a new entry instead of raising a KeyError, so there is no way to tell whether you made a mistake.
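A lock flag can be sketched on top of an auto-vivifying dict by refusing to create new keys while it is set. The class LockableDict is hypothetical; unlike the library's lock, which presumably freezes the whole layout, this sketch only guards the level on which lock() is called.

```python
class LockableDict(dict):
    """Hypothetical sketch: auto-vivifying dict whose layout can be frozen."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.locked = False

    def lock(self) -> None:
        self.locked = True

    def unlock(self) -> None:
        self.locked = False

    def __missing__(self, key):
        if self.locked:
            raise KeyError(f"Missing key {key!r} and the object is locked!")
        value = self[key] = LockableDict()
        return value

    def __setitem__(self, key, value):
        # Refuse to add new keys while locked; existing keys can still change.
        # Note: only this level is guarded, nested levels keep their own flag.
        if self.locked and key not in self:
            raise KeyError(f"Missing key {key!r} and the object is locked!")
        super().__setitem__(key, value)


data = LockableDict()
data["a"]["b"] = 1  # unlocked: the missing level "a" is created
data.lock()
try:
    data["tpyo"] = 1  # the typo is caught instead of silently stored
except KeyError as e:
    print(e)
data.unlock()
data["b"] = 1
```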

Layout information

Every container inside a DeepDict has a parent. The only container that has no parent is the outermost container itself (here ‘data’).

[24]:
data = DeepDict()
data['a', 'b', 'c', 'e'] = 1
data['a', 'b', 'c'].parent.key
[24]:
'b'

As you might have guessed, nested containers also know where they are stored, via attributes like key and address:

[25]:
data['a', 'b', 'c'].parent.key, data['a', 'b', 'c'].parent.address
[25]:
('b', ['a', 'b'])

Nested containers also keep a reference to the outermost container, the root:

[26]:
data['a', 'b', 'c'].root
[26]:
DeepDict({'a': DeepDict({'b': DeepDict({'c': DeepDict({'e': 1})})})})

You can easily check whether a container is a root or a leaf:

[27]:
data['a', 'b', 'c'].is_root(), data['a', 'b', 'c'].is_leaf()
[27]:
(False, True)
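The layout information can be sketched with a dict subclass that records a back-reference whenever a nested container is attached. The class Node and its members are hypothetical stand-ins that mimic the parent, key, address, root, is_root and is_leaf shown above; the real DeepDict may store this information differently.

```python
from typing import Any, List, Optional


class Node(dict):
    """Hypothetical sketch of a dict that remembers where it is stored."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.parent: Optional["Node"] = None
        self.key: Any = None

    def __setitem__(self, key, value):
        if isinstance(value, Node):
            # Record the back-reference when a container is attached
            value.parent, value.key = self, key
        super().__setitem__(key, value)

    def __missing__(self, key):
        value = self[key] = Node()
        return value

    @property
    def address(self) -> List[Any]:
        # Walk up towards the root collecting keys, then reverse them
        keys, node = [], self
        while node.parent is not None:
            keys.append(node.key)
            node = node.parent
        return keys[::-1]

    @property
    def root(self) -> "Node":
        node = self
        while node.parent is not None:
            node = node.parent
        return node

    def is_root(self) -> bool:
        return self.parent is None

    def is_leaf(self) -> bool:
        return not any(isinstance(v, dict) for v in self.values())


data = Node()
data["a"]["b"]["c"]["e"] = 1
c = data["a"]["b"]["c"]
print(c.parent.key, c.parent.address)  # b ['a', 'b']
print(c.root is data, c.is_root(), c.is_leaf())  # True False True
```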

Differences between dict and DeepDict

In most cases, a DeepDict behaves exactly like a regular dictionary. One difference is how it provides access to deeper levels.

Let's say we create a dictionary like this:

[28]:
{(1, 2): 'A'}
[28]:
{(1, 2): 'A'}

Since tuples are immutable (and hashable), you can use them as dictionary keys. If you do the same with a DeepDict, the result is different:

[29]:
d = DeepDict()
d[(1, 2)] = "A"
d
[29]:
DeepDict({1: DeepDict({2: 'A'})})

As you can see, in the second case the value ‘A’ is stored under the key 2 in a nested dictionary, which itself is stored under the key 1. The reason is that Python passes the same tuple to the indexing methods whether you write d[(1, 2)] or d[1, 2], so the previous cell is identical to the following one:

[30]:
d = DeepDict()
d[1, 2] = "A"
d
[30]:
DeepDict({1: DeepDict({2: 'A'})})
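This is plain Python semantics, not something specific to DeepDict. A small probe class (hypothetical, for illustration only) whose __getitem__ returns the key it receives makes this visible:

```python
class ShowKey:
    """Hypothetical probe class: __getitem__ just returns the key it receives."""

    def __getitem__(self, key):
        return key


probe = ShowKey()
print(probe[1, 2])    # (1, 2)
print(probe[(1, 2)])  # (1, 2)
print(probe[1, 2] == probe[(1, 2)])  # True
```

Since the two spellings are indistinguishable inside the container, a DeepDict has to pick one interpretation for tuples.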

Keeping the array-like indexing mechanism was considered more important; this is a deliberate design decision. The good news is that, in the end, the behaviour is the same (at least in this case):

[31]:
{(1, 2): 'A'}[(1, 2)]
[31]:
'A'
[32]:
DeepDict.wrap({(1, 2): 'A'})[(1, 2)]
[32]:
'A'
[33]:
(1, 2) in d
[33]:
True

Important: this evaluates to True not because the tuple (1, 2) is a key of ‘d’, but because d[(1, 2)] evaluates without a KeyError.
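The membership semantics described above can be sketched by overriding __contains__ so that it walks the levels of an address. The class LookupContains is hypothetical and only mimics the observed behaviour; it is not the library's implementation.

```python
class LookupContains(dict):
    """Hypothetical sketch: `in` succeeds whenever a nested lookup succeeds."""

    def __contains__(self, address):
        keys = address if isinstance(address, tuple) else (address,)
        node = self
        for key in keys:
            # Stop as soon as a level is missing or is not a dictionary
            if not isinstance(node, dict) or not dict.__contains__(node, key):
                return False
            node = node[key]
        return True


d = LookupContains({1: {2: "A"}})
print((1, 2) in d)  # True, although (1, 2) is not a key of d
print(1 in d)       # True
print((1, 3) in d)  # False
```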

If you really want (1, 2) to be a single key, you can use the Key helper class:

[34]:
from sigmaepsilon.deepdict import Key

d = DeepDict()
d[Key((1, 2))] = "A"
d
[34]:
DeepDict({(1, 2): 'A'})
[35]:
d[Key((1, 2))]
[35]:
'A'
[36]:
(1, 2) in d
[36]:
False
[37]:
Key((1, 2)) in d
[37]:
True
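The role of a marker like Key can be sketched as follows: tuples are split into addresses unless they are wrapped, in which case the wrapped value is stored as a single key. Both AtomicKey and SplittingDict are hypothetical names for illustration; the real Key class may work differently.

```python
from typing import Any, Hashable


class AtomicKey:
    """Hypothetical marker: wraps a tuple so it is used as one key."""

    def __init__(self, value: Hashable):
        self.value = value


class SplittingDict(dict):
    """Hypothetical sketch: tuples are addresses unless wrapped in AtomicKey."""

    def __setitem__(self, key: Any, value: Any) -> None:
        if isinstance(key, AtomicKey):
            # Store the wrapped tuple as a single, ordinary key
            super().__setitem__(key.value, value)
        elif isinstance(key, tuple):
            # Treat the tuple as an address and create the missing levels
            *parents, last = key
            node = self
            for k in parents:
                node = node.setdefault(k, SplittingDict())
            node[last] = value
        else:
            super().__setitem__(key, value)


d = SplittingDict()
d[1, 2] = "A"
d[AtomicKey((1, 2))] = "B"
print(d)  # {1: {2: 'A'}, (1, 2): 'B'}
```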

Printing

It is possible to print a DeepDict or a regular dict instance as a tree using the asciitree package. Install it with

$ pip install asciitree

and use the asciiprint function from sigmaepsilon.deepdict:

[38]:
from sigmaepsilon.deepdict import asciiprint

d = {
    "a" : {"aa" : 1},
    "b" : 2,
    "c" : {"cc" : {"ccc" : 3}},
}

data = DeepDict.wrap(d)
data.name = "Data"

asciiprint(data)
Data
 +-- a
 +-- c
     +-- cc

For more comprehensive and detailed information about the asciitree library, please refer to the official documentation.
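To make the layout of the printed tree explicit, here is a dependency-free sketch that draws only the nested dictionaries, in the same style as the asciiprint output above. The function ascii_tree is hypothetical and does not use asciitree; it only approximates the format shown for this example.

```python
def ascii_tree(d: dict, name: str) -> str:
    """Hypothetical sketch: draw only the nested dicts as an ASCII tree."""
    lines = [name]

    def walk(node: dict, indent: str) -> None:
        children = [(k, v) for k, v in node.items() if isinstance(v, dict)]
        for i, (key, child) in enumerate(children):
            last = i == len(children) - 1
            lines.append(f"{indent} +-- {key}")
            # Continue the vertical bar only if more siblings follow
            walk(child, indent + ("    " if last else " |  "))

    walk(d, "")
    return "\n".join(lines)


d = {
    "a": {"aa": 1},
    "b": 2,
    "c": {"cc": {"ccc": 3}},
}
print(ascii_tree(d, "Data"))
```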