{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# User Guide" ] }, { "cell_type": "markdown", "metadata": { "vscode": { "languageId": "plaintext" } }, "source": [ "The user guide walks you through the main concepts and possible pitfalls of the library." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from sigmaepsilon.deepdict import DeepDict\n", "from pprint import pprint" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How to create a DeepDict?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can simply create a DeepDict the same way you would create an ordinary `dict`: " ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "DeepDict({'a': 1, 'b': DeepDict({'c': 2}), 'd': 2})\n" ] } ], "source": [ "data = DeepDict(a=1, b=DeepDict(c=2))\n", "data[\"d\"] = 2\n", "\n", "pprint(data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can create the same dictionary like this:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "DeepDict({'a': 1, 'd': 2, 'b': DeepDict({'c': 2})})\n" ] } ], "source": [ "data = DeepDict(a=1, d=2)\n", "data[\"b\", \"c\"] = 2\n", "\n", "pprint(data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The only difference here is the order. No surprise, since you provided the values with different order. We can easily make up for this:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "DeepDict({'a': 1, 'b': DeepDict({'c': 2}), 'd': 2})\n" ] } ], "source": [ "data = DeepDict(a=1)\n", "data[\"b\", \"c\"] = 2\n", "data[\"d\"] = 2\n", "\n", "pprint(data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now this is truly the same as the first one." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Wrapping" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is also possible to wrap conventional dictionaries:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d = {\n", " \"a\" : {\"aa\" : 1},\n", " \"b\" : 2,\n", " \"c\" : {\"cc\" : {\"ccc\" : 3}}, \n", "}\n", "\n", "DeepDict.wrap(d)[\"c\", \"cc\", \"ccc\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that it is also possible to just provide the dictionary to the creator (remember, DeepDicts are dictionaries)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "'dict' object has no attribute '__missing__'\n" ] } ], "source": [ "d = {\n", " \"a\" : {\"aa\" : 1},\n", " \"b\" : 2,\n", " \"c\" : {\"cc\" : {\"ccc\" : 3}}, \n", "}\n", "try:\n", " DeepDict(d)[\"c\", \"cc\", \"ccc\"]\n", "except AttributeError as e:\n", " print(e)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "but you need to treat it as one:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DeepDict(d)[\"c\"][\"cc\"][\"ccc\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can control how the values in the original dictionary are treated with the arguments `copy` and `deepcopy`, refer to the *API reference* for the details." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Iterating over a DeepDict" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create a simple dictionary:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'a': {'b': {'c': DeepDict({'e': 1}),\n", " 'd': 2,\n", " 'e': 3,\n", " 'f': (1, 2, 3)}}}\n" ] } ], "source": [ "data = DeepDict()\n", "data['a', 'b', 'c', 'e'] = 1\n", "data['a']['b']['d'] = 2\n", "b = data['a', 'b']\n", "b['e'] = 3\n", "b['f'] = 1, 2, 3\n", "\n", "pprint(data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A DeepDict instance works the same way as a simple dictionary would:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'b': DeepDict({'c': DeepDict({'e': 1}), 'd': 2, 'e': 3, 'f': (1, 2, 3)})}\n" ] } ], "source": [ "for item in data.values():\n", " pprint(item)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Indeed, the outmermost dictionary (the 'data' object) has only one value, and it is printed as expected. If you call `values` with the argument `deep=True`, all values are returned, even the ones in the innermost dictionary." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1\n", "2\n", "3\n", "(1, 2, 3)\n" ] } ], "source": [ "for v in data.values(deep=True):\n", " print(v)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The same applies for `keys`:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "e\n", "d\n", "e\n", "f\n" ] } ], "source": [ "for k in data.keys(deep=True):\n", " print(k)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, the result is a bit ambigous, since the key 'e' was returned twice. In general, there is nothing against different subdictionaries having values with identical keys. The other problem is that you don't know where to use the keys. How should we get the value of the key 'f'? In which subdictionary is it located? To make up for these issues, you can ask for addresses, rather than keys:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['a', 'b', 'c', 'e']\n", "['a', 'b', 'd']\n", "['a', 'b', 'e']\n", "['a', 'b', 'f']\n" ] } ], "source": [ "for addr in data.keys(deep=True, return_address=True):\n", " print(addr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can use addresses to get values:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(1, 2, 3)" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data[['a', 'b', 'f']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "or simply" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(1, 2, 3)" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data['a', 'b', 'f']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The same applies for `items`:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['a', 'b', 'c', 'e'] : 1\n", "['a', 'b', 'd'] : 2\n", "['a', 'b', 'e'] : 3\n", "['a', 'b', 'f'] : (1, 2, 3)\n" ] } ], "source": [ "for addr, v in data.items(deep=True, return_address=True):\n", " print(f\"{addr} : {v}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also loop over the inner dictionaries:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Iterating over sub-dictionaries" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'b': DeepDict({'c': DeepDict({'e': 1}), 'd': 2, 'e': 3, 'f': (1, 2, 3)})}\n", "DeepDict({'c': DeepDict({'e': 1}), 'd': 2, 'e': 3, 'f': (1, 2, 3)})\n", "DeepDict({'e': 1})\n" ] } ], "source": [ "for c in data.containers():\n", " pprint(c)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Maybe you noticed, that 'data' itself was not printed. You can call `containers` with the argument `inclusive=True`, in which case the outermost container is also included:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'a': {'b': {'c': DeepDict({'e': 1}),\n", " 'd': 2,\n", " 'e': 3,\n", " 'f': (1, 2, 3)}}}\n", "{'b': DeepDict({'c': DeepDict({'e': 1}), 'd': 2, 'e': 3, 'f': (1, 2, 3)})}\n", "DeepDict({'c': DeepDict({'e': 1}), 'd': 2, 'e': 3, 'f': (1, 2, 3)})\n", "DeepDict({'e': 1})\n" ] } ], "source": [ "for c in data.containers(inclusive=True):\n", " pprint(c)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you only want to get the containers that have no subdictionaries, you can do this:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[DeepDict({'e': 1})]" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(filter(lambda d: d.is_leaf(), data.containers(inclusive=True)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `containers` method also accepts the argument `deep`, but it is `True` by default." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Freezing the layout" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Previously you have seen, that a DeepDict instance can be created like this:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "data = DeepDict()\n", "data['a', 'b', 'c', 'e'] = 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This rises some questions. Can a DeepDict isntance raise a `KeyError` at all? The answer is that it depends. Be default, they can't. Whenever a key is missing, a deeper level is created immediately. When you type `data['a'] = 1`, first a DeepDict is assigned to `data` with the key 'a', then it gets overwritten by the value 1. However, you can freeze the layout of a DeepDict when you feel that you are ready building your dataset." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.lock()\n", "data.locked" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now adding a missing key would raise a `KeyError`." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\"Missing key 'b' and the object is locked!\"\n" ] } ], "source": [ "try:\n", " data[\"b\"] = 1\n", "except KeyError as e:\n", " print(e)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Of course you can defrost the your DeepDict" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.unlock()\n", "data.locked" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And you can add your new data" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "data[\"b\"] = 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Locking your `DeepDict` is essential in some situations, otherwise there is no way to tell if you are in the wrong or not. Typos are a real threat here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Layout information" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Every container inside a DeepDict has a parent. The only container that has no parent is the outermost container itself (here 'data')." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'b'" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = DeepDict()\n", "data['a', 'b', 'c', 'e'] = 1\n", "data['a', 'b', 'c'].parent.key" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you might have already guessed, nested containers also know how they are stored in their parent via attributes like `key` and `address`." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "('b', ['a', 'b'])" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data['a', 'b', 'c'].parent.key, data['a', 'b', 'c'].parent.address" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The nested containers also keep a reference to the outermost container (or none of these):" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DeepDict({'a': DeepDict({'b': DeepDict({'c': DeepDict({'e': 1})})})})" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data['a', 'b', 'c'].root" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can easily check if a container is a root, or a leaf:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(False, True)" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data['a', 'b', 'c'].is_root(), data['a', 'b', 'c'].is_leaf()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Differences between `dict` and `DeepDict`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In most cases a DeepDict works identically to regular dictionaries. One difference is how they provide access to deep levels." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let say we create a dictionary like this:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{(1, 2): 'A'}" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "{(1, 2): 'A'}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since tuples ar immutable, you can use them as keys in a dictionary. If you do the same with a DeepDict, the result is going to be different:" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DeepDict({1: DeepDict({2: 'A'})})" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d = DeepDict()\n", "d[(1, 2)] = \"A\"\n", "d" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, in the second case, the value 'A' is in a nested dictionary with key 2, which itself is in a dictionary with key 1. The reason for this is that the previous cell is identical to the following one." ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DeepDict({1: DeepDict({2: 'A'})})" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d = DeepDict()\n", "d[1, 2] = \"A\"\n", "d" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To keep the array-like index mechanism is more important and is a design decision here. The good news is that at the end of the day, the behaviour is the same (at least in tis case):" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'A'" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "{(1, 2): 'A'}[(1, 2)]" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'A'" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DeepDict.wrap({(1, 2): 'A'})[(1, 2)]" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(1, 2) in d" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> **Important**\n", "> This evaluated to true not because the tuple (1,2) is contained in 'd', but because `d[(1, 2)]` evaluates without a `KeyError`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you really want (1, 2) to be a single key, you can use the `Key` helper class:" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DeepDict({(1, 2): 'A'})" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sigmaepsilon.deepdict import Key\n", "\n", "d = DeepDict()\n", "d[Key((1, 2))] = \"A\"\n", "d" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'A'" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d[Key((1, 2))]" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(1, 2) in d" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Key((1, 2)) in d" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Printing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is possible to print a `DeepDict`, or a regular `dict` instance as a tree, using the `asciitree` package. Install it with\n", "\n", "```console\n", "$ pip install asciitree\n", "```\n", "\n", "and use the `asciiprint` method from `sigmaepsilon.deepdict`:" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Data\n", " +-- a\n", " +-- c\n", " +-- cc\n" ] } ], "source": [ "from sigmaepsilon.deepdict import asciiprint\n", "\n", "d = {\n", " \"a\" : {\"aa\" : 1},\n", " \"b\" : 2,\n", " \"c\" : {\"cc\" : {\"ccc\" : 3}}, \n", "}\n", "\n", "data = DeepDict.wrap(d)\n", "data.name = \"Data\"\n", "\n", "asciiprint(data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For more comprehensive and detailed information about the `asciitree` library, please refer to the [official documentation](https://pythonhosted.org/asciitree/#)." ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [], "source": [ "data = DeepDict(a=2)" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [], "source": [ "data[\"a\"] = 1.0" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.0" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data[\"a\"]" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [], "source": [ "data[\"a\"] = None" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "DeepDict\n" ] } ], "source": [ "asciiprint(data)" ] } ], "metadata": { "kernelspec": { "display_name": ".deepdict", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }