Commit d68e8596 authored by Jirka Fink

Add hints to the find duplicates assignment

parent cc154e42
@@ -15,7 +15,7 @@ tests works only on Linux (and not on Windows), and of course also in ReCodEx.
You can use full standard library of Python and C++ in this assignment,
including data structure implementations (also, `bytearray` might come handy).
Your solution must also work on other input data of the same size with similar
-number of duplicates. Hence solutions depending on the fact that each string is
+number of duplicates. Hence, solutions depending on the fact that each string is
uniquely determined by some its substring or similar properties of the input
will not be accepted.
@@ -25,3 +25,11 @@ Note that due to the space constraints of the Python solutions, tests `10M` and
not used and are always considered successful by ReCodEx.
Source code templates can be found in [git](https://gitlab.kam.mff.cuni.cz/datovky/assignments/-/tree/master).
+Hints:
+* Array [ False ] * 2**20 requires approximately 8 MB since Python stores it as an array of pointers to one value False. Use bytearray instead.
+* Read carefully the documentation of bytearray and distinguish the terms bit and byte.
+* In Python, do not import numpy or other libraries consuming more memory to load than available.
+* The memory limit prevents storing all keys, so trying trivial solutions which store all keys in a dictionary is a waste of time.
+* Count the number of duplicates and candidates for duplicates. For properly implemented hashing, those two numbers should be very close.
+* Use profilers to trace memory usage; see e.g. https://docs.python.org/3/library/tracemalloc.html or https://valgrind.org/.
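
The memory hints above can be checked with a short experiment. The following is a minimal sketch, not part of the commit or of the assignment templates: the helper names `peak_bytes`, `set_bit` and `get_bit` are hypothetical, and the figures assume a 64-bit CPython build. It compares a list of `False` values with a `bytearray` that holds one flag per byte and with one that holds one flag per bit, using `tracemalloc` to measure the peak allocation.

```python
import struct
import sys
import tracemalloc

N = 2**20  # number of flags, as in the first hint


def peak_bytes(make):
    """Return the peak number of bytes traced while make() builds its object."""
    tracemalloc.start()
    obj = make()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del obj
    return peak


def set_bit(bits, i):
    """Set flag i in a bytearray that packs 8 flags into each byte."""
    bits[i >> 3] |= 1 << (i & 7)


def get_bit(bits, i):
    """Return flag i (0 or 1) from a bytearray that packs 8 flags per byte."""
    return (bits[i >> 3] >> (i & 7)) & 1


# A list stores one pointer per element, even if all point to the same False.
list_peak = peak_bytes(lambda: [False] * N)
# One byte per flag.
bytes_peak = peak_bytes(lambda: bytearray(N))
# One bit per flag: only N // 8 bytes are needed.
bits_peak = peak_bytes(lambda: bytearray(N // 8))

print("list of False      :", list_peak, "bytes")
print("bytearray, per byte:", bytes_peak, "bytes")
print("bytearray, per bit :", bits_peak, "bytes")

# Example use of the bit-packed variant.
flags = bytearray(N // 8)
set_bit(flags, 12345)
print("flag 12345:", get_bit(flags, 12345), "flag 12344:", get_bit(flags, 12344))
```

The exact numbers depend on the interpreter version, but the list should cost roughly one pointer per element, while the bit-packed `bytearray` needs about one eighth of the per-byte variant.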