Add hints to the find duplicates assignment

Commit d68e8596 authored 6 months ago by Jirka Fink
--- a/09-find_duplicates/task.md
+++ b/09-find_duplicates/task.md
@@ -15,7 +15,7 @@ tests works only on Linux (and not on Windows), and of course also in ReCodEx.
 You can use the full standard library of Python and C++ in this assignment,
 including data structure implementations (also, `bytearray` might come in handy).
 Your solution must also work on other input data of the same size with similar
-number of duplicates. Hence solutions depending on the fact that each string is
+number of duplicates. Hence, solutions depending on the fact that each string is
 uniquely determined by some substring of it or similar properties of the input
 will not be accepted.

@@ -25,3 +25,11 @@ Note that due to the space constraints of the Python solutions, tests `10M` and
 not used and are always considered successful by ReCodEx.

 Source code templates can be found in [git](https://gitlab.kam.mff.cuni.cz/datovky/assignments/-/tree/master).
+
+Hints:
+* The list `[False] * 2**20` requires approximately 8 MB, since Python stores it as an array of pointers to the single value `False`. Use `bytearray` instead.
+* Carefully read the documentation of `bytearray`, and distinguish between the terms bit and byte.
+* In Python, do not import numpy or other libraries that need more memory to load than is available.
+* The memory limit prevents storing all keys, so trying trivial solutions which store all keys in a dictionary is a waste of time.
+* Count the number of duplicates and candidates for duplicates. For properly implemented hashing, those two numbers should be very close.
+* Use profilers to trace memory usage; see e.g. https://docs.python.org/3/library/tracemalloc.html or https://valgrind.org/.
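
The memory-related hints above can be checked with a minimal sketch like the one below. It is not part of the commit or of the assignment templates. The names `N`, `set_bit`, `get_bit` and `traced_peak` are hypothetical helpers chosen for illustration, and the sizes mentioned in the comments assume a 64-bit CPython build.

```python
import sys
import tracemalloc

N = 2**20  # number of flags, as in the first hint

# A list holds one pointer per element (8 bytes each on a 64-bit build),
# even though every slot refers to the same object False.
flags_list = [False] * N

# A bytearray holds one byte per element.
flags_bytes = bytearray(N)

# The same bytearray type used as a bit field: one bit per flag.
flags_bits = bytearray(N // 8)


def set_bit(bits, i):
    """Set flag i in a bytearray used as a bit field."""
    bits[i >> 3] |= 1 << (i & 7)


def get_bit(bits, i):
    """Return True if flag i is set in a bytearray used as a bit field."""
    return (bits[i >> 3] >> (i & 7)) & 1 == 1


def traced_peak(size):
    """Allocate a bytearray of `size` bytes and return the peak traced memory."""
    tracemalloc.start()
    probe = bytearray(size)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del probe
    return peak


if __name__ == "__main__":
    print("list:            ", sys.getsizeof(flags_list), "bytes")
    print("bytearray, bytes:", sys.getsizeof(flags_bytes), "bytes")
    print("bytearray, bits: ", sys.getsizeof(flags_bits), "bytes")
    print("traced peak:     ", traced_peak(N // 8), "bytes")
```

Note that `sys.getsizeof` reports only the container itself, which is enough here because the list stores references to a single shared object.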