From d68e859681983c48312db6fff00b3a21d9336da4 Mon Sep 17 00:00:00 2001
From: Jirka Fink <fink@ktiml.mff.cuni.cz>
Date: Wed, 11 Dec 2024 21:43:50 +0100
Subject: [PATCH] Add hints to the find duplicates assignment

---
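Note (not part of the commit): the memory claim in the first hint can be
checked with a small script. The following is a minimal sketch, assuming a
CPython interpreter; the helper name `peak_bytes` is hypothetical and only
used for illustration. It uses the standard `tracemalloc` module to compare
the peak allocation of a list of `False` values with that of a `bytearray`
of the same length.

```python
import tracemalloc

N = 2**20  # number of flags, as in the first hint


def peak_bytes(make):
    """Return the peak number of bytes traced while make() builds its object."""
    tracemalloc.start()
    obj = make()  # keep a reference so the object is alive when we measure
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del obj
    return peak


# A list stores one pointer per element, even if all point to the same False.
list_peak = peak_bytes(lambda: [False] * N)
# A bytearray stores one byte per element.
bytes_peak = peak_bytes(lambda: bytearray(N))

print(f"list:      {list_peak / 2**20:.2f} MiB")
print(f"bytearray: {bytes_peak / 2**20:.2f} MiB")
```

The exact numbers depend on the interpreter build, so none are quoted here;
the list should come out several times larger than the bytearray.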
 09-find_duplicates/task.md | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/09-find_duplicates/task.md b/09-find_duplicates/task.md
index 4953643..fa21e0d 100644
--- a/09-find_duplicates/task.md
+++ b/09-find_duplicates/task.md
@@ -15,7 +15,7 @@ tests works only on Linux (and not on Windows), and of course also in ReCodEx.
 You can use full standard library of Python and C++ in this assignment,
 including data structure implementations (also, `bytearray` might come handy).
 Your solution must also work on other input data of the same size with similar
-number of duplicates. Hence solutions depending on the fact that each string is
+number of duplicates. Hence, solutions depending on the fact that each string is
 uniquely determined by some its substring or similar properties of the input
 will not be accepted.
 
@@ -25,3 +25,11 @@ Note that due to the space constraints of the Python solutions, tests `10M` and
 not used and are always considered successful by ReCodEx.
 
 Source code templates can be found in [git](https://gitlab.kam.mff.cuni.cz/datovky/assignments/-/tree/master).
+
+Hints:
+* The list `[False] * 2**20` requires approximately 8 MB, since Python stores it as an array of pointers to the single value `False`. Use `bytearray` instead.
+* Read the documentation of `bytearray` carefully and distinguish between the terms bit and byte.
+* In Python, do not import numpy or other libraries that need more memory to load than is available.
+* The memory limit prevents storing all keys, so trivial solutions that store all keys in a dictionary are a waste of time.
+* Count the number of duplicates and candidates for duplicates. For properly implemented hashing, those two numbers should be very close.
+* Use profilers to trace memory usage; see e.g. https://docs.python.org/3/library/tracemalloc.html or https://valgrind.org/.
-- 
GitLab