From d68e859681983c48312db6fff00b3a21d9336da4 Mon Sep 17 00:00:00 2001
From: Jirka Fink <fink@ktiml.mff.cuni.cz>
Date: Wed, 11 Dec 2024 21:43:50 +0100
Subject: [PATCH] Add hints to the find duplicates assignment

---
 09-find_duplicates/task.md | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/09-find_duplicates/task.md b/09-find_duplicates/task.md
index 4953643..fa21e0d 100644
--- a/09-find_duplicates/task.md
+++ b/09-find_duplicates/task.md
@@ -15,7 +15,7 @@ tests works only on Linux (and not on Windows), and of course also in ReCodEx.
 You can use full standard library of Python and C++ in this assignment, including
 data structure implementations (also, `bytearray` might come handy).
 Your solution must also work on other input data of the same size with similar
-number of duplicates. Hence solutions depending on the fact that each string is
+number of duplicates. Hence, solutions depending on the fact that each string is
 uniquely determined by some its substring or similar properties of the input will
 not be accepted.
 
@@ -25,3 +25,11 @@ Note that due to the space constraints of the Python solutions, tests `10M` and
 not used and are always considered successful by ReCodEx.
 
 Source code templates can be found in [git](https://gitlab.kam.mff.cuni.cz/datovky/assignments/-/tree/master).
+
+Hints:
+* Array [ False ] * 2**20 requires approximately 8 MB since Python stores it as an array of pointers to one value False. Use bytearray instead.
+* Read carefully the documentation of bytearray and distinguish the terms bit and byte.
+* In Python, do not import numpy or other libraries consuming more memory to load than available.
+* The memory limit prevents storing all keys, so trying trivial solutions which store all keys in a dictionary is a waste of time.
+* Count the number of duplicates and candidates for duplicates. For properly implemented hashing, those two numbers should be very close.
+* Use profilers to trace memory usage; see e.g. https://docs.python.org/3/library/tracemalloc.html or https://valgrind.org/.
-- 
GitLab
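
The memory hints above can be checked with a short script. The following is a minimal sketch, not part of the patch or of the assignment templates: the helper names `set_bit`, `get_bit` and `peak_bytes` are hypothetical, and the exact sizes printed depend on the Python build (the 8 MB figure assumes 64-bit pointers). It compares a list of `False` values with a `bytearray` of the same length, packs one flag per bit rather than per byte, and uses `tracemalloc` to read the peak allocation.

```python
import sys
import tracemalloc

N = 2**20  # number of flags, as in the first hint


def set_bit(bits: bytearray, i: int) -> None:
    """Set flag i in a bytearray used as a packed bit array (hypothetical helper)."""
    bits[i >> 3] |= 1 << (i & 7)


def get_bit(bits: bytearray, i: int) -> bool:
    """Return True if flag i is set (hypothetical helper)."""
    return bool(bits[i >> 3] & (1 << (i & 7)))


def peak_bytes(build):
    """Return the peak number of bytes traced by tracemalloc while build() runs."""
    tracemalloc.start()
    try:
        obj = build()  # keep a reference so the allocation is alive when measured
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    del obj
    return peak


# One pointer per element: roughly 8 * N bytes on a 64-bit build.
list_size = sys.getsizeof([False] * N)
# One byte per element: roughly N bytes.
bytes_size = sys.getsizeof(bytearray(N))
# One bit per flag: roughly N / 8 bytes.
bits = bytearray(N // 8)
bits_size = sys.getsizeof(bits)

set_bit(bits, 12345)

print("list:", list_size, "bytearray:", bytes_size, "packed bits:", bits_size)
print("peak list:", peak_bytes(lambda: [False] * N))
print("peak bytearray:", peak_bytes(lambda: bytearray(N)))
```

Note that `sys.getsizeof` reports only the container itself, which is enough here because every list slot points to the same `False` object. For a whole solution, `tracemalloc` (or an external tool such as valgrind for C++) gives a more faithful picture than summing container sizes.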