task.md

    In this assignment, you are given a large file as input. Your goal is to find the duplicated lines and return each of them exactly once.

    The challenging part of this assignment is that your program has to run in limited memory, using at most 64MB for C++ and 12MB for Python (and Python itself requires about 5MB), while the input file can be considerably larger than this memory limit. However, you can rely on the number of duplicated lines being considerably smaller, so that all duplicated lines fit in memory at the same time.

    Instead of handling a real file, you are given a data generator (an iterator in C++ and a generator in Python). Note that limiting memory during the tests works only on Linux (and not on Windows), and of course also in ReCodEx.

    You can use the full standard library of Python and C++ in this assignment, including data structure implementations (also, bytearray might come in handy). Your solution must also work on other input data of the same size with a similar number of duplicates. Hence, solutions that depend on each string being uniquely determined by one of its substrings, or on similar properties of the input, will not be accepted.
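
One possible way to use a bytearray here is as a bit array that filters candidates in a first pass, followed by an exact count in a second pass. The sketch below is only an illustration under stated assumptions, not the reference solution: it assumes the input can be iterated more than once, and the names `find_duplicates_sketch`, `make_lines`, and `bits` are hypothetical and not part of the provided templates. The default bit-array size (1 MiB) is an untuned guess.

```python
def find_duplicates_sketch(make_lines, bits=1 << 23):
    """Return each duplicated line once, using a bit array as a first-pass filter.

    make_lines is a hypothetical zero-argument callable that returns a fresh
    iterator over the input lines each time it is called (assumption: the
    input can be traversed twice).
    """
    seen = bytearray((bits + 7) // 8)  # one bit per hash bucket
    candidates = set()

    # Pass 1: a line whose bucket is already marked is either a true repeat
    # or a hash collision, so remember it as a candidate.
    for line in make_lines():
        h = hash(line) % bits
        index, mask = h >> 3, 1 << (h & 7)
        if seen[index] & mask:
            candidates.add(line)
        else:
            seen[index] |= mask

    del seen  # the bit array is no longer needed

    # Pass 2: count the candidates exactly to discard hash collisions.
    counts = dict.fromkeys(candidates, 0)
    for line in make_lines():
        if line in counts:
            counts[line] += 1
    return [line for line, count in counts.items() if count >= 2]
```

For example, `find_duplicates_sketch(lambda: iter(["a", "b", "a", "c", "b", "a", "d"]))` returns the lines "a" and "b" in some order. The second pass makes the result exact regardless of collisions, but the candidate set only stays small if the bit array is large enough relative to the number of distinct lines, so the size would have to be chosen with the memory limit in mind.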

    As usual, you should submit only the find_duplicates.{h,py} file.

    Note that due to the space constraints of the Python solutions, tests 10M and 16M are not used and are always considered successful by ReCodEx.

    Source code templates can be found in git.