Easy | Repository Systems | Python 3

Duplicate File Groups

Group duplicate repository files by content with deterministic output and path validation.

25m | 2 sample tests | 5 hidden tests

Implement find_duplicate_files(files), a repository scan helper that groups files with identical content.

Requirements

  • Input is a dictionary from path to text content.
  • Return only duplicate groups with at least two paths.
  • Sort paths inside each group.
  • Sort groups by their first path.
  • Reject empty paths, "." segments, ".." traversal segments, and repeated slashes ("//").
  • Treat different paths as different files, even when their content matches.
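
The path rules above can be sketched as a small check. This is only an illustration: the helper name validate_path and the choice to signal rejection with ValueError are assumptions, since the statement does not say how a rejected path should be reported. Leading and trailing slashes are left alone here because the requirements do not mention them.

```python
def validate_path(path: str) -> None:
    """Raise ValueError if the path breaks one of the stated rules.

    Hypothetical helper: the name and the exception type are assumptions.
    """
    if not path:
        raise ValueError("empty path")
    if "//" in path:
        raise ValueError(f"repeated slashes in path: {path!r}")
    for segment in path.split("/"):
        # "." and ".." are rejected only as whole segments, so a file
        # name such as "a..b" is still accepted by this sketch.
        if segment in (".", ".."):
            raise ValueError(f"'.' or '..' segment in path: {path!r}")
```

Under these assumptions, "src/a.py" passes, while "", "a//b.py", "./a.py", and "a/../b.py" are each rejected.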

Example

python
files = {"a.py": "x", "b.py": "x", "c.py": "y"}
assert find_duplicate_files(files) == [["a.py", "b.py"]]

Constraints

  • Keep the scan in memory.
  • Work only from the input map.
  • Make output deterministic.
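
One possible shape for the whole helper, given the requirements and constraints above, is sketched below. It is not a reference solution. It assumes invalid paths are reported with ValueError (the statement does not say how), and the private helper _check_path is a hypothetical name. Content is used directly as the dictionary key, so everything stays in memory and only the input map is read.

```python
from collections import defaultdict


def _check_path(path):
    # Hypothetical helper; raising ValueError is an assumption.
    if not path or "//" in path:
        raise ValueError(f"invalid path: {path!r}")
    if any(segment in (".", "..") for segment in path.split("/")):
        raise ValueError(f"invalid path: {path!r}")


def find_duplicate_files(files):
    """Group paths whose text content is identical."""
    by_content = defaultdict(list)
    for path, content in files.items():
        _check_path(path)
        by_content[content].append(path)

    # Keep only groups with at least two paths, sorted inside each group.
    groups = [sorted(paths) for paths in by_content.values() if len(paths) >= 2]
    # A path belongs to exactly one group, so first paths are unique and
    # sorting on them gives a deterministic order.
    groups.sort(key=lambda group: group[0])
    return groups
```

With this sketch, find_duplicate_files({"z.py": "1", "b.py": "2", "a.py": "1", "c.py": "2"}) returns [["a.py", "z.py"], ["b.py", "c.py"]], regardless of the insertion order of the input dictionary.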
