Easy · Repository Systems · Python 3
Duplicate File Groups
Group duplicate repository files by content with deterministic output and path validation.
25m · 2 sample tests · 5 hidden tests
Implement find_duplicate_files(files), a repository scan helper that groups files with identical content.
Requirements
- Input is a dictionary from path to text content.
- Return only duplicate groups with at least two paths.
- Sort paths inside each group.
- Sort groups by their first path.
- Reject empty paths, `.` segments, `..` traversal, and repeated slashes.
- Treat path changes as different files even when content matches.
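The path rules above can be sketched as a small checker. This is one possible reading, not a reference solution: the helper names `path_problem` and `validate_path` are hypothetical, and raising `ValueError` is an assumption, since the statement does not say how a rejected path should be reported. Leading or trailing slashes are not mentioned in the requirements, so the sketch leaves them alone.

```python
from typing import Optional


def path_problem(path: str) -> Optional[str]:
    """Return a short reason why `path` is rejected, or None if it passes.

    Hypothetical helper: covers only the four rules listed in the requirements.
    """
    if not path:
        return "empty path"
    if "//" in path:
        return "repeated slashes"
    segments = path.split("/")
    if "." in segments:
        return "'.' segment"
    if ".." in segments:
        return "'..' traversal"
    return None


def validate_path(path: str) -> None:
    """Raise ValueError (an assumed error type) when `path` breaks a rule."""
    problem = path_problem(path)
    if problem is not None:
        raise ValueError(f"invalid path {path!r}: {problem}")
```

Checking whole segments rather than substrings keeps names such as `.gitignore` valid, because only a segment that is exactly `.` or `..` is rejected.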
Example
```python
files = {"a.py": "x", "b.py": "x", "c.py": "y"}
assert find_duplicate_files(files) == [["a.py", "b.py"]]
```
Constraints
- Keep the scan in memory.
- Work only from the input map.
- Make output deterministic.
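A minimal sketch of the grouping and ordering steps, working only from the input map and kept entirely in memory. The name `group_by_content` is hypothetical, and the sketch deliberately leaves out path validation, so it is not a complete `find_duplicate_files`. It keys the index on the full text content. Hashing the content first would be an alternative, but nothing in the statement requires it.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_content(files: Dict[str, str]) -> List[List[str]]:
    """Group paths whose content is identical (hypothetical helper, no validation)."""
    by_content: Dict[str, List[str]] = defaultdict(list)
    for path, content in files.items():
        by_content[content].append(path)

    # Keep only real duplicates and sort the paths inside each group.
    groups = [sorted(paths) for paths in by_content.values() if len(paths) >= 2]

    # Each path belongs to exactly one group, so first paths are unique and
    # sorting by them gives one fixed order regardless of dict insertion order.
    groups.sort(key=lambda group: group[0])
    return groups
```

Sorting both inside and across groups is what makes the result independent of the order in which the input dictionary was built.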