Commit e51ef837 authored by Archit Tamarapu's avatar Archit Tamarapu


Merge branch 'main' of ssh://forge.3gpp.org:29419/ivas-codec-pc/ivas-processing-scripts into update-configs-20250910
parents 2d157343 6281ceb1
+7 −2
@@ -118,9 +118,14 @@ After the processing is finished, the outputs will be present in the respective

  - These scripts collect items from each experiment's `proc_output*` folder(s) and put the needed files for the listening test into a `proc_final` folder. This folder needs to be uploaded for the dry run and for the final delivery of the listening items to the labs.

-### Hash generation
+### Hash generation and checking for duplicates

-The hashes for the `proc_final` can be generated using the [get_md5.py](other/get_md5.py) script:
+The hashes for the `proc_final` can be generated using the [get_md5.py](other/get_md5.py) script.
+This script also checks for identical hashes and thus identifies duplicates in the output files, which are reported in a printout.
+When generating hashes, check whether any duplicates are reported and, if so, which files are identical. Note that duplicates between the actual test and the preliminaries/training can occur and are OK.
+If three or more items are identical, or if two items are identical within the test or within the preliminaries, check the input files for duplicates.
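As a rough illustration of this check (the function and file names below are made up for the example, not the script's actual API), duplicates can be found by grouping files by their MD5 digest:

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def md5_of(path: Path) -> str:
    # Digest of the raw file bytes, as md5sum would compute it.
    return hashlib.md5(path.read_bytes()).hexdigest()


def find_duplicates(files):
    # Group files by digest; any group with more than one member
    # consists of byte-identical files.
    by_hash = defaultdict(list)
    for f in files:
        by_hash[md5_of(f)].append(f)
    return {h: fs for h, fs in by_hash.items() if len(fs) > 1}
```

Two byte-identical items land under the same digest and are reported together; a single such pair shared between the test and the preliminaries/training would be the acceptable case described above.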

Script usage:

```shell
> python other/get_md5.py --help
+19 −5
@@ -44,17 +44,31 @@ def get_hash_line_for_file(file: Path, output_dir: Path):
    return hashline


+def get_duplicates(hashlines: list) -> dict:
+    counts = Counter([line.split()[-1] for line in hashlines])
+    duplicates = {}
+    for hash, count in counts.items():
+        if count == 1:
+            continue
+
+        files = [line.replace(hash, "").strip() for line in hashlines if hash in line]
+        duplicates[hash] = files
+
+    return duplicates
+
+
def main(output_dir, out_file):
    wav_files = sorted(output_dir.glob("*/**/*c[0-9][0-9].wav"))

    hashlines = [get_hash_line_for_file(f, output_dir) for f in wav_files]
-    count = Counter([line.split()[-1] for line in hashlines])
-    duplicates = [line for line in hashlines if count[line.split()[-1]] != 1]
+    duplicates = get_duplicates(hashlines)

    if len(duplicates) != 0:
-        print("Found duplicate hashes in these lines:")
-        for dup in duplicates:
-            print(dup)
+        print(
+            "Found duplicate hashes! The following hashes were found in multiple files:"
+        )
+        for hash, files in duplicates.items():
+            print(f"{hash} - {', '.join(files)}")

    with open(out_file, "w") as f:
        f.writelines(hashlines)
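The new `get_duplicates` helper can be sanity-checked with a couple of hand-made hash lines; judging by `line.split()[-1]`, the hash is the last whitespace-separated token of each line, so the sketch below assumes a `<path> <hash>` layout (the paths and hashes are made up):

```python
from collections import Counter


def get_duplicates(hashlines: list) -> dict:
    # Same logic as in the commit, with the loop variables renamed:
    # count hash occurrences, then collect the files for every hash
    # that appears more than once.
    counts = Counter([line.split()[-1] for line in hashlines])
    duplicates = {}
    for hash_, n in counts.items():
        if n == 1:
            continue
        files = [line.replace(hash_, "").strip() for line in hashlines if hash_ in line]
        duplicates[hash_] = files
    return duplicates


# Made-up hash lines: two files share the digest "deadbeef".
hashlines = [
    "proc_final/item01_c01.wav  deadbeef",
    "proc_final/item02_c01.wav  deadbeef",
    "proc_final/item03_c01.wav  cafef00d",
]
print(get_duplicates(hashlines))
# → {'deadbeef': ['proc_final/item01_c01.wav', 'proc_final/item02_c01.wav']}
```

The unique hash `cafef00d` is skipped, so only genuinely duplicated items reach the printout.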