diff --git a/MASA_DOA/LICENSE.md b/MASA_DOA/LICENSE.md new file mode 100644 index 0000000000000000000000000000000000000000..e61e32f5f425039b50ac134d232e241178b29005 --- /dev/null +++ b/MASA_DOA/LICENSE.md @@ -0,0 +1,55 @@ +Diverse audio Capturing System (DaCAS) example evaluation script Non-commercial License + +Version 1.0 + +Copyright (c) 2026, Nokia Technologies Ltd All rights reserved. + +By downloading this Software , you as the Licensee (“You”) accept and agree to be bound by the terms and conditions of this Diverse audio Capturing System (DaCAS) evaluation script Source Code Non-commercial License (“License"). + +**Definitions.** + +“Specification” means a draft or final version of the Metadata-Assisted Spatial Audio format (MASA) as it is part of the 3GPP IVAS Codec standard specification, 3GPP TS 26.258 Annex A. + +“Software” means the DaCAS example evaluation script made available by Nokia excluding THIRD PARTY MATERIALS listed below. + +“Licensed Field” means using the Software for the non-commercial purposes of evaluation, testing and contribution to standards setting organizations in accordance with the License Grant, to evaluate and test DaCAS example solutions - submitted under the 3GPP DaCAS work - that produce output signals that comply with the Specification. + +“Necessarily Infringed” means that it is not possible on technical grounds to use the Software without infringing such claim of a patent. + +“Licensed Patent” means a claim of a patent owned and licensable by Nokia Technologies Ltd Necessarily Infringed by using the Software within the Licensed Field. For the avoidance of doubt, Licensed Patents are limited solely to patent claims implemented in the Software. 
+ +**License Grant.** + +Subject to the terms of this License and solely within the Licensed Field, Nokia Technologies Ltd (“Nokia”) hereby grants to You a non-transferrable, non-sublicensable, worldwide, non-exclusive, fully paid up, irrevocable (except as stated in this License) limited license, under its copyrights only, to use, run, copy, redistribute and modify (only for internal use and/or for the purpose of providing feedback to Nokia, subject to the section Feedback) and copy the Software within the Licensed Field. Under the License Grant you may redistribute the Software, subject to the section Redistribution. + +**Redistribution.** + +Any redistribution of the Software by You to the Software is subject to the terms of this License. You may redistribute the Software only in the context and purposes of the License Grant. You must retain the complete text of this License in redistributions of the Software thereto in related source code files. You must make available free of charge copies of the complete source code of the Software. The complete source code includes, without being limited to, the source code of the Software, the necessary scripts and configuration files for compiling and installing the executable as well as any relevant documentation. You may not charge fees for anyone to use, copy or distribute the Software. Nokia, Nokia Technologies, or any Nokia trade name may not be used to endorse or promote products or services derived from this Software without Nokia’s prior written permission. + +**No Other Licenses.** + +Any other use of the Software than strictly for the Licensed Field is forbidden. The Software is licensed and not sold to You. Upon this grant of License, no intellectual property rights are assigned to You. Except for the License explicitly granted herein, no other rights or licenses are granted by Nokia or its affiliates, whether express, implied, arising by estoppel or otherwise. 
For the avoidance of doubt, no licenses are granted with respect to any Nokia or affiliate patents or patent applications. + +**Feedback.** + +You may provide Feedback (defined below) regarding the Software. “Feedback” means any suggestions, comments, corrections, modifications, improvements, or other input, solely relating to Software. You hereby grant Nokia a fully paid up, non-exclusive license, to use, disclose, copy, perform, display, distribute, and create derivative works of Feedback solely for the purpose of the evaluation, testing, research and development, contribution to standards setting organizations, and distribution, and for no other purpose. You may decide, in Your sole discretion, how much, if any, Feedback you provide. Nokia may decide, in its sole discretion, how much, if any, Feedback to use in accordance with this section. Nokia shall not make any reference to You in its products, documentation, or other materials with reference to the Feedback (other than copyright notices). Any Feedback is provided “AS IS” and You disclaim all warranties of any kind, express or implied, except that a) You represent and warrant that You have the necessary rights to disclose Feedback to Nokia, b) Your Feedback does not contain confidential or proprietary information related to Your own activities or those of any third party, c) Nokia is not under any obligation of confidentiality with respect to the Feedback; and d) You are not entitled to any compensation of any kind from Nokia. + +**Termination.** + +This License will terminate immediately without notice from Nokia if You fail to comply with any provision of this License. 
If You, directly, or indirectly via a controlled affiliate or subsidiary, agent, or exclusive licensee, initiate legal or administrative proceedings or file a claim of patent infringement against any entity alleging that the use of the Software in accordance with this License in whole or in part constitutes direct or contributory patent infringement, or inducement of patent infringement, then this License, including any rights granted to You under this License, shall automatically terminate retroactively as of the date You first received the license grant. Upon termination, You must destroy all copies of the Software. + +**No Warranty or Liability.** + +THIS SOFTWARE IS PROVIDED BY NOKIA "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, AND THAT THE SOFTWARE WILL NOT INFRINGE ANY PATENTS, COPYRIGHTS, TRADEMARKS OR OTHER RIGHTS OF THIRD PARTIES, ARE DISCLAIMED. IN NO EVENT SHALL NOKIA OR ITS AFFILIATES BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +**Applicable Law and Dispute Resolution.** + +This License is governed by the laws of Finland (excluding its choice of law provisions). All disputes arising from or relating to this License shall be settled by a single arbitrator appointed by the Finland Chamber of Commerce. The arbitration procedure shall take place in Helsinki, Finland in the English language. 
The foregoing shall be without prejudice to the right of Nokia to seek injunctive relief or other equitable compensation before any court in any place where any unauthorized use of the Software occurs or threatens to occur. + +THIRD PARTY MATERIALS: + +MASA-IO tool (license: Metadata-Assisted Spatial Audio Format (MASA) metadata file reading and writing library (MASA-IO) Non-commercial License Version 1.0)\ +SciPy (license: BSD-3 Clause)\ +NumPy (license: BSD-3 Clause)\ + + diff --git a/MASA_DOA/MASA_DOA.py b/MASA_DOA/MASA_DOA.py new file mode 100644 index 0000000000000000000000000000000000000000..50025616c539eb26c6701d8e3c69612bf1662559 --- /dev/null +++ b/MASA_DOA/MASA_DOA.py @@ -0,0 +1,342 @@ +""" +Evaluation script for calculating Direction Information estimate for signals in Metadata-assisted Spatial Audio -format + +Copyright (c) 2026, Nokia Technologies Ltd. All rights reserved. +""" + +import pyivasmasa_io +import numpy as np +import scipy +import argparse +import sys + +# Constants +FRAME_LEN = 960 +FFT_SIZE = 480 +HOP_SIZE = 240 +N_BANDS = 24 + +# Analysis band: 0 - 8000 Hz +ANALYSIS_BAND_LOW = 0 +ANALYSIS_BAND_HIGH = 20 + +BAND_TOP_HZ = [ + 400, + 800, + 1200, + 1600, + 2000, + 2400, + 2800, + 3200, + 3600, + 4000, + 4400, + 4800, + 5200, + 5600, + 6000, + 6400, + 6800, + 7200, + 7600, + 8000, + 10000, + 12000, + 16000, + 24000, +] + + +def parse_arguments(argv: list[str]): + """ + Parse and return command-line arguments + """ + + argparser = argparse.ArgumentParser() + + argparser.add_argument( + "--input_audio", + action="store", + type=str, + default=None, + help="input transport filename, expected filename extensions are .wav or .pcm", + required=True, + ) + argparser.add_argument( + "--input_meta", + action="store", + type=str, + default=None, + help="input metadata filename with .met filename extension", + required=True, + ) + + args = argparser.parse_args(argv) + + return args + + +def stft240(input): + """ " + Calculate MASA metadata subframe 
centered Short-time Fourier transform (stft) of input signal + + Returns stft in (channels, bins, subframes) shaped numpy.float64 array + """ + + # Length of the input signal, number of stft-tiles + len_in = input.shape[0] + len_ft = int(np.floor(len_in / HOP_SIZE)) + + # Delay input by 120 samples in order to center the frames + if len(input.shape) > 1: + x = np.concatenate((np.zeros((120, 2)), input)) + out = np.zeros((2, 241, len_ft), dtype=np.complex128) + else: + x = np.concatenate((np.zeros(120), input)) + out = np.zeros((1, 241, len_ft), dtype=np.complex128) + + # Short-time fourier transform, size 480 samples, overlap 240 samples + win = np.hanning(FFT_SIZE) + STFT = scipy.signal.ShortTimeFFT( + win, hop=HOP_SIZE, fs=48000, mfft=FFT_SIZE, scale_to="magnitude" + ) + Sx = STFT.stft(x.T) # perform the STFT + + # Due to the Scipy sliding window properties, the first and last frames are removed (representing t=0 and t=end centered frame) + if len(input.shape) > 1: + out[:, :, :] = Sx[:, :, 1 : len_ft + 1] + else: + out[:, :, :] = Sx[:, 1 : len_ft + 1] + + return out + + +def read_meta(meta_fname): + """ + Read MASA metadata parameters related to the Directional Information calculation + + Returns number of channels and MASA metadata parameters in meta struct comprising {direct_to_total_ratio, azimuth, elevation} + """ + + py_io_lib = pyivasmasa_io.PyIvasMasaIO() + meta_frames = py_io_lib.ivasmasa_read(masa_meta_filename=meta_fname) + + # Number of channels + num_ch = meta_frames[0].descriptiveMeta.numberOfChannels + 1 + + # Read MASA metadata parameters + ratio1_mat = np.concatenate( + [one_meta_frame.dir1Meta.directToTotalRatio for one_meta_frame in meta_frames], + axis=0, + ).transpose() + azi1_mat = np.concatenate( + [one_meta_frame.dir1Meta.azimuth for one_meta_frame in meta_frames], axis=0 + ).transpose() + ele1_mat = np.concatenate( + [one_meta_frame.dir1Meta.elevation for one_meta_frame in meta_frames], axis=0 + ).transpose() + + ratio2_mat = 
np.concatenate( + [one_meta_frame.dir2Meta.directToTotalRatio for one_meta_frame in meta_frames], + axis=0, + ).transpose() + azi2_mat = np.concatenate( + [one_meta_frame.dir2Meta.azimuth for one_meta_frame in meta_frames], axis=0 + ).transpose() + ele2_mat = np.concatenate( + [one_meta_frame.dir2Meta.elevation for one_meta_frame in meta_frames], axis=0 + ).transpose() + + meta = { + "dirToTot1": ratio1_mat, + "azimuth1": azi1_mat, + "elevation1": ele1_mat, + "dirToTot2": ratio2_mat, + "azimuth2": azi2_mat, + "elevation2": ele2_mat, + } + + return meta, num_ch + + +def read_pcm(transport_fname, n_ch): + """ + Read transport channel audio from a headerless 16-bit little-endian PCM file, where channels are interleaved. + + Returns samples in (samples, ch) shaped numpy.float64 array + """ + + samples = np.fromfile( + file=transport_fname, dtype=np.dtype(" 1: + if np.mod(len(data), FRAME_LEN) != 0: + num_ch = data.shape[1] + data = np.concatenate( + ( + data, + np.zeros( + (FRAME_LEN - np.mod(len(data), FRAME_LEN), num_ch), + dtype=data.dtype, + ), + ) + ) + else: + if np.mod(len(data), FRAME_LEN) != 0: + data = np.concatenate( + ( + data, + np.zeros( + (FRAME_LEN - np.mod(len(data), FRAME_LEN)), dtype=data.dtype + ), + ) + ) + + if data.dtype == "int16": + data = data.astype("float") / (2.0**15) + elif data.dtype == "int32": + data = data.astype("float") / (2.0**31) + else: + data = data.astype("float") + + return data + + +def estimate_doa(x, meta): + """ + Calculate directional information according to the 3GPP TS 26.260 Section 5.6.4.2 for Metadata-assisted Spatial Audio format + + Returns estimated direction-of-arrival azimuth and elevation angles + """ + + # Calculate time-frequency domain representation of input signal, output shape = (ch, bin, frames) + xf = stft240(x) + + # Init constants + bin_hz = np.linspace(0, 240, 241) / 240 * 24000 + + band_top = [] + for band in range(0, 24, 1): + band_top.append(int(sum(bin_hz < BAND_TOP_HZ[band]))) + + band_bottom = 
[x for x in band_top[0:-1]] + band_bottom.insert(0, 0) + + # Calculate energy estimate for each frame + n_subframes = meta["dirToTot1"].shape[1] + energy_stft = np.zeros((N_BANDS, n_subframes)) + for j in range(0, n_subframes): + for k in range(0, N_BANDS): + bottom = band_bottom[k] + top = band_top[k] + energy_stft[k, j] = np.sum(np.abs(xf[:, bottom:top, j]) ** 2) + + # Energy weighted dir-to-tot ratio + vec_len = np.multiply( + meta["dirToTot1"][ANALYSIS_BAND_LOW:ANALYSIS_BAND_HIGH, :], + energy_stft[ANALYSIS_BAND_LOW:ANALYSIS_BAND_HIGH, :], + ) + # Convert azimuth and elevation to radians + azi_rad = np.multiply(meta["azimuth1"], (np.pi / 180)) + ele_rad = np.multiply(meta["elevation1"], (np.pi / 180)) + + # Calculate direction vector in cartesian coordinates + x_cart = np.multiply( + np.multiply( + np.cos(azi_rad[ANALYSIS_BAND_LOW:ANALYSIS_BAND_HIGH, :]), + np.cos(ele_rad[ANALYSIS_BAND_LOW:ANALYSIS_BAND_HIGH, :]), + ), + vec_len, + ) + y_cart = np.multiply( + np.multiply( + np.sin(azi_rad[ANALYSIS_BAND_LOW:ANALYSIS_BAND_HIGH, :]), + np.cos(ele_rad[ANALYSIS_BAND_LOW:ANALYSIS_BAND_HIGH, :]), + ), + vec_len, + ) + z_cart = np.multiply( + np.sin(ele_rad[ANALYSIS_BAND_LOW:ANALYSIS_BAND_HIGH, :]), vec_len + ) + + # Sum-vector over time + x_sum = np.sum(x_cart) + y_sum = np.sum(y_cart) + z_sum = np.sum(z_cart) + + # Estimate azimuth and elevation based on the directional sum-vector + azi_est = np.arctan2(y_sum, x_sum) * (180 / np.pi) + ele_est = np.arctan2(z_sum, np.sqrt((x_sum**2) + (y_sum**2))) * (180 / np.pi) + + return azi_est, ele_est + + +def main(args): + + # Read MASA metadata file + if args.input_meta[-4:] == ".met": + meta, num_ch = read_meta(args.input_meta) + else: + raise ValueError("MASA metadata file with .met filename extensions is expected") + + # Read audio file + if args.input_audio[-4:] == ".wav": + x = read_wav(args.input_audio) + + # Calculate Directional Information estimation + azi_est, ele_est = estimate_doa(x, meta) + + elif 
args.input_audio[-4:] == ".pcm": + x = read_pcm(args.input_audio, num_ch) + + # Calculate Directional Information estimation + azi_est, ele_est = estimate_doa(x, meta) + else: + raise ValueError( + "Not supported audio format, supported audio formats are: .wav and .pcm" + ) + + print( + "------------------------------------------------------------------------------------" + ) + print( + "--- Directional information estimate according to 3GPP TS 26.260 for MASA signal ---" + ) + print( + "------------------------------------------------------------------------------------" + ) + print(" ") + print(f"Estimated azimuth: {azi_est:.2f}°") + print(f"Estimated elevation: {ele_est:.2f}°") + print(" ") + print( + "------------------------------------------------------------------------------------" + ) + + +if __name__ == "__main__": + """ + Usage example: + python MASA_DOA.py --input_audio input.wav --input_meta input.met + + The script calculates estimated Directional information angles according to the 3GPP TS 26.260 Section 5.6.4.2 for Metadata-assisted Spatial Audio -format. + """ + + args = parse_arguments(sys.argv[1:]) + main(args) diff --git a/MASA_DOA/README.md b/MASA_DOA/README.md new file mode 100644 index 0000000000000000000000000000000000000000..453e98aa904e160cd34bf2d6691d2d96cae6f30f --- /dev/null +++ b/MASA_DOA/README.md @@ -0,0 +1,40 @@ +# DaCAS Evaluation Script + +The `MASA_DOA.py` script is provided to evaluate example solutions, developed under the "Diverse audio Capturing System for UEs" (DaCAS) work item in 3GPP SA4, that produce Metadata Assisted Spatial Audio (MASA) signals. + +## Estimation of directional information from Metadata Assisted Spatial Audio (MASA) signals + +The calculation for estimating the directional information of MASA signals is based on 3GPP TS 26.260 section 5.6.4.2 [1]. 
The directional information is calculated from Cartesian direction-of-arrival vectors that are derived from the MASA metadata and weighted by the transport channel energy; the summed vector is then converted to spherical angles (azimuth and elevation). + +## Dependencies to be installed by the user + +To run the evaluation script, the user must install the following dependencies: + +- Python >v3.9 (development done with v3.13) +- MASA-IO tool: https://github.com/Nokia-Bell-Labs/MASA_tools_for_IVAS/tree/main/MASA-IO +- NumPy >v1.22.4 (required version depends on the Python version used, development done with v2.4) +- SciPy >v1.12 (required version depends on the Python version used, development done with v1.16) + + +## Usage + +The script supports 16/24/32-bit WAV files and 16-bit headerless PCM files with interleaved channels. The audio signal is expected to have a 48 kHz sampling rate, and the expected file extension for the metadata file is .met. + +The script can be run with the following command-line call: + +```shell +python MASA_DOA.py --input_audio input.wav/.pcm --input_meta input.met +``` + +The program prints the estimated directional information calculated according to 3GPP TS 26.260 section 5.6.4.2 for MASA signals. + +## License + +Copyright (c) 2026, Nokia Technologies Ltd. + +Licensed under the "Diverse audio Capturing System (DaCAS) example evaluation script Non-commercial License". See [LICENSE.md](Add license path) for details. + + +## References + +[1] 3GPP TS 26.260, "Objective test methodologies for the evaluation of immersive audio systems", Release 19
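
As an illustration of the estimation summarised in the README, the following is a minimal NumPy sketch of the energy-weighted sum-vector calculation. It is not the script itself: the function name `weighted_doa` and the toy input values are assumptions made for this example, and the inputs are assumed to be already arranged as (bands, subframes) arrays restricted to the analysis band.

```python
import numpy as np


def weighted_doa(azimuth_deg, elevation_deg, direct_to_total, energy):
    """Hypothetical helper: combine per-tile directions into one estimate.

    All inputs are (bands, subframes) arrays. Returns (azimuth, elevation)
    in degrees.
    """
    azi = np.deg2rad(azimuth_deg)
    ele = np.deg2rad(elevation_deg)

    # Each tile contributes a vector whose length is the
    # direct-to-total ratio multiplied by the tile energy.
    weight = direct_to_total * energy

    # Sum of the weighted unit vectors in Cartesian coordinates.
    x = np.sum(np.cos(azi) * np.cos(ele) * weight)
    y = np.sum(np.sin(azi) * np.cos(ele) * weight)
    z = np.sum(np.sin(ele) * weight)

    # Convert the sum vector back to spherical angles.
    azi_est = np.rad2deg(np.arctan2(y, x))
    ele_est = np.rad2deg(np.arctan2(z, np.hypot(x, y)))
    return azi_est, ele_est


# Toy input (assumed values): one band, two subframes pointing at
# 0 and 90 degrees azimuth with equal weight.
azi_est, ele_est = weighted_doa(
    azimuth_deg=np.array([[0.0, 90.0]]),
    elevation_deg=np.array([[0.0, 0.0]]),
    direct_to_total=np.array([[1.0, 1.0]]),
    energy=np.array([[1.0, 1.0]]),
)
print(f"Estimated azimuth: {azi_est:.2f}, elevation: {ele_est:.2f}")
```

With equal weights the estimate falls midway between the two directions; raising the energy or the direct-to-total ratio of one tile pulls the estimate towards that tile's direction.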