src/tdoc_crawler/crawlers/hybrid.py
0 → 100644
+493
−0
File added.
* Add `fetch_meeting_document_list` function to retrieve TDoc metadata from Excel files.
* Implement Excel parsing with `parse_excel_document_list` and related helper functions.
* Introduce error handling for document list fetching and parsing.
* Create a new module `meeting_doclist.py` for document list functionality.
* Update `parallel.py` to include subinterpreter support for document list fetching.
* Enhance `tdocs.py` to use the new executor adapter for parallel crawling.
* Add configuration options in `tdocs.py` for document list and parallel crawling behavior.
* Create tests for document list fetching and parsing in `test_meeting_document_list.py`.
* Implement tests for the executor adapter in `test_executor_adapter.py`.
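
For reviewers who cannot expand the collapsed diffs, here is a minimal sketch of what an executor adapter with subinterpreter support and per-meeting error handling might look like. It is not the code from this change: the names `create_executor` and `fetch_all`, their parameters, and the fallback policy are hypothetical, and the only library facts relied on are `concurrent.futures.ThreadPoolExecutor` and, on Python 3.14 and later, `concurrent.futures.InterpreterPoolExecutor`.

```python
import concurrent.futures as cf
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Any, Callable, Iterable


def create_executor(kind: str = "thread", max_workers: int = 4) -> Executor:
    """Hypothetical adapter: subinterpreter pool if requested and available, else threads."""
    if kind == "subinterpreter":
        try:
            # InterpreterPoolExecutor only exists on Python 3.14+; older
            # versions fall through to the thread pool below.
            pool_cls = getattr(cf, "InterpreterPoolExecutor", None)
        except ImportError:
            pool_cls = None
        if pool_cls is not None:
            return pool_cls(max_workers=max_workers)
    return ThreadPoolExecutor(max_workers=max_workers)


def fetch_all(
    fetch_one: Callable[[int], Any],
    meeting_ids: Iterable[int],
    kind: str = "thread",
    max_workers: int = 4,
) -> tuple[dict[int, Any], dict[int, Exception]]:
    """Run fetch_one for every meeting ID and keep failures separate from results."""
    results: dict[int, Any] = {}
    errors: dict[int, Exception] = {}
    with create_executor(kind, max_workers) as pool:
        futures = {pool.submit(fetch_one, mid): mid for mid in meeting_ids}
        for future in cf.as_completed(futures):
            mid = futures[future]
            try:
                results[mid] = future.result()
            except Exception as exc:  # one failed meeting should not abort the crawl
                errors[mid] = exc
    return results, errors
```

In this sketch a caller would pass a per-meeting function (for example a thin wrapper around `fetch_meeting_document_list`, whose real signature is not shown here) as `fetch_one`. Collecting errors per meeting ID instead of re-raising is an assumption about the intended behavior, chosen so that a single unparseable document list does not stop the remaining fetches.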