The PubMed Open-Access (OA) subset shares metadata for 35 million articles. Unfortunately, the existing article parser is a Hugging Face dataset that was supported only until 2024.
ncbi/pubmed
Moreover, the PubMed data is distributed as compressed XML, which is efficient for storage but limits which processing techniques can be applied to it directly.
📢 To bridge this gap, I'm excited to share the pubmed_articles_iter project, which provides:
1. A downloader for the raw files
2. A no-string iterator over PubMed articles, used for converting them into JSON.
👨‍💻 Code: https://github.com/nicolay-r/pubmed_articles_iter
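For readers wondering what iterating over compressed XML and emitting JSON can look like, here is a minimal sketch using only the Python standard library. It is not the pubmed_articles_iter API: the function name `iter_articles`, the chosen fields, and the tiny in-memory sample are all hypothetical, and the element names (`PubmedArticle`, `PMID`, `ArticleTitle`) are assumed from the usual PubMed XML layout.

```python
import gzip
import io
import json
import xml.etree.ElementTree as ET


def iter_articles(fileobj):
    """Yield one dict per <PubmedArticle> element (hypothetical helper).

    Streams the XML with iterparse, so the whole file is never held in memory.
    """
    for _event, elem in ET.iterparse(fileobj, events=("end",)):
        if elem.tag == "PubmedArticle":
            yield {
                "pmid": elem.findtext(".//PMID"),
                "title": elem.findtext(".//ArticleTitle"),
            }
            elem.clear()  # drop the parsed subtree once it has been converted


# Tiny made-up sample standing in for a downloaded *.xml.gz file.
SAMPLE = b"""<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>1</PMID>
      <Article><ArticleTitle>First title</ArticleTitle></Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>2</PMID>
      <Article><ArticleTitle>Second title</ArticleTitle></Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""

compressed = gzip.compress(SAMPLE)

# gzip.open accepts a file object, so decompression and parsing are both streamed.
with gzip.open(io.BytesIO(compressed), "rb") as f:
    records = list(iter_articles(f))

# One JSON document per article (JSON Lines style).
json_lines = [json.dumps(record) for record in records]
```

With a real file, the `io.BytesIO(compressed)` argument would be replaced by the path to the downloaded archive; titles that contain inline markup would need extra handling beyond `findtext`.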