This is a collection of code written by Maurice Curran that was used to process the Microscopy and Microanalysis conference proceedings corpus into the word products described in the publication "NLP-Driven Electron Microscopy Ontology Development". The scripts are written in Python and are to be used in the following order:

1. SettingUpTextFiles.py and CopyingText.py to get the raw text files
2. SentenceConversion.py
3. reference_remover.py
4. testing.py and testingavg.py
5. SentenceCreator.py
6. matscholar_model.py to get matscholar tags
7. training_model_gensim.py to get gensim model
8. word2vecscript.py and gensim_visual.py
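
The order above can be captured in a small driver script. What follows is only a minimal sketch, not part of the published code: it assumes that each script can be run as a standalone program without command-line arguments, that all scripts sit in one directory, and that where a step names two scripts they run in the order listed. The function name `run_pipeline` and the directory name `PythonFiles_Maurice_clean` (borrowed from the zip file name) are hypothetical. By default it only builds the commands and does not execute anything.

```python
import subprocess
import sys
from pathlib import Path

# Script order as listed in the dataset description.
# Assumption: within a step that names two scripts, they run in the order given.
PIPELINE = [
    "SettingUpTextFiles.py",
    "CopyingText.py",
    "SentenceConversion.py",
    "reference_remover.py",
    "testing.py",
    "testingavg.py",
    "SentenceCreator.py",
    "matscholar_model.py",
    "training_model_gensim.py",
    "word2vecscript.py",
    "gensim_visual.py",
]


def run_pipeline(script_dir, dry_run=True):
    """Build one command per script, in order; execute them only if dry_run is False."""
    script_dir = Path(script_dir)
    commands = [[sys.executable, str(script_dir / name)] for name in PIPELINE]
    if not dry_run:
        for cmd in commands:
            # Assumption: each script takes no command-line arguments.
            # check=True stops the pipeline at the first failing script.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical directory name for the unpacked zip file.
    for cmd in run_pipeline("PythonFiles_Maurice_clean"):
        print(" ".join(cmd))
```

Calling `run_pipeline(some_dir, dry_run=False)` would execute the scripts one after another. The scripts may expect input paths or other settings to be edited by hand first, so the dry run is the safer default.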
About this Dataset
Title | Code used to produce terms list in the work "NLP-Driven Electron Microscopy Ontology Development" |
---|---|
Description | This is a collection of code written by Maurice Curran that was used to process the Microscopy and Microanalysis conference proceedings corpus into the word products described in the publication "NLP-Driven Electron Microscopy Ontology Development". The scripts are written in Python and are to be used in the following order: 1. SettingUpTextFiles.py and CopyingText.py to get the raw text files; 2. SentenceConversion.py; 3. reference_remover.py; 4. testing.py and testingavg.py; 5. SentenceCreator.py; 6. matscholar_model.py to get matscholar tags; 7. training_model_gensim.py to get gensim model; 8. word2vecscript.py and gensim_visual.py. |
Modified | 2021-12-31 00:00:00 |
Publisher Name | National Institute of Standards and Technology |
Contact | mailto:[email protected] |
Keywords | Natural language processing, NLP, electron microscopy, controlled vocabulary, ontology |
{ "identifier": "ark:\/88434\/mds2-3198", "accessLevel": "public", "contactPoint": { "hasEmail": "mailto:[email protected]", "fn": "June W. Lau" }, "programCode": [ "006:045" ], "landingPage": "https:\/\/data.nist.gov\/od\/id\/mds2-3198", "title": "Code used to produce terms list in the work \"NLP-Driven Electron Microscopy Ontology Development\"", "description": "This is a collection of code written by Maurice Curran that was used to process the Microscopy and Microanalysis conference proceeding corpus into word products described in the publication \"NLP-Driven Electron Microscopy Ontology Development\". The scripts are written in Python, to be used in the following order:1. SettingUpTextFiles.py and CopyingText.py to get the raw text files; 2. SentenceConversion.py; 3. reference_remover.py; 4. testing.py and testingavg.py; 5. SentenceCreator.py; 6. matscholar_model.py to get matscholar tags; 7. training_model_gensim.py to get gensim model;8. word2vecscript.py and gensim_visual.py;", "language": [ "en" ], "distribution": [ { "downloadURL": "https:\/\/data.nist.gov\/od\/ds\/ark:\/88434\/mds2-3198\/PythonFiles_Maurice_clean.zip", "description": "This zip file contains a set of scripts that extracts frequently occurring words from the conference proceedings of Microscopy & Microanalysis between the years of 2002 and 2019.", "mediaType": "application\/zip", "title": "NLP code to produce words about electron microscopy" } ], "bureauCode": [ "006:55" ], "modified": "2021-12-31 00:00:00", "publisher": { "@type": "org:Organization", "name": "National Institute of Standards and Technology" }, "theme": [ "Information Technology:Data and informatics", "Materials:Modeling and computational material science", "Materials:Materials characterization" ], "keyword": [ "Natural language processing", "NLP", "electron microscopy", "controlled vocabulary", "ontology" ] }