Exploring solar cell materials with text-based machine learning
Because solar cells provide clean energy, optimizing and studying their constituent materials is crucial. Big data, coupled with machine learning, can expedite this material design process.
However, much of the literature on solar cell materials exists in text form and is difficult to compare to other types of data. To address this, Zhang and He created a text-based model to read papers, extract important information, and predict solar cell materials and their properties.
“Reading these papers can be extremely troublesome for many researchers,” said author Lei Zhang. “As a result, automating the literature reading process in an unsupervised manner can be important. The computers do not get tired easily and can read the papers 24 hours a day, 7 days a week!”
The researchers use natural language processing techniques to read the text data, which is vastly different from traditional machine learning data types.
As the model reads in abstracts from an online database, it summarizes the major materials and their functionalities. Most of the solar cell materials it predicts are well known, but some are unexpected possibilities. These new candidates are further examined with first-principles calculations to reveal their optoelectronic properties.
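As a rough illustration of how unsupervised text analysis can surface candidate materials, the Python sketch below ranks material names by how similar their surrounding words are to those of the word "solar". It is not the authors' model: the article does not specify the algorithm, so simple co-occurrence counts and cosine similarity stand in for it, and the abstracts, stopword list, and function names are all invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Toy abstracts invented purely for illustration; not data from the study.
ABSTRACTS = [
    "Perovskite absorber layers yield efficient solar cells.",
    "Silicon remains the dominant absorber in commercial solar cells.",
    "Kesterite is studied as an absorber with a suitable band gap.",
    "Graphite anodes store lithium in rechargeable batteries.",
]

# Hypothetical minimal stopword list, chosen only for this toy corpus.
STOPWORDS = {"the", "in", "as", "an", "a", "is", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def cooccurrence_vectors(docs):
    """Map each word to counts of the other words sharing an abstract with it."""
    vectors = defaultdict(Counter)
    for doc in docs:
        tokens = tokenize(doc)
        for i, word in enumerate(tokens):
            for j, other in enumerate(tokens):
                if i != j:
                    vectors[word][other] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(vectors, query, candidates):
    """Sort candidate words by similarity of their contexts to the query's."""
    scores = {c: cosine(vectors[c], vectors[query]) for c in candidates}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


vectors = cooccurrence_vectors(ABSTRACTS)
ranking = rank_candidates(
    vectors, "solar", ["perovskite", "silicon", "kesterite", "graphite"]
)
for material, score in ranking:
    print(f"{material}: {score:.2f}")
```

In this toy corpus, "kesterite" never appears in the same abstract as "solar", yet it still receives a nonzero score because it shares the context word "absorber" with known solar materials. That is the kind of indirect association a text-based model could flag as an unexpected candidate for follow-up calculations.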
The team believes the model can act as a platform for future machine learning studies of materials and plans to extend the model to read non-English data.
“Reading papers can be extremely difficult for many foreign researchers and students in non-English-speaking countries,” said Zhang. “On the other hand, a lot of important data are reported in the non-English-written journals, patents and reports, which are relevant for the data-driven studies but are frequently overlooked.”
Source: “Unsupervised machine learning for solar cell materials from literature,” by Lei Zhang and Mu He, Journal of Applied Physics (2022). The article can be accessed at https://doi.org/10.1063/5.0064875 .