Mitigating bias in materials science data
Developing a new material with specific properties can be an enormous challenge. The space of possible materials is nearly infinite, so researchers rely on computational models to narrow the search. These models are often data-driven, drawing on previous studies and curated materials databases to identify likely candidates.
However, training models on limited data can introduce bias, producing models that generalize poorly to underrepresented classes of materials. Zhang et al. developed an entropy-targeted active learning method to mitigate this bias.
“The knowledge of materials is not evenly distributed,” said author Wei Chen. “Researchers have different focuses and preferences of what materials to study; some experiments and simulations are easier to conduct than others. These all contribute to bias.”
The team settled on information entropy as a measure of bias. Using this metric, they organized existing data into regions of materials space and designed their algorithm to prioritize regions that had received less attention. They also identified the regions that would benefit most from future research to reduce data bias.
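To make the idea of entropy as a bias measure concrete, here is a minimal sketch in Python. It is not the authors' ET-AL algorithm. It assumes, for illustration only, that the samples in one region of materials space carry a categorical label (hypothetical crystal-system names below). The Shannon entropy of the label distribution is low when a few categories dominate and high when coverage is even. The greedy rule in `pick_next_category` (a hypothetical helper) picks the category whose next sample would raise entropy most, which steers data collection toward underrepresented classes.

```python
import math
from collections import Counter

def shannon_entropy(labels):
    """Shannon entropy (in bits) of the category distribution in `labels`.

    Low values mean a few categories dominate (a biased sample);
    high values mean the categories are covered more evenly.
    """
    counts = Counter(labels)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def pick_next_category(labels, categories):
    """Hypothetical greedy rule (an assumption, not ET-AL itself):
    return the category whose addition would raise entropy the most."""
    return max(categories, key=lambda cat: shannon_entropy(list(labels) + [cat]))

# Hypothetical region of materials space, skewed toward "cubic" structures.
data = ["cubic"] * 6 + ["tetragonal"] * 3 + ["hexagonal"] * 1
all_categories = ["cubic", "tetragonal", "hexagonal", "monoclinic"]

print(round(shannon_entropy(data), 3))           # entropy of the skewed sample
print(pick_next_category(data, all_categories))  # the unseen class is favored
```

Here the unrepresented "monoclinic" class is chosen because adding a first example of a missing category raises entropy more than adding to one that is merely scarce. A real workflow would also weigh the cost and feasibility of acquiring each sample, which this sketch ignores.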
The authors intend to raise awareness of data bias among other researchers in materials science and engineering, and they are working to implement their model and improve the quality of collected data.
“We would like to apply the bias mitigation method to real database construction,” said Chen. “We have been discussing such opportunities with several creators and maintainers of widely used materials databases and resources.”
Source: “ET-AL: entropy-targeted active learning for bias mitigation in materials data,” by Hengrui Zhang, Wei (Wayne) Chen, James M. Rondinelli, and Wei Chen, Applied Physics Reviews (2023). The article can be accessed at https://doi.org/10.1063/5.0138913.