Data from: Data-efficient methods for determining Flory–Huggins χ parameters in multicomponent polymer formulations
Abstract
Polymer formulations are essential in diverse applications, including personal care products, coatings, paints, adhesives, and plastic materials. Designing these formulations requires navigating large, complex design spaces, where phase and self-assembly behavior critically impact performance. The Flory-Huggins $\chi$ parameter, which quantifies segmental miscibility, is widely used to parameterize the excess free energy of mixing in formulation models. In this work, we introduce two data-efficient, top-down methods for estimating $\chi$ parameters using the Random Phase Approximation (RPA): (i) Boundary Nonlinear Regression (Boundary-NLR), which fits theoretical spinodal boundaries to experimental phase boundaries, and (ii) Surrogate Model Inverse Parameter Estimation (SMIPE), which uses a Gaussian Process Classifier to fit sparse phase maps via a surrogate model. Both methods allow rapid parameterization of polymer field-theoretic models without the need for additional experiments. We evaluate these approaches on datasets involving polymer–solvent–nonsolvent ternary mixtures and block copolymer–solvent systems, demonstrating their robustness to experimental noise and their relevance for real-world formulation design.
Dataset DOI: 10.5061/dryad.s1rn8pkmt
Description of the data and file structure
No experiments were conducted for this work; all results come from numerical computations. The experimental data included here have been published elsewhere.
Files and variables
Figure 1
* SK_data.csv: Data for the plot of inverse structure factors.
* k: Wavevector in units of $1/R_g$, where $R_g$ is the radius of gyration.
* invdetSK_{x}: The inverse of the determinant of the structure factor matrix (unitless), where {x} in the column name stands for the value of $\chi$.
* x: The Flory-Huggins interaction parameter, $\chi$, which ranges from 0.06 to 0.12 in this dataset (unitless).
* Ternary_data.csv: Data for the ternary phase diagram, showing the spinodal curve for a polymer-solvent-nonsolvent system.
* phiS: Volume fraction of the solvent, $\phi_S$.
* phiP: Volume fraction of the polymer, $\phi_P$.
* phiN: Volume fraction of the nonsolvent, $\phi_N$.
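
The `invdetSK_{x}` column layout described above can be read with a short script. The following is a minimal sketch, assuming that each column name ends in the numeric value of $\chi$ (for example `invdetSK_0.06`); the sample rows, the exact header spelling, and the helper name `read_inverse_structure_factors` are hypothetical and not taken from the dataset.

```python
import csv
import io

# Hypothetical excerpt: the real header spelling and values may differ.
SAMPLE = """k,invdetSK_0.06,invdetSK_0.12
0.5,1.20,0.40
1.0,1.50,0.70
"""

def read_inverse_structure_factors(handle, prefix="invdetSK_"):
    """Return (k_values, curves); curves maps each chi value to its data column."""
    k_values = []
    curves = {}
    for row in csv.DictReader(handle):
        k_values.append(float(row["k"]))
        for name, value in row.items():
            if name.startswith(prefix):
                # Assumption: the text after the prefix is the chi value.
                chi = float(name[len(prefix):])
                curves.setdefault(chi, []).append(float(value))
    return k_values, curves

k_values, curves = read_inverse_structure_factors(io.StringIO(SAMPLE))
for chi in sorted(curves):
    print(chi, curves[chi])
```

To read the actual file, pass `open("SK_data.csv", newline="")` in place of the in-memory sample.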
Figure 2
* data_PN.csv and data_PS.csv: Data showing the effect of measurement uncertainty on $\chi$ parameter estimation.
* sigma: Measurement uncertainty applied to the volume fraction, $\sigma$ (unitless).
* MSE: Mean Squared Error between the reference $\chi$ values and those estimated using Boundary-NLR with perturbed data (unitless).
* SEM: Standard Error of the Mean, calculated from the standard deviation of the MSE values over multiple trials, representing the precision of the MSE estimate.
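
As an illustration of how the MSE and SEM columns relate, the sketch below computes both from a set of repeated trials. It assumes the sample standard deviation (n - 1 in the denominator) and one squared-error value per trial; the dataset description does not state either choice, and the $\chi$ values shown are made up for the example.

```python
import math
import statistics

def trial_squared_error(chi_ref, chi_est):
    """Mean squared difference between reference and estimated chi for one trial."""
    return sum((r - e) ** 2 for r, e in zip(chi_ref, chi_est)) / len(chi_ref)

def mse_and_sem(chi_ref, trials):
    """Average the per-trial errors and report the standard error of that average."""
    per_trial = [trial_squared_error(chi_ref, est) for est in trials]
    mse = statistics.mean(per_trial)
    # Assumption: sample standard deviation divided by sqrt(number of trials).
    sem = statistics.stdev(per_trial) / math.sqrt(len(per_trial))
    return mse, sem

# Hypothetical reference values and three perturbed-data estimates.
chi_ref = [1.0, 0.5]
trials = [[1.1, 0.5], [1.0, 0.7], [0.9, 0.4]]
mse, sem = mse_and_sem(chi_ref, trials)
print(mse, sem)
```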
Figures 3 & 9
These figures use an identical file layout.
* exp_data_{nonsolvent}_{polymer}_{solvent}.csv: Experimental cloud-point data. Filenames indicate the components, e.g., exp_data_EtOH_PSf_THF.csv.
* phiS: Volume fraction of the solvent, $\phi_S$.
* phiP: Volume fraction of the polymer, $\phi_P$.
* phiN: Volume fraction of the nonsolvent, $\phi_N$.
* fit_data_{nonsolvent}_{polymer}_{solvent}.csv: Predicted spinodal boundaries obtained from the Boundary-NLR method. Filenames correspond to the experimental data, e.g., fit_data_EtOH_PSf_THF.csv.
* phiS: Volume fraction of the solvent, $\phi_S$.
* phiP: Volume fraction of the polymer, $\phi_P$.
* phiN: Volume fraction of the nonsolvent, $\phi_N$.
Figure 4
* training_data.csv: The synthetic dataset used to train the Gaussian Process Classifier (GPC).
* phi_p: Volume fraction of the diblock copolymer, $\phi_p$.
* N_A/N_total: Composition of the diblock, expressed as the volume fraction of the A block, $f_A = N_A/N$.
* class: Phase classification: 0 for homogeneous, 1 for macrophase, 2 for microphase.
* classification_result.csv: The predicted phase map generated by the trained GPC. Variables are identical to training_data.csv.
* rpa_solution.csv: The phase map predicted by the Random Phase Approximation (RPA) after fitting to the GPC map. Variables are identical to training_data.csv.
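
Because the three files share the same variables, their phase maps can be compared point by point. The sketch below decodes the integer class codes and computes the fraction of points on which two maps agree. It assumes both maps list the same grid points in the same order, which is not stated in this description; the function name `agreement` and the sample labels are hypothetical.

```python
# Class codes as documented for training_data.csv.
PHASE_NAMES = {0: "homogeneous", 1: "macrophase", 2: "microphase"}

def agreement(labels_a, labels_b):
    """Fraction of points where two phase maps carry the same class label."""
    if len(labels_a) != len(labels_b):
        raise ValueError("phase maps must contain the same number of points")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)

# Hypothetical labels standing in for the 'class' column of two files.
gpc_map = [0, 1, 2, 2]
rpa_map = [0, 1, 1, 2]
print([PHASE_NAMES[c] for c in gpc_map])
print(agreement(gpc_map, rpa_map))
```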
Figure 5
* coverage_acc.csv: Data comparing the accuracy of margin-based and random sampling strategies as a function of data coverage ($\epsilon$).
* epsilon: Data coverage, defined as the fraction of the total parameter space that has been sampled.
* A_margin: Accuracy achieved using margin sampling.
* A_random: Accuracy achieved using random sampling.
* margin_map.csv: Data for a representative margin map over the parameter space.
* phi_p: Volume fraction of the diblock copolymer, $\phi_p$.
* N_A/N_total: Composition of the diblock, $f_A = N_A/N$.
* margin: The absolute difference between the probabilities of the two most likely phase classifications at a given point.
* acc.csv: The maximum achievable model accuracy at different levels of measurement uncertainty.
* sigma: Simulated measurement error in the phase map (unitless).
* A_margin: Accuracy of margin sampling.
* A_random: Accuracy of random sampling.
* normalized_acc.csv: Accuracy data normalized to the zero-error case ($\sigma=0$) to show the relative drop in performance with increasing measurement uncertainty.
* sigma: Simulated measurement error in the phase map (unitless).
* A_margin: Normalized accuracy of margin sampling.
* A_random: Normalized accuracy of random sampling.
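
The margin and normalized-accuracy columns can be reproduced from simpler quantities. The sketch below is a minimal illustration: `margin` follows the definition given for margin_map.csv, while `normalize_to_zero_noise` assumes that "normalized" means division by the accuracy at $\sigma=0$, which this description does not spell out. Both function names and all numbers are hypothetical.

```python
def margin(probabilities):
    """Absolute difference between the two largest class probabilities at one point."""
    top, second = sorted(probabilities, reverse=True)[:2]
    return abs(top - second)

def normalize_to_zero_noise(sigmas, accuracies):
    """Divide each accuracy by the accuracy measured at sigma = 0 (assumed convention)."""
    baseline = accuracies[sigmas.index(0.0)]
    return [a / baseline for a in accuracies]

# Hypothetical class probabilities for (homogeneous, macrophase, microphase).
print(margin([0.7, 0.2, 0.1]))   # confident point: large margin
print(margin([0.4, 0.4, 0.2]))   # ambiguous point: zero margin

# Hypothetical accuracies at two noise levels.
print(normalize_to_zero_noise([0.0, 0.1], [0.8, 0.4]))
```

A small margin marks a point where the classifier is undecided between two phases, which is why margin sampling prioritizes such points.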
Figure 6
* chi_MSE_SEM.csv: Data showing the accuracy of SMIPE-estimated $\chi$ parameters as a function of the GPC-predicted map's accuracy.
* A: Accuracy of the GPC-generated phase map.
* MSE: Mean Squared Error between the SMIPE-estimated $\chi$ values and the reference values.
* SEM: Standard Error of the Mean for the MSE values.
Figure 7
* chi_data.csv: Data tracking the evolution of estimated $\chi$ parameters during an active learning (SMIPE) workflow as data coverage increases.
* epsilon: Data coverage.
* chi: The estimated Flory-Huggins interaction parameters (e.g., $\chi_{AS}$, $\chi_{BS}$).
* chi_*_err: Standard error of the estimated $\chi$ parameter, used to construct error bars.
* Config_{configuration}.csv: Sample phase maps generated by the GPC at different stages of data coverage during the active learning process. In the paper, these maps appear in subfigures (b) and (c).
* phi_p: Volume fraction of the diblock copolymer, $\phi_p$.
* N_A/N_total: Composition of the diblock, $f_A = N_A/N$.
* class: Phase classification: 0 for homogeneous, 1 for macrophase, 2 for microphase.
Figure 8
* phase_boundaries.csv: Data for phase boundaries from three sources: experimental measurements, a model parameterized by SMIPE (this work), and a model parameterized by relative entropy coarse-graining (previous work).
* phase_1: The phase on the lower polymer volume fraction side of the boundary (e.g., 'dis' for disordered).
* phase_2: The phase on the higher polymer volume fraction side of the boundary (e.g., 'bcc' for spheres).
* phi_p: The polymer volume fraction, $\phi_p$, at which the phase transition occurs.
Figure 10
* scft_free_energies.csv: Self-Consistent Field Theory (SCFT) results showing the free energies of different candidate phases as a function of conformational asymmetry.
* bA_bB: The ratio of statistical segment lengths, $b_A/b_B$, representing conformational asymmetry.
* FLAM: The free energy of the Lamellar (L) phase.
* FHEX: The free energy of the Hexagonally-packed cylinder (H) phase.
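
One way to use these columns is to find, at each value of $b_A/b_B$, which of the two candidate phases has the lower free energy. The sketch below does this for a hypothetical excerpt; the sample values and the function name `lower_free_energy_phase` are illustrative only, and the comparison covers just the two phases listed in this file.

```python
import csv
import io

# Hypothetical excerpt: the values are illustrative, not taken from the dataset.
SAMPLE = """bA_bB,FLAM,FHEX
1.0,-0.20,-0.10
1.5,-0.15,-0.18
"""

def lower_free_energy_phase(handle):
    """Return (bA_bB, label) pairs, with 'L' or 'H' for the lower free energy."""
    results = []
    for row in csv.DictReader(handle):
        f_lam = float(row["FLAM"])
        f_hex = float(row["FHEX"])
        # Ties are assigned to 'H' here; that choice is arbitrary.
        label = "L" if f_lam < f_hex else "H"
        results.append((float(row["bA_bB"]), label))
    return results

print(lower_free_energy_phase(io.StringIO(SAMPLE)))
```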
