Electronic properties of oligothiophenes

Cite this dataset

Lee, Chee-Kong et al. (2021). Electronic properties of oligothiophenes [Dataset]. Dryad.


Despite the remarkable progress of machine learning (ML) techniques in chemistry, modeling the optoelectronic properties of long conjugated oligomers and polymers with ML remains challenging because sufficient training data are difficult to obtain. Here we use transfer learning to address this data scarcity by pre-training graph neural networks on data from short oligomers. With only a few hundred training data points, we achieve an average error of about 0.1 eV for the excited-state energies of oligothiophenes relative to TDDFT calculations. We show that the success of our transfer learning approach relies on the relative locality of low-lying electronic excitations in long conjugated oligomers. Finally, we demonstrate the transferability of our approach by modeling the lowest-lying excited-state energies of poly(3-hexylthiophene) (P3HT) in its single-crystal and solution phases using transfer learning models trained on data from gas-phase oligothiophenes. The excited-state energy distributions predicted by transfer learning agree quantitatively with TDDFT calculations and capture some important qualitative features observed in experimental absorption spectra.


The data were obtained from quantum chemical calculations on oligothiophenes (2T to 16T). More information about data generation can be found in the preprint. The calculation outputs and the xyz coordinates of the molecules are stored in JSON and XYZ files, respectively, and the data for oligothiophenes of different lengths are saved in separate subfolders with self-explanatory names.
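As an illustration of how the coordinate files might be read, the following is a minimal sketch of a multi-frame XYZ reader. It assumes the standard XYZ layout (an atom-count line, a comment line, then one "element x y z" line per atom, with frames concatenated one after another). The function name read_xyz_frames and the two-atom sample are hypothetical and are not part of the dataset.

```python
import io
from typing import Iterator, List, TextIO, Tuple

# One atom: (element symbol, x, y, z)
Atom = Tuple[str, float, float, float]


def read_xyz_frames(handle: TextIO) -> Iterator[List[Atom]]:
    """Yield one list of atoms per frame from a multi-frame XYZ stream.

    Assumes the standard XYZ layout; this is an illustrative helper,
    not code shipped with the dataset.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between frames
        n_atoms = int(header.split()[0])
        handle.readline()  # comment line, ignored here
        frame: List[Atom] = []
        for _ in range(n_atoms):
            fields = handle.readline().split()
            if len(fields) < 4:
                raise ValueError("truncated or malformed atom line")
            frame.append(
                (fields[0], float(fields[1]), float(fields[2]), float(fields[3]))
            )
        yield frame


# Tiny in-memory sample (made-up coordinates) standing in for a real file.
sample = (
    "2\nframe 0\nS 0.0 0.0 0.0\nC 1.0 0.0 0.0\n"
    "2\nframe 1\nS 0.0 0.0 0.1\nC 1.0 0.0 0.1\n"
)
frames = list(read_xyz_frames(io.StringIO(sample)))
print(len(frames), "frames,", len(frames[0]), "atoms in the first frame")
```

For a real file, the same function can be applied to an open file handle instead of the in-memory sample.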

Usage notes

After downloading the tar.gz file, extract its contents with the Linux command "tar -xzvf data_for_repository.tar.gz". The extracted "data_for_repository" folder contains subfolders named tddft_traj_tX, where X = 2, ..., 16. For example, "tddft_traj_t15" contains all the quantum chemical data for 15T. The JSON file names are the indices of the 15T configurations harvested from molecular dynamics simulations, and the xyz coordinates of these configurations are all saved in the XYZ file in the same folder. Each JSON file is a structured plain-text file generated with Python's json module. The XYZ file is a common plain-text format for chemical structures and was generated with the VMD program.
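The extraction and loading steps above can also be sketched in Python. The archive name, the "data_for_repository" folder, and the tddft_traj_tX subfolder pattern come from the notes above. The ".json" file extension and the helper names extract_archive and load_configurations are assumptions made for illustration. The keys inside each JSON file are not documented here, so the sketch returns the parsed content unchanged.

```python
import json
import tarfile
from pathlib import Path
from typing import Any, Dict


def extract_archive(archive: Path, dest: Path) -> Path:
    """Extract the tar.gz archive into dest and return the data folder path."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
    return dest / "data_for_repository"


def load_configurations(root: Path, n_units: int) -> Dict[str, Any]:
    """Load every JSON file for one oligomer length (e.g. n_units=15 for 15T).

    Returns a dict keyed by file stem, which the usage notes describe as the
    configuration index. Assumes the files end in ".json".
    """
    folder = root / f"tddft_traj_t{n_units}"
    records: Dict[str, Any] = {}
    for path in sorted(folder.glob("*.json")):
        with path.open() as fh:
            records[path.stem] = json.load(fh)
    return records


if __name__ == "__main__":
    archive = Path("data_for_repository.tar.gz")
    if archive.exists():
        root = extract_archive(archive, Path("."))
        records_15t = load_configurations(root, 15)
        print(f"Loaded {len(records_15t)} configurations for 15T")
```

Inspecting one loaded record (for example, printing its keys if it is a dictionary) is a reasonable first step before building any analysis on the data.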