Data and code from: Image reconstruction from an elastically distorted scan
Data files
Mar 24, 2026 version files 399.20 MB
- dewarping_compressed.zip (399.18 MB)
- README.md (20.88 KB)
Abstract
We include the code, sample images, and generated figures used to implement the forward and inverse problems of simulating and correcting the artifacts associated with scanning the page of an open book subjected to physical deformation, as detailed in our publication "Image reconstruction from an elastically distorted scan." In particular, the directory titled "dewarping" includes Python scripts and interactive Python notebooks used to simulate scanning artifacts, i.e., implement a solution of the forward problem, on sample images, as well as correct such artifacts, i.e., implement a solution of the inverse problem, on both simulated and real scans. This directory also contains the output figures of our code, some of which are included in our publication and supplementary material. The sample images used are chiefly two paintings by Monet and Seurat, which are dedicated to the public domain. For more detailed documentation of the included files, please refer to our README and the documentation within the Python scripts.
This dataset contains the code and sample images used in the paper titled "Image reconstruction from an elastically distorted scan." In this paper, we treat a scanned book page as an elastic sheet, and use the physics describing the resulting deformations of the page to inform the inverse problem of correcting optical artifacts associated with this distortion.
Description of the data and file structure
File: dewarping_compressed.zip
Description: directory containing all code used in the forward and inverse problems, along with sample images and generated figures. Each subdirectory and its contents are described below. All Python notebooks are ready to be executed, using sample images and parameters which can be adjusted by the user as desired.
Folder: data_driven
Description: directory containing scripts for implementing our formulation of the inverse problem, combined with simple data-driven techniques to infer physical parameters. The contents of this folder are detailed below.
- File: blurring.ipynb. Description: makes use of radial correlations in a blurred image to infer the degree of blurring, and thus the distance of a scanned page from the focal plane.
- File: darkening.ipynb. Description: analyzes the 2-dimensional Fourier spectrum of a darkened image, and uses the low-wavenumber components to infer the presence of darkening.
- File: distortion.ipynb. Description: analyzes the 2-dimensional Fourier spectrum of vertical strips in a distorted image, and uses the increase in the mean horizontal wavenumber of these spectra to infer the presence of distortion.
- File: forward.ipynb. Description: a minimal implementation of the forward problem, used for generating simulations of scanned images to be analyzed in blurring.ipynb, darkening.ipynb, and distortion.ipynb.
- File: physics.py. Description: a helper file with functions to compute the page shape of a downward-facing book from elasticity theory. These functions compute the angle of the book relative to the horizontal at each point along the page and/or the set of 2-dimensional points parameterizing the shape of the page.
- File: plot_utils.py. Description: a helper file containing a function to streamline plotting of a two-plot-by-two-plot figure.
- File: utils.py. Description: a helper file containing functions to streamline the implementation of linear algebraic methods, Fourier transformations, and specific elements of the forward and inverse problems.
- Folder: output_figs. Description: subdirectory containing the output figures of the above Python notebooks, primarily those figures which are used in the manuscript.
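As a rough illustration of the two Fourier diagnostics described for darkening.ipynb and distortion.ipynb above, the following sketch computes (i) the fraction of spectral power at low wavenumber and (ii) the power-weighted mean horizontal wavenumber of a strip. It is not taken from the notebooks: the function names `low_wavenumber_fraction` and `mean_horizontal_wavenumber`, the cutoff `k_cut`, and the synthetic test images are all assumptions made for illustration, and only NumPy is used.

```python
import numpy as np


def low_wavenumber_fraction(image, k_cut=0.05):
    """Fraction of (mean-removed) spectral power at radial wavenumber < k_cut.

    Wavenumbers are in cycles per pixel, following np.fft.fftfreq.
    k_cut is a hypothetical threshold chosen for this sketch.
    """
    image = np.asarray(image, dtype=float)
    power = np.abs(np.fft.fft2(image - image.mean())) ** 2
    ky = np.fft.fftfreq(image.shape[0])[:, None]
    kx = np.fft.fftfreq(image.shape[1])[None, :]
    k = np.hypot(kx, ky)
    total = power.sum()
    return power[k < k_cut].sum() / total if total > 0 else 0.0


def mean_horizontal_wavenumber(strip):
    """Power-weighted mean |k_x| of a 2-D strip, in cycles per pixel."""
    strip = np.asarray(strip, dtype=float)
    power = np.abs(np.fft.fft2(strip - strip.mean())) ** 2
    kx = np.abs(np.fft.fftfreq(strip.shape[1]))[None, :]
    total = power.sum()
    return (power * kx).sum() / total if total > 0 else 0.0


# Synthetic demo (hypothetical data, not one of the sample images).
cols = np.arange(128)
texture = np.tile(1.0 + 0.5 * np.sin(2 * np.pi * 0.25 * cols), (64, 1))
shading = np.linspace(1.0, 0.3, 128)  # smooth left-to-right darkening
darkened = texture * shading

# Smooth shading moves spectral power toward low wavenumbers.
print(low_wavenumber_fraction(texture), low_wavenumber_fraction(darkened))

# Horizontally compressed features raise the mean horizontal wavenumber.
coarse = np.tile(np.sin(2 * np.pi * 0.125 * cols), (64, 1))
fine = np.tile(np.sin(2 * np.pi * 0.25 * cols), (64, 1))
print(mean_horizontal_wavenumber(coarse), mean_horizontal_wavenumber(fine))
```

In this toy setting, the shaded image yields a larger low-wavenumber fraction than the unshaded one, and the finer pattern yields a larger mean horizontal wavenumber than the coarser one. How the notebooks turn such quantities into estimates of physical parameters is described in the notebooks themselves.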
Folder: dewarping
Description: directory containing scripts for implementing our formulations of the forward and inverse problems of simulating and correcting the artifacts associated with book scanning. The contents of this folder are detailed below.
- Folder: book_photos. Description: subdirectory containing several photos of curved book pages, as well as verifications of the physical model (see "fit_curve_to_page.py") using these photos. The raw photos are contained in the files titled "IMG_*", while the contents of the subdirectory are detailed below.
- Folder: pages. Description: subdirectory containing a brief description of the book that was scanned (in "description.txt"), as well as two subdirectories titled "inputs" and "outputs." The former contains several photos of curved book pages from the same book, with file names of the form "p*.jpg", where "*" is the page number to which the book was turned. The latter contains plots showing the result of fitting the physical model to each of these photos, where each file name again contains the page number to which the book was turned.
- Folder: output_figs. Description: subdirectory containing the figures generated by solving the forward and inverse problems of simulating and correcting artifacts associated with a xerographic scan. The contents of this folder are detailed below.
- Folder: disk. Description: subdirectory containing the figures generated by the "disk" example (see "fig_generation_disk.ipynb"), where the source image to be used in the forward and inverse problem consists of a white circular disk on a black background, with several smaller black disks within. The purpose of this example is to clearly show the various artifacts associated with scanning. For this example as well as the others, the naming convention is as follows: "src.pdf" refers to the source image, before processing has taken place. "dist.pdf" is the image after distortion (the first step of the forward problem) has been applied. "dark.pdf" is the image after distortion and darkening, and "blur.pdf" is the image after distortion, darkening, and blurring (the solution of the forward problem). "deblur.pdf" is the result of applying deblurring to "blur.pdf" (the first step of the inverse problem), "undark.pdf" is the result of undoing the blurring and the darkening, and "dewarp.pdf" is the result of undoing the blurring, darkening, and distortion (the solution of the inverse problem). Finally, "error.pdf" shows the pixel-by-pixel difference between "src.pdf" and "dewarp.pdf". For theoretically ideal implementations of the forward and inverse problems, this difference would be uniformly zero, and so its deviations from zero show the regions of the image where the approximations inherent to our implementation have induced imperfections.
- Folder: monet_haystacks. Description: subdirectory containing the figures generated by the "haystacks" example, where the source image is accessible at https://www.wikiart.org/en/claude-monet/haystacks-white-frost-sunrise. We have also added a checkered bar at the top of this image to ease visualization. The naming of each figure follows the same convention as in the disk example.
- Folder: monet_haystacks_text. Description: subdirectory containing the figures generated by the "haystacks with text" example, where the source image is the same as in the "haystacks" example, but with filler text added at the bottom to demonstrate the effect of our implementation on a textual image. The naming of each figure follows the same convention as in the disk example.
- Folder: open_book. Description: subdirectory containing the figures generated by the "open book" example, where the source is the same as in the "haystacks with text" example, but the implementation for the forward and inverse problems is that of an upwards-facing book (see "fig_generation_open.ipynb"). The naming of each figure follows the same convention as in the disk example. This folder also contains the file "page_shape.pdf", which shows the result of fitting the physical model for an upwards-facing book page (see "physics_open.py") to a photo of such a book page.
- Folder: real_scan. Description: subdirectory containing the figures generated by the "real scan" example, where the inverse problem is applied to a real scan of a book page (see "fig_generation_for_real_scan.ipynb"). "scan.pdf" and "dark_scan.pdf" contain scans of a painting by Seurat (accessible at https://www.artic.edu/artworks/27992/a-sunday-on-la-grande-jatte-1884) from a curved book page, while "scan_flat.pdf" is a scan of the same painting from a flat page. "udark_scan.pdf" is the result of undoing the darkening on this scan, while "ududub_scan.pdf" is the result of undoing darkening, blurring, and distortion, thus representing the solution of the inverse problem. "haystacks_and_bar.pdf", "sample_text.pdf", and "merged.pdf" are images generated from the Monet painting, which were also used to test the inverse problem implementation.
- Folder: monet. Description: subdirectory containing the figures generated by the "real scan" example applied to the Monet painting. "merged.pdf" was the figure printed and subjected to a scan, "scan.pdf" is the resulting scanned image, and "udark_scan.pdf", "udud_scan.pdf", and "ududud_scan.pdf" are the results of passing this scanned image through various stages of the inverse problem implementation.
- Folder: seurat. Description: subdirectory containing the input images used by the "real scan" example applied to the Seurat painting. "seurat_and_bar.pdf" is the primary image which was printed and scanned, then treated by the implementation of the inverse problem.
- Folder: shape_verification. Description: contains several of the figures generated by the verification of the physical model with photos of a curved book page (see "shape_verification_*.ipynb"). Files titled "*_book.pdf" are photos of book pages, those titled "*_fit.pdf" contain plots showing the fits of the physical model to these photographed pages, those titled "*_residuals.pdf" show the residuals of these fits, and those titled "*_curvefit.pdf" show more polished plots of the residuals, including uncertainties.
- Folder: shape_vs_number. Description: contains plots output by "shape_verification_serial.ipynb", showing the dependence of the best-fit physical parameters on the page number of the book being scanned. The plot in "l_and_theta_vs_page_number.pdf" is included in our manuscript in order to demonstrate how the best-fit physical parameters of one or a few pages can be extrapolated to the remainder of a book.
- File: "open_bookmonet_haystacks_text_src.pdf". Description: the source image, containing a Monet painting as well as filler text, used in the implementation of the forward and inverse problems applied to an open book page (see "fig_generation_open.ipynb").
- File: "open_bookmonet_haystacks_text_dark.pdf". Description: the result of passing the above source image through the forward problem applied to an open book page. This image thus contains simulated artifacts associated with scanning, namely distortion, blurring, and darkening (see "fig_generation_open.ipynb").
- File: fig_generation.ipynb. Description: an interactive Python notebook containing an implementation of the forward and inverse problems applied to an example image, assuming the physical model associated with a downward-facing open book. This script first reads in a source image (in this case, the aforementioned Monet), then applies the forward problem of simulating artifacts associated with scanning in a sequential process: first distortion is applied, then darkening, then blurring. The script then uses the resulting image of this forward problem as the input for an implementation of the inverse problem, thus removing the artifacts associated with scanning to recover the original image. This takes place in the reverse order of the forward problem: the image is first deblurred, then undarkened, and finally corrected for distortion. Crucially, in its solution of the inverse problem, the script does not rely on knowledge of the intermediate steps in the forward problem; rather, it uses only the simulated scan and the physical model of the page shape (see "physics.py") to remove the artifacts associated with scanning.
- File: fig_generation_disk.ipynb. Description: an interactive Python notebook containing an implementation of the forward and inverse problems applied to a generated image of a disk. This script proceeds similarly to "fig_generation.ipynb", but rather than reading in a source image, it generates such an image (in this case, of a white disk on a black background with several smaller black disks within, see "output_figs/disk") for the sake of clarifying each of the artifacts associated with scanning.
- File: fig_generation_for_real_scan.ipynb. Description: an interactive Python notebook which produces the images to be used in real scans (see "real_scan_monet.ipynb" and "real_scan_seurat.ipynb"). These images are the aforementioned Monet and Seurat paintings, with checkered calibration bars added at the top margin and sample text added at the bottom.
- File: fig_generation_open.ipynb. Description: an interactive Python notebook containing an implementation of the forward and inverse problems applied to an example image, assuming the physical model associated with an upward-facing open book. The steps in this script are similar to those in "fig_generation.ipynb", but now the physical model used is that describing an upward-facing book, resulting in a different page shape (see "physics_open.py").
- File: fit_curve_to_page.py. Description: a helper file containing the function "fit_curve_to_page()", which employs edge finding on a photo of the profile of a book page, then calculates its angle relative to the horizontal as a function of arc length along the page, and fits the physical model (see equation 2.4 of our article) to this curve. In doing so, it determines the physical parameters associated with the page's shape.
- File: inverse_utils.py. Description: a helper file containing a few functions associated with inverting a matrix, used to streamline linear algebraic calculations.
- File: page_params.txt. Description: a plain text file containing parameters associated with several of the book photos. These parameters include information such as the photo's filename, the page number of the book it came from, and the ideal locations for cropping before edge detection is applied.
- File: physics.py. Description: a helper file with functions to compute the page shape of a downward-facing book from elasticity theory. These functions compute the angle of the book relative to the horizontal at each point along the page and/or the set of 2-dimensional points parameterizing the shape of the page.
- File: physics_open.py. Description: a helper file with functions to compute the page shape of an upward-facing book from elasticity theory. These functions compute the angle of the book relative to the horizontal at each point along the page and/or the set of 2-dimensional points parameterizing the shape of the page.
- File: read_dicts_from_text.py. Description: a helper file with one function, "read_dicts_from_txt()", which is used to read the book photo parameters from "page_params.txt" and return this information as a Python dictionary.
- Files: real_scan_monet.ipynb, real_scan_seurat.ipynb. Description: interactive Python notebooks containing an implementation of the inverse problem applied to a source image of a real scan, either the Monet or Seurat, as the filename suggests. The purpose of these scripts is to demonstrate the efficacy of our solution to the inverse problem for correcting the artifacts of a real scan, rather than applying it to a simulated scan as in "fig_generation.ipynb".
- File: real_scan_seurat_highdist.ipynb. Description: a script similar to "real_scan_seurat.ipynb", but applied to a scanned image with a greater degree of distortion. The purpose of this example is to make the action of the inverse problem implementation clearer.
- File: shape_verification.ipynb. Description: an interactive Python notebook which fits the page shape from the physical model to a photo of a book page (see "fit_curve_to_page.py"). This script uses an example where the physical model performs poorly (due to boundary conditions not matching those we assume) to demonstrate its limitations.
- File: shape_verification_serial.ipynb. Description: an interactive Python notebook which fits the page shape from the physical model to photos of several pages of the same book. The purpose of doing so is to experimentally establish the relationship between the parameters of the physical model and the page number to which the book is turned.
- File: shape_verification_theta.ipynb. Description: an interactive Python notebook which fits the page shape from the physical model to a photo of a book page. This script uses the angle of the page relative to the horizontal as a function of arc length along the page to parameterize the page's shape.
- File: shape_verification_xy.ipynb. Description: an interactive Python notebook which fits the page shape from the physical model to a photo of a book page. This script uses a set of two-dimensional Cartesian coordinates to parameterize the page's shape, which results in a fit identical to that of the alternative angle parameterization (see "shape_verification_theta.ipynb").
- File: utils.py. Description: a helper file containing functions to streamline the implementation of linear algebraic methods, Fourier transformations, and specific elements of the forward and inverse problems.
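The shape-verification notebooks above describe the same page profile in two ways: as a tangent angle versus arc length, or as a set of Cartesian points. The following sketch shows how the two descriptions are related through dx/ds = cos(theta) and dy/ds = sin(theta). It is an illustration only, not the code in "fit_curve_to_page.py" or "physics.py": the function names `xy_to_theta` and `theta_to_xy` are hypothetical, and the circular arc is a stand-in test profile rather than the physical model of the paper.

```python
import numpy as np


def xy_to_theta(x, y):
    """Arc length and tangent angle at the midpoint of each polyline segment."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = np.diff(x), np.diff(y)
    ds = np.hypot(dx, dy)              # segment lengths
    s_mid = np.cumsum(ds) - 0.5 * ds   # arc length at segment midpoints
    theta = np.arctan2(dy, dx)         # angle relative to the horizontal
    return s_mid, theta


def theta_to_xy(s, theta, x0=0.0, y0=0.0):
    """Integrate dx/ds = cos(theta), dy/ds = sin(theta) with the trapezoid rule."""
    s = np.asarray(s, dtype=float)
    theta = np.asarray(theta, dtype=float)
    ds = np.diff(s)
    cos_mid = 0.5 * (np.cos(theta[1:]) + np.cos(theta[:-1]))
    sin_mid = 0.5 * (np.sin(theta[1:]) + np.sin(theta[:-1]))
    x = x0 + np.concatenate(([0.0], np.cumsum(cos_mid * ds)))
    y = y0 + np.concatenate(([0.0], np.cumsum(sin_mid * ds)))
    return x, y


# Stand-in profile: a circular arc of radius R, for which theta(s) = s / R.
R = 2.0
s = np.linspace(0.0, 1.0, 2001)
x, y = theta_to_xy(s, s / R)

# Converting back recovers the angle at the segment midpoints.
s_mid, theta_mid = xy_to_theta(x, y)
print(np.max(np.abs(theta_mid - s_mid / R)))
```

Because each description can be converted into the other, a fit carried out in either parameterization describes the same curve, up to discretization error.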
Folder: sample_inputs
Description: directory containing the sample input images used in the implementations of the forward and inverse problems. The contents of this folder are detailed below.
- File: georges_seurat_a_sunday_on_the_grande_jatte_lg_hires.jpg. Description: an image of an 1884 painting by Georges Seurat, used as an input image for demonstrating our implementation of the forward and inverse problems. Accessed from https://www.artic.edu/artworks/27992/a-sunday-on-la-grande-jatte-1884. This work is in the public domain.
- File: monet_bridge.jpg. Description: an image of an 1899 painting by Claude Monet, also used for the demonstration of our implementation. Accessed from https://artmuseum.princeton.edu/art/collections/objects/31852. This work is in the public domain.
- File: monet_haystacks.jpg. Description: an image of an 1889 painting by Claude Monet, also used for the demonstration of our implementation. Accessed from https://www.wikiart.org/en/claude-monet/haystacks-white-frost-sunrise. This work is in the public domain.
- File: monet_haystacks_text.jpg. Description: the above image, with added sample text and a calibration bar to make the results of our implementation more clear as well as to demonstrate these results on textual images.
- Files: sample_text.pdf, sample_text.png. Description: two file formats containing the same image of sample text used in the generation of "monet_haystacks_text.jpg".
Folder: scanned_inputs
Description: directory containing the scanned input images used in the implementation of the inverse problem. The contents of this folder are detailed below.
- File: curved_scan.png. Description: a scan of an image of emperor penguins, with some calibration markings above. This image was placed into the curved page of a book before being scanned, and as a result, its scan demonstrates the artifacts we seek to simulate and invert.
- File: flat_scan.png. Description: a scan of the same image in the above, but printed on a flat piece of paper to provide an example of a scan without the artifacts we consider.
- File: monet_scan.png. Description: a scan of an image of a Monet painting placed into a curved book page, used as an input to the inverse problem of correcting the artifacts associated with scanning a curved page (see "dewarping/fig_generation_for_real_scan.ipynb").
- Files: scan_seurat.pdf, scan_seurat_highdist.pdf. Description: scans of an image of a Seurat painting with an added calibration bar and sample text placed into a curved book page, used as an input to the inverse problem of correcting the artifacts associated with scanning a curved page (see "dewarping/fig_generation_for_real_scan.ipynb"). The latter file contains the same image as the first, but placed into a page with greater curvature, which thus demonstrates the artifacts we consider to a greater extent.
- File: scan_seuratsmall_scale_highdisp.pdf. Description: a scan of an image of a Monet painting with an added calibration bar and sample text, placed into a curved book page, also used as an input to the inverse problem.
Sharing/Access information
Other versions of the code are available upon request to the authors. Sample images were derived from the following sources (both works are in the public domain):
