
Redefining floristic zones in the Korean Peninsula using high-resolution georeferenced specimen data and self-organizing maps

Cite this dataset

Jung, Songhie; Cho, Yong-chan (2021). Redefining floristic zones in the Korean Peninsula using high-resolution georeferenced specimen data and self-organizing maps [Dataset]. Dryad.


The use of biota to analyze the distribution pattern of biogeographic regions is essential to gain a better understanding of the ecological processes that cause biotic differentiation and biodiversity at multiple spatiotemporal scales. Recently, the collection of high-resolution biological distribution data (e.g., specimens) and advances in analytical theory have enabled quantitative analysis and more refined spatial delineation of biogeographic regions. This study was conducted to redefine floristic zones in the southern part of the Korean Peninsula and to better understand the eco-evolutionary significance of the spatial distribution patterns. Based on 309,333 distribution records for 2,954 vascular plant species in the Korean Peninsula, we derived floristic zones using self-organizing maps. We compared the characteristics of the derived regions with those of historical floristic zones and with ecologically important environmental factors (climate, geology, and geography). In the clustering analysis of the floristic assemblages, four distinct regions were identified: the cold floristic zone (Zone I) in high-altitude regions at the center of the Korean Peninsula, the cool floristic zone (Zone II) in high-altitude regions in the south of the Korean Peninsula, the warm floristic zone (Zone III) in low-altitude regions in the central and southern parts of the Korean Peninsula, and the maritime warm floristic zone (Zone IV), which includes the volcanic islands Jejudo and Ulleungdo. In total, 1,099 taxa were common to the four floristic zones. Zone IV had the largest number of specific taxa (those found in only one zone), with 404 taxa. Our study improves floristic zone definitions using high-resolution regional biological distribution data. It will help to better understand and re-establish regional species diversity. In addition, our study provides key data for the hotspot analysis required for the conservation of plant diversity.


The vascular plant distribution data are based on specimen and coordinate data for plants collected between 2003 and 2015 in the southern part of the Korean Peninsula at the Korea National Arboretum. The vascular plant distribution maps contained coordinate data for 309,333 specimens, corresponding to 2,954 taxa in 175 families and 919 genera. A grid system was overlaid on a national topographic map to combine the taxonomic groups located in each cell of the grid (cell size, 11.2 km × 13.9 km) with the location coordinates in a single data set.
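The gridding step can be illustrated with a small sketch. It is not the authors' workflow (the study was carried out in R and GIS software); it only shows how point records might be binned into grid cells and turned into a presence-absence matrix. The function names, the grid origin, the projected kilometre coordinates, the sample records, and the assumption that 11.2 km is the cell width and 13.9 km the cell height are all hypothetical.

```python
from collections import defaultdict

# Cell size from the text; which value is width and which is height is an assumption.
CELL_W_KM = 11.2
CELL_H_KM = 13.9


def cell_index(x_km, y_km, origin_x_km=0.0, origin_y_km=0.0):
    """Return the (column, row) of the grid cell containing a projected point."""
    col = int((x_km - origin_x_km) // CELL_W_KM)
    row = int((y_km - origin_y_km) // CELL_H_KM)
    return col, row


def presence_absence(records):
    """Build a presence-absence matrix from (taxon, x_km, y_km) records.

    Returns (cells, taxa, matrix), where matrix[i][j] is 1 if taxa[j]
    was recorded at least once in cells[i], otherwise 0.
    """
    taxa_by_cell = defaultdict(set)
    for taxon, x_km, y_km in records:
        taxa_by_cell[cell_index(x_km, y_km)].add(taxon)
    cells = sorted(taxa_by_cell)
    taxa = sorted({t for ts in taxa_by_cell.values() for t in ts})
    matrix = [[1 if t in taxa_by_cell[c] else 0 for t in taxa] for c in cells]
    return cells, taxa, matrix


# Hypothetical records: (taxon, x in km, y in km) in a projected coordinate system.
records = [
    ("Abies koreana", 5.0, 5.0),
    ("Quercus mongolica", 6.0, 7.0),
    ("Quercus mongolica", 15.0, 5.0),
    ("Abies koreana", 5.5, 6.0),  # second record in the same cell; still one presence
]
cells, taxa, matrix = presence_absence(records)
```

Each row of `matrix` corresponds to one occupied grid cell and each column to one taxon, which is the layout of the training data described in the next paragraph.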

Using distribution data for the 771 grid cells and 2,954 plant taxa, a SOM training data set was constructed in the form of a presence-absence matrix (771 rows × 2,954 columns). The ‘kohonen’ R package was used for the SOM algorithm, and the output layer was composed of 81 output nodes arranged in a square lattice. To determine the types, hierarchical cluster analysis was applied to the weight vectors of the SOM map units after conversion to Euclidean distance metrics (via the function hclust in R using the complete linkage method). The optimal number of types was calculated by applying the silhouette coefficient to the range of 2–15 types. In mapping the regionalization results, the grid cells that were empty because of exclusion from the survey were filled using the maximum frequency value from the surrounding eight cells. As some island regions (Ulleungdo Island and Dokdo Island) showed heterogeneous values because of their distance from the adjacent grid cells, mapping was performed using type values within the local range.
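The gap-filling rule used for mapping (assigning an empty cell the most frequent value among its eight neighbours) can be sketched as follows. This is an illustrative sketch, not the authors' code: the function name, the use of `None` for empty cells, the example grid, and the tie-breaking behaviour (the first label encountered wins) are assumptions, since the description does not say how ties were handled.

```python
from collections import Counter


def fill_empty_cells(grid, empty=None):
    """Fill empty cells with the most frequent label among the eight neighbours.

    `grid` is a list of rows of zone labels. Neighbour labels are read from
    the original grid, so filled values do not influence one another.
    Cells with no labelled neighbour are left empty.
    """
    n_rows = len(grid)
    filled = [row[:] for row in grid]
    for r in range(n_rows):
        for c in range(len(grid[r])):
            if grid[r][c] is not empty:
                continue
            neighbours = []
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # skip the cell itself
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < n_rows and 0 <= cc < len(grid[rr]):
                        if grid[rr][cc] is not empty:
                            neighbours.append(grid[rr][cc])
            if neighbours:
                # Ties are resolved in favour of the label seen first (an assumption).
                filled[r][c] = Counter(neighbours).most_common(1)[0][0]
    return filled


# Hypothetical 3 x 3 map of zone labels with one unsurveyed cell in the middle.
zone_grid = [
    [1, 1, 2],
    [1, None, 2],
    [3, 1, 2],
]
filled_grid = fill_empty_cells(zone_grid)
```

In the example the centre cell has four neighbours labelled 1, three labelled 2, and one labelled 3, so it is assigned label 1.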

The overlap in species composition among the floristic zones was analyzed using Venn diagrams (with the “VennDiagram” package) based on lists of species in each zone (Chen and Boutros, 2011). After listing all species in each zone, the common taxa (those appearing in all zones) and specific taxa (those appearing in only one zone) were distinguished. Thereafter, floristic composition was examined by identifying the specific taxa at the family level.
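The distinction between common and specific taxa amounts to simple set operations. The sketch below shows one way to compute both from per-zone species lists; it does not draw a Venn diagram, and the function name and the placeholder taxon labels are hypothetical.

```python
def common_and_specific(zone_taxa):
    """Split taxa into those shared by all zones and those unique to one zone.

    `zone_taxa` maps a zone name to the set of taxa recorded in that zone.
    Returns (common, specific), where `specific` maps each zone to the taxa
    found in that zone and in no other.
    """
    common = set.intersection(*zone_taxa.values())
    specific = {}
    for zone, taxa in zone_taxa.items():
        others = set().union(*(t for z, t in zone_taxa.items() if z != zone))
        specific[zone] = taxa - others
    return common, specific


# Placeholder taxon labels, not real data from the dataset.
zones = {
    "I": {"a", "b", "c"},
    "II": {"a", "b", "d"},
    "III": {"a", "e"},
    "IV": {"a", "f", "g"},
}
common, specific = common_and_specific(zones)
```

Here taxon `a` is common to all four zones, while `b` is neither common nor specific because it occurs in exactly two zones.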

Geographic and climatic factors were analyzed as macro-environmental factors, using the defined floristic zones. For the geographic factors, latitude and longitude were used. For the climatic factors, air temperature and precipitation data were used; these were provided by the Korean Meteorological Administration (2020) and collected from 583 points between 1970 and 2010. In addition to the direct environmental data, the warmth index (WI) and coldness index (CI) were calculated and used as indirect climate data (Kira, 1945). The values for these environmental factors were converted to values covering the entire southern part of the Korean Peninsula by linear interpolation, accounting for topography and altitude, using ArcGIS (ver. 10.0). The mean value of the environmental factors in each grid cell was then calculated and used in the analysis.
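The description does not spell out how WI and CI were computed. The sketch below follows the usual definition of Kira's indices: monthly mean temperatures are compared with a 5 °C threshold, WI sums the excess over the threshold for months above it, and CI sums the (negative) deficit for months below it. The function names and the monthly temperatures are hypothetical.

```python
def warmth_index(monthly_means_c, threshold_c=5.0):
    """Sum of (t - threshold) over months whose mean temperature exceeds the threshold."""
    return sum(t - threshold_c for t in monthly_means_c if t > threshold_c)


def coldness_index(monthly_means_c, threshold_c=5.0):
    """Sum of (t - threshold) over months below the threshold; zero or negative."""
    return sum(t - threshold_c for t in monthly_means_c if t < threshold_c)


# Hypothetical monthly mean temperatures (degrees Celsius), January to December.
temps = [-3.0, 0.0, 5.0, 12.0, 17.0, 22.0, 25.0, 26.0, 21.0, 14.0, 7.0, 0.0]

wi = warmth_index(temps)    # 104.0 for the values above
ci = coldness_index(temps)  # -18.0 for the values above
```

A month whose mean equals the threshold exactly (March in the example) contributes to neither index.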

As physical factors affecting plant distribution, parent materials, topography, effective soil depth, and soil texture in the southern part of the Korean Peninsula were used (Rural Development Administration, 2010). Parent materials were categorized as acidic rock, metamorphic rock, sedimentary rock, quaternary deposit, volcanic ash, and other. Topography was categorized into mountain, hill, pediment, interrill area, fan, lava terrace, or other. Effective soil depth was categorized into four classes (<20, 20–50, 50–100, and >100 cm). Soil texture was categorized as sandy gravel, silt and sandy loam, clay loam, and clay.

To test the effect of environmental factors on the floristic composition and zonation, the geographic and climatic (mean annual temperature, annual precipitation, warmth index, and coldness index) data were analyzed using one-way analysis of variance (ANOVA) and Tukey’s test. The categorical physical factors (parent materials, topography, effective soil depth, and soil texture) were analyzed using box plots for each zone. The “ggplot2” R package was used for data visualization. Statistical analyses were performed using R.
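For readers unfamiliar with one-way ANOVA, the following sketch computes the F statistic for one environmental variable grouped by floristic zone. It is a plain-Python illustration rather than the R analysis used in the study; it covers neither the p-value nor Tukey's test, and the function name and sample values are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical per-cell values of one variable in three zones.
zone_values = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0],
]
f_stat, df_between, df_within = one_way_anova_f(zone_values)  # F = 13.0 with (2, 6) df
```

A large F indicates that the between-zone differences in the variable are large relative to the variation within zones; a post hoc test such as Tukey's is then needed to see which pairs of zones differ.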


Korea National Arboretum, Award: KNA1-2-26, 16-4; KNA1-2-32, 18-3
