Context
Geographers are particularly interested in analysing how the social implications of seismic activity, such as casualties and property destruction, affect local communities. In light of this inquiry, their objective is to develop practical solutions that counteract these adverse outcomes. The revision of emergency response systems is therefore vital to improving preparation and recovery efforts in earthquake-prone areas. However, the central challenge of disaster management is the demand for accuracy. Given the prevalence of human error and the time-consuming nature of manual data entry, attempts to alleviate the social consequences of earthquakes can be difficult. Geographers could, therefore, benefit from utilising machine learning algorithms to overcome these limitations. Lavallin and Downs (2021, 7) define machine learning as “the process of teaching a computer to perform a specific task by enhancing its performance through experience”. According to this definition, productivity can be increased while determining effective mitigation procedures by processing historical seismic data with a trained algorithm.
Therefore, this article will discuss how machine learning algorithms can be integrated into geography to improve earthquake mitigation strategies. The following section will explore two algorithms – anomaly detection and random forest – and contextualise these processes within an environmental disaster, demonstrating their ability to isolate devastated areas and produce evacuation routes. Furthermore, this article will consider the costs and benefits of these algorithms, shedding light on how geographers might use them to devise mitigation strategies. The article will conclude by critically examining the potential ethical implications of incorporating data science tools into geography.
Machine Learning and Earthquake Mitigation Strategies
Earthquakes are incredibly destructive events, and predicting when they will occur has proven to be one of the most significant issues currently facing seismologists (Zhang et al. 2021). Accuracy is crucial for effective disaster management, as misinterpreting seismic waves can generate false warning alerts and instigate mass panic. According to Chin et al. (2019), conventional outlier detection methods often disregard the possibility that seismic activity could occur outside predefined statistical boundaries and in locations with distinctive geological features. Because forecasting sporadic tragedies and taking preventive action in advance remains so challenging, geographers are encouraged to employ machine learning algorithms in post-disaster contexts, where they can support response and recovery rather than prediction.
Anomaly Detection
To facilitate emergency response efforts, Mabu, Fujita, and Kuremoto (2019) proposed an anomaly detection method that identifies abnormal surface conditions following a natural disaster. The researchers argue that data science tools could be leveraged to pinpoint vulnerable areas by building a model that utilises unsupervised learning techniques, namely a convolutional autoencoder (CAE) and a one-class support vector machine (OCSVM). By discerning deviations from an area’s normal state, the model can identify abnormal (disaster) areas shortly after an earthquake strikes.
The synthetic aperture radar (SAR) images that form the basis of this outlier detection mechanism are available to researchers through various open data sources and can be processed with tools such as Stanford University’s InSAR Scientific Computing Environment. Mabu, Fujita, and Kuremoto (2019) draw on imagery from the Advanced Land Observing Satellite-2 (ALOS-2), which enables near-real-time observation of the Earth’s surface. However, this data needs to be preprocessed before it can be used to train data science tools. SAR images are challenging to work with because they are susceptible to terrain-induced effects, such as shadow regions, which make the identification of disaster sites a complex process (Haddad, Abdelfattah, and Ajili 2012). Researchers must therefore employ shadow filtering techniques to remove these artefacts from remote sensing data and reveal the underlying properties of regions obscured by dark silhouettes. In addition, SAR images typically require geometric corrections to fix distortions caused by terrain relief. Geospatial experts must orthorectify SAR data so that radar images can be compared on a consistent georeferenced coordinate system (Haddad, Abdelfattah, and Ajili 2012). This rectification process removes topographic distortions, thus enabling the precise analysis of spatial data.
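To make this preprocessing step concrete, the following is a minimal sketch, assuming the scene has already been orthorectified with dedicated SAR tooling; the file name, filter size, and shadow threshold are illustrative assumptions rather than values drawn from the studies cited above.

```python
# Illustrative SAR pre-processing sketch (hypothetical file name and threshold).
# Assumes the scene has already been orthorectified with dedicated SAR software.
import numpy as np
from scipy.ndimage import median_filter

def preprocess_sar(image: np.ndarray, shadow_db_threshold: float = -18.0) -> np.ndarray:
    """Suppress speckle, mask likely shadow regions, and rescale to [0, 1]."""
    # Median filtering suppresses the speckle noise typical of SAR backscatter.
    filtered = median_filter(image, size=3)

    # Pixels with very low backscatter (in dB) are treated as shadow and masked out.
    filtered = np.where(filtered < shadow_db_threshold, np.nan, filtered)

    # Min-max rescaling prepares the remaining values for neural-network input.
    lo, hi = np.nanmin(filtered), np.nanmax(filtered)
    return np.nan_to_num((filtered - lo) / (hi - lo), nan=0.0)

# Example usage with a hypothetical, already orthorectified ALOS-2 backscatter array:
# scene = np.load("alos2_scene_db.npy")
# clean = preprocess_sar(scene)
```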
In their research, Mabu, Fujita, and Kuremoto (2019, 48) demonstrate how the CAE is a neural network that takes SAR images as input, performs “feature extraction”, and is trained to reconstruct this data in its output layer. The extracted features are then used to train the OCSVM, whose parameter sets the expected proportion of ‘abnormal’ relative to ‘normal’ areas. By training the outlier detection algorithm with SAR images, the model can identify disaster areas by contrasting them with their normal conditions and classifying regions as either anomalous or normal (Bai et al. 2017). Real-time monitoring of radar images subsequently enables geographers to locate impacted areas and direct emergency response efforts towards them. However, researchers must be cautious when setting the OCSVM’s parameter, as there is a “trade-off” between false positives and false negatives that could distort estimates of the true extent of an earthquake’s impact (Mabu, Fujita, and Kuremoto 2019, 49).
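The pipeline can be sketched as follows. This is a minimal illustration of the kind of CAE-plus-OCSVM workflow described above, not the authors’ exact configuration: the patch size, layer sizes, training settings, and the value of nu (the parameter governing the false-positive/false-negative trade-off) are all assumptions.

```python
# A minimal sketch of a CAE + OCSVM pipeline of the kind described above, not the
# authors' exact architecture; patch size, layer sizes, epochs, and nu are assumptions.
import numpy as np
from tensorflow.keras import layers, models
from sklearn.svm import OneClassSVM

PATCH = 32  # hypothetical patch size in pixels

def build_cae():
    """Convolutional autoencoder trained to reconstruct SAR image patches."""
    inp = layers.Input(shape=(PATCH, PATCH, 1))
    x = layers.Conv2D(16, 3, activation="relu", padding="same")(inp)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(8, 3, activation="relu", padding="same")(x)
    encoded = layers.MaxPooling2D(2)(x)                      # compressed feature maps
    x = layers.Conv2D(8, 3, activation="relu", padding="same")(encoded)
    x = layers.UpSampling2D(2)(x)
    x = layers.Conv2D(16, 3, activation="relu", padding="same")(x)
    x = layers.UpSampling2D(2)(x)
    out = layers.Conv2D(1, 3, activation="sigmoid", padding="same")(x)
    cae = models.Model(inp, out)                             # reconstructs its own input
    encoder = models.Model(inp, layers.Flatten()(encoded))   # exposes the extracted features
    cae.compile(optimizer="adam", loss="mse")
    return cae, encoder

def fit_anomaly_detector(normal_patches: np.ndarray, nu: float = 0.05):
    """Train the CAE on pre-disaster ('normal') patches, then fit an OCSVM on the
    extracted features. nu caps the expected share of anomalies, encoding the
    false-positive / false-negative trade-off noted above."""
    cae, encoder = build_cae()
    cae.fit(normal_patches, normal_patches, epochs=20, batch_size=64, verbose=0)
    features = encoder.predict(normal_patches, verbose=0)
    ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=nu).fit(features)
    return encoder, ocsvm

# After an earthquake, patches the OCSVM labels -1 are flagged as anomalous (disaster) areas:
# encoder, ocsvm = fit_anomaly_detector(pre_event_patches)
# labels = ocsvm.predict(encoder.predict(post_event_patches, verbose=0))
```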
It is, however, essential to note that OCSVM is predicated on the notion that anomalies are rare occurrences that deviate significantly from standard patterns (Ji, Liu, and Buchroithner 2018). This suggests that OCSVM could fail to distinguish accurately between the different types of irregularities that emerge during an earthquake, which could lead to inaccurate conclusions. Consequently, this data science tool might not be practical for discerning evolving or minor anomalies that do not deviate markedly from the initial images used to train the algorithm (Ji, Liu, and Buchroithner 2018). Despite this limitation, the advantage of utilising data science tools for disaster area detection lies in their versatility. OCSVM can process large datasets and adapt to changing conditions, which is imperative during calamities, when satellite imagery produces substantial volumes of data (Li, Xu, and Guo 2010). Geographers could thus benefit from this model when developing mitigation strategies because it can accommodate rapid environmental shifts through real-time monitoring and pinpoint abnormal regions in need of aid.
Random Forest
Rescue operations were hindered during the 2016 Kumamoto earthquake, as many evacuees were unaware of the proper evacuation routes and sought refuge in undesignated spaces (Furukawa and Koshimizu 2021). Navigation systems contributed to this outcome by suggesting directions based solely on the shortest distance to an evacuation centre. Furukawa and Koshimizu (2021) also highlight that these systems failed to consider hazardous road conditions and potential accidents during post-disaster evacuations. This indicates that the shortest route is not necessarily the most appropriate or safest option. Geographers can therefore use machine learning algorithms, such as random forests, to optimise evacuation routes.
Data for the random forest algorithm can be acquired from open data sources supplied by municipal governments to ascertain the factors influencing pre-evacuation behaviour. For example, the Fire and Disaster Management Agency in Japan publishes statistics on evacuations, including details on previous events and compliance rates, alongside demographic information about evacuees (Okumura and Tokuno 2015). Although government bodies supply this information, it may feature missing values, which affects how accurately the model generates functional routes. In these circumstances, it is recommended to remove records in which a substantial amount of data is missing or to substitute missing values with estimates (Lu et al. 2022). However, it is crucial to be wary when imputing data, because doing so could significantly sway the model’s output.
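As a minimal illustration of those two options, the sketch below drops largely empty records and imputes the remaining gaps; the file name, column types, and the 50% threshold are hypothetical placeholders rather than details of any actual agency dataset.

```python
# Illustrative handling of missing evacuation records; the file and column
# names are hypothetical placeholders, not an actual agency dataset.
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.read_csv("evacuation_records.csv")

# Drop rows in which a substantial share of fields is missing
# (here, rows missing more than half of their values).
df = df[df.isna().mean(axis=1) <= 0.5]

# Impute the remaining gaps cautiously: medians for numeric fields,
# the most frequent category for categorical fields.
numeric_cols = df.select_dtypes(include="number").columns
categorical_cols = df.select_dtypes(exclude="number").columns

df[numeric_cols] = SimpleImputer(strategy="median").fit_transform(df[numeric_cols])
df[categorical_cols] = SimpleImputer(strategy="most_frequent").fit_transform(df[categorical_cols])
```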
The random forest model consists of numerous decision trees, each trained on a different subset of the data and drawing on independent variables such as age or proximity to evacuation shelters (Han et al. 2020). Each node of a tree splits on the variable that best separates the training examples, for instance by the evacuation route ultimately judged most suitable, and the tree keeps branching until a stopping criterion, such as a maximum number of nodes, is reached. As a result, the algorithm can uncover recurring trends and relationships between the input variables and the output, identifying practical routes while accounting for the different factors that influence decision-making during an evacuation (Lu et al. 2022). This exemplifies how the algorithm could be implemented in earthquake mitigation strategies to guide people along the safest pathways and maximise relief efforts, thus contributing to disaster management.
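Building on the hypothetical cleaned table from the previous sketch, a minimal training example might look as follows; the label column "chosen_route", the feature set, and the tuning values (including the cap on nodes per tree) are assumptions made for illustration.

```python
# A minimal random forest sketch; the label column and tuning values are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

y = df["chosen_route"]                                   # route judged safest for each record
X = pd.get_dummies(df.drop(columns=["chosen_route"]))    # one-hot encode categorical attributes

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestClassifier(
    n_estimators=200,     # number of decision trees in the ensemble
    max_leaf_nodes=64,    # stopping criterion: cap on leaf nodes per tree
    random_state=0,
)
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```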
Zhao, Lovreglio, and Nilsson (2020) argue that random forest algorithms are relatively robust because they can filter out insignificant values and accommodate an array of variables, including categorical and numerical data. However, given the complexity of the ensemble of decision trees, it can be taxing to interpret the individual tree outputs (Zhao, Lovreglio, and Nilsson 2020). Furthermore, the transparency of the model may be compromised, since it can be difficult to understand the exact decision-making processes that underlie each route proposal. Despite this limitation, the algorithm remains a valuable tool for refining disaster management because the aggregated predictions of many trees are more accurate than those of a single decision tree (Lu et al. 2022). This demonstrates how geographers could benefit from machine learning to calculate optimal evacuation routes during an earthquake.
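One coarse but common way to partially address this opacity is to inspect aggregate feature importances rather than individual trees; continuing the hypothetical example above:

```python
# Rank the input variables by their mean impurity-based importance across all trees.
import pandas as pd

importances = pd.Series(model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head(10))
```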
Ethical Implications
While data science tools are pivotal in advancing research in geography, they also pose various ethical implications surrounding data collection and use. Geospatial technologies capture visual data through satellite devices, which can entail an invasion of privacy (Berman, La Rosa, and Accone 2018). This is because people are often unaware that remote sensing technologies are collecting their data, monitoring private spaces and capturing sensitive information, thus infringing upon an individual’s right to privacy (Gilman 2014). Furthermore, this raises concerns about informed consent, because there are few opportunities for people to consent to this form of data collection or to the subsequent use of the information for analysis and distribution purposes, including the training of data science tools (Berman, La Rosa, and Accone 2018).
Although these methods of data collection encroach upon privacy rights, one could argue that such intrusions are justified in certain situations, such as disaster management. Geospatial data, albeit obtained covertly, is essential for data science tools since it helps detect areas at risk of harm and expedite rescue efforts (Müller et al. 2016). It could therefore be argued that the ethical implications of data science tools are outweighed in cases where preventing fatalities is of greater concern than individual privacy. Nevertheless, repurposing data gathered for other ends highlights the importance of informed consent, as people need to be made aware of how their personal data is being processed, stored and used for secondary purposes by third parties (Froomkin 2019). Therefore, when using secondary data for earthquake mitigation measures, geographers must be diligent and uphold transparency about the intentions of their work and how their chosen data science tools will process this information.
In summary, machine learning algorithms have the potential to profoundly impact and transform the field of geography. These processes can help geographers advance spatial analysis by allowing them to work with complex datasets and unveil connections between variables. Furthermore, this article has illustrated how researchers can use machine learning to identify disaster areas and formulate evacuation routes, by applying anomaly detection and random forest models to seismic contexts. A natural disaster also generates extensive data, establishing a mutually beneficial relationship that can improve emergency response systems and enhance an algorithm’s predictive abilities. As these models train on the influx of data resulting from earthquakes, they can become more adept at informing decision-making processes for mitigation strategies (Han et al. 2020).
These tools could help improve mitigation procedures, but they also present several ethical implications regarding informed consent and privacy. Securing access and consent from individuals can be problematic in the case of geospatial data because of the vast amount of data generated. Despite this difficulty, researchers must handle such information with caution and endeavour to inform the public about the collection and use of their data. By addressing these concerns, geographers can reap the benefits of machine learning while maintaining ethical conduct and safeguarding privacy. Overall, this article has analysed the significance of data science tools in revolutionising geographic research and highlighted the potential limitations of these techniques.
Using AI: ChatGPT
ChatGPT was used to edit and proofread this article. This involved finding synonyms and revising the flow of specific sentences to ensure they were comprehensible to a general audience. As a result, roughly 10% of this blog piece was produced by the AI tool. However, the majority of the text was authored by me and paraphrased numerous times to prevent any instances of plagiarism.