TY - JOUR
T1 - Environmental awareness in machines
T2 - a case study of automated debris removal using Generative Artificial Intelligence and Vision Language Models
AU - Chan, Jolly P.C.
AU - Ho, Heiton M.H.
AU - Wong, T. K.
AU - Ho, Lawrence Y.L.
AU - Cheung, Jackie
AU - Tai, Samson
N1 - Publisher Copyright: © 2024 The Hong Kong Institution of Engineers.
PY - 2024/12/10
Y1 - 2024/12/10
N2 - Water channels play a crucial role in stormwater management, but the build-up of debris in their grilles can lead to flooding, endangering people, animals, property, and critical infrastructure nearby. While automated mechanical grab systems are necessary for efficient debris removal, they have not been deployed in outdoor environments because of safety concerns. Here we report the successful use of Generative Artificial Intelligence (GenAI) and a Vision Language Model (VLM) to endow an automated mechanical grab with “awareness”, which allows it to differentiate between non-living and living objects and decide whether to initiate or abort grabbing actions. Existing approaches such as YOLOv7 achieve a sensitivity of only 86.94% (95% CI: 83.44% to 89.93%) in detecting humans and specified animals. They systematically miss crouching workers and animals facing away from the cameras. Grounding DINO (VLM) can achieve a sensitivity of 100% (95% CI: 99.17% to 100.00%) and a specificity of 85.37% (95% CI: 77.86% to 91.09%). Together with BLIP-2 (GenAI), it acquires “awareness”, allowing it to detect animals beyond those specified. This opens up possibilities for the application of GenAI/VLM in automation sectors where humans and machines work in close proximity, such as manufacturing, logistics, and construction. This innovation can potentially improve safety and efficiency in these domains.
KW - Artificial intelligence
KW - computer vision
KW - debris clearance
KW - flood management
KW - machine learning model
KW - object detection
UR - http://www.scopus.com/inward/record.url?scp=85212971682&partnerID=8YFLogxK
U2 - 10.33430/V31N4THIE-2024-0052
DO - 10.33430/V31N4THIE-2024-0052
M3 - Journal article
AN - SCOPUS:85212971682
SN - 1023-697X
VL - 31
JO - HKIE Transactions Hong Kong Institution of Engineers
JF - HKIE Transactions Hong Kong Institution of Engineers
IS - 4
M1 - 2024005
ER -