CESI LINEACT has an open position for a PhD student to develop affordance characterization algorithms based on contextual information. The position is located at CESI Campus Dijon.
Toward Proactive Intelligence: Environmental Contextual Information and Gesture Recognition for Characterizing Affordances in Human-System Interactions
Scientific fields: Artificial Intelligence, Computer Vision, Robotics
Keywords: Context-based object affordance, Object recognition, Gesture recognition/evaluation.
Yuehua DING, Associate professor, HDR
Youssef MOURCHID & Nicolas RAGOT, Associate professors
Vision systems, previously confined to industrial settings, are now used in open environments such as homes, hospitals, and public spaces. This shift raises new challenges, particularly in Human-System Interaction (HSI), and calls for vision systems capable of more flexible and adaptive decision-making. Despite advances in mechatronics and artificial intelligence (AI), challenges remain, notably in interpreting human intentions and responding intuitively to unforeseen situations.
Gibson (1979) [1] and Norman (2013) [2] introduced the concept of "affordance," defined as the action possibilities an object offers to an agent based on its physical properties and the agent's abilities [5,6,9]. Affordances provide a promising means of enhancing Human-System Interactions (HSI) [3,4]. However, ambiguity arises because a single object may have multiple affordances. In human-robot scenarios where a robot must provide objects for specific tasks, such ambiguities are unacceptable because they degrade the interaction.
Environmental contextual information plays a key role here: it should make it possible to resolve ambiguity in the system's (here, a robot's) selection of a given affordance. The contextual information we refer to includes geometric, semantic, and map-related information about the other objects in the scene, as well as information about the human, such as their gestures and the actions they have performed or are about to perform.
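As an illustration only (a minimal sketch under assumed object names, affordance labels, and cue weights, not the method to be developed in the thesis), contextual cues can be used to re-weight the candidate affordances of an object and retain the least ambiguous one:

    # Minimal sketch: disambiguating an object's affordance with contextual cues.
    # All object names, affordance labels, and weights below are hypothetical.

    AFFORDANCE_PRIORS = {
        # prior scores, e.g. from a visual affordance detector
        "mug": {"grasp": 0.5, "pour": 0.3, "contain": 0.2},
        "screwdriver": {"grasp": 0.4, "screw": 0.5, "pry": 0.1},
    }

    CONTEXT_COMPATIBILITY = {
        # assumed compatibility between a contextual cue and an affordance
        ("reaching_gesture", "grasp"): 2.0,  # an observed reach favours handing over
        ("tilting_gesture", "pour"): 2.5,    # a wrist tilt favours pouring
        ("screw_nearby", "screw"): 3.0,      # a screw in the scene favours screwing
    }

    def disambiguate(obj, context_cues):
        """Return the most plausible affordance of `obj` given contextual cues."""
        scores = dict(AFFORDANCE_PRIORS[obj])
        for cue in context_cues:
            for affordance in scores:
                scores[affordance] *= CONTEXT_COMPATIBILITY.get((cue, affordance), 1.0)
        return max(scores, key=scores.get)

    if __name__ == "__main__":
        # A screwdriver next to a screw, with the operator reaching out:
        print(disambiguate("screwdriver", ["screw_nearby", "reaching_gesture"]))  # "screw"

In the actual work, such cues would come from object detection, scene mapping, and gesture recognition modules rather than from hand-coded tables.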
At CESI LINEACT, the "Engineering and Digital Tools" team has conducted research on affordances within Human-System Interaction, focusing on operator-device interactions for industrial assembly, dismantling, or maintenance tasks. Building on Simonian's definition of affordance, the research emphasizes identifying the minimal data set required to characterize object affordance from a human perspective.
This thesis complements these efforts by focusing on robot-centered affordances. The scientific challenge is to resolve ambiguity in affordance selection during human-robot interactions by leveraging contextual environmental data to improve the fluidity and intuitiveness of the interaction. Two use cases are prioritized: 1) collaborative human-robot industrial assembly [7]; 2) assistive collaboration between caregivers and robots in healthcare [8].
Scientific challenges
Thesis Objectives
1. Developing a Theoretical Framework for Affordance Learning:
2. Gesture Recognition and Affordance Integration for Action Anticipation (see the sketch after this list):
3. Practical Implementation and Experimental Validation:
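To illustrate objective 2 (purely as an assumed, simplified example rather than the pipeline that will actually be designed), a recognized operator gesture can be mapped to an anticipated next action and, from there, to the object affordance the robot should prepare:

    # Minimal sketch: coupling gesture recognition with affordances for action
    # anticipation in a collaborative assembly task. The gesture classes, action
    # labels, and mappings are hypothetical placeholders.

    GESTURE_TO_NEXT_ACTION = {
        "reach_toward_parts_bin": "pick_part",
        "align_two_parts": "insert_screw",
        "rotate_wrist": "tighten_screw",
    }

    ACTION_TO_REQUIRED_AFFORDANCE = {
        "pick_part": ("part", "grasp"),
        "insert_screw": ("screw", "insert"),
        "tighten_screw": ("screwdriver", "screw"),
    }

    def anticipate(recognized_gesture):
        """Map a recognized gesture to the (object, affordance) the robot should prepare."""
        next_action = GESTURE_TO_NEXT_ACTION.get(recognized_gesture)
        if next_action is None:
            return None  # unknown gesture: no anticipation
        return ACTION_TO_REQUIRED_AFFORDANCE[next_action]

    if __name__ == "__main__":
        # The operator aligns two parts -> the robot anticipates providing a screw.
        print(anticipate("align_two_parts"))  # ('screw', 'insert')

In practice, the gesture recognizer would be a learned model (for example, trained on skeleton sequences such as those in InHARD [10]), and the mapping would be learned or probabilistic rather than a fixed lookup table.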
Work Plan
This research work will be carried out at CESI Campus Dijon and will be organized as follows:
Expected Scientific/Technical Outcomes
CESI LINEACT (UR 7527), the Digital Innovation Laboratory for Businesses and Learning in support of Territorial Competitiveness, anticipates and supports technological transformations in sectors and services related to industry and construction. CESI's historical ties with businesses are a determining factor in its research activities and have led to a focus on applied research in partnership with industry. A human-centered approach coupled with the use of technology, together with regional networking and links with education, has enabled cross-disciplinary research centered on human needs and uses while addressing technological challenges.
Its research is organized into two interdisciplinary scientific teams and two application domains:
· Team 1, "Learning and Innovating," is primarily focused on Cognitive Sciences, Social Sciences, Management Sciences, Education Science, and Innovation Sciences. The main scientific objectives are understanding the effects of the environment, particularly instrumented situations with technical objects (platforms, prototyping workshops, immersive systems), on learning, creativity, and innovation processes.
· Team 2, "Engineering and Digital Tools," is mainly focused on Digital Sciences and Engineering. Its main scientific objectives include modeling, simulation, optimization, and data analysis of cyber-physical systems. Research also covers decision-support tools and studies of human-system interactions, especially through digital twins coupled with virtual or augmented environments.
These two teams cross and develop their research in the two application domains of Industry of the Future and City of the Future, supported by research platforms, primarily the Rouen platform dedicated to the Factory of the Future and the Nanterre platform dedicated to the Factory and Building of the Future.
Modalities: application documents and an interview. Submit the following documents to ymourchid@cesi.fr, nragot@cesi.fr, and yding@cesi.fr with the subject line "[Application]" followed by the thesis title given above.
Your application will contain:
Required skills:
Scientific and Technical skills:
· Proficiency in Python and C++.
Relational Skills:
· Advanced written/oral communication.
· Independence, rigor, and teamwork.
[1] Gibson, E. J. (2003). The world is so full of a number of things: On specification and perceptual learning. Ecological Psychology, 15(4), 283-287.
[2] Norman, D. A. (2013). The design of everyday things (Revised and expanded edition). Cambridge, MA: The MIT Press. ISBN 978-0-262-52567-1.
[3] Hassanin, M., Khan, S., & Tahtali, M. (2021). Visual affordance and function understanding: A survey. ACM Computing Surveys (CSUR), 54(3), 1-35.
[4] Corona, E., Pumarola, A., Alenya, G., Moreno-Noguer, F., & Rogez, G. (2020). GanHand: Predicting human grasp affordances in multi-object scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 5031-5041).
[5] Deng, S., Xu, X., Wu, C., Chen, K., & Jia, K. (2021). 3D AffordanceNet: A benchmark for visual object affordance understanding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 1778-1787).
[6] Zhai, W., Luo, H., Zhang, J., Cao, Y., & Tao, D. (2022). One-shot object affordance detection in the wild. International Journal of Computer Vision, 130(10), 2472-2500.
[7] Eswaran, M., & Bahubalendruni, M. R. (2023). Augmented reality aided object mapping for worker assistance/training in an industrial assembly context: Exploration of affordance with existing guidance techniques. Computers & Industrial Engineering, 185, 109663.
[8] Kim, N. G., Effken, J. A., & Lee, H. W. (2022). Impaired affordance perception as the basis of tool use deficiency in Alzheimer's disease. Healthcare, 10(5), 839. MDPI.
[9] Girish, D. S. (2020). Action recognition in still images and inference of object affordances (Doctoral dissertation, University of Cincinnati).
[10] Dallel, M., Havard, V., Baudry, D., & Savatier, X. (2020). InHARD - Industrial Human Action Recognition Dataset in the context of industrial collaborative robotics. In 2020 IEEE International Conference on Human-Machine Systems (ICHMS) (pp. 1-6). IEEE. https://doi.org/10.1109/ICHMS49158.2020.9209531