Please use this identifier to cite or link to this item: https://doi.org/10.21256/zhaw-20419
Full metadata record
DC Field | Value | Language
dc.contributor.author | Roost, Dano | -
dc.contributor.author | Meier, Ralph | -
dc.contributor.author | Toffetti Carughi, Giovanni | -
dc.contributor.author | Stadelmann, Thilo | -
dc.date.accessioned | 2020-08-31T08:09:44Z | -
dc.date.available | 2020-08-31T08:09:44Z | -
dc.date.issued | 2020-08-31 | -
dc.identifier.uri | https://digitalcollection.zhaw.ch/handle/11475/20419 | -
dc.description | Awarded with the Dr. Waldemar Jucker award 2020 of the GST | de_CH
dc.description.abstract | While vision in living beings is an active process where image acquisition and classification are intertwined to gradually refine perception, much of today’s computer vision is built on the inferior paradigm of episodic classification of i.i.d. samples. We aim at improved scene understanding for robots by taking the sequential nature of seeing over time into account. We present a supervised multi-task approach to answer questions about different aspects of a scene, such as the relationships between objects, their quantity, or their relative positions to the camera. For each question, we train a different output head that operates on input from one shared recurrent convolutional neural network which accumulates information over time steps. In parallel, we train an additional output head using reinforcement learning (RL) that uses the reduction in cumulative loss from the supervised heads as reward signal. It thereby learns to gradually improve the prediction confidence for, e.g., partially occluded objects by moving the camera to a more favourable angle with respect to these objects. We present preliminary results on simulated RGB-D image sequences that show superior performance of our RL-based approach in answering questions more quickly and more accurately than with static or random camera movement. | de_CH
dc.language.iso | en | de_CH
dc.publisher | University of Essex | de_CH
dc.rights | Licence according to publishing contract | de_CH
dc.subject | Active Vision | de_CH
dc.subject | Deep Learning | de_CH
dc.subject | Reinforcement Learning | de_CH
dc.subject | Neural Scene Understanding | de_CH
dc.subject | Robotic Grasping | de_CH
dc.subject | Computer Vision | de_CH
dc.subject.ddc | 006: Special computer methods | de_CH
dc.title | Combining reinforcement learning with supervised deep learning for neural active scene understanding | de_CH
dc.type | Conference: Paper | de_CH
dcterms.type | Text | de_CH
zhaw.departement | School of Engineering | de_CH
zhaw.organisationalunit | Institut für Informatik (InIT) | de_CH
dc.identifier.doi | 10.21256/zhaw-20419 | -
zhaw.conference.details | Active Vision and Perception in Human(-Robot) Collaboration Workshop at IEEE RO-MAN 2020 (AVHRC’20), online, 31 August - 4 September 2020 | de_CH
zhaw.funding.eu | No | de_CH
zhaw.originated.zhaw | Yes | de_CH
zhaw.publication.status | acceptedVersion | de_CH
zhaw.publication.review | Peer review (publication) | de_CH
zhaw.webfeed | Datalab | de_CH
zhaw.webfeed | Information Engineering | de_CH
zhaw.webfeed | ZHAW digital | de_CH
zhaw.webfeed | Machine Perception and Cognition | de_CH
zhaw.author.additional | No | de_CH
zhaw.display.portrait | Yes | de_CH
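
The abstract above describes a shared recurrent convolutional network feeding several supervised question heads plus an additional RL head whose reward is the reduction in the cumulative supervised loss. Below is a minimal PyTorch sketch of such a setup; all module names, layer sizes, the discrete camera-action space, and the reward function are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn


class ActiveSceneNet(nn.Module):
    """Shared recurrent convolutional trunk with per-question heads and a camera-policy head (illustrative)."""

    def __init__(self, answer_sizes=(10, 10, 10), num_camera_actions=6):
        super().__init__()
        # Shared convolutional encoder for a single RGB-D frame (4 input channels).
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Recurrent core that accumulates information over successive views.
        self.core = nn.GRUCell(64, 128)
        # One supervised output head per question type (relations, counts, positions, ...).
        self.question_heads = nn.ModuleList(nn.Linear(128, n) for n in answer_sizes)
        # RL policy head over a discrete set of camera movements.
        self.policy_head = nn.Linear(128, num_camera_actions)

    def forward(self, frame, hidden):
        hidden = self.core(self.encoder(frame), hidden)
        answers = [head(hidden) for head in self.question_heads]
        return answers, self.policy_head(hidden), hidden


def loss_reduction_reward(prev_loss, answers, targets):
    """Reward the camera policy by how much the summed supervised loss shrank since the last view."""
    criterion = nn.CrossEntropyLoss()
    current_loss = sum(criterion(a, t) for a, t in zip(answers, targets))
    return (prev_loss - current_loss).detach(), current_loss

In a training loop one would unroll the network over the image sequence, update the question heads with the summed cross-entropy loss, and feed the loss-reduction reward into a standard policy-gradient update for the camera head, which is only a rough approximation of the setup the abstract outlines.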
Appears in collections: Publikationen School of Engineering

Files in This Item:
File | Description | Size | Format
2020_Roost_Combining_reinforcement_learning_with_supervised_deep_learning.pdf | Accepted Version | 1.52 MB | Adobe PDF
Roost, D., Meier, R., Toffetti Carughi, G., & Stadelmann, T. (2020, August 31). Combining reinforcement learning with supervised deep learning for neural active scene understanding. Active Vision and Perception in Human(-Robot) Collaboration Workshop at IEEE RO-MAN 2020 (AVHRC’20), Online, 31 August - 4 September 2020. https://doi.org/10.21256/zhaw-20419
Roost, D. et al. (2020) ‘Combining reinforcement learning with supervised deep learning for neural active scene understanding’, in Active Vision and Perception in Human(-Robot) Collaboration Workshop at IEEE RO-MAN 2020 (AVHRC’20), online, 31 August - 4 September 2020. University of Essex. Available at: https://doi.org/10.21256/zhaw-20419.
D. Roost, R. Meier, G. Toffetti Carughi, and T. Stadelmann, “Combining reinforcement learning with supervised deep learning for neural active scene understanding,” in Active Vision and Perception in Human(-Robot) Collaboration Workshop at IEEE RO-MAN 2020 (AVHRC’20), online, 31 August - 4 September 2020, Aug. 2020. doi: 10.21256/zhaw-20419.
ROOST, Dano, Ralph MEIER, Giovanni TOFFETTI CARUGHI und Thilo STADELMANN, 2020. Combining reinforcement learning with supervised deep learning for neural active scene understanding. In: Active Vision and Perception in Human(-Robot) Collaboration Workshop at IEEE RO-MAN 2020 (AVHRC’20), online, 31 August - 4 September 2020. Conference paper. University of Essex. 31 August 2020
Roost, Dano, Ralph Meier, Giovanni Toffetti Carughi, and Thilo Stadelmann. 2020. “Combining Reinforcement Learning with Supervised Deep Learning for Neural Active Scene Understanding.” Conference paper. In Active Vision and Perception in Human(-Robot) Collaboration Workshop at IEEE RO-MAN 2020 (AVHRC’20), Online, 31 August - 4 September 2020. University of Essex. https://doi.org/10.21256/zhaw-20419.
Roost, Dano, et al. “Combining Reinforcement Learning with Supervised Deep Learning for Neural Active Scene Understanding.” Active Vision and Perception in Human(-Robot) Collaboration Workshop at IEEE RO-MAN 2020 (AVHRC’20), Online, 31 August - 4 September 2020, University of Essex, 2020, https://doi.org/10.21256/zhaw-20419.


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.