Facebook has something that most other companies doing cutting-edge research in artificial intelligence and deep learning lack: access to billions of images on Instagram, its photo-sharing service. That gives Facebook an immediate edge.
At F8, its annual developer conference, Facebook described how it is using billions of photos that users have shared publicly on Instagram, tagged with related hashtags, to train its deep-learning models. A large number of graphics processing units (GPUs) was used to process the data, and the resulting models surpassed industry benchmarks. The best of them achieved 85.4 percent accuracy on the ImageNet image database.
Because there is no standard way of applying hashtags to photos, the major problem Facebook faced was identifying relevant information in such a large collection of images. The largest of the tests used 3.5 billion images and 17,000 hashtags. At that scale, it was not possible to supervise and filter the data by hand, so the company had to develop techniques to clean the data received from users.
One of the early phases of the work, the pre-training research, focused on identifying relevant hashtags, merging hashtags that carry similar meanings, and teaching the algorithm to prioritize specific hashtags over generic ones. This research culminated in what the team calls a large-scale hashtag prediction model.
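To make the hashtag clean-up step more concrete, here is a minimal Python sketch of the general idea. It is not Facebook's actual pipeline: the synonym map, the vocabulary, and the sample posts are all hypothetical, and inverse-frequency weighting is only one simple, assumed way to let specific hashtags outweigh generic ones.

```python
from collections import Counter

# Hypothetical synonym map: each variant points to one canonical label.
SYNONYMS = {"brownbear": "bear", "grizzly": "bear", "doggo": "dog"}


def canonicalize(tags, vocabulary):
    """Strip '#', lowercase, merge synonyms, and drop tags outside the vocabulary."""
    labels = []
    for tag in tags:
        label = tag.lstrip("#").lower()
        label = SYNONYMS.get(label, label)
        if label in vocabulary and label not in labels:
            labels.append(label)
    return labels


def label_weights(dataset):
    """Assumed weighting: inverse frequency, so rare labels count more than common ones."""
    counts = Counter(label for labels in dataset for label in labels)
    return {label: 1.0 / count for label, count in counts.items()}


def target_distribution(labels, weights):
    """Normalize the weights of one image's labels into a training target."""
    total = sum(weights[label] for label in labels)
    return {label: weights[label] / total for label in labels}


# Hypothetical posts, each a list of raw hashtags attached to one image.
posts = [["#Dog", "#love"], ["#doggo", "#LOVE", "#park"], ["#grizzly", "#love"]]
vocabulary = {"dog", "bear", "love"}

dataset = [canonicalize(tags, vocabulary) for tags in posts]
weights = label_weights(dataset)
target = target_distribution(dataset[2], weights)
```

In this toy data the generic tag "love" appears on every image, so it receives a smaller share of the target for the third image than the more specific "bear" label.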
It is important to note that these models and this research are intended for recognizing objects in images. They are not being used to predict users' specific interests or to identify what makes some images better than others.