Master Thesis Plus Talpa Internship Opportunities in Big Data and Artificial Intelligence

Mentor at TiU: Francesco Lelli 

Mentors at Talpa: Anca Dumitrache and/or Ricardo Fabian Guevara

Talpa Internship Opportunities

There are several internship opportunities available at Talpa. You will have the opportunity to develop your master's thesis in collaboration with the AI division of one of the most innovative media companies in the Netherlands.

If you have a go-getter attitude and the desire to expand your knowledge and expertise in the area of big data and artificial intelligence, this is probably the internship that you are looking for.

Topics for this thesis include Natural Language Processing (NLP), Machine Learning and Predictors. You are also encouraged to propose your own idea.

Knowledge and Skills:

Programming, preferably Python, and/or statistical skills.

Interested in Joining Talpa?

Here you can find a short presentation of the company:

Where to Read More:

The following ideas are possible internship opportunities and related topics for your thesis. You may want to use them to form an idea of the kind of work that you will be doing at Talpa, as well as to develop your research proposal (here you can find a few tips for that).

Swimlane for Trending on Social Media

Both KIJK (video streaming platform) and JUKE (audio streaming platform) present their content separated into swimlanes on the front page. For instance, one swimlane contains a list of TV shows that were popular in the previous days. We would like to create a new swimlane that contains items that were trending on social media. Taking Twitter as a data source, the project will go through the following steps:

  • (1) extract tweets about popular TV shows and/or radio shows,
  • (2) perform entity linking to match them to the shows in our database,
  • (3) aggregate the results to get the most popular shows in one swimlane.

Project: KIJK and/or JUKE
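
The three steps of the swimlane project can be sketched in a few lines of Python. This is only a minimal illustration under strong assumptions: the show ids, the aliases and the naive substring matching are all hypothetical placeholders, and a real entity linker would need to handle ambiguity, hashtags and misspellings.

```python
from collections import Counter

# Hypothetical catalogue: show id -> lowercase aliases (placeholder names).
SHOW_ALIASES = {
    "show_a": ["example show", "#exampleshow"],
    "show_b": ["another quiz"],
}

def link_shows(tweet):
    """Step 2 (naive): return ids of shows whose alias occurs in the tweet."""
    text = tweet.lower()
    return {
        show
        for show, aliases in SHOW_ALIASES.items()
        if any(alias in text for alias in aliases)
    }

def trending_swimlane(tweets, top_n=10):
    """Step 3: count linked mentions and return the most mentioned show ids."""
    counts = Counter()
    for tweet in tweets:  # step 1 (collecting tweets) is assumed to be done
        counts.update(link_shows(tweet))
    return [show for show, _ in counts.most_common(top_n)]
```

Each tweet counts at most once per show, so a single tweet that repeats a title does not inflate the ranking.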

Automatic Teaser Tweet Creation

Starting with a (textual) description of a TV show episode or radio program, we would like to generate teaser tweets about the show that are meant to generate anticipation on social media. The underlying task would be a summarization problem, where the program description is mapped to a short tweet about it. The tweet should contain relevant information about the program, but not reveal any spoilers. Either an extractive (entity + relation extraction) or abstractive method could be applied.

Source: https://www.aclweb.org/anthology/N19-1398/

Project: KIJK and/or JUKE
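
As a point of reference for the extractive option, a very simple baseline could pick the single most representative sentence of the description that fits in a tweet and avoids spoiler terms. The sketch below is an assumption-laden toy: the word-frequency scoring, the 280-character limit and the hand-supplied `spoiler_terms` list are illustrative choices, not a method prescribed by the project.

```python
import re
from collections import Counter

def teaser(description, spoiler_terms=(), max_len=280):
    """Return the highest-scoring sentence that is short and spoiler-free."""
    sentences = [
        s.strip()
        for s in re.split(r"(?<=[.!?])\s+", description)
        if s.strip()
    ]
    # Word frequencies over the whole description act as a crude salience score.
    freq = Counter(re.findall(r"\w+", description.lower()))

    def score(sentence):
        tokens = re.findall(r"\w+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    candidates = [
        s for s in sentences
        if len(s) <= max_len
        and not any(term.lower() in s.lower() for term in spoiler_terms)
    ]
    return max(candidates, key=score) if candidates else ""
```

An abstractive method would replace the sentence selection with a generation model; the spoiler filter could still be applied to its output.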

Exploring Pair-wise Learning-to-Rank

Talpa's current recommender system uses ALS, a point-wise learning-to-rank approach, where the learning objective is based on modeling the score of a given item (i.e. similar to how regression works). Alternative learning-to-rank methods are pair-wise (the learning objective models the ranking of a pair of items relative to each other) and list-wise (the learning objective is calculated over the entire list of items). The goal of the project is to investigate different learning objectives and find out:

  • (1) how they perform relative to the point-wise method,
  • (2) whether there are subsets of data where these methods work better or worse.

Source: https://medium.com/@nikhilbd/pointwise-vs-pairwise-vs-listwise-learning-to-rank-80a8fe8fadfd

Project: KIJK and/or JUKE
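
The difference between the point-wise and pair-wise objectives can be made concrete with two toy loss functions. This is a sketch only: squared error and the pair-wise logistic loss are common textbook choices picked here for illustration, and they are not necessarily the losses used in Talpa's system.

```python
import math

def pointwise_loss(scores, labels):
    """Mean squared error between each predicted score and its label."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(scores)

def pairwise_loss(scores, labels):
    """Mean logistic loss over all pairs where item i should rank above item j."""
    losses = [
        math.log1p(math.exp(-(si - sj)))
        for si, yi in zip(scores, labels)
        for sj, yj in zip(scores, labels)
        if yi > yj
    ]
    return sum(losses) / len(losses) if losses else 0.0
```

Note that the pair-wise loss depends only on score differences: shifting all scores by a constant leaves it unchanged, while the point-wise loss penalizes the shift even though the resulting ranking is identical.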

Automatic Playlist Generation

The JUKE music player features many non-stop music playlists that are manually created by an editor, usually by selecting music from a given genre (e.g. hard rock non-stop radio). We would like to see whether these playlists can be generated by AI and what their quality is. This can be approached as a song clustering problem, where the feature space could contain the genre and artist, as well as other audio features.

Project: JUKE
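
To illustrate the clustering view of playlist generation, here is a small k-means sketch in plain Python. It assumes each song has already been turned into a numeric feature tuple (how to encode genre, artist and audio features is left open), and k-means is just one possible clustering method chosen for illustration.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20, seed=0):
    """Cluster feature tuples; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct songs
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each song joins its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignment
```

Each resulting cluster is a candidate playlist; its quality could then be compared against the editor-made playlists.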

Google AdWords for Video

Advertisers can buy Google search keywords to show their ads in conjunction with them. We would like to see whether it is possible to do something similar within videos, too. In this way, contextual advertising in different media becomes feasible. The steps involved in this project are:

  • (1) generate textual metadata of video (either based on transcript, or other features of the video; there might be some existing video metadata as well),
  • (2) match video textual metadata with ad keywords.
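
Step (2) above can be sketched as a keyword-overlap ranking. This is a deliberately simple baseline under assumptions: the ad ids and keyword lists are hypothetical, only single-word keywords are handled, and exact token matching stands in for whatever semantic matching the project would end up using.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def match_ads(video_metadata, ads, min_overlap=1):
    """Rank ads by how many of their keywords occur in the video metadata."""
    video_tokens = tokenize(video_metadata)
    scored = []
    for ad_id, keywords in ads.items():
        overlap = video_tokens & {k.lower() for k in keywords}
        if len(overlap) >= min_overlap:
            scored.append((ad_id, sorted(overlap)))
    # Most matched keywords first; ties broken by ad id for determinism.
    return sorted(scored, key=lambda item: (-len(item[1]), item[0]))
```

For example, with the hypothetical ads `{"ad_cooking": ["pasta", "recipe", "kitchen"], "ad_cars": ["car", "engine"]}`, a transcript about a chef preparing a pasta recipe would match only `ad_cooking`.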

Exploring Seasonal Trends as RecSys Features

Our domain experts know that both video and audio streaming trends are highly influenced by seasonality (e.g. Sky Radio becomes very popular over Christmas). The goal of this project is to:

  • (1) identify these trends by studying user listening data, then
  • (2) incorporate these trends into our prediction model (e.g. recommend Sky Radio to fans of Christmas music, but only over the Christmas holidays).

Challenge: a lot of data may be needed to accomplish this.

Project: KIJK and/or JUKE
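
One simple way to make step (1) measurable is a seasonal "lift": the share of listening a station gets in a given month, divided by its share over the whole period. The sketch below assumes listening events are available as (month, station) pairs; the event format and the lift definition are illustrative assumptions, not Talpa's actual data model.

```python
from collections import Counter

def seasonal_lift(events):
    """events: iterable of (month, station). Returns {(station, month): lift}."""
    events = list(events)
    per_station = Counter(station for _, station in events)
    per_month = Counter(month for month, _ in events)
    per_pair = Counter((station, month) for month, station in events)
    total = len(events)
    lift = {}
    for (station, month), n in per_pair.items():
        share_in_month = n / per_month[month]
        overall_share = per_station[station] / total
        lift[(station, month)] = share_in_month / overall_share
    return lift
```

A lift well above 1 for a station in December would flag it as seasonal, and the lift value itself could serve as a feature for step (2).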

Radio Station Embeddings

Goal: produce embeddings of various sizes, similar to GloVe but for radio stations, such that the Euclidean distance between two embeddings reflects the similarity of the stations.

Value: many of Talpa's challenges, like radio-to-radio similarity, could be tackled with this approach. Our RecSys can also use the radio embedding as a feature to determine whether a radio channel is a good match for a user. This could also greatly alleviate the cold start problem when we introduce new radio stations.

Approach: this is an unexplored problem but there has been previous research and practical work done on audio embeddings. In the simplest form, a radio station embedding could be just a bag of features like genre frequencies, languages and origins of artists, but also properties from the target audience like age range. Going beyond that, it could use an average of the embeddings of representative songs on it in a given past period. Another valuable asset of Talpa is its understanding of usage behavior, so radio station features could also be calculated based on the type of person who listens to it, the frequency of listening and other seasonal behavior.

At the same time, there is a temporal aspect to take into consideration when analyzing user preferences for radio stations. We cannot expect the embeddings to remain valid forever, so a periodic refresh is to be expected (even word embeddings need to account for semantic drift, only their refresh window is larger).

Project: JUKE
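
The "average of representative song embeddings" idea from the approach above can be sketched as follows. The song vectors and station names here are hypothetical toy values; where the song embeddings come from, and which songs count as representative, are open design questions for the project.

```python
import math

def station_embedding(song_vectors):
    """Average the embeddings of a station's representative songs."""
    n = len(song_vectors)
    return [sum(dim) / n for dim in zip(*song_vectors)]

def nearest_station(query, stations):
    """Return the station closest to `query` by Euclidean distance."""
    return min(
        (name for name in stations if name != query),
        key=lambda name: math.dist(stations[query], stations[name]),
    )

# Toy usage with two-dimensional placeholder song embeddings.
stations = {
    "rock_a": station_embedding([[1.0, 0.0], [0.8, 0.2]]),
    "rock_b": station_embedding([[0.9, 0.1]]),
    "jazz": station_embedding([[0.0, 1.0]]),
}
closest = nearest_station("rock_a", stations)
```

Because a new station only needs a handful of songs to get an embedding, this construction also hints at how the cold start problem could be eased.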

Podcast Embeddings for JUKE

Similar to the radio embeddings project, Talpa is interested in creating embeddings for the podcasts available on JUKE. Podcast recommendations suffer from the cold start problem even more than radio, since there is usually a high volume of items that are published continuously. The features that can be used for podcast embeddings are also slightly different from typical audio embedding features based on e.g. musical genre.

Project: JUKE

Interested in a Talpa internship? Also check the following:

Webinar about Talpa Internship Opportunities (Available for Tilburg Students Only)

Interested in Applying?

Send an email to Francesco Lelli with your CV, a short motivation letter, the project(s) that interest you and (optionally) a draft research proposal. If you would like to propose a particular project, please pay extra attention to your research proposal.

A selected list of candidates will be interviewed by Talpa, which has the final say on accepting you for an internship. Succes, Good luck, In Bocca al Lupo!
