Movie Recommendation Model.

Photo by Denise Jans on Unsplash

What is a Recommender System?

Recommender systems are software or algorithms that provide personalized recommendations based on user behaviour and preferences. They are used in various platforms to improve user experience and engagement. There are different types of recommender systems, including collaborative filtering, content-based filtering, hybrid systems, matrix factorization, deep learning-based systems, context-aware systems, and reinforcement learning-based systems. The effectiveness of these systems depends on data quality, algorithm choice, and their ability to handle scalability and new user/item scenarios.

Types of recommender systems:

  • Collaborative Filtering: Recommends items based on the preferences and behaviour of other users

  • Content-Based Filtering: Recommends items based on their characteristics and the user's historical preferences

  • Hybrid Recommender Systems: Combines collaborative and content-based filtering techniques

  • Matrix Factorization: Factors user-item interaction data to understand underlying patterns

  • Deep Learning-Based Recommenders: Uses neural networks to capture complex patterns in user behaviour and content

  • Context-Aware Recommenders: Considers additional contextual information for more relevant recommendations

  • Reinforcement Learning-Based Recommenders: Optimizes recommendation strategies using user feedback
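To make the first item in this list concrete, here is a minimal sketch of user-based collaborative filtering on a toy rating matrix. The ratings, the helper names (`cosine`, `most_similar_user`, `recommend_for`) and the choice of a single nearest neighbour are all assumptions made for illustration; real systems use far more data and more careful weighting.

```python
import numpy as np

# Hypothetical ratings: rows are users, columns are items, 0 means "not rated".
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
], dtype=float)


def cosine(a, b):
    """Cosine similarity between two rating vectors (unrated items count as 0)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def most_similar_user(ratings, user):
    """Index of the other user whose ratings point in the most similar direction."""
    sims = [
        cosine(ratings[user], ratings[other]) if other != user else -1.0
        for other in range(len(ratings))
    ]
    return int(np.argmax(sims))


def recommend_for(ratings, user):
    """Among items the user has not rated, pick the one the nearest neighbour rated highest."""
    neighbour = most_similar_user(ratings, user)
    unseen = ratings[user] == 0
    candidates = np.where(unseen, ratings[neighbour], -1.0)
    return int(np.argmax(candidates))


print(recommend_for(ratings, 0))
```

User 0 has only one unrated item here, so the example is deliberately trivial; the point is the two-stage shape (find similar users, then borrow their preferences).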

Applications of recommender systems:

  • E-commerce

  • Streaming services

  • Social media

Effectiveness of recommender systems:

  • Depends on data quality, algorithm choice, and handling issues like the cold start problem and scalability

Project Flow

For now, let's focus on the data, pre-processing, and model building, with a simple website as the front end.

The dataset used is from Kaggle:

Let's do some simple coding:

  1. Import necessary packages:

     import numpy as np
     import pandas as pd
     import nltk
     from sklearn.feature_extraction.text import CountVectorizer
     from nltk.stem.porter import PorterStemmer
     from sklearn.metrics.pairwise import cosine_similarity
     import pickle
    
  2. Read the dataset files, which are in CSV format:

     credits = pd.read_csv('tmdb_5000_credits.csv')
     movies = pd.read_csv('tmdb_5000_movies.csv')

  3. To preview the contents of a dataset, call movies.head(); here movies is the name of the DataFrame.

  4. To make the data easier to work with, merge the two datasets on the column they have in common, which here is title:

     movies = movies.merge(credits, on='title')
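A small sketch of what this merge does, using two made-up frames in place of the real CSV files (the rows and the `*_toy` names are assumptions). It also shows one thing worth keeping in mind: merging on title assumes titles are unique, and when two movies share a title, pandas produces one row for every matching pair.

```python
import pandas as pd

# Stand-ins for the movies and credits files (hypothetical rows).
movies_toy = pd.DataFrame({
    "id": [1, 2],
    "title": ["Avatar", "Spectre"],
    "overview": ["A marine on an alien moon", "A spy follows a cryptic message"],
})
credits_toy = pd.DataFrame({
    "movie_id": [1, 2],
    "title": ["Avatar", "Spectre"],
    "cast": ["[]", "[]"],
})

# The default inner join keeps titles present in both frames.
merged = movies_toy.merge(credits_toy, on="title")
print(list(merged.columns))

# Two different movies that share a title: 2 x 2 matching pairs give 4 rows.
left = pd.DataFrame({"id": [10, 11], "title": ["Batman", "Batman"]})
right = pd.DataFrame({"movie_id": [10, 11], "title": ["Batman", "Batman"]})
duplicated = left.merge(right, on="title")
print(len(duplicated))
```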

  5. Columns such as genres store their values as strings that look like lists of dictionaries. ast.literal_eval parses such a string into real Python objects:

     import ast
     ast.literal_eval('[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]')

     The helper below uses it to keep only the name of each entry:

     def convert(text):
         L = []
         for i in ast.literal_eval(text):
             L.append(i['name'])
         return L

  6. Using the above function, convert the remaining list-like columns (genres and keywords) into plain lists of names.
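As a rough illustration of this conversion step, the sketch below applies the same idea to a one-row toy frame. The keyword ids and the `toy` frame are invented for the example; on the real data the same `apply(convert)` call would be made on the corresponding columns of `movies`.

```python
import ast

import pandas as pd


def convert(text):
    """Parse a JSON-like string and keep only the 'name' of each entry."""
    return [entry["name"] for entry in ast.literal_eval(text)]


# One-row stand-in for the merged movies frame (keyword ids are made up).
toy = pd.DataFrame({
    "title": ["Avatar"],
    "genres": ['[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]'],
    "keywords": ['[{"id": 1, "name": "culture clash"}, {"id": 2, "name": "future"}]'],
})

for column in ["genres", "keywords"]:
    toy[column] = toy[column].apply(convert)

print(toy.loc[0, "genres"])
print(toy.loc[0, "keywords"])
```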

  7. The cast column has many entries per movie, so to keep things simple we keep only the first three names:

     def convert3(text):
         L = []
         counter = 0
         for i in ast.literal_eval(text):
             if counter < 3:
                 L.append(i['name'])
                 counter += 1
         return L

     # cast must still hold the raw strings here; if it was already converted
     # to a list of names, use movies['cast'].apply(lambda x: x[0:3]) instead
     movies['cast'] = movies['cast'].apply(convert3)

  8. From the crew column, extract the director(s):

     def fetch_director(text):
         L = []
         for i in ast.literal_eval(text):
             if i['job'] == 'Director':
                 L.append(i['name'])
         return L

     movies['crew'] = movies['crew'].apply(fetch_director)
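For comparison, here is a compact sketch of the two helpers written with list comprehensions and tried on made-up input. The names `top_names` and `directors`, the `limit` parameter, and the sample strings are hypothetical; they are meant to behave like convert3 and fetch_director on well-formed input, not to replace them.

```python
import ast


def top_names(text, limit=3):
    """Names of the first `limit` entries in a JSON-like list string."""
    return [entry["name"] for entry in ast.literal_eval(text)[:limit]]


def directors(text):
    """Names of all crew entries whose job is 'Director'."""
    return [
        entry["name"]
        for entry in ast.literal_eval(text)
        if entry.get("job") == "Director"
    ]


# Invented sample values in the same shape as the cast and crew columns.
cast_json = '[{"name": "Actor A"}, {"name": "Actor B"}, {"name": "Actor C"}, {"name": "Actor D"}]'
crew_json = '[{"job": "Editor", "name": "Person X"}, {"job": "Director", "name": "Person Y"}]'

print(top_names(cast_json))
print(directors(crew_json))
```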

  9. Remove the spaces inside names (for example, "Science Fiction" becomes "ScienceFiction") so that each name is later treated as a single word:

     movies['genres'] = movies['genres'].apply(lambda x: [i.replace(" ", "") for i in x])
     movies['keywords'] = movies['keywords'].apply(lambda x: [i.replace(" ", "") for i in x])
     movies['cast'] = movies['cast'].apply(lambda x: [i.replace(" ", "") for i in x])
     movies['crew'] = movies['crew'].apply(lambda x: [i.replace(" ", "") for i in x])

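  10. Create tags: apart from movie_id and title, the remaining columns are merged into a single tags column, which is then joined into one lowercase paragraph of text per movie.

     # overview is a plain string, so split it into a list of words first
     # (missing overviews become empty lists) to match the other list columns
     movies['overview'] = movies['overview'].fillna('').apply(lambda x: x.split())
     movies['tags'] = movies['overview'] + movies['genres'] + movies['keywords'] + movies['cast'] + movies['crew']

     # .copy() avoids pandas' SettingWithCopyWarning on the assignments below
     new_df = movies[['movie_id', 'title', 'tags']].copy()
     new_df['tags'] = new_df['tags'].apply(lambda x: " ".join(x))
     new_df['tags'] = new_df['tags'].apply(lambda x: x.lower())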
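The tag-building step can be sketched end to end on a single made-up movie. Everything in the `toy` frame is invented, and only three of the source columns are used to keep the example short; the idea is simply that every column becomes a list of words, the lists are concatenated, and the result is joined into one lowercase string.

```python
import pandas as pd

# One hypothetical movie with an overview string and two list columns.
toy = pd.DataFrame({
    "movie_id": [1],
    "title": ["Toy Movie"],
    "overview": ["A soldier explores a distant moon"],
    "genres": [["Action", "Science Fiction"]],
    "cast": [["Actor A"]],
})

# Split the overview into words so it is a list like the other columns.
toy["overview"] = toy["overview"].apply(lambda text: text.split())

# Collapse multi-word names into single tokens.
for column in ["genres", "cast"]:
    toy[column] = toy[column].apply(lambda names: [n.replace(" ", "") for n in names])

# Concatenate the lists, then join into one lowercase string per movie.
combined = toy["overview"] + toy["genres"] + toy["cast"]
toy["tags"] = combined.apply(lambda words: " ".join(words).lower())

print(toy.loc[0, "tags"])
```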

  11. Stem the tags so that different forms of a word share one token, then vectorize them and compute the similarity between movies:

     ps = PorterStemmer()

     def stem(text):
         y = []
         for i in text.split():
             y.append(ps.stem(i))
         return " ".join(y)

     # Stem before vectorizing, so that the stemmed words are the ones counted
     new_df['tags'] = new_df['tags'].apply(stem)

     # Bag-of-words counts, limited to 5000 features and ignoring English stop words
     cv = CountVectorizer(max_features=5000, stop_words='english')
     vectors = cv.fit_transform(new_df['tags']).toarray()
     cv.get_feature_names_out()

     # Pairwise cosine similarity between all movies
     similarity = cosine_similarity(vectors)

     # The five movies most similar to the first one
     # (the top-ranked entry is normally the movie itself, so it is skipped)
     sorted(list(enumerate(similarity[0])), reverse=True, key=lambda x: x[1])[1:6]
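To show what the vectorizing and similarity steps compute, here is a standard-library sketch of bag-of-words counts and cosine similarity on three invented tag strings. It is a simplification and not how scikit-learn is implemented: there is no stop-word removal, no feature cap, and tokens are split on whitespace only. The helper names and the sample tags are assumptions.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count how often each whitespace-separated word occurs."""
    return Counter(text.split())


def cosine(a, b):
    """Cosine similarity between two word-count dictionaries."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical tags for three movies.
tags = {
    "Movie A": "space soldier alien action",
    "Movie B": "space alien adventure",
    "Movie C": "romance paris love",
}
vectors = {title: bag_of_words(text) for title, text in tags.items()}

print(round(cosine(vectors["Movie A"], vectors["Movie B"]), 3))
print(cosine(vectors["Movie A"], vectors["Movie C"]))
```

Movies A and B share two of their words, so their score is well above zero, while A and C share none and score exactly zero. This is the signal the recommender ranks on.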
    
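     def recommend(movie):
         # Row of the requested title (raises IndexError if the title is not in new_df)
         movie_index = new_df[new_df['title'] == movie].index[0]
         distances = similarity[movie_index]
         # Sort by similarity, highest first, and skip the top entry (the movie itself)
         movie_list = sorted(list(enumerate(distances)), reverse=True, key=lambda x: x[1])[1:6]
         for i in movie_list:
             print(new_df.iloc[i[0]].title)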
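One possible variation, sketched on toy data: return the titles instead of printing them, and return an empty list when the title is unknown rather than raising an error. The function name `recommend_titles`, its parameters, and the toy similarity matrix are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Four hypothetical movies and a hand-written symmetric similarity matrix.
toy_df = pd.DataFrame({"title": ["Movie A", "Movie B", "Movie C", "Movie D"]})
toy_similarity = np.array([
    [1.0, 0.8, 0.1, 0.5],
    [0.8, 1.0, 0.2, 0.3],
    [0.1, 0.2, 1.0, 0.4],
    [0.5, 0.3, 0.4, 1.0],
])


def recommend_titles(movie, df, similarity, top_n=5):
    """Return up to top_n titles most similar to `movie`, or [] if it is unknown."""
    matches = df.index[df["title"] == movie]
    if len(matches) == 0:
        return []
    # Convert the index label to a row position, which is how `similarity` is indexed.
    position = df.index.get_loc(matches[0])
    scores = similarity[position]
    # Highest score first, excluding the movie itself by position.
    ranked = [i for i in np.argsort(scores)[::-1] if i != position]
    return [df.iloc[i]["title"] for i in ranked[:top_n]]


print(recommend_titles("Movie A", toy_df, toy_similarity, top_n=2))
print(recommend_titles("Not In The Data", toy_df, toy_similarity))
```

Excluding the movie by position rather than by slicing off the first result avoids dropping a different movie when two rows happen to have identical tags.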
    
     # Save the results; movie_dict.pkl and similarity.pkl are loaded by the front end
     with open('movies.pkl', 'wb') as f:
         pickle.dump(new_df, f)
     with open('similarity.pkl', 'wb') as f:
         pickle.dump(similarity, f)
     with open('movie_dict.pkl', 'wb') as f:
         pickle.dump(new_df.to_dict(), f)
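A minimal sketch of the save-and-load round trip, using a temporary directory so it does not touch any real files. The helper names `save_pickle` and `load_pickle` and the tiny dictionary are assumptions; the dictionary only mimics the shape that a DataFrame's to_dict() produces.

```python
import os
import pickle
import tempfile


def save_pickle(obj, path):
    """Write obj to path; the with-block closes the file even if dumping fails."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_pickle(path):
    """Read back an object written by save_pickle."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Same shape as DataFrame.to_dict(): {column: {row_index: value}}.
movie_dict = {"title": {0: "Movie A", 1: "Movie B"}}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "movie_dict.pkl")
    save_pickle(movie_dict, path)
    restored = load_pickle(path)

print(restored == movie_dict)
```

Pickle files can execute code when loaded, so only load files you created yourself.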
    

Last is the front end, built with a simple library, Streamlit:

     import pickle

     import pandas as pd
     import streamlit as st


     def recommend(movie):
         # Return the titles of the five movies most similar to the selected one
         movie_index = movies[movies['title'] == movie].index[0]
         distances = similarity[movie_index]
         movie_list = sorted(list(enumerate(distances)), reverse=True, key=lambda x: x[1])[1:6]

         recommended_movies = []
         for i in movie_list:
             recommended_movies.append(movies.iloc[i[0]].title)
         return recommended_movies


     # Load the files saved during model building
     with open('movie_dict.pkl', 'rb') as f:
         movies_dict = pickle.load(f)
     movies = pd.DataFrame(movies_dict)

     with open('similarity.pkl', 'rb') as f:
         similarity = pickle.load(f)

     st.title('Movie Recommendation System')

     selected_movie_name = st.selectbox(
         'Which movie do you want to watch?',
         movies['title'].values)

     if st.button('Recommend'):
         recommendations = recommend(selected_movie_name)
         for i in recommendations:
             st.write(i)