Movie Recommendation Model.

What is a Recommender System?

Recommender systems are software or algorithms that provide personalized recommendations based on user behaviour and preferences. They are used in various platforms to improve user experience and engagement. There are different types of recommender systems, including collaborative filtering, content-based filtering, hybrid systems, matrix factorization, deep learning-based systems, context-aware systems, and reinforcement learning-based systems. The effectiveness of these systems depends on data quality, algorithm choice, and their ability to handle scalability and new user/item scenarios.

Types of recommender systems:

Collaborative Filtering: Recommends items based on the preferences and behaviour of other users
Content-Based Filtering: Recommends items based on their characteristics and the user's historical preferences
Hybrid Recommender Systems: Combines collaborative and content-based filtering techniques
Matrix Factorization: Factors user-item interaction data to understand underlying patterns
Deep Learning-Based Recommenders: Uses neural networks to capture complex patterns in user behaviour and content
Context-Aware Recommenders: Considers additional contextual information for more relevant recommendations
Reinforcement Learning-Based Recommenders: Optimizes recommendation strategies using user feedback
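
To make the first entry concrete, here is a minimal sketch of user-based collaborative filtering. The ratings, user names, and helper functions are hypothetical and serve only as illustration; the project described below uses content-based filtering instead.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {movie: rating}); not taken from the TMDB dataset.
ratings = {
    "alice": {"Avatar": 5, "Titanic": 4, "Alien": 1},
    "bob": {"Avatar": 5, "Titanic": 5, "Inception": 4},
    "carol": {"Alien": 5, "Inception": 2},
}

def cosine(u, v):
    """Cosine similarity between two sparse rating dicts."""
    shared = set(u) & set(v)
    dot = sum(u[m] * v[m] for m in shared)
    norm_u = sqrt(sum(r * r for r in u.values()))
    norm_v = sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend_for(user):
    """Suggest movies the most similar other user rated but `user` has not."""
    others = [name for name in ratings if name != user]
    nearest = max(others, key=lambda name: cosine(ratings[user], ratings[name]))
    unseen = [m for m in ratings[nearest] if m not in ratings[user]]
    return nearest, unseen

print(recommend_for("alice"))
```

Here the recommendation comes from another user's behaviour, whereas a content-based system would compare the movies' own descriptions.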

Applications of recommender systems:

E-commerce
Streaming services
Social media

Effectiveness of recommender systems:

Depends on data quality, algorithm choice, and handling issues like the cold start problem and scalability

Project Flow

For now, let's focus on the data, pre-processing, and model building, followed by a simple website.

The dataset used is from Kaggle:

https://www.kaggle.com/datasets/tmdb/tmdb-movie-metadata

Let's do some simple coding:

Import necessary packages:
```
import ast      # parses the stringified lists stored in the CSV columns
import pickle

import numpy as np
import pandas as pd
import nltk
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
```
Read the dataset files, which are in CSV format:
```
 credits = pd.read_csv('tmdb_5000_credits.csv')
 movies = pd.read_csv('tmdb_5000_movies.csv')
```
To inspect the contents of a dataset, call `movies.head()` (here `movies` is the DataFrame name).

To make the data easier to work with, merge the two datasets on a common column, which here is `title`:

    movies = movies.merge(credits, on='title')

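
One thing to watch when merging on a title: if two different movies share the same title, each copy is paired with every other copy. The sketch below uses hypothetical toy frames (not the real CSV files) and assumes the two tables also share an id column, as `id` and `movie_id` do here.

```python
import pandas as pd

# Hypothetical toy frames standing in for the two CSV files.
movies_toy = pd.DataFrame({"id": [1, 2], "title": ["Batman", "Batman"]})
credits_toy = pd.DataFrame({"movie_id": [1, 2], "title": ["Batman", "Batman"]})

# Merging on a non-unique title pairs every "Batman" with every "Batman".
by_title = movies_toy.merge(credits_toy, on="title")

# Merging on the id columns keeps one row per movie.
by_id = movies_toy.merge(credits_toy, left_on="id", right_on="movie_id")

print(len(by_title))  # 4
print(len(by_id))     # 2
```

Comparing the row count before and after the merge is a quick way to check whether this has happened.
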
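Several columns store lists of dictionaries as strings. To read them in a better format, parse each string and keep only the names:

    import ast

    def convert(text):
        """Return the 'name' of every entry in a stringified list of dicts."""
        L = []
        for i in ast.literal_eval(text):
            L.append(i['name'])
        return L

    # Example: ast.literal_eval turns the raw string into a list of dicts
    ast.literal_eval('[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]')

Now use the function above to convert these columns into plain lists:

    movies['genres'] = movies['genres'].apply(convert)
    movies['keywords'] = movies['keywords'].apply(convert)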
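
The parsing step can be checked on a single value. This small sketch uses a shortened sample string and shows that the raw value is a plain `str` until it is parsed; the comparison with `json.loads` is an extra illustration and only holds when the text is also valid JSON, as this sample is.

```python
import ast
import json

# Shortened sample of a stringified genres value.
raw = '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]'

parsed = ast.literal_eval(raw)  # safely evaluates a Python literal
names = [entry["name"] for entry in parsed]

print(type(raw).__name__, "->", type(parsed).__name__)
print(names)

# For this sample the two parsers agree.
same_as_json = json.loads(raw) == parsed
print(same_as_json)
```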
        
Next, the cast column lists many people per movie, so to keep things simple we extract only the first three names:

    def convert3(text):
        """Return the names of the first three entries."""
        L = []
        counter = 0
        for i in ast.literal_eval(text):
            if counter < 3:
                L.append(i['name'])
            counter += 1
        return L

    # Apply convert3 to the raw strings; slicing with x[0:3] would only
    # return the first three characters of each string
    movies['cast'] = movies['cast'].apply(convert3)

From crew, we keep only the director(s):

    def fetch_director(text):
        """Return the names of all crew members whose job is 'Director'."""
        L = []
        for i in ast.literal_eval(text):
            if i['job'] == 'Director':
                L.append(i['name'])
        return L

    movies['crew'] = movies['crew'].apply(fetch_director)
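
As a quick check of the two helpers, the sketch below runs compact equivalents on hypothetical single-letter names. It also shows why the raw column must be parsed before slicing: slicing the unparsed string returns characters, not names.

```python
import ast

def convert3(text):
    """Names of the first three entries in a stringified list of dicts."""
    return [entry["name"] for entry in ast.literal_eval(text)[:3]]

def fetch_director(text):
    """Names of all entries whose job is 'Director'."""
    return [e["name"] for e in ast.literal_eval(text) if e["job"] == "Director"]

# Hypothetical raw values in the same shape as the cast and crew columns.
cast_raw = '[{"name": "A"}, {"name": "B"}, {"name": "C"}, {"name": "D"}]'
crew_raw = '[{"job": "Editor", "name": "E"}, {"job": "Director", "name": "F"}]'

print(convert3(cast_raw))        # ['A', 'B', 'C']
print(fetch_director(crew_raw))  # ['F']
print(cast_raw[0:3])             # [{"  (first three characters of the string)
```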
        
Removing the spaces inside each name, so that a multi-word name becomes a single token:

    movies['genres'] = movies['genres'].apply(lambda x: [i.replace(" ", "") for i in x])
    movies['keywords'] = movies['keywords'].apply(lambda x: [i.replace(" ", "") for i in x])
    movies['cast'] = movies['cast'].apply(lambda x: [i.replace(" ", "") for i in x])
    movies['crew'] = movies['crew'].apply(lambda x: [i.replace(" ", "") for i in x])
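
A small sketch of why the spaces are removed: once the tags are split into words, two different people who share a first name would otherwise contribute the same token. The `tokens` helper and the two sample lists are illustrative only.

```python
def tokens(names, collapse):
    """Lower-cased word tokens for a list of names, optionally removing inner spaces."""
    if collapse:
        names = [n.replace(" ", "") for n in names]
    return set(" ".join(names).lower().split())

movie_a = ["Sam Worthington"]
movie_b = ["Sam Mendes"]

print(tokens(movie_a, False) & tokens(movie_b, False))  # {'sam'}
print(tokens(movie_a, True) & tokens(movie_b, True))    # set()
```

With the spaces removed, the two movies no longer look related just because of a shared first name.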
        
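Creating tags: apart from the movie id and title, the remaining columns are merged into a single text field, so that each movie is described by one easy-to-read paragraph. The overview is a plain string, so it is first split into a list of words to allow concatenation with the list columns (missing overviews are treated as empty):

    movies['overview'] = movies['overview'].fillna('').apply(lambda x: x.split())
    movies['tags'] = movies['overview'] + movies['genres'] + movies['keywords'] + movies['cast'] + movies['crew']

    # .copy() avoids pandas' SettingWithCopyWarning on the assignments below
    new_df = movies[['movie_id', 'title', 'tags']].copy()

    # Join each list of words into one string, then lower-case it
    new_df['tags'] = new_df['tags'].apply(lambda x: " ".join(x))
    new_df['tags'] = new_df['tags'].apply(lambda x: x.lower())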
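
The tag construction can be traced for a single movie in plain Python. The values below are made up for illustration; the point is that a string cannot be added to a list directly, so the overview has to be split into words first.

```python
# Hypothetical values for one movie.
overview = "A marine explores a distant moon"
genres = ["Action", "ScienceFiction"]
cast = ["SamWorthington"]

# Adding a str to a list fails, which is why the overview is split first.
try:
    overview + genres
    concatenation_failed = False
except TypeError:
    concatenation_failed = True

tags = " ".join(overview.split() + genres + cast).lower()
print(concatenation_failed)
print(tags)
```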
        
Stemming: reduce each word to its stem so that variants such as "loved" and "loving" count as the same token. This has to happen before vectorizing, otherwise the vectors are built from the unstemmed text:

    ps = PorterStemmer()

    def stem(text):
        """Apply the Porter stemmer to every word in the text."""
        y = []
        for i in text.split():
            y.append(ps.stem(i))
        return " ".join(y)

    new_df['tags'] = new_df['tags'].apply(stem)

Vectorizing the tags and computing the similarity between every pair of movies:

    # Keep the 5000 most frequent words, ignoring English stop words
    cv = CountVectorizer(max_features=5000, stop_words='english')
    vectors = cv.fit_transform(new_df['tags']).toarray()
    cv.get_feature_names_out()  # inspect the vocabulary

    similarity = cosine_similarity(vectors)

    # The five movies most similar to the first movie (position 0 is the movie itself)
    sorted(list(enumerate(similarity[0])), reverse=True, key=lambda x: x[1])[1:6]
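
To see what the vectorizing and similarity steps compute, here is a hand-rolled sketch using only the standard library. It is not how scikit-learn implements them (there is no stop-word removal or feature cap here), and the three tiny "tag" strings are hypothetical.

```python
from collections import Counter
from math import sqrt

# Hypothetical, already pre-processed tag strings for three movies.
docs = {
    "A": "space action alien action",
    "B": "space alien romance",
    "C": "romance drama",
}

# Vocabulary: every distinct word, in a fixed (sorted) order.
vocab = sorted({word for text in docs.values() for word in text.split()})

def vectorize(text):
    """Bag-of-words vector: how often each vocabulary word occurs in the text."""
    counts = Counter(text.split())
    return [counts[word] for word in vocab]

def cosine(u, v):
    """Cosine of the angle between two count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

vectors = {name: vectorize(text) for name, text in docs.items()}

print(vocab)
print(vectors["A"])
print(round(cosine(vectors["A"], vectors["B"]), 3))
print(cosine(vectors["A"], vectors["C"]))
```

Movies A and B share two words and get a positive score, while A and C share none and score zero.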

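    def recommend(movie):
        """Print the five movies whose tags are most similar to `movie`."""
        movie_index = new_df[new_df['title'] == movie].index[0]
        distances = similarity[movie_index]
        # Sort by similarity, highest first, and skip position 0 (the movie itself)
        movie_list = sorted(list(enumerate(distances)), reverse=True, key=lambda x: x[1])[1:6]
        for i in movie_list:
            print(new_df.iloc[i[0]].title)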
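
The function above raises an `IndexError` for a title that is not in the data, and it relies on the movie itself sorting into position 0. A possible variant, sketched here on hypothetical toy lists rather than the real DataFrame, returns an empty list for unknown titles and excludes the query movie by index. The `top_n` parameter is an addition for illustration.

```python
# Hypothetical toy data: three titles and their pairwise similarity scores.
titles = ["Avatar", "Aliens", "Titanic"]
similarity = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]

def recommend(movie, top_n=5):
    """Return up to top_n titles most similar to `movie`, or [] if it is unknown."""
    if movie not in titles:
        return []
    idx = titles.index(movie)
    ranked = sorted(enumerate(similarity[idx]), key=lambda pair: pair[1], reverse=True)
    # Drop the query movie explicitly instead of assuming it is ranked first.
    return [titles[i] for i, _ in ranked if i != idx][:top_n]

print(recommend("Avatar"))   # ['Aliens', 'Titanic']
print(recommend("Unknown"))  # []
```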

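Saving the results for the front end:

    new_df['title'].values  # inspect the titles that the app will offer

    # 'with' closes each file as soon as the dump finishes
    with open('movies.pkl', 'wb') as f:
        pickle.dump(new_df, f)
    with open('similarity.pkl', 'wb') as f:
        pickle.dump(similarity, f)
    with open('movie_dict.pkl', 'wb') as f:
        pickle.dump(new_df.to_dict(), f)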
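
A minimal round-trip sketch of the save and load steps, using a temporary directory and a made-up dictionary in place of the real data. It only illustrates that what is dumped is what comes back; note that pickle files should only be loaded from sources you trust.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for new_df.to_dict().
movie_dict = {"title": {0: "Avatar", 1: "Titanic"}}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "movie_dict.pkl")
    with open(path, "wb") as fh:
        pickle.dump(movie_dict, fh)
    with open(path, "rb") as fh:
        restored = pickle.load(fh)

print(restored == movie_dict)  # True
```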

Last is the front end, built with the Streamlit library:

    import pickle

    import pandas as pd
    import streamlit as st

    # Load the artifacts saved by the model-building step
    with open('movie_dict.pkl', 'rb') as f:
        movies = pd.DataFrame(pickle.load(f))
    with open('similarity.pkl', 'rb') as f:
        similarity = pickle.load(f)

    def recommend(movie):
        """Return the titles of the five movies most similar to `movie`."""
        movie_index = movies[movies['title'] == movie].index[0]
        distances = similarity[movie_index]
        movie_list = sorted(list(enumerate(distances)), reverse=True, key=lambda x: x[1])[1:6]
        return [movies.iloc[i[0]].title for i in movie_list]

    st.title('Movie Recommendation System')

    selected_movie_name = st.selectbox(
        'Which movie do you want to watch?',
        movies['title'].values)

    if st.button('Recommend'):
        for title in recommend(selected_movie_name):
            st.write(title)
