Summarizing YouTube Comments

Feb 17, 2025

Summary

I built a lightweight web app that allows users to input a YouTube URL and receive a structured summary of viewer comments. The app scrapes top comments, formats them into a structured DataFrame, and sends them to an LLM via the Perplexity API to generate actionable insights for content creators.

Built with Python, Streamlit, and the youtube-comment-downloader package, this tool showcases rapid prototyping and effective LLM integration for real-world use cases.

Problem

Content creators struggle to quickly digest large volumes of feedback from YouTube comments. Can we build a tool that extracts, cleans, and summarizes comment sentiment into actionable suggestions?

Approach

1. Data Collection

We download the comments from YouTube using a package called youtube-comment-downloader.

import pandas as pd
from youtube_comment_downloader import YoutubeCommentDownloader, SORT_BY_POPULAR

def youtube_url_to_df(youtube_url):
    downloader = YoutubeCommentDownloader()
    comments = downloader.get_comments_from_url(youtube_url, sort_by=SORT_BY_POPULAR)

    # One list per field that youtube-comment-downloader returns for each comment
    all_comments_dict = {
        'cid': [],
        'text': [],
        'time': [],
        'author': [],
        'channel': [],
        'votes': [],
        'replies': [],
        'photo': [],
        'heart': [],
        'reply': [],
        'time_parsed': []
    }

    # Append each comment's fields to the corresponding lists
    for comment in comments:
        for key in all_comments_dict.keys():
            all_comments_dict[key].append(comment[key])

    comments_df = pd.DataFrame(all_comments_dict)
    return comments_df

We fetch the comments from the URL and load them into a pandas DataFrame. Each comment arrives as a dictionary with fields like text, time, author, and votes; we collect those fields into lists and store everything in a DataFrame that we'll use later for summarization.
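
A quick way to sanity-check the function (the URL below is a placeholder, not a real video):

# Hypothetical usage: substitute any public video URL with comments enabled
df = youtube_url_to_df("https://www.youtube.com/watch?v=VIDEO_ID")
print(df[['author', 'votes', 'text']].head())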

2. Summarization

For summarization, we use an OpenAI-style chat completion call. As a student, I have access to a free Pro plan for Perplexity, which gives me a $5 API credit every month, so in this implementation I use a Perplexity API key and their sonar-pro model.

from openai import OpenAI

def summarize_text(input_text, api_key, max_tokens=1024):  # default max_tokens is an assumption
    messages = [
        {"role": "system", "content": "You are an AI assistant focused on helping content creators to improve and grow their channels."},
        {"role": "user", "content": f"Summarize this text: {input_text} into five headers: 'why viewers watch', 'how to improve', 'how to improve content production', 'how to improve channel growth', 'how to improve engagement'."}
    ]
    # Reuse the OpenAI client, but point it at Perplexity's OpenAI-compatible endpoint
    client = OpenAI(api_key=api_key, base_url="https://api.perplexity.ai")
    response = client.chat.completions.create(
        model="sonar-pro",
        messages=messages,
        max_tokens=max_tokens,
        temperature=0.7
    )
    return response.choices[0].message.content

In this snippet, the messages list defines what goes into the chat completion call: a system message telling the model it is an AI assistant for content creators, followed by a user message asking it to summarize input_text under five headers. I then create the client with the OpenAI SDK, setting base_url to Perplexity's endpoint since that's the API key I'm using. Finally, I call the chat completions create method with the sonar-pro model and the messages defined above.
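
To illustrate how this might be called on its own (the comment text and key below are made-up stand-ins):

# Hypothetical standalone usage; replace the key with your own Perplexity API key
sample_comments = "Loved the editing! The intro felt too long. More tutorials please."
summary = summarize_text(sample_comments, api_key="pplx-...")
print(summary)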

3. Streamlit App

So far, this is simple Python code that one might write in a Jupyter notebook or a short script. Now we can put all of it into a Streamlit app. In our main function, we'll use st to refer to Streamlit methods and classes, and start with things like:

st.header("YouTube Comments Downloader and Summarizer")
st.write("Download and summarize comments from YouTube videos effortlessly.")

st.divider()

url_text = st.text_input("Enter YouTube URL")
api_key = st.text_input("Enter Perplexity API Key", type="password")

Here we start with a header and a short description of the page. We add a divider, then collect both url_text and api_key as text inputs. The user has to enter a YouTube URL and their own API key for summarization.

Now, we call the functions we defined above.

raw_df = youtube_url_to_df(url_text)
    
if raw_df is None or raw_df.empty:
    st.info('Please enter a valid YouTube link.')
else:
    download_df(raw_df, "Raw")
    
    st.subheader("Raw Comments")
    st.dataframe(raw_df[['text']].head(10))  # Show top 10 comments
    
    combined_comments = " ".join(raw_df['text'].tolist())
    
    if api_key:
        with st.spinner("Summarizing comments..."):
            summary = summarize_text(combined_comments, api_key)
            st.subheader("Summary of Comments")
            st.write(summary)
    else:
        st.warning("Please provide your API key to generate summaries.")

We first call the youtube_url_to_df function and check whether the DataFrame is empty. If it is, we prompt the user to enter a valid YouTube link. Otherwise, we offer the raw comments as a download, show the top 10 comments, join all the comment text into one string, and pass it to the summarize_text function.
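
The download_df helper called above isn't defined in these snippets; a minimal sketch of what it might look like, built on st.download_button (the CSV format, label, and file name here are assumptions):

def download_df(df, label):
    # Convert the DataFrame to CSV bytes and expose it behind a download button
    csv_bytes = df.to_csv(index=False).encode("utf-8")
    st.download_button(
        label=f"Download {label} Comments as CSV",
        data=csv_bytes,
        file_name=f"{label.lower()}_comments.csv",
        mime="text/csv"
    )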

Finally, we can run the app by calling streamlit run streamlit_app.py in the terminal, or host it online on Streamlit Community Cloud.
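
Assuming the snippets above live in a main() function inside streamlit_app.py, the standard entry-point guard ties it together:

if __name__ == "__main__":
    main()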

The code for this app can be found here, and the Streamlit app can be found here.