AIM: Detecting Fake News
In this blog post, I build a mini project on “Detecting Fake News” using machine learning.
What is Fake News?
A type of yellow journalism, fake news consists of news stories that may be hoaxes and are generally spread through social media and other online media. Such stories are often created to further or impose certain ideas, frequently in service of political agendas. They may contain false and/or exaggerated claims, may end up being viralized by algorithms, and may leave users trapped in a filter bubble.
Let’s start the implementation.
First, we need a dataset of fake news. You can download the dataset from HERE.
After that, we import the required Python libraries. You can see the code in the image below.
Fig. Importing library
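The exact import list is only shown in the image, so it is not reproduced here. A minimal sketch of the imports the later steps rely on, assuming pandas and scikit-learn are installed (the original image may import more or fewer names), could look like this:

```python
# Imports assumed for this walkthrough (pandas + scikit-learn); the original image may differ
import pandas as pd
import sklearn
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

# Print library versions so results can be compared across environments
print("pandas", pd.__version__, "| scikit-learn", sklearn.__version__)
```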
Then I load the dataset using pandas. In the image below, you can see that the dataset has 6335 rows and 4 columns.
Fig. Loading the dataset
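A sketch of the loading step, assuming the downloaded file is a CSV named `news.csv` with an unnamed id column followed by `title`, `text` and `label` columns (the file name and column names are assumptions, not stated in the text). So that the snippet runs without the download, it reads a tiny inline sample through `io.StringIO`; `pd.read_csv` accepts either a path or a file-like object.

```python
import io
import pandas as pd

# Hypothetical inline sample standing in for "news.csv".
# The empty first header becomes the column name "Unnamed: 0" in pandas.
SAMPLE_CSV = """,title,text,label
8476,Story A,Aliens built the pyramids says anonymous blog,FAKE
10294,Story B,Parliament passed the annual budget on Tuesday,REAL
3608,Story C,Miracle fruit cures every disease overnight,FAKE
10142,Story D,Central bank keeps interest rates unchanged,REAL
"""


def load_news(source=None):
    """Read the news CSV; fall back to the inline sample when no path is given."""
    if source is None:
        source = io.StringIO(SAMPLE_CSV)
    return pd.read_csv(source)


df = load_news()   # with the real file (assumed name): df = load_news("news.csv")
print(df.shape)    # (4, 4) for the sample; the text reports (6335, 4) for the full dataset
```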
After that, I print the shape of the data and its first five records.
Fig. Print first five rows using head()
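The two calls behind this step are the `shape` attribute and `head()`. A minimal sketch on a hypothetical six-row DataFrame (the column names are assumptions, and the id column is left out to keep the sample short):

```python
import pandas as pd

# Hypothetical stand-in for the loaded dataset
df = pd.DataFrame({
    "title": ["Story A", "Story B", "Story C", "Story D", "Story E", "Story F"],
    "text": [
        "Aliens built the pyramids says anonymous blog",
        "Parliament passed the annual budget on Tuesday",
        "Miracle fruit cures every disease overnight",
        "Central bank keeps interest rates unchanged",
        "City council approves new bus routes",
        "Celebrity secretly replaced by a clone",
    ],
    "label": ["FAKE", "REAL", "FAKE", "REAL", "REAL", "FAKE"],
})

# shape is a (rows, columns) tuple
print(df.shape)   # (6, 3)
# head() returns the first five rows by default; head(n) returns the first n
print(df.head())
```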
I also print a summary of the dataset using info().
Fig. About the dataset
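`DataFrame.info()` prints the index range, each column's non-null count and dtype, and the memory usage. A minimal sketch on a hypothetical four-row DataFrame with one deliberately missing title; writing into a buffer instead of stdout is only done here so the summary can be inspected afterwards:

```python
import io
import pandas as pd

# Hypothetical stand-in for the loaded dataset, with one missing title
df = pd.DataFrame({
    "title": ["Story A", "Story B", "Story C", None],
    "text": ["Aliens built the pyramids", "Parliament passed the budget",
             "Miracle fruit cures disease", "Central bank holds rates"],
    "label": ["FAKE", "REAL", "FAKE", "REAL"],
})

# Capture the info() report in a string buffer, then print it
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()
print(summary)

# A per-column count of missing values complements info()
print(df.isnull().sum())
```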
Next, I get the labels from the DataFrame.
Fig. Get labels
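Getting the labels amounts to selecting the target column as a pandas Series. A minimal sketch, assuming the column is called `label` and holds the strings FAKE and REAL (the data below is hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for the loaded dataset
df = pd.DataFrame({
    "text": ["Aliens built the pyramids", "Parliament passed the budget",
             "Miracle fruit cures disease", "Central bank holds rates"],
    "label": ["FAKE", "REAL", "FAKE", "REAL"],
})

# Select the target column as a Series (df["label"] is equivalent)
labels = df.label
print(labels.head())

# Checking the class balance before training is a cheap sanity check
print(labels.value_counts())
```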
Here I convert the labels into numeric values using LabelEncoder, as shown in the image below.
Fig. Label Encoder
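A minimal sketch of the encoding step with scikit-learn's `LabelEncoder` on hypothetical labels. `LabelEncoder` sorts the class names, so FAKE becomes 0 and REAL becomes 1:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels
labels = pd.Series(["FAKE", "REAL", "FAKE", "REAL", "REAL"])

encoder = LabelEncoder()
# fit_transform learns the sorted set of classes and maps each label to its index
y = encoder.fit_transform(labels)
print(encoder.classes_)   # ['FAKE' 'REAL']  ->  FAKE = 0, REAL = 1
print(y)                  # [0 1 0 1 1]

# inverse_transform maps numeric predictions back to the original strings
print(encoder.inverse_transform([1, 0]))   # ['REAL' 'FAKE']
```

Keeping the fitted encoder around is useful later, because it turns the model's 0/1 predictions back into readable labels.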
After that, I split the dataset into training and testing sets.
Fig. Split the dataset into train and test set
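The split itself is done with scikit-learn's `train_test_split`. The test size and random seed used in the image are not given in the text, so `test_size=0.2`, `random_state=7` and `stratify` below are assumptions, shown on hypothetical toy data:

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 short texts with encoded labels (0 = FAKE, 1 = REAL)
texts = [f"article number {i}" for i in range(10)]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

# Assumed settings: hold out 20% for testing; random_state makes the split reproducible.
# stratify=y keeps the FAKE/REAL ratio the same in the training and test parts.
x_train, x_test, y_train, y_test = train_test_split(
    texts, y, test_size=0.2, random_state=7, stratify=y
)
print(len(x_train), len(x_test))   # 8 2
```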
Let’s initialize a TfidfVectorizer with English stop words and a maximum document frequency of 0.7 (terms that appear in more than 70% of the documents are discarded). Stop words are the most common words in a language, such as “the”, “is” and “and”, and they are filtered out before natural language data is processed. A TfidfVectorizer turns a collection of raw documents into a matrix of TF-IDF features, where a term’s weight grows with how often it occurs in a document and shrinks the more documents it appears in.
Now, fit and transform the vectorizer on the training set, and only transform the test set with it.
Fig. fit and transform the vectorizer on the train set
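The distinction between `fit_transform` and `transform` matters: the vocabulary and IDF weights must be learned from the training set only, and the test set is then mapped onto that same vocabulary. A minimal sketch on hypothetical texts; note that a test document made only of unseen words becomes an all-zero row:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical training and test texts
train_docs = [
    "aliens built the pyramids",
    "parliament passed the budget",
    "miracle fruit cures every disease",
    "central bank holds interest rates",
]
test_docs = ["parliament passed a new budget", "zebra quantum"]

tfidf_vectorizer = TfidfVectorizer(stop_words="english", max_df=0.7)
# fit_transform learns the vocabulary and IDF weights from the training set only
tfidf_train = tfidf_vectorizer.fit_transform(train_docs)
# transform reuses that vocabulary; words never seen in training are ignored
tfidf_test = tfidf_vectorizer.transform(test_docs)

print(tfidf_train.shape, tfidf_test.shape)  # same number of columns
print(tfidf_test[1].nnz)                    # 0: "zebra quantum" has no known terms
```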
Here I use LogisticRegression to classify each article as fake or real.
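A minimal sketch of training the classifier on TF-IDF features, using hypothetical texts and encoded labels. The `max_iter=1000` setting is an assumption added as a precaution against convergence warnings; the hyperparameters in the image are not stated in the text.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical training texts and encoded labels (0 = FAKE, 1 = REAL)
train_docs = [
    "aliens built the pyramids",
    "parliament passed the budget",
    "miracle fruit cures every disease",
    "central bank holds interest rates",
]
y_train = [0, 1, 0, 1]

tfidf_vectorizer = TfidfVectorizer(stop_words="english", max_df=0.7)
tfidf_train = tfidf_vectorizer.fit_transform(train_docs)

# Assumed setting: max_iter raised above the default of 100
model = LogisticRegression(max_iter=1000)
model.fit(tfidf_train, y_train)

# For two classes, the model learns one weight per TF-IDF feature plus an intercept
print(model.coef_.shape, model.intercept_)
```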
Then, we predict on the TF-IDF-transformed test set and calculate the accuracy with accuracy_score() from sklearn.metrics.
Here I get an accuracy score of 91.71%.
Fig. Find the accuracy
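A sketch of the prediction and scoring step on hypothetical toy data. Accuracy is simply the fraction of test articles whose predicted label matches the true one; the 91.71% above comes from the full dataset, and this tiny sample is not expected to reproduce it.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Hypothetical toy data (0 = FAKE, 1 = REAL)
train_docs = ["aliens built the pyramids", "parliament passed the budget",
              "miracle fruit cures every disease", "central bank holds interest rates"]
y_train = [0, 1, 0, 1]
test_docs = ["miracle cure found by aliens", "parliament debates interest rates"]
y_test = [0, 1]

tfidf_vectorizer = TfidfVectorizer(stop_words="english", max_df=0.7)
tfidf_train = tfidf_vectorizer.fit_transform(train_docs)
tfidf_test = tfidf_vectorizer.transform(test_docs)

model = LogisticRegression(max_iter=1000).fit(tfidf_train, y_train)
y_pred = model.predict(tfidf_test)

# Accuracy = correctly predicted test articles / all test articles
score = accuracy_score(y_test, y_pred)
print(f"Accuracy: {round(score * 100, 2)}%")

# The same quantity computed by hand
manual = np.mean(np.asarray(y_test) == y_pred)
print(manual)
```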
After that, I calculate the confusion matrix, which shows how many fake and real articles were classified correctly and how many were mixed up.
Fig. confusion matrix
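A minimal sketch of reading a confusion matrix with scikit-learn, on hypothetical encoded labels and predictions (0 = FAKE, 1 = REAL, following LabelEncoder's sorted order). Rows are the true classes and columns the predicted classes:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical encoded labels and predictions (0 = FAKE, 1 = REAL)
y_test = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

# Rows are true classes, columns are predicted classes, both ordered by `labels`
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print(cm)
# [[2 1]
#  [1 2]]

# Treating REAL (1) as the positive class, unpack the four cells
tn, fp, fn, tp = cm.ravel()
print(f"FAKE classified as FAKE: {tn}, FAKE classified as REAL: {fp}")
print(f"REAL classified as FAKE: {fn}, REAL classified as REAL: {tp}")
```

The diagonal holds the correct predictions, so accuracy can also be read off as (2 + 2) / 6 here.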
I hope this walkthrough helps you understand how to detect fake news with machine learning.