Data Exploration

Data Preprocessing

To carry out this stage efficiently, we first reviewed the data types obtained and identified the data levels present, since both inform the choice of appropriate exploration methods.

We began the preprocessing stage by trimming the collected, organized data: empty rows and negligible columns were dropped.

Rows with an empty value in the ‘Timestamp (DD/MM/YY H: M:S)’ column were found to be entirely blank, so they were treated as empty rows.

The columns dropped as negligible are:

[‘ID’, ‘Timestamp (DD/MM/YY H: M:S)’, ‘Tweet URL’, ‘Group’, ‘Collector’, ‘Category’, ‘Topic’, ‘Screenshot’, ‘Reviewer’, ‘Review’].

These columns were deemed negligible because their values serve to review or identify row entries, not to classify them. They therefore have no bearing on the research question.
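The two dropping steps described above can be sketched in pandas as follows. This is a minimal illustration, not the project's actual code: it assumes the data is loaded into a DataFrame whose headers match the column names listed above exactly, and that blank timestamp cells are read as missing values (NaN). The function name `preprocess`, the toy rows, and the ‘Tweet’ column are hypothetical and used only for demonstration.

```python
import pandas as pd

# Column names as listed in the text; assumed to match the file's headers exactly.
TIMESTAMP_COL = "Timestamp (DD/MM/YY H: M:S)"
DROPPED_COLS = [
    "ID", TIMESTAMP_COL, "Tweet URL", "Group", "Collector",
    "Category", "Topic", "Screenshot", "Reviewer", "Review",
]


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Drop blank rows, then drop the negligible columns (hypothetical helper)."""
    # A row with no timestamp is treated as a blank row.
    df = df.dropna(subset=[TIMESTAMP_COL])
    # errors="ignore" skips any listed column that is absent from the frame.
    df = df.drop(columns=DROPPED_COLS, errors="ignore")
    return df.reset_index(drop=True)


# Toy data for illustration only; "Tweet" is a hypothetical retained column.
raw = pd.DataFrame({
    "ID": [1, 2, None],
    TIMESTAMP_COL: ["01/02/22 10:00:00", "02/02/22 11:30:00", None],
    "Tweet URL": ["https://example.com/1", "https://example.com/2", None],
    "Tweet": ["first", "second", None],
    "Reviewer": ["A", "B", None],
})

clean = preprocess(raw)
print(clean)
```

The blank rows are removed before the columns because the timestamp column is itself in the dropped list; reversing the order would discard the only marker used to detect blank rows. If blank cells were stored as empty strings instead of NaN, they would need to be converted to missing values first.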

General Steps for Preprocessing