How to extract post data using Reddit’s API

Reddit is an absolute treasure trove of current information. This is particularly relevant to the natural language processing community, as users predominantly communicate with each other through text. Reddit provides an API to help developers build their apps. In this tutorial, I explain how to use this API; specifically, we’ll review the process of connecting to the API and extracting users’ posts.

Let’s Get Started

First, venture over to and create a new app.

Provide a name (e.g., Tutorial_1), description, and redirect URL for your application. Make sure to select the “script”…
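A minimal sketch of what connecting and extracting posts can look like with only Python’s standard library. It assumes the app was registered as a “script” app and uses Reddit’s OAuth2 password grant: a POST to https://www.reddit.com/api/v1/access_token with HTTP Basic auth built from the app’s client ID and secret, then requests to https://oauth.reddit.com carrying the returned bearer token and a descriptive User-Agent. The helper names (build_token_request, build_listing_request, extract_posts, fetch_json) and all credential values are hypothetical. This is not necessarily how the tutorial itself does it; a wrapper library such as PRAW is a common alternative. The request builders do not touch the network, so they can be checked offline.

```python
import base64
import json
import urllib.parse
import urllib.request

# Endpoints for Reddit's OAuth2 script-app flow (assumed, see lead-in).
TOKEN_URL = "https://www.reddit.com/api/v1/access_token"
API_BASE = "https://oauth.reddit.com"


def build_token_request(client_id, client_secret, username, password, user_agent):
    """Build (but do not send) the password-grant token request for a script app."""
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    body = urllib.parse.urlencode(
        {"grant_type": "password", "username": username, "password": password}
    ).encode("utf-8")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        method="POST",
        headers={
            # HTTP Basic auth: base64("client_id:client_secret")
            "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
            "User-Agent": user_agent,
        },
    )


def build_listing_request(token, subreddit, user_agent, limit=25):
    """Build a GET request for the newest posts in a subreddit."""
    query = urllib.parse.urlencode({"limit": limit})
    return urllib.request.Request(
        f"{API_BASE}/r/{subreddit}/new?{query}",
        headers={"Authorization": f"bearer {token}", "User-Agent": user_agent},
    )


def extract_posts(listing):
    """Keep a few fields from each post (kind "t3") in a Listing response."""
    return [
        {
            "id": child["data"]["id"],
            "title": child["data"]["title"],
            "selftext": child["data"].get("selftext", ""),
            "score": child["data"].get("score", 0),
        }
        for child in listing["data"]["children"]
        if child.get("kind") == "t3"
    ]


def fetch_json(request):
    """Send a request and decode its JSON body (needs network access)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With real credentials, the flow would be `token = fetch_json(build_token_request(...))["access_token"]` followed by `extract_posts(fetch_json(build_listing_request(token, "some_subreddit", "Tutorial_1/0.1")))`.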

Learn to grasp the mysteries of deep learning

Photo by Possessed Photography on Unsplash

Have you ever wondered how your smartphone can magically predict the words and phrases as you are typing a text message or email? The purpose of this article is to help demystify the complex machinery behind artificial intelligence algorithms we interact with every day.

Today’s state-of-the-art artificial intelligence algorithms use a process called deep learning. If you type “deep learning” into your favorite search engine, you will undoubtedly be presented with a variety of “web-like” diagrams like the ones seen below. We refer to these diagrams as “neural networks” because their structure is loosely inspired by the way neurons connect in the human brain. …
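To make those “web-like” diagrams a little more concrete, here is a minimal sketch of a forward pass through a tiny fully connected network in plain Python. Each “neuron” takes a weighted sum of its inputs, adds a bias, and passes the result through an activation function (sigmoid here). The network shape and weight values are arbitrary illustrations. Real deep learning frameworks learn the weights during training and vectorize this computation instead of looping.

```python
import math


def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense_layer(inputs, weights, biases):
    """Fully connected layer: each neuron sums weighted inputs plus a bias, then applies sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Pass the inputs through each (weights, biases) layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations


# Hypothetical weights for a 2-input -> 3-hidden -> 1-output network.
network = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),
    ([[0.7, -0.5, 0.2]], [0.05]),
]

output = forward([1.0, 2.0], network)
print(output)  # a single value between 0 and 1
```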

Various Implementations of transfer learning using Keras

Image by Author

It is not unusual to train complex neural network models for several days, weeks, or even months. Most of us do not have access to the vast amounts of data and computational resources needed to train those models successfully. That said, entities such as OpenAI, Facebook, and Google do possess the resources to develop very complex, accurate, and generalizable models. What’s more, they are willing to open-source these models for us mere mortals to use in our own research. The process of incorporating pre-trained models into your own model is known as “Transfer Learning”.

Topics Covered

In this tutorial, we’ll…

Teaching a computer to predict the next set of words in a sentence

Photo by Szabo Viktor on Unsplash

In this tutorial, we will walk through the process of building a deep learning model used to predict the next word(s) following a seed phrase. For example, we’ll ask the computer to predict the next 10 words after we have typed “The candidates are”.

Although cutting-edge models used in your smartphones to assist with sending text messages are vastly more complex, this article should give you a general idea of the methodology involved in this prediction (classification) task.

  1. Text Processing: tokenization, n-gram sequencing, engineering features and labels, and word embeddings
  2. Building a Bidirectional LSTM model
  3. Using our model to predict…
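The first step in the list above can be sketched in plain Python. The sketch tokenizes each line, builds every prefix of length two or more (the n-gram sequences), left-pads them to a common length, and splits each sequence into features (all but the last token) and a label (the last token, i.e. the “next word”). Keras tutorials often use `Tokenizer` and `pad_sequences` for this, and the article’s own code may differ. The function names and the two-line corpus are hypothetical. Label one-hot encoding and word embeddings would come afterwards and are not shown.

```python
from collections import Counter


def build_vocab(corpus):
    """Map each word to an integer id (1 = most frequent); 0 is reserved for padding."""
    counts = Counter(word for line in corpus for word in line.lower().split())
    ranked = sorted(counts, key=lambda w: (-counts[w], w))  # frequency, then alphabetical
    return {word: i + 1 for i, word in enumerate(ranked)}


def ngram_sequences(corpus, vocab):
    """For every line, emit each prefix of length >= 2 as a list of token ids."""
    sequences = []
    for line in corpus:
        ids = [vocab[w] for w in line.lower().split()]  # unknown words raise KeyError
        for end in range(2, len(ids) + 1):
            sequences.append(ids[:end])
    return sequences


def pad_pre(sequences):
    """Left-pad with 0 so every sequence has the same length."""
    max_len = max(len(s) for s in sequences)
    return [[0] * (max_len - len(s)) + s for s in sequences]


def features_and_labels(padded):
    """Features: all tokens except the last. Label: the last token (the next word)."""
    return [s[:-1] for s in padded], [s[-1] for s in padded]


corpus = ["The candidates are ready", "The debate is tonight"]
vocab = build_vocab(corpus)
X, y = features_and_labels(pad_pre(ngram_sequences(corpus, vocab)))
print(X)
print(y)  # [3, 2, 6, 4, 5, 7]
```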

How to check regression assumptions using Python

Image by Gordon Johnson from Pixabay


The dataset we’ll be using for this tutorial is from Kaggle’s “House Prices: Advanced Regression Techniques” competition, since I am currently working on a submission for it. As this is a regression problem, where we are tasked with predicting house prices, we need to check whether we have met all the major assumptions behind regression.

In general, if you have violated any of these assumptions, the results obtained from your model can be very misleading. Violations of some assumptions are much more serious than others, but in every case we should take great care to process our data correctly.
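One standard OLS assumption is that the residuals are independent of each other. Below is a minimal sketch of one common check, the Durbin-Watson statistic, for a single-predictor regression in plain Python. DW is the sum of squared differences between consecutive residuals divided by the sum of squared residuals. Values near 2 suggest no first-order autocorrelation, while values toward 0 or 4 suggest positive or negative autocorrelation. The data are made up, and this is not the article’s own code. In practice, statsmodels reports this statistic in its OLS summary and also exposes it as `statsmodels.stats.stattools.durbin_watson`.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(x, y, intercept, slope):
    """Observed minus fitted values."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def durbin_watson(resid):
    """sum((e_t - e_{t-1})^2) / sum(e_t^2); undefined if all residuals are zero."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den


# Hypothetical data, ordered as they were collected.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
b0, b1 = fit_simple_ols(x, y)
print(durbin_watson(residuals(x, y, b0, b1)))
```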

OLS Regression

Prior to checking…

Just some of the steps involved in prepping a dataset for analysis and machine learning.

Source: Image Created by Author

A Forbes survey found that the least enjoyable part of a data scientist’s job takes up 80% of their time: 20% is spent collecting data and another 60% cleaning and organizing datasets. Personally, I disagree with the notion that this 80% is the least enjoyable part of our jobs. I often see the task of data cleansing as an open-ended problem. Each dataset can typically be processed in hundreds of different ways depending on the problem at hand, and we can rarely apply the same set of analyses and transformations from one dataset to another. …

Analyze statistical rigor and latent factors to determine employee sentiment using Python

Image by mohamed Hassan from Pixabay

When analyzing employee sentiment data, which in our case is an employee exit survey, we have to look at four topics.

  1. Statistical rigor of the survey
  2. Demographic composition of survey respondents
  3. Overall sentiment for defined latent constructs
  4. Sentiment scores by respondents’ characteristics (e.g., gender, location, department)

First, following this methodology will enable us to determine how well our survey measures what it is meant to measure. Second, understanding who answered the survey in terms of respondent characteristics (e.g., gender, department) provides context for our analysis and results. Third, this methodology will help us…
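The first topic, statistical rigor, is often assessed in part through internal-consistency reliability. As an assumption (the excerpt does not say which statistics the article uses), here is a minimal sketch of Cronbach’s alpha for a set of items meant to measure one construct: α = k/(k − 1) · (1 − Σ item variances / variance of total scores), where k is the number of items. The function name and the 1-to-5 Likert responses are hypothetical. Using sample variance throughout gives the same ratio as population variance, because the n − 1 factor cancels.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for respondents' scores on k items (one inner list per respondent)."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    items = list(zip(*responses))  # transpose: one tuple of scores per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(r) for r in responses])  # variance of each respondent's total
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical responses: 5 respondents x 3 items on the same construct.
responses = [
    [4, 5, 4],
    [2, 2, 3],
    [3, 3, 3],
    [5, 4, 5],
    [1, 2, 1],
]
print(round(cronbach_alpha(responses), 3))  # 0.945
```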

Significant results are just the beginning.

Photo by Aleksandar Cvetanovic on Unsplash

Congratulations, your experiment has yielded significant results! You can be sure (well, 95% sure) that the independent variable influenced your dependent variable. I guess all you have left to do is write up your discussion and submit your results to a scholarly journal. Right…?

Obtaining significant results is a tremendous accomplishment in itself, but it does not tell the entire story behind your results. I want to take this time to discuss statistical significance, sample size, statistical power, and effect size, all of which have an enormous impact on how we interpret our results.
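To tie effect size, sample size, and power together, here is a minimal sketch in plain Python. Cohen’s d divides the difference in group means by the pooled sample standard deviation. Power for a two-sided, two-sample test with n participants per group is approximated with the normal distribution as Φ(|d|·√(n/2) − z₁₋α/₂) + Φ(−|d|·√(n/2) − z₁₋α/₂). This normal approximation is a simplification. Exact t-test power uses the noncentral t distribution and differs slightly for small samples. The function names and data are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def cohens_d(group1, group2):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / math.sqrt(pooled_var)


def approx_power(d, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample test with equal group sizes."""
    z = NormalDist()  # standard normal
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(d) * math.sqrt(n_per_group / 2)  # expected test statistic under the alternative
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# A "medium" effect (d = 0.5) with 64 participants per group.
print(approx_power(0.5, 64))  # about 0.81
```

The same effect size with a much smaller sample has far lower power, which is one reason a significant (or nonsignificant) result says little on its own.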

Significance (p = 0.05)

First and foremost, let’s discuss…

Aligning research design and statistical analyses

Image by PIRO4D from Pixabay

From the first day I sat in my undergraduate “Research Methods” course staring at SPSS output, I knew I had found my calling. I can still recall my first research paper: watching the completed surveys come in, diligently cleaning the data, and crossing my fingers in hopes of significant results. Even though my results came back nonsignificant, I knew I had found my passion.

Despite immersing myself in the topic, I found it particularly difficult to align research design with statistical analysis. As the terminology began to roll in (e.g., t-tests, ANOVA, effect size, IV, MANOVA, ANCOVA, regression, R²)…
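As a rough illustration of aligning design with analysis, here is a minimal sketch that maps a few between-subjects designs to the analyses named in the paragraph above. It is a simplified rule of thumb, not the article’s framework. It ignores within-subjects designs, assumption checks, and many other tests. The function name and its parameters are hypothetical.

```python
def suggest_test(iv_type, n_dvs=1, n_groups=2, has_covariate=False):
    """Rough mapping from a between-subjects design to a common analysis.

    iv_type: "categorical" (group membership) or "continuous" (a measured predictor).
    """
    if iv_type not in ("categorical", "continuous"):
        raise ValueError("iv_type must be 'categorical' or 'continuous'")
    if n_dvs > 1:
        # Several dependent variables compared across groups at once.
        if iv_type == "categorical" and not has_covariate:
            return "MANOVA"
        raise ValueError("this combination is not covered by this sketch")
    if iv_type == "continuous":
        return "regression (report R²)"
    if has_covariate:
        return "ANCOVA"  # group comparison adjusting for a covariate
    if n_groups == 2:
        return "independent-samples t-test"
    if n_groups >= 3:
        return "one-way ANOVA"
    raise ValueError("a categorical IV needs at least two groups")


print(suggest_test("categorical", n_groups=3))  # one-way ANOVA
```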

Let’s talk about correlations, Cronbach’s alpha and factor analysis

Photo by Glenn Carstens-Peters on Unsplash

The human resources industry relies heavily on a wide range of assessments to support its functions. In fact, to ensure unbiased and fair hiring practices, the U.S. Department of Labor maintains a set of guidelines (the Uniform Guidelines) to aid HR professionals in developing assessments.

Personality assessments are often used in selection batteries to determine cultural fit within a company. Cognitive ability (e.g., IQ) tests are consistently found to be the best overall predictor of job performance across all types and levels of jobs (Schmidt & Hunter, 1998). Structured interviews are used extensively in hiring decisions as they help…
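Predictive validity findings such as the one cited above are typically expressed as correlations between assessment scores and a job-performance criterion. Here is a minimal sketch of the Pearson correlation in plain Python. The function name and the scores are hypothetical, and published validity estimates such as Schmidt and Hunter’s involve further corrections that this sketch omits. On Python 3.10+, `statistics.correlation` computes the same coefficient.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical cognitive test scores and later supervisor performance ratings.
test_scores = [22, 30, 25, 35, 28, 19, 33]
performance = [3.1, 3.9, 3.0, 4.5, 3.6, 2.8, 4.0]
print(round(pearson_r(test_scores, performance), 2))
```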

Kamil Mysiak

Data Scientist | I/O Psychologist | Motorcycle Enthusiast | On a Search for my Personal Legend/
