NLP Project - Find Your Dream Job in Data Science
It is hard to believe that two thirds of the bootcamp has already passed. As graduation day gets closer, I have started to think about the job search.
It’s not an easy process. Through this NLP project, I hope to gain more insight into:
- Which industries have data science related postings?
- Who are the top employers?
- What are the key skill sets they are looking for?
- How can I efficiently find the jobs that fit me best?
Data Source and Size
I scraped around 900 active job postings from glassdoor.com using the keyword “Data Scientist” in the Seattle area. For each posting, I obtained the job description, the Glassdoor estimated salary, and several pieces of information about the employer, including rating, size, and revenue.
Modeling Steps
- Text Pre-processing:
- Retrieved all skill-related information and counted how frequently each skill (e.g., Python, R) appears in the job descriptions.
- Retrieved sentences that mention “year…experience” and extracted the required number of years of experience from them.
- Removed all non-alphabetic characters and stop words, and converted the text to lower case.
- Removed company names from the job descriptions.
- Removed common words, especially those in the section where employers talk about being an equal opportunity employer.
- Used TF-IDF with 2-3 word n-grams to tokenize the job descriptions cleaned by the previous steps.
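The extraction and cleaning steps above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the exact code behind the project: the skill list, the stop-word list, the regular expression, and the function names (`count_skills`, `min_years_experience`, `clean_text`) are all assumptions made for the example, and the sample postings are invented.

```python
import re
from collections import Counter

# Hypothetical skill list; the post only names Python and R as examples.
SKILLS = ["python", "r", "sql", "spark", "tableau"]

# Tiny illustrative stop-word list; a real run would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "with", "or",
              "is", "are", "we", "for", "at"}

# Matches phrases such as "3+ years of experience" or "5-7 years ... experience"
# within one sentence and captures the first (minimum) number of years.
YEARS_RE = re.compile(r"(\d+)\+?\s*(?:-\s*\d+\s*)?years?\b[^.]*?experience",
                      re.IGNORECASE)


def count_skills(descriptions, skills=SKILLS):
    """Count how many job descriptions mention each skill as a whole word."""
    counts = Counter()
    for text in descriptions:
        lowered = text.lower()
        for skill in skills:
            pattern = r"(?<!\w)" + re.escape(skill) + r"(?!\w)"
            if re.search(pattern, lowered):
                counts[skill] += 1
    return counts


def min_years_experience(text):
    """Return the first required number of years found, or None."""
    match = YEARS_RE.search(text)
    return int(match.group(1)) if match else None


def clean_text(text, company_names=()):
    """Lower-case, drop company names, non-alphabetic characters and stop words."""
    lowered = text.lower()
    for name in company_names:
        lowered = lowered.replace(name.lower(), " ")
    letters_only = re.sub(r"[^a-z]+", " ", lowered)
    return " ".join(w for w in letters_only.split() if w not in STOP_WORDS)


postings = [
    "Acme Corp is hiring! 3+ years of experience with Python and SQL required.",
    "We need 5-7 years of relevant experience in R or Python.",
]
print(count_skills(postings))
print([min_years_experience(p) for p in postings])
print(clean_text(postings[0], company_names=["Acme Corp"]))
```

Note that skills have to be counted before the cleaning step, since stripping non-alphabetic characters would destroy names with symbols or digits. A single-letter skill like "R" is also easy to over-count (for example in "R&D"), so the counts for it deserve a manual check.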
- Topic Modeling:
- Initially I had trouble identifying clear topics. When I used LSA to split the full dataset into two topics, I found that one was related to medical laboratory research, while the other was more related to data science. The scatter plot below uses the two LSA topic scores as the x and y axes. I therefore removed the datapoints related to medical laboratory research (around 100 datapoints). After that, the topics became much clearer.

- I tried a few techniques, including LDA, LSA, and NMF. The one that gave me the best result was a 4-topic split using NMF. I have also created a Tableau storybook, shown under Findings, with the results of the exploratory data analysis and the topic modeling.
Findings
Here is the link to my Tableau storybook in case you have trouble viewing it.
- Recommendation System
I read in my resume in plain text format, applied the same text processing as for the job descriptions, and then calculated the cosine similarity between my resume and every job description. Below are the postings with the highest similarity scores.
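The ranking idea can be sketched as follows. To keep the example dependency-free it uses a small hand-rolled TF-IDF over 2-3 word n-grams rather than a library vectorizer, so it is an illustration of the approach and not the project's code. The function names (`ngrams`, `tfidf_vectors`, `cosine`, `rank_jobs`), the smoothed IDF formula, the choice to fit the IDF weights on the resume together with the postings, and the sample texts are all assumptions.

```python
import math
from collections import Counter


def ngrams(text, n_min=2, n_max=3):
    """Split cleaned text into word n-grams of length n_min..n_max."""
    words = text.split()
    return [" ".join(words[i:i + n])
            for n in range(n_min, n_max + 1)
            for i in range(len(words) - n + 1)]


def tfidf_vectors(docs):
    """Return one sparse {ngram: weight} TF-IDF vector per document."""
    term_counts = [Counter(ngrams(d)) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for counts in term_counts for term in counts)
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1
           for t, df in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in counts.items()}
            for counts in term_counts]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_jobs(resume, job_descriptions, top_k=3):
    """Return (score, job_index) pairs sorted by similarity to the resume."""
    vectors = tfidf_vectors([resume] + list(job_descriptions))
    resume_vec, job_vecs = vectors[0], vectors[1:]
    scores = [(cosine(resume_vec, jv), i) for i, jv in enumerate(job_vecs)]
    return sorted(scores, reverse=True)[:top_k]


# Invented, already-cleaned example texts.
resume = "machine learning python sql data analysis"
jobs = [
    "clinical laboratory research assay testing",
    "machine learning python sql modeling",
    "python sql dashboards reporting",
]
for score, idx in rank_jobs(resume, jobs):
    print(f"{score:.3f}  {jobs[idx]}")
```

Because only n-grams of two or three words are compared, a posting has to share actual phrases with the resume to score above zero. That makes the ranking stricter than a single-word comparison would be.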

Future Work
In the future, I would like to:
- Do more exploratory data analysis
- Build a UI for the job recommendation system
- Refine the recommendation system by incorporating users' application history
