The Auto-disputer: Tellor's single most important barrier protecting customers and their funds from hacks and attacks.
link to source code
Why did I build this?
When I worked at Tellor, we reached a point where we were adding lots of data to our on-chain database but not monitoring its quality. Our customers relied on the accuracy of that data because it secured their smart contracts and customer funds.
What did I build?
The founders of Tellor tasked me with building a Python CLI tool that would evaluate the accuracy of on-chain data in real time, as it was added. The tool was required to send text alerts to a user-provided phone number whenever it detected bad data, and to remove bad data in real time by sending an Ethereum transaction. It was originally called the "Disputable Values Monitor", but we renamed it the "Auto-disputer".
How does it work?
First, the Auto-disputer uses the web3.py library and an RPC connection to an Ethereum node to retrieve new data from the Tellor contract in real time. When the Auto-disputer picks up a new datapoint, it compares the reported value to a reference calculation for that datapoint.
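A minimal polling sketch of this first step, assuming a `NewReport`-style event and placeholder RPC endpoint, contract address, and ABI (the Auto-disputer's actual event names and structure may differ):

```python
import time
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("https://your-eth-node.example/rpc"))  # placeholder RPC URL
oracle = w3.eth.contract(address="0x...", abi=ORACLE_ABI)  # Tellor oracle address + ABI (assumed)

# Subscribe to new reports as they land on-chain
# (keyword is `from_block` in web3.py v7; older releases use `fromBlock`)
new_reports = oracle.events.NewReport.create_filter(from_block="latest")

while True:
    for event in new_reports.get_new_entries():
        reported_value = event["args"]["_value"]  # hypothetical event field
        # ...compare reported_value against a reference calculation here...
    time.sleep(10)  # poll roughly once per block
```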
In order to auto-dispute, users need to define what a "disputable value" is. To do this, users set "thresholds" for the feeds they want to monitor; a threshold sets the cutoff between a healthy value and a disputable one. The Auto-disputer supports three types of thresholds: range, percentage, and equality.
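The threshold check itself reduces to a simple comparison. A sketch of the three threshold types as described above, not the Auto-disputer's actual implementation:

```python
def is_disputable(reported, reference, threshold_type, amount=None):
    """Return True if `reported` crosses the user-configured cutoff."""
    if threshold_type == "range":        # absolute difference exceeds `amount`
        return abs(reported - reference) > amount
    if threshold_type == "percentage":   # relative difference exceeds `amount`
        return abs(reported - reference) / abs(reference) > amount
    if threshold_type == "equality":     # any mismatch at all is disputable
        return reported != reference
    raise ValueError(f"unknown threshold type: {threshold_type}")
```

If a new value crosses its feed's threshold, the tool fires the text alert and submits the dispute transaction.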
What were the results?
On completion, the Auto-disputer immediately found users. Tellor customers Liquity and Liquid Loans began running their own instances of the Auto-disputer to protect their protocols and their users' funds. In other words, the Auto-disputer became the single most important barrier protecting Tellor's customers and their funds from hacks and attacks.
County-Level Political Analysis of Coronavirus-Related Tweets
Click here for interactive Plotly Dash app
Why This Project?
This project aims to help mitigate the spread of Coronavirus by giving relief organizations a firmer understanding of how political geography shapes individuals' reactions to crisis.
By uncovering the keywords that differentiate Twitter users in Red counties from those in Blue counties (based on 2016 Presidential Election results), this model aims to help public health workers address the needs of individuals based on their political geography.
Results
This model predicted the political leaning of a tweet's U.S. county with 65% accuracy, compared to a baseline accuracy of 50% (artificially balanced classes).
Interpreting The Graph
Submit a tweet, and a graph will appear displaying the keywords the model believes predict the political affiliation of the user's county.
Greatest Challenges
The greatest challenge for me in this project was picking a problem framework with a solution that would be both interpretable and robust. One framework I experimented with asked, "Can I predict the specific county a tweet comes from?" The advantage of this framework was that I could discover more nuanced regional trends on a map, beyond political affiliation.
However, I found that this approach lent itself to predicting only the top metropolitan areas of the United States. This consistent prediction arose because most tweets in the U.S. come from major metropolitan areas. Even after oversampling rural counties (which was painstaking, considering most counties in my dataset produced only 1-5 tweets), the model predicted the same classes each time; now the predicted counties were in Wisconsin, Michigan, Iowa, etc.
Because of these inaccuracies, I opted for a simpler model that predicts only the political affiliation of the tweet's home county, based on the 2016 Presidential Election results. While this model doesn't reveal patterns in regional geography, it does reveal patterns in political geography.
How Was This Data Sourced?
This model was trained on 30,000 tweets posted during the week of March 30, 2020. They were sourced using Twitter's official API and the Tweepy Python library.
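A minimal sketch of how such a pull might look with Tweepy, assuming v4's API, placeholder credentials, and a hypothetical query term (not the project's exact collection script):

```python
import tweepy

auth = tweepy.OAuth1UserHandler("API_KEY", "API_SECRET", "ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

# Search recent tweets mentioning the virus; Tweepy 4.x names this
# method `search_tweets` (older 3.x releases exposed `api.search`)
tweets = [
    status._json
    for status in tweepy.Cursor(api.search_tweets, q="coronavirus", lang="en").items(1000)
]
```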
Spotify Song Recommender
Click here for interactive streamlit app
Main Objective: Spotify Recommendation Engine
The main idea behind this project was to create a web application that recommends music based on a given song. Such an application simulates a common business problem: recommending a product to a customer. In the next section, I explain how my team and I solved this problem in this particular use case.
Spotify API
My team and I accessed the Spotify API through a Python library called Spotipy. With Spotipy, we queried 160,000 songs and their attributes (acousticness, danceability, energy, instrumentalness, key, liveness, loudness, mode, speechiness, time signature, and valence -- note: these are measured by Spotify). Using this database, we trained a Nearest Neighbors model to return the most similar song from our database based on song attributes. On the front end, we displayed visualizations of a given recommended song's attributes and the percent difference of each attribute vs the original song.
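A condensed sketch of those two pieces, assuming placeholder credentials and a prebuilt `attributes` matrix and `query_song_vector` from the 160,000-song pull (not our exact code):

```python
import numpy as np
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
from sklearn.neighbors import NearestNeighbors

sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(
    client_id="CLIENT_ID", client_secret="CLIENT_SECRET"))

# Pull the audio attributes Spotify computes for a track
features = sp.audio_features(["TRACK_ID"])[0]  # dict: acousticness, danceability, ...

# attributes: (160_000, n_features) array built from queries like the one above
nn = NearestNeighbors(n_neighbors=2).fit(attributes)
_, idx = nn.kneighbors(np.array([query_song_vector]))
recommendation = idx[0][1]  # idx[0][0] is the query song itself
```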
Predicting Philosophy Authorship -- Michel Foucault vs Noam Chomsky
Click for Interactive app
Main Objective: Authorship Prediction
To search for differences in the writing styles and topics of philosophers Michel Foucault and Noam Chomsky, who famously debated in 1971, I built an NLP project that predicts the authorship of a selected passage or a famous quote by either author. The model was trained on three full books by each author (credit to the Internet Archive for sourcing the text files). I split the six books into 5,725 five-sentence samples and trained a Naive Bayes classifier, which predicted the author of each sample with 98 percent accuracy.
Preprocessing
Before training the Naive Bayes classifier, I first tokenized each sample using gensim's thorough preprocess_string method, which eliminates stop words, numerics, and symbols, among other tasks. Then I ran a TF-IDF vectorizer, which weights each word's frequency in a sample by how rare that word is across the corpus. Finally, I added a RandomUnderSampler from the imbalanced-learn library to ensure that the model wouldn't be biased toward one author or the other in deployment.
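A sketch of that pipeline, assuming `samples` and `labels` hold the 5,725 five-sentence samples and their authors (not the project's exact code):

```python
from gensim.parsing.preprocessing import preprocess_string
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from imblearn.under_sampling import RandomUnderSampler

# gensim's preprocess_string strips stop words, numerics, symbols, etc.
docs = [" ".join(preprocess_string(s)) for s in samples]

X = TfidfVectorizer().fit_transform(docs)          # sparse TF-IDF matrix
X_bal, y_bal = RandomUnderSampler(random_state=0).fit_resample(X, labels)
clf = MultinomialNB().fit(X_bal, y_bal)            # the authorship classifier
```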
Second Objective: Locate Similar Passages
The second objective of this project was to locate the most similar passage from the six-book collection given any text sample. The model uses a NearestNeighbors algorithm to return the most similar five-sentence sample. Similarity is measured as the Euclidean distance between two 2-dimensional vectors, one representing the input text and one representing each candidate passage. I converted text samples into these vectors using spaCy's pretrained word vectors, reduced to two dimensions with PCA.
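A sketch of that lookup, assuming `passages` holds the 5,725 samples and a spaCy model that ships with pretrained vectors, such as en_core_web_md (not the project's exact code):

```python
import spacy
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

nlp = spacy.load("en_core_web_md")            # model with pretrained word vectors
vecs = [nlp(p).vector for p in passages]      # spaCy averages token vectors per doc

pca = PCA(n_components=2).fit(vecs)
coords = pca.transform(vecs)                  # each passage as a 2-D point

nn = NearestNeighbors(n_neighbors=1, metric="euclidean").fit(coords)
query = pca.transform([nlp("Power is productive, not merely repressive.").vector])
_, idx = nn.kneighbors(query)
print(passages[idx[0][0]])                    # closest five-sentence passage
```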
The Age of Anxiety in New York: Data Visualizations for Public Health
Introduction
Homelessness, inequality, poverty: anyone who has walked around Manhattan in 2019 for more than 30 minutes has encountered these social issues on the streets of the city where we once believed the American Dream was most alive. Everyone will agree that they are deeply upsetting, and almost everyone will agree that they feel powerless in the face of this injustice.
On a national scale, too, Americans will agree that depression is a large, close-to-home problem in the United States. Half of Americans have been personally affected by suicide in some way, showing that no one is too far removed from the ramifications of poor mental health.
In urban domains, public health officials often approach physical ailments from a human geography perspective, examining correlations between socio-economics, demographics, and health indicators such as exercise and addiction. This approach raises a similar but more novel question: how does public health influence *mental* health? Could certain attributes of a neighborhood's health (role models, poverty, community inclusivity) predict the mental health of its constituents?
Key Question
How does *public* health influence *mental* health?
Dataset
The New York City Department of Public Health's Community Health Survey (CHS) is a random telephone survey that collects regional data on mental and physical health, household economics, and community strength across the region's racial, social, and economic diversity.
CHS asks each respondent the eight standardized questions that make up the Patient Health Questionnaire-8 (PHQ-8), a clinically standard diagnostic tool for Major Depressive Disorder. An example form is attached below.
![Patient Health Questionnaire (PHQ-8) scoring and interpretation](https://cdn-images-1.medium.com/max/2000/1JuO46avwLMsw5m-G1qql_A.png)
*Source: "A population-based study of edentulism in the US: Does depression and rural residency matter after controlling for potential confounders?", Scientific Figure on ResearchGate. Available from: https://www.researchgate.net/figure/Patient-health-questionnaire-PHQ-8-scoring-and-interpretation-with-BRFSS-response_tbl1_259875720 [accessed 5 Aug, 2019]*
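For concreteness, a sketch of the standard PHQ-8 scoring the figure describes: eight items, each scored 0-3, summed to a 0-24 total with the instrument's conventional severity cutpoints (this reflects the published scoring, not code from the survey itself):

```python
def phq8_severity(responses):
    """responses: eight items scored 0 ("not at all") to 3 ("nearly every day")."""
    assert len(responses) == 8 and all(0 <= r <= 3 for r in responses)
    score = sum(responses)                 # total ranges 0-24
    if score >= 20: return "severe"
    if score >= 15: return "moderately severe"
    if score >= 10: return "moderate"      # >= 10 is the usual depression cutoff
    if score >= 5:  return "mild"
    return "none"
```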
What makes CHS unique is that, in addition to administering the PHQ-8, the survey asks New Yorkers to consider the quality of life in their neighborhood.
Analysis
In order to evaluate the relationship between neighborhood health and individual health, I first examined the prevalence of depression severity among those who answered the following questions:
How Would You Describe Your Food Security Situation?
About 10 percent of respondents who reported not having enough to eat in their household also qualified for severe depression. More than half of those reporting this level of food insecurity also qualified for depression of any severity.
Have You Experienced Discriminatory Healthcare Treatment in the Past 12 Months?
A similar pattern emerges with discriminatory healthcare treatment. About 5 percent of respondents who reported experiencing discriminatory healthcare treatment also reported severe depression, and about half of those who reported yes qualified for a depression diagnosis of any severity.
Conclusion
Depression is a very serious disease, oftentimes crippling those who suffer from it. Furthermore, the risk of suicide can destroy families and livelihoods.
Many assume that the distribution of depression in the American population is random or genetic. The results of this analysis point to a different hypothesis: while this analysis cannot assert that food insecurity and healthcare discrimination *cause* depression, it does point to a correlation between depression and environmental factors such as food insecurity and discriminatory healthcare treatment.
About 50 percent of New York City's food-insecure citizens are depressed, and about 10 percent are severely depressed. I hypothesize that an inability to secure one's basic needs facilitates depression and a sense of hopelessness.
The prevalence of severe depression among the food-insecure is simply disproportionate, which suggests that mental health can be a justice issue, just as obesity and smoking are.