# Tidyverse
Explore tagged Tumblr posts
assignmentoc · 3 days ago
Text
instagram
0 notes
quixoticanarchy · 1 year ago
Text
data science is so funny it just sounds so overwhelmingly fake. "hiii we have tuples and tibbles and packages from the tidyverse" whoever was coming up w these words would've done better designing technobabble jargon for a low budget children's sci fi show
38 notes · View notes
catchascatchcn · 17 days ago
Text
every time i forget that filter() isn't a base r function and i run it without librarying tidyverse and subsequently ruin my dataset an angel loses its wings. i understand that this is meaningless to everyone
0 notes
splicejunction · 1 month ago
Text
spreadsheet situation so diabolical i had to pull out the library(tidyverse)
16 notes · View notes
guardian-of-local-forest · 4 months ago
Text
my supervisor telling me about tidyverse: you probably use it and know it really well. i cant get the hang of it. i feel like a dinosaur. it is super modern and is supposed to be intuitive, but base R is so much better for me.
me, who only ever made one (1) ggplot: *cricket noises*
me later when i want to count mean for group of data using dplyr: HOW IS THIS INTUITIVE
9 notes · View notes
aorish · 5 months ago
Text
the tidyverse markup language
8 notes · View notes
regina-bithyniae · 1 year ago
Text
bad %>% good
Used to strongly dislike coding in R, but after about a year of using it most of the time (and using only it since September) I've come to appreciate it. Having ChatGPT as a coding+learning tool is huge, though I still think Python has a legibility advantage for programming noobs.
But for data analysis, the Tidyverse model of piping step 1 to 2 to 3 in one go feels so much more natural than it ever did in Python. Can't believe you had to wrap `function_3(function_2(function_1(df)))` to chain steps, that was barbaric!
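For what it's worth, pandas can approximate the pipe style with `DataFrame.pipe`; a minimal sketch with hypothetical step functions (`add_total`, `keep_large` are made up for illustration):

```python
import pandas as pd

def add_total(df):
    # hypothetical step 1: add a derived column
    return df.assign(total=df["a"] + df["b"])

def keep_large(df):
    # hypothetical step 2: filter rows
    return df[df["total"] > 4]

df = pd.DataFrame({"a": [1, 2, 3], "b": [1, 2, 3]})

# nested style the post complains about
nested = keep_large(add_total(df))

# pipe style, closer to tidyverse's `df %>% step1 %>% step2`
piped = df.pipe(add_total).pipe(keep_large)

assert nested.equals(piped)
```

Still clunkier than `%>%`, since each step has to be a named function or a lambda, but it avoids the inside-out nesting.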
21 notes · View notes
onemanscienceband · 9 months ago
Text
when i'm in the middle of a programming headache i sometimes return to the question of why do libraries like tidyverse or pandas give me so many headaches
and the best I can crystallize it is like this: the more you simplify something, the more restricted you make its use case, which means any time you want to deviate from the expected use case, it's immediately way harder than it seemed at the start.
like. i have a list and I want to turn it into a dataframe. that's easy, as long as each item in the list represents a row for the dataframe*, you're fine. but what if you want to set a data type for each column at creation time? there's an interface for passing a list of strings to be interpreted as column names, but can I pass a list of types as well?
as far as I can tell, no! there might have been one back in the early 2010s, but that was removed a long time ago.
so instead of doing:

```python
newFrame = pd.DataFrame(dataIn, columns=colNames, types=typeNames)
```

I have to do this:

```python
newFrame = pd.DataFrame(dataIn, columns=colNames)
for colName in ["col1", "col2", "col3"]:
    newFrame[colName] = newFrame[colName].astype("category")
```
like, the expected use case is just "pass it the right data from the start", not "customize the data as you construct the frame". it'd be so much cleaner to just pass it the types I want at creation time! but now I have to do this in-place patch after the fact, because I'm trying to do something the designers decided they didn't want to let the user do with the default constructor.
*oh you BET I've had the headache of trying to build a dataframe column-wise this way
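One slightly tidier post-hoc workaround, for what it's worth: not the wished-for constructor argument, but `astype` does accept a column-to-dtype dict, so the mapping at least lives in one place (data and names below are made up for illustration):

```python
import pandas as pd

data_in = [["a", 1], ["b", 2]]          # hypothetical row-wise input
col_names = ["label", "value"]

# still two steps, but no explicit loop over columns
frame = pd.DataFrame(data_in, columns=col_names).astype(
    {"label": "category", "value": "float64"}
)
```

It's the same in-place patch under the hood, just less ceremony.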
4 notes · View notes
bisquid · 1 year ago
Text
Yeah I've done a bachelor's degree and a master's in STEM (zoology and ecology respectively) and we got,,,, very little in the way of actual statistical training. Iirc there was a week long course on how to use R and tidyverse during my master's, but undergrad? Maybe a module in first year, but again that was focused on how to DO statistics, rather than how to UNDERSTAND statistics, or how to choose an appropriate test.
Most of my statistics knowledge has come from a) my dissertation, which ended up being entirely data based, using other people's data (something that made it Very Clear how much you never want to use other people's data if you like your sanity) and then b) self study.
Tumblr media
one of the best academic paper titles
117K notes · View notes
nschool · 7 days ago
Text
The Best Open-Source Tools for Data Science in 2025
Tumblr media
Data science in 2025 is thriving, driven by a robust ecosystem of open-source tools that empower professionals to extract insights, build predictive models, and deploy data-driven solutions at scale. This year, the landscape is more dynamic than ever, with established favorites and emerging contenders shaping how data scientists work. Here’s an in-depth look at the best open-source tools that are defining data science in 2025.
1. Python: The Universal Language of Data Science
Python remains the cornerstone of data science. Its intuitive syntax, extensive libraries, and active community make it the go-to language for everything from data wrangling to deep learning. Libraries such as NumPy and Pandas streamline numerical computations and data manipulation, while scikit-learn is the gold standard for classical machine learning tasks.
NumPy: Efficient array operations and mathematical functions.
Pandas: Powerful data structures (DataFrames) for cleaning, transforming, and analyzing structured data.
scikit-learn: Comprehensive suite for classification, regression, clustering, and model evaluation.
Python’s popularity is reflected in the 2025 Stack Overflow Developer Survey, with 53% of developers using it for data projects.
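A toy sketch of the vectorized style the NumPy bullet describes, with illustrative numbers (nothing from the article): one expression operates on a whole array, no explicit loop.

```python
import numpy as np

scores = np.array([90.0, 85.0, 88.0, 92.0])

# standardize: subtract the mean, divide by the standard deviation,
# all element-wise across the array in one shot
normalized = (scores - scores.mean()) / scores.std()
```

The result has mean 0 and standard deviation 1 by construction.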
2. R and RStudio: Statistical Powerhouses
R continues to shine in academia and industries where statistical rigor is paramount. The RStudio IDE enhances productivity with features for scripting, debugging, and visualization. R’s package ecosystem—especially tidyverse for data manipulation and ggplot2 for visualization—remains unmatched for statistical analysis and custom plotting.
Shiny: Build interactive web applications directly from R.
CRAN: Over 18,000 packages for every conceivable statistical need.
R is favored by 36% of users, especially for advanced analytics and research.
3. Jupyter Notebooks and JupyterLab: Interactive Exploration
Jupyter Notebooks are indispensable for prototyping, sharing, and documenting data science workflows. They support live code (Python, R, Julia, and more), visualizations, and narrative text in a single document. JupyterLab, the next-generation interface, offers enhanced collaboration and modularity.
Over 15 million notebooks hosted as of 2025, with 80% of data analysts using them regularly.
4. Apache Spark: Big Data at Lightning Speed
As data volumes grow, Apache Spark stands out for its ability to process massive datasets rapidly, both in batch and real-time. Spark’s distributed architecture, support for SQL, machine learning (MLlib), and compatibility with Python, R, Scala, and Java make it a staple for big data analytics.
65% increase in Spark adoption since 2023, reflecting its scalability and performance.
5. TensorFlow and PyTorch: Deep Learning Titans
For machine learning and AI, TensorFlow and PyTorch dominate. Both offer flexible APIs for building and training neural networks, with strong community support and integration with cloud platforms.
TensorFlow: Preferred for production-grade models and scalability; used by over 33% of ML professionals.
PyTorch: Valued for its dynamic computation graph and ease of experimentation, especially in research settings.
6. Data Visualization: Plotly, D3.js, and Apache Superset
Effective data storytelling relies on compelling visualizations:
Plotly: Python-based, supports interactive and publication-quality charts; easy for both static and dynamic visualizations.
D3.js: JavaScript library for highly customizable, web-based visualizations; ideal for specialists seeking full control.
Apache Superset: Open-source dashboarding platform for interactive, scalable visual analytics; increasingly adopted for enterprise BI.
Tableau Public, though not fully open-source, is also popular for sharing interactive visualizations with a broad audience.
7. Pandas: The Data Wrangling Workhorse
Pandas remains the backbone of data manipulation in Python, powering up to 90% of data wrangling tasks. Its DataFrame structure simplifies complex operations, making it essential for cleaning, transforming, and analyzing large datasets.
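A minimal sketch of the split-apply-combine pattern behind much of that wrangling (column names and values are made up for illustration):

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "amount": [100, 80, 120, 60],
})

# split by region, apply sum to each group, combine into one Series
totals = sales.groupby("region")["amount"].sum()
```

The same pattern generalizes to means, counts, and arbitrary aggregation functions via `.agg()`.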
8. Scikit-learn: Machine Learning Made Simple
scikit-learn is the default choice for classical machine learning. Its consistent API, extensive documentation, and wide range of algorithms make it ideal for tasks such as classification, regression, clustering, and model validation.
9. Apache Airflow: Workflow Orchestration
As data pipelines become more complex, Apache Airflow has emerged as the go-to tool for workflow automation and orchestration. Its user-friendly interface and scalability have driven a 35% surge in adoption among data engineers in the past year.
10. MLflow: Model Management and Experiment Tracking
MLflow streamlines the machine learning lifecycle, offering tools for experiment tracking, model packaging, and deployment. Over 60% of ML engineers use MLflow for its integration capabilities and ease of use in production environments.
11. Docker and Kubernetes: Reproducibility and Scalability
Containerization with Docker and orchestration via Kubernetes ensure that data science applications run consistently across environments. These tools are now standard for deploying models and scaling data-driven services in production.
12. Emerging Contenders: Streamlit and More
Streamlit: Rapidly build and deploy interactive data apps with minimal code, gaining popularity for internal dashboards and quick prototypes.
Redash: SQL-based visualization and dashboarding tool, ideal for teams needing quick insights from databases.
Kibana: Real-time data exploration and monitoring, especially for log analytics and anomaly detection.
Conclusion: The Open-Source Advantage in 2025
Open-source tools continue to drive innovation in data science, making advanced analytics accessible, scalable, and collaborative. Mastery of these tools is not just a technical advantage—it’s essential for staying competitive in a rapidly evolving field. Whether you’re a beginner or a seasoned professional, leveraging this ecosystem will unlock new possibilities and accelerate your journey from raw data to actionable insight.
The future of data science is open, and in 2025, these tools are your ticket to building smarter, faster, and more impactful solutions.
0 notes
jarviscodinghub · 10 days ago
Text
SOC-GA 2332 Intro to Stats Problem Set 2 Solved
Prerequisite: Load multiple packages to your environment using the following code (you can add more packages to the current list as per your need):

```r
knitr::opts_chunk$set(echo = TRUE)
library(pacman)
p_load(tidyverse, foreign, corrplot, stargazer, coefplot, effects)
```

Part 1: The Replication Project. This week, we will begin familiarizing ourselves with the replication exercise. Download and save the…
0 notes
sustainableyadayadayada · 23 days ago
Text
the ggplot2 "ecosystem"
In the beginning there was R. Or, S? I’ve heard that R actually rests on a foundation of C and Fortran. Anyway, then there was the tidyverse, sort of another whole programming language that rests in R (or a metastasizing cancer that has grown to dominate R, if you ask certain people, but I personally am a big fan). Now within the tidyverse was always ggplot2, which I have grown to rely on almost…
0 notes
davesanalytics · 1 month ago
Text
Emacs/Tidyverse: Lives Saved by Vaccines (vid 02)
0 notes
htktyo · 2 months ago
Text
Installation status for morphological analysis tooling
Installed the following software. I have a feeling the last one, mecab-userdic.csv, is what's doing the trick.
mecab
mecab-ipadic
mecab-ipadic-neologd
mecab-userdic.csv
On the R side, I additionally installed:
tidyverse
RMeCab
Also, I remembered that Japanese input in RStudio acts strangely unless you open two editor panes.
0 notes
tpointtechedu · 2 months ago
Text
The Complete R Programming Tutorial for Aspiring Data Scientists
Tumblr media
In the world of data science, the right programming language can make all the difference. Among the top contenders, R programming stands out for its powerful statistical capabilities, robust data analysis tools, and a rich ecosystem of packages. If you're an aspiring data scientist, mastering R can open the door to a wide range of opportunities in research, business intelligence, and machine learning.
In this complete R programming tutorial, we’ll walk you through the essentials you need to start coding with R—from installation to basic syntax, data manipulation, and even simple visualizations.
Why Learn R for Data Science?
R is a language built specifically for statistical computing and data analysis. It is widely used in academia, finance, healthcare, and tech industries. Some key reasons to learn R include:
Open Source & Free: R is completely free to use and has a vast community contributing packages and resources.
Built for Data: Unlike general-purpose languages, R was designed with statistics in mind.
Visualization Power: With packages like ggplot2, R makes data visualization intuitive and beautiful.
Data Analysis-Friendly: Data frames, tidyverse, and built-in functions make data wrangling a breeze.
Step 1: Installing R and RStudio
Before you can dive into coding, you’ll need two essential tools:
R: Download and install R from CRAN.
RStudio: A user-friendly IDE (Integrated Development Environment) that makes writing R code easier. Download it from rstudio.com.
Once installed, open RStudio. You'll see a scripting window, console, environment panel, and files/plots/packages/help panel—everything you need to code efficiently.
Step 2: Writing Your First R Script
Let’s start with a simple script:

```r
# This is a comment
print("Hello, Data Science World!")
```
Hit Ctrl + Enter (Windows) or Cmd + Enter (Mac) to run the line. You’ll see the output in the console.
Step 3: Understanding Data Types and Variables
R has several basic data types:

```r
# Numeric
num <- 42

# Character
name <- "Data Scientist"

# Logical
is_learning <- TRUE

# Vector
scores <- c(90, 85, 88, 92)

# Data Frame
students <- data.frame(Name = c("John", "Sara"), Score = c(90, 85))
```

Use the str() function to explore objects:

```r
str(students)
```
Step 4: Importing and Exploring Data
R can read multiple file formats like CSV, Excel, and JSON. To read a CSV:

```r
data <- read.csv("yourfile.csv")
head(data)
summary(data)
```
If you're working with large datasets, packages like data.table or readr can offer better performance.
Step 5: Data Manipulation with dplyr
Part of the tidyverse, dplyr is essential for transforming data.

```r
library(dplyr)

# Select columns
data %>% select(Name, Score)

# Filter rows
data %>% filter(Score > 85)

# Add a new column
data %>% mutate(Grade = ifelse(Score > 90, "A", "B"))
```
Step 6: Data Visualization with ggplot2
ggplot2 is one of the most powerful visualization tools in R.

```r
library(ggplot2)

ggplot(data, aes(x = Name, y = Score)) +
  geom_bar(stat = "identity") +
  theme_minimal()
```
You can customize charts with titles, colors, and themes to make your data presentation-ready.
Step 7: Writing Functions
Functions help you reuse code and keep things clean.

```r
calculate_grade <- function(score) {
  if (score > 90) {
    return("A")
  } else {
    return("B")
  }
}

calculate_grade(95)
```
Step 8: Exploring Machine Learning Basics
R offers packages like caret, randomForest, and e1071 for machine learning.
Example using linear regression:

```r
# assumes `students` has Age and StudyHours columns
model <- lm(Score ~ Age + StudyHours, data = students)
summary(model)
```

This builds a model to predict Score based on Age and StudyHours.
Final Thoughts
Learning R is a valuable skill for anyone diving into data science. With its statistical power, ease of use, and strong community support, R continues to be a go-to tool for data scientists around the globe.
Key Takeaways:
Start by installing R and RStudio.
Understand basic syntax, variables, and data structures.
Learn data manipulation with dplyr and visualizations with ggplot2.
Begin exploring models using built-in functions and machine learning packages.
Whether you're analyzing research data, building reports, or preparing for a data science career, this R programming tutorial gives you the solid foundation you need.
Happy coding!
0 notes