#what is non-linearly separable data
archaeren · 11 months ago
Note
Hello!! I hope you're having a good day ^^ I came across your post about writing non-linearly on Notion and I'm excited to try it out because the advice resonated with me! Though, I'm really new to using the app and, if possible, need help with how to do this part: 'where every scene is a separate table entry and the scene is written in the page inside that entry.' ;v;
Hello! Thank you so much for messaging!!! Since that post about writing non-linearly (linked for context) blew up roughly ten thousand times as much as anything I've ever posted, I've been kind of meaning to make a followup post explaining more about how I use Notion for writing non-linearly, but, you know, ADHD, so I haven't done it yet. XD In the meantime, I'll post a couple screenshots of my current long fic with some explanations! I'd make this post shorter, but I'm unable to not be Chatty. XD (just ask my poor readers how long my author notes are...) (There is a phone app as well which syncs with the desktop/browser versions, but I work predominantly in the desktop app so that's what I'm gonna be showing)
Tumblr media
(the table keeps going off the right side of the image but it's a bunch of unimportant stuff tbh) So this is more complicated than what you'll probably start with because I'm Normal and add a bunch of details that you might not need depending on what you're doing. For example, my fic switches POVs so I have a column for tracking that, and my fic follows a canon timeline so I have a column for dates so I can keep track of them, and I also made columns for things like if a scene had spoilers or certain content readers may want to avoid, which they can access in my spoiler and content guide for the fic. (As I said, I'm Normal.) I also do some complicated stuff using Status and estimated wordcount stuff to get an idea of how long I predict the content to be, but again, not necessary.

Anyway, you don't need any of that. For the purposes of this explanation, we're just gonna look at the columns I have called Name, Order, and Status. (And one called Part, but we'll get into that later) Columns in Notion have different types, such as Text, Numbers, Select, Date, etc, so make sure to use the type that works best for the purpose of each column! For example, here I'm using Select for Character POVs, Number for Order and WC (wordcount), and Text for the In-Game Date.

Okay let's get into it! Name is a column that comes in a Notion table by default, and you can't get rid of it (which drives me up the wall for some purposes but works totally fine for what we're doing here). As you can see on the scene I've labeled 'roll call', if you hover over a Name entry, a little button called 'Open' appears, which you click on to open the document that's inside the table. That's all default, you don't have to set anything up for it. Here's a screenshot of what it looks like when I click the one titled 'I will be anything for you' (I've scrolled down in the screenshot so you can see the text, but all the data fields also appear at the top of the page)
Tumblr media
(This view is called 'side peek' meaning the document opens on one side and you can still see the table under it on the left, which is what mine defaults to. But you can set it to 'center peek' or 'full page' as well.)

All my scenes have their own entry like this! Note that I've said scenes, not chapters. I decide the chapters later by combining the scenes in whatever combination feels right, which means I don't have to decide in advance where my chapter endings will be. This helps me consciously give most of my endings more impact than I was usually able to do when I tried to write linearly. So hopefully that gives you an idea of what I mean by writing inside the table and treating the table as a living outline.

The 'Status' column is also pretty straightforward, and might require a little setup for whatever your needs are. This is another default column type Notion has which is similar to a Select but has a few more specialized features. This is how mine is set up:
Tumblr media
(I don't actually use 'Done', idk why I left it there. Probably I should replace it with 'Posted' and use that instead of the checkmark on the far left? whatever, don't let anyone tell you I'm organized. XDD)
Pretty straightforward, it just lets me see easily what's complete and what still needs work. (You'll notice there's no status for editing, because like I mentioned in my other post, I don't ever sit down to consciously edit, I just let it happen as I reread) Obviously tailor this to your own needs!

The Order column is sneakily important, because this is what makes it easy for me to keep the scenes organized. I set the Sort on the table to use the Order column to keep the scenes ordered chronologically. When I make the initial list of scenes I know the fic will have, I give all of them a whole number to put them in order of events. Then as I write and come up with new scene ideas, the new scenes get a number with a decimal point to put them in the spot they fit in the timeline. (you can't see it here, but some of them have a decimal three or four digits deep, lol). Technically you can drag them to the correct spot manually, but if you ever create another View in your table (you can see I have eight Views in this one, they're right under the title) it won't keep your sorting in the new View and you'll hate yourself when it jumbles all your scenes. XD (And if you get more comfortable with Notion, you probably will at some point desire to make more Views)

The Part column isn't necessary, but I found that as the fic grew longer, I was naturally separating the scenes into different points along the timeline by changes in status quo, etc. (ex. "this is before they go overseas" "this is after they speak for the first time", stuff like that) in my mind. To make it easier to decide where to place new scenes in the timeline, I formalized this into Parts, which initially I named with short summaries of the current status quo, and later changed to actual titles because I decided it would be cool to actually use them in the fic itself. Since it's not in the screenshots above, here's what the dropdown for it looks like:
Tumblr media
(I've blocked some of the titles out for spoiler reasons)
Basically I only mention the Parts thing because I found it was a useful organizational tool for me and I was naturally doing it in my head anyway. Anyway, I could keep talking about this for a really long time because I love Notion (don't get me started on how I use toggle blocks for hiding content I've edited out without deleting it) but that should be enough to get started and I should really, you know, not make this another insanely long post. XDD And if anybody is curious about how the final results look, the fic can be found here.
539 notes · View notes
cm-shorts · 1 month ago
Text
High-Dimensional Perspectives
Tumblr media
Sometimes, problems that seem chaotic, non-predictable, or unsolvable in their original form become much more manageable when they are viewed from a higher-dimensional perspective. This idea has deep roots in physics and mathematics, where embedding a system into a richer space can uncover hidden structure, simplify dynamics, or even reveal exact solutions.
A classic example comes from Hamiltonian mechanics. Instead of describing a system only by its positions, physicists use a phase space that includes both positions and momenta. This doubles the dimension, but it reveals a symplectic geometry where conserved quantities and regular flows become visible. What looked messy in ordinary space becomes structured in this extended view.
Another beautiful case is the Kaluza-Klein theory, where electromagnetism is unified with gravity by imagining a fifth compact dimension. In this higher-dimensional space, the electromagnetic force isn't an extra entity — it simply becomes part of the curved geometry. A complex force field in four dimensions is just geodesic motion in five.
Quantum mechanics offers a different kind of embedding. Physical systems are represented as vectors in a complex Hilbert space. Many features that seem paradoxical — like superposition and interference — become natural when viewed as geometric relationships in this abstract, infinite-dimensional space. Again, lifting the problem into a richer realm simplifies its logic.
Even in machine learning, this technique is powerful. The so-called "kernel trick" maps data into a higher-dimensional feature space where previously non-separable patterns become linearly separable. What was hard becomes easy, just by changing the lens.
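As a quick illustration of that idea (a sketch only; the dataset, parameters, and printed scores below are illustrative, using scikit-learn's toy generator): two concentric rings cannot be split by a straight line in 2D, but lifting each point (x, y) to (x, y, x² + y²) makes them separable by a plane, which is essentially what an RBF kernel does implicitly.

import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not linearly separable in their original 2D space.
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)
print("linear SVM in 2D:", SVC(kernel="linear").fit(X, y).score(X, y))

# Explicitly lift to 3D by adding the squared radius as a third feature...
X_lifted = np.c_[X, (X ** 2).sum(axis=1)]
print("linear SVM after lift:", SVC(kernel="linear").fit(X_lifted, y).score(X_lifted, y))

# ...or let a kernel do the lifting implicitly.
print("RBF-kernel SVM in 2D:", SVC(kernel="rbf").fit(X, y).score(X, y))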
There are also mathematical tricks where nonlinear differential equations become linear after a clever transformation or embedding. The Cole-Hopf transformation, for instance, turns the nonlinear Burgers equation into a simple heat equation. And in chaos theory, Takens' embedding theorem shows how the behavior of a dynamical system can be reconstructed from delayed copies of a single observable — allowing one to recover the geometry of strange attractors from a one-dimensional time series.
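For reference, here is the Cole-Hopf substitution in its standard form (ν is the viscosity in Burgers' equation); the change of variables turns the nonlinear equation on the left into the linear heat equation on the right:

$$u_t + u\,u_x = \nu\,u_{xx}, \qquad u = -2\nu\,\frac{\varphi_x}{\varphi} \;\Longrightarrow\; \varphi_t = \nu\,\varphi_{xx}.$$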
In all these cases, the key idea is the same: what appears unpredictable or complex in its native space may become orderly and even elegant when embedded in a larger, more expressive structure.
3 notes · View notes
optimisticunknowndream · 2 months ago
Text
INSIGHT ON K-MEANS CLUSTERING
Tumblr media
📌 What is K-Means Clustering?
K-Means is an unsupervised machine learning algorithm used for clustering data points into K distinct groups based on their similarities. It works by:
Choosing K cluster centers (centroids) randomly.
Assigning each data point to the nearest centroid.
Updating centroids based on the mean of assigned points.
Repeating the process until centroids no longer change significantly (a minimal sketch of these steps follows below).
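Here is a minimal NumPy sketch of those four steps (random initialization, assignment, update, repeat); the variable names and toy data are illustrative, not a production implementation:

import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Choose K cluster centers randomly (here: K distinct data points).
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # 2. Assign each data point to the nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Update each centroid to the mean of its assigned points.
        #    (A robust version would also handle clusters that end up empty.)
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # 4. Stop once the centroids no longer change significantly.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Toy example: two well-separated blobs.
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
labels, centroids = kmeans(X, k=2)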
✅ Advantages of K-Means Clustering
✔ Fast & Scalable – Efficient for large datasets, especially with optimized algorithms.
✔ Easy to Implement – Simple to understand and apply.
✔ Works Well on Well-Separated Clusters – Effective when data has distinct groups.
✔ Interpretable Results – Cluster centers provide insights into data patterns.
❌ Limitations of K-Means Clustering
🚫 Requires Choosing K in Advance – No built-in method to determine the optimal number of clusters.
🚫 Sensitive to Initial Centroid Placement – Different initializations can lead to different results.
🚫 Struggles with Complex Shapes – Works best with spherical clusters; fails with non-linearly separable data.
🚫 Outlier Sensitivity – Outliers can distort cluster assignments.
🔬 Applications of K-Means in Chemistry
K-Means is widely used in chemistry and material sciences for pattern recognition, classification, and analysis. Some key applications include:
1️⃣ Molecular Clustering & Drug Discovery
Groups molecules based on chemical properties (e.g., solubility, molecular weight).
Helps in virtual screening to identify potential drug candidates.
2️⃣ Spectroscopy Data Analysis
Clusters spectral data (e.g., Raman, IR, NMR spectra) to identify patterns in chemical compositions.
Useful in classifying unknown substances in forensic and environmental chemistry.
3️⃣ Materials Science & Nanotechnology
Groups materials based on crystallographic, thermal, or mechanical properties.
Assists in predicting new material behavior for industrial applications.
4️⃣ Environmental Chemistry & Pollution Monitoring
Clusters pollutants based on their chemical signatures in air, water, or soil samples.
Helps in identifying sources of contamination and tracking pollution levels.
5️⃣ Food Chemistry & Quality Control
Clusters food samples based on chemical composition, taste profiles, and contamination levels.
Useful in detecting food adulteration and ensuring product consistency.
0 notes
nomidls · 4 months ago
Text
Perceptron Neural Network: A Fundamental Building Block of Artificial Intelligence
Tumblr media
The Perceptron Neural Network is one of the most foundational concepts in the field of artificial intelligence and machine learning. Introduced by Frank Rosenblatt in 1958, the perceptron represents the simplest type of artificial neural network and is widely regarded as the cornerstone of deep learning systems used today.
What Is a Perceptron?
A perceptron is a computational model that mimics how neurons function in the human brain. It processes inputs, applies weights to them, and produces an output based on an activation function. This structure allows the perceptron to solve simple classification problems, such as determining whether an input belongs to one class or another.
The perceptron is composed of the following key components:
Inputs: The perceptron takes multiple input values, often represented as features in a dataset.
Weights: Each input is associated with a weight that signifies its importance.
Summation Function: The perceptron computes a weighted sum of the inputs.
Activation Function: The result of the summation function is passed through an activation function, such as a step function, to determine the perceptron’s output.
The perceptron operates based on a simple rule:
If the weighted sum of inputs exceeds a threshold, the perceptron outputs 1.
Otherwise, it outputs 0.
This binary decision-making capability allows the perceptron to perform linear classification tasks effectively.
How Perceptron Neural Networks Work
Perceptron neural networks consist of multiple perceptrons arranged in a single layer or across various layers. The basic perceptron is a single-layer neural network, but it can be extended to form a multi-layer perceptron (MLP), which can solve more complex, non-linear problems.
The perceptron learning algorithm is a supervised learning technique. It adjusts the weights of the inputs based on the error between the predicted output and the actual output, using a simple error-driven update rule (closely related to gradient descent). This iterative process ensures that the perceptron learns and improves its performance over time.
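A minimal sketch of a single perceptron with a step activation, trained on the (linearly separable) AND function; the learning rate, epoch count, and data here are illustrative choices, not part of any particular library:

import numpy as np

# Truth table of AND: linearly separable, so a single perceptron can learn it.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w = np.zeros(2)   # one weight per input
b = 0.0           # bias (acts as a movable threshold)
lr = 0.1          # learning rate

def predict(x):
    # Weighted sum passed through a step activation: 1 above the threshold, else 0.
    return 1 if np.dot(w, x) + b > 0 else 0

for epoch in range(20):
    for xi, target in zip(X, y):
        error = target - predict(xi)
        # Perceptron learning rule: nudge the weights in proportion to the error.
        w += lr * error * xi
        b += lr * error

print([predict(xi) for xi in X])  # expected: [0, 0, 0, 1]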
Applications of Perceptron Neural Networks
Although simple, perceptron neural networks have paved the way for more advanced neural networks. They have applications in areas such as:
Pattern Recognition: Recognizing images, text, and speech patterns.
Data Classification: Categorizing data into predefined groups.
Predictive Analytics: Making forecasts based on historical data.
Limitations of Perceptrons
One notable limitation of perceptrons is that they can only solve linearly separable problems. For instance, they cannot handle problems like the XOR operation. This limitation was addressed with the introduction of multi-layer perceptrons and non-linear activation functions.
Conclusion
The Perceptron Neural Network remains an essential concept in understanding modern AI systems. Its simplicity provides a clear introduction to how neural networks process information and make decisions. By building on this foundation, researchers and engineers have developed sophisticated deep-learning models capable of solving complex problems.
To learn more about perceptrons and artificial neural networks, visit NoMidl or contact us for expert guidance and resources in the field of AI.
0 notes
ingoampt · 10 months ago
Text
Day 12 _ Activation Function, Hidden Layer and Non-Linearity
Understanding Non-Linearity in Neural Networks – Part 1

Non-linearity in neural networks is essential for solving complex tasks where the data is not linearly separable. This blog post will explain why hidden layers and non-linear activation functions are necessary, using the XOR problem as an example. What is Non-Linearity? Non-linearity…
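To make the XOR point concrete, here is a tiny hand-wired sketch (weights and thresholds chosen by hand, not learned): no single step-activated unit can compute XOR, but two hidden units (an OR detector and an AND detector) plus one output unit can.

def step(z):
    # Step activation: fires 1 when the weighted sum crosses the threshold.
    return 1 if z > 0 else 0

def xor(x1, x2):
    h_or = step(x1 + x2 - 0.5)        # hidden unit 1: OR
    h_and = step(x1 + x2 - 1.5)       # hidden unit 2: AND
    return step(h_or - h_and - 0.5)   # output: OR and not AND = XOR

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", xor(a, b))      # prints 0, 1, 1, 0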
0 notes
deploy111 · 11 months ago
Text
Task
This week’s assignment involves running a k-means cluster analysis. Cluster analysis is an unsupervised machine learning method that partitions the observations in a data set into a smaller set of clusters where each observation belongs to only one cluster. The goal of cluster analysis is to group, or cluster, observations into subsets based on their similarity of responses on multiple variables. Clustering variables should be primarily quantitative variables, but binary variables may also be included.
Your assignment is to run a k-means cluster analysis to identify subgroups of observations in your data set that have similar patterns of response on a set of clustering variables.
Data
This is perhaps the best known database to be found in the pattern recognition literature. Fisher's paper is a classic in the field and is referenced frequently to this day. (See Duda & Hart, for example.) The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other.
Predicted attribute: class of iris plant.
Attribute Information:
sepal length in cm
sepal width in cm
petal length in cm
petal width in cm
class:
Iris Setosa
Iris Versicolour
Iris Virginica
Results
A k-means cluster analysis was conducted to identify classes of iris plants based on their similarity of responses on 4 variables that represent characteristics of each plant's flower. The clustering variables were 4 quantitative variables: sepal length, sepal width, petal length, and petal width.
Data were randomly split into a training set that included 70% of the observations and a test set that included 30% of the observations. Then a k-means cluster analysis was conducted on the training data, specifying k=3 clusters (representing the three classes: Iris Setosa, Iris Versicolour, Iris Virginica) and using Euclidean distance.
To describe the performance of the classifier and see what types of errors it is making, a confusion matrix was created. The accuracy score is 0.82, which is quite good given the small number of observations (n=150).
In [73]:
import numpy as np
import pandas as pd
import matplotlib.pylab as plt
from sklearn.model_selection import train_test_split
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score
from sklearn.decomposition import PCA
import seaborn as sns
%matplotlib inline

rnd_state = 3927
In [2]:
iris = datasets.load_iris()
data = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                    columns=iris['feature_names'] + ['target'])
data.head()

Out[2]:
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  target
0                5.1               3.5                1.4               0.2     0.0
1                4.9               3.0                1.4               0.2     0.0
2                4.7               3.2                1.3               0.2     0.0
3                4.6               3.1                1.5               0.2     0.0
4                5.0               3.6                1.4               0.2     0.0
In [66]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
sepal length (cm)    150 non-null float64
sepal width (cm)     150 non-null float64
petal length (cm)    150 non-null float64
petal width (cm)     150 non-null float64
target               150 non-null float64
dtypes: float64(5)
memory usage: 5.9 KB
In [3]:
data.describe()

Out[3]:
       sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)      target
count         150.000000        150.000000         150.000000        150.000000  150.000000
mean            5.843333          3.054000           3.758667          1.198667    1.000000
std             0.828066          0.433594           1.764420          0.763161    0.819232
min             4.300000          2.000000           1.000000          0.100000    0.000000
25%             5.100000          2.800000           1.600000          0.300000    0.000000
50%             5.800000          3.000000           4.350000          1.300000    1.000000
75%             6.400000          3.300000           5.100000          1.800000    2.000000
max             7.900000          4.400000           6.900000          2.500000    2.000000
In [4]:
pca_transformed = PCA(n_components=2).fit_transform(data.iloc[:, :4])

In [7]:
colors = ["#9b59b6", "#e74c3c", "#2ecc71"]
plt.figure(figsize=(12,5))
plt.subplot(121)
plt.scatter(list(map(lambda tup: tup[0], pca_transformed)),
            list(map(lambda tup: tup[1], pca_transformed)),
            c=list(map(lambda col: "#9b59b6" if col==0 else "#e74c3c" if col==1 else "#2ecc71", data.target)))
plt.title('PCA on Iris data')
plt.subplot(122)
sns.countplot(data.target, palette=sns.color_palette(colors))
plt.title('Countplot Iris classes');
For visualization purposes, the number of dimensions was reduced to two by applying PCA. The plot illustrates that classes 1 and 2 are not clearly divided. The countplot illustrates that the classes contain the same number of observations (n=50), so they are balanced.
In [85]:
(predictors_train, predictors_test,
 target_train, target_test) = train_test_split(data.iloc[:, :4], data.target,
                                                test_size=.3, random_state=rnd_state)

In [86]:
classifier = KMeans(n_clusters=3).fit(predictors_train)
prediction = classifier.predict(predictors_test)
In [87]:pca_transformed = PCA(n_components=2).fit_transform(predictors_test)
The predicted labels 1 and 2 do not match the real class labels (k-means assigns cluster numbers arbitrarily), so the code block below remaps them.
In [88]:
prediction = np.where(prediction==1, 3, prediction)
prediction = np.where(prediction==2, 1, prediction)
prediction = np.where(prediction==3, 2, prediction)
In [91]:
plt.figure(figsize=(12,5))
plt.subplot(121)
plt.scatter(list(map(lambda tup: tup[0], pca_transformed)),
            list(map(lambda tup: tup[1], pca_transformed)),
            c=list(map(lambda col: "#9b59b6" if col==0 else "#e74c3c" if col==1 else "#2ecc71", target_test)))
plt.title('PCA on Iris data, real classes');
plt.subplot(122)
plt.scatter(list(map(lambda tup: tup[0], pca_transformed)),
            list(map(lambda tup: tup[1], pca_transformed)),
            c=list(map(lambda col: "#9b59b6" if col==0 else "#e74c3c" if col==1 else "#2ecc71", prediction)))
plt.title('PCA on Iris data, predicted classes');
The figure shows that our simple classifier did a good job of identifying the classes, despite a few mistakes.
In [78]:
clust_df = predictors_train.reset_index(level=[0])
clust_df.drop('index', axis=1, inplace=True)
clust_df['cluster'] = classifier.labels_

In [79]:
clust_df.head()

Out[79]:
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  cluster
0                5.7               2.8                4.5               1.3        0
1                5.6               2.7                4.2               1.3        0
2                7.1               3.0                5.9               2.1        2
3                6.5               3.0                5.8               2.2        2
4                5.9               3.0                4.2               1.5        0
In [80]:
print('Clustering variable means by cluster')
clust_df.groupby('cluster').mean()

Clustering variable means by cluster

Out[80]:
         sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
cluster
0                 5.859091          2.790909           4.343182          1.415909
1                 4.989744          3.425641           1.471795          0.248718
2                 6.886364          3.090909           5.854545          2.077273
In [92]:
print('Confusion matrix:\n', pd.crosstab(target_test, prediction,
                                         rownames=['Actual'], colnames=['Predicted'], margins=True))
print('\nAccuracy: ', accuracy_score(target_test, prediction))

Confusion matrix:
Predicted   0   1   2  All
Actual
0.0        11   0   0   11
1.0         0  11   1   12
2.0         0   7  15   22
All        11  18  16   45

Accuracy: 0.8222222222222222
0 notes
haleyjena · 3 years ago
Text
Top 19 Machine Learning Interview Questions
Tumblr media
1. What reasons resulted in the introduction of machine learning?
The simplest answer is: to make our lives easier. In the early days of intelligent applications, numerous systems depended on hardcoded rules of "if" and "else" decisions for processing data or adjusting to user input. Imagine a spam filter whose job is to move unwanted incoming email messages to a spam folder.
With machine learning algorithms, the system is given ample data from which to learn and identify patterns. One is not required to write new rules for each problem in machine learning.
2. What are several Types of Machine Learning algorithms?
There are several machine learning algorithms. Broadly speaking, machine learning algorithms are divided into supervised, unsupervised, and reinforcement learning.
3. What is Supervised Learning?
Supervised learning is, simply put, the machine learning approach of deducing a function from labelled training data. Some of the supervised learning algorithms are:
Support Vector Machines
Regression
Naive Bayes
Decision Trees
4. What is Unsupervised Learning?
Unsupervised learning is the second type of ML algorithm, used for finding patterns in the data provided. Here there is no dependent variable or label to predict.
Unsupervised learning algorithms include:
Clustering,
Anomaly Detection,
Neural Networks and Latent Variable Models.
In case you wish to gain more clarity, a machine learning coding bootcamp can offer you the right guidance for successful career opportunities.
5. What is ‘Naive’ concept in Naive Bayes?
Naive Bayes is a supervised learning algorithm; it is called "naive" because it applies Bayes' theorem under the assumption that all features are independent of each other. Consult a machine learning bootcamp to understand the technique and further tools for cracking the interview.
6. What is PCA? When do you use it?
Principal component analysis (PCA) is most commonly used for dimensionality reduction and measures the variation captured along each direction (component). If a direction captures little variation, it is thrown out.
Principal component analysis makes the dataset easier to visualize, and is used in finance, neuroscience, and pharmacology. It is further useful in the pre-processing stage, when linear correlations are present between features. Consider a coding bootcamp for learning the tools and techniques.
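A quick sketch of PCA in scikit-learn (the iris dataset is used purely as an illustration): explained_variance_ratio_ shows how much variation each component captures, which is what guides dropping the weak ones.

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data                  # 150 samples, 4 features
pca = PCA(n_components=2).fit(X)      # keep the two strongest directions

# Fraction of the total variance captured by each kept component.
print(pca.explained_variance_ratio_)  # e.g. roughly [0.92, 0.05]

X_reduced = pca.transform(X)          # project the data down to 2 dimensions
print(X_reduced.shape)                # (150, 2)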
7. Explain SVM Algorithm.
An SVM, or Support Vector Machine, is a strong and versatile supervised machine learning model capable of performing linear or non-linear classification, outlier detection, and regression.
8. What are Support Vectors in SVM?
A Support Vector Machine (SVM) is an algorithm which fits a dividing line (or hyperplane) between different classes that maximizes the distance from that line to the nearest points of each class. In this manner, it tries to find a robust separation between classes. Support vectors are the points at the edge of the dividing hyperplane.
9. What are Different Kernels in SVM?
There are six types of kernels in SVM; however, the following four are widely used:
Linear kernel — used when the data is linearly separable.
Polynomial kernel — used when one has discrete data with no natural notion of smoothness.
Radial basis function (RBF) kernel — used to create a decision boundary that does a better job of separating two classes than the linear kernel.
Sigmoid kernel — used as an activation function for neural networks.
10. What is Cross-Validation?
Cross-validation is a method of splitting the data into training, validation, and testing parts. In k-fold cross-validation, the data is split into K subsets, and the model is trained on k-1 of the subsets while the remaining subset is held out for testing; this is repeated so that each subset serves as the test set once. Lastly, the scores from all the k folds are averaged to produce the final score.
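A minimal scikit-learn sketch of k-fold cross-validation (the model, dataset, and k=5 are illustrative choices):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Train on 4 folds, test on the held-out fold, repeat 5 times, then average.
scores = cross_val_score(model, X, y, cv=5)
print(scores)          # one accuracy score per fold
print(scores.mean())   # the averaged final score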
11. What is Bias in Machine Learning?
Bias in data indicates an inconsistency in the data. The inconsistency may be caused by several factors, which are not mutually exclusive.
12. What is the Difference Between Classification and Regression?
Classification is used for producing discrete results and for sorting data into definite categories, whereas regression is used for predicting continuous quantities.
13. Define Precision and Recall?
Precision and recall are ways of measuring the performance of a machine learning implementation, and they are often used together. Precision inspects relevance: of the items the model predicted to be relevant, how many are truly relevant? Recall answers the complementary question: of the items that are truly relevant, how many did the model find? Basically, precision is about being exact and accurate, and the same holds for machine learning models.
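In the standard notation (TP, FP, FN for true positives, false positives, and false negatives), the usual definitions are:

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}.$$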
15. How to Tackle Overfitting and Underfitting?
Overfitting means the model fits the training data too well and fails to generalize; in this case, one can resample the data and estimate model accuracy using techniques like k-fold cross-validation. In the case of underfitting, the model is not able to capture the patterns in the data; here one needs to change the algorithm or feed more data points (or features) to the model to improve accuracy.
16. What is a Neural Network?
A neural network, to put it in simple words, is a model loosely inspired by the human brain. Much like the brain, it has neurons that activate when encountering something relatable. The neurons are joined via connections which let information flow from one neuron to another.
17. What is Ensemble learning?
Ensemble learning is a method that combines multiple machine learning models to create a more powerful model.
There are numerous reasons for a model to be different. Some are:
Different Hypothesis
Different Population
Different Modelling techniques
When working with a model's training and testing data, one can experience an error. This error might be bias, variance, or irreducible error.
The model should strike a balance between bias and variance; this is called the bias-variance trade-off. Ensemble learning is one way to manage this trade-off. There are numerous ensemble techniques available, but when aggregating multiple models there are two general methods: bagging and boosting.
18. How does one decide which machine learning algorithm to use?
It depends largely on the dataset one has. If the data is discrete, one makes use of an SVM; if the dataset is continuous, one uses linear regression. So, while there is no single rule for knowing which ML algorithm to use, it largely comes down to exploratory data analysis (EDA).
19. How to Handle Outlier Values?
An outlier is an observation in the dataset which is far away from the other observations. Tools used for discovering outliers are:
Z-score
Box plot
Scatter plot
Conclusion
The questions listed above cover the basics of machine learning. With advancements in machine learning growing rapidly, joining the relevant communities and preparing through a machine learning bootcamp is the way forward for cracking the interview.
Source: https://jenahaley54.medium.com/top-19-machine-learning-interview-questions-addee3317084
0 notes
cothers · 4 years ago
Text
Segments of Memory
When we look at RAM naively, it is an array of bytes addressed linearly. In practice, it is a sparse array where different types of data are grouped into specific address ranges called segments. Each segment serves a different purpose for computer programs and is supported by the assembly/machine language, the CPU's mother tongue.
But before diving into this, let's remember one of our computers' architectural principles, which may look old but is still current.
Von Neumann Architecture
The earliest computers had fixed programs. They were like a calculator that knows how to perform one calculation on two numbers and nothing else. Changing what it does requires extensive redesign and rewiring of the system.
In 1944, John von Neumann was involved in the ENIAC project. He invented the "stored program" concept there, in which programs are loaded and stored in the main memory of the computer like any other data. He wrote a paper titled "First Draft of a Report on the EDVAC". This paper was circulated among the computer builders and "Von Neumann Architecture" was born.
Since 1945, nearly all computers are built upon this idea, including the one we created our examples.
Types of memories in C
Let's start with a complete example which manifests all kinds of memory types in C:
#include <stdio.h>
#include <stdlib.h>

int g;
char *s = "Hello";

int main(int argc, char* argv[])
{
    int n;
    int *p = malloc(1);

    // Stack Segment
    printf("&argc %u\n", &argc);
    printf("&n %u\n", &n);
    printf("&p %u\n", &p);

    // Code Segment
    printf("main %u\n", main);

    // Data segment
    printf("&g %u\n", &g);
    printf("&s %u\n", &s);
    printf("s %u\n", s);

    // Heap
    printf("p %u\n", p);

    return 0;
}
The word "segment" attached to code, data and stack names comes from the native language CPUs called "machine language". The machine language identifies those memory areas and provides convenient and fast instructions to reach them through offsets. Programming languages including C mimics that with their structuring of programs.
The output lists the addresses of different types of variables.
&argc 1048287484
&n    1048287500
&p    1048287504
main  2981486985
&g    2981498908
&s    2981498896
s     2981490692
p     3008201376
While the numbers may look similar, you can observe some convergences between them. Let's start with the most divergent ones, the variables that reside on the stack segment:
&argc 1048287 484    0
&n    1048287 500  +16
&p    1048287 504  +20
We will observe the stack extensively later, but for now, we can simply say that all the variables defined as function parameters and local variables reside in the stack segment.
When the program is loaded from an executable file (like .exe, .dll, .so) into the memory, its instructions and constants are placed in the code segment.
main 29814 86985
s    29814 90692  +3707
The global pointer variable s points to the constant "Hello" in the code segment. We know this because when we try to manipulate it via the following code:
s[1] = 'A';
It crashes with the good old "Segmentation Fault" because the CPU prevents writing into memory marked as "code".
Every program has global variables which are stored in the data segment:
&s 29814 98896   +0
&g 29814 98908  +12
Stack, code and data segments occupy fixed sizes as defined by the program. But most programs need to dynamically allocate memory as they run. We call this memory "heap". When we run the above program, it requested 1 byte of memory via malloc(), and it is located a little bit far away from other segments.
(Note to self: transition to modern languages here. Start with how they began with very heavy naive heap usage and then switched to generational approaches, etc.) (Thread stack)
Modern Languages' Take on Segments
Let's rewrite the above program in Java, without the printf()s since we can't take the address of variables there:
public class Xyz {
    public static int g;
    public static String s = "Hello";

    private int i;

    public static void main(String[] args) {
        int n;
        Xyz p = new Xyz();
    }
}
In this program, the global variables are defined as static members of Xyz, since no variables outside class scope are permitted in Java. But they are real global variables, stored in the data segment and reachable as Xyz.g and Xyz.s anywhere throughout the program.
The variables args, n and p are located in the stack as in C. The non-static class attribute (or property) i can only be reached once an instance is allocated on the heap via the operator "new". Here, p is a reference and occupies a space in the stack, but the object instance it represents is allocated on the heap.
Heap is not Cheap
Modern programming languages tend to overuse heap because the only available means of creating a data structure requires using the "new" operator.
In C, you define a structure like this:
struct PERSON {
    char name[100];
    char surname[100];
    int age;
};
This structure has fixed size (204 bytes in 32-bit systems) and can be used in the data segment, stack segment and heap. Let's define the same struct in Java:
class Person {
    String name;
    String surname;
    int age;
}
This structure has to be placed on the heap. Since the String type is a class, whenever a value is assigned to it, that value is also allocated separately.
Allocation on the heap is not as cheap an operation as in the data and stack segments. The heap was invented because dynamic allocation is needed, and it comes with a price: for each request, the allocator has to search for an empty space. We'll explore heaps further in a separate article, but for now, this information is adequate.
C# provides a "struct" structure just for this purpose. Its standard class library uses structs for trivial data structures:
struct Point {
    int X;
    int Y;
}
This is pragmatic and serves the purpose until strings are encountered.
using System;
using System.Runtime.InteropServices;

namespace ConsoleApp2
{
    struct Person
    {
        internal string name;
        internal string surname;
        internal int age;
    }

    class Program
    {
        static void Main(string[] args)
        {
            Person p;
            p.name = "Michael";
            p.surname = "Jordan";
            p.age = 55;
            Console.WriteLine("Person size " + Marshal.SizeOf(p));
        }
    }
}
The output would be "Person size 24" (or 12 on 32-bit systems), i.e. 8 bytes per member, because the string's content is not included in the calculation. Here "Michael" and "Jordan" are allocated in the data segment at program start just as in C, but in reality values usually come from somewhere else, and space from the heap has to be allocated for them.
It is possible to allocate fixed-size structs in C# with some magic:
...
[StructLayout(LayoutKind.Sequential)]
struct Person
{
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 100)]
    internal string name;
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 100)]
    internal string surname;
    internal int age;
}
...
We can get 204, as in C, this way.
#c
1 note · View note
mr2peak · 7 years ago
Text
Web programming is hard to do right
Creating a toy web service is easy. Creating a large robust and secure application is pure hell. Comprehensive software environments are for sale to help coders (WebSphere, Broadvision etc etc) - but why is it so hard in the first place? This document attempts to set out the reason and also mentions the solution.

Insuring your car

Imagine that you need to get insurance for your car, and you go to an office to arrange for it. You enter the building and are directed to Desk 1, where an anonymous employee asks you for your name, but tells you to answer the question over at desk 2. Befuddled, you go there, and you state your name. Desk 2 has a very identical anonymous employee that thanks you, writes down your name on a piece of paper and gives it to you. Furthermore, he asks you what kind of car you have, and if you have ever claimed insurance on other cars, but please mention the answer to Desk 3.
At Desk 3, you show your piece of paper with your name on it, and you tell them you have a 1975 car, and that only last year, you had an accident with your other car. They note the make of your car, and 'Had accidents' on your piece of paper, and send you of to Desk 92.
On arriving there, you see that this is the Vintage Car Insurance desk. They look at the piece of paper you brought, and tell you what insuring your car is going to cost, and that you are not going to get a discount. This is all written down on your piece of paper, and you are told to go to Desk 5 to settle the payment.
You head there, with your piece of paper, and pay the amount specified on it. Your car is now insured.
Asynchronous stateless programming

Does this sound the least bit convoluted to you? It is. But it is the way most web services operate these days. Because there is no permanent connection between you and the website you are visiting, each time you have performed part of an operation, your reappearance comes as a complete surprise to the webserver. The next step in the process is determined by which desk ('url') you walk to, and what is written down on your piece of paper. Each separate part of the operation must make sure that you went to the intended desk, and re-read your piece of paper, to see who you are and what you want.
Besides being complicated, this is also error prone. What if you decide to change your piece of paper? You could easily modify your accident history, and get a huge discount. Or go one better and while heading to the payment desk, change the amount of money you need to pay!
The real world

The 'piece of paper modification problem' is real. Many merchant sites suffer from the 'choose your own price' problem mentioned in the previous paragraph. Clued programmers work around it. They don't store your data on a piece of paper they trust you with. Instead, they give you a token, and store all data on the webserver. You only carry the token around. When you arrive at a desk, the piece of paper corresponding to your token is retrieved. This only allows you to fiddle with your token, which given proper mathematics, is not going to work - the webserver detects a bogus token, and refuses service.
This token technique is hard, however, and the problem remains that you are free to try what happens with your token over at other desks. Perhaps you can skip the appraisal desk, and neglect to mention your history of accidents, who knows. Each desk operates on its own.
How did we arrive at this mess?

Well, this is how webservers work! Each page is a desk, and each time you visit a script, it is started anew, with no information on what happened before. Static webpages contain no state - the famous 'index.html' may contain a clock, telling you the date, or perhaps some smartness in figuring out your preferred language - but is mostly static. When the web became more dynamic, coders did not leave the 'dynamic page' paradigm. Instead, they improved on the 'piece of paper passing' technique. A lot of tricks were evolved, for example, a Desk can be programmed to refer to itself. This is important for error handling - a user may labour under the impression that his car dates from 1875 instead of 1975. In our story above, Desk 3 would have to check your answer and send you back to Desk 2.
This creates a large distance between user input and error checking, which complicates coding. A smarter desk would contain both input and error checking - it would give you your paper, and instruct you to come back to the same desk again.
But given the connection-less nature of the web, even when redirecting to the original desk does not save you from passing around pieces of paper. Each pageview is a whole new event!
That's just the way it works, isn't it?

The vast majority of web coders have never programmed anything else - this is due to the exponential growth of the web where newcomers will always outnumber old hands by a large margin. To many of these newcomers, this 'event driven paper passing' technique may seem perfectly natural. Old farts however who may have programmed console applications, or even used 'BASIC' on their microcomputers have a very hard time. To them, most of the work of coding is spent on perfecting the transfer of data from one step of the website to another, and checking that the steps are executed in the right (intended, non-tampered) order.
This is not what they want to do - they want as many lines of code as possible to be involved with actually doing things, entering users in the database, processing payments and selling stuff.
How should it be then?

In the old days, a program may have looked like this (in no specific language):

10 print "What is your name?";
20 name=getLine();
30 print "Your name is: " + name;
50 print "What kind of car do you have?";
60 car=getLine();
70 print "Insuring a car of make " + car + " costs: ";
100 print carCost(car) + " per year";

Even in this very simple language, this program makes sense. Actions are laid out linearly. You can read what happens, and in what order, by starting at the top, and reading on from there. Note how this is decidedly unlike the many-Desk horror story above! In fact, the code looks just like insuring a car should be, although heavily simplified. The employee asks a question, gets an answer, asks another question, does a calculation, and tells you what you want to know.
In many-Desk parlance, this code may look like this, again in no specific existing language:
index.html:
What is your name?
<form action=name.html>
Your name: <input name=yourName>
</form>
name.html:
Your name is $yourName
What kind of car do you have?
<form action=car.html>
<input type=hidden name=yourName value=$yourName>
Your car: <input name=yourCar>
</form>

car.html:
$yourName, a $yourCar car costs to insure: $calcPrice($yourCar);

Even without error checking, this is far more verbose and split out. Note the clever use of 'hidden form inputs' to pass your name to the next page. This is the infamous piece of paper! Now we improve the original program somewhat:

10 print "What is your name?";
20 name=getLine();
25* if(name=="") then goto 10;
30 print "Your name is: " + name;
50 print "What kind of car do you have?";
60 car=getLine();
65* if(noSuchCar(car)) then goto 50;
70 print "Insuring a car of make " + car + " costs: ";
100 print carCost(car) + " per year";

The lines marked with an asterisk are new and perform very simple error checking, making sure that you enter a non-empty name, and that your car make exists. If you make a mistake in either case, it will just ask you again, until you get it right. Now translate this to the many-Desk scenario. This is where the hurting seriously takes off. There are a myriad ways to handle a detected error. In our example, name.html may decide to also know about the original form and reprint it in case you forgot to enter your name. Or it may print a warning, and output a link directing you back to index.html, asking you to try again. Or it may do so for you, and forcibly send your browser back, passing an error message on your piece of paper.
car.html: $yourName, a $yourCar car costs to insure: $calcPrice($yourCar); Even without error checking, this is far more verbose and split out. Note the clever use of 'hidden form inputs' to pass your name to the next page. This is the infamous piece of paper! Now we improve the original program somewhat: 10 print "What is your name?"; 20 name=getLine(); 25* if(name=="") then goto 10; 30 print "Your name is: " + name; 50 print "What kind of car do you have?"; 60 car=getLine(); 65* if(noSuchCar(car)) then goto 50; 70 print "Insuring a care of make " + car + " costs: "; 100 print carCost(car) + " per year"; The lines marked with an asterisk are new and perform very simple error checking, making sure that you enter a non-empty name, and that your car make exists. If you make a mistake in either case, it will just ask you again, until you get it right. Now translate this to the many-Desk scenario. This is where the hurting seriously takes off. There are a myriad ways to handle a detected error. In our example, name.html may decide to also know about the original form and reprint it in case you forgot to enter your name. Or it may print a warning, and output a link directing you back to index.html, asking you to try again. Or it may do so for you, and forcibly send your browser back, passing an error message on your piece of paper.
Index.html would then read that piece of paper and tell you that you forgot to enter your name, and to please try again. Each of these methods has a problem and each of them is in wide use.
But it gets worse from here. car.html also has to check if your name is still set, and if not, balk at the error. This gets even more important if there were a fourth file, payment.html! Each successive step has to perform checks to make sure that the state of things is as it should be.
Remedies

These problems are well known and there are lots of ways to ameliorate them. It all revolves around the stateless, event-driven nature of the web. The current solutions try to regain some statefulness. For example, many environments allow you to tag certain variables as 'persistent'. Through the right kind of magic, the value of $yourName might then survive from name.html to car.html without the use of the hidden form input. It is also possible to unify the three files into one 'persistent class'. Every variable within that class is then 'persistent', and travels automatically from page to page.
However, all these tricks fail to do more than gloss over the problem. Each pageview is a whole new event. Each stage has to check that previous stages behaved. Each stage must make sure that it cannot be fooled by being accessed at the wrong moment (which, as mentioned before, would allow us to skip the 'accident history' page).
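As a rough illustration of the 'persistent variable' remedy (the session store and page handlers below are hypothetical, not any particular framework's API): each page still runs as its own event, reads and writes a shared per-session store, and still has to verify that the earlier steps actually ran.

    # Hypothetical per-session store; a real setup would back this with cookies
    # plus a server-side table rather than a module-level dict.
    sessions = {}

    def name_page(session_id, form):
        state = sessions.setdefault(session_id, {})
        if not form.get("yourName"):
            return "You forgot to enter your name, please try again."
        state["yourName"] = form["yourName"]          # the 'persistent' variable
        return "What kind of car do you have?"

    def car_page(session_id, form):
        state = sessions.get(session_id, {})
        if "yourName" not in state:                   # previous stage never ran
            return "Please start at the beginning."
        state["yourCar"] = form["yourCar"]
        return state["yourName"] + ", insuring a " + state["yourCar"] + " costs ..."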
Why not go back to the old days?

Compared to the many-Desks drama, the simple numbered program listed above sounds like heaven! Why did we ever move to this event-driven nonsense? As mentioned, this is partly due to the 'dynamic page' paradigm, where some dynamic code is added to basically static webpages, e.g., a clock telling you the current time. Such pages are inherently 'desk oriented'. Also, as the computer science people can tell you, nothing is as efficient as stateless, interrupt-driven operation. While the user is filling out his form, say, in between index.html and name.html, no resources are consumed, save perhaps for the piece of paper we have to store - a few bytes.
The dynamic page paradigm is ridiculously efficient. Even a meager webserver could run millions of sessions this way - because there is no session to speak of. In the absence of other stimuli, no computer professional can resist the pull of perfect efficiency.
The listed program above has to wait for the user to fill out his form. This 'waiting' consumes resources - but not a lot. By the time you are driving enough traffic for this to matter, you will have other problems.
A new (or old!) paradigm: Synchronous Web Programming

The technical term for the simple program listing above is 'synchronous'. It outputs a form and then waits, in place, for the user to respond. This paradigm is new for web programming, but is old hat for all other uses. Over the past 30 years it has served us well. It is time for us to integrate the web into our current practices, and treat it no differently from other programs.
A sample implementation exists which has already proven that the synchronous paradigm lends itself very well to web design. It is expected that existing scripting languages can be adapted to synchronous operation.
Some example code:

    main()
    {
      string username;
      if(doUserPasswordCheck(&username)) {
        memberMenu();
      }
      print("Bye!");
      die();                        // session will die after page is viewed
    }

    bool doUserPasswordCheck(string *user)
    {
      while(true) {                 // loop forever
        startTableForm();           // makes a pretty form in a table
        formInput("Your username","username");
        formPassword("Your password","password");
        formSubmit("Login!");
        formSubmit("Cancel");
        formEnd();

        readVariables();            // wait for user input

        if(buttonPressed("Cancel"))  // user canceled
          return false;              // not logged in

        if(getVar("username").empty())
          formError("username","<- must not be empty!"); // will be displayed on retry

        if(!userPassExists(getVar("username"),getVar("password")))
          print("Password/username did not match database! Try again:<p>\n");
        else {
          *user=getVar("username");
          return true;
        }
      }
      return false;                 // we never get here
    }
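For comparison, here is a minimal illustrative sketch - not the sample implementation referred to above - of how the same ask-and-wait flow could be expressed with a Python generator, where each yield hands a page to the browser and each send() resumes the conversation when the form comes back:

    def car_cost(car):
        return 500                              # placeholder pricing for the sketch

    def insurance_dialogue():
        # Ask-and-wait style: the generator suspends at each yield until the
        # framework send()s the submitted form value back in.
        name = ""
        while not name:                         # re-ask until non-empty, like line 25* above
            name = yield "What is your name?"
        car = yield "Your name is: " + name + "\nWhat kind of car do you have?"
        yield "Insuring a car of make " + car + " costs: " + str(car_cost(car)) + " per year"

    convo = insurance_dialogue()
    print(next(convo))             # first page: ask for the name
    print(convo.send("Alice"))     # the name form came back
    print(convo.send("Volvo"))     # the car form came back

The point is the shape of the control flow: the dialogue reads top to bottom, while the waiting is handled for you.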
Hang on, this can never work!
Source Here: https://ds9a.nl/webcoding.html
1 note
aqeeda · 5 years ago
Text
what video editing apps do instagramers use (1)
InShot App - Video & Photo editing program. The changes are quick and easy. With the new effect "Reduce Bit Resolution", the user can reduce the number of colors of the displayed object. Other changes and fixes include: better outlines of sprite objects; a program crash when working with DivX files; a program crash when activating certain firewall settings in the system; error messages when writing to a network drive; and centering of objects on the scene. The Flip effect has been improved and added as a separate tool to the quick access panel. Improved memory optimization increases performance stability and prevents crashes while working on large video files. A 'Pack project' feature was added, with an option to save and transfer a project file and all of its output (raw) resources to another computer. The DeLogo effect was added to the Filters section with blur and pixel set presets. The timeline settings format has been modified with three modes to save the available scale. An object can now be made semi-transparent directly on the timeline. A basic effects window was added with the main adjustment effects, RGB and YUV curves, and quick rotation tools available in a control panel. The app is easy to use and, thanks to different themes with color filters and suitable music, appealing clips can be cut quickly.
Tumblr media
Now it is possible to copy objects on the timeline and the user interface supports drag & drop. Now all parameters of the sound effects can be edited visually.
Newbies or those who want to save time can apply stylish filters similar to Instagram in a single click.
VSDC Free Video Editor allows you to create differently shaped masks that hide, blur or highlight certain elements in your video.
This enables fluid movement, rotation, transformation and the exact positioning of the objects in relation to each other.
Apple is known for its amazing device quality, where camera performance is always the biggest edge for Apple smart devices.
The program supports many different web services and allows you to organize your collection of downloaded videos. VSDC Free Video Editor not only offers ready-made profiles for export to social networks, but also uploads your videos to YouTube directly from the app, without switching windows or tabs. You can change the look of your video image to suit your needs using color blending; with a wide range of adjustable parameters, you can create a unique, professional-looking video. In the free version you unfortunately have to put up with advertising and an annoying watermark, which you can get rid of with a premium upgrade. Although this costs $2.99, it also includes additional filters and effects. Have you used any of these photo and video editing apps to create Insta stories? If so, which of the apps do you personally like best? I look forward to your feedback and tips - just write a comment on the post. HypeType is a free iPhone app. There is also a download tool for grabbing videos from a variety of websites and video services. In addition, parameters can change their values not only linearly, but also along a curve, creating complicated and beautiful effects. Templates for the curves have been added in the parameter-change editor. The trajectories of the non-linear parameters can have several paths, each of which can be either a line or a curve.
unusable compared to before
The data for this week is available free of charge after registration. This app is only available in the App Store for iPhone and iPad. Edit video with pictures and music - a video editing app with music and text. Try out effects, text, stickers, and music, and give every frame of your video a magical touch. Slide to edit, adjust effects with just a tap and present your work in a simple way.
0 notes
fitnesshealthyoga-blog · 6 years ago
Photo
Tumblr media
New Post has been published on https://fitnesshealthyoga.com/maternal-smoking-before-and-during-pregnancy-and-the-risk-of-sudden-unexpected-infant-death-articles/
Maternal Smoking Before and During Pregnancy and the Risk of Sudden Unexpected Infant Death | Articles
Abstract
OBJECTIVES: Maternal smoking during pregnancy is an established risk factor for sudden unexpected infant death (SUID). Here, we aim to investigate the effects of maternal prepregnancy smoking, reduction during pregnancy, and smoking during pregnancy on SUID rates.
METHODS: We analyzed the Centers for Disease Control and Prevention Birth Cohort Linked Birth/Infant Death Data Set (2007–2011: 20 685 463 births and 19 127 SUIDs). SUID was defined as deaths at <1 year of age with International Classification of Diseases, 10th Revision codes R95 (sudden infant death syndrome), R99 (ill-defined or unknown cause), or W75 (accidental suffocation or strangulation in bed).
RESULTS: SUID risk more than doubled (adjusted odds ratio [aOR] = 2.44; 95% confidence interval [CI] 2.31–2.57) with any maternal smoking during pregnancy and increased twofold between no smoking and smoking 1 cigarette daily throughout pregnancy. For 1 to 20 cigarettes per day, the probability of SUID increased linearly, with each additional cigarette smoked per day increasing the odds by 0.07 from 1 to 20 cigarettes; beyond 20 cigarettes, the relationship plateaued. Mothers who quit or reduced their smoking decreased their odds compared with those who continued smoking (reduced: aOR = 0.88, 95% CI 0.79–0.98; quit: aOR = 0.77, 95% CI 0.67–0.87). If we assume causality, 22% of SUIDs in the United States can be directly attributed to maternal smoking during pregnancy.
CONCLUSIONS: These data support the need for smoking cessation before pregnancy. If no women smoked in pregnancy, SUID rates in the United States could be reduced substantially.
Abbreviations:
aOR — adjusted odds ratio
CDC — Centers for Disease Control and Prevention
CI — confidence interval
GAM — generalized additive model
ICD-10 — International Classification of Diseases, 10th Revision
SIDS — sudden infant death syndrome
SUID — sudden unexpected infant death
What’s Known on This Subject:
Approximately 3500 infants <1 year old die suddenly and unexpectedly each year in the United States. Previous research has revealed that maternal smoking during pregnancy is a known risk factor for sudden unexpected infant death (SUID).
What This Study Adds:
In this retrospective cross-sectional analysis (20 685 463 births and 19 127 SUIDs), we use advanced modeling techniques to quantitatively determine the effects of maternal smoking, smoking cessation, and smoking reduction in pregnancy on SUID rates with much higher resolution than previous studies.
In the United States, >3700 infants die annually from sudden unexpected infant death (SUID), which includes sudden infant death syndrome (SIDS), accidental suffocation and strangulation in bed, and ill-defined causes.1 Multiple epidemiologic studies have shown a strong relationship between maternal smoking and SIDS. Researchers of 1 meta-analysis reported a pooled risk associated with maternal prenatal smoking of nearly fourfold (risk ratio = 3.9; 95% confidence interval [CI] 3.8–4.1). Odds ratios (unadjusted) associated with postnatal maternal smoking range from 1.47 to 6.56.2 There are dose-dependent relationships between SIDS rates and both the number of cigarettes smoked prenatally3–5 and duration of smoke exposure postnatally.6,7 Moreover, there is compelling evidence that maternal smoking may play a causal role in SIDS deaths.2,8
Substantial work has been undertaken to understand the pathophysiology underlying this increased risk of sudden infant death. Abnormalities in major neurotransmitters, including serotonin and their receptors, have been well documented in the brainstems of SIDS infants,9–11 with experimental data supporting nicotine’s effects on respiration, autonomic regulation, chemosensitivity, sleep, and arousal.12–17 Maternal smoking has been linked to serotonergic abnormalities in important brainstem nuclei of SIDS infants.18,19 In animal models, nicotine increases serotonin release and alters the firing of serotonergic neurons in a dose-dependent manner.20,21 Serotonergic neuronal development may be disrupted by maternal smoking as early as the first trimester.18,19,22
Many conclude that maternal smoking is the strongest prenatal modifiable risk factor for SIDS in industrialized nations.23–25 Although previous research has focused on the association between pre- or postnatal smoking and sudden infant death, these studies have given limited attention to diagnostic preferences as they affect measured outcomes.26,27 Additionally, only 1 published study has provided details about prepregnancy smoking.28 Here, we use national vital statistics data29 and a logistic regression modeling approach to analyze maternal smoking behavior during pregnancy for all 2007–2011 US live births with complete smoking information (∼12 million births), using higher resolution than previous studies and expanding the analysis to the 3 major causes of SUID. Additionally, beginning in 2011, this data set recorded the number of cigarettes that mothers smoked in the 3 months prepregnancy. We thus analyzed maternal smoking and the risk of SUID by trimester by daily cigarette consumption, prepregnancy smoking levels, smoking reduction or cessation during pregnancy, and individual International Classification of Diseases, 10th Revision (ICD-10) cause of death to estimate population-attributable risk.
Methods
Study Design and Population
We conducted a retrospective, cross-sectional study to assess the relationship between SUID and self-reported maternal smoking before and during pregnancy, using data from the Centers for Disease Control and Prevention (CDC) Birth Cohort Linked Birth/Infant Death Data Set for births between 2007 and 2011.29 This data set does not include details about the frequency of autopsy or death scene investigation, although autopsy is an element of diagnostic criteria in SIDS.
We defined a SUID case as an infant (<365 days old) death with the following ICD-10 codes: R95 (SIDS), R99 (ill-defined and unknown cause), or W75 (accidental suffocation or strangulation in bed).
The analysis of the effects of maternal smoking before pregnancy was conducted by using only births in 2011 (3 134 781 total births, 2585 SUIDs; SUID rate 0.83 per 1000 live births), the first year the CDC reported on the number of daily cigarettes smoked in the 3 months before pregnancy.
We used the complete set of 20 685 463 births and 19 127 deaths and dichotomous smoking data (smoking versus no smoking) to estimate the number of deaths attributable to prenatal smoking.
Covariates
In this study, we aimed to make an inference regarding the effect of maternal prenatal smoking on SUID risk. We used regression adjustment of potential confounding variables to decrease bias and to improve the precision of estimates in the smoking-SUID association. On the basis of these calculations, we used the following covariates in all adjusted analyses: mother’s and father’s race and/or ethnicity/Hispanic origin, mother’s and father’s age, mother’s marital status, mother’s education, live birth order, number of prenatal visits, gestational length (weeks), delivery method (vaginal or cesarean), infant sex, and birth weight.
Statistical Analysis
To understand the relationship between the reported average number of cigarettes smoked per day and risk of SUID, we developed both a logistic regression model and a generalized additive model (GAM). CDC data include dichotomous data about maternal smoking (yes or no) for all births and also include daily number of cigarettes for 60% of births. To ensure that the data were consistent and that there was no bias effect, both sets of data were used to calculate adjusted odds ratios (aORs). The logistic regression model used the average number of cigarettes in the 3 trimesters as the predictor of primary interest, which was coded as a categorical variable, whereas the GAM used the same variable as the predictor of interest but coded as a continuous numerical variable. All logistic regression and GAM models were adjusted for covariates.
Using 2011 data and a logistic regression model, we assessed the increased risk from prepregnancy smoking using a variable that identified the smoking habits before and during pregnancy. We then used 3 logistic regression models to understand the effects of smoking in each trimester. The models were similar to the GAM, except that the daily number of cigarettes smoked in the 3 trimesters were modeled independently instead of averaging across all 3 trimesters.
We also examined the reduction in SUID risk when mothers quit or reduced the amount smoked compared with smokers who did not quit or reduce smoking during pregnancy. A new categorical variable was created to identify the mothers who smoked in the first trimester and then quit, reduced, or continued the number of daily cigarettes in later trimesters. If the number of cigarettes by the third trimester was 0, the mother was defined as having quit. If the number of total daily cigarettes in the second and third trimesters was less than the daily number of cigarettes in the first trimester multiplied by 2, the mother was categorized as a reduced smoker; those who continued to smoke the same amount (or more) were defined as continued smokers. In the model, we controlled for covariates and total number of cigarettes smoked during pregnancy.
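As a rough sketch of that categorization logic (the column names and toy rows below are hypothetical, not the CDC file layout):

    import pandas as pd

    def smoking_category(row):
        # cig_tri1..cig_tri3 are hypothetical columns: reported daily cigarettes per trimester.
        if row["cig_tri1"] == 0:
            return "did not smoke in first trimester"
        if row["cig_tri3"] == 0:
            return "quit"
        if row["cig_tri2"] + row["cig_tri3"] < 2 * row["cig_tri1"]:
            return "reduced"
        return "continued"

    births = pd.DataFrame({"cig_tri1": [10, 10, 10, 0],
                           "cig_tri2": [10,  5,  0, 0],
                           "cig_tri3": [10,  3,  0, 0]})
    births["smoke_cat"] = births.apply(smoking_category, axis=1)
    print(births)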
To differentiate between SUID subcategories (R95, R99, and W75) and non-SUID causes of death, we developed separate logistic regression models to estimate the risk of each cause of death independently.
For estimating the proportion of SUID cases attributable to smoking, we used the same logistic regression model with a database in which all mothers were artificially set as nonsmokers.30 By assuming causation, the difference between the result of this model and actual SUID rates is the proportion of deaths that can be attributed to smoking.
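A minimal sketch of that counterfactual step (the data frame, column names, and event rates below are synthetic stand-ins, not the study's data or fitted model):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    df = pd.DataFrame({"smoked": rng.integers(0, 2, 20000)})          # synthetic cohort
    df["suid"] = rng.binomial(1, 0.001 + 0.002 * df["smoked"])        # made-up event rates

    model = smf.logit("suid ~ smoked", data=df).fit(disp=0)

    observed_rate = df["suid"].mean()
    counterfactual = df.assign(smoked=0)                 # every mother set to nonsmoker
    predicted_rate = model.predict(counterfactual).mean()

    # Share of deaths attributed to smoking, if the association is assumed causal.
    attributable_fraction = (observed_rate - predicted_rate) / observed_rate
    print(attributable_fraction)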
Results
We investigated 20 685 463 births and 19 127 SUIDs during the years 2007–2011 (SUID rate 0.92 in 1000 live births). Of the births, 12 417 813 had complete prenatal smoking information, and of these cases, 10 737 met the SUID definition. In 2011, 11.5% of mothers smoked in the 3 months before pregnancy, and 8.9% smoked during pregnancy; 24.3% of smokers who smoked prepregnancy quit before the first trimester.
By using a dichotomous variable of smoking (yes or no), SUID risk more than doubled (aOR = 2.44; 95% CI 2.31–2.57) with any maternal smoking during pregnancy. The aOR was similar when calculated for cases in which there were data on number of cigarettes smoked during pregnancy (aOR = 2.40; 95% CI 2.23–2.59).
There was a positive correlation between average number of daily cigarettes during pregnancy and the risk of SUID (Table 1). This correlation was similar for each trimester when modeled independently (Fig 1), but the average number of cigarettes in the 3 trimesters together provided greater predictive power (Supplemental Fig 4). There was a twofold-increased SUID risk between no smoking and smoking 1 cigarette daily throughout pregnancy (aOR = 1.98; 95% CI, 1.73–2.28). For 1 to 20 cigarettes per day, the probability of SUID increased linearly, with each additional cigarette smoked per day increasing the odds by 0.07 (aOR = 0.07 × number of daily cigarettes + 1.91) (Supplemental Fig 4). In the GAM, we observed the same twofold increase for smoker versus nonsmoker (aOR = 1.96; 95% CI 1.72–2.23), a linear relationship in SUID risk for 1 to 20 cigarettes, and a flattening of the curve after 20+ daily cigarettes with much wider CIs because of fewer cases (Fig 2). For the population mode, between 1 and 20 cigarettes, the line fitted on the results of both logistic regression and GAM were the same (aOR = 0.07 × number of cigarettes +1.91).
TABLE 1
aORs of SUID for 0–20 Cigarettes
FIGURE 1
aORs of SUID given the average number of cigarettes (between 1 and 20) smoked daily by the mother per trimester.
FIGURE 2
Logistic regression and GAMs. Two different computational models, logistic regression and the GAM, plot the rate of SUID given the average daily number of reported cigarettes smoked by the mother across all 3 trimesters.
Of mothers who smoked during pregnancy, 55% did not reduce smoking during pregnancy, 20% quit smoking by the beginning of the third trimester, and 24% reduced their smoking. Those who quit or reduced smoking by the third trimester decreased the amount of smoking during pregnancy by an average of 58% and 33%, respectively. Compared with continued smokers, SUID risk in the reduced group was slightly decreased (aOR = 0.88; 95% CI 0.79–0.98), whereas those who quit exhibited the largest reduction in risk (aOR = 0.77; 95% CI 0.67–0.87).
Compared with mothers who did not smoke in the 3 months before or during pregnancy, SUID risk progressively increased for those who smoked before pregnancy and quit before pregnancy (aOR = 1.47; 95% CI 1.16–1.87), those who did not smoke before but smoked during pregnancy (aOR = 2.22; 95% CI 1.15–4.29), and those who smoked before and during pregnancy (aOR = 2.52; 95% CI 2.25–2.83). For mothers who smoked prepregnancy only, the number of cigarettes smoked prepregnancy did not have a significant association with a change in SUID risk.
The logistic regression models plotting the correlation between maternal smoking and specific ICD-10 codes for SUID revealed a statistically significant dose-effect relationship between maternal smoking and odds of R95, R99, and W75 (Fig 3). Conversely, non-SUID causes of death, including P07.2 (extreme immaturity of the newborn), P07.3 (prematurity), and P01.1 (newborn affected by premature rupture of membranes) did not exhibit this positive dose-response relationship (Fig 3).
FIGURE 3
aORs of specific causes of SUID and non-SUID infant death. Comparison is between aORs of specific causes of infant death, including R95 (SIDS), R99 (ill-defined or unknown cause of mortality), and W75 (accidental suffocation or strangulation in bed), and other non-SUID causes of infant death, including P07.2 (extreme immaturity of the newborn), P07.3 (prematurity), and P01.1 (newborn affected by premature rupture of membranes).
Assuming causality, an estimated ∼800 infants per year, or 22% of all SUID cases in the United States, were attributed to maternal smoking during pregnancy.
Discussion
Public health campaigns launched in the 1990s educating parents about the importance of infant sleep position and environment led to a ∼50% decrease in US SIDS rates.31 As prevalence of prone sleeping has declined, the relative contribution of prenatal maternal smoking to the risk of sudden infant death has increased.2 We found that any smoking during pregnancy was associated with a doubling in SUID risk. Additionally, if mothers quit or reduced smoking during pregnancy, the relative risk of SUID decreased compared with those who continued smoking. Although the average number of cigarettes across the 3 trimesters held greater predictive power, the increase in SUID risk due to prenatal maternal smoking was seen even when each trimester was modeled independently, suggesting that smoking during any trimester is associated with increased SUID risk. However, this phenomenon is at least partly explained by a high correlation between smoking in the first trimester and smoking in subsequent trimesters. In each model, there was a twofold risk for smokers who smoked at least 1 cigarette.
There was a linear correlation between average number of daily cigarettes smoked and increased risk for SUID. Similar dose-dependent trends have been described previously,32,33 but not with such resolution or sample size. In the GAM, the curve began to plateau after >20 cigarettes per day, suggesting that smoking cessation efforts may have greater impact on decreasing SUID rates when directed toward those who smoke fewer than 1 pack per day versus the more traditionally targeted heavy (>20 cigarettes per day) smokers.
Compared with the pregnant smokers who did not reduce their smoking during pregnancy (more than half), those who reduced the number of cigarettes smoked by the third trimester demonstrated a modest (12%) decrease in the risk of SUID, and quitting by the third trimester was associated with a greater reduction in risk (by 23%). However, there may be some selection bias because the group who reduced smoking started at a higher average number of cigarettes in the first trimester, whereas those who successfully quit smoked fewer cigarettes in the first trimester.
The largest predictor of SUID risk with maternal prenatal smoking was the average number of cigarettes smoked daily over the 3 trimesters. Thus, a woman who smoked 20 cigarettes per day in the first trimester and reduced to 10 cigarettes per day in subsequent trimesters had a similarly reduced SUID risk as a woman who averaged 13 cigarettes per day in each trimester. Public health promotion should specifically encourage women to quit before pregnancy. Furthermore, pregnant smokers seeking prenatal care in the first trimester should be strongly advised that the greatest benefit for reducing SUID risk unequivocally results from quitting but also that any reduction in the number of cigarettes smoked is associated with a small decrease in risk.
Although smoking has decreased overall in the United States in recent years, 11.6% of mothers reported smoking in the 3 months before pregnancy in 2011. Of these, only one-quarter stopped smoking for the duration of the pregnancy. The adjusted odds for SUID were slightly but significantly increased (aOR = 1.47; 95% CI 1.16–1.87) in cases wherein the mother smoked prepregnancy but quit during the pregnancy compared with those who never smoked. Part of this increase could be due to environmental tobacco exposure because it is not uncommon for those who smoke to have a partner who also smokes34; it is also likely that a proportion of women who smoked prepregnancy and quit during pregnancy restarted in the postpartum period.35 This group may have also included women who stopped smoking as soon as they knew they were pregnant and thus reported that they were nonsmokers in the first trimester, but the fetus had been exposed to maternal smoking during the period before pregnancy was diagnosed. Interestingly, the increased odds ratio was similar regardless of how many cigarettes were smoked during the 3 months prepregnancy. Although the study adjusted for many potential confounders, residual confounding, especially with socioeconomic factors, might explain this finding. There may also be other exposures (eg, women who drink alcohol during pregnancy, another potent risk factor for SUID, are more likely to smoke at moderate, high, and very high continuous levels as compared with women classified as nondrinkers and quitters).36
The relationship between smoking and rates for R95, R99, and W75 diagnoses individually revealed similar linear trends. These findings support the idea that, despite differing labels on the death certificate, there may be commonalities in intrinsic and/or extrinsic factors, and these deaths should consistently be considered together as SUID. Interestingly, specific non-SUID causes of death, including P07.2, P07.3, and P01.1, did not reveal dose-effect relationships with smoking. This was unexpected because smoking increases the risk of preterm birth, which is associated with higher mortality and morbidity.37
Researchers in various countries, including New Zealand (33%),2 Chile (33%),38 Denmark (30%–40%),4 and the United States (23%–34%),39 have attempted to estimate the percentage of SIDS and/or SUID attributable to prenatal smoking. In this study, we employed sophisticated statistical analyses in combination with high population numbers to allow for greater granularity in estimating population-attributable risk for prenatal smoke exposure. The relationship between smoking and SUID meet the criteria for a causal association,40 including (1) strength (effect size; the magnitude of the risk is strong), (2) a dose-effect relationship (a linear relationship between number of cigarettes and SUID risk), (3) temporal relationship (the risk factor [smoking] precedes the event [death]), (4) consistency of findings (smoking is identified as a risk factor in many studies), (5) biological plausibility,2,3 and (6) the reduction in risk with smoking reduction and cessation. If causality is assumed in our model, we estimate that ∼22% of all US SUID cases are directly attributable to smoking (ie, if every mother did not smoke during pregnancy, there would have been an estimated 800 fewer SUIDs in the United States in 2011 alone). This suggests that a significant reduction in SIDS incidence might occur if the prevalence of maternal smoking was reduced.
This study is limited by the likely conservative smoking estimates because our data set does not include environmental smoke exposure during pregnancy or in the postpartum period, including paternal smoking, which has an independent influence on SIDS risk.3,41 In addition, smoking rates are self-reported. Because it is widely known that smoking is an unhealthy behavior, it is likely that some women underestimated or denied their true smoking habits. Indeed, in studies documenting serum cotinine levels, maternal self-reported smoking status during pregnancy underestimated smoking prevalence by >20%.42,43 Finally, only 60% of births had data about the number of cigarettes smoked. However, the missing data were not related to maternal characteristics but instead to the adoption of the 2003 revision of the US Standard Certificate of Live Birth, and therefore had minimal effect on the estimates.
Conclusions
Educational efforts to decrease SUID risk should strongly encourage nonsmoking practices before pregnancy and smoking cessation during pregnancy. Those who are unable to quit entirely should be advised to reduce the amount smoked. We estimate that US SUID rates could be reduced by 22% if no women smoked during pregnancy.
Acknowledgments
Dr Mitchell was supported in part by the Gavin and Ann Kellaway Medical Research Fellowship. We thank Kelty Allen, Avleen Bijral, Urszula Chajewska, Ricky Johnston, Sushama Murthy, and John Thompson for statistical guidance and useful discussion, and John Kahan and Daniel Rubens for inspiring this collaboration.
Footnotes
Accepted January 16, 2019.
Address correspondence to Tatiana M. Anderson, PhD, Seattle Children’s Research Institute, Center for Integrative Brain Research, 1900 Ninth Ave, Seattle, WA 98101. E-mail: tatianaaatuw.edu
FINANCIAL DISCLOSURE: The authors have indicated they have no financial relationships relevant to this article to disclose.
FUNDING: Supported by the National Institutes of Health (grants P01HL0906654 and R01HL126523 awarded to J.M.R.), Microsoft, and the Aaron Matthew Sudden Infant Death Syndrome Research Guild. Funded by the National Institutes of Health (NIH).
POTENTIAL CONFLICT OF INTEREST: Dr Moon has served as a paid medical expert in a case of unexpected sudden infant death; the other authors have indicated they have no potential conflicts of interest to disclose.
Copyright © 2019 by the American Academy of Pediatrics
Source link
0 notes
haleyjena · 4 years ago
Text
Machine Learning Interview Questions
1. What reasons led to the introduction of machine learning?
The simplest answer is: to make our lives easier. In the early days of intelligent applications, many systems depended on hardcoded "if" and "else" rules for processing data or adjusting to user input. Think of a spam filter whose job is to move the appropriate incoming email messages to a spam folder.
With machine learning algorithms, we instead give the algorithm ample data from which it can learn and identify patterns on its own; we are not required to write new rules for each problem.
2. What are the different types of machine learning algorithms?
There are many machine learning algorithms. Broadly speaking, they are divided into supervised, unsupervised, and reinforcement learning.
3. What is Supervised Learning?
Supervised learning is, simply put, the machine learning task of deducing a function from labelled training data. Some supervised learning algorithms are listed below (a minimal sketch follows the list):
Support Vector Machines
Regression
Naive Bayes
Decision Trees
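As a rough illustration of the supervised setting, here is a minimal scikit-learn sketch - a decision tree is used here, but any of the algorithms above could stand in:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)                        # labelled training data
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    clf = DecisionTreeClassifier().fit(X_train, y_train)     # learn a function from X to y
    print(clf.score(X_test, y_test))                         # accuracy on held-out labels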
4. What is Unsupervised Learning?
Unsupervised learning is the second type of ML algorithm, used for finding patterns in the data provided. Here there is no dependent variable or label to predict; a small sketch follows below.
Unsupervised learning algorithms include:
Clustering,
Anomaly Detection,
Neural Networks and Latent Variable Models.
If you wish to gain more clarity, a machine learning coding bootcamp can offer the right guidance and open up successful career opportunities.
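For illustration, a minimal unsupervised sketch with scikit-learn's k-means - note that no labels are handed to the model:

    from sklearn.cluster import KMeans
    from sklearn.datasets import load_iris

    X, _ = load_iris(return_X_y=True)                   # deliberately ignore the labels
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
    print(km.labels_[:10])                              # cluster ids found from X alone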
5. What is the 'Naive' concept in Naive Bayes?
Naive Bayes is a supervised learning algorithm; it is 'naive' because, when applying Bayes' theorem, it assumes that all features are independent of each other. Consult a machine learning bootcamp to understand the technique and the further tools needed for cracking the interview.
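As a rough sketch, scikit-learn's GaussianNB embodies exactly this independence assumption:

    from sklearn.datasets import load_iris
    from sklearn.naive_bayes import GaussianNB

    X, y = load_iris(return_X_y=True)
    nb = GaussianNB().fit(X, y)     # each feature modelled independently within each class
    print(nb.predict(X[:5]))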
6. What is PCA? When do you use it?
Principal component analysis (PCA) is the technique most commonly used for dimensionality reduction. It measures the variation captured along each direction in the data; directions that carry little variation can be thrown out.
Principal component analysis makes a dataset easier to visualize, and is used in finance, neuroscience, and pharmacology. It is also useful in the pre-processing stage, when linear correlations are present between features. Consider a coding bootcamp for learning the relevant tools and techniques.
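A minimal PCA sketch with scikit-learn, showing how much variation each component captures:

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA

    X, _ = load_iris(return_X_y=True)
    pca = PCA(n_components=2).fit(X)
    print(pca.explained_variance_ratio_)   # variation captured by each component
    X_2d = pca.transform(X)                # reduced data, easy to visualize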
7. Explain SVM Algorithm.
An SVM, or Support Vector Machine, is a powerful and versatile supervised machine learning model, capable of performing linear or non-linear classification, regression, and outlier detection.
8. What are Support Vectors in SVM?
A Support Vector Machine fits a separating boundary between the different classes that maximizes the distance from the boundary to the nearest points of each class; in this manner, it tries to find a robust separation between classes. Support vectors are the points that lie on the edge of this margin and define the dividing hyperplane.
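For illustration, the fitted support vectors can be inspected directly in scikit-learn:

    from sklearn.datasets import load_iris
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    svm = SVC(kernel="linear").fit(X, y)
    print(svm.support_vectors_.shape)      # the margin-defining points the model keeps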
9. What are Different Kernels in SVM?
There are 6 types of kernels in SVM; however, the following four are widely used (see the sketch after this list):
Linear kernel - used when the data is linearly separable.
Polynomial kernel - used when one has discrete data with no natural notion of smoothness.
Radial basis (RBF) kernel - used to create a decision boundary that does a better job of separating two classes than the linear kernel.
Sigmoid kernel - used as a proxy for neural network activation functions.
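A minimal sketch comparing these kernels on data that is not linearly separable (scikit-learn's make_moons is used here purely as a convenient toy dataset):

    from sklearn.datasets import make_moons
    from sklearn.svm import SVC

    X, y = make_moons(noise=0.1, random_state=0)       # not linearly separable
    for kernel in ["linear", "poly", "rbf", "sigmoid"]:
        score = SVC(kernel=kernel).fit(X, y).score(X, y)
        print(kernel, round(score, 3))                 # rbf/poly separate this data far better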
10. What is Cross-Validation?
Cross-validation is a method of splitting the data so that every observation is used for both training and evaluation. In k-fold cross-validation, the data is split into k subsets; the model is trained on k-1 of them and tested on the remaining one, and this is repeated so that each subset serves as the test set exactly once. Finally, the scores from all k folds are averaged to produce the final score.
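A minimal k-fold sketch with scikit-learn:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=5)   # 5-fold CV
    print(scores.mean())                   # average of the five held-out-fold scores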
11. What is Bias in Machine Learning?
Bias in data indicates that there is an inconsistency in the data. The inconsistency may be caused by several factors, which are not mutually exclusive.
12. What is the Difference Between Classification and Regression?
Regression is used for producing continuous numerical results, whereas classification is used for sorting data into discrete, definite categories.
13. Define Precision and Recall?
Precision and recall are ways of measuring the quality of a machine learning model's predictions, and they are often used together. Precision measures exactness: of the items the model predicted to be relevant, how many are truly relevant? Recall measures completeness: of the items that are truly relevant, how many did the model actually find? A small sketch follows.
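A small sketch with a hand-made prediction vector, so the two numbers can be checked by eye:

    from sklearn.metrics import precision_score, recall_score

    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
    print(precision_score(y_true, y_pred))   # 3 of 4 predicted positives are real -> 0.75
    print(recall_score(y_true, y_pred))      # 3 of 4 real positives were found    -> 0.75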
15. How to Tackle Overfitting and Underfitting?
Overfitting means the model fits the training data too well and does not generalize; in this case, one should resample the data and estimate model accuracy using techniques like k-fold cross-validation. In the case of underfitting, the model is not able to capture the patterns in the data; in that case, one should change the algorithm or feed more data points (or better features) into the model to improve accuracy.
16. What is a Neural Network?
A neural network, to put it in simple words, is a model loosely inspired by the human brain. Much like the brain, it has neurons that activate when they encounter something relevant. The neurons are linked by connections, which help information flow from one neuron to another.
17. What is Ensemble learning?
Ensemble learning is a method that combines multiple machine learning models to create a more powerful model.
There are numerous reasons for the individual models to differ. Some are:
Different Hypothesis
Different Population
Different Modelling techniques
When working with a model's training and testing data, one will observe an error. This error has components of bias, variance, and irreducible error.
The model should strike a balance between bias and variance; this is what we call the bias-variance trade-off. Ensemble learning is one way to manage this trade-off. There are numerous ensemble techniques available, but when aggregating multiple models there are two general methods - bagging and boosting. A small sketch follows.
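A rough sketch of the two approaches with scikit-learn (the choice of base models here is arbitrary):

    from sklearn.datasets import load_iris
    from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
    from sklearn.model_selection import cross_val_score

    X, y = load_iris(return_X_y=True)
    bagging = BaggingClassifier(n_estimators=50, random_state=0)   # averages many trees -> lower variance
    boosting = GradientBoostingClassifier(random_state=0)          # fits trees sequentially -> lower bias
    print(cross_val_score(bagging, X, y, cv=5).mean())
    print(cross_val_score(boosting, X, y, cv=5).mean())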
18. How does one decide which Machine Learning Algorithm to use?
It depends largely on the dataset one has. If the target is discrete (categorical), one might use a classifier such as an SVM; if the target is continuous, one might use linear regression. So, while there is no single rule for knowing which ML algorithm to use, the choice ultimately rests on exploratory data analysis (EDA).
19. How to Handle Outlier Values?
An outlier is an observation in the dataset that lies far away from the other observations. Tools used for discovering outliers include (see the sketch after this list):
Z-score
Box plot
Scatter plot
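A minimal Z-score sketch (the cut-off of 2 is an arbitrary choice for illustration):

    import numpy as np

    data = np.array([10, 12, 11, 13, 12, 95])     # 95 looks suspicious
    z = (data - data.mean()) / data.std()
    print(data[np.abs(z) > 2])                    # flags the point with |z| > 2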
Conclusion
The questions listed above cover the basics of machine learning. With machine learning advancing rapidly, joining the relevant communities and preparing through a machine learning bootcamp is the way forward for cracking the interview.
Source: https://machinelearningprogrammer.blogspot.com/2021/10/machine-learning-interview-questions.html
0 notes
thedatasciencehyderabad · 4 years ago
Text
Online, IIMJobs conducts job fairs and facilitates recruitment drives with top hiring companies. These career fairs are held in cities such as Mumbai, Bengaluru, Pune, NCR-Delhi, Hyderabad, and Chennai. Dedicated career experts from IIMJobs thoroughly handhold learners as they register on the portal, and give them feedback and guidance to improve their profile and use the right keywords.
The course begins with an introduction to concepts in mathematics, statistics and data science. Students receive instruction in the world's most popular languages - Python and R.
As part of this module, learn about another black-box method, the SVM. SVM is about creating boundaries for classifying data in multidimensional spaces. These boundaries are called hyperplanes; they can be linear or non-linear and segregate the classes with the maximum possible margin. Learn how kernel tricks are applied to transform the data into high-dimensional spaces so that non-linearly separable data becomes linearly separable. The neural network is another black-box method, used for deep learning models.
Hyderabad currently has around 1200+ job vacancies for various positions in the field of data science. This number is expected to increase as more sectors adopt data science.
It is possible to learn data science on your own, as long as you stay focused and motivated. Luckily, there are lots of online courses and boot camps available. Start by determining what interests you about data science. If you gravitate toward visualizations, start learning about them.
Let's start from the bottom of Maslow's pyramid of human needs, which you secure with money. According to Glassdoor, in 2020 data science was the single highest paid profession. Nothing could be further from the truth - data scientists are few and far between, and highly sought after. IBM predicts demand for data scientists will soar 28% by 2020.
Data Science is the field that comprises everything related to data cleaning, data mining, data preparation, and data analysis.
Word cloud, Principal Component Analysis, Bigrams & Trigrams: a word cloud is a data visualization technique used to represent text data. This module will teach you everything about word clouds, Principal Component Analysis, bigrams, and trigrams used in data visualization.
Text cleaning, regular expressions, Stemming, Lemmatization: text cleaning is a necessary procedure to bring out the attributes for your machine learning model to use. A regular expression is a language that states text search strings. Stemming is a technique used in Natural Language Processing (NLP) which extracts the base form of words by removing affixes. Lemmatization is another commonly used NLP technique, which maps the different inflected forms of a word so they can be analysed as a single item.
ADF, Random walk and Auto ARIMA: in this module, you will learn about the ADF test, random walks, and Auto ARIMA methods used in time series analysis.
Besides being taught by an outstanding set of faculty, you will also be taught by senior industry leaders who deliver specific modules of the course. Explore more here about the Great Learning Data Science Faculty. Vidhya ma'am is one of the best faculty members I have seen in my whole learning path. Kindly assign Vidhya ma'am for all our upcoming programmes as well.
The DSE Program has been designed to help candidates jumpstart their careers in the field of Data Science. SQL is a query language designed for relational databases. Data scientists deal with massive amounts of data, and they store a lot of that data in relational databases. Other languages such as Java, C++, JavaScript, and Scala are also used, albeit less so. If you already have a background in those languages, you can explore the tools available in those
languages. However, if you already know another programming language, you will likely be able to pick up Python very quickly. In any case, we provide the necessary programming skills to execute Data Science projects.
Hackathon: a hackathon is an event, usually hosted by a tech organisation, where computer programmers gather for a short period to collaborate on a software project.
I learned Data Science from Mr. K Venkata Rao. With his vast experience in business analytics and data science, along with his outstanding teaching skills, he helped me quickly catch up with the data science topics.
Even though many data scientists research and analyze big datasets to solve a problem, data science is more about producing a sophisticated model that can make a real impact in your area of work. A data scientist is not just a data cruncher; he is also a problem solver, a strategist who discovers the best plan that fits your business problem. Many ambiguous problems in several sectors have been solved by applying the methods of data science.
Learning data science requires specialised technical skills, together with knowledge of programming basics and analytics tools, to get started. However, this data science course explains all of the relevant concepts from scratch, so you will find it easy to put your new skills to use. You will earn an industry-recognised certificate from IBM and Simplilearn that will attest to your new skills and on-the-job experience.
This Tableau certification course helps you master Tableau Desktop, a widely used data visualization, reporting, and business intelligence tool. Advance your career in analytics by learning Tableau and how best to apply this training in your work.
This Data Scientist Master's program covers in-depth Data Science training, combining online instructor-led classes and self-paced learning co-developed with IBM. The program concludes with a capstone project designed to reinforce the learning by building a real business product encompassing all the key aspects learned throughout the program.
Venkat, an alumnus of IIM Kolkata & ISI Kolkata, has 15 years of experience in Advanced Data Analytics, Consulting and Training. As a tutor you can connect with more than a million students and grow your network.
Upon completion of the following minimum requirements, you will be eligible to receive the Data Scientist Master's certificate, which will testify to your skills as an expert in Data Science. It does not matter which area of Hyderabad you are in - be it Madhapur, Vijay Nagar Colony, Banjara Hills, Uppal, Begumpet, Sanjeeb Reddy Nagar, Moosapet or Kukatpally.
Learn how to use Excel and gain insight into data in varied forms and formats. Choose from programmes specially curated to suit each professional's training needs. On submission of all assignments, you will receive a Course Completion Certificate. A sample of the data science certificate is available on our website for your reference.
Learn about improving the reliability and accuracy of decision tree models using ensemble methods. Bagging and Boosting are the go-to techniques in ensemble methods.
Navigate to Address: 360DigiTMG - Data Analytics, Data Science Course Training Hyderabad, 2-56/2/19, 3rd floor, Vijaya Towers, near Meridian School, Ayyappa Society Rd, Madhapur, Hyderabad, Telangana 500081. Phone: 099899 94319
0 notes
rewat500-blog · 5 years ago
Text
k-Means
This week’s assignment involves running a k-means cluster analysis. Cluster analysis is an unsupervised machine learning method that partitions the observations in a data set into a smaller set of clusters where each observation belongs to only one cluster. The goal of cluster analysis is to group, or cluster, observations into subsets based on their similarity of responses on multiple variables. Clustering variables should be primarily quantitative variables, but binary variables may also be included.
Your assignment is to run a k-means cluster analysis to identify subgroups of observations in your data set that have similar patterns of response on a set of clustering variables.
Data
This is perhaps the best known database to be found in the pattern recognition literature. Fisher's paper is a classic in the field and is referenced frequently to this day. (See Duda & Hart, for example.) The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other.
Predicted attribute: class of iris plant.
Attribute Information:
sepal length in cm
sepal width in cm
petal length in cm
petal width in cm
class:
Iris Setosa
Iris Versicolour
Iris Virginica
Results
A k-means cluster analysis was conducted to identify classes of iris plants based on their similarity of responses on 4 variables that represent characteristics of the each plant bud. Clustering variables included 4 quantitative variables such as: sepal length, sepal width, petal length, and petal width.
Data were randomly split into a training set that included 70% of the observations and a test set that included 30% of the observations. Then a k-means cluster analysis was conducted on the training data specifying k=3 clusters (representing the three classes: Iris Setosa, Iris Versicolour, Iris Virginica), using Euclidean distance.
To describe the performance of the classifier and see what types of errors it makes, a confusion matrix was created. The accuracy score is 0.82, which is quite good given the small number of observations (n=150).
In [73]:
import numpy as np
import pandas as pd
import matplotlib.pylab as plt
from sklearn.model_selection import train_test_split
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score
from sklearn.decomposition import PCA
import seaborn as sns
%matplotlib inline
rnd_state = 3927
In [2]:
iris = datasets.load_iris()
data = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                    columns=iris['feature_names'] + ['target'])
data.head()
Out[2]:
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  target
0                5.1               3.5                1.4               0.2     0.0
1                4.9               3.0                1.4               0.2     0.0
2                4.7               3.2                1.3               0.2     0.0
3                4.6               3.1                1.5               0.2     0.0
4                5.0               3.6                1.4               0.2     0.0
In [66]:
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
sepal length (cm)    150 non-null float64
sepal width (cm)     150 non-null float64
petal length (cm)    150 non-null float64
petal width (cm)     150 non-null float64
target               150 non-null float64
dtypes: float64(5)
memory usage: 5.9 KB
In [3]:
data.describe()
Out[3]:
       sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)      target
count         150.000000        150.000000         150.000000        150.000000  150.000000
mean            5.843333          3.054000           3.758667          1.198667    1.000000
std             0.828066          0.433594           1.764420          0.763161    0.819232
min             4.300000          2.000000           1.000000          0.100000    0.000000
25%             5.100000          2.800000           1.600000          0.300000    0.000000
50%             5.800000          3.000000           4.350000          1.300000    1.000000
75%             6.400000          3.300000           5.100000          1.800000    2.000000
max             7.900000          4.400000           6.900000          2.500000    2.000000
In [4]:
pca_transformed = PCA(n_components=2).fit_transform(data.iloc[:, :4])
In [7]:
colors = ["#9b59b6", "#e74c3c", "#2ecc71"]
plt.figure(figsize=(12,5))
plt.subplot(121)
plt.scatter(list(map(lambda tup: tup[0], pca_transformed)),
            list(map(lambda tup: tup[1], pca_transformed)),
            c=list(map(lambda col: "#9b59b6" if col==0 else "#e74c3c" if col==1 else "#2ecc71", data.target)))
plt.title('PCA on Iris data')
plt.subplot(122)
sns.countplot(data.target, palette=sns.color_palette(colors))
plt.title('Countplot Iris classes');
For visualization purposes, the number of dimensions was reduced to two by applying PCA analysis. The plot illustrates that classes 1 and 2 are not clearly divided. Countplot illustrates that our classes contain the same number of observations (n=50), so they are balanced.
In [85]:
(predictors_train, predictors_test, target_train, target_test) = train_test_split(data.iloc[:, :4], data.target, test_size = .3, random_state = rnd_state)
In [86]:
classifier = KMeans(n_clusters=3).fit(predictors_train)
prediction = classifier.predict(predictors_test)
In [87]:
pca_transformed = PCA(n_components=2).fit_transform(predictors_test)
Predicted classes 1 and 2 mismatch the real ones, so the code block below fixes that problem.
In [88]:
prediction = np.where(prediction==1, 3, prediction)
prediction = np.where(prediction==2, 1, prediction)
prediction = np.where(prediction==3, 2, prediction)
In [91]:
plt.figure(figsize=(12,5))
plt.subplot(121)
plt.scatter(list(map(lambda tup: tup[0], pca_transformed)),
            list(map(lambda tup: tup[1], pca_transformed)),
            c=list(map(lambda col: "#9b59b6" if col==0 else "#e74c3c" if col==1 else "#2ecc71", target_test)))
plt.title('PCA on Iris data, real classes');
plt.subplot(122)
plt.scatter(list(map(lambda tup: tup[0], pca_transformed)),
            list(map(lambda tup: tup[1], pca_transformed)),
            c=list(map(lambda col: "#9b59b6" if col==0 else "#e74c3c" if col==1 else "#2ecc71", prediction)))
plt.title('PCA on Iris data, predicted classes');
The figure shows that our simple classifier did a good job of identifying the classes, despite a few mistakes.
In [78]:
clust_df = predictors_train.reset_index(level=[0])
clust_df.drop('index', axis=1, inplace=True)
clust_df['cluster'] = classifier.labels_
In [79]:
clust_df.head()
Out[79]:
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  cluster
0                5.7               2.8                4.5               1.3        0
1                5.6               2.7                4.2               1.3        0
2                7.1               3.0                5.9               2.1        2
3                6.5               3.0                5.8               2.2        2
4                5.9               3.0                4.2               1.5        0
In [80]:
print('Clustering variable means by cluster')
clust_df.groupby('cluster').mean()
Clustering variable means by cluster
Out[80]:
         sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
cluster
0                 5.859091          2.790909           4.343182          1.415909
1                 4.989744          3.425641           1.471795          0.248718
2                 6.886364          3.090909           5.854545          2.077273
In [92]:
print('Confusion matrix:\n', pd.crosstab(target_test, prediction, rownames=['Actual'], colnames=['Predicted'], margins=True))
print('\nAccuracy: ', accuracy_score(target_test, prediction))
Confusion matrix:
Predicted   0   1   2  All
Actual
0.0        11   0   0   11
1.0         0  11   1   12
2.0         0   7  15   22
All        11  18  16   45

Accuracy:  0.8222222222222222
0 notes