#2 pixels each then a couple signature that will be split up . . . ^_^
I have like 6 images being made for mordok and rodran rn its bad. Theyve got me so bad.
CS 189 Introduction to Machine Learning HW1 Solved
This homework comprises two parts. The first part consists of a set of coding exercises. The second part consists of math problems. Start this homework early! You can only submit to Kaggle twice a day.

Deliverables:

• Submit your predictions for the test sets to Kaggle as early as possible. Include your Kaggle scores in your write-up (see below).

• Submit a PDF of your homework, with an appendix listing all your code, to the Gradescope assignment entitled "HW1 Write-Up". You may typeset your homework in LaTeX or Word (submit PDF format, not .doc/.docx format) or submit neatly handwritten and scanned solutions. Please start each question on a new page. If there are graphs, include those graphs in the correct sections; do not put them in an appendix. We need each solution to be self-contained on pages of its own.

• In your write-up, please state with whom you worked on the homework.

• In your write-up, please copy the following statement and sign your signature next to it. (Mac Preview and Foxit PDF Reader, among others, have tools to let you sign a PDF file.) We want to make it extra clear so that no one inadvertently cheats. "I certify that all solutions are entirely in my own words and that I have not looked at another student's solutions. I have given credit to all external sources I consulted."

• Submit all the code needed to reproduce your results to the Gradescope assignment entitled "HW1 Code". Yes, you must submit your code twice: once in your PDF write-up (above) so the readers can easily read it, and once in compilable/interpretable form so the readers can easily run it. Do NOT include any data files we provided. Please include a short file named README listing your name, student ID, and instructions on how to reproduce your results. Please take care that your code doesn't take up inordinate amounts of time or memory. If your code cannot be executed, your solution cannot be verified.
1 Python Configuration and Data Loading
Please follow the instructions below to ensure your Python environment is configured properly and that you are able to successfully load the data provided with this homework. No solution needs to be submitted for this question.

For all coding questions, we recommend using Anaconda for Python 3. Either install Anaconda for Python 3, or otherwise ensure you're using Python 3. To confirm you're running Python 3, open a terminal in your operating system and execute the following command: python --version. Do not proceed until you're running Python 3. Install the dependencies required for this homework by executing the following command in your operating system's terminal: pip install scikit-learn scipy numpy matplotlib. Please use Python 3 with the modules specified above to complete this homework.

You will be running out-of-the-box implementations of support vector machines to classify three datasets. You will find a set of .mat files in the data folder for this homework. Each .mat file will load as a Python dictionary. Each dictionary contains three fields:

• training_data, the training set features. Rows are samples and columns are features.
• training_labels, the training set labels. Rows are samples. There is one column: the label for each sample.
• test_data, the test set features. Rows are samples and columns are features. You will fit a model to predict the labels for this test set, and submit those predictions to Kaggle.

The three datasets for the coding portion of this assignment are described below.

• mnist_data.mat contains data from the MNIST dataset. There are 60,000 labeled digit images for training and 10,000 digit images for testing. The images are grayscale, 28×28 pixels, flattened. There are 10 possible labels for each image, namely the digits 0–9.

Figure 1: Examples from the MNIST dataset.

• spam_data.mat contains featurized spam data. The labels are 1 for spam and 0 for ham. The data folder includes the script featurize.py and the folders spam, ham (not spam), and test (unlabeled test data); you may modify featurize.py to generate new features for the spam data.

• cifar10_data.mat contains data from the CIFAR-10 dataset. There are 50,000 labeled object images for training and 10,000 object images for testing. The images are flattened 3×32×32 (3 color channels). The labels 0–9 correspond alphabetically to the categories; for example, 0 means airplane, 1 means automobile, 2 means bird, and so on.

Figure 2: Examples from the CIFAR-10 dataset.

To check whether your Python environment is configured properly for this homework, ensure the following Python script executes without error. Pay attention to errors raised when attempting to import any dependencies. Resolve such errors by manually installing the required dependency (e.g., execute pip install numpy for import errors relating to the numpy package).

    import sys
    if sys.version_info[0] < 3:
        raise Exception("Python 3 not detected.")

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn import svm
    from scipy import io

    for data_name in ["mnist", "spam", "cifar10"]:
        data = io.loadmat("data/%s_data.mat" % data_name)
        print("\nloaded %s data!" % data_name)
        fields = "test_data", "training_data", "training_labels"
        for field in fields:
            print(field, data[field].shape)
2 Data Partitioning
Rarely will you receive "training" data and "validation" data; usually you will have to partition available labeled data yourself. The datasets for this assignment are described in the previous question. Write code to partition the datasets as follows:

• For the MNIST dataset, write code that sets aside 10,000 training images as a validation set.
• For the spam dataset, write code that sets aside 20% of the training data as a validation set.
• For the CIFAR-10 dataset, write code that sets aside 5,000 training images as a validation set.

Be sure to shuffle your data before splitting it, so that all the classes are represented in your partitions.
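The following is a minimal sketch of one way to do that partitioning with NumPy. The helper name split_train_val, the variable names, and the fixed random seed are illustrative choices rather than anything required by the assignment; it assumes the dictionaries load as described in Question 1.

    import numpy as np
    from scipy import io

    def split_train_val(features, labels, val_size, seed=0):
        """Shuffle the data, then hold out val_size examples for validation."""
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(features))          # shuffled row indices
        val_idx, train_idx = idx[:val_size], idx[val_size:]
        return (features[train_idx], labels[train_idx],
                features[val_idx], labels[val_idx])

    # MNIST: hold out 10,000 of the 60,000 training images for validation.
    mnist = io.loadmat("data/mnist_data.mat")
    X_tr, y_tr, X_val, y_val = split_train_val(
        mnist["training_data"], mnist["training_labels"].ravel(), val_size=10000)

    # Spam: hold out 20% of the training examples.
    spam = io.loadmat("data/spam_data.mat")
    spam_val_size = int(0.2 * len(spam["training_data"]))
    spam_X_tr, spam_y_tr, spam_X_val, spam_y_val = split_train_val(
        spam["training_data"], spam["training_labels"].ravel(), val_size=spam_val_size)

The CIFAR-10 split works the same way with val_size=5000.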
3 Support Vector Machines: Coding
We will use linear support vector machines to classify our datasets. For images, we will use the simplest of features for classification: raw pixel brightness values. In other words, our feature vector for an image will be a row vector with all the pixel values concatenated in row-major (or column-major) order.

There are several ways to evaluate models. We will use classification accuracy as a measure of the error rate (see https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html).

Train a linear support vector machine (SVM) on all three datasets. Plot the error rate on the training and validation sets versus the number of training examples that you used to train your classifier. The number of training examples in your experiment will vary per dataset. You may only use sklearn for the SVM model and the accuracy metric function. Everything else (training vs. validation plots) must be done without the use of sklearn.

• For the MNIST dataset, use raw pixels as features. Train your model with the following numbers of training examples: 100, 200, 500, 1,000, 2,000, 5,000, 10,000. At this stage, you should expect accuracies between 70% and 90%. Hint: Be consistent with any preprocessing you do. Use either integer values between 0 and 255 or floating-point values between 0 and 1. Training on floats and then testing with integers is bound to cause trouble.

• For the spam dataset, use the provided word frequencies as features. In other words, each document is represented by a vector, where the ith entry denotes the number of times word i (as specified in featurize.py) is found in that document. Train your model with the following numbers of training examples: 100, 200, 500, 1,000, 2,000, ALL. Note that this dataset does not have 10,000 examples; use all of your examples instead of 10,000. At this stage, you should expect accuracies between 70% and 90%.

• For the CIFAR-10 dataset, use raw pixels as features. At this stage, you should expect accuracies between 25% and 35%. Be forewarned that training SVMs for CIFAR-10 takes a couple of minutes to run for a large training set. Train your model with the following numbers of training examples: 100, 200, 500, 1,000, 2,000, 5,000.

Note: We find that SVC(kernel='linear') is faster than LinearSVC.
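As an illustration of the kind of experiment loop this calls for, here is a hedged sketch for the MNIST schedule. It assumes X_tr, y_tr, X_val, and y_val come from the partitioning sketch in Question 2, and the plotting details are just one reasonable choice.

    import matplotlib.pyplot as plt
    from sklearn import svm
    from sklearn.metrics import accuracy_score

    train_sizes = [100, 200, 500, 1000, 2000, 5000, 10000]   # MNIST schedule
    train_acc, val_acc = [], []

    for n in train_sizes:
        clf = svm.SVC(kernel="linear")            # linear SVM, default C for now
        clf.fit(X_tr[:n], y_tr[:n])               # train on the first n shuffled examples
        train_acc.append(accuracy_score(y_tr[:n], clf.predict(X_tr[:n])))
        val_acc.append(accuracy_score(y_val, clf.predict(X_val)))

    # Plot error rate (1 - accuracy) versus number of training examples.
    plt.plot(train_sizes, [1 - a for a in train_acc], marker="o", label="training error")
    plt.plot(train_sizes, [1 - a for a in val_acc], marker="o", label="validation error")
    plt.xlabel("number of training examples")
    plt.ylabel("error rate")
    plt.legend()
    plt.show()

The spam and CIFAR-10 experiments follow the same pattern with their own size schedules.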
4 Hyperparameter Tuning
In the previous problem, you learned parameters for a model that classifies the data. Many classifiers also have hyperparameters that you can tune and that influence the parameters. In this problem, we'll determine good values for the regularization parameter C in the soft-margin SVM algorithm.

When we are trying to choose a hyperparameter value, we train the model repeatedly with different hyperparameters. We select the hyperparameter that gives the model with the highest accuracy on the validation dataset. Before generating predictions for the test set, the model should be retrained using all the labeled data (including the validation data) and the previously determined hyperparameter. The use of automatic hyperparameter optimization libraries is prohibited for this part of the homework.

(a) For the MNIST dataset, find the best C value. In your report, list the C values you tried, the corresponding accuracies, and the best C value. As in the previous problem, for performance reasons, you are required to train with up to 10,000 training examples, but you are not required to train with more than that.
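A hedged sketch of a manual C sweep might look like the following. The grid of C values is purely illustrative, and the variables are assumed to come from the earlier partitioning sketch.

    import numpy as np
    from sklearn import svm
    from sklearn.metrics import accuracy_score

    C_values = [0.01, 0.1, 1.0, 10.0, 100.0]      # example grid; choose your own
    results = {}

    for C in C_values:
        clf = svm.SVC(kernel="linear", C=C)
        clf.fit(X_tr[:10000], y_tr[:10000])       # up to 10,000 MNIST examples
        results[C] = accuracy_score(y_val, clf.predict(X_val))
        print("C = %g -> validation accuracy = %.4f" % (C, results[C]))

    best_C = max(results, key=results.get)

    # Retrain on all labeled data (train + validation) with the chosen C
    # before predicting on the test set.
    final_clf = svm.SVC(kernel="linear", C=best_C)
    final_clf.fit(np.vstack([X_tr, X_val]), np.concatenate([y_tr, y_val]))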
5 K-Fold Cross-Validation
For smaller datasets (e.g., the spam dataset), the validation set contains fewer examples, and our estimate of our error might not be accurate; the estimate has high variance. A way to combat this is to use k-fold cross-validation.

In k-fold cross-validation, the training data is shuffled and partitioned into k disjoint sets. Then the model is trained on k − 1 sets and validated on the kth set. This process is repeated k times, with each set chosen as the validation set once. The cross-validation accuracy we report is the accuracy averaged over the k iterations. Use of automatic cross-validation libraries is prohibited for this part of the homework.

(a) For the spam dataset, use 5-fold cross-validation to find and report the best C value. In your report, list the C values you tried, the corresponding accuracies, and the best C value. Hint: Effective cross-validation requires choosing from random partitions. This is best implemented by randomly shuffling your training examples and labels, then partitioning them by their indices.
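One way to implement the k-fold loop by hand is sketched below. The function name cross_val_accuracy, the seed, and the C grid are illustrative; it assumes the spam .mat file is laid out as described in Question 1.

    import numpy as np
    from scipy import io
    from sklearn import svm
    from sklearn.metrics import accuracy_score

    def cross_val_accuracy(X, y, C, k=5, seed=0):
        """Average validation accuracy over k folds, implemented by hand."""
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(X))              # random partition of the indices
        folds = np.array_split(idx, k)             # k roughly equal index sets
        accs = []
        for i in range(k):
            val_idx = folds[i]
            train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
            clf = svm.SVC(kernel="linear", C=C)
            clf.fit(X[train_idx], y[train_idx])
            accs.append(accuracy_score(y[val_idx], clf.predict(X[val_idx])))
        return np.mean(accs)

    spam = io.loadmat("data/spam_data.mat")
    X, y = spam["training_data"], spam["training_labels"].ravel()
    for C in [0.01, 0.1, 1.0, 10.0, 100.0]:        # example grid
        print("C = %g -> 5-fold CV accuracy = %.4f" % (C, cross_val_accuracy(X, y, C)))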
6 Kaggle
MNIST Competition: https://kaggle.com/c/cs189-hw1-mnist
SPAM Competition: https://kaggle.com/c/cs189-hw1-spam
CIFAR-10 Competition: https://kaggle.com/c/cs189-hw1-cifar10

Using the best model you trained for each dataset, generate predictions for the test sets we provide and save those predictions to .csv files. Be sure to use integer labels (not floating-point!) and no spaces (not even after the commas). Upload your predictions to the Kaggle leaderboards (submission instructions are provided within each Kaggle competition). In your report, include your Kaggle name as it displays on the leaderboard and your Kaggle score for each of the three datasets.

For your Kaggle submissions, you may optionally add more features or use a non-linear SVM kernel to get a higher position on the leaderboard. If you do, please explain what you did in your report and cite your external sources. Examples of things you might investigate include SIFT and HOG features for images, and bag-of-words features for spam/ham. Almost everything is fair game as long as your underlying model is an SVM (i.e., do not use a neural network, decision tree, etc.). You are also not allowed to search for the labeled test data and submit that to Kaggle. If you have any questions about whether something is allowed or not, ask on Piazza.

Remember to start early! Kaggle only permits two submissions per leaderboard per day. To help you format the submission, please use check.py to run a basic sanity check on your submission and save_csv.py to help save your results. To check your submission csv, run: python check.py
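If you write the submission file yourself instead of using the provided helper script, a sketch like the following produces integer labels with no extra spaces. The Id/Category header and the zero-based IDs are assumptions made here for illustration, so match whatever format the competition and check.py actually expect; final_clf and mnist refer to the earlier sketches.

    import csv

    def save_predictions(path, predictions):
        """Write integer predictions as a two-column CSV with no extra spaces."""
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)                 # default "," delimiter, no spaces
            writer.writerow(["Id", "Category"])    # assumed header; match the competition
            for i, label in enumerate(predictions):
                writer.writerow([i, int(label)])   # force integer labels; IDs assumed 0-based

    save_predictions("mnist_submission.csv", final_clf.predict(mnist["test_data"]))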
7 Theory of Hard-Margin Support Vector Machines
A decision rule (or classifier) is a function r : R^d → {±1} that maps a feature vector (test point) to +1 ("in class") or −1 ("not in class"). The decision rule for linear SVMs is

    r(x) = +1 if w · x + α ≥ 0, and −1 otherwise,    (1)

where w ∈ R^d and α ∈ R are the weights (parameters) of the SVM. The hard-margin SVM optimization problem (which chooses the weights) is

    min_{w,α} |w|^2  subject to  y_i (X_i · w + α) ≥ 1,  ∀i ∈ {1, ..., m},    (2)

where |w| = ‖w‖_2 = √(w · w). We can rewrite this optimization problem by using Lagrange multipliers to eliminate the constraints. (If you're curious to know what Lagrange multipliers are, the Wikipedia page is recommended, but you don't need to understand them to do this problem.) We thereby obtain the equivalent optimization problem

    max_{λ_i ≥ 0} min_{w,α}  |w|^2 − Σ_{i=1}^{m} λ_i (y_i (X_i · w + α) − 1).    (3)

(a) Show that Equation (3) can be rewritten as the dual optimization problem

    max_{λ_i ≥ 0}  Σ_{i=1}^{m} λ_i − (1/4) Σ_{i=1}^{m} Σ_{j=1}^{m} λ_i λ_j y_i y_j X_i · X_j  subject to  Σ_{i=1}^{m} λ_i y_i = 0.    (4)

Hint: Use calculus to determine what values of w and α optimize Equation (3). Explain where the new constraint comes from. We note that SVM software usually solves this dual quadratic program, not the primal quadratic program.

(b) Suppose we know the values λ*_i and α* that optimize Equation (3). Show that the decision rule specified by Equation (1) can be written

    r(x) = +1 if (1/2) Σ_{i=1}^{m} λ*_i y_i X_i · x + α* ≥ 0, and −1 otherwise.    (5)

The training points X_i for which λ*_i > 0 are called the support vectors. In practice, we frequently encounter training data sets for which the support vectors are a small minority of the training points, especially when the number of training points is much larger than the number of features (i.e., the dimension of the feature space).

(c) Explain why the support vectors are the only training points needed to evaluate the decision rule. Then explain why the non-support vectors nonetheless still have some influence on the decision rule. What is the nature of that influence?
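As a nudge toward part (a), and not a full solution: the standard first step is to hold λ fixed, minimize the inner expression of Equation (3) over w and α, and set the partial derivatives to zero. Under the formulation above, a sketch of those stationarity conditions in LaTeX is:

    % L(w, \alpha, \lambda) = |w|^2 - \sum_{i=1}^{m} \lambda_i (y_i (X_i \cdot w + \alpha) - 1)
    \frac{\partial L}{\partial w} = 2w - \sum_{i=1}^{m} \lambda_i y_i X_i = 0
        \quad\Longrightarrow\quad w = \frac{1}{2} \sum_{i=1}^{m} \lambda_i y_i X_i,
    \qquad
    \frac{\partial L}{\partial \alpha} = -\sum_{i=1}^{m} \lambda_i y_i = 0
        \quad\Longrightarrow\quad \sum_{i=1}^{m} \lambda_i y_i = 0.

Substituting these back into the Lagrangian yields the dual objective in Equation (4), and the second condition is where the new constraint comes from.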
Haswell to the rescue: Acer's refreshed Aspire S7 Ultrabook reviewed. The keyboard and the price are the only remaining barriers to entry.
There's nothing more frustrating than hardware that is almost great. Whether it's a phone with a mediocre screen, a tablet with poor battery life, or a laptop with a lousy keyboard, there is no disappointment quite like the product that does everything right, as long as you ignore the one or two important things it does poorly.

Acer's Aspire S7 Ultrabook was one of those nearly great systems. For a while now, Acer has been trying to shed its image as a purveyor of bargain-bin laptops, and the well-built, attractively styled S7 was its most convincing effort yet. However, two major shortcomings held the laptop back: a poor keyboard with an unusual layout and shallow key travel, and a battery scarcely worthy of the name.

Now the S7 is back, and it's packing Intel's Haswell processors. The 2013 MacBook Air has already shown us what those chips can do for battery life, but can they do the same for Acer's Ultrabook? And does the Haswell version of the system have the same keyboard issues that the older version did?
Body, build quality, and screen
The Haswell S7 looks and feels practically the same as the Ivy Bridge version. That's a good thing: laptop designs are often changed just for the sake of change, and it can be hard to find a design you like one year that isn't changed drastically a year or two down the road. The laptop is still thin and light (2.87 pounds, compared to 2.97 pounds for Toshiba's Kirabook and 2.96 for Apple's 13-inch MacBook Air), so it's no trouble at all to sling it in a shoulder bag and carry it around all day.

The laptop is made primarily of three different materials: glass, which covers the screen and is used on the lid; aluminum, which is used around the edge of the screen and for the palm rest and keyboard area; and plastic, which is used for the bottom of the laptop. While it's not quite as solid as the aluminum unibody construction of the MacBook Air (both the lid and the bottom of the laptop bend and flex under pressure), it's still quite good. Indeed, it's enough to make you forget the cheap plasticky stuff that Acer (and, to be fair, every other PC OEM) puts out at the low end of the market.

The laptop's white glass lid is particularly striking, and it's unique among the mostly metal or plastic lids used by other comparable laptops. The glass on both sides of the lid is Gorilla Glass 2, so it should stand up to scratches, cracks, and chips as well as most phone and tablet screens do. Still, using glass for the lid makes me slightly nervous; I've seen enough screens crack from rough handling during air travel that a lid made of the same stuff gives me pause.

Everything on the other side of the lid is very nearly perfect. The S7's screen is still a 13.3-inch, 1920×1080 IPS panel at 166 pixels per inch, and it's bright, sharp, and colorful. Viewing angles are likewise excellent, and colors shift very little even when you're looking at the screen from extreme horizontal or vertical angles. The bottom bezel is perhaps a bit thicker than it needs to be (Toshiba's Kirabook packs a 13.3-inch screen into a laptop that is smaller in both depth and width: 12.44 by 8.15 inches for the Kirabook, compared to 12.7 by 8.8 for the S7), but overall this is one of the nicest screens you can get in an Ultrabook.
The one potential issue with the screen has nothing to do with the screen itself but with Windows. At 13 inches, a 1080p screen is dense enough that you'll probably need to turn on Windows' desktop scaling to make some things legible. As we've seen before, the results can be inconsistent (even in Windows 8.1). If you're sticking to the Start screen and apps from the Windows Store, scaling is more predictable and generally better-looking, but the scarcity of applications in the Windows Store makes living entirely in that environment difficult (especially on a full-fledged laptop).

Density issues aside, the touchscreen is accurate and responsive, and the hinge is stiff enough that you don't tilt the screen just by interacting with it. There's no undue wobbling, yet it's still possible to lift the screen without holding the bottom of the laptop down. The weight of the all-glass lid is enough to close the laptop if it's open at too shallow an angle, but in real use this doesn't really cause any problems. Like the Ivy Bridge version, this S7's hinge will open to a 180-degree angle.

The port layout is changed slightly from the Ivy Bridge model, and the changes are all smart ones. The left side now houses the power jack, the power button, one USB 3.0 port, and the SD card slot. The right side is home to a headphone jack, another USB 3.0 port, a full-size HDMI port, and a mini DisplayPort. This revised layout fixes three things we disliked about the last model. First, putting the edge-mounted power button closer to the laptop's hinge makes it harder to bump the button accidentally. Second, having one USB port on each side instead of both on one side gives you access to at least one port even if one side of the laptop is blocked for whatever reason. Third, having both a full-size HDMI out and a mini DisplayPort out makes it much easier to connect the S7 to external displays than the previous model's lone micro-HDMI port; micro-HDMI is also rarer than either full-size HDMI or mini DisplayPort when it comes to cables and adapters.

Acer's reviewer's guide also says that the new S7 has a quieter fan, and the laptop does seem to stay relatively cool and quiet even under sustained load. In a room with light background noise, the laptop was essentially inaudible while I was typing, browsing the Web, and running battery life tests. (You can still expect audible fan noise if you're transcoding video or gaming for hours at a stretch, though.) The bottom of the laptop is mostly unadorned; its flat, white polycarbonate is broken up only by its four rubber feet and its two Dolby-branded stereo speakers. These are fine for casual use but are tinny and lack bass, much like most other laptop speakers.
Keyboard and trackpad
The S7's keyboard was its worst aspect in the Ivy Bridge version, and while the situation is somewhat better this time around, it's still a long way from our favorite Ultrabook keyboard.

The first and biggest issue is still the keyboard's odd layout, with its lack of a dedicated row of function keys and its squished little caps lock key (seriously, when you spend a big chunk of your day in the Ars virtual office, you need quick and reliable access to your caps lock key at all times). It's not for lack of space, either; there's a wide swath of unused palm rest.

The S7 used to come in both 11-inch and 13-inch versions that used this same keyboard. While the keyboard was clearly built to fit more snugly in the smaller 11-inch body, one could at least see the rationale for using the same keyboard in both models to reduce parts costs. Acer tells me that it has no plans to release a Haswell version of the 11-inch model, so sticking with this same keyboard in a laptop that could easily fit a bigger, better one is especially puzzling.

A second, less pressing issue is the keyboard's dim, bluish backlight. It's better than nothing, but it's still uneven, just as it was in the Ivy Bridge version of the laptop. Even just changing the backlight's color from blue to white would help the letters stand out more from the silver plastic keys.

What has improved is the key travel, which was extremely shallow and subpar in the first version of the S7. It feels somewhat better here (Acer says it increased the travel from 1.0mm to 1.3mm). It's still a chiclet keyboard, but once you get used to the layout quirks it really doesn't feel too bad to type on. I wrote a big chunk of this review on the S7's keyboard, and after the couple of days it took for me to ramp up to my normal speed and accuracy, I actually found the keyboard to be decent. It does occasionally seem to miss letters, but that may be an issue with me and not with the keyboard. It's not quite a keyboard I can love, but it's one I can live with.

The trackpad is in a similar boat, even with the latest drivers from Acer's support site installed. It's fairly accurate and not bad with things like two-finger scrolling and the Windows 8 trackpad gestures. It behaves less well when given other multitouch gestures; clicking and dragging, for instance, can occasionally be problematic, and we sometimes had issues with palm rejection. These trackpad troubles are all things that Windows users should (unfortunately) be used to by now.
Software
Perhaps tired of our constant complaints about bloatware, Acer shipped me the Microsoft Signature version of the Haswell S7 for review. As we've written before, this means the laptop comes with just Windows 8, all the drivers it needs to work properly, and the Windows Essentials applications (there's also an Acer tool for creating factory restore media, which can be dismissed by creating the media or removed by uninstalling it). While there's still an "Acer Picks" section of the Windows Store, downloading and installing any of it is left up to the user (as it should be).

Ranting about pre-installed bloatware is old hat by now, especially for our audience; many of you have the know-how needed to install a fresh copy of Windows yourself anyway. Still, it's far more of a treat than it should be to get a Windows laptop that just runs Windows when you take it out of the box, and it's kind of absurd that you have to buy systems straight from Microsoft to get that from most of the OEMs. The PC makers could add a great deal of value.