#textit
Text
Use LaTeX $…$ around italicized words instead of \textit{…}
20 notes · View notes
stateofbrock · 11 months
Text
Tumblr media
Touches the Textit
He’s a big boy 👀
93 notes · View notes
craigbrownphd · 1 year
Text
If you did not already know
Machine Reading Comprehension (MRC)
Machine reading comprehension aims to teach machines to understand a text like a human and is a new challenging direction in Artificial Intelligence. This article summarizes recent advances in MRC, mainly focusing on two aspects (i.e., corpora and techniques). The specific characteristics of various MRC corpora are listed and compared. The main ideas of some typical MRC techniques are also described. …

Transformer-XL
Transformer networks have a potential of learning longer-term dependency, but are limited by a fixed-length context in the setting of language modeling. As a solution, we propose a novel neural architecture, \textit{Transformer-XL}, that enables Transformer to learn dependency beyond a fixed length without disrupting temporal coherence. Concretely, it consists of a segment-level recurrence mechanism and a novel positional encoding scheme. Our method not only enables capturing longer-term dependency, but also resolves the problem of context fragmentation. As a result, Transformer-XL learns dependency that is about 80\% longer than RNNs and 450\% longer than vanilla Transformers, achieves better performance on both short and long sequences, and is up to 1,800+ times faster than vanilla Transformers during evaluation. Additionally, we improve the state-of-the-art (SoTA) results of bpc/perplexity from 1.06 to 0.99 on enwik8, from 1.13 to 1.08 on text8, from 20.5 to 18.3 on WikiText-103, from 23.7 to 21.8 on One Billion Word, and from 55.3 to 54.5 on Penn Treebank (without finetuning). Our code, pretrained models, and hyperparameters are available in both Tensorflow and PyTorch. …

Weight of Evidence (WoE)
The Weight of Evidence or WoE value is a widely used measure of the 'strength' of a grouping for separating good and bad risk (default). It is computed from the basic odds ratio: (Distribution of Good Credit Outcomes) / (Distribution of Bad Credit Outcomes), or the ratio of Distr Goods / Distr Bads for short, where Distr refers to the proportion of Goods or Bads in the respective group, relative to the column totals, i.e., expressed as relative proportions of the total number of Goods and Bads. Why use Weight of Evidence? (A small worked sketch follows at the end of this post.) …

Parallel Iterative Nonnegative Matrix Factorization (PARINOM)
Matrix decomposition is ubiquitous and has applications in various fields like speech processing, data mining and image processing, to name a few. Under matrix decomposition, nonnegative matrix factorization is used to decompose a nonnegative matrix into a product of two nonnegative matrices, which gives some meaningful interpretation of the data. Thus, nonnegative matrix factorization has an edge over the other decomposition techniques. In this paper, we propose two novel iterative algorithms based on Majorization Minimization (MM), in which we formulate a novel upper bound and minimize it to get a closed-form solution at every iteration. Since the algorithms are based on MM, it is ensured that the proposed methods will be monotonic. The proposed algorithms differ in the updating approach of the two nonnegative matrices. The first algorithm, Iterative Nonnegative Matrix Factorization (INOM), sequentially updates the two nonnegative matrices, while the second algorithm, Parallel Iterative Nonnegative Matrix Factorization (PARINOM), updates them in parallel. We also prove that the proposed algorithms converge to a stationary point of the problem.
Simulations were conducted to compare the proposed methods with existing ones, and it was found that the proposed algorithms perform better than the existing ones in terms of computational speed and convergence. Keywords: Nonnegative matrix factorization, Majorization Minimization, Big Data, Parallel, Multiplicative Update … https://analytixon.com/2023/05/01/if-you-did-not-already-know-2032/?utm_source=dlvr.it&utm_medium=tumblr
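As referenced in the Weight of Evidence entry above, here is a minimal worked sketch of how WoE is typically computed for a binned variable. The toy data, column names, and the natural-log form (WoE is usually reported as ln(Distr Goods / Distr Bads)) are illustrative assumptions, not something specified in the entry itself.

    import numpy as np
    import pandas as pd

    # Toy credit data: one grouped characteristic and a good/bad outcome flag.
    df = pd.DataFrame({
        "age_band": ["18-25", "18-25", "26-40", "26-40", "41+", "41+", "41+", "26-40"],
        "bad":      [1,       0,       0,       0,       0,     1,     0,     1],
    })

    goods = (1 - df["bad"]).groupby(df["age_band"]).sum()  # goods per group
    bads = df["bad"].groupby(df["age_band"]).sum()         # bads per group

    distr_goods = goods / goods.sum()  # share of all goods falling in each group
    distr_bads = bads / bads.sum()     # share of all bads falling in each group

    woe = np.log(distr_goods / distr_bads)  # Weight of Evidence per group
    print(woe)

Groups with positive WoE hold relatively more goods than bads, and negative values indicate the opposite, which is what makes the measure useful for judging how well a grouping separates the two outcomes.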
2 notes · View notes
govindhtech · 4 months
Text
Nexa AI proposes Octopus V4-3B, a graph of language models
Tumblr media
Octopus V4 Release
Nexa AI is pleased to announce that Octopus v4 is now available! The master node of Nexa AI’s proposed language model graph is Octopus-V4-3B, a powerful open-source language model with three billion parameters. This model, which is specifically designed for the MMLU benchmark subjects, effectively converts user queries into formats that can be processed by other models.
It is excellent at matching these queries with the right specialized model, guaranteeing accurate and efficient query processing.
The Octopus V4 Quantized
Nexa AI has prepared quantized models in GGUF format so you can run the model on-device.
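A minimal sketch of loading a GGUF build on-device with the llama-cpp-python bindings; the local file name, context size, and prompt below are assumptions for illustration, not official usage instructions for the Octopus v4 release.

    from llama_cpp import Llama

    # Hypothetical local file name for a quantized Octopus v4 build; point this at
    # whichever GGUF file you actually downloaded.
    llm = Llama(model_path="octopus-v4-q4_k_m.gguf", n_ctx=2048)

    out = llm("Tell me the result of the derivative of x^3 when x is 2.", max_tokens=128)
    print(out["choices"][0]["text"])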
Octopus v4: Graph of language models
Wei Chen and Zhiyuan Li
Although language models have proven useful in many different contexts, the most advanced models are frequently proprietary.
For instance, OpenAI’s GPT-4 and Anthropic’s various models are costly and energy-intensive.
On the other hand, Llama3 and other rival models have been created by the open-source community.
Additionally, smaller language models that are dedicated to a given sector, such as those for financial, medical, or legal tasks, have done better than their proprietary counterparts.
This study presents a novel method for integrating \textbf{multiple open-source models}, each optimized for certain tasks, using \textit{functional tokens}.
Using \textit{functional tokens}, their recently created Octopus v4 model can intelligently route user requests to the most suitable vertical model and reformat the query to optimize performance.
As an advancement over the Octopus v1, v2, and v3 models, Octopus v4 is particularly good at selection, interpreting parameters, and reformatting data.
Additionally, by utilizing the features of the Octopus model and \textit{functional tokens}, the authors investigate the use of a graph as a flexible data structure that efficiently coordinates multiple open-source models.
Try the Octopus v4 models (\url{this https URL}) on the publicly available GitHub (\url{this https URL}), and contribute to a broader graph of language models.
Among models of the same size class, the authors were able to obtain a SOTA MMLU score of 74.8 by activating models with fewer than 10B parameters.
The authors would like to express their gratitude to Mingyuan and Zoey for their outstanding contributions to this quantization project.
Benchmark and Dataset
MMLU questions were used to assess performance.
Evaluation was run with the Ollama llm-benchmark method.
Important Octopus v4 Features:
Small Size: Because Octopus-V4-3B is small, it can run quickly and efficiently on smart devices.
Accuracy: Octopus-V4-3B’s functional token design maps user queries precisely to the appropriate specialized model, increasing precision.
Query Reformatting: Octopus-V4-3B helps transform informal human language into a more formal structure, improving the description of the query and producing more accurate answers.
Octopus V4 attributes and capabilities
Although Octopus V4 is a sophisticated system, the following is a summary of its main attributes and capabilities:
Octopus V4: What is it?
It’s a sophisticated language model that is open-source and created by Nexa AI.
In a network of other open-source AI models, it serves as a “master node”.
Although Octopus V4 doesn’t produce text or code directly, it is essential in guiding user inquiries to the network’s most appropriate AI model.
How does the Octopus V4 operate?
Query Classification: Octopus V4 evaluates your query to identify the most suitable worker node (specialized AI model) to handle it.
Reformatting Query: Octopus V4 can restructure your natural language query into a format that is easier for the worker model to comprehend and more formal. This guarantees that the worker model gets the data it requires to provide precise outcomes.
Effective Communication: It makes it easier for you and the worker model to communicate with each other, which guarantees a seamless interaction and retrieval of the needed data.
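As a rough illustration of the classify-reformat-route loop described above, here is a minimal sketch. The functional-token strings, worker-model names, and keyword-based classifier are hypothetical stand-ins for exposition, not Nexa AI's actual implementation (which uses the trained Octopus v4 model itself to emit the token and the reformatted query).

    import re

    # Hypothetical routing table: functional token -> specialized worker model.
    WORKERS = {
        "<nexa_math>": "math-expert-7b",
        "<nexa_physics>": "physics-expert-7b",
        "<nexa_biology>": "biology-expert-7b",
    }

    def master_node(query: str) -> str:
        """Stand-in for Octopus v4: emit a functional token plus a reformatted query."""
        if re.search(r"derivative|integral|solve", query, re.I):
            return "<nexa_math> Compute: " + query
        if re.search(r"force|velocity|energy", query, re.I):
            return "<nexa_physics> Explain: " + query
        return "<nexa_biology> Explain: " + query

    def route(query: str) -> tuple[str, str]:
        """Parse the master node's output and pick the worker model to call."""
        tagged = master_node(query)
        token, reformatted = tagged.split(" ", 1)
        return WORKERS[token], reformatted

    model, prompt = route("What is the derivative of x^3 at x = 2?")
    print(model, "<-", prompt)  # math-expert-7b <- Compute: What is the derivative of x^3 at x = 2?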
The advantages of Octopus V4
Increased Accuracy: It helps you obtain more relevant and accurate results by matching your query to the best AI model.
Enhanced Efficiency: The AI interface is more efficient overall and saves time thanks to the query reformatting tool.
Greater Variety of Uses: Its use of several AI models makes a greater range of tasks and functionality possible.
Applications for Octopus V4
Information Retrieval: Envision posing a challenging question about a scientific field. Octopus V4 can refer your query to a specialized AI model trained on scientific data, yielding a more accurate result than a general-purpose language model.
Data Analysis: If you have a huge dataset and need insights, Octopus V4 can direct your query to an AI model trained on data-analysis tasks, which can then deliver insightful results.
Code Generation: It has the capability to link programmers with AI models that are experts in code generation, providing support for particular programming tasks.
Present Situation:
Although Octopus V4 is still under development, it has the potential to completely change how humans communicate with AI models in the future.
Read more on govindhtech.com
0 notes
jhavelikes · 9 months
Quote
Recent work claims that large language models display \textit{emergent abilities}, abilities not present in smaller-scale models that are present in larger-scale models. What makes emergent abilities intriguing is two-fold: their \textit{sharpness}, transitioning seemingly instantaneously from not present to present, and their \textit{unpredictability}, appearing at seemingly unforeseeable model scales. Here, we present an alternative explanation for emergent abilities: that for a particular task and model family, when analyzing fixed model outputs, emergent abilities appear due to the researcher’s choice of metric rather than due to fundamental changes in model behavior with scale. Specifically, nonlinear or discontinuous metrics produce apparent emergent abilities, whereas linear or continuous metrics produce smooth, continuous, predictable changes in model performance. We present our alternative explanation in a simple mathematical model, then test it in three complementary ways: we (1) make, test and confirm three predictions on the effect of metric choice using the InstructGPT/GPT-3 family on tasks with claimed emergent abilities, (2) make, test and confirm two predictions about metric choices in a meta-analysis of emergent abilities on BIG-Bench; and (3) show how to choose metrics to produce never-before-seen seemingly emergent abilities in multiple vision tasks across diverse deep networks. Via all three analyses, we provide evidence that alleged emergent abilities evaporate with different metrics or with better statistics, and may not be a fundamental property of scaling AI models.
Are Emergent Abilities of Large Language Models a Mirage? | OpenReview
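To make the metric-choice argument concrete, here is a small self-contained simulation (not taken from the paper): a made-up per-token accuracy curve improves smoothly with scale, yet exact-match accuracy over a long target sequence looks like a sharp, "emergent" jump. The scaling curve, sequence length, and parameter counts are all invented for illustration.

    import numpy as np

    scales = np.logspace(6, 11, 20)          # hypothetical parameter counts
    per_token_acc = scales / (scales + 1e8)  # smooth, made-up scaling curve (continuous metric)
    seq_len = 40                             # length of the target output

    exact_match = per_token_acc ** seq_len   # nonlinear metric: every token must be correct

    for n, lin, em in zip(scales, per_token_acc, exact_match):
        print(f"{n:12.3g} params | per-token {lin:.3f} | exact-match {em:.6f}")
    # The per-token column rises gradually, while the exact-match column sits near
    # zero for most scales and then climbs steeply, mimicking an "emergent" ability.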
0 notes
jezuk · 9 months
Text
[1904.09828] Magic: The Gathering is Turing Complete
0 notes
Text
If you did not already know
Distilled-Exposition Enhanced Matching Network (DEMN)
This paper proposes a Distilled-Exposition Enhanced Matching Network (DEMN) for story-cloze test, which is still a challenging task in story comprehension. We divide a complete story into three narrative segments: an \textit{exposition}, a \textit{climax}, and an \textit{ending}. The model consists of three modules: input module, matching module, and distillation module. The input module provides semantic representations for the three segments and then feeds them into the other two modules. The matching module collects interaction features between the ending and the climax. The distillation module distills the crucial semantic information in the exposition and infuses it into the matching module in two different ways. We evaluate our single and ensemble model on ROCStories Corpus \cite{Mostafazadeh2016ACA}, achieving an accuracy of 80.1\% and 81.2\% on the test set respectively. The experimental results demonstrate that our DEMN model achieves a state-of-the-art performance. …

Truncated Variance Reduction (TruVaR)
We present a new algorithm, truncated variance reduction (TruVaR), that treats Bayesian optimization (BO) and level-set estimation (LSE) with Gaussian processes in a unified fashion. The algorithm greedily shrinks a sum of truncated variances within a set of potential maximizers (BO) or unclassified points (LSE), which is updated based on confidence bounds. TruVaR is effective in several important settings that are typically non-trivial to incorporate into myopic algorithms, including pointwise costs and heteroscedastic noise. We provide a general theoretical guarantee for TruVaR covering these aspects, and use it to recover and strengthen existing results on BO and LSE. Moreover, we provide a new result for a setting where one can select from a number of noise levels having associated costs. We demonstrate the effectiveness of the algorithm on both synthetic and real-world data sets. …

Proportional Degree
Several algorithms have been proposed to filter information on a complete graph of correlations across stocks to build a stock-correlation network. Among them the planar maximally filtered graph (PMFG) algorithm uses $3n-6$ edges to build a graph whose features include a high frequency of small cliques and a good clustering of stocks. We propose a new algorithm which we call proportional degree (PD) to filter information on the complete graph of normalised mutual information (NMI) across stocks. Our results show that the PD algorithm produces a network showing better homogeneity with respect to cliques, as compared to economic sectoral classification, than its PMFG counterpart. We also show that the partition of the PD network obtained through normalised spectral clustering (NSC) agrees better with the NSC of the complete graph than the corresponding one obtained from PMFG. Finally, we show that the clusters in the PD network are more robust with respect to the removal of random sets of edges than those in the PMFG network. …

Non-Stationary Streaming PCA
We consider the problem of streaming principal component analysis (PCA) when the observations are noisy and generated in a non-stationary environment. Given $T$, $p$-dimensional noisy observations sampled from a non-stationary variant of the spiked covariance model, our goal is to construct the best linear $k$-dimensional subspace of the terminal observations.
We study the effect of non-stationarity by establishing a lower bound on the number of samples and the corresponding recovery error obtained by any algorithm. We establish the convergence behaviour of the noisy power method using a novel proof technique which may be of independent interest. We conclude that the recovery guarantee of the noisy power method matches the fundamental limit, thereby generalizing existing results on streaming PCA to a non-stationary setting. … https://analytixon.com/2022/11/27/if-you-did-not-already-know-1895/?utm_source=dlvr.it&utm_medium=tumblr
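For context on the Streaming PCA entry above, here is a minimal sketch of a generic block-wise noisy power method (a textbook stationary version, not the paper's non-stationary variant). The dimensions, block size, signal strength, and synthetic data generator are all illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    p, k, block, iters = 50, 3, 200, 30  # ambient dim, target dim, samples per block, iterations

    # Synthetic spiked-covariance stream: signal in a fixed k-dimensional subspace plus noise.
    U_true, _ = np.linalg.qr(rng.normal(size=(p, k)))
    def next_block():
        signal = rng.normal(size=(block, k)) @ (3.0 * U_true.T)
        return signal + rng.normal(size=(block, p))  # add observation noise

    # Noisy power method: multiply the current basis by a fresh empirical covariance,
    # then re-orthonormalize.
    Q, _ = np.linalg.qr(rng.normal(size=(p, k)))
    for _ in range(iters):
        X = next_block()
        Y = X.T @ (X @ Q) / block  # approximately (empirical covariance) @ Q
        Q, _ = np.linalg.qr(Y)

    # Singular values of U_true^T Q are close to 1 when the subspace is recovered.
    print(np.linalg.svd(U_true.T @ Q, compute_uv=False))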
0 notes
leedsomics · 2 years
Text
COmic: Convolutional Kernel Networks for Interpretable End-to-End Learning on (Multi-)Omics Data. (arXiv:2212.02504v1 [q-bio.QM])
Motivation: The size of available omics datasets is steadily increasing with technological advancement in recent years. While this increase in sample size can be used to improve the performance of relevant prediction tasks in healthcare, models that are optimized for large datasets usually operate as black boxes. In high stakes scenarios, like healthcare, using a black-box model poses safety and security issues. Without an explanation about molecular factors and phenotypes that affected the prediction, healthcare providers are left with no choice but to blindly trust the models. We propose a new type of artificial neural networks, named Convolutional Omics Kernel Networks (COmic). By combining convolutional kernel networks with pathway-induced kernels, our method enables robust and interpretable end-to-end learning on omics datasets ranging in size from a few hundred to several hundreds of thousands of samples. Furthermore, COmic can be easily adapted to utilize multi-omics data. Results: We evaluate the performance capabilities of COmic on six different breast cancer cohorts. Additionally, we train COmic models on multi-omics data using the METABRIC cohort. Our models perform either better or similar to competitors on both tasks. We show how the use of pathway-induced Laplacian kernels opens the black-box nature of neural networks and results in intrinsically interpretable models that eliminate the need for \textit{post-hoc} explanation models. http://dlvr.it/SdyxQS
0 notes
text-it · 3 years
Photo
Tumblr media
♥️I wish you all a wonderful good morning. Get through the day well 🌞 #tag #gedanken #gedankentanken #gedankenkarussell #thinkpositive #positiv #denkepositiv #gutenmorgen #motivation #positivmindset #zitate #sprüche #textit #textposts #weisheiten #weisheitdestages #zitatezumnachdenken #sprüchezumnachdenken #miteinander https://www.instagram.com/p/CXc4K6woM9g/?utm_medium=tumblr
4 notes · View notes
sun-ni-day · 2 years
Text
SG-1 as a foster household
Hammond is Grandpa
Janet is mom
Danny is the quiet kid, who disappears at dawn and spends the whole day rummaging around abandoned buildings, construction sites, river banks etc., coming home at dusk with pockets full of "treasures". Knows every immigrant neighbor, chats with them in their native tongue, also has a dozen penpals from all over the world, corresponding in their languages. Loves puzzles and cyphers. At some point he has a falling out with the family. He gets in trouble helping some other kids and runs away for a while, but then decides to come back.
Jack is the one who's constantly running off on "adventures", comes home with various strays. Chill and friendly, but neighborhood kids know not to get him angry.
One day he brings along an immigrant orphan and convinces the family to foster him. His name is Teal'c. He was a child soldier for the mafia. Turns out Jack and the kids got in trouble and Teal'c saved them. The family helps him adapt to his new life. He follows Jack around everywhere, partaking in "adventures". Jack also teaches him about local culture and customs. Shenanigans ensue.
Sam is the one you'll never catch on the streets - she spends the whole day either stuck in the garage or doing "experiments" in the back yard. At any given moment at least one device or appliance in the house is in a disassembled state coz Sam is trying to "improve it". God help any poor soul that even hints that she's doing something that is "for boys" and maybe should do things "for girls". She WILL fight you.
Jonas is also an immigrant kid, and he's with them only for the time being, until his family can take him in, but he quickly makes his way into everyone's hearts. He's one of the kids Danny helped, so he feels responsible and tries to make up for that while he's there.
Cam is the latest addition, but he's the "mom" sibling of them all. He first meets the kids when he helps them out of trouble, but gets hurt in the process. The kids regularly visit him in the hospital while he's recovering, and by the time he's out, the family adopts him. Looks up to his older siblings and wants to take care of them. Smart and strong, but doesn't flaunt it; he's content with just having his family near. He and Danny click from day one and become like two peas in a pod.
Vala just shows up in their house one day and claims to be their cousin. She says she's just visiting but ends up living with them. Everyone is pretty sure she's not who she claims to be, but she's cool and awesome and charming, so despite her being a pathological liar and constant trouble maker kids fall in love with her. She especially enjoys vexing Danny, but he's the one who first understands that deep down, she has a good heart, so he sets himself on helping her become a better person.
When grandpa gets too old to take care of them, Jack, being the oldest, steps up and takes his place.
Later on they are joined by uncle Hank, he "relieves" Jack so he can move out and go on with his own life. (which also allows Jack to finally start dating Sam, who he's been in love with since day one, but they keep it on the DL still).
Uncle Hank's daughter Carolyn also comes to live with them, she and Danny become good friends.
Other characters: Hammond's actual grandson Walter, who comes around regularly to help out. Scandinavian kid Thor who somehow ends up Jack's best friend; he's rich and possibly royalty, and he names his private jets after Jack and Danny. Danny's first love Sha're, whom he loses when the mafia kidnaps her (the family of course tries to help get her back, but without success; they do manage to help her brother Skara, though). Teal'c's mentor Bra'tac, who has de facto become his father. Sam's biological dad Jacob, who's black ops so he's out of the country a lot.
Feel free to add anybody else.
26 notes · View notes
Photo
Tumblr media
Olga de Amaral, Cesta Lunar 81 (Moon Basket 81), 2012,
Acrylic, gold leaf, thread and gesso on linen,
76 3/4 x 53 1/8in. (195 x 135cm.)
Christie’s
64 notes · View notes
stateofbrock · 11 months
Text
Tumblr media
TexTits wip
50 notes · View notes
craigbrownphd · 1 year
Text
If you did not already know
Pymc-learn
$\textit{Pymc-learn}$ is a Python package providing a variety of state-of-the-art probabilistic models for supervised and unsupervised machine learning. It is inspired by $\textit{scikit-learn}$ and focuses on bringing probabilistic machine learning to non-specialists. It uses a general-purpose high-level language that mimics $\textit{scikit-learn}$. Emphasis is put on ease of use, productivity, flexibility, performance, documentation, and an API consistent with $\textit{scikit-learn}$. It depends on $\textit{scikit-learn}$ and $\textit{pymc3}$ and is distributed under the new BSD-3 license, encouraging its use in both academia and industry. Source code, binaries, and documentation are available on http://…/pymc-learn. …

Kalman Gradient Descent
We introduce Kalman Gradient Descent, a stochastic optimization algorithm that uses Kalman filtering to adaptively reduce gradient variance in stochastic gradient descent by filtering the gradient estimates. We present both a theoretical analysis of convergence in a non-convex setting and experimental results which demonstrate improved performance on a variety of machine learning areas including neural networks and black box variational inference. We also present a distributed version of our algorithm that enables large-dimensional optimization, and we extend our algorithm to SGD with momentum and RMSProp. …

Linked Data Ranking Algorithm (LDRANK)
The advances of the Linked Open Data (LOD) initiative are giving rise to a more structured Web of data. Indeed, a few datasets act as hubs (e.g., DBpedia) connecting many other datasets. They also made possible new Web services for entity detection inside plain text (e.g., DBpedia Spotlight), thus allowing for new applications that can benefit from a combination of the Web of documents and the Web of data. To ease the emergence of these new applications, we propose a query-biased algorithm (LDRANK) for the ranking of web of data resources with associated textual data. Our algorithm combines link analysis with dimensionality reduction. We use crowdsourcing for building a publicly available and reusable dataset for the evaluation of query-biased ranking of Web of data resources detected in Web pages. We show that, on this dataset, LDRANK outperforms the state of the art. Finally, we use this algorithm for the construction of semantic snippets of which we evaluate the usefulness with a crowdsourcing-based approach. …

Word2Bits
Word vectors require significant amounts of memory and storage, posing issues to resource limited devices like mobile phones and GPUs. We show that high quality quantized word vectors using 1-2 bits per parameter can be learned by introducing a quantization function into Word2Vec. We furthermore show that training with the quantization function acts as a regularizer. We train word vectors on English Wikipedia (2017) and evaluate them on standard word similarity and analogy tasks and on question answering (SQuAD). Our quantized word vectors not only take 8-16x less space than full precision (32 bit) word vectors but also outperform them on word similarity tasks and question answering. … https://analytixon.com/2023/04/16/if-you-did-not-already-know-2019/?utm_source=dlvr.it&utm_medium=tumblr
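As a toy illustration of the Word2Bits idea above (quantizing word vectors to roughly 1 bit per parameter), here is a minimal sketch. The ±1/3 quantization level and the training note are assumptions for illustration; the paper's exact quantization function and training procedure may differ.

    import numpy as np

    def quantize_1bit(w, scale=1.0 / 3.0):
        """Map every parameter to +scale or -scale: one bit of information per weight."""
        return np.where(w >= 0, scale, -scale)

    # In quantization-aware training one would typically apply quantize_1bit in the
    # forward pass while keeping and updating full-precision weights underneath.
    full_precision = np.array([0.12, -0.40, 0.03, -0.01])
    print(quantize_1bit(full_precision))  # alternating +1/3 and -1/3 for this example

    # Storage: a quantized vector only needs its sign bits plus one shared scale,
    # so eight weights fit in a single byte instead of eight 32-bit floats.
    packed = np.packbits(full_precision >= 0)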
0 notes
cryinganime · 7 years
Text
i just want someone to love, respect, and support me like percy jackson does with annabeth chase :(((((
9 notes · View notes
jhavelikes · 1 year
Quote
Neural networks often exhibit emergent behavior, where qualitatively new capabilities arise from scaling up the amount of parameters, training data, or training steps. One approach to understanding emergence is to find continuous \textit{progress measures} that underlie the seemingly discontinuous qualitative changes. We argue that progress measures can be found via mechanistic interpretability: reverse-engineering learned behaviors into their individual components. As a case study, we investigate the recently-discovered phenomenon of ``grokking'' exhibited by small transformers trained on modular addition tasks. We fully reverse engineer the algorithm learned by these networks, which uses discrete Fourier transforms and trigonometric identities to convert addition to rotation about a circle. We confirm the algorithm by analyzing the activations and weights and by performing ablations in Fourier space. Based on this understanding, we define progress measures that allow us to study the dynamics of training and split training into three continuous phases: memorization, circuit formation, and cleanup. Our results show that grokking, rather than being a sudden shift, arises from the gradual amplification of structured mechanisms encoded in the weights, followed by the later removal of memorizing components.
[2301.05217] Progress measures for grokking via mechanistic interpretability
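As a concrete illustration of the "addition as rotation about a circle" mechanism the abstract describes, here is a small self-contained check of the underlying trigonometric identity (an illustration only, not the reverse-engineered network); the modulus and frequency are arbitrary choices.

    import numpy as np

    p, k = 113, 5  # modulus and an arbitrary Fourier frequency

    def embed(x):
        """Represent a residue x as a point on the unit circle."""
        theta = 2 * np.pi * k * x / p
        return np.cos(theta), np.sin(theta)

    a, b = 40, 90  # note that a + b wraps around the modulus
    ca, sa = embed(a)
    cb, sb = embed(b)

    # Composing the two rotations with the angle-addition identities...
    c_sum = ca * cb - sa * sb  # cos(theta_a + theta_b)
    s_sum = sa * cb + ca * sb  # sin(theta_a + theta_b)

    # ...lands exactly on the embedding of (a + b) mod p.
    assert np.allclose((c_sum, s_sum), embed((a + b) % p))
    print("rotation composition matches addition mod", p)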
0 notes
foxyou-too · 5 years
Photo
Tumblr media
cushion from zara
5 notes · View notes