#loading certain mpp file
file-formats-programming · 7 years ago
Text
Set Image Quality While Exporting Project Data to JPEG & Enhanced Loading MPP Files using Java
What’s new in this release?
Aspose team is pleased to announce the release of Aspose.Tasks for Java 17.11.0. This release includes a new feature that lets users set the image quality while exporting project data to JPEG format: the setJPEGQuality method of ImageSaveOptions accepts an image quality value on a scale from 0 to 100, as shown in the code sample on the blog announcement page. This version also includes several bug fixes that improve the overall behaviour of the API. These address exceptions while loading certain MPP files, errors while converting project data to PDF, calculation issues with calendar times set to 24 hours, task reading exceptions while loading MPP files, wrong calculation of finish date and percent complete in MPP files, a problem with setting working days, a missing time span in the SplitParts collection, and improvements in the manual calculation of task data. Below is the complete list of bug fixes and enhanced features included in this release.
Add option to set image quality when saving as JPEG
Enum GanttBarFillPattern should have value 11 corresponding to fill pattern in MSP 2016
Exception raised while loading the mpp file
ArgumentOutOfRangeException exception if 24 Hours Calendar is set
Program hangs while setting Tsk.Start for a Task
Wrong Actual Finish date in MPP as compared to XML output
Exception raised while saving MPX as PDF
Child tasks are not rendered in PNG without saving the project first
All values of BarStyle's From and To fields are changed
Loading project file raises Exception
Saving into image by page not working properly
Percent complete not set properly in MPP
Setting Project Start date raises ArgumentOutOfRangeException (Java)
Assignment Cost is not displayed for Cost resources in Microsoft Project 2013
Text extended attribute created by formula is read as date by Aspose.Tasks (Java)
Project recalculation throws an exception with cleared ActualStart and ActualFinish properties
MSP 2010 raises error while updating and saving MPP created by Aspose.Tasks
Recalculate() is updating manually scheduled tasks
Wrong finish date calculated for ElapsedDay type duration (Java)
Prevent recalculation of manually scheduled tasks
Wrong Actual Duration in MPP file
SplitParts collection misses time span
Sub-tasks not rendered while converting MPX to PdfA1b
Wrong Finish date in XML file
Wrong Percent complete in MPP as compared to XML output
MPP shows a warning after resaving
Loading project raises ProjectReadingException
TaskReadingException while reading the MPP file
MSP 2010 raises error while updating and saving MPP created by Aspose.Tasks
Problem with a setting of working days
Newly added documentation pages and articles
Some new tips and articles have been added to the Aspose.Tasks for Java documentation that briefly guide users on how to use Aspose.Tasks to perform different tasks, such as the following:
Exporting Project Data to JPEG
Reading Project Data from Microsoft Project Database
Overview: Aspose.Tasks for Java
Aspose.Tasks is a non-graphical Java Project management component that enables Java applications to read, write & manage Project documents without utilizing MS Project. It supports reading MS Project Template (MPT) files as well as allows exporting project data to HTML, BMP, PNG, JPEG, PDF, TIFF, XPS, XAML and SVG formats. It reads & writes MS Project documents in both MPP & XML formats.  Developers can read & change tasks, recurring tasks, resources, resource assignments, relations & calendars.
More about Aspose.Tasks for Java
Homepage of Aspose.Tasks for Java
Download Aspose.Tasks for Java
Online documentation of Aspose.Tasks for Java
0 notes
loginworksoftware-blog · 7 years ago
Link
Data processing technologies are developing as rapidly as data collection is advancing, that is, at a continually accelerating rate. There’s a whole lot of technology that is breaking ground and offering new solutions in this exciting field.
Let’s take a look at what some of the latest cutting-edge technologies are for data processing systems.
DISTRIBUTED SYSTEMS ARCHITECTURE
The big data sets common in data processing today run up against the limits of a single machine's computational power. The technology needed to deal with this is called distributed systems architecture.
Massively parallel processing (MPP) and Hadoop are two key technologies leading the industry in distributed systems architecture. Both feature a "shared nothing" architecture, in which each node operates autonomously.
The key difference between the two is that MPP is proprietary and rather costly to implement, while Hadoop is open source and can be scaled from very small, low-cost applications to very large ones. While Hadoop is more recent than MPP and allows greater flexibility and scalability, MPP remains slightly quicker.
MPP systems are provided by Teradata, Netezza, Vertica, and Greenplum. Oracle and Microsoft also have their own MPP systems.
Hadoop is a software project by Apache, containing a collection of software utilities that provide huge storage and processing power. Hadoop uses MapReduce to process large non-structured data sets, as the name implies, by a map function, and a reduce function within Hadoop. Many platforms can be built on top of the Hadoop framework. Non-proprietary applications available for use on Hadoop continue to develop in number and complexity.
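The map and reduce phases can be illustrated with a tiny, self-contained word-count sketch in plain Python. No Hadoop is required to run it, and the function names here are illustrative rather than part of the Hadoop API:

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in the input split
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle: group intermediate values by key
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: aggregate the values collected for each key
    return {key: sum(values) for key, values in grouped.items()}

documents = ["big data big processing", "data processing at scale"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts["data"])  # 2
```

In real MapReduce the map and reduce workers run on different nodes and the shuffle moves data across the network, but the data flow is the same.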
QUERY OPTIMIZATION
Part of leading technology for data processing in a relational database is query optimization design. Query optimization is an automated process that attempts to provide the best possible answers based on a range of possible query plans. A query plan is a set of rules that a relational database uses to search data for the required parameters. Query optimization can effectively determine which searches are valid, and which will be most accurate, efficient, and timely.
Query hints may be built into query optimization; for example, a query on a GPS database might be directed toward the shortest or the quickest route. A simplified example of query optimization is to imagine a query for the number of a certain car make and model, where the database could search all makes and then all models, or just all models, since the model subset automatically includes the make. Query optimization would choose the latter.
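The plan-selection idea can be sketched with a toy cost model in Python. The cost formulas and plan names below are illustrative, not taken from any real optimizer: the point is that the optimizer scores each candidate plan and picks the cheapest.

```python
import math

def cost_full_scan(n_rows):
    # Touch every row in the table
    return n_rows

def cost_index_lookup(n_rows, selectivity):
    # B-tree descent plus one fetch per matching row
    return math.log2(n_rows) + n_rows * selectivity

def choose_plan(n_rows, selectivity):
    plans = {
        "full_scan": cost_full_scan(n_rows),
        "index_lookup": cost_index_lookup(n_rows, selectivity),
    }
    return min(plans, key=plans.get)

# A selective predicate on a big table favors the index;
# a tiny table favors the plain scan.
print(choose_plan(1_000_000, 0.001))  # index_lookup
print(choose_plan(8, 0.9))            # full_scan
```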
NON-RELATIONAL DATABASES – NO-SQL
With the explosion of big data have come two more players in data processing technology: non-structured and dark data.
Traditional databases have a relational structure, are usually called relational database management systems (RDBMS), and are primarily built on SQL (Structured Query Language), which is why non-relational databases are coined NoSQL.
A Non-relational, No-SQL database can store and access un-structured data easily using a common data format called JSON documents, and can import JSON, CSV, and TSV formats.
JSON (JavaScript Object Notation) is a lightweight data-interchange format, simple yet very powerful, since stored data need not be structured. The ability to store and access this non-structured data is what makes non-relational databases such important technology for data analytics systems. As a drawback, since they are non-relational, the query itself has to draw the relation, so working with a non-relational database requires more skill.
Popular NoSQL databases used in data processing are MongoDB, ArangoDB, Apache Ignite, and Cassandra.
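The "query draws the relation" point can be sketched with the standard library's json module. The documents below are invented for illustration; note they do not share a schema, yet a query can still pull a consistent answer out of them:

```python
import json

# Documents need not share a schema; each record carries its own fields.
raw = '''[
  {"name": "Ada", "city": "London", "tags": ["math"]},
  {"name": "Grace", "rank": "admiral"},
  {"name": "Alan", "city": "London"}
]'''
docs = json.loads(raw)

# With no relational schema, the query itself supplies the relation:
londoners = [d["name"] for d in docs if d.get("city") == "London"]
print(londoners)  # ['Ada', 'Alan']
```

A document store like MongoDB does essentially this at scale, with indexes over the document fields.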
DATA VIRTUALIZATION
Data storage and retrieval can sometimes deteriorate data due to the format required by the storage or retrieval. Unlike the traditional ETL (extract, transform, load) method, in data virtualization the data remains where it is; a viewer accesses it in real time, from its existing location, solving the problem of format losses. An abstraction layer between viewer and source means that the data can be used without extraction and transformation.
A simplified example of data virtualization we can all identify with is the technology that drives images on social media. When you view an image on most social media platforms, normally you’re viewing it temporarily in real time on your mobile device or computer, but it exists in reality on the server of whichever social media you’re on. The file format is not relevant, nor do you need software related to the format to view it. The image is only converted into real data if it’s downloaded or via a screenshot, but the data is searchable and viewable without ever opening the file itself because of data virtualization.
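A minimal sketch of the abstraction-layer idea (the class and names here are invented for illustration): the data stays at its source, and the view computes what the consumer needs only at access time, instead of extracting and loading a copy.

```python
class VirtualView:
    def __init__(self, source, transform):
        self._source = source        # data remains where it is
        self._transform = transform  # applied lazily, per read

    def __getitem__(self, key):
        # Nothing is copied or converted until someone actually looks
        return self._transform(self._source[key])

store = {"img_1": b"\x89PNG fake bytes", "img_2": b"\xff\xd8JPEG fake bytes"}
sizes = VirtualView(store, transform=len)  # e.g. expose sizes only

print(sizes["img_1"])  # computed on access; the source bytes never moved
```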
STREAM PROCESSING AND STREAM ANALYTICS
Stream processing provides the capability for performing actions and analyzing events on real-time data. To do this, stream processing makes use of a series of continuous queries. Stream processing allows data to be processed before it lands in a database, which makes it incredibly powerful.
A good example to explain the process of live stream data analytics is the correlation of GPS data or driver mobile data with user locations. Uber’s apps have used this with great success to revolutionize private transport. Many bank applications also use stream processing to immediately alert users of suspicious activity.
Striim, IBM Infosphere, SQLStream, and Apache Spark are examples of common streaming database applications.
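The suspicious-activity example above can be sketched as a continuous query in plain Python: a standing rule evaluated against each event as it arrives, emitting alerts before anything is written to a database. The threshold and event fields are invented for illustration.

```python
def continuous_query(events, threshold=1000):
    # A standing query: evaluate every event as it arrives,
    # yielding alerts before the event lands in storage.
    for event in events:
        if event["amount"] > threshold:
            yield {"alert": "suspicious",
                   "user": event["user"],
                   "amount": event["amount"]}

stream = [
    {"user": "a", "amount": 40},
    {"user": "b", "amount": 5200},
    {"user": "a", "amount": 12},
]
alerts = list(continuous_query(stream))
print(alerts)  # one alert, for user "b"
```

A streaming engine runs the same kind of query over an unbounded source (a socket, a Kafka topic) rather than a finite list.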
DATA MINING AND SCRAPING
Data mining and scraping technology is improving the content that data-processing systems have available in the data capture phase. Data mining in its simplest form takes very large sets of data and extracts smaller, more useful sets. Data mining software automates the fundamental data processing function of finding patterns in large data sets, creating smaller subsets that match search query criteria. Web search is essentially a form of data mining we all use: it takes the catalogue of websites and extracts only those that match the search terms. Data mining may be applied to any type of data: text, audio, video, or images. It can be incredibly useful in finding information a company doesn't currently have from large unstructured data sources.
Scraping is similar to mining, but where mining analyzes data for patterns, scraping collects data matching certain parameters.
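The mining/scraping distinction can be sketched with the standard library's re module: the pattern describes the structure we are looking for, and scraping collects every record that matches fixed parameters. The order records below are made up for illustration.

```python
import re

# Semi-structured text, as often found in logs or scraped pages
corpus = """
order#1001 status=shipped total=29.99
order#1002 status=pending total=102.50
order#1003 status=shipped total=7.25
"""

# Scrape: collect all records matching a fixed parameter (status=shipped)
shipped = re.findall(r"order#(\d+) status=shipped total=([\d.]+)", corpus)
print(shipped)  # [('1001', '29.99'), ('1003', '7.25')]
```

Mining at scale uses the same idea with statistical pattern discovery instead of a hand-written pattern.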
MACHINE LEARNING AND AI
Data processing is a key field for advances in machine learning and AI. Data preparation involves cleaning and transforming the data for use. It often takes around 60 to 80% of the whole data processing time, leaving as little as 20% for analytics and presentation. The preparation of data is largely repetitive and time consuming, so it is a perfect area for applying the latest machine learning technology. When processing large amounts of data, especially complex text-based data such as contracts, reports, and articles, machine learning is one of the latest technological advancements that will improve the industry. Machine learning can match phrases across a range of documents based on connections that previously only humans could draw. We think of AI and machine learning as futuristic, but we actually interact with them every day on platforms like Google search. Haven't you noticed how it seems to know more and more what you might be thinking, with scary accuracy? It's a simple concept, yet currently one of the most extensive examples of machine learning data processing in everyday use. Machine learning is also growing steadily in user interaction devices on the web. Automated answers to users' questions, along with databasing questions and responses for improved machine learning, help organizations better serve their customers.
AI and machine intelligence is advancing faster than we can train people to work with it. An unbelievable 2 jobs are available for every AI graduate in the UK.
DATA COMPRESSION
Compression is driving data processing: with larger and larger data sets, any reduction in data size will improve experiences. Storage space and processing times can be reduced significantly with better compression techniques, which in turn significantly reduces costs and improves performance. Facebook has released its latest compression tool, Zstandard, as open source. While previous compression tools had around 9 compression levels, Zstandard has 22. Data compression will help improve our storage and processing capacities.
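Zstandard itself lives in a third-party package, but the level-versus-size trade-off it exposes can be sketched with the standard library's zlib, which offers the classic 9 levels:

```python
import zlib

data = b"data processing " * 1000  # highly repetitive, compresses well

fast = zlib.compress(data, level=1)  # fastest, least thorough
best = zlib.compress(data, level=9)  # slowest, smallest output

# Compression is lossless: decompressing restores the exact bytes
assert zlib.decompress(best) == data
print(len(data), len(fast), len(best))  # higher level -> smaller output
```

Zstandard plays the same game with a wider dial (22 levels) and generally better speed at a given ratio.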
SELF-DRIVING DATABASE MANAGEMENT SYSTEMS
The last and most significant technology in data processing systems is the self-driving database management system. A self-driving database runs without user intervention, managing itself entirely. Leading this technological advancement is Oracle's Autonomous Database. Oracle's founder claims it will revolutionize data management: since there is no need to apply patches, complete manual back-ups, or tune, it is capable of total automation. Peloton is a good example of a leading open source autonomous database solution.
For data processing, it’s important to stay ahead of the trends. Check out some of the ideas we’ve discussed here to find out more about where your data processing systems can evolve.
0 notes
megatechcrunch · 7 years ago
Link
A step-by-step guide to initialize the libraries, load the data, and train a tokenizer model using Spark-NLP and spaCy.
The goal of this blog series is to run a realistic natural language processing (NLP) scenario by utilizing and comparing the leading production-grade linguistic programming libraries: John Snow Labs’ NLP for Apache Spark and Explosion AI’s spaCy. Both libraries are open source with commercially permissive licenses (Apache 2.0 and MIT, respectively). Both are under active development with frequent releases and a growing community.
The intention is to analyze and identify the strengths of each library, how they compare for data scientists and developers, and into which situations it may be more convenient to use one or the other. This analysis aims to be an objective run-through and (as in every natural language understanding application, by definition) involves a good amount of subjective decision-making in several stages.
As simple as it may sound, it is tremendously challenging to compare two different libraries and produce a comparable benchmark. Remember that your application will have a different use case, data pipeline, text characteristics, hardware setup, and non-functional requirements than what's done here.
I'll be assuming the reader is familiar with NLP concepts and programming. Even without knowledge of the involved tools, I aim to make the code as self-explanatory as possible in order to make it readable without bogging down in too much detail. Both libraries have public documentation and are completely open source, so consider reading through spaCy 101 and the Spark-NLP Quick Start documentation first.
The libraries
Spark-NLP was open sourced in October 2017. It is a native extension of Apache Spark as a Spark library. It brings a suite of Spark ML Pipeline stages, in the shape of estimators and transformers, to process distributed data sets. Spark NLP Annotators go from fundamentals like tokenization, normalization, and part of speech tagging, to advanced sentiment analysis, spell checking, assertion status, and others. These are put to work within the Spark ML framework. The library is written in Scala, runs within the JVM, and takes advantage of Spark optimizations and execution planning. The library currently has APIs in Scala and in Python.
spaCy is a popular and easy-to-use natural language processing library in Python. It recently released version 2.0, which incorporates neural network models, entity recognition models, and much more. It provides current state-of-the-art accuracy and speed levels, and has an active open source community. spaCy has been around for at least three years, with its first releases on GitHub tracking back to early 2015.
Spark-NLP does not yet come with a set of pretrained models. spaCy offers pre-trained models in seven (European) languages, so the user can quickly inject target sentences and get results back without having to train models. This includes tokens, lemmas, part-of-speech (POS), similarity, entity recognition, and more.
Both libraries offer customization through parameters in some level or another, allow the saving of trained pipelines in disk, and require the developer to wrap around a program that makes use of the library in a certain use case. Spark NLP makes it easier to embed an NLP pipeline as part of a Spark ML machine learning pipeline, which also enables faster execution since Spark can optimize the entire execution—from data load, NLP, feature engineering, model training, hyper-parameter optimization, and measurement—together at once.
The benchmark application
The programs I am writing here will predict part-of-speech tags in raw .txt files. A lot of data cleaning and preparation is in order. Both applications will train on the same data and predict on the same data, to achieve the maximum possible common ground.
My intention here is to verify two pillars of any statistical program:
Accuracy, which measures how good a program can predict linguistic features
Performance, which means how long I'll have to wait to achieve such accuracy, and how much input data I can throw at the program before it either collapses or my grandkids grow old.
In order to compare these metrics, I need to make sure both libraries share a common ground. I have the following at my disposal:
A desktop PC, running Linux Mint with 16GB of RAM on an SSD storage, and an Intel core i5-6600K processor running 4 cores at 3.5GHz
Training, target, and correct results data, which follow NLTK POS format (see below)
Jupyter Python 3 Notebook with spaCy 2.0.5 installed
Apache Zeppelin 0.7.3 Notebook with Spark-NLP 1.3.0 and Apache Spark 2.1.1 installed
The data
Data for training, testing, and measuring has been taken from the American National Corpus, utilizing their MASC 3.0.2 written corpora from the newspaper section.
Data is wrangled with one of their tools (ANCtool) and, though I could have worked with the CoNLL data format, which contains a lot of tagged information such as lemmas, indexes, and entity recognition, I preferred to utilize an NLTK data format with Penn POS tags, which serves my purposes in this article well enough. It looks like this:
Neither|DT Davison|NNP nor|CC most|RBS other|JJ RxP|NNP opponents|NNS doubt|VBP the|DT efficacy|NN of|IN medications|NNS .|.
As you can see, the content in the training data is:
Sentence boundary detected (new line, new sentence)
Tokenized (space separated)
POS detected (pipe delimited)
Whereas in the raw text files, everything comes mixed up, dirty, and without any standard bounds.
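A quick sketch of reading that format back, splitting each pipe-delimited pair into parallel token and tag lists (plain Python, mirroring the sample line above; the helper name is my own):

```python
def parse_pos_line(line):
    # Each token is "word|TAG", tokens are space-separated,
    # and each line is one sentence.
    words, tags = [], []
    for pair in line.split():
        # rpartition keeps any "|" inside the word with the word
        word, _, tag = pair.rpartition("|")
        words.append(word)
        tags.append(tag)
    return words, tags

line = ("Neither|DT Davison|NNP nor|CC most|RBS other|JJ RxP|NNP "
        "opponents|NNS doubt|VBP the|DT efficacy|NN of|IN medications|NNS .|.")
words, tags = parse_pos_line(line)
print(words[1], tags[1])  # Davison NNP
```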
Here are key metrics about the benchmarks we’ll run:
The benchmark data sets
We’ll use two benchmark data sets throughout this article. The first is a very small one, enabling interactive debugging and experimentation:
Training data: 36 .txt files, totaling 77 KB
Testing data: 14 .txt files, totaling 114 KB
21,362 words to predict
The second data set is still not “big data” by any means, but is a larger data set and intended to evaluate a typical single-machine use case:
Training data: 72 .txt files, totaling 150 KB
Two testing data sets: 9,225 .txt files, totaling 75 MB; and 1,125 .txt files, totaling 15 MB
13+ million words
Note that we have not evaluated “big data” data sets here. This is because while spaCy can take advantage of multicore CPUs, it cannot take advantage of a cluster in the way Spark NLP natively does. Therefore, Spark NLP is orders of magnitude faster on terabyte-size data sets using a cluster—in the same way a large-scale MPP database will greatly outperform a locally installed MySQL server. Our goal here is to evaluate these libraries on a single machine, using the multicore functionality of both libraries. This is a common scenario for systems under development, and also for applications that do not need to process large data sets.
Getting started
Let's get our hands dirty, then. First things first, we've got to bring the necessary imports and start them up.
spaCy
import os
import io
import time
import re
import random
import pandas as pd
import spacy

nlp_model = spacy.load('en', disable=['parser', 'ner'])
nlp_blank = spacy.blank('en', disable=['parser', 'ner'])
I've disabled some pipelines in spaCy in order to not bloat it with unnecessary parsers. I have also kept an nlp_model for reference, which is a pre-trained NLP model provided by spaCy, but I am going to use nlp_blank, which will be more representative, as it will be the one I’ll be training myself.
Spark-NLP
import org.apache.spark.sql.expressions.Window
import org.apache.spark.ml.Pipeline
import com.johnsnowlabs.nlp._
import com.johnsnowlabs.nlp.annotators._
import com.johnsnowlabs.nlp.annotators.pos.perceptron._
import com.johnsnowlabs.nlp.annotators.sbd.pragmatic._
import com.johnsnowlabs.nlp.util.io.ResourceHelper
import com.johnsnowlabs.util.Benchmark
The first challenge I face is that I am dealing with three types of tokenization results that are completely different, and will make it difficult to identify whether a word matched both the token and the POS tag:
spaCy's tokenizer, which works on a rule-based approach with an included vocabulary that saves many common abbreviations from breaking up
SparkNLP tokenizer, which also has its own rules for tokenization
My training and testing data, which is tokenized by ANC's standard and, in many cases, will split the words quite differently than our tokenizers
So, to overcome this, I need to decide how I am going to compare POS tags that refer to a completely different set of tags. For Spark-NLP, I am leaving as it is, which matches somewhat the ANC open standard tokenization format with its default rules. For spaCy, I need to relax the infix rule so I can increase token accuracy matching by not breaking words by a dash "-".
spaCy
class DummyTokenMatch:
    def __init__(self, content):
        self.start = lambda: 0
        self.end = lambda: len(content)

def do_nothing(content):
    return [DummyTokenMatch(content)]

model_tokenizer = nlp_model.tokenizer
nlp_blank.tokenizer = spacy.tokenizer.Tokenizer(nlp_blank.vocab,
                                                prefix_search=model_tokenizer.prefix_search,
                                                suffix_search=model_tokenizer.suffix_search,
                                                infix_finditer=do_nothing,
                                                token_match=model_tokenizer.token_match)
Note: I am passing vocab from nlp_blank, which is not really blank. This vocab object has English language rules and strategies that help our blank model tag POS and tokenize English words—so, spaCy begins with a slight advantage. Spark-NLP doesn’t know anything about the English language beforehand.
Training pipelines
Proceeding with the training, in spaCy I need to provide a specific training data format, which follows this shape:
TRAIN_DATA = [
    ("I like green eggs", {'tags': ['N', 'V', 'J', 'N']}),
    ("Eat blue ham", {'tags': ['V', 'J', 'N']})
]
Whereas in Spark-NLP, I have to provide a folder of .txt files containing delimited word|tag data, which looks just like ANC training data. So, I am just passing the path to the POS tagger, which is called PerceptronApproach.
Let’s load the training data for spaCy. Bear with me, as I have to add a few manual exceptions and rules with some characters since spaCy’s training set is expecting clean content.
spaCy
start = time.time()

train_path = "./target/training/"
train_files = sorted([train_path + f for f in os.listdir(train_path)
                      if os.path.isfile(os.path.join(train_path, f))])

TRAIN_DATA = []
for file in train_files:
    fo = io.open(file, mode='r', encoding='utf-8')
    for line in fo.readlines():
        line = line.strip()
        if line == '':
            continue
        line_words = []
        line_tags = []
        for pair in re.split("\\s+", line):
            tag = pair.strip().split("|")
            line_words.append(re.sub('(\w+)\.', '\1', tag[0].replace('$', '').replace('-', '').replace('\'', '')))
            line_tags.append(tag[-1])
        TRAIN_DATA.append((' '.join(line_words), {'tags': line_tags}))
    fo.close()

TRAIN_DATA[240] = ('The company said the one time provision would substantially eliminate all future losses at the unit .',
                   {'tags': ['DT', 'NN', 'VBD', 'DT', 'JJ', '-', 'NN', 'NN', 'MD', 'RB', 'VB', 'DT', 'JJ', 'NNS', 'IN', 'DT', 'NN', '.']})

n_iter = 5
tagger = nlp_blank.create_pipe('tagger')
tagger.add_label('-')
tagger.add_label('(')
tagger.add_label(')')
tagger.add_label('#')
tagger.add_label('...')
tagger.add_label("one-time")
nlp_blank.add_pipe(tagger)

optimizer = nlp_blank.begin_training()
for i in range(n_iter):
    random.shuffle(TRAIN_DATA)
    losses = {}
    for text, annotations in TRAIN_DATA:
        nlp_blank.update([text], [annotations], sgd=optimizer, losses=losses)
    print(losses)

print(time.time() - start)
Runtime
{'tagger': 5.773235303101046}
{'tagger': 1.138113870966123}
{'tagger': 0.46656132966405683}
{'tagger': 0.5513760568314119}
{'tagger': 0.2541630900934435}
Time to run: 122.11359786987305 seconds
I had to do some field work in order to bypass a few hurdles. The training wouldn't let me pass my tokenizer words, which contain some ugly characters within (e.g., it won't let you train a sentence with a token "large-screen" or "No." unless it exists in the vocab labels). So, I had to add those characters to the list of labels for it to work once found during the training.
Let's see how to construct a pipeline in Spark-NLP.
Spark-NLP
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")
  .setPrefixPattern("\\A([^\\s\\p{L}\\d\\$\\.#]*)")
  .addInfixPattern("(\\$?\\d+(?:[^\\s\\d]{1}\\d+)*)")

val posTagger = new PerceptronApproach()
  .setInputCols("document", "token")
  .setOutputCol("pos")
  .setCorpusPath("/home/saif/nlp/comparison/target/training")
  .setNIterations(5)

val finisher = new Finisher()
  .setInputCols("token", "pos")
  .setOutputAsArray(true)

val pipeline = new Pipeline()
  .setStages(Array(
    documentAssembler,
    tokenizer,
    posTagger,
    finisher
  ))

val model = Benchmark.time("Time to train model") {
  pipeline.fit(data)
}
As you can see, constructing a pipeline is quite a linear process: you set the document assembler, which makes the target text column a target for the next annotator, the tokenizer; then PerceptronApproach is the POS model, which takes as input both the document text and the tokenized form.
I had to update the prefix pattern and add a new infix pattern to match dates and numbers the same way ANC does (this will probably be made default in the next release). As you can see, every component of the pipeline is under control of the user; there is no implicit vocab or English knowledge, as opposed to spaCy.
The corpusPath of PerceptronApproach points to the folder containing the pipe-separated text files, and the Finisher annotator wraps up the POS and token results so they can be used next. setOutputAsArray() will return, as it says, an array instead of a concatenated string, although that has some processing cost.
The data passed to fit() does not really matter since the only NLP annotator being trained is the PerceptronApproach, and this one is trained with external POS Corpora.
Runtime
Time to train model: 3.167619593sec
As a side note, it would be possible to inject in the pipeline a SentenceDetector or a SpellChecker, which in some scenarios might help the accuracy of the POS by letting the model know where a sentence ends.
What’s next?
So far, we have initialized the libraries, loaded the data, and trained a tokenizer model using each one. Note that spaCy comes with pretrained tokenizers, so this step may not be necessary if your text data is from a language (e.g., English) and domain (e.g., news articles) it was trained on, though the tokenization infix alteration is significant in order to more closely match tokens to our ANC corpus. Training was more than 38 times faster in Spark-NLP for about five iterations.
In the next installment in the blog series, we will walk through the code, accuracy, and performance for running this NLP pipeline using the models we’ve just trained.
Continue reading Comparing production-grade NLP libraries: Training Spark-NLP and spaCy pipelines.
from All - O'Reilly Media http://ift.tt/2EXLK43
0 notes
file-formats-programming · 8 years ago
Text
Modify Gantt Chart View’s Text Style & Project Files Loading Enhancements using .NET
What’s new in this release?
Aspose team is pleased to announce the new release of Aspose.Tasks for .NET 17.7.0. It includes an enhancement for adding text styles to a project's tasks representation. We have also fixed several bugs in this month's release, which further add to the overall stability of the API. For a complete list of what is new and fixed, please visit the release notes section of the API in the product documentation. Users can now modify the Gantt chart view's text style using the TableTextStyles collection of the API. This allows formatting of fields such as color and font style of the specified task fields in the tasks list. Several bugs have also been fixed in this month's release. These are relevant to different areas of API functionality and improve the overall API behavior in terms of expected results. Some of these include project loading exceptions while reading certain MPP files, errors raised by the API with the latest version of Microsoft Project upgrade 1706, an issue with lookup values not being created in MPP files, calculation errors in project data after an update, issues with task links, the visual display of Gantt chart bars and text styling, and more. Below is the complete list of bug fixes and enhancements included in this release:
Add support for text style information.
Program hangs while loading MPP into Project
MPP file raises TasksReadingException
MPP file saved by MSP 2016 ver 1706 raises exception while loading into Project
An entry with the same key already exists - exception
Exception raised while loading attached MPP
TaskReadingException while loading MPP file
Aspose.Tasks is not setting the Bar color of Task and Summary task - MSP 2007
MPP project cannot be saved to MemoryStream
Lookup values not created properly in MPP
TaskLink doesn't affect the dates of the successor task when using an MPP file
Exception raised while applying constraint
Task baseline is not saved to MPP file
Finish date not recalculated properly
MPP to XLSX: Resultant file doesn't contain any data
Exception is raised while loading Primavera XER file
Gantt chart task bar ends at one third of day
Tasks text style information lost while loading and saving MPP again
Other most recent bug fixes are also included in this release
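The text-style enhancement described above can be sketched as follows. This is a minimal illustration, not the exact sample from the announcement page; the file names, the row UID, and the use of Project.DefaultView to reach the Gantt Chart view are assumptions for demonstration:

```csharp
using System.Drawing;
using Aspose.Tasks;
using Aspose.Tasks.Visualization;

class GanttTextStyleDemo
{
    static void Main()
    {
        // Load an existing project (placeholder file name)
        var project = new Project("Project.mpp");

        // Reach the Gantt Chart view whose table text styles we want to change
        // (assumes the default view of the file is a Gantt Chart view)
        var view = (GanttChartView)project.DefaultView;

        // Style the task displayed in the row with UID 2
        var style = new TableTextStyle(2);
        style.Field = Field.TaskName;   // which column to format
        style.Color = Color.DarkRed;    // font color for that cell

        // Attach the style to the view's TableTextStyles collection
        view.TableTextStyles.Add(style);

        project.Save("StyledProject.mpp", SaveFileFormat.MPP);
    }
}
```

Styles added this way apply per row, so different tasks in the list can carry different formatting.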
Newly added documentation pages and articles
Some new tips and articles have been added to the Aspose.Tasks for .NET documentation to briefly guide users on performing different tasks, such as the following:
Support for Text Styling
Microsoft Project MPP File Update
Overview: Aspose.Tasks for .NET
Aspose.Tasks is a non-graphical .NET Project management component that enables .NET applications to read, write and manage Project documents without utilizing Microsoft Project. With Aspose.Tasks you can read and change tasks, recurring tasks, resources, resource assignments, relations and calendars. Aspose.Tasks is a very mature product that offers stability and flexibility. As with all of the Aspose file management components, Aspose.Tasks works well with both WinForm and WebForm applications.
More about Aspose.Tasks for .NET
Homepage of Aspose.Tasks for .NET
Download Aspose.Tasks for .NET
Online documentation of Aspose.Tasks for .NET
file-formats-programming · 7 years ago
Text
Set Default Font When Rendering Project into PDF & Enhanced MPP Files Loading using .NET
What’s new in this release?
Aspose team is pleased to announce the new release of Aspose.Tasks for .NET 17.12.0. This month's release includes several improvements in terms of feature enhancements and bug fixes. Specifically, it introduces the capability of setting a default font when exporting project data to PDF. Setting a default font during document rendering helps when a font is not found on the server: in such cases, the default font replaces the missing one and the output is not affected. It can be specified using the DefaultFontName property of the PdfSaveOptions class. This release also includes fixes for issues found in the previous version of the API, such as project reading exceptions while loading certain MPP files, issues with task duration during recalculation, incorrect start and finish times of resource baselines, issues with header text while rendering project data, and font information lost for MPP files. Below is the complete list of bug fixes and enhancements included in this release:
Add support for setting a default font when a project is rendering into PDF
Task notes not saved for template file from MSP 2016
Resource assignment units raise exception when large value is set
Task duration becomes zero if multiple resources are assigned
Project reading exception while loading the MPP file
Aspose.Tasks breaks the showing of GanttBarStyle for manual summary tasks
Resource assignment has incorrect baseline start/finish date
FontFamily not set in MPP
Header text is only changed for the default view
Other most recent bug fixes are also included in this release
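A minimal sketch of the default-font option described above, using the DefaultFontName property of PdfSaveOptions; the file names and the substitute font are placeholders:

```csharp
using Aspose.Tasks;
using Aspose.Tasks.Saving;

class DefaultFontDemo
{
    static void Main()
    {
        var project = new Project("Project.mpp");   // placeholder file name

        // This font is used wherever a font referenced by the
        // project is not installed on the rendering machine
        var options = new PdfSaveOptions
        {
            DefaultFontName = "Segoe UI"
        };

        project.Save("Project.pdf", options);
    }
}
```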
Newly added documentation pages and articles
Some new tips and articles have been added to the Aspose.Tasks for .NET documentation to briefly guide users on performing different tasks, such as the following:
Saving Project Data to JPEG
Setting Default Font
Overview: Aspose.Tasks for .NET
Aspose.Tasks is a non-graphical .NET Project management component that enables .NET applications to read, write and manage Project documents without utilizing Microsoft Project. With Aspose.Tasks you can read and change tasks, recurring tasks, resources, resource assignments, relations and calendars. Aspose.Tasks is a very mature product that offers stability and flexibility. As with all of the Aspose file management components, Aspose.Tasks works well with both WinForm and WebForm applications.
More about Aspose.Tasks for .NET
Homepage of Aspose.Tasks for .NET
Download Aspose.Tasks for .NET
Online documentation of Aspose.Tasks for .NET
file-formats-programming · 8 years ago
Text
Set Output JPEG Image Quality While Exporting Project Data inside .NET Apps
What’s new in this release?
Aspose team is pleased to announce the new release of Aspose.Tasks for .NET 17.11.0. This month's release includes a new feature for setting output image quality while exporting project data to JPEG. Besides this feature, it also includes several improvements in terms of bug fixes that further add to the stability of the API. For a detailed note on what is new and fixed, please visit the release notes section of the API documentation. The new feature is exposed through the JpegQuality property of the ImageSaveOptions class, as shown in the code sample on the blog announcement page. Other important improvements in this release address issues with the API's recalculation method for manually scheduled tasks, problems with actual duration and finish dates in MPP and XML project data files, a missing time span in the split parts collection, rendering of sub-tasks while converting MPX to PDF/A-1b, a calculation issue with percent complete in MPP as compared to XML output, exceptions while loading certain project files, and errors raised by Microsoft Project 2010 with MPP files generated using the Aspose.Tasks API. Below is the complete list of bug fixes and enhancements included in this release:
Add option to set image quality when saving as JPEG
Enum GanttBarFillPattern should have value 11 corresponding to fill pattern in MSP 2016
Recalculate() is updating manually scheduled tasks
Wrong finish date calculated for ElapsedDay type duration
Prevent recalculation of manually scheduled tasks
Wrong Actual Duration in MPP file
SplitParts collection misses time span
Sub-tasks not rendered while converting MPX to PdfA1b
Wrong Finish date in XML file
Wrong Percent complete in MPP as compared to XML output
Loading a project raises ProjectReadingException
TaskReadingException while reading the MPP file
MSP 2010 raises error while updating and saving MPP created by Aspose.Tasks
Other most recent bug fixes are also included in this release
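The JPEG-quality feature described above can be sketched as follows; this is an illustrative snippet rather than the exact sample from the announcement page, and the file names are placeholders:

```csharp
using Aspose.Tasks;
using Aspose.Tasks.Saving;

class JpegQualityDemo
{
    static void Main()
    {
        var project = new Project("Project.mpp");   // placeholder file name

        // Quality runs on a 0-100 scale; lower values trade
        // image fidelity for a smaller output file
        var options = new ImageSaveOptions(SaveFileFormat.JPEG)
        {
            JpegQuality = 70
        };

        project.Save("Project.jpg", options);
    }
}
```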
Newly added documentation pages and articles
Some new tips and articles have been added to the Aspose.Tasks for .NET documentation to briefly guide users on performing different tasks, such as the following:
Saving Project Data to JPEG
Saving a Project as PDF
Overview: Aspose.Tasks for .NET
Aspose.Tasks is a non-graphical .NET Project management component that enables .NET applications to read, write and manage Project documents without utilizing Microsoft Project. With Aspose.Tasks you can read and change tasks, recurring tasks, resources, resource assignments, relations and calendars. Aspose.Tasks is a very mature product that offers stability and flexibility. As with all of the Aspose file management components, Aspose.Tasks works well with both WinForm and WebForm applications.
More about Aspose.Tasks for .NET
Homepage of Aspose.Tasks for .NET
Download Aspose.Tasks for .NET
Online documentation of Aspose.Tasks for .NET
file-formats-programming · 8 years ago
Text
Project Recalculation Improvements & Enhanced Project Data Saving to XML inside Java Apps
What’s new in this release?
Aspose team is pleased to announce the release of Aspose.Tasks for Java 17.5.0. This is largely a maintenance release in which the Aspose team has fixed several bugs related to various functional areas of the API. These include scenarios where loading or saving sample MPP files raised various exceptions; improvements in project recalculation, resulting in more accurate output; improved saving of project data to XML format, where project calendars previously produced erroneous calendar entries; issues with timephased data writing to the output MPP file, which resulted in repeated entries in the output XML file and some wrong work values in certain cases; and problems with actual start, percent complete, and actual duration while saving project data to MPP using the API. Moreover, it fixes out-of-memory errors while exporting project data to an image, issues related to preserving formulas while saving project data as MPP, and differences in task duration between MSP 2010 and 2016 file formats. Below is the complete list of bug fixes and enhanced features included in this release.
Tasks with custom timephased data have Percent Complete > 100% and MSP in XML format cannot be imported
Formulas get corrupted after file save
Loading an MPP file using Aspose.Tasks throws exception "An item with the same key has already been added"
Recalculation of project sets percent complete to zero on milestone tasks
Saving Project raises TaskWritingException
Erroneous calendar entry added in XML while converting MSP 2016 MPP
Out of Memory error while saving MPP to PNG
Task duration shown wrong in MSP 2016 as compared to MSP 2010
Saving MPP file hangs and never returns
Timephased data entries are repeated for AssignmentActualWork in the XML file
Timephased data not copied while saving project as MPP
Wrong Actual Start, % Complete and Actual duration calculated while saving MPP
The value of actual start of parent node set to NA while loading and saving the project
TimephasedData written to MPP File shows wrong Work Values for the Last two days
Project.getCustomProperties gives compilation error in latest release
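The recalculation and save-path fixes above can be exercised with a simple load, recalculate, and save round trip. This is a hedged sketch: the file names are placeholders, and the exact casing of the SaveFileFormat constants may vary between API versions:

```java
import com.aspose.tasks.Project;
import com.aspose.tasks.SaveFileFormat;

public class RecalcRoundTrip {
    public static void main(String[] args) throws Exception {
        // Load an existing MPP file (placeholder file name)
        Project project = new Project("Project.mpp");

        // Recalculate task, resource and assignment fields so that
        // dates and percent-complete values reflect the fixed logic
        project.recalculate();

        // Round-trip through the XML and MPP output formats
        project.save("Project.xml", SaveFileFormat.Xml);
        project.save("ProjectOut.mpp", SaveFileFormat.Mpp);
    }
}
```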
Newly added documentation pages and articles
Some new tips and articles have been added to the Aspose.Tasks for Java documentation to briefly guide users on performing different tasks, such as the following:
Extended Task Attributes
Writing Metadata to MPP
Overview: Aspose.Tasks for Java
Aspose.Tasks is a non-graphical Java Project management component that enables Java applications to read, write & manage Project documents without utilizing MS Project. It supports reading MS Project Template (MPT) files as well as allows exporting project data to HTML, BMP, PNG, JPEG, PDF, TIFF, XPS, XAML and SVG formats. It reads & writes MS Project documents in both MPP & XML formats.  Developers can read & change tasks, recurring tasks, resources, resource assignments, relations & calendars.
More about Aspose.Tasks for Java
Homepage of Aspose.Tasks for Java
Download Aspose.Tasks for Java
Online documentation of Aspose.Tasks for Java
file-formats-programming · 8 years ago
Text
Project Data Writing to MPP, XML & Image Formats with Enhanced Project Recalculation using .NET
What’s new in this release?
Aspose team is pleased to announce the new release of Aspose.Tasks for .NET 17.5.0. This is largely a maintenance release in which the Aspose team has fixed several bugs related to various functional areas of the API. These include scenarios where loading or saving sample MPP files raised various exceptions; improvements in project recalculation, resulting in more accurate output; improved saving of project data to XML format, where project calendars previously produced erroneous calendar entries; issues with timephased data writing to the output MPP file, which resulted in repeated entries in the output XML file and some wrong work values in certain cases; problems with actual start, percent complete, and actual duration while saving project data to MPP using the API; fixes for out-of-memory errors while exporting project data to an image; issues related to preserving formulas while saving project data as MPP; and differences in task duration between MSP 2010 and 2016 file formats. Below is the complete list of bug fixes and enhancements included in this release:
Tasks with custom timephased data have Percent Complete > 100% and MSP in XML format cannot be imported
Formulas get corrupted after file save
Loading an MPP file using Aspose.Tasks throws exception "An item with the same key has already been added"
Recalculation of project sets percent complete to zero on milestone tasks
Saving Project raises TaskWritingException
Erroneous calendar entry added in XML while converting MSP 2016 MPP
Out of Memory error while saving MPP to PNG
Task duration shown wrong in MSP 2016 as compared to MSP 2010 (.NET)
Saving MPP file hangs and never returns
Timephased data entries are repeated for AssignmentActualWork in the XML file (.NET)
Timephased data not copied while saving project as MPP
Wrong Actual Start, % Complete and Actual duration calculated while saving MPP
The value of actual start of parent node set to NA while loading and saving the project (.NET)
TimephasedData written to MPP File shows wrong Work Values for the Last two days
Other most recent bug fixes are also included in this release
Newly added documentation pages and articles
Some new tips and articles have been added to the Aspose.Tasks for .NET documentation to briefly guide users on performing different tasks, such as the following:
Reading VBA Information from MPP file
Microsoft Project MPP File Update
Overview: Aspose.Tasks for .NET
Aspose.Tasks is a non-graphical .NET Project management component that enables .NET applications to read, write and manage Project documents without utilizing Microsoft Project. With Aspose.Tasks you can read and change tasks, recurring tasks, resources, resource assignments, relations and calendars. Aspose.Tasks is a very mature product that offers stability and flexibility. As with all of the Aspose file management components, Aspose.Tasks works well with both WinForm and WebForm applications.
More about Aspose.Tasks for .NET
Homepage of Aspose.Tasks for .NET
Download Aspose.Tasks for .NET
Online documentation of Aspose.Tasks for .NET