#detectron
Explore tagged Tumblr posts
Photo

#trituenhantaoio #detectron #facebook #AI #introduce #implement #stateoftheart #algorithm #object #bounding #detection https://www.instagram.com/p/CDhyTDlJ8cC/?igshid=1x9znb51jjaak
Link
Text
The hardship of setting up DensePose - getting started
So, I quit my attempts to set up DensePose and Detectron on my laptop some time ago.. But I am trying to get back to it. My goal is to run pose recognition on video sent to me and determine the body type of the people in it with a black box of business rules. So this is a post I’ve found. The last time I tried to set it up was about 7 months ago. Hoping things have changed since then and it will be less hard.. http://gianni.rosagallina.com/en/posts/2018/10/04/caffe2-gpu-windows-1.html
Text
I just found a screenshot in my album and was reminded that I used a Mask R-CNN object detection package last semester that was trained on GTA V footage. And the image I used for the demo was actually Trev. I hadn’t joined the fandom at that point and I probably wasn’t as excited as I am now, but it was a really interesting package. It is called Detectron, developed by Facebook. There are also some cool videos on YouTube where you can see this model in action.


Photo

And @xseanbonner grabbed the Detectron “Nucliometer” EFC pedal before I had a chance to officially post it! Sean has been doing work in Japan to help bring Geiger counters to residents of radiation-affected areas. This helps build trust that some areas are in fact safe to move back to. Wonderful idea. Sean’s been a long-time supporter of RML so I’m very happy he got this unique piece, especially since it’s directly related to the work he does. Looks like we may have one other similar box to build down the road. #pedalboard #botiquepedals #fuzzpedal #RMLfx #radiation https://www.instagram.com/p/B7Zz7TklKET/?igshid=wfcj1m3m2v97
Photo
Top Deep Learning Projects | Artificial Intelligence Projects | Deep Learning Training | Edureka http://ehelpdesk.tk/wp-content/uploads/2020/02/logo-header.png AI and Deep Learning with Tensor...
Photo
Detecting pedestrians and bikers on a drone with Jetson Xavier
Drones are one of the coolest technologies every maker and enthusiast wants to get their hands on. As drones become more common, AI is advancing rapidly, and we are now at a point where object detection and semantic segmentation are possible right onboard the drone. In this blog post, I will share how to perform object detection on images taken by a drone.
A few basics about the problem at hand
First, it is important to realize that we cannot realistically run object detection on general-purpose embedded hardware such as the Raspberry Pi; special-purpose hardware built for AI inferencing is required.
Second, if you run commonly used object detection models, such as YOLO or SSD trained on COCO and Pascal VOC datasets, they won’t do well at all. This is because the view of an object from a height is quite different from that on the ground. Thus, the distribution of the inference data will be very different from that encountered by the model during training, which will cause it to fail.
Some solutions
As mentioned in the title, in this post I will be using the highest-end embedded processor for autonomous robots available at the time of writing, the Jetson AGX Xavier from Nvidia. If you want to use a Jetson TX2 or Nano, I will provide some suggestions for improving their performance towards the end of the post.
To remedy the problem of differing data distributions, researchers at Stanford University have released the Stanford Drone Dataset, which contains several videos taken from drones along with labels for each frame of each video. There are six classes: Biker, Car, Bus, Cart, Skater and Pedestrian.
Any decent object detection model trained on the Stanford dataset should do a good job of detecting these six objects. In this post, we will be using RetinaNet, a very good object detection model released by Facebook AI Research (FAIR), which shapes the loss function in such a way that the model learns to focus on hard examples during training and thus learns much better. More details about RetinaNet can be found here.
Getting the model and converting to fp16
Although the Detectron 2 model zoo by FAIR has several models which can be downloaded and used, they are all trained on the COCO and Pascal datasets, which, as we discussed, are not useful for our task. It would be a big task to train a model from scratch on the Stanford dataset. Fortunately, I found that Nvidia provides a model trained on this dataset as part of one of their webinars on the DeepStream SDK. I am not trying to endorse Nvidia in this blog post, but if you want to get started without training your own model, the quickest way would be to register for the webinar and download the model as part of the webinar resources. The file you are looking for is stanford_resnext50.onnx, which is about 150 MB in size.
Once you have this file, you can use the C++ API of the retinanet-examples repository from GitHub to convert the ONNX file to an engine plan file, which is compiled specifically for the Jetson device you are using. Here is a walkthrough of these steps:
git clone https://github.com/NVIDIA/retinanet-examples.git
cd retinanet-examples/extras/cppapi
mkdir build && cd build
cmake -DCMAKE_CUDA_FLAGS="--expt-extended-lambda -std=c++11" ..
make
cp /path/to/onnx/file .   # copy onnx file to this directory
./export stanford_resnext50.onnx engine.plan
After engine.plan has been created successfully, you can use the infer utility to test out the performance of the model on an image:
./infer engine.plan image.jpg
This will write a file called detections.png which will have the bounding boxes of the objects detected in the image. I gave an image from one of the videos in the Stanford dataset as input to the model and here is what it spat out:
We can see that the model has detected several pedestrians walking on the sidewalk. This would not be possible with Pascal or COCO trained models. The aspect ratio of this image looks odd because the infer utility resizes the image to 1280x1280, which is the input size of the model.
Inference on video
Now that we can detect pedestrians in a single image, it is straightforward to extend this to videos by editing the infer.cpp file in the cppapi directory. I came up with this:
https://gist.github.com/dataplayer12/6b1fe64b783d2b319f43bfa2f8bd9a5c#file-infervideo-cpp
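The gist is the version I actually used; purely as a sketch of the surrounding structure (my illustration, with the actual RetinaNet inference call elided, since that part is exactly what infer.cpp already does for a single image):

#include <opencv2/opencv.hpp>

int main(int argc, char* argv[]) {
    // argv[1] = engine.plan, argv[2] = input video or camera pipeline, argv[3] = output video
    if (argc != 4) return 1;
    cv::VideoCapture cap(argv[2]);
    if (!cap.isOpened()) return 1;
    double fps = cap.get(cv::CAP_PROP_FPS);
    cv::VideoWriter out(argv[3], cv::VideoWriter::fourcc('m', 'p', '4', 'v'),
                        fps > 0 ? fps : 30.0,
                        cv::Size(static_cast<int>(cap.get(cv::CAP_PROP_FRAME_WIDTH)),
                                 static_cast<int>(cap.get(cv::CAP_PROP_FRAME_HEIGHT))));
    cv::Mat frame;
    while (cap.read(frame)) {
        // Load the TensorRT engine from argv[1] once before this loop, run it on `frame`
        // here just as infer.cpp does for a single image, and draw the returned boxes
        // with cv::rectangle before writing the frame out.
        out.write(frame);
    }
    return 0;
}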
To use the script from the gist, save it as infervideo.cpp in the cppapi directory and edit CMakeLists.txt to add the infervideo executable and link it against retinanet and the other libraries.
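For reference, the CMakeLists.txt addition is only a couple of lines; the library list below is an assumption and should mirror whatever the existing infer target links against:

add_executable(infervideo infervideo.cpp)
target_link_libraries(infervideo retinanet ${OpenCV_LIBS})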
Once you are done, you can switch over to the build directory and invoke cmake and make as before:
cmake -DCMAKE_CUDA_FLAGS="--expt-extended-lambda -std=c++11" .. && make
Once the targets have been built, you will see a new executable in the build directory called infervideo, which can be used as:
./infervideo engine.plan input.mov output.mp4
This will create a new video called output.mp4 showing the bounding boxes for each detected object. If you want to perform object detection on a live video stream from the drone, you can simply provide the GStreamer pipeline of your camera as the second argument to the script and it will handle cameras as well.
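For example, on a Jetson with the onboard CSI camera, a pipeline along these lines could be passed in place of input.mov (the exact string is an assumption; adjust it to your camera, and note that OpenCV must be built with GStreamer support):

./infervideo engine.plan "nvarguscamerasrc ! video/x-raw(memory:NVMM),width=1280,height=720,framerate=30/1 ! nvvidconv ! video/x-raw,format=BGRx ! videoconvert ! video/x-raw,format=BGR ! appsink" output.mp4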
I used the script above to run inference on a video taken at the IIT Delhi campus in New Delhi, my alma mater. I used a low threshold of 0.2 for drawing bounding boxes, which is why there are some false positives in the video below.
Improving performance
If you run the script provided above on the Xavier, you will find that each frame of the video takes 150 ms for inference. This is painfully slow on the Xavier and would be even slower on the smaller Jetsons like TX2 or nano. Here are some things we can do to improve the performance:
The engine we created in this post uses fp16 precision. You can run the model in INT8 precision, which will improve its performance significantly. For this, you can use a small subset of the Stanford dataset and the trtexec utility from TensorRT to create an INT8 calibration file, and provide that file to the export utility in our build directory.
In practice, a realtime object detection pipeline does not perform full inference on each frame, but usually mixes it up with computationally inexpensive trackers such as a Kalman filter or optical flow. You can use OpenCV’s KalmanFilter class to keep track of objects across frames and perform inference only once every 4 or 5 frames. If the drone does not make sudden, jerky motions between inferences, this will work well in practice.
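A minimal sketch of that idea (my own illustration, not code from this post; a constant-velocity model tracking a single box center):

#include <opencv2/core.hpp>
#include <opencv2/video/tracking.hpp>

// Tracks the center (x, y) of one detected box with a constant-velocity model.
// State: [x, y, vx, vy], measurement: [x, y].
class BoxTracker {
public:
    BoxTracker(float x, float y) : kf_(4, 2, 0) {
        kf_.transitionMatrix = (cv::Mat_<float>(4, 4) <<
            1, 0, 1, 0,
            0, 1, 0, 1,
            0, 0, 1, 0,
            0, 0, 0, 1);
        cv::setIdentity(kf_.measurementMatrix);
        cv::setIdentity(kf_.processNoiseCov, cv::Scalar::all(1e-2));
        cv::setIdentity(kf_.measurementNoiseCov, cv::Scalar::all(1e-1));
        kf_.statePost = (cv::Mat_<float>(4, 1) << x, y, 0, 0);
    }

    // Call on frames where full inference is skipped.
    cv::Point2f predict() {
        cv::Mat p = kf_.predict();
        return {p.at<float>(0), p.at<float>(1)};
    }

    // Call on frames where the detector actually ran.
    void correct(float x, float y) {
        kf_.correct((cv::Mat_<float>(2, 1) << x, y));
    }

private:
    cv::KalmanFilter kf_;
};

In the main loop you would run full RetinaNet inference only every fourth or fifth frame, call correct() with the fresh detection on those frames, and rely on predict() in between.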
The model we are using is very large, as it takes in 1280x1280 images. You could train the model on lower-resolution images or even on a custom dataset to significantly improve its latency and throughput. Instructions for training the model are in the retinanet-examples repository, but this is best done on a good x86 workstation with a CUDA-enabled GPU.
Conclusion
This blog post is meant for anyone who is having trouble deploying a RetinaNet model on the Jetson Xavier, and to chronicle my efforts towards getting a good object detection pipeline running on a drone. While we get significantly better results than would be possible with a COCO/Pascal-trained model, there are many improvements to be made before the model runs in realtime on a Jetson device. Please use this as a starting point for your own projects, and if you have any suggestions to improve the performance, please do comment below.
Photo
"[P] Implementation of VoVNet(CVPRW'19)"- Detail: Hi,I implemented VoVNet which is efficient backbone network presented in CVPR workshop on CEFRL.My implementations provide ImageNet classification and object detections in Detectron.Highlight2x faster than DenseNet on ImageNet classificationmore accurate than ResNet, especially on the small object detectionImageNet classification : http://bit.ly/31ylYiJ : http://bit.ly/2IPPTu3. Caption by stigma0617. Posted By: www.eurekaking.com
Text
Making “osha-moji” (stylish lettering) with Detectron - Bascule Tech Blog [Hatena Bookmark]
Making “osha-moji” (stylish lettering) with Detectron - Bascule Tech Blog

I’m Maruyama (@maruware), an engineer. About a year ago, Daily Portal Z (デイリーポータルZ) published an article titled “‘Fonts made real’ are so stylish I want to master them”. Something like this (quoted from the article). Stylish, isn’t it? The article named these “osha-moji” (stylish lettering). Meanwhile, the progress of Deep Learning research...
from kjw_junichi’s Hatena Bookmarks http://bit.ly/2VFBtlw
Photo

#trituenhantaoio #detectron #facebook #AI #introduce #implement #stateoftheart #algorithm #object #bounding #detection — view on Instagram https://ift.tt/30wiONN
Photo
via @amagitakayosi
Really nice / “Making ‘osha-moji’ (stylish lettering) with Detectron - Bascule Tech Blog” https://t.co/sfVZefYUvc
— amagi (@amagitakayosi) April 12, 2019
Text
Rosetta: large scale system for text detection and recognition in images
Rosetta: large scale system for text detection and recognition in images Borisyuk et al., KDD’18
Rosetta is Facebook’s production system for extracting text (OCR) from uploaded images.
In the last several years, the volume of photos being uploaded to social media platforms has grown exponentially to the order of hundreds of millions every day, presenting technological challenges for processing increasing volumes of visual information… our problem can be stated as follows: to build a robust and accurate system for optical character recognition capable of processing hundreds of millions of images per day in realtime.

Images uploaded by clients are added to a distributed processing queue from which Rosetta inference machines pull jobs. Online image processing consists of the following steps:
The image is downloaded to a local machine in the Rosetta cluster and pre-processing steps such as resizing (to 800px in the larger dimension) and normalization are performed.
A text detection model is executed to obtain bounding box coordinates and scores for all the words in the image.
The word location information is passed to a text recognition model that extracts characters given each cropped word region from the image.
The extracted text along with the location of the text in the image is stored in TAO.
Downstream applications such as search can then access the extracted textual information corresponding to the image directly from TAO.

The most interesting part is of course the text extraction using the two-step process outlined above.
This two-step process has several benefits, including the ability to decouple training process and deployment updates to detection and recognition models, run recognition of words in parallel, and independently support text recognition for different languages.

Text detection in Rosetta
Text detection is the most compute and latency sensitive component. After evaluating several different approaches the authors settled on the Faster-RCNN detection model. Amongst other reasons, Faster-RCNN was readily available to them as part of the Facebook Detectron platform. Since Detectron has been open-sourced by Facebook, that means it’s readily available to you too!
Faster-RCNN learns a fully convolutional CNN that can represent an image as a convolutional feature map. It also learns a region proposal network that takes the feature map as an input and produces a set of k proposal bounding boxes that contain text with high likelihood, together with their confidence score. There are a number of different choices for the convolutional body of Faster-RCNN. In tests, ShuffleNet proved the fastest (up to 4.5x faster than ResNet-50).
To train the text detection model the team initially used the COCO-Text dataset, but the wide variety of text in images uploaded to Facebook (including, e.g., lots of images with overlaid words) didn’t match the COCO-Text training set well. In the end, the team used three different datasets for training: first an artificially generated dataset with text-overlaid images; then COCO-Text; and finally a human-rated dataset collected specifically for Facebook client applications. The following table shows how accuracy improved as the various datasets were introduced.

Text recognition in Rosetta
Text recognition is done using a fully-convolutional model called CTC (because it uses a sequence-to-sequence CTC loss during training) that outputs a sequence of characters. The last convolutional layer predicts the most likely character at every image position of the input word.

… every column of the feature map corresponds to the probability distribution of all characters of the alphabet at that position in the image, and CTC finds the alignments between those predictions, which may contain duplicate characters or a blank character, and the ground truth label.
For example, given the input training word LEARNING, the model might produce the sequence of characters ‘L-EE-A-RR-N-I-NN-G,’ which includes blanks (‘-’) and duplicates.
Decoding greedily takes the most likely character at every position of the sequence, and then in post-processing contiguous duplicate characters not delimited by the blank character are removed.
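Sketching the greedy decoding step (an illustration only, not Rosetta’s code; class index 0 is assumed to be the CTC blank):

#include <algorithm>
#include <string>
#include <vector>

// Greedy CTC decode: take the argmax character per column, collapse consecutive
// duplicates, then drop the blank symbol. probs[t] holds per-character
// probabilities for column t; alphabet[i - 1] is the character for class i.
std::string ctcGreedyDecode(const std::vector<std::vector<float>>& probs,
                            const std::string& alphabet) {
    std::string out;
    int prev = -1;
    for (const auto& col : probs) {
        int best = static_cast<int>(std::max_element(col.begin(), col.end()) - col.begin());
        if (best != prev && best != 0) {   // skip repeats and blanks
            out += alphabet[best - 1];
        }
        prev = best;
    }
    return out;
}

Applied to the ‘L-EE-A-RR-N-I-NN-G’ example above, this collapses to ‘LEARNING’.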
… a pre-defined dictionary would be too limiting for many real-world applications, which require recognizing more than just simple words as in the case of URLs, emails, special symbols and different languages. Therefore, an important architectural decision and a natural choice was to use a character-based recognition model.
A fixed width is needed at training time to be able to efficiently train using batches of images. Word images are resized to 32×128 pixels, with right-zero padding if the original is less than 128 pixels wide. This minimises the amount of distortion introduced in the images. During testing images are resized to a height of 32 pixels, preserving their aspect ratio (regardless of resulting width).
The number of character probabilities emitted is dependent on the width of the word image. A stretching factor of 1.2 was found to lead to superior results compared to using the original aspect ratio (you get 20% more output probabilities that way).
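A minimal sketch of that training-time preprocessing (my own illustration using OpenCV, not Rosetta’s code):

#include <algorithm>
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

// Resize a cropped word image to height 32, stretch its width by 1.2x,
// cap the width at 128 and right-pad with zeros up to 128 columns.
cv::Mat prepareWordImage(const cv::Mat& word) {
    const int targetH = 32, maxW = 128;
    const float stretch = 1.2f;
    int newW = std::min(maxW, std::max(1, static_cast<int>(
        stretch * word.cols * targetH / static_cast<float>(word.rows))));
    cv::Mat resized;
    cv::resize(word, resized, cv::Size(newW, targetH));
    if (newW < maxW) {
        cv::copyMakeBorder(resized, resized, 0, 0, 0, maxW - newW,
                           cv::BORDER_CONSTANT, cv::Scalar::all(0));
    }
    return resized;
}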
Training the CTC model proved difficult with it either diverging after just a few iterations or training too slowly to be of practical use. The solution was to use curriculum learning, i.e., starting with a simpler problem and increasing the difficulty as the model improves.
Training started with words of three characters or less, with the maximum word length increasing at every epoch. The width of images was also reduced initially. Training started with a tiny learning rate and this was also gradually increased at every epoch.

The overall accuracy of the system was further improved by 1.54% by introducing random jittering in the training set — randomly moving the bounding box coordinates of ground truth to model the behaviour of noise from the detection model.
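For intuition, jittering of this kind can be as simple as shifting and rescaling each ground-truth box by a small random fraction of its size (an illustration, not the authors’ code):

#include <random>

struct Box { float x, y, w, h; };

// Randomly shift and rescale a ground-truth box by up to `frac` of its size,
// to mimic the noise a real detector introduces at inference time.
Box jitterBox(const Box& b, float frac, std::mt19937& rng) {
    std::uniform_real_distribution<float> u(-frac, frac);
    return { b.x + u(rng) * b.w,
             b.y + u(rng) * b.h,
             b.w * (1.0f + u(rng)),
             b.h * (1.0f + u(rng)) };
}

Training the recognizer on boxes perturbed this way makes it more robust to the imperfect boxes the detector produces at inference time.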
Our system is deployed to production and processes images uploaded to Facebook every day.
First published on the morning paper.
Text
Facebook open sources Detectron
https://research.fb.com/facebook-open-sources-detectron/
Photo

I am the “Detectron” ...sounds a lot like “Electron”. That can only mean we made it into an Electron Fuzz 🙀#pedalboard #fuzz #vintageelectronics https://www.instagram.com/p/B7UqBnolV5B/?igshid=dbem21agowhl
Text
2019, the Year of Artificial Intelligence
Before we talk about the five best AI achievements of 2019, it is important to mention that a great deal of work has been done in AI since its conception. Hundreds of laboratories around the world are working on thousands of AI concepts. Since AI is evolving constantly, it is very hard to track the achievements it can claim for a single year. Artificial intelligence aims to reproduce human cognitive intelligence inside machines: a large amount of data is fed into the machines and they are trained to respond accordingly.
Below are five Artificial Intelligence achievements from 2019.
1. Artificial intelligence successfully diagnosed lung cancer
According to the World Health Organization (WHO), lung cancer caused 9.6 million deaths worldwide in 2018. Lung cancer is the most common type of cancer. It can be diagnosed through scans including X-rays (for an abnormal mass of nodules), CT scans (for small lesions), a sputum test, or a biopsy, which is a complex procedure.
In 2019, scientists from Google AI and Northwestern Medicine, Chicago, IL, collaborated on a project to apply artificial intelligence to fast, cheap and accurate prediction of lung cancer. The program was not told what to look for in the images; however, it was given information about which patients went on to develop lung cancer. The program was tested against specialist radiologists. The results showed that the AI outperformed the radiologists, with a 5% increase in detections and an 11% reduction in false positives. It is still not known which patterns the program picked up on, but the results were encouraging. The program was efficient enough to scan 500 images in just 10 minutes. This will go a long way towards lowering the cost of diagnosis as well as the risks of the disease.
2. Facebook Detectron
Image processing and pattern recognition are core areas of artificial intelligence research. You may be familiar with AI-driven face recognition, used to detect a face while taking a photo or to scan a face for security purposes. In a similar way, Facebook has launched its own AI-powered object recognition program (a computer vision library) used to detect what kind of object is in an image. Detectron 2 is used to detect shapes at higher speed; it can easily distinguish between a human and a bicycle. In the future, more data will be fed to the program to improve its recognition capabilities.
3. PEGASUS: pre-training with extracted gap-sentences for abstractive summarization
Great work and progress have been made in the AI sub-field of Natural Language Processing (NLP) and text recognition. Automatic text summarization has always been a challenge for AI researchers. The Google Brain team and Imperial College London have successfully designed a system called “Pre-training with Extracted Gap-sentences for Abstractive SUmmarization Sequence-to-sequence”, or simply PEGASUS. The program was tested on 12 summarization tasks including news, scientific literature, stories, information, emails, patents and legislative bills. PEGASUS made low-resource summarization achievable. The key idea was to use single words instead of long phrases to convey the intended message.
4. DeepMind’s WaveNet technology introduced a way to recover the original voice of patients with speech impairment
AI has proven its value in diagnosing disease quite effectively. But certain diseases are directly tied to human senses such as sight, smell, hearing and so on. Sclerosis causes the death of the neurons that control voluntary actions, which leads to a total loss of control over the body. ALS has no known cause, and therefore no cure. Tim Shaw, a well-known NFL player, was recently confronted with ALS and lost his ability to speak clearly.
In 2019, people from DeepMind collaborated with Google’s Project Euphonia, which involved Tim Shaw, on his voice recordings. Their goal was to prove that text-to-speech technology can generate natural, high-quality voice using a very small amount of training data. There were certain requirements for pursuing this project.
First, a technology was needed that could recognize the speech of people with abnormal pronunciation. Second, the text-to-speech technology had to be personalized to generate their natural voice. These requirements were met by using hours of studio recordings and by training the program on the impaired voice signals against the corresponding text.
By combining WaveNet and sample-efficient adaptive text-to-speech (TTS), they were able to generate Tim’s true voice. This was quite astonishing for Tim and his family, but the achievement was huge. This work promises that AI will solve many other speech problems in the future.
5. SingularityNet: the first platform for a decentralized AI economy
Imagine if all AI were developed and controlled by big fish like Google or IBM, and it took control of almost every aspect of your life. Wouldn’t you feel enslaved by the few? Previously, AI was built on centralized platforms that pooled global data in one place. But 2019 brought a new hope to the AI era.
Prospective clients can search the SingularityNet catalogue for available services or request custom AI services. This could open the door for small-scale AI companies to join the race of the computing revolution. /Apel.al
from WordPress https://apel.al/2019-viti-i-inteligjences-artificiale/ via IFTTT
Photo

A German wildlife park near the town of Güstrow installed a dozen webcams so that anyone can watch animals in the wild. Most of the cameras hang in a natural forest through which wild animals pass from time to time. And although the spots for the cameras were chosen deliberately (the lake where bears like to wallow during the day, the place where food is left out for the lynx, etc.), it is quite difficult to catch the animals on camera. In order not to spend a lot of time waiting, I put together a simple script over the weekend which every X minutes downloads the pictures, passes them through a neural-network detector and, if something interesting is found, posts them to a Telegram channel.
Last time I used Facebook Detectron to measure queue length, so this time I decided to try a pretrained TensorFlow Object Detection model out of the box as the detection model. The quality turned out to be so-so: a lot of false positives, and the model regularly recognizes the lynx and wolves as sheep and dogs. Perhaps next time I'll give the YOLO detector a try. In short, I tuned some simple heuristics on top of the model in favor of detection accuracy, so now a couple of times a day a photo of some animal arrives in the channel together with a link to the webcam image, so you can continue watching it in real time.
Some pics and code are in my Medium post.