#tera data interview questions
interviewmaterial · 3 years ago
Link
Interview Material is an online platform that teaches you about interview questions and answers
tyaudahlah · 6 years ago
Text
#CeritaTyaHariIni: A Crazy Semester
finals are over, but I'm still awake at this hour running on barely more than an hour of sleep, from right after maghrib until around half past eight. skipped tarawih, I was wiped out, physically and mentally. I still have Westlife songs keeping me company; right now it's If I Let You Go's turn, the one with the music video of them on a beach, man, I remember it so well. it's the second song I memorized after My Love. legendary.
oh right, I want to talk about how fucked up this semester has been for me. and even though finals have passed, I'm still not in the right state of mind right now.
hold on, let me change the playlist first. I'll end up singing along if I keep this on.
okay, switching over to instrumentals instead. moving on.
a quick flashback to the start of the semester. I've already told you about what happened in January and February, right? but March.. I don't think I have. at the end of this past March, right before I made peace with myself to be exact, I reached my lowest point of my college life. normally, whenever I stand somewhere high and look down, I shudder because I automatically imagine what would happen if I fell. but that time, I was alone in my room, staring at the wall, and out of nowhere I pictured myself falling from a roof.
but from all the mental breakdowns I had during that time, I learned the hard way that not every mental struggle comes from being far from God. I feel stupid for all the times I gave friends religiously toned advice. yeah, God is the one who will help you get through all the mess. but when they opened up to me, a fellow human being, I realized that's not what they needed at that moment. they didn't need to be lectured about religion, about what they should do. what they needed was support, to be calmed down, to be told it's okay, it's fine to cry, it's fine to be tired, to be embraced, basically.
then in early April, thankfully, I managed to sort things out with myself. things got a bit easier since then. my own stupidity that flooded my room until it seeped onto the neighbor's porch was something I handled calmly, even while laughing about it.
oh right, on my birthday I actually wanted to write, but I didn't get the chance because of assignments and because I hadn't started on the finals material at all back then.
my birthday was nothing special, honestly. but it was different from last year. last year my birthday came after all my finals were done, and I cried all day. why, though.. I think it was because it was my first real self-reflection. this year, I thankfully still made time for some me time in the morning, in between very demanding assignments. no crying, mostly smiles, though not that many either. in the morning I answered some self-reflection questions in Word, then replied to a few birthday wishes on WhatsApp and Instagram. then... what else. I don't remember. but that day, my closest friends just went "hbd" and that was it, and my parents and my brother didn't even call. but I don't hold any hard feelings, really. it was fine, seriously. I'm definitely okay with that since it's not a big deal for me anymore.
one thing I realized in this transition to 19 is that I feel like I've grown up, and from the reflection questions I answered last year, I can now see how far my personal development has come. I finally get why people like journaling. rereading everything you've written, it's a mix of feelings, witnessing everything you've managed to get through.
enough with the birthday talk, let's talk about assignments.
leading up to finals, there were 4 big assignments: a mini proposal, an FGD and in-depth interview report, a paper, and a program plan. all from different courses, and the proposal was the only individual assignment, which I finally finished on May 1st, the day before the first exam. and at 1 a.m. tonight I finally finished the last assignment.
because of all the assignments, my friends and I couldn't figure out what to prioritize, whether to study for finals first or work on the assignments first. and the deadlines were all crammed right next to each other. insane, right. in the end we just took it as it came.
oh right, the very first exam on day one managed to make me, someone who's always relaxed about exams, cry. the course was Data Management and Analysis, basically processing data with statistical tests in an application. even with the application, it's still hard. that night I studied with a friend from 9 p.m. to 2 a.m. I had my pre-dawn meal then too, for a make-up fast. then the remaining 2 courses that were also being tested that day, I read through from 2 a.m. until morning. yes, I really didn't sleep at all.
the result: when I was faced with questions that were nothing like I expected, my hands were shaking holding the exam paper. I swear, it was so confusing. on top of that, the lecturer came in late to proctor, so our exam time got cut and we weren't given any extra. in the end, out of the 3 essay questions, I only filled in number 2, and even that probably wasn't as long as everyone else's answers. number 3 wasn't finished, and I didn't get to touch number 1 at all. after the exam, I suddenly started crying. mentally exhausted, damn it. and that was the course I had truly prioritized, yet it turned out to be the one where I did the worst.
that was day one.
day two, I really didn't answer well in the second course. day three, the questions were insane. the material we were given was different from what ended up on the exam. day four, don't even ask. still a mess. how can two lecturers in the same course each give their own set of questions that add up to 70, while other courses had at most 50.
sir, ma'am, you were put on one teaching team to discuss, cooperate, and agree on what to give the students. instead it felt like the questions came from two different courses hhhhhhhh jumbo-sized frustration here.
anyway, even though finals are over, I don't feel relieved at all. I keep thinking about the results. I'm scared my GPA will actually drop. after all my carelessness lately, I was supposed to make up for it with my GPA, but this is how it's ending instead. on top of that, my department suddenly moved up the date of an event, so I, being in charge of promotional tools, am being rushed to finish the poster a week before it. damn it, you think that's easy, huh. and I'm working on it alone too, since both of my team members are still in the middle of their finals.
then there's a meeting on Saturday, but I have to clean up my boarding room. and then I'm asked to either postpone or speed up cleaning the room hhhhhhh ya Allah, I want to be angry but what can I do.
that's what got me so worked up earlier, boiling over. like.. I'm not given even a little space to breathe. it's as if I just finished running 500 meters and, right as I'm about to sit down, I'm told to run again. I could drop dead.
in the end I called my dad, and for the first time I cried to him while telling him how badly my exams went this semester. fyi, I absolutely hate crying in front of my family, especially my parents. but I'm that scared of disappointing them, especially my dad, who is more academically oriented than my mom. no, not because I'd be punished or anything. I disappointed my mom once before, and that one time was enough. never again.
on the phone, I told him how things stood so that later I wouldn't have to see the disappointment on his face if my GPA really does drop. and my dad gave me words of reassurance that, funnily enough, I had once said to a friend. he said, "it's not just you who's struggling, right, Tya? then it's fine. our grades can't always be good anyway. plenty of people have gotten a C before, right? if Tya gets a C, just fix it later. it's okay. as for other people's opinions, those are subjective. they don't know what the reality is. there, don't think too hard about the results."
and only after that did I feel relieved. only then could I fully surrender it to Allah, whatever the results turn out to be.
the point is, from January to May, there wasn't a single month I got through without sobbing, massaging my temples and back, and staying up with no sleep until morning. semester 4 really was that fucked up.
now it's just a matter of waiting to see how long my body holds out. it'll probably crash eventually.
alright, that's it, I'm going to force myself to eat some rice so I don't starve by midday, since at sahur I only filled up on water.
ciao.
----
04:16, 08.05.19 // tya, who has been writing this since 2 a.m.
alliedmarketresearchs · 4 years ago
Text
New Research Report on Consumer Robotics Market, Growing in Huge Demand in 2020-2027
Robotics is the branch of technology that deals with the construction, design, operation, and application of robots. Robotic products are based on artificial intelligence. The consumer robotics market has been evolving rapidly over the past two decades. Robotics covers a wide range of products, including children’s toys, homecare systems, and smart ‘humanoid’ robots that provide social and personal engagement. The key elements used in consumer robotics include processors, actuators, software, sensors, cameras, power supplies, displays, manipulators, communication technologies, microcontrollers, and mobile platforms. The global consumer robotics market is a highly competitive arena due to the participation of well-diversified regional and international players in the industry. Moreover, this growing market is attracting new and innovative players, which further increases the competitive rivalry. The use of robotics in consumer tasks provides time- and cost-effective techniques to complete tasks and offers comfort-related benefits. Moreover, this technology also reduces the effort required and increases peace of mind, along with similar consumer benefits. Consumer robotics is preferred because of its features, which include small size, durability, and low cost. Due to their small size, these robots take up very little space and leave room for several other components in a design.
Market scope and structure analysis :
Market size available for years: 2020–2027
Base year considered: 2019
Forecast period: 2021–2027
Forecast unit: Value (USD)
Segments covered: Application, Type, and Region
Regions covered: North America (U.S. and Canada), Europe (Germany, UK, France, Italy, Spain, and Rest of Europe), Asia-Pacific (China, Japan, India, Australia, Malaysia, Thailand, Indonesia, and Rest of Asia-Pacific), LAMEA (Middle East, Brazil, Mexico, and Rest of LAMEA)
Companies covered: Ecovacs, Xiaomi, iLife, Dyson, Miele, Neato Robotics, Cecotec, UBTECH, CANBOT, Yujin Robot, iRobot Corporation, Jibo Inc., 3D Robotics Inc., Honda Motor Co. Ltd., Bossa Nova Robotics, DJI.
  Get a sample of the report @ https://www.alliedmarketresearch.com/request-sample/6815
COVID-19 Scenario Analysis:
In this scenario, consumers concentrate only on essential products (food, sanitizers, and medicine). They avoid entertainment and lifestyle products, which hampers the demand for consumer robotics products. The pandemic also impacts the supply side negatively.
Top Impacting Factors: Market Scenario Analysis, Trends, Drivers, and Impact Analysis
In developed countries, the growing need for convenience and rising consumer spending power are the major factors contributing to the surge in demand for consumer robotic products. The rise in the paying capacity of people in developing countries, due to the increase in their disposable income, also drives the growth of the consumer robotics market. People are now willing to pay more than before for products that increase their comfort.
The rise in security threats among consumers, high-speed innovation, and the increase in the number of players drive the growth of this market.
However, performance issues with robotic products are holding back the growth of the market. High-speed innovation may expand consumer robot market growth during the forecast period.
New product launches to flourish the market
Advancements in artificial intelligence, navigation systems, the ubiquity of the internet, and the rise of hand-held computing devices are the major current trends supporting the development of the consumer robotics market on a very large scale. The surge in the usage of hand-held computing devices like smartphones, tablets, and smart watches has made the development of robotic devices for office and consumer applications easy. For instance, in January 2019, iRobot announced its entry into the robotic lawn mower market with Terra. Terra features ground mapping and advanced navigation technology. In October 2018, iRobot decided to collaborate with Google (US) to integrate Google’s artificial intelligence assistant into various robotic vacuums produced by the company. This technology enables consumers to control the robots through voice commands.
Challenges in the market
One challenge that hinders the growth of the robotics market is the humanoid physical appearance of the robots, which does not appeal to end users. Moreover, the investment required for innovation in this field is huge. Resistance on the demand side to replacing manual labor with advanced robotics technology in traditional fields is another factor that hampers the growth of the market. The high price of robots is another major element that restricts consumers from adopting robotic products for their day-to-day activities. The consumer robotics market is still at a nascent stage, and it is difficult to cut down the cost of manufacturing robotic technology.
 Request a discount on the report @ https://www.alliedmarketresearch.com/purchase-enquiry/6815
Key Segments Covered:
Application:
· Entertainment
· Security & Surveillance
· Education
· Others
Type:
· Autonomous
· Semi-autonomous
 Key Benefits of the Report:
· This study presents the analytical depiction of the global consumer robotics industry along with the current trends and future estimations to determine the imminent investment pockets.
· The report presents information related to key drivers, restraints, and opportunities along with detailed analysis of the global consumer robotics market share.
· The current market is quantitatively analyzed from 2020 to 2027 to highlight the global consumer robotics market growth scenario.
· Porter’s five forces analysis illustrates the potency of buyers & suppliers in the market.
· The report provides a detailed global consumer robotics market analysis based on competitive intensity and how the competition will take shape in the coming years.
Questions Answered in the Consumer Robotics Market Research Report:
· What are the leading market players active in the consumer robotics market?
· Which current trends will influence the market in the next few years?
· What are the driving factors, restraints, and opportunities in the market?
· What future projections would help in taking further strategic steps?
To know more about the report @ https://www.alliedmarketresearch.com/consumer-robotics-market-A06450
About Allied Market Research:
Allied Market Research (AMR) is a full-service market research and business-consulting wing of Allied Analytics LLP based in Portland, Oregon. Allied Market Research provides global enterprises as well as medium and small businesses with unmatched quality of "Market Research Reports" and "Business Intelligence Solutions." AMR has a targeted view to provide business insights and consulting services to assist its clients to make strategic business decisions and achieve sustainable growth in their respective market domains. AMR offers its services across 11 industry verticals including Life Sciences, Consumer Goods, Materials & Chemicals, Construction & Manufacturing, Food & Beverages, Energy & Power, Semiconductor & Electronics, Automotive & Transportation, ICT & Media, Aerospace & Defense, and BFSI.
We maintain professional corporate relations with various companies, which helps us dig out market data that enables us to generate accurate research data tables and ensures the utmost accuracy in our market forecasting. Every data point presented in the reports we publish is extracted through primary interviews with top officials from leading companies in the domain concerned. Our secondary data procurement methodology includes deep online and offline research and discussions with knowledgeable professionals and analysts in the industry.
Contact Us:
David Correa
5933 NE Win Sivers Drive
#205, Portland, OR 97220
United States
USA/Canada (Toll Free): 1-800-792-5285, 1-503-894-6022, 1-503-446-1141
UK: +44-845-528-1300
Hong Kong: +852-301-84916
India (Pune): +91-20-66346060
Fax: +1(855)550-5975
Web: https://www.alliedmarketresearch.com
Follow Us on LinkedIn: https://www.linkedin.com/company/allied-market-research
logicmojo52 · 4 years ago
Text
Top 10 Data Analytics Tools
The growing interest in and importance of data analytics in the market have created numerous opportunities around the world. It becomes slightly difficult to shortlist the top data analytics tools, as the open source tools are often more popular, user friendly, and performance oriented than the paid versions. There are many open source tools that require little or no coding and manage to deliver better results than paid versions, for example R programming in data mining, and Tableau Public and Python in data visualization. Below is a list of the top 10 data analytics tools, both open source and paid, based on their popularity, ease of learning, and performance.
1. R Programming
R is the leading analytics tool in the industry and is widely used for statistics and data modeling. It can easily manipulate your data and present it in different ways. It has surpassed SAS in many ways, such as data capacity, performance, and outcome. R compiles and runs on a wide variety of platforms, viz. UNIX, Windows, and macOS. It has 11,556 packages and allows you to browse the packages by category. R also provides tools to automatically install all packages as per user requirements, and it also works well with Big Data.
2. Tableau Public:
Tableau Public is free software that connects to any data source, be it a corporate Data Warehouse, Microsoft Excel, or web-based data, and creates data visualizations, maps, dashboards, and so on, with real-time updates presented on the web. These can also be shared through social media or with the client, and the tool allows you to download the file in different formats. If you want to see the power of Tableau, you must have a very good data source. Tableau's Big Data capabilities make it important, and one can analyze and visualize data better than any other data visualization software on the market.
3. Python
Python is an object-oriented scripting language which is easy to read, write, and maintain, and it is a free open source tool. It was developed by Guido van Rossum in the late 1980s and supports both functional and structured programming methods.
Python is easy to learn as it is quite similar to JavaScript, Ruby, and PHP. Python also has excellent machine learning libraries, viz. scikit-learn, Theano, TensorFlow, and Keras. Another important feature of Python is that it can work with data on almost any platform, such as a SQL server, a MongoDB database, or JSON. Python can also handle text data very well.
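As a small illustration of the workflow this section describes, here is a minimal, hypothetical sketch that loads JSON records with pandas and fits a scikit-learn model; the file name and column names ("records.json", "feature1", "feature2", "label") are invented for the example.

```python
# Minimal sketch: load JSON data and fit a basic scikit-learn classifier.
# The file and column names are placeholders, not from the original post.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

df = pd.read_json("records.json")            # Python handles JSON input easily
X = df[["feature1", "feature2"]]             # feature columns
y = df["label"]                              # target column

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LogisticRegression()
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```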
4. SAS
SAS is a programming environment and language for data manipulation and a leader in analytics, developed by the SAS Institute in 1966 and further developed in the 1980s and 1990s. SAS is easily accessible, manageable, and can analyze data from any source. In 2011, SAS introduced a large set of products for customer intelligence and numerous SAS modules for web, social media, and marketing analytics that are widely used for profiling customers and prospects. It can also predict their behaviors and manage and optimize communications.
 5. Apache Spark
The University of California, Berkeley's AMPLab created Apache Spark in 2009. Apache Spark is a fast, large-scale data processing engine that executes applications in Hadoop clusters many times faster in memory and many times faster on disk. Spark is built with data science in mind, and its design makes data science easier. Spark is also popular for data pipelines and machine learning model development.
Spark also includes a library, MLlib, that provides a growing set of machine learning algorithms for common data science techniques such as classification, regression, collaborative filtering, clustering, and so on.
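To ground the description of Spark and MLlib, here is a minimal PySpark sketch, assuming the pyspark package is installed; the CSV path and column names ("events.csv", "f1", "f2", "label") are placeholders.

```python
# Minimal PySpark sketch: read a CSV into a DataFrame and fit an MLlib model.
# Path and column names are placeholders for illustration only.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("spark-sketch").getOrCreate()

df = spark.read.csv("events.csv", header=True, inferSchema=True)

# MLlib expects a single vector column of features plus a numeric label column.
assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
train = assembler.transform(df).select(
    "features", col("label").cast("double").alias("label"))

model = LogisticRegression(featuresCol="features", labelCol="label").fit(train)
print("coefficients:", model.coefficients)

spark.stop()
```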
6. Excel
Excel is a basic, popular, and widely used analytical tool in almost all industries. Whether you are an expert in SAS, R, or Tableau, you will still need to use Excel. Excel becomes important when analytics is required on the client's internal data. It breaks down the complex task of summarizing the data with a preview of pivot tables that helps in filtering the data as per client requirements. Excel also has the advanced business analytics option, which supports modeling capabilities with prebuilt features like automatic relationship detection, creation of DAX measures, and time grouping.
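Excel itself is driven through its interface rather than code, but the pivot-table idea described above can be illustrated in Python with pandas; the columns below ("region", "product", "sales") are made up purely to show what a pivot table computes.

```python
# Pivot-table concept illustrated with pandas (an analogue, not Excel itself).
import pandas as pd

data = pd.DataFrame({
    "region":  ["North", "North", "South", "South"],
    "product": ["A", "B", "A", "B"],
    "sales":   [100, 150, 80, 120],
})

# Summarize sales by region and product, the way an Excel pivot table would.
pivot = data.pivot_table(values="sales", index="region",
                         columns="product", aggfunc="sum")
print(pivot)
```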
7. RapidMiner:
RapidMiner is a powerful integrated data science platform, developed by the company of the same name, that performs predictive analytics and other advanced analytics like data mining, text analytics, machine learning, and visual analytics without any programming. RapidMiner can connect with any data source type, including Access, Excel, Microsoft SQL, Teradata, Oracle, Sybase, IBM DB2, Ingres, MySQL, IBM SPSS, dBase, and so on. The tool is very powerful and can generate analytics based on real-life data transformation settings, i.e., you can control the formats and data sets used for predictive analysis.
8. KNIME
KNIME was developed in January 2004 by a team of software engineers at the University of Konstanz. KNIME is a leading open source, reporting, and integrated analytics tool that allows you to analyze and model data through visual programming; it integrates various components for data mining and machine learning via its modular data pipelining concept.
9. QlikView
QlikView has many unique features, such as patented technology and in-memory data processing, which delivers results to end users very quickly and stores the data in the report itself. Data association in QlikView is automatically maintained and can be compressed to almost 10% of its original size. Data association is visualized using colors - a specific color is given to related data and another color to non-related data.
10. Splunk:
Splunk is a tool that analyzes and searches machine-generated data. Splunk pulls in all text-based log data and provides a simple way to search through it; a user can pull in all kinds of data, perform interesting statistical analysis on it, and present it in different formats.
For More Details, Visit Us:
data structures in java
system design interview questions
Google interview questions
cracking the coding interview python
siva3155 · 6 years ago
Text
100+ TOP HADOOP Interview Questions and Answers
HADOOP Interview Questions for freshers and experienced :-
1. What is the Hadoop framework?
Hadoop is an open source framework written in Java by the Apache Software Foundation. The framework is used to write software applications that need to process vast amounts of data (it can handle multiple terabytes of data). It works in parallel on large clusters, which can have thousands of computers (nodes), and it processes data in a very reliable and fault-tolerant manner.
2. On what concept does the Hadoop framework work?
It works on MapReduce, which was devised by Google.
3. What is MapReduce?
MapReduce is an algorithm or concept to process huge amounts of data in a faster way. As per its name, you can divide it into Map and Reduce. The main MapReduce job usually splits the input data-set into independent chunks (big data sets into multiple small data sets). MapTask: processes these chunks in a completely parallel manner (one node can process one or more chunks). The framework sorts the outputs of the maps. ReduceTask: the above output becomes the input for the reduce tasks, which produce the final result. Your business logic is written in the MapTask and ReduceTask. Typically both the input and the output of the job are stored in a file system (not a database). The framework takes care of scheduling tasks, monitoring them, and re-executing failed tasks.
4. What are compute and storage nodes?
Compute node: the computer or machine where your actual business logic is executed. Storage node: the computer or machine where your file system resides to store the data being processed. In most cases the compute node and the storage node are the same machine.
5. How does the master-slave architecture work in Hadoop?
The MapReduce framework consists of a single master JobTracker and multiple slaves; each cluster node has one TaskTracker. The master is responsible for scheduling the jobs' component tasks on the slaves, monitoring them, and re-executing failed tasks. The slaves execute the tasks as directed by the master.
6. What does a Hadoop application look like, and what are its basic components?
Minimally, a Hadoop application has the following components: the input location of the data, the output location of the processed data, a map task, a reduce task, and the job configuration. The Hadoop job client then submits the job (jar/executable etc.) and the configuration to the JobTracker, which assumes responsibility for distributing the software/configuration to the slaves, scheduling tasks and monitoring them, and providing status and diagnostic information to the job client.
7. Explain the input and output data format of the Hadoop framework?
The MapReduce framework operates exclusively on <key, value> pairs, that is, the framework views the input to the job as a set of <key, value> pairs and produces a set of <key, value> pairs as the output of the job, conceivably of different types. See the flow mentioned below: (input) <k1, v1> -> map -> <k2, v2> -> combine/sorting -> <k2, v2> -> reduce -> <k3, v3> (output)
8. What are the restrictions on the key and value classes?
The key and value classes have to be serialized by the framework. To make them serializable, Hadoop provides a Writable interface. As you know from Java itself, the key of the Map should be comparable, hence the key has to implement one more interface, WritableComparable.
9. What is commodity hardware?
Commodity hardware is a low-cost system characterized by lower availability and lower quality. Commodity hardware includes RAM, as it performs a number of services that require RAM for execution.
One doesn't require high-end hardware configurations or supercomputers to run Hadoop; it can be run on any commodity hardware.
10. Which interfaces need to be implemented to create a Mapper and Reducer for Hadoop?
org.apache.hadoop.mapreduce.Mapper and org.apache.hadoop.mapreduce.Reducer
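To make the Map and Reduce roles above more concrete, here is a minimal, hypothetical word-count example written for Hadoop Streaming (covered in the streaming questions further down), where the mapper and reducer simply read from stdin and write tab-separated key/value pairs to stdout. This is an illustrative sketch, not code from the original article.

```python
# mapper.py - Hadoop Streaming mapper: emit (word, 1) for every word read from stdin.
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print(f"{word}\t1")
```

```python
# reducer.py - Hadoop Streaming reducer: the framework sorts mapper output by key,
# so all counts for one word arrive together and can be summed in a single pass.
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t")
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print(f"{current_word}\t{current_count}")
        current_word, current_count = word, int(count)

if current_word is not None:
    print(f"{current_word}\t{current_count}")
```

Locally, the shuffle/sort step can be simulated with a pipe such as `cat input.txt | python mapper.py | sort | python reducer.py`, which mirrors the map -> combine/sort -> reduce flow described in question 7.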
HADOOP Interview Questions 11. What Mapper does? Maps are the individual tasks that transform i nput records into intermediate records. The transformed intermediate records do not need to be of the same type as the input records. A given input pair may map to zero or many output pairs. 12. What is the InputSplit in map reduce software? An InputSplit is a logical representation of a unit (A chunk) of input work for a map task; e.g., a filename and a byte range within that file to process or a row set in a text file. 13. What is the InputFormat ? The InputFormat is responsible for enumerate (itemise) the InputSplits, and producing a RecordReader which will turn those logical work units into actual physical input records. 14. Where do you specify the Mapper Implementation? Generally mapper implementation is specified in the Job itself. 15. How Mapper is instantiated in a running job? The Mapper itself is instantiated in the running job, and will be passed a MapContext object which it can use to configure itself. 16. Which are the methods in the Mapper interface? Ans : The Mapper contains the run() method, which call its own setup() method only once, it also call a map() method for each input and finally calls it cleanup() method. All above methods you can override in your code. 17. What happens if you don’t override the Mapper methods and keep them as it is? If you do not override any methods (leaving even map as-is), it will act as the identity function, emitting each input record as a separate output. 18. What is the use of Context object? The Context object allows the mapper to interact with the rest of the Hadoop system. It Includes configuration data for the job, as well as interfaces which allow it to emit output. 19. How can you add the arbitrary key-value pairs in your mapper? You can set arbitrary (key, value) pairs of configuration data in your Job, e.g. with Job.getConfiguration().set("myKey", "myVal"), and then retrieve this data in your mapper with Context.getConfiguration().get("myKey"). This kind of functionality is typically done in the Mapper's setup() method. 20. How does Mapper’s run() method works? The Mapper.run() method then calls map(KeyInType, ValInType, Context) for each key/value pair in the InputSplit for that task 21. Which object can be used to get the progress of a particular job ? Context 22. What is next step after Mapper or MapTask? Ans : The output of the Mapper are sorted and Partitions will be created for the output. Number of partition depends on the number of reducer. 23.Name the most common InputFormats defined in Hadoop? Which one is default ? Following 3 are most common InputFormats defined in Hadoop TextInputFormat KeyValueInputFormat SequenceFileInputFormat TextInputFormat is the hadoop default. 24.What is the difference between TextInputFormat and KeyValueInputFormat class? TextInputFormat: It reads lines of text files and provides the offset of the line as key to the Mapper and actual line as Value to the mapper KeyValueInputFormat: Reads text file and parses lines into key, val pairs. Everything up to the first tab character is sent as key to the Mapper and the remainder of the line is sent as value to the mapper. 25.What is InputSplit in Hadoop? When a hadoop job is run, it splits input files into chunks and assign each split to a mapper to process. This is called Input Split 26. What are Edge Nodes in Hadoop? 
Edge nodes are gateway nodes in the Hadoop which act as the interface between the Hadoop cluster and external network.They run client applications and cluster administration tools in the Hadoop and are used as staging areas for the data transfers to the Hadoop cluster. Enterprise-class storage capabilities (like 900GB SAS Drives with Raid HDD Controllers) is required for the Edge Nodes,and asingle edge node for usually suffices for multiple of Hadoop clusters. 27.What is the purpose of RecordReader in Hadoop? The InputSplit has defined a slice of work, but does not describe how to access it. The RecordReader class actually loads the data from its source and converts it into (key, value) pairs suitable for reading by the Mapper. The RecordReader instance is defined by the InputFormat 28.After the Map phase finishes, the hadoop framework does "Partitioning, Shuffle and sort". Explain what happens in this phase? Partitioning Partitioning is the process of determining which reducer instance will receive which intermediate keys and values. Each mapper must determine for all of its output (key, value) pairs which reducer will receive them. It is necessary that for any key, regardless of which mapper instance generated it, the destination partition is the same Shuffle After the first map tasks have completed, the nodes may still be performing several more map tasks each. But they also begin exchanging the intermediate outputs from the map tasks to where they are required by the reducers. This process of moving map outputs to the reducers is known as shuffling. Sort Each reduce task is responsible for reducing the values associated with several intermediate keys. The set of intermediate keys on a single node is automatically sorted by Hadoop before they are presented to the Reducer 29.What is a Combiner? The Combiner is a "mini-reduce" process which operates only on data generated by a mapper. The Combiner will receive as input all data emitted by the Mapper instances on a given node. The output from the Combiner is then sent to the Reducers, instead of the output from the Mappers. 30.What is job tracker? Job Tracker is the service within Hadoop that runs Map Reduce jobs on the cluster 31.What are some typical functions of Job Tracker? The following are some typical tasks of Job Tracker Accepts jobs from clients It talks to the NameNode to determine the location of the data It locates TaskTracker nodes with available slots at or near the data It submits the work to the chosen Task Tracker nodes and monitors progress of each task by receiving heartbeat signals from Task tracker 32.What is task tracker? Task Tracker is a node in the cluster that accepts tasks like Map, Reduce and Shuffle operations - from a JobTracker 33.Whats the relationship between Jobs and Tasks in Hadoop? One job is broken down into one or many tasks in Hadoop. 34.Suppose Hadoop spawned 100 tasks for a job and one of the task failed. What will hadoop do ? It will restart the task again on some other task tracker and only if the task fails more than 4 (default setting and can be changed) times will it kill the job 35.Hadoop achieves parallelism by dividing the tasks across many nodes, it is possible for a few slow nodes to rate-limit the rest of the program and slow down the program. What mechanism Hadoop provides to combat this ? Speculative Execution 36.How does speculative execution works in Hadoop  ? Job tracker makes different task trackers process same input. When tasks complete, they announce this fact to the Job Tracker. 
Whichever copy of a task finishes first becomes the definitive copy. If other copies were executing speculatively, Hadoop tells the Task Trackers to abandon the tasks and discard their outputs. The Reducers then receive their inputs from whichever Mapper completed successfully, first. Using command line in Linux, how will you see all jobs running in the hadoop cluster kill a job hadoop job -list hadoop job -kill jobid 37.What is Hadoop Streaming  ? Streaming is a generic API that allows programs written in virtually any language to be used as Hadoop Mapper and Reducer implementations 38.What is the characteristic of streaming API that makes it flexible run map reduce jobs in languages like perl, ruby, awk etc.  ? Hadoop Streaming allows to use arbitrary programs for the Mapper and Reducer phases of a Map Reduce job by having both Mappers and Reducers receive their input on stdin and emit output (key, value) pairs on stdout. 39.Whats is Distributed Cache in Hadoop ? Distributed Cache is a facility provided by the Map/Reduce framework to cache files (text, archives, jars and so on) needed by applications during execution of the job. The framework will copy the necessary files to the slave node before any tasks for the job are executed on that node. 40.What is the benifit of Distributed cache, why can we just have the file in HDFS and have the application read it  ? This is because distributed cache is much faster. It copies the file to all trackers at the start of the job. Now if the task tracker runs 10 or 100 mappers or reducer, it will use the same copy of distributed cache. On the other hand, if you put code in file to read it from HDFS in the MR job then every mapper will try to access it from HDFS hence if a task tracker run 100 map jobs then it will try to read this file 100 times from HDFS. Also HDFS is not very efficient when used like this. 41.What mechanism does Hadoop framework provides to synchronize changes made in Distribution Cache during runtime of the application  ? This is a trick questions. There is no such mechanism. Distributed Cache by design is read only during the time of Job execution 42.Have you ever used Counters in Hadoop. Give us an example scenario ? Anybody who claims to have worked on a Hadoop project is expected to use counters 43.Is it possible to provide multiple input to Hadoop? If yes then how can you give multiple directories as input to the Hadoop job  ? Yes, The input format class provides methods to add multiple directories as input to a Hadoop job 44.Is it possible to have Hadoop job output in multiple directories. If yes then how ? Yes, by using Multiple Outputs class 45.What will a hadoop job do if you try to run it with an output directory that is already present? Will it overwrite it warn you and continue throw an exception and exit The hadoop job will throw an exception and exit. 46.How can you set an arbitrary number of mappers to be created for a job in Hadoop ? This is a trick question. You cannot set it 47.How can you set an arbitary number of reducers to be created for a job in Hadoop ? You can either do it progamatically by using method setNumReduceTasksin the JobConfclass or set it up as a configuration setting 48.How will you write a custom partitioner for a Hadoop job ? 
To have hadoop use a custom partitioner you will have to do minimum the following three Create a new class that extends Partitioner class Override method getPartition In the wrapper that runs the Map Reducer, either add the custom partitioner to the job programtically using method setPartitionerClass or add the custom partitioner to the job as a config file (if your wrapper reads from config file or oozie) 49.How did you debug your Hadoop code ? There can be several ways of doing this but most common ways are By using counters The web interface provided by Hadoop framework 50.Did you ever built a production process in Hadoop ? If yes then what was the process when your hadoop job fails due to any reason? Its an open ended question but most candidates, if they have written a production job, should talk about some type of alert mechanisn like email is sent or there monitoring system sends an alert. Since Hadoop works on unstructured data, its very important to have a good alerting system for errors since unexpected data can very easily break the job. 51.Did you ever ran into a lop sided job that resulted in out of memory error, if yes then how did you handled it ? This is an open ended question but a candidate who claims to be an intermediate developer and has worked on large data set (10-20GB min) should have run into this problem. There can be many ways to handle this problem but most common way is to alter your algorithm and break down the job into more map reduce phase or use a combiner if possible. 52.What is HDFS? HDFS, the Hadoop Distributed File System, is a distributed file system designed to hold very large amounts of data (terabytes or even petabytes), and provide high-throughput access to this information. Files are stored in a redundant fashion across multiple machines to ensure their durability to failure and high availability to very parallel applications 53.What does the statement "HDFS is block structured file system" means? It means that in HDFS individual files are broken into blocks of a fixed size. These blocks are stored across a cluster of one or more machines with data storage capacity 54.What does the term "Replication factor" mean? Replication factor is the number of times a file needs to be replicated in HDFS 55.What is the default replication factor in HDFS? 3 56.What is the default block size of an HDFS block? 64Mb 57.What is the benefit of having such big block size (when compared to block size of linux file system like ext)? It allows HDFS to decrease the amount of metadata storage required per file (the list of blocks per file will be smaller as the size of individual blocks increases). Furthermore, it allows for fast streaming reads of data, by keeping large amounts of data sequentially laid out on the disk 58.Why is it recommended to have few very large files instead of a lot of small files in HDFS? This is because the Name node contains the meta data of each and every file in HDFS and more files means more metadata and since namenode loads all the metadata in memory for speed hence having a lot of files may make the metadata information big enough to exceed the size of the memory on the Name node True/false question. What is the lowest granularity at which you can apply replication factor in HDFS - You can choose replication factor per directory - You can choose replication factor per file in a directory - You can choose replication factor per block of a file - True - True - False 59.What is a datanode in HDFS? 
Individual machines in the HDFS cluster that hold blocks of data are called datanodes 60.What is a Namenode in HDFS? The Namenode stores all the metadata for the file system 61.What alternate way does HDFS provides to recover data in case a Namenode, without backup, fails and cannot be recovered? There is no way. If Namenode dies and there is no backup then there is no way to recover data 62.Describe how a HDFS client will read a file in HDFS, like will it talk to data node or namenode ... how will data flow etc? To open a file, a client contacts the Name Node and retrieves a list of locations for the blocks that comprise the file. These locations identify the Data Nodes which hold each block. Clients then read file data directly from the Data Node servers, possibly in parallel. The Name Node is not directly involved in this bulk data transfer, keeping its overhead to a minimum. Using linux command line. how will you - List the the number of files in a HDFS directory - Create a directory in HDFS - Copy file from your local directory to HDFS hadoop fs -ls hadoop fs -mkdir hadoop fs -put localfile hdfsfile 63.Advantages of Hadoop? Bringing compute and storage together on commodity hardware: The result is blazing speed at low cost. Price performance: The Hadoop big data technology provides significant cost savings (think a factor of approximately 10) with significant performance improvements (again, think factor of 10). Your mileage may vary. If the existing technology can be so dramatically trounced, it is worth examining if Hadoop can complement or replace aspects of your current architecture. Linear Scalability: Every parallel technology makes claims about scale up.Hadoop has genuine scalability since the latest release is expanding the limit on the number of nodes to beyond 4,000. Full access to unstructured data: A highly scalable data store with a good parallel programming model, MapReduce, has been a challenge for the industry for some time. Hadoop programming model does not solve all problems, but it is a strong solution for many tasks. 64.Definition of Big data? According to Gartner, Big data can be defined as high volume, velocity and variety information requiring innovative and cost effective forms of information processing for enhanced decision making. 65.How Big data differs from database ? Datasets which are beyond the ability of the database to store, analyze and manage can be defined as Big. The technology extracts required information from large volume whereas the storage area is limited for a database. 67.Pig for Hadoop - Give some points? Pig is Data-flow oriented language for analyzing large data sets. It is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turns enables them to handle very large data sets. At the present time, Pig infrastructure layer consists of a compiler that produces sequences of Map-Reduce programs, for which large-scale parallel implementations already exist (e.g., the Hadoop subproject). Pig language layer currently consists of a textual language called Pig Latin, which has the following key properties: Ease of programming. It is trivial to achieve parallel execution of simple, "embarrassingly parallel" data analysis tasks. 
Complex tasks comprised of multiple interrelated data transformations are explicitly encoded as data flow sequences, making them easy to write, understand, and maintain. Optimization opportunities. The way in which tasks are encoded permits the system to optimize their execution automatically, allowing the user to focus on semantics rather than efficiency. Extensibility. Users can create their own functions to do special-purpose processing. Features of Pig: data transformation functions datatypes include sets, associative arrays, tuples high-level language for marshalling data developed at yahoo! 68.Hive for Hadoop - Give some points? Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop compatible file systems. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL. Keypoints: • SQL-based data warehousing application – features similar to Pig – more strictly SQL-type • Supports SELECT, JOIN, GROUP BY,etc • Analyzing very large data sets – log processing, text mining, document indexing • Developed at Facebook 69.Map Reduce in Hadoop? Map reduce : it is a framework for processing in parallel across huge datasets usning large no. of computers referred to cluster, it involves two processes namely Map and reduce. Map Process: In this process input is taken by the master node,which divides it into smaller tasks and distribute them to the workers nodes. The workers nodes process these sub tasks and pass them back to the master node. Reduce Process : In this the master node combines all the answers provided by the worker nodes to get the results of the original task. The main advantage of Map reduce is that the map and reduce are performed in distributed mode. Since each operation is independent, so each map can be performed in parallel and hence reducing the net computing time. 70.What is a heartbeat in HDFS? A heartbeat is a signal indicating that it is alive. A data node sends heartbeat to Name node and task tracker will send its heart beat to job tracker. If the Name node or job tracker does not receive heart beat then they will decide that there is some problem in data node or task tracker is unable to perform the assigned task. 71.What is a metadata? Metadata is the information about the data stored in data nodes such as location of the file, size of the file and so on. 72.Is Namenode also a commodity? No. Namenode can never be a commodity hardware because the entire HDFS rely on it. It is the single point of failure in HDFS. Namenode has to be a high-availability machine. 73.Can Hadoop be compared to NOSQL database like Cassandra? Though NOSQL is the closet technology that can be compared to Hadoop, it has its own pros and cons. There is no DFS in NOSQL. Hadoop is not a database. It’s a filesystem (HDFS) and distributed programming framework (MapReduce). 74.What is Key value pair in HDFS? Key value pair is the intermediate data generated by maps and sent to reduces for generating the final output. 75.What is the difference between MapReduce engine and HDFS cluster? HDFS cluster is the name given to the whole configuration of master and slaves where data is stored. 
Map Reduce Engine is the programming module which is used to retrieve and analyze data. 76.What is a rack? Rack is a storage area with all the datanodes put together. These datanodes can be physically located at different places. Rack is a physical collection of datanodes which are stored at a single location. There can be multiple racks in a single location. 77.How indexing is done in HDFS? Hadoop has its own way of indexing. Depending upon the block size, once the data is stored, HDFS will keep on storing the last part of the data which will say where the next part of the data will be. In fact, this is the base of HDFS. 78.History of Hadoop? Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open source web search engine, itself a part of the Lucene project. The name Hadoop is not an acronym; it’s a made-up name. The project’s creator, Doug Cutting, explains how the name came about: The name my kid gave a stuffed yellow elephant. Short, relatively easy to spell and pronounce, meaningless, and not used elsewhere: those are my naming criteria. Subprojects and “contrib” modules in Hadoop also tend to have names that are unrelated to their function, often with an elephant or other animal theme (“Pig,” for example). Smaller components are given more descriptive (and therefore more mundane) names. This is a good principle, as it means you can generally work out what something does from its name. For example, the jobtracker keeps track of MapReduce jobs. 79.What is meant by Volunteer Computing? Volunteer computing projects work by breaking the problem they are trying to solve into chunks called work units, which are sent to computers around the world to be analyzed. SETI@home is the most well-known of many volunteer computing projects. 80.How Hadoop differs from SETI (Volunteer computing)? Although SETI (Search for Extra-Terrestrial Intelligence) may be superficially similar to MapReduce (breaking a problem into independent pieces to be worked on in parallel), there are some significant differences. The SETI@home problem is very CPU-intensive, which makes it suitable for running on hundreds of thousands of computers across the world. Since the time to transfer the work unit is dwarfed by the time to run the computation on it. Volunteers are donating CPU cycles, not bandwidth. MapReduce is designed to run jobs that last minutes or hours on trusted, dedicated hardware running in a single data center with very high aggregate bandwidth interconnects. By contrast, SETI@home runs a perpetual computation on untrusted machines on the Internet with highly variable connection speeds and no data locality. 81.Compare RDBMS and MapReduce? Data size: RDBMS - Gigabytes MapReduce - Petabytes Access: RDBMS - Interactive and batch MapReduce - Batch Updates: RDBMS - Read and write many times MapReduce - Write once, read many times Structure: RDBMS - Static schema MapReduce - Dynamic schema Integrity: RDBMS - High MapReduce - Low Scaling: RDBMS - Nonlinear MapReduce - Linear 82.What is HBase? A distributed, column-oriented database. HBase uses HDFS for its underlying storage, and supports both batch-style computations using MapReduce and point queries (random reads). 82.What is ZooKeeper? A distributed, highly available coordination service. ZooKeeper provides primitives such as distributed locks that can be used for building distributed applications. 83.What is Chukwa? A distributed data collection and analysis system. 
Chukwa runs collectors that store data in HDFS, and it uses MapReduce to produce reports. (At the time of this writing, Chukwa had only recently graduated from a “contrib” module in Core to its own subproject.) 84.What is Avro? A data serialization system for efficient, cross-language RPC, and persistent data storage. (At the time of this writing, Avro had been created only as a new subproject, and no other Hadoop subprojects were using it yet.) 85.core subproject in Hadoop - What is it? A set of components and interfaces for distributed filesystems and general I/O (serialization, Java RPC, persistent data structures). 86.What are all Hadoop subprojects? Pig, Chukwa, Hive, HBase, MapReduce, HDFS, ZooKeeper, Core, Avro 87.What is a split? Hadoop divides the input to a MapReduce job into fixed-size pieces called input splits, or just splits. Hadoop creates one map task for each split, which runs the userdefined map function for each record in the split. Having many splits means the time taken to process each split is small compared to the time to process the whole input. So if we are processing the splits in parallel, the processing is better load-balanced. On the other hand, if splits are too small, then the overhead of managing the splits and of map task creation begins to dominate the total job execution time. For most jobs, a good split size tends to be the size of a HDFS block, 64 MB by default, although this can be changed for the cluster 88.Map tasks write their output to local disk, not to HDFS. Why is this? Map output is intermediate output: it’s processed by reduce tasks to produce the final output, and once the job is complete the map output can be thrown away. So storing it in HDFS, with replication, would be overkill. If the node running the map task fails before the map output has been consumed by the reduce task, then Hadoop will automatically rerun the map task on another node to recreate the map output. 89.MapReduce data flow with a single reduce task- Explain? The input to a single reduce task is normally the output from all mappers. The sorted map outputs have to be transferred across the network to the node where the reduce task is running, where they are merged and then passed to the user-defined reduce function. The output of the reduce is normally stored in HDFS for reliability. For each HDFS block of the reduce output, the first replica is stored on the local node, with other replicas being stored on off-rack nodes. 90.MapReduce data flow with multiple reduce tasks- Explain? When there are multiple reducers, the map tasks partition their output, each creating one partition for each reduce task. There can be many keys (and their associated values) in each partition, but the records for every key are all in a single partition. The partitioning can be controlled by a user-defined partitioning function, but normally the default partitioner. 91.MapReduce data flow with no reduce tasks- Explain? It’s also possible to have zero reduce tasks. This can be appropriate when you don’t need the shuffle since the processing can be carried out entirely in parallel. In this case, the only off-node data transfer is used when the map tasks write to HDFS 92.What is a block in HDFS? Filesystems deal with data in blocks, which are an integral multiple of the disk block size. Filesystem blocks are typically a few kilobytes in size, while disk blocks are normally 512 bytes. 93.Why is a Block in HDFS So Large? 
HDFS blocks are large compared to disk blocks, and the reason is to minimize the cost of seeks. By making a block large enough, the time to transfer the data from the disk can be made to be significantly larger than the time to seek to the start of the block. Thus the time to transfer a large file made of multiple blocks operates at the disk transfer rate. 94.File permissions in HDFS? HDFS has a permissions model for files and directories. There are three types of permission: the read permission (r), the write permission (w) and the execute permission (x). The read permission is required to read files or list the contents of a directory. The write permission is required to write a file, or for a directory, to create or delete files or directories in it. The execute permission is ignored for a file since you can’t execute a file on HDFS. 95.What is Thrift in HDFS? The Thrift API in the “thriftfs” contrib module exposes Hadoop filesystems as an Apache Thrift service, making it easy for any language that has Thrift bindings to interact with a Hadoop filesystem, such as HDFS. To use the Thrift API, run a Java server that exposes the Thrift service, and acts as a proxy to the Hadoop filesystem. Your application accesses the Thrift service, which is typically running on the same machine as your application. 96.How Hadoop interacts with C? Hadoop provides a C library called libhdfs that mirrors the Java FileSystem interface. It works using the Java Native Interface (JNI) to call a Java filesystem client. The C API is very similar to the Java one, but it typically lags the Java one, so newer features may not be supported. You can find the generated documentation for the C API in the libhdfs/docs/api directory of the Hadoop distribution. 97.What is FUSE in HDFS Hadoop? Filesystem in Userspace (FUSE) allows filesystems that are implemented in user space to be integrated as a Unix filesystem. Hadoop’s Fuse-DFS contrib module allows any Hadoop filesystem (but typically HDFS) to be mounted as a standard filesystem. You can then use Unix utilities (such as ls and cat) to interact with the filesystem. Fuse-DFS is implemented in C using libhdfs as the interface to HDFS. Documentation for compiling and running Fuse-DFS is located in the src/contrib/fuse-dfs directory of the Hadoop distribution. 98.Explain WebDAV in Hadoop? WebDAV is a set of extensions to HTTP to support editing and updating files. WebDAV shares can be mounted as filesystems on most operating systems, so by exposing HDFS (or other Hadoop filesystems) over WebDAV, it’s possible to access HDFS as a standard filesystem. 99.If no custom partitioner is defined in the hadoop then how is data partitioned before its sent to the reducer? The default partitioner computes a hash value for the key and assigns the partition based on this result 100.What is Sqoop in Hadoop? It is a tool design to transfer the data between Relational database management system(RDBMS) and Hadoop HDFS. Thus, we can sqoop the data from RDBMS like mySql or Oracle into HDFS of Hadoop as well as exporting data from HDFS file to RDBMS. Sqoop will read the table row-by-row and the import process is performed in Parallel. Thus, the output may be in multiple files. Example: sqoop INTO "directory"; (SELECT * FROM database.table WHERE condition;) HADOOP Questions and Answers pdf Download Read the full article
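Question 99 above says the default partitioner computes a hash of the key to pick a reducer; the tiny sketch below models that idea in Python. Hadoop's actual HashPartitioner is Java code, so this is only a conceptual illustration, not the real implementation.

```python
# Conceptual model of the default hash partitioner: each key is routed to one of
# num_reducers partitions, and the same key always maps to the same partition
# within a run (Hadoop uses the key's hashCode in the same spirit).
def partition(key: str, num_reducers: int) -> int:
    return hash(key) % num_reducers

for k in ["apple", "banana", "apple", "cherry"]:
    print(k, "-> partition", partition(k, num_reducers=3))
```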
0 notes
reportsmonitor · 7 years ago
Text
Kinesiology Tape Market Growth, Analysis by Recent Trends, Development and Forecast To 2023
Kinesiology Tape Market Research Report 2023 
The report has been compiled through extensive primary research (through interviews, surveys, and observations of seasoned analysts) and secondary research (which entails reputable paid sources, trade journals, and industry body databases). The report also features a complete qualitative and quantitative assessment by analyzing data gathered from industry analysts and market participants across key points in the industry’s value chain.
REQUEST A FREE SAMPLE @  https://www.reportsmonitor.com/request_sample/67591 
The Kinesiology Tape Market report provides a detailed analysis of the Kinesiology Tape industry. It covers the past 5 years and gives a forecast up to the year 2023, and it also studies future trends in the global market. It examines the market through parameters such as sales, volumes, and revenues, giving insight into the Kinesiology Tape industry to support strategic decisions. It is prepared with a view to understanding the current market trends, opportunities and global analysis of the market.
For your understanding, consumption figures are also given along with an application-wise distribution. Using the supply and consumption data, the gap between the two parameters is explained in detail. The report also provides a detailed overview of the competitive landscape and analyzes the overall market landscape. It further covers sales, production revenue, market price, import and export value, and the trending sections of the Kinesiology Tape market report.
This write-up presents a detailed analysis of the Kinesiology Tape market, especially market drivers, challenges, vital trends, standardization, deployment models, opportunities, future roadmap, manufacturers' case studies, value chain, organization profiles, sales price and sales revenue, sales market comparison and strategies.
The research covers the current market size of the Global Kinesiology Tape Market. The Kinesiology Tape market can be split based on Major Players, product types and major applications.
Top manufacturers, with production, price, revenue (value) and market share for each manufacturer; the top players include KT TAPE, PerformTex, Mueller, Kinesio Taping, SpiderTech, StrengthTape, LP Support, Towatek Korea, Kindmax, RockTape, Atex Medical, Healixon, K-active, TERA Medical, Medsport, Nitto Denko, DL Medical&Health, Major Medical, Raphael, Socko, GSPMED, Godlisha.
The in-depth information by segments of the Kinesiology Tape market helps monitor future profitability and make critical decisions for growth. The Kinesiology Tape report also offers an intensive investigation of the key business players to understand their business strategies, annual revenue, company profiles and their contribution to the global Kinesiology Tape market share.
CHECK DISCOUNT FOR THIS REPORT @ https://www.reportsmonitor.com/check_discount/67591 
The Kinesiology Tape Market report presents critical information and factual data about the global Kinesiology Tape market, providing an overall statistical study of this market on the basis of market drivers, market limitations, and its future prospects. The widespread global Kinesiology Tape opportunities and trends are also taken into consideration in this report.
Kinesiology Tape Market: On the basis of product, this report displays the production, revenue, price, market share and growth rate of each type, primarily split into Roll Form and Pre-cut Shape.
Kinesiology Tape Market: On the basis of end users/applications, this report focuses on the status and outlook for major applications/end users, sales volume, market share and growth rate for each application, including Achilles tendonitis, Plantar fasciitis, Jumper's knee (PFS), ACL/MCL issues, Rotator cuff, Groin and hamstring pulls.
All aspects of the Kinesiology Tape industry report are quantitatively as well as qualitatively assessed to study the Global as well as regional market comparatively. The basic information such as the definition, prevalent chain and the government regulations pertaining to the Kinesiology Tape market are also discussed in the report.
Geographically, this report studies the top producers and consumers, focuses on product capacity, production, value, consumption, market share and growth opportunity in these key regions, covering
USA, Europe Union, Japan, China, India, South East Asia.
Some points from TOC:
Global Kinesiology Tape Market Size (Sales Volume) Comparison by Type
Global Kinesiology Tape Market Size (Sales Volume) Market Share by Type (Product Category) in 2018
Global Kinesiology Tape Market Size (Value) Comparison by Region
Global Kinesiology Tape Market Competition by Players/Suppliers
Global Kinesiology Tape Sales (Volume) and Revenue (Value) by Type (Product Category)
…Continued
Browse full Report Description and TOC @ https://www.reportsmonitor.com/report/67591/Kinesiology-Tape-Market 
The report answers several questions about the Kinesiology Tape market. These questions include:
What will be the market size of Kinesiology Tape market in 2023?
What will be the Kinesiology Tape growth rate in 2023?
Which key factors drive the Kinesiology Tape market?
Who are the key market players for Kinesiology Tape?
Which strategies are used by top players in the Kinesiology Tape market?
What are the key market trends in Kinesiology Tape?
TO BUY THIS REPORT @ https://www.reportsmonitor.com/buyNow/67591 
Thanks for reading this article; you can also get individual chapter-wise sections or region-wise report versions for Asia, the United States, and Europe.
About Us
Reports Monitor is a market research and consulting company that provides syndicated research reports, customized research reports, and consulting services. To help clients make informed business decisions, we offer market intelligence studies ensuring relevant and fact-based research across a range of industries including Healthcare, Technology, Chemicals, Materials, and Energy. With an intrinsic understanding of many business environments, Reports Monitor provides strategic objective insights.
We periodically update our market research studies to ensure our clients get the most recent, relevant, and valuable information. Reports Monitor has a strong base of analysts and consultants from assorted areas of expertise. Our industry experience and ability to zero-in on the crux of any challenge gives you and your organization the ability to secure a competitive advantage.
Contact Us
Jay Matthews
Direct: +1 513 549-5911 (U.S.)
+44 203 318 2846 (U.K.)
Website: www.reportsmonitor.com
from WordPress https://ift.tt/2vxTIhO via IFTTT
0 notes
logicmojo52 · 5 years ago
Text
Top 10 Data Analytics Tools
 The growing interest in and significance of data analytics in the market has produced numerous openings around the world. It becomes slightly hard to shortlist the top data analytics tools, as the open source tools are more popular, easy to use and performance-oriented than the paid versions. There are many open source tools which need little or no coding and manage to deliver better results than paid versions, for example R programming in data mining, and Tableau Public and Python in data visualization. The following is a rundown of the top 10 data analytics tools, both open source and paid, in light of their popularity, ease of learning and performance.
 1. R Programming
 R is the leading analytics tool in the industry and is widely used for statistics and data modeling. It can easily manipulate your data and present it in different ways. It has surpassed SAS in many ways, such as capacity of data, performance and outcome. R compiles and runs on a wide variety of platforms, viz. UNIX, Windows and macOS. It has 11,556 packages and allows you to browse the packages by category. R also provides tools to automatically install all packages as per user requirements, and it can also be well assembled with big data.
 2. Tableau Public:
 Tableau Public is free software that connects to any data source, be it a corporate data warehouse, Microsoft Excel or web-based data, and creates data visualizations, maps, dashboards and so on, with real-time updates presented on the web. They can also be shared through social media or with the client. It allows access to download the file in different formats. If you want to see the power of Tableau, then you need a very good data source. Tableau's big data capabilities make it important, and one can analyze and visualize data better than any other data visualization software on the market.
 3. Python
 Python is an object-oriented scripting language which is easy to read, write and maintain, and it is a free open source tool. It was created by Guido van Rossum in the late 1980s and supports both functional and structured programming methods.
 Python is easy to learn as it is quite similar to JavaScript, Ruby, and PHP. Also relevant here are data structures and algorithms in Python: Python has very good machine learning libraries, viz. Scikit-learn, Theano, TensorFlow and Keras. Another significant feature of Python is that it can be assembled on any platform, like an SQL server, a MongoDB database or JSON. Python can also handle text data very well.
 4. SAS
 SAS is a programming environment and language for data manipulation and a leader in analytics, developed by the SAS Institute in 1966 and further developed in the 1980s and 1990s. SAS is easily accessible, manageable, and can analyze data from any source. SAS introduced a large set of products in 2011 for customer intelligence, and numerous SAS modules for web, social media and marketing analytics that are widely used for profiling customers and prospects. It can also predict their behaviors, and manage and optimize communications.
 5. Apache Spark
 The University of California, Berkeley's AMP Lab created Apache Spark in 2009. Apache Spark is a fast, large-scale data processing engine that executes applications in Hadoop clusters many times faster in memory and on disk. Spark is built with data science in mind and its concept makes data science easy. Spark is also popular for data pipelines and machine learning model development.
 Spark also includes a library, MLlib, that provides a progressive set of machine learning algorithms for repetitive data science techniques like classification, regression, collaborative filtering, clustering, and so forth.
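 To give a concrete feel for the Spark programming model described above, here is a minimal word-count sketch using the Spark 2.x Java API; the application name, master setting and file paths are placeholders, not part of the original article.
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class SparkWordCount {
  public static void main(String[] args) {
    // local[*] runs Spark inside this JVM for a quick test; on a cluster the master comes from spark-submit.
    SparkConf conf = new SparkConf().setAppName("word-count-sketch").setMaster("local[*]");
    JavaSparkContext sc = new JavaSparkContext(conf);

    JavaRDD<String> lines = sc.textFile("input.txt");                      // placeholder input path
    JavaPairRDD<String, Integer> counts = lines
        .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())     // split each line into words
        .mapToPair(word -> new Tuple2<>(word, 1))                          // emit (word, 1) pairs
        .reduceByKey((a, b) -> a + b);                                     // sum the counts per word
    counts.saveAsTextFile("counts-out");                                   // placeholder output path

    sc.stop();
  }
}
 The chained in-memory pipeline (flatMap, mapToPair, reduceByKey) is where Spark's speed advantage over disk-bound MapReduce jobs comes from.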
 6. Excel
 Excel is a basic, popular and widely used analytical tool in almost all industries. Whether you are an expert in SAS, R or Tableau, you will still need to use Excel. Excel becomes significant when there is a requirement for analytics on the client's internal data. It tackles the complex task of summarizing the data, with a preview of pivot tables that helps in filtering the data as per client requirements. Excel has the advanced business analytics option which helps with modelling capabilities, with prebuilt options like automatic relationship detection, creation of DAX measures and time grouping.
 7. RapidMiner:
 RapidMiner is a powerful integrated data science platform developed by the same company that performs predictive analytics and other advanced analytics like data mining, text analytics, machine learning and visual analytics without any programming. RapidMiner can incorporate any data source type, including Access, Excel, Microsoft SQL, Teradata, Oracle, Sybase, IBM DB2, Ingres, MySQL, IBM SPSS, dBase and so on. The tool is so powerful that it can generate analytics based on real-life data transformation settings, i.e. you can control the formats and data sets for predictive analytics.
 8. KNIME
 KNIME was developed in January 2004 by a team of software engineers at the University of Konstanz. KNIME is a leading open source, reporting and integrated analytics tool that allows you to analyze and model data through visual programming; it integrates various components for data mining and machine learning via its modular data pipelining concept.
 9. QlikView
 QlikView has many unique features, like patented technology and in-memory data processing, which delivers results to end users quickly and stores the data in the report itself. Data association in QlikView is automatically maintained, and the data can be compressed to almost 10% of its original size. Data association is visualized using colors: a specific color is given to related data and another color to non-related data.
 10. Splunk:
 Splunk is a tool that analyzes and searches machine-generated data. Splunk pulls in all text-based log data and provides a simple way to search through it; a user can pull in all kinds of data, perform all sorts of interesting statistical analysis on it, and present it in different formats.
For More details, Visit us: - amazon interview questions
Google interview questions
system design interview questions
python data structures
data structures in java
0 notes
siva3155 · 6 years ago
Text
100+ TOP HADOOP Interview Questions and Answers
HADOOP Interview Questions for freshers and experienced :-
1. What is Hadoop framework? Hadoop is a open source framework which is written in java by apche software foundation. This framework is used to wirite software application which requires to process vast amount of data (It could handle multi tera bytes of data). It works in-paralle on large clusters which could have 1000 of computers (Nodes) on the clusters. It also process data very reliably and fault-tolerant manner. See the below image how does it looks. 2. On What concept the Hadoop framework works? It works on MapReduce, and it is devised by the Google. 3. What is MapReduce ? Map reduce is an algorithm or concept to process Huge amount of data in a faster way. As per its name you can divide it Map and Reduce. The main MapReduce job usually splits the input data-set into independent chunks. (Big data sets in the multiple small datasets) MapTask: will process these chunks in a completely parallel manner (One node can process one or more chunks). The framework sorts the outputs of the maps. Reduce Task : And the above output will be the input for the reducetasks, produces the final result. Your business logic would be written in the MappedTask and ReducedTask. Typically both the input and the output of the job are stored in a file-system (Not database). The framework takes care of scheduling tasks, monitoring them and re-executes the failed tasks. 4. What is compute and Storage nodes? Compute Node: This is the computer or machine where your actual business logic will be executed. Storage Node: This is the computer or machine where your file system reside to store the processing data. In most of the cases compute node and storage node would be the same machine. 5. How does master slave architecture in the Hadoop? The MapReduce framework consists of a single master JobTracker and multiple slaves, each cluster-node will have one TaskskTracker. The master is responsible for scheduling the jobs' component tasks on the slaves, monitoring them and re-executing the failed tasks. The slaves execute the tasks as directed by the master. 6. How does an Hadoop application look like or their basic components? Minimally an Hadoop application would have following components. Input location of data Output location of processed data. A map task. A reduced task. Job configuration The Hadoop job client then submits the job (jar/executable etc.) and configuration to the JobTracker which then assumes the responsibility of distributing the software/configuration to the slaves, scheduling tasks and monitoring them, providing status and diagnostic information to the job-client. 7. Explain how input and output data format of the Hadoop framework? The MapReduce framework operates exclusively on pairs, that is, the framework views the input to the job as a set of pairs and produces a set of pairs as the output of the job, conceivably of different types. See the flow mentioned below (input) -> map -> -> combine/sorting -> -> reduce -> (output) 8. What are the restriction to the key and value class ? The key and value classes have to be serialized by the framework. To make them serializable Hadoop provides a Writable interface. As you know from the java itself that the key of the Map should be comparable, hence the key has to implement one more interface WritableComparable. 9. What is commodity hardware? Commodity hardware is an low-cost system identified by the less-availability and low-quality. The commodity hardware for comprises of RAM as it performs an number of services that require to RAM for the execution. 
One doesn't require high-end hardware configurations or supercomputers to run Hadoop; it can be run on any commodity hardware. 10.Which interface needs to be implemented to create Mapper and Reducer for Hadoop? org.apache.hadoop.mapreduce.Mapper org.apache.hadoop.mapreduce.Reducer
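A small, hedged illustration of the two types named in question 10 (note that in the newer org.apache.hadoop.mapreduce API they are abstract classes you extend rather than interfaces you implement; the older org.apache.hadoop.mapred API used interfaces). The class and field names below are made up for this word-count example.
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountClasses {

  // Mapper: emits (word, 1) for every token in a line of input.
  public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      for (String token : value.toString().split("\\s+")) {
        if (!token.isEmpty()) {
          word.set(token);
          context.write(word, ONE);
        }
      }
    }
  }

  // Reducer: sums the counts emitted for each word.
  public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }
}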
HADOOP Interview Questions 11. What Mapper does? Maps are the individual tasks that transform i nput records into intermediate records. The transformed intermediate records do not need to be of the same type as the input records. A given input pair may map to zero or many output pairs. 12. What is the InputSplit in map reduce software? An InputSplit is a logical representation of a unit (A chunk) of input work for a map task; e.g., a filename and a byte range within that file to process or a row set in a text file. 13. What is the InputFormat ? The InputFormat is responsible for enumerate (itemise) the InputSplits, and producing a RecordReader which will turn those logical work units into actual physical input records. 14. Where do you specify the Mapper Implementation? Generally mapper implementation is specified in the Job itself. 15. How Mapper is instantiated in a running job? The Mapper itself is instantiated in the running job, and will be passed a MapContext object which it can use to configure itself. 16. Which are the methods in the Mapper interface? Ans : The Mapper contains the run() method, which call its own setup() method only once, it also call a map() method for each input and finally calls it cleanup() method. All above methods you can override in your code. 17. What happens if you don’t override the Mapper methods and keep them as it is? If you do not override any methods (leaving even map as-is), it will act as the identity function, emitting each input record as a separate output. 18. What is the use of Context object? The Context object allows the mapper to interact with the rest of the Hadoop system. It Includes configuration data for the job, as well as interfaces which allow it to emit output. 19. How can you add the arbitrary key-value pairs in your mapper? You can set arbitrary (key, value) pairs of configuration data in your Job, e.g. with Job.getConfiguration().set("myKey", "myVal"), and then retrieve this data in your mapper with Context.getConfiguration().get("myKey"). This kind of functionality is typically done in the Mapper's setup() method. 20. How does Mapper’s run() method works? The Mapper.run() method then calls map(KeyInType, ValInType, Context) for each key/value pair in the InputSplit for that task 21. Which object can be used to get the progress of a particular job ? Context 22. What is next step after Mapper or MapTask? Ans : The output of the Mapper are sorted and Partitions will be created for the output. Number of partition depends on the number of reducer. 23.Name the most common InputFormats defined in Hadoop? Which one is default ? Following 3 are most common InputFormats defined in Hadoop TextInputFormat KeyValueInputFormat SequenceFileInputFormat TextInputFormat is the hadoop default. 24.What is the difference between TextInputFormat and KeyValueInputFormat class? TextInputFormat: It reads lines of text files and provides the offset of the line as key to the Mapper and actual line as Value to the mapper KeyValueInputFormat: Reads text file and parses lines into key, val pairs. Everything up to the first tab character is sent as key to the Mapper and the remainder of the line is sent as value to the mapper. 25.What is InputSplit in Hadoop? When a hadoop job is run, it splits input files into chunks and assign each split to a mapper to process. This is called Input Split 26. What are Edge Nodes in Hadoop? 
Edge nodes are gateway nodes in the Hadoop which act as the interface between the Hadoop cluster and external network.They run client applications and cluster administration tools in the Hadoop and are used as staging areas for the data transfers to the Hadoop cluster. Enterprise-class storage capabilities (like 900GB SAS Drives with Raid HDD Controllers) is required for the Edge Nodes,and asingle edge node for usually suffices for multiple of Hadoop clusters. 27.What is the purpose of RecordReader in Hadoop? The InputSplit has defined a slice of work, but does not describe how to access it. The RecordReader class actually loads the data from its source and converts it into (key, value) pairs suitable for reading by the Mapper. The RecordReader instance is defined by the InputFormat 28.After the Map phase finishes, the hadoop framework does "Partitioning, Shuffle and sort". Explain what happens in this phase? Partitioning Partitioning is the process of determining which reducer instance will receive which intermediate keys and values. Each mapper must determine for all of its output (key, value) pairs which reducer will receive them. It is necessary that for any key, regardless of which mapper instance generated it, the destination partition is the same Shuffle After the first map tasks have completed, the nodes may still be performing several more map tasks each. But they also begin exchanging the intermediate outputs from the map tasks to where they are required by the reducers. This process of moving map outputs to the reducers is known as shuffling. Sort Each reduce task is responsible for reducing the values associated with several intermediate keys. The set of intermediate keys on a single node is automatically sorted by Hadoop before they are presented to the Reducer 29.What is a Combiner? The Combiner is a "mini-reduce" process which operates only on data generated by a mapper. The Combiner will receive as input all data emitted by the Mapper instances on a given node. The output from the Combiner is then sent to the Reducers, instead of the output from the Mappers. 30.What is job tracker? Job Tracker is the service within Hadoop that runs Map Reduce jobs on the cluster 31.What are some typical functions of Job Tracker? The following are some typical tasks of Job Tracker Accepts jobs from clients It talks to the NameNode to determine the location of the data It locates TaskTracker nodes with available slots at or near the data It submits the work to the chosen Task Tracker nodes and monitors progress of each task by receiving heartbeat signals from Task tracker 32.What is task tracker? Task Tracker is a node in the cluster that accepts tasks like Map, Reduce and Shuffle operations - from a JobTracker 33.Whats the relationship between Jobs and Tasks in Hadoop? One job is broken down into one or many tasks in Hadoop. 34.Suppose Hadoop spawned 100 tasks for a job and one of the task failed. What will hadoop do ? It will restart the task again on some other task tracker and only if the task fails more than 4 (default setting and can be changed) times will it kill the job 35.Hadoop achieves parallelism by dividing the tasks across many nodes, it is possible for a few slow nodes to rate-limit the rest of the program and slow down the program. What mechanism Hadoop provides to combat this ? Speculative Execution 36.How does speculative execution works in Hadoop  ? Job tracker makes different task trackers process same input. When tasks complete, they announce this fact to the Job Tracker. 
Whichever copy of a task finishes first becomes the definitive copy. If other copies were executing speculatively, Hadoop tells the Task Trackers to abandon the tasks and discard their outputs. The Reducers then receive their inputs from whichever Mapper completed successfully, first. Using command line in Linux, how will you see all jobs running in the hadoop cluster kill a job hadoop job -list hadoop job -kill jobid 37.What is Hadoop Streaming  ? Streaming is a generic API that allows programs written in virtually any language to be used as Hadoop Mapper and Reducer implementations 38.What is the characteristic of streaming API that makes it flexible run map reduce jobs in languages like perl, ruby, awk etc.  ? Hadoop Streaming allows to use arbitrary programs for the Mapper and Reducer phases of a Map Reduce job by having both Mappers and Reducers receive their input on stdin and emit output (key, value) pairs on stdout. 39.Whats is Distributed Cache in Hadoop ? Distributed Cache is a facility provided by the Map/Reduce framework to cache files (text, archives, jars and so on) needed by applications during execution of the job. The framework will copy the necessary files to the slave node before any tasks for the job are executed on that node. 40.What is the benifit of Distributed cache, why can we just have the file in HDFS and have the application read it  ? This is because distributed cache is much faster. It copies the file to all trackers at the start of the job. Now if the task tracker runs 10 or 100 mappers or reducer, it will use the same copy of distributed cache. On the other hand, if you put code in file to read it from HDFS in the MR job then every mapper will try to access it from HDFS hence if a task tracker run 100 map jobs then it will try to read this file 100 times from HDFS. Also HDFS is not very efficient when used like this. 41.What mechanism does Hadoop framework provides to synchronize changes made in Distribution Cache during runtime of the application  ? This is a trick questions. There is no such mechanism. Distributed Cache by design is read only during the time of Job execution 42.Have you ever used Counters in Hadoop. Give us an example scenario ? Anybody who claims to have worked on a Hadoop project is expected to use counters 43.Is it possible to provide multiple input to Hadoop? If yes then how can you give multiple directories as input to the Hadoop job  ? Yes, The input format class provides methods to add multiple directories as input to a Hadoop job 44.Is it possible to have Hadoop job output in multiple directories. If yes then how ? Yes, by using Multiple Outputs class 45.What will a hadoop job do if you try to run it with an output directory that is already present? Will it overwrite it warn you and continue throw an exception and exit The hadoop job will throw an exception and exit. 46.How can you set an arbitrary number of mappers to be created for a job in Hadoop ? This is a trick question. You cannot set it 47.How can you set an arbitary number of reducers to be created for a job in Hadoop ? You can either do it progamatically by using method setNumReduceTasksin the JobConfclass or set it up as a configuration setting 48.How will you write a custom partitioner for a Hadoop job ? 
To have hadoop use a custom partitioner you will have to do minimum the following three Create a new class that extends Partitioner class Override method getPartition In the wrapper that runs the Map Reducer, either add the custom partitioner to the job programtically using method setPartitionerClass or add the custom partitioner to the job as a config file (if your wrapper reads from config file or oozie) 49.How did you debug your Hadoop code ? There can be several ways of doing this but most common ways are By using counters The web interface provided by Hadoop framework 50.Did you ever built a production process in Hadoop ? If yes then what was the process when your hadoop job fails due to any reason? Its an open ended question but most candidates, if they have written a production job, should talk about some type of alert mechanisn like email is sent or there monitoring system sends an alert. Since Hadoop works on unstructured data, its very important to have a good alerting system for errors since unexpected data can very easily break the job. 51.Did you ever ran into a lop sided job that resulted in out of memory error, if yes then how did you handled it ? This is an open ended question but a candidate who claims to be an intermediate developer and has worked on large data set (10-20GB min) should have run into this problem. There can be many ways to handle this problem but most common way is to alter your algorithm and break down the job into more map reduce phase or use a combiner if possible. 52.What is HDFS? HDFS, the Hadoop Distributed File System, is a distributed file system designed to hold very large amounts of data (terabytes or even petabytes), and provide high-throughput access to this information. Files are stored in a redundant fashion across multiple machines to ensure their durability to failure and high availability to very parallel applications 53.What does the statement "HDFS is block structured file system" means? It means that in HDFS individual files are broken into blocks of a fixed size. These blocks are stored across a cluster of one or more machines with data storage capacity 54.What does the term "Replication factor" mean? Replication factor is the number of times a file needs to be replicated in HDFS 55.What is the default replication factor in HDFS? 3 56.What is the default block size of an HDFS block? 64Mb 57.What is the benefit of having such big block size (when compared to block size of linux file system like ext)? It allows HDFS to decrease the amount of metadata storage required per file (the list of blocks per file will be smaller as the size of individual blocks increases). Furthermore, it allows for fast streaming reads of data, by keeping large amounts of data sequentially laid out on the disk 58.Why is it recommended to have few very large files instead of a lot of small files in HDFS? This is because the Name node contains the meta data of each and every file in HDFS and more files means more metadata and since namenode loads all the metadata in memory for speed hence having a lot of files may make the metadata information big enough to exceed the size of the memory on the Name node True/false question. What is the lowest granularity at which you can apply replication factor in HDFS - You can choose replication factor per directory - You can choose replication factor per file in a directory - You can choose replication factor per block of a file - True - True - False 59.What is a datanode in HDFS? 
Individual machines in the HDFS cluster that hold blocks of data are called datanodes 60.What is a Namenode in HDFS? The Namenode stores all the metadata for the file system 61.What alternate way does HDFS provides to recover data in case a Namenode, without backup, fails and cannot be recovered? There is no way. If Namenode dies and there is no backup then there is no way to recover data 62.Describe how a HDFS client will read a file in HDFS, like will it talk to data node or namenode ... how will data flow etc? To open a file, a client contacts the Name Node and retrieves a list of locations for the blocks that comprise the file. These locations identify the Data Nodes which hold each block. Clients then read file data directly from the Data Node servers, possibly in parallel. The Name Node is not directly involved in this bulk data transfer, keeping its overhead to a minimum. Using linux command line. how will you - List the the number of files in a HDFS directory - Create a directory in HDFS - Copy file from your local directory to HDFS hadoop fs -ls hadoop fs -mkdir hadoop fs -put localfile hdfsfile 63.Advantages of Hadoop? Bringing compute and storage together on commodity hardware: The result is blazing speed at low cost. Price performance: The Hadoop big data technology provides significant cost savings (think a factor of approximately 10) with significant performance improvements (again, think factor of 10). Your mileage may vary. If the existing technology can be so dramatically trounced, it is worth examining if Hadoop can complement or replace aspects of your current architecture. Linear Scalability: Every parallel technology makes claims about scale up.Hadoop has genuine scalability since the latest release is expanding the limit on the number of nodes to beyond 4,000. Full access to unstructured data: A highly scalable data store with a good parallel programming model, MapReduce, has been a challenge for the industry for some time. Hadoop programming model does not solve all problems, but it is a strong solution for many tasks. 64.Definition of Big data? According to Gartner, Big data can be defined as high volume, velocity and variety information requiring innovative and cost effective forms of information processing for enhanced decision making. 65.How Big data differs from database ? Datasets which are beyond the ability of the database to store, analyze and manage can be defined as Big. The technology extracts required information from large volume whereas the storage area is limited for a database. 67.Pig for Hadoop - Give some points? Pig is Data-flow oriented language for analyzing large data sets. It is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turns enables them to handle very large data sets. At the present time, Pig infrastructure layer consists of a compiler that produces sequences of Map-Reduce programs, for which large-scale parallel implementations already exist (e.g., the Hadoop subproject). Pig language layer currently consists of a textual language called Pig Latin, which has the following key properties: Ease of programming. It is trivial to achieve parallel execution of simple, "embarrassingly parallel" data analysis tasks. 
Complex tasks comprised of multiple interrelated data transformations are explicitly encoded as data flow sequences, making them easy to write, understand, and maintain. Optimization opportunities. The way in which tasks are encoded permits the system to optimize their execution automatically, allowing the user to focus on semantics rather than efficiency. Extensibility. Users can create their own functions to do special-purpose processing. Features of Pig: data transformation functions datatypes include sets, associative arrays, tuples high-level language for marshalling data developed at yahoo! 68.Hive for Hadoop - Give some points? Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop compatible file systems. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL. Keypoints: • SQL-based data warehousing application – features similar to Pig – more strictly SQL-type • Supports SELECT, JOIN, GROUP BY,etc • Analyzing very large data sets – log processing, text mining, document indexing • Developed at Facebook 69.Map Reduce in Hadoop? Map reduce : it is a framework for processing in parallel across huge datasets usning large no. of computers referred to cluster, it involves two processes namely Map and reduce. Map Process: In this process input is taken by the master node,which divides it into smaller tasks and distribute them to the workers nodes. The workers nodes process these sub tasks and pass them back to the master node. Reduce Process : In this the master node combines all the answers provided by the worker nodes to get the results of the original task. The main advantage of Map reduce is that the map and reduce are performed in distributed mode. Since each operation is independent, so each map can be performed in parallel and hence reducing the net computing time. 70.What is a heartbeat in HDFS? A heartbeat is a signal indicating that it is alive. A data node sends heartbeat to Name node and task tracker will send its heart beat to job tracker. If the Name node or job tracker does not receive heart beat then they will decide that there is some problem in data node or task tracker is unable to perform the assigned task. 71.What is a metadata? Metadata is the information about the data stored in data nodes such as location of the file, size of the file and so on. 72.Is Namenode also a commodity? No. Namenode can never be a commodity hardware because the entire HDFS rely on it. It is the single point of failure in HDFS. Namenode has to be a high-availability machine. 73.Can Hadoop be compared to NOSQL database like Cassandra? Though NOSQL is the closet technology that can be compared to Hadoop, it has its own pros and cons. There is no DFS in NOSQL. Hadoop is not a database. It’s a filesystem (HDFS) and distributed programming framework (MapReduce). 74.What is Key value pair in HDFS? Key value pair is the intermediate data generated by maps and sent to reduces for generating the final output. 75.What is the difference between MapReduce engine and HDFS cluster? HDFS cluster is the name given to the whole configuration of master and slaves where data is stored. 
Map Reduce Engine is the programming module which is used to retrieve and analyze data. 76.What is a rack? Rack is a storage area with all the datanodes put together. These datanodes can be physically located at different places. Rack is a physical collection of datanodes which are stored at a single location. There can be multiple racks in a single location. 77.How indexing is done in HDFS? Hadoop has its own way of indexing. Depending upon the block size, once the data is stored, HDFS will keep on storing the last part of the data which will say where the next part of the data will be. In fact, this is the base of HDFS. 78.History of Hadoop? Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open source web search engine, itself a part of the Lucene project. The name Hadoop is not an acronym; it’s a made-up name. The project’s creator, Doug Cutting, explains how the name came about: The name my kid gave a stuffed yellow elephant. Short, relatively easy to spell and pronounce, meaningless, and not used elsewhere: those are my naming criteria. Subprojects and “contrib” modules in Hadoop also tend to have names that are unrelated to their function, often with an elephant or other animal theme (“Pig,” for example). Smaller components are given more descriptive (and therefore more mundane) names. This is a good principle, as it means you can generally work out what something does from its name. For example, the jobtracker keeps track of MapReduce jobs. 79.What is meant by Volunteer Computing? Volunteer computing projects work by breaking the problem they are trying to solve into chunks called work units, which are sent to computers around the world to be analyzed. SETI@home is the most well-known of many volunteer computing projects. 80.How Hadoop differs from SETI (Volunteer computing)? Although SETI (Search for Extra-Terrestrial Intelligence) may be superficially similar to MapReduce (breaking a problem into independent pieces to be worked on in parallel), there are some significant differences. The SETI@home problem is very CPU-intensive, which makes it suitable for running on hundreds of thousands of computers across the world. Since the time to transfer the work unit is dwarfed by the time to run the computation on it. Volunteers are donating CPU cycles, not bandwidth. MapReduce is designed to run jobs that last minutes or hours on trusted, dedicated hardware running in a single data center with very high aggregate bandwidth interconnects. By contrast, SETI@home runs a perpetual computation on untrusted machines on the Internet with highly variable connection speeds and no data locality. 81.Compare RDBMS and MapReduce? Data size: RDBMS - Gigabytes MapReduce - Petabytes Access: RDBMS - Interactive and batch MapReduce - Batch Updates: RDBMS - Read and write many times MapReduce - Write once, read many times Structure: RDBMS - Static schema MapReduce - Dynamic schema Integrity: RDBMS - High MapReduce - Low Scaling: RDBMS - Nonlinear MapReduce - Linear 82.What is HBase? A distributed, column-oriented database. HBase uses HDFS for its underlying storage, and supports both batch-style computations using MapReduce and point queries (random reads). 82.What is ZooKeeper? A distributed, highly available coordination service. ZooKeeper provides primitives such as distributed locks that can be used for building distributed applications. 83.What is Chukwa? A distributed data collection and analysis system. 
Chukwa runs collectors that store data in HDFS, and it uses MapReduce to produce reports. (At the time of this writing, Chukwa had only recently graduated from a “contrib” module in Core to its own subproject.) 84.What is Avro? A data serialization system for efficient, cross-language RPC, and persistent data storage. (At the time of this writing, Avro had been created only as a new subproject, and no other Hadoop subprojects were using it yet.) 85.core subproject in Hadoop - What is it? A set of components and interfaces for distributed filesystems and general I/O (serialization, Java RPC, persistent data structures). 86.What are all Hadoop subprojects? Pig, Chukwa, Hive, HBase, MapReduce, HDFS, ZooKeeper, Core, Avro 87.What is a split? Hadoop divides the input to a MapReduce job into fixed-size pieces called input splits, or just splits. Hadoop creates one map task for each split, which runs the userdefined map function for each record in the split. Having many splits means the time taken to process each split is small compared to the time to process the whole input. So if we are processing the splits in parallel, the processing is better load-balanced. On the other hand, if splits are too small, then the overhead of managing the splits and of map task creation begins to dominate the total job execution time. For most jobs, a good split size tends to be the size of a HDFS block, 64 MB by default, although this can be changed for the cluster 88.Map tasks write their output to local disk, not to HDFS. Why is this? Map output is intermediate output: it’s processed by reduce tasks to produce the final output, and once the job is complete the map output can be thrown away. So storing it in HDFS, with replication, would be overkill. If the node running the map task fails before the map output has been consumed by the reduce task, then Hadoop will automatically rerun the map task on another node to recreate the map output. 89.MapReduce data flow with a single reduce task- Explain? The input to a single reduce task is normally the output from all mappers. The sorted map outputs have to be transferred across the network to the node where the reduce task is running, where they are merged and then passed to the user-defined reduce function. The output of the reduce is normally stored in HDFS for reliability. For each HDFS block of the reduce output, the first replica is stored on the local node, with other replicas being stored on off-rack nodes. 90.MapReduce data flow with multiple reduce tasks- Explain? When there are multiple reducers, the map tasks partition their output, each creating one partition for each reduce task. There can be many keys (and their associated values) in each partition, but the records for every key are all in a single partition. The partitioning can be controlled by a user-defined partitioning function, but normally the default partitioner. 91.MapReduce data flow with no reduce tasks- Explain? It’s also possible to have zero reduce tasks. This can be appropriate when you don’t need the shuffle since the processing can be carried out entirely in parallel. In this case, the only off-node data transfer is used when the map tasks write to HDFS 92.What is a block in HDFS? Filesystems deal with data in blocks, which are an integral multiple of the disk block size. Filesystem blocks are typically a few kilobytes in size, while disk blocks are normally 512 bytes. 93.Why is a Block in HDFS So Large? 
HDFS blocks are large compared to disk blocks, and the reason is to minimize the cost of seeks. By making a block large enough, the time to transfer the data from the disk can be made to be significantly larger than the time to seek to the start of the block. Thus the time to transfer a large file made of multiple blocks operates at the disk transfer rate. 94.File permissions in HDFS? HDFS has a permissions model for files and directories. There are three types of permission: the read permission (r), the write permission (w) and the execute permission (x). The read permission is required to read files or list the contents of a directory. The write permission is required to write a file, or for a directory, to create or delete files or directories in it. The execute permission is ignored for a file since you can’t execute a file on HDFS. 95.What is Thrift in HDFS? The Thrift API in the “thriftfs” contrib module exposes Hadoop filesystems as an Apache Thrift service, making it easy for any language that has Thrift bindings to interact with a Hadoop filesystem, such as HDFS. To use the Thrift API, run a Java server that exposes the Thrift service, and acts as a proxy to the Hadoop filesystem. Your application accesses the Thrift service, which is typically running on the same machine as your application. 96.How Hadoop interacts with C? Hadoop provides a C library called libhdfs that mirrors the Java FileSystem interface. It works using the Java Native Interface (JNI) to call a Java filesystem client. The C API is very similar to the Java one, but it typically lags the Java one, so newer features may not be supported. You can find the generated documentation for the C API in the libhdfs/docs/api directory of the Hadoop distribution. 97.What is FUSE in HDFS Hadoop? Filesystem in Userspace (FUSE) allows filesystems that are implemented in user space to be integrated as a Unix filesystem. Hadoop’s Fuse-DFS contrib module allows any Hadoop filesystem (but typically HDFS) to be mounted as a standard filesystem. You can then use Unix utilities (such as ls and cat) to interact with the filesystem. Fuse-DFS is implemented in C using libhdfs as the interface to HDFS. Documentation for compiling and running Fuse-DFS is located in the src/contrib/fuse-dfs directory of the Hadoop distribution. 98.Explain WebDAV in Hadoop? WebDAV is a set of extensions to HTTP to support editing and updating files. WebDAV shares can be mounted as filesystems on most operating systems, so by exposing HDFS (or other Hadoop filesystems) over WebDAV, it’s possible to access HDFS as a standard filesystem. 99.If no custom partitioner is defined in the hadoop then how is data partitioned before its sent to the reducer? The default partitioner computes a hash value for the key and assigns the partition based on this result 100.What is Sqoop in Hadoop? It is a tool design to transfer the data between Relational database management system(RDBMS) and Hadoop HDFS. Thus, we can sqoop the data from RDBMS like mySql or Oracle into HDFS of Hadoop as well as exporting data from HDFS file to RDBMS. Sqoop will read the table row-by-row and the import process is performed in Parallel. Thus, the output may be in multiple files. Example: sqoop INTO "directory"; (SELECT * FROM database.table WHERE condition;) HADOOP Questions and Answers pdf Download Read the full article
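One note on the Sqoop example in question 100 above: the inline snippet is only schematic and is not valid Sqoop syntax. A typical parallel import is driven entirely by command-line options; every value below (host, database, table, user, target directory) is a placeholder.
sqoop import --connect jdbc:mysql://dbhost/salesdb --username dbuser -P --table orders --target-dir /user/hadoop/orders -m 4
Here -P prompts for the database password, --target-dir names the HDFS output directory, and -m 4 asks for four parallel map tasks, which is why the imported data lands in multiple output files.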
0 notes