#longwrite
ai-news · 23 days
Link
The field of large language models (LLMs) has seen tremendous advancements, particularly in expanding their memory capacities to process increasingly extensive contexts. These models can now handle inputs with over 100,000 tokens, allowing them to p #AI #ML #Automation
0 notes
jcmarchi · 1 month
Text
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
New Post has been published on https://thedigitalinsider.com/longwriter-unleashing-10000-word-generation-from-long-context-llms/
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
Current long-context large language models (LLMs) can process inputs up to 100,000 tokens, yet they struggle to generate outputs exceeding even a modest length of 2,000 words. Controlled experiments reveal that the model’s effective generation length is inherently limited by the examples seen during supervised fine-tuning (SFT). In other words, this output limitation stems from the scarcity of long-output examples in existing SFT datasets.
To address this, LongWriter introduces AgentWrite, an agent-based pipeline that decomposes ultra-long generation tasks into subtasks, enabling off-the-shelf LLMs to generate coherent outputs exceeding 20,000 words. Leveraging AgentWrite, LongWriter constructs LongWriter-6k, a dataset containing 6,000 SFT data samples with output lengths ranging from 2k to 32k words. By incorporating this dataset into model training, LongWriter successfully scales the output length of existing models to over 10,000 words while maintaining output quality.
LongWriter also develops LongBench-Write, a comprehensive benchmark for evaluating ultra-long generation capabilities. The 9B parameter model, further improved through DPO, achieves state-of-the-art performance on this benchmark, surpassing even much larger proprietary models.
In this article, we will discuss the LongWriter framework, explore its architecture, and compare its performance against state-of-the-art long-context large language models. Let’s get started.
Recent advancements in long context large language models (LLMs) have led to the creation of models with significantly increased memory capacities, capable of processing histories that exceed 100,000 tokens. Despite this ability to handle extensive inputs, current long-context LLMs struggle to generate outputs of comparable length. To investigate this limitation, LongWriter examines the maximum output length of state-of-the-art long-context models through various queries that require different response lengths, such as “Write a 10,000-word article on the history of the Roman Empire.” Based on the findings, LongWriter observes that all models consistently fail to generate outputs longer than 2,000 words. Furthermore, an analysis of user interaction logs indicates that over 1% of user prompts specifically request outputs beyond this limit, highlighting an urgent need in current research to address this issue. 
LongWriter’s study reveals a key insight: the constraint on output length is primarily rooted in the characteristics of the Supervised Fine-Tuning (SFT) datasets. Specifically, LongWriter finds that a model’s maximum generation length is effectively capped by the upper limit of output lengths present in its SFT dataset, despite its exposure to much longer sequences during the pretraining phase. This finding explains the ubiquitous 2,000-word generation limit across current models, as existing SFT datasets rarely contain examples exceeding this length. Furthermore, as many datasets are distilled from state-of-the-art LLMs, they also inherit the output length limitation from their source models.
To address this limitation, LongWriter introduces AgentWrite, a novel agent-based pipeline designed to leverage off-the-shelf LLMs to automatically construct extended, coherent outputs. AgentWrite operates in two stages: First, it crafts a detailed writing plan outlining the structure and target word count for each paragraph based on the user’s input. Then, following this plan, it prompts the model to generate content for each paragraph in a sequential manner. LongWriter’s experiments validate that AgentWrite can produce high-quality and coherent outputs of up to 20,000 words.
Building upon the AgentWrite pipeline, LongWriter leverages GPT-4o to generate 6,000 long-output SFT samples, named LongWriter-6k, and adds this data to the training of existing models. Notably, LongWriter-6k successfully unlocks the models’ ability to generate well-structured outputs exceeding 10,000 words in length. To rigorously evaluate the effectiveness of this approach, LongWriter develops the LongBench-Write benchmark, which contains a diverse set of user writing instructions with output length specifications of 0-500 words, 500-2,000 words, 2,000-4,000 words, and beyond 4,000 words. Evaluation on LongBench-Write shows that LongWriter’s 9B-size model achieves state-of-the-art performance, even compared to larger proprietary models. LongWriter further constructs preference data and uses DPO to help the model better follow long writing instructions and generate higher-quality written content, which experiments have also proven effective.
To summarize, LongWriter’s work makes the following novel contributions:
Analysis of Generation Length Limits: LongWriter identifies the primary factor limiting the output length of current long-context LLMs, which is the constraint on the output length in the SFT data.
AgentWrite: To overcome this limitation, LongWriter proposes AgentWrite, which uses a divide-and-conquer approach with off-the-shelf LLMs to automatically construct SFT data with ultra-long outputs. Using this method, LongWriter constructs the LongWriter-6k dataset.
Scaling Output Window Size of Current LLMs: LongWriter incorporates the LongWriter-6k dataset into its SFT data, successfully scaling the output window size of existing models to 10,000+ words without compromising output quality. LongWriter shows that DPO further enhances the model’s long-text writing capabilities.
AgentWrite: Automatic Data Construction
To utilize off-the-shelf LLMs for automatically generating SFT data with longer outputs, LongWriter designs AgentWrite, a divide-and-conquer style agent pipeline. AgentWrite first breaks down long writing tasks into multiple subtasks, with each subtask requiring the model to write only one paragraph. The model then executes these subtasks sequentially, and LongWriter concatenates the subtask outputs to obtain the final long output. Such an approach of breaking down a complex task into multiple subtasks using LLM agents has already been applied in various fields, such as problem-solving, software development, and model evaluation. LongWriter’s work is the first to explore integrating planning to enable models to complete complex long-form writing tasks. Each step of AgentWrite is introduced in detail below.
Step I: Plan
Inspired by the thought process of human writers, who typically start by making an overall plan for long writing tasks, LongWriter utilizes the planning capabilities of LLMs to output such a writing outline given a writing instruction. This plan includes the main content and word count requirements for each paragraph. The prompt used by LongWriter is as follows:
“I need you to help me break down the following long-form writing instruction into multiple subtasks. Each subtask will guide the writing of one paragraph in the essay and should include the main points and word count requirements for that paragraph. The writing instruction is as follows: User Instruction. Please break it down in the following format, with each subtask taking up one line:
Paragraph 1 – Main Point: [Describe the main point of the paragraph, in detail] – Word Count: [Word count requirement, e.g., 400 words] Paragraph 2 – Main Point: [Describe the main point of the paragraph, in detail] – Word Count: [Word count requirement, e.g. 1000 words].
Make sure that each subtask is clear and specific, and that all subtasks cover the entire content of the writing instruction. Do not split the subtasks too finely; each subtask’s paragraph should be no less than 200 words and no more than 1000 words. Do not output any other content.”
Step II: Write
After obtaining the writing plan from Step I, LongWriter calls the LLM serially to complete each subtask, generating the writing content section by section. To ensure the coherence of the output, when LongWriter calls the model to generate the n-th section, the previously generated n−1 sections are also input, allowing the model to continue writing the next section based on the existing writing history. Although this serial approach rules out parallel calls to the model for multiple subtasks and makes the input progressively longer, LongWriter shows in validation that the overall coherence and quality of the writing obtained this way are far superior to output generated in parallel. The prompt LongWriter uses is:
“You are an excellent writing assistant. I will give you an original writing instruction and my planned writing steps. I will also provide you with the text I have already written. Please help me continue writing the next paragraph based on the writing instruction, writing steps, and the already written text.
Writing instruction: User Instruction Writing steps: The writing plan generated in Step I Already written text: Previous generated (n-1) paragraphs
Please integrate the original writing instruction, writing steps, and the already written text, and now continue writing The plan for the n-th paragraph, i.e., the n-th line in the writing plan.”
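To make the two stages concrete, here is a minimal Python sketch of an AgentWrite-style loop. It assumes a chat-completion API behind the `chat` helper; the function names and prompt wording are illustrative paraphrases, not LongWriter’s actual code.

```python
# Minimal AgentWrite-style sketch. `chat` is an illustrative stand-in for
# one call to an off-the-shelf chat LLM; prompts paraphrase the ones above.
def chat(prompt: str) -> str:
    raise NotImplementedError  # swap in a real chat-completion client

def plan(instruction: str) -> list[str]:
    """Step I: ask the model for a paragraph-level outline, one subtask per line."""
    outline = chat(
        "Break down the following writing instruction into subtasks, one per "
        "line, each with a main point and a word count requirement:\n"
        + instruction
    )
    return [line for line in outline.splitlines() if line.strip()]

def agent_write(instruction: str) -> str:
    """Step II: complete the subtasks serially, feeding back prior paragraphs."""
    steps = plan(instruction)
    plan_text = "\n".join(steps)
    written = ""
    for step in steps:
        written += chat(
            "You are an excellent writing assistant.\n"
            f"Writing instruction: {instruction}\n"
            f"Writing steps: {plan_text}\n"
            f"Already written text: {written}\n"
            f"Now continue writing: {step}"
        ) + "\n\n"
    return written
```

Note how the serial design shows up directly in the loop: each call’s prompt grows with everything already written, which is exactly why parallel calls were rejected.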
Validation
LongWriter tests the generation length and quality of the proposed AgentWrite method on two long-form writing datasets. The first one, LongWrite-Ruler, is used to measure exactly how long of an output the method can provide. The second, LongBench-Write, is mainly used to evaluate how well the model-generated content aligns with user instructions in terms of length and writing quality.
LongBench-Write: To evaluate the model’s performance on a more diverse range of long-form writing instructions, LongWriter collects 120 varied user writing prompts, with 60 in Chinese and 60 in English. To better assess whether the model’s output length meets user requirements, LongWriter ensures that all these instructions include explicit word count requirements. These instructions are divided into four subsets based on the word count requirements: 0-500 words, 500-2,000 words, 2,000-4,000 words, and over 4,000 words. Additionally, the instructions are categorized into seven types based on the output type: Literature and Creative Writing, Academic and Monograph, Popular Science, Functional Writing, News Report, Community Forum, and Education and Training.
During evaluation, LongWriter adopts two metrics: one for scoring the output length and another for scoring the output quality. The model’s output length is scored based on how close it is to the requirements specified in the instructions. For output quality, LongWriter uses the LLM-as-a-judge approach, selecting the state-of-the-art GPT-4o model to score the output across six dimensions: Relevance, Accuracy, Coherence, Clarity, Breadth and Depth, and Reading Experience. The final score is computed by averaging the length score and the quality score.
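The article does not spell out the exact length-scoring formula, so the sketch below is only a plausible reconstruction: full marks for an exact length match, decaying linearly to zero once the output drops to a third of the request (consistent with the zero-score remark in the results below). The paper’s actual formula may differ.

```python
def length_score(required_words: int, produced_words: int) -> float:
    """Hypothetical S_l: 100 for an exact match, decaying linearly and
    reaching 0 once the output is a third of (or 5/3 of) the request."""
    ratio = produced_words / required_words
    return 100.0 * max(0.0, 1.0 - 1.5 * abs(ratio - 1.0))

def final_score(s_length: float, s_quality: float) -> float:
    """The final score averages the length and quality scores."""
    return (s_length + s_quality) / 2.0
```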
Validation results: LongWriter presents the output length measurement on LongWrite-Ruler and finds that AgentWrite successfully extends the output length of GPT-4o from a maximum of 2k words to approximately 20k words. When evaluating AgentWrite’s performance, LongWriter also assesses both the output quality and adherence to the required output length on LongBench-Write; plain GPT-4o, by contrast, only successfully completes tasks whose required outputs are under 2,000 words in length.
Supervised Fine-Tuning
LongWriter conducts training based on two of the latest open-source models, namely GLM-4-9B and Llama-3.1-8B. Both of these are base models and support a context window of up to 128k tokens, making them naturally suitable for training on long outputs. To make the training more efficient, LongWriter adopts packing training with loss weighting. Training yields two models: LongWriter-9B (short for GLM-4-9B-LongWriter) and LongWriter-8B (short for Llama-3.1-8B-LongWriter).
At the same time, LongWriter notices that if the loss is averaged by sequence, i.e., taking the mean of each sequence’s average loss within a batch, the contribution of each target token to the loss in long-output data would be significantly smaller than in data with shorter outputs. In LongWriter’s experiments, this is also found to lead to suboptimal model performance on tasks with long outputs. Therefore, LongWriter chooses a loss weighting strategy that averages the loss by token, where the loss is computed as the mean of losses across all target tokens within that batch.
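In PyTorch-style code, the difference between the two strategies looks roughly like this — a sketch assuming per-token cross-entropy losses and a 0/1 target-token mask, not LongWriter’s actual training code:

```python
import torch

def sequence_mean_loss(token_loss: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Average per sequence first, then across the batch: a token in a
    10k-word output contributes far less than a token in a 100-word output."""
    per_sequence = (token_loss * mask).sum(dim=1) / mask.sum(dim=1)
    return per_sequence.mean()

def token_mean_loss(token_loss: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Average over all target tokens in the batch: every target token
    carries equal weight, the strategy LongWriter adopts."""
    return (token_loss * mask).sum() / mask.sum()

# token_loss and mask both have shape [batch, sequence_length];
# mask is 1 on target tokens and 0 on prompt/padding tokens.
```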
All models are trained using a node with 8xH800 80G GPUs and DeepSpeed+ZeRO3+CPU offloading. LongWriter uses a batch size of 8, a learning rate of 1e-5, and a packing length of 32k. The models are trained for 4 epochs, which takes approximately 2,500-3,000 steps.
Alignment (DPO)
To further improve the model’s output quality and enhance its ability to follow length constraints in instructions, LongWriter performs direct preference optimization (DPO) on the supervised fine-tuned LongWriter-9B model. The DPO data comes from GLM-4’s chat DPO data (approximately 50k entries). Additionally, LongWriter constructs 4k pairs of data specifically targeting long-form writing instructions. For each writing instruction, LongWriter samples 4 outputs from LongWriter-9B and scores these outputs following a specific method, with a length-following score computed and combined in as well. The highest-scoring output is then selected as the positive sample, and one of the remaining three outputs is randomly chosen as the negative sample.
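A hypothetical sketch of that pair-construction loop, where `generate` and `score` stand in for LongWriter-9B sampling and the combined quality-plus-length scoring the article only summarizes:

```python
import random

def build_preference_pair(instruction, generate, score):
    """Sample 4 candidate responses, score each (quality plus a
    length-following score), keep the best as 'chosen' and a random
    other candidate as 'rejected'."""
    candidates = [generate(instruction) for _ in range(4)]
    candidates.sort(key=score, reverse=True)
    chosen, rejected = candidates[0], random.choice(candidates[1:])
    return {"prompt": instruction, "chosen": chosen, "rejected": rejected}
```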
The resulting model, LongWriter-9B-DPO, is trained for 250 steps on the above data mixture. LongWriter follows a specific recipe for DPO training.
LongWriter: Experiments and Results
LongWriter evaluates 4 proprietary models and 5 open-source models on LongBench-Write, along with the trained LongWriter models. To the best of LongWriter’s knowledge, Suri-IORPO is the only prior model that is also aligned for long-form text generation. It is trained based on Mistral-7B-Instruct-v0.2 using LoRA. Consistent with the evaluation setup on LongWrite-Ruler, LongWriter sets the output temperature to 0.5 and configures the model’s generation max tokens parameter to the maximum allowed by its API call. For open-source models, it is set to 32,768.
Most previous models are unable to meet the length requirement of over 2,000 words, while LongWriter models consistently provide longer and richer responses to such prompts. 
Observing the output length score S_l for prompts in each required length range, LongWriter finds that previous models generally perform poorly (scoring below 70) on prompts in the [2k, 4k) range, with only Claude 3.5 Sonnet achieving a decent score. For prompts in the [4k, 20k) range, almost all previous models are completely unable to reach the target output length, even scoring 0 (meaning all output lengths are less than one-third of the required length). By adding training data from LongWriter-6k, LongWriter’s trained model can effectively reach the required output length while maintaining good quality, as suggested by the scores in the [2k, 20k) range and the scatter plots.
DPO effectively improves both the model’s output quality and its ability to follow length requirements in long generation. 
By comparing the scores of LongWriter-9B and LongWriter-9B-DPO, we find that DPO significantly improves both the S_l (+4%) and S_q (+3%) scores, and the improvement is consistent across all ranges. This shows that in long-generation scenarios, DPO still helps to improve the model’s output quality and can better align the model’s output length with the requested length. The latter conclusion has also been recently observed in Yuan et al. (2024) for shorter generations. We also manually annotate pairwise wins and losses for GPT-4o and three LongWriter models on their outputs in LongBench-Write and visualize the results in Figure 9. We can see that humans prefer the DPO-trained model over LongWriter-9B in 58% of the cases. Moreover, despite having fewer parameters, LongWriter-9B-DPO achieves a tie with GPT-4o.
[Figure 7: Cumulative average NLL loss of GLM-4-9B and Llama-3.1-8B at different positions of LongWriter models’ outputs. Figure 8: LongWrite-Ruler test results of LongWriter models, showing their maximum generation lengths between 10k-20k words.]
The output length limit of the LongWriter models is extended to between 10k and 20k words, while more data with long outputs is required to support even longer outputs. 
Following the earlier LongWrite-Ruler test, we also present the LongWrite-Ruler results of the LongWriter models. The results suggest that their maximum generation lengths are between 10k and 20k words. The lack of SFT data with longer outputs is likely the primary reason preventing the models from achieving longer output lengths.
Final Thoughts
In this work, we have talked about LongWriter, which identifies a 2,000-word generation limit for current LLMs and proposes increasing their output window size by adding long-output data during alignment. To automatically construct such long-output data, LongWriter develops AgentWrite, an agent-based pipeline that decomposes ultra-long generation tasks into subtasks and uses off-the-shelf LLMs to create extended, coherent outputs. LongWriter successfully scales the output window size of current LLMs to over 10,000 words with the constructed LongWriter-6k. Extensive ablation studies on the training data demonstrate the effectiveness of this approach. For future work, LongWriter suggests the following three directions:
1. Expand the AgentWrite framework to construct data with longer outputs to further extend LLMs’ output window size.
2. Refine the AgentWrite framework to achieve higher-quality long-output data.
3. Longer model outputs bring challenges to inference efficiency. Several methods have been proposed to improve inference efficiency, and it is worth investigating how these methods can improve model efficiency without compromising generation quality.
0 notes
cellmint · 4 years
Text
Star Gaze
Human Revenant x Wattson
Word count-1981
Warning/Tags-red tears, comfort, stars
Rating-G
Summary-
Human Revenant takes Wattson for a night stroll under the stars.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Revenant woke up from another nightmare. The image of himself coming out as a simulacrum turned his guts upside down. He felt worried and scared. He fixed his scarf and walked silently in the dark, rubbing his forehead and looking confused. He got himself a glass of nice cold water and drank it, then washed his face, trying to focus on something else.
He wondered what his favorite defensive legend was doing. He walked down the hall, each step quieter than the last. For a trained assassin who had woken up from cryostasis, that was fine. He stood thinking for a while about how he had gotten to the Apex Games. He had been put to sleep by Hammond Robotics; the next thing he knew, he was waking to the sound of Wattson opening the chamber.
Then he had joined the games somehow or other? Thinking about it made him feel sick. He could have sworn he had died in an explosion. Then again, he had a few adjustments and improvements to his body and gear. Prosthetics, perhaps; maybe Hammond had removed some of his limbs... No, no, that would be sick. He looked down at his prosthetics.
He heard someone pacing around and leaned forward, reading the room. He saw Wattson biting her thumb, nervously pacing around. She seemed to be thinking about something. She looked to the side and saw Revenant.
"Rev!" She spoke.
"W...what's wrong?" He looked at her.
"I locked myself out of my room and another experiment end up  failing."she looked down.
Revenant looked back and from side to side in case some legends were walking around the hall, but it was only them. He took a few steps, seeing that she was crying, and gently placed a hand on her shoulder. He lifted a finger and wiped a single tear from her soft face. He gave her a small kiss on the forehead.
"Hm... Come on." He spoke in a soft tone.
She looked a bit confused but decided to follow.
"We are sneaking out of the dropship. I want to show you a surprise." He smiled.
The dropship was located on the ground, getting repairs and supplies. But legends are supposed to stay inside it at night, per protocol.
"Isn't it bad to leave the dropship?" Wattson asked as they kept walking.
"Nobody is going to find out. Relax." He held her hand, leading the way.
As they both walked into the hallway, they could hear some workers talking within the dropship, the sounds getting closer to them. Revenant stopped and pulled Wattson through a door, opening it and slipping inside without being detected. The room was pitch black and super small; it was a janitor's closet. Revenant could feel Wattson's breath on his neck in the cramped space. She stayed still, focusing on the sound of Revenant's chest. He was so silent it was as if he didn't have a heartbeat. Revenant focused on the workers, keeping his body away from Wattson's out of respect.
"Did you hear something?" a voice asked.
"Don't know, don't care. I just want to go sleep, man." One of them responded, tired.
The footsteps faded away, and Revenant opened the door and walked out with Wattson. She was a bit nervous following him. He was blushing a bit, rubbing his neck.
They went downstairs and got out through the back of the dropship. Revenant could feel Wattson's hand grab his. He looked a bit surprised. Her blue eyes could put him in a trance the more he kept looking. Her hair was uncovered, since she was just wearing the jacket. They walked into Skyhook. The town was alive with people, merchants, street food and music. The smell of food hit their noses, making them hungry.
"Want something to eat?" Revenant ask.
"Qui." She smiled.
Revenant bought some street food, eat it while walking and looked around.
Wattson followed eating the piece of meat on a stick. The taste was so smokey and the bbq sauce just combined perfectly. It wasn't too bitter or sweet the sauce was wonderful. Almost as good as Mirage's pork chops. She smile looking at Revenant as he kept walking, she noticed as he sometimes start peeking looking at her. She made a small smile going to his shoulder walking.
"Like it?" He ask.
"Magnifique." She smiled.
"Street food always taste best at night. Want some more?" He ask.
"No thank you, what kind of meat is it?" She ask taking another bite of the meat enjoying the flavors
"Leviathan." He respond.
"Wow!" She looked surprised.
They reached a mountain, and she looked up at it, surprised.
"Come on, take the zipline," Revenant spoke.
She took it first and he followed. They reached the top and walked to the opposite side, along the edge. Wattson looked around, following Revenant. It was too dark to see much of anything.
The cold air could be felt blowing behind them as it slowly died down. Revenant didn't worry about the cold; he enjoyed it. Wattson shivered a bit and looked at him.
"It's about to start." He spoke.
He grabbed Wattson's hand, walked to the edge and sat down. He ate some small pieces of bread from the street food. It felt so warm, with a bit of sauce on it. They were located on the snipers' mountain, so the view of the sky was crystal clear.
He looked up at the sky, closing his eyes. She looked at him as the pitch-black sky was uncovered from behind a black cloud. Suddenly the sky glowed a faint blue, and a lot of stars could be seen. Some meteors and even an aurora showed up.
Wattson was surprised by the sudden change of mood. She smiled, watching the sky. Revenant opened his eyes and felt Wattson place her head on his right shoulder; he placed his arm around her, giving her a kiss.
He felt her soft skin brush against his lips. It felt almost like kissing a scented flower and cotton. She grabbed his hands and shivered a bit due to the cold.
"Merci." She spoke softly and quietly.
"That's why I'm here." He respond.
"Whenever you feel alone, sad, frustrated, angry, im here for you." He spoke.
"Aren't you a bit nervous if we get caught outside of the dropship?" She wonder.
"Of course not because I'll make sure we don't get caught." He spoke.
She look up at the sky. So many different worlds and universes out there. So many planets each of them with living beings and even with there own ecosystems. Wattson wonder if she would be able to see some of them. She felt hypnotized by the stars and the colors of it. Purple, blue, black all of the colors shining brightly. The stars gleam some shooting stars could be seeing catching her attention. She lied down her back against the short grass it felt soft instead of prickly. Revenant join her and watched the show.
It was as if the stars wanted to put a show for them to enjoy. Each of them danced around one and another. The colors where just so soft, warm and even cold. Revenant took a deep breath calming down. He saw as just for once he could be happy and take his mind off Hammond robotics. Being frozen for so many years no contact with the outside world, unable to eat or even live his life. He wonder what happen to the people that he once knew.
Wattson look at Revenant as he got lost in his own thought. His gaze wasn't even focusing on the stars it was as if he was trapped in his own mind with his own problems. He's always there for her but he never wants to talk about his problems to feel better. Is it because of humiliation or fear that she won't handle it?
"Rev?" She ask.
Revenant kept thinking what would happen if he lived his full life, without being involved with Hammond. He felt a tear run down his face. He felt a soft hand embrace him in a hug. He could hear her heartbeat pulling him out of his thoughts.
"Rev" she spoke with tears seeing him.
He had tears running down his face, he looked surprised being hugged  by Wattson s he realized what she was doing.
"Are you alright?" Her soft words made his ears focus on her.
"I was just thinking." He lift himself up his back no longer touched the grass and he hid his face from her.
"It's alright to cry. What you been threw must've not been easy for you. Its alright, want to talk?" She ask.
"N...no." He look away collecting his emotions.
"Are you sure." She wonder.
"I....I don't know." He look at her with tears.
She look surprised a bit and paid attention. He look at her and moved his eyes away to distract himself. She look surprise when Revenant started to cry red, a red streak ran down his eye to his cheeks. She quickly grab a napkin from her pocket and placed it om his face.
"Your bleeding?!" She spoke surprised.
"I'm fine." He spoke.
Holding her hand, he gave her a kiss and looked at her.
"I worked for Hammond as a hired hitman. Sometimes I worked for whoever could pay me the highest. I had a pretty good life until I was set up. On one of my missions I was sent to kill a mafia boss. He led me to a Hammond facility he worked in, and the next thing you know, the sound of an explosion deafens you and you get burned by it. A scientist took me and repaired me, forcing me to live. The next thing I knew, I was thrown inside a claustrophobic chamber and frozen." He spoke.
Revenant looked at his prosthetic right arm as a small ball came out of it, and he played with it. He uses it a lot in the games to cancel people's abilities. Wattson looked at him and held his hands, stopping him.
"The important thing is you're alive, and you have new friends, and you can start again." She smiled.
"Start again. I lost my perfect life; they made me this fucked up thing. All the suffering. They removed my spine while I was awake. You clearly don't understand!" he spoke, baring his teeth and fangs, pissed.
He forced her to let go of his hands and stood up, looking away, mad.
"You want to take down Hammond for what they did to you. Your feelings are completely understandable. But thinking I don't understand you is very wrong. And you're pissed at me over something I have nothing to do with; I'm not Hammond." She responded.
Revenant looked at her, about to talk back, and felt her embrace him in a tight hug.
"Nono!" He growled.
He was so mad, but all he felt was Wattson. He went quiet and embraced her back in a hug.
"Trust me, if I put myself in your position, I would be very angry, and the time to adjust would be overwhelming. It's alright to be mad. But only at the people that did this to you, not the ones that want to see you better." Wattson spoke.
Revenant looked at her and stayed quiet, fixing his golden hair and covering it within his scarf. She looked down, picking up the napkin from the floor.
"Now let's enjoy the night watching the stars together." She sat down and looked on as the universe itself put on a display for them to watch and enjoy every minute.
"Thank you," Revenant spoke, looking at the sky.
21 notes · View notes
gek002 · 4 years
Text
My eyes fluttered closed as I had lost all hope. It wasn’t until I caught a glimpse of beauty that my life was saved. It was a flash of yellow. To the touch it was electrifying. I knew it was alive, and I was too. There was still some beauty left in this world when I was surrounded by darkness. I grasped my hands around it as tears flooded down my face.
It was my mother’s yellow rose
2 notes · View notes
deepblueruin · 7 years
Text
If Your Loved Ones Have Anxiety, This Is To Help You Understand What It's Like
When I’m having anxiety issues, it can be hard to explain to people what is happening to me. So I put together a Twitter thread to try and capture what it’s like. However, the format didn’t feel big enough to accommodate the details.
At the same time, I also suffer from the great disease of our generation: The TL;DR Disease. 
So I’m going to rely on lists and GIFs to make sure you stay with me till the very end.
(GIF credit: Oh Shut Up Harry Tumblr)
Things to know about anxiety:
Anxiety expresses itself differently in different people.
One person can feel different magnitudes of anxiety within the same day.
Sometimes there is a direct and significant trigger, say an angry text from your boss, and sometimes there isn't.
Things to know about me in this context:
I am not a psychology, counseling, therapy or psychiatry professional.
I am, therefore, not qualified to create an all-knowing, comprehensive list.
However, I do have anxiety issues and can try and offer a glimpse of what it feels like.
What happens to me when anxiety strikes:
My heart starts to pound in my ears. 
My pulse races and my breathing involuntarily becomes faster and shallower, like I’m running up a flight of stairs.
This actually happens while I’m doing something mundane like going to bed or taking a shower or sitting in the passenger seat of a car.
It often feels chemical or physiological, like something’s seriously wrong with my body.
I feel a constriction in my chest. Like someone’s pulling and pulling the strings to a corset that’s already three sizes too small.
I often experience what I like to call a Rumbly In My Tumbly. This is a tame phrase for the way my stomach rumbles and churns with something sickening, thick and glutinous.
I've gone impeccably dressed to job interviews only to ask to use the bathroom minutes later because I just have to go 💩 .
Sometimes my whole body will clench.
At other times, it’s just one body part that I clench involuntarily. I don’t even realise it until I wake up the next morning with a sore butt or a pulled calf muscle.  
At crucial work moments, my throat will choke up while talking like there's a wooly sock down it.
I blank out. I forget what I was doing or about to do. 
I have left several exams feeling lightheaded and wobbly in the knees – like I might collapse any second.
And of course there is the embarrassing cherry on the cake: Uncontrollable crying. Like no matter what I do, the tears won’t stop. 
(GIF Credit: Lysergic-Asshole Tumblr)
Other people’s anxiety is different.
My friends have felt nausea, vomiting, sweaty palms. They’ve felt their own bodies going cold or shutting down.
Some people report feeling like they’re having a heart attack. This is a full-blown panic attack and needs medical attention.
Some patterns of thoughts in my head during high-anxiety situations (in random order):
I feel like I’ve completely lost control over the situation. (There’s nothing that can fix this/I don’t see any options.)
I blame myself for everything, even things outside my scope of influence. (It’s all my fault. Everything I come in contact with goes to shit)
I make gross generalizations (Nothing will ever change/ Maybe I’m just stupid/ I’m not fit to be a filmmaker)
I fail to see a way out of the situation (If I leave this job/relationship/project, it means I’m... lazy/callous/someone who gives up easily)
I obsessively focus on what others think of me (She must think I’m a bitch / He’s going to think I don’t have it in me to survive at a job like this)
I submit completely to the idea that something larger is at play. (My luck is just fucked/I’m being punished for that time I broke that person’s heart)
I imagine the worst outcome in every situation. (I’m going to reach 5 and a half minutes late and my boss will fire me/He’s going to hear what I have to say and break up with me)
I script entire scenes of what people might say and how I should respond in a situation that I think is going to be tense.  
These thoughts don’t occur in isolation. It’s more like setting off a sequence of infinite dominoes.
Without coping mechanisms, I will replay these thoughts in my head until I’ve grown physically sick or exhausted.
Everything is colored in overpowering dread.
GIF credit: Cartoon Network
Important things to remember:
The anxious person in your circle may be exhibiting totally different symptoms. Learn to watch for them.
If your friend is good at covering up physical symptoms, their high-anxiety thoughts will manifest in casual speech.
Anxiety can impair people’s responses in social situations. Your person could get aggressive or emotional without significant triggers.
Anxiety takes up so much of your attention that you forget important tasks and events. The consequences of this create feelings of shame, frustration and even more anxiety.
Things that help me:
Writing my thoughts in a journal
Talking
Giving myself permission to cry (either privately or strictly in the company of non-judgmental people)
Unhooking my bra / Loosening a button / Taking off a jacket
Changing into something comfortable if the situation permits
Drinking a glass of water slowly and deliberately
Bathing in cool water
On particularly bad symptom days, lying down/curling into a fetal position
A nap
Eating helps, but I have to be very, very careful because it can often send me into a spiral of emotional overeating.
Taking a small walk/removing myself from a hostile situation
Listening to calming music or affirmation tapes
Making extensive notes about pending tasks that are stressing me out
My mum sent me this excellent video about breathing techniques.
My friend Sukanya taught me about tapping. This video explains the technique.
How you could help your anxious friend/child/partner:
First things first. Respectfully encourage the person to see a counselor.
If a counselor suggests your friend should see a psychiatrist, be supportive. Medication works for many people.
If your friend is anxious at the idea of taking medication, encourage them to ask as many questions as they need to.
Listen to your friend.
Ask if they want to be touched or held. Respect explicit consent. (Do not hug a person having a panic attack; it could potentially choke them)
If you see research-backed material about coping with anxiety, do share with them. It’s a nice way to show acceptance.
Speak in your calmest, indoor voice.
It’s okay to offer perspective, but try not to do it in a way that belittles the anxious person.
If you think you have feedback for your person, give it respectfully. Gently point out what behavior you think they could change. If you must criticize, criticize a behavior, not the person.
“Hey I know you’re going through a lot, but you used some really harsh words back there” is better than “You’re a mean person.”
Expect resistance. Gentle nudging is okay, but don’t aggressively push someone.
It can be frustrating to watch someone panic about something that you think is trivial. Try not to let it get to you. Temporarily remove yourself from their company if you find their anxiety affecting you. In the long run, this is better for everyone.
Finally, this is hard but give your friend the permission to remove themselves from your company if they tell you it makes them anxious. Try not to hold a grudge about this.
Some things you should avoid saying in anxiety situations:
Is there a way to fix this problem once and for all? 
Calm the fuck down.
Stop overthinking. (It’s a great idea but howwwww? Like how does anyone do it? If you find out, please tell me)
Is your period due? (Note: this doesn’t upset me personally because I genuinely experience high anxiety before my period. So it serves as a reminder. But it could trigger other women so if you must ask, be respectful)
Wow. So you made it to the end. 
(GIF Source: Yseult Tumblr)
If you’re a friend/partner/ally to a person with anxiety, and you made it this far, I’m sending you warm fluffy fluffs. You care enough about your anxious buddy to read a long-ass post from a stranger on the internet. 
If you’re a person with anxiety, I give you props. And hope and love and strength to get through this day and the next. 
I’m pretty sure I’ve missed out a lot. Feel free to add your thoughts. Originally, this was a long twitter thread and if you wish to read it in its original form, go here.
1 note · View note
srekaindustries · 3 years
Text
Nine Bonuses of Using Horticulture Markers in Auckland
If you plan to grow a wide variety of flowers, plants and vegetables, this post is for you. Are you a seasoned gardener? Then you’d know that it becomes hard for a grower to remember the names of veggies. It’s why people use horticulture markers in Auckland that make planting flowers a bit easier.
Tree tags and plant tags make it easier to remember the names of plants. That holds whether it is a botanical garden or you’re an at-home gardener. The benefits of using the Auckland horticulture marker are endless.
Benefits of using horticulture markers in Auckland
We will talk about ways you can make the most of horticulture markers.
Horticulture markers in Auckland are useful in the following ways.
You’re growing food and have people to harvest the crops.
Learning how to identify veggies
Trying to save bulbs and crops
You share a garden plot and want to keep your plants separate from the person next to you.
Horticulture markers Auckland help you locate spots that you no longer use.
It helps you educate others all about a walk-through garden.
It helps you group plants as per your requirements.
Identify plants that got sold already.
Monitor plants by marking veggies that got planted before.
Are you ready to plant crops? Now that you know the benefits of using a plant label, it’s time to learn how you can use it. What are the types of plant markers available in Auckland?
Types of plant markers available in Auckland
Plastic plant markers
Plant markers help you color-code veggies. You can write on Utility Markers Wellington using a sharpie or an all-weather marker. You can likewise laser etch the label to get your company name, logo, and address engraved on it. It is versatile and can work for an at-home or retail garden. Plastic plant tags are temporary and can work for about a year or so, depending on how harsh the climate is. Plastic plant markers in Auckland are available in vibrant shades like yellow, green, blue, or pink. If you need colored plant tags, reach out to our store today.
Write-on plant tags that last long
Write-on plant tags get crafted from thin metal.
You can engrave on it using a sharp object or pen.
Utility Markers Wellington could need a biodegradable backing to work on the tag.
It is perfect for gardeners looking for specific details about a plant. For instance, it could include the name, species, and genus of the vegetable.
Wooden tags to label plants
Plain and treated plant tags are available in a range of sizes. Choose a plant tag, depending on the type of potted plant you want to label.
If you want to label plants for sale at a store, you can go for a wood pot label.
Wooden labels are perfect for tagging plants, bushes, and rows of trees. They get written on with a sharpie.
Whether you need a plastic, wooden, or metal tag, we’ve got your back. For quotes on Lane Divider Wellington, contact us today.
0 notes
namanag · 3 years
Text
Q/A (Hadoop)
Q - What if NameNode fails in Hadoop?
A - The single point of failure in Hadoop 1.x is the NameNode. If the NameNode fails, the whole Hadoop cluster will not work. Actually, there will not be any data loss; only the cluster's work will be shut down, because the NameNode is the only point of contact for all DataNodes, and if the NameNode fails, all communication will stop.
Q - What is Data Locality?
A - If we bring the data from the slaves to the master, it will cost network congestion plus input/output channel congestion, and at the same time the master node will take a lot of time to process this huge amount of data. Instead, we can send the processing to the data: we send the logic to all the slaves that contain the data and perform the processing on the slaves themselves. The results are sent back to the name node, which takes less time.
Q - In a folder, there are 100 files. Each file's size is 1 MB. If the block size is 64 MB, how many blocks will be created in total?
A - 100 blocks will be created, because HDFS does not pack multiple small files into a single block; each 1 MB file occupies its own block.
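The arithmetic behind that answer, sketched in Python (file sizes are the example's, not real data):

```python
import math

block_size_mb = 64
files_mb = [1] * 100  # 100 files of 1 MB each

# Blocks are per-file: each file needs ceil(size / block_size) blocks,
# and a block is never shared between files.
total_blocks = sum(math.ceil(size / block_size_mb) for size in files_mb)
print(total_blocks)  # 100
```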
Q - Explain what is heartbeat in HDFS?
A - Heartbeat refers to a signal used between a data node and the NameNode, and between a task tracker and the job tracker. If the NameNode or job tracker does not receive the signal, it is considered that there are some issues with the data node or task tracker.
Q - What happens when a data node fails?
A - When a data node fails...
Job tracker and NameNode detect the failure.
On the failed node all tasks are re-scheduled.
NameNode replicates the user's data to another node.
Q - Explain what are the basic parameters of a Mapper?
A - The basic parameters of a Mapper are
LongWritable and Text
Text and IntWritable
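Those types belong to the Java API; as a rough illustration, here is a small Python simulation of the same contract for a word-count map phase, receiving a (LongWritable offset, Text line) pair and emitting (Text word, IntWritable 1) pairs. The data and helper names are made up for the example.

```python
def wordcount_map(offset: int, line: str):
    """Receives (offset, line) — the LongWritable/Text input pair — and
    emits (word, 1) — the Text/IntWritable intermediate output pair."""
    for word in line.split():
        yield word, 1

# The framework computes byte offsets for us; simulated here on a tiny input.
data = "hello world\nhello hadoop\n"
offset = 0
for raw in data.splitlines(keepends=True):
    for key, value in wordcount_map(offset, raw.rstrip("\n")):
        print(key, value)
    offset += len(raw.encode())  # keys are byte offsets, not line numbers
```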
Q - What Are Hadoop Daemon?
A - Daemons are the processes that run in the background. There are four primary daemons: NameNode, DataNode, Resource Manager (runs on master node for YARN), Node Manager (runs on slave node for YARN).
Q - Why divide the file into blocks?
A - Let’s assume that we don’t divide: now it’s very difficult to store a 100 TB file on a single machine. Even if we store it, each read and write operation on that whole file is going to take a very high seek time. But if we have multiple blocks of size 128 MB, then it becomes easy to perform various read and write operations on them compared to doing it on a whole file at once. So we divide the file to have faster data access, i.e., to reduce seek time.
Q - Why replicate the blocks in data nodes while storing?
A - Let’s assume we don’t replicate, and only one copy of a given block is present on DataNode D1. Now if data node D1 crashes, we will lose the block, which will make the overall data inconsistent and faulty. So we replicate the blocks to achieve fault tolerance.
#Check this out
0 notes
ibentoy · 3 years
Photo
💗 These kawaii Cute Candy Ballpoint Pens will make kawaii lovers unable to put them down. 💗 Smooth writing, removable design: replace the refill to use the pen longer, and the press design can pull the pen tip back without damage. 💗 You can collect many cute candy pens, like getting many sweet gifts. Size: about 14cm long. Writing point: 0.5mm, black. https://ift.tt/3l6Qmu8
0 notes
ai-news · 1 month
Link
Long-context LLMs require sufficient context windows for complex tasks, akin to human working memory. Research focuses on extending context length, enabling better handling of longer content. Zero-shot methods and fine-tuning enhance memory capacity #AI #ML #Automation
0 notes
siva3155 · 5 years
Text
300+ TOP MAPREDUCE Interview Questions and Answers
MAPREDUCE Interview Questions for freshers experienced :-
1. What is MapReduce?
It is a framework or a programming model that is used for processing large data sets over clusters of computers using distributed programming.
2. What are 'maps' and 'reduces'?
'Maps' and 'Reduces' are two phases of solving a query in HDFS. 'Map' is responsible for reading data from the input location and, based on the input type, generating a key-value pair, that is, an intermediate output on the local machine. 'Reducer' is responsible for processing the intermediate output received from the mapper and generating the final output.
3. What are the four basic parameters of a mapper?
The four basic parameters of a mapper are LongWritable, Text, Text and IntWritable. The first two represent input parameters and the second two represent intermediate output parameters.
4. What are the four basic parameters of a reducer?
The four basic parameters of a reducer are Text, IntWritable, Text, IntWritable. The first two represent intermediate output parameters and the second two represent final output parameters.
5. What do the master class and the output class do?
Master is defined to update the master or the job tracker, and the output class is defined to write data onto the output location.
6. What is the input type/format in MapReduce by default?
By default, the input type in MapReduce is 'text'.
7. Is it mandatory to set the input and output type/format in MapReduce?
No, it is not mandatory to set the input and output type/format in MapReduce. By default, the cluster takes both the input and the output type as 'text'.
8. What does the text input format do?
In text input format, each line creates a line offset, which is a hexadecimal number. The key is considered as the line offset and the value is considered as the whole line of text. This is how the data gets processed by a mapper: the mapper receives the 'key' as a 'LongWritable' parameter and the value as a 'Text' parameter.
9. What does the job conf class do?
MapReduce needs to logically separate different jobs running on the same cluster. The 'job conf class' helps to do job-level settings such as declaring a job in a real environment. It is recommended that the job name be descriptive and represent the type of job being executed.
10. What does conf.setMapperClass do?
Conf.setMapperClass sets the mapper class and everything related to the map job, such as reading the data and generating a key-value pair out of the mapper.
11. What do sorting and shuffling do?
Sorting and shuffling are responsible for creating a unique key and a list of values. Making similar keys land at one location is known as sorting, and the process by which the intermediate output of the mapper is sorted and sent across to the reducers is known as shuffling. (A toy simulation follows Q21 below.)
12. What does a split do?
Before transferring the data from the hard disk location to the map method, there is a phase or method called the 'split method'. The split method pulls a block of data from HDFS to the framework. The split class does not write anything; it reads data from the block and passes it to the mapper. By default, the split is taken care of by the framework. The split size is equal to the block size and is used to divide a block into a bunch of splits.
13. How can we change the split size if our commodity hardware has less storage space?
If our commodity hardware has less storage space, we can change the split size by writing a 'custom splitter'. This is a customization feature in Hadoop which can be called from the main method.
14. What does a MapReduce partitioner do?
A MapReduce partitioner makes sure that all the values of a single key go to the same reducer, thus allowing even distribution of the map output over the reducers. It redirects the mapper output to the reducer by determining which reducer is responsible for a particular key.
15. How is Hadoop different from other data processing tools?
In Hadoop, based upon your requirements, you can increase or decrease the number of mappers without bothering about the volume of data to be processed. This is the beauty of parallel processing, in contrast to the other data processing tools available.
16. Can we rename the output file?
Yes, we can rename the output file by implementing the multiple format output class.
17. Why can't we do aggregation (addition) in a mapper? Why do we require a reducer for that?
We cannot do aggregation (addition) in a mapper because sorting is not done in a mapper; sorting happens only on the reducer side. Mapper initialization depends upon each input split. While doing aggregation, we would lose the value of the previous instance: for each row, a new mapper gets initialized, so we do not keep track of the previous row's value.
18. What is Streaming?
Streaming is a feature of the Hadoop framework that allows us to do programming using MapReduce in any programming language which can accept standard input and produce standard output. It could be Perl, Python or Ruby, and not necessarily Java. However, customization in MapReduce can only be done using Java and not any other programming language.
19. What is a Combiner?
A 'combiner' is a mini reducer that performs the local reduce task. It receives the input from the mapper on a particular node and sends the output to the reducer. Combiners help in enhancing the efficiency of MapReduce by reducing the quantum of data that is required to be sent to the reducers.
20. What happens in a TextInputFormat?
In TextInputFormat, each line in the text file is a record. The key is the byte offset of the line and the value is the content of the line. For instance: key: LongWritable, value: Text.
21. What do you know about KeyValueTextInputFormat?
In KeyValueTextInputFormat, each line in the text file is a 'record'. The first separator character divides each line. Everything before the separator is the key and everything after the separator is the value. For instance: key: Text, value: Text.
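A toy Python simulation of the partition, sort and shuffle behavior described in Q11 and Q14 (illustrative only; real Hadoop does this across machines, and the hash partition shown merely mimics the default HashPartitioner's logic):

```python
from collections import defaultdict

map_output = [("hello", 1), ("world", 1), ("hello", 1)]  # from several mappers

# Partitioning (Q14): send each key to hash(key) % num_reducers, so every
# value of one key ends up at the same reducer.
num_reducers = 2
by_partition = defaultdict(list)
for key, value in map_output:
    by_partition[hash(key) % num_reducers].append((key, value))

# Sort and shuffle (Q11): within a partition, records are sorted so each
# key arrives at the reducer with the full list of its values.
for partition, records in by_partition.items():
    grouped = defaultdict(list)
    for key, value in sorted(records):
        grouped[key].append(value)
    for key, values in grouped.items():
        print(partition, key, sum(values))  # reduce step: word count
```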
22. What do you know about SequenceFileInputFormat?
SequenceFileInputFormat is an input format for reading in sequence files. Key and value are user defined. It is a specific compressed binary file format which is optimized for passing data from the output of one MapReduce job to the input of some other MapReduce job.
23. What do you know about NLineOutputFormat?
NLineOutputFormat splits 'n' lines of input as one split.
24. What is the difference between an HDFS Block and an Input Split?
An HDFS Block is the physical division of the data and an Input Split is the logical division of the data.
25. After a restart of the namenode, MapReduce jobs that worked fine before the restart started failing. What could be wrong?
The cluster could be in safe mode after the restart of a namenode. The administrator needs to wait for the namenode to exit safe mode before restarting the jobs. This is a very common mistake by Hadoop administrators.
26. What do you always have to specify for a MapReduce job?
The classes for the mapper and reducer.
The classes for the mapper, reducer, and combiner.
The classes for the mapper, reducer, partitioner, and combiner.
None; all classes have default implementations.
27. How many times will a combiner be executed?
At least once.
Zero or one times.
Zero, one, or many times.
It's configurable.
28. You have a mapper that for each key produces an integer value, and the following set of reduce operations:
Reducer A: outputs the sum of the set of integer values.
Reducer B: outputs the maximum of the set of values.
Reducer C: outputs the mean of the set of values.
Reducer D: outputs the difference between the largest and smallest values in the set.
29. Which of these reduce operations could safely be used as a combiner?
All of them. A and B. A, B, and D. C and D. None of them.
Explanation: Reducer C cannot be used because, if such a reduction were to occur, the final reducer could receive from the combiner a series of means with no knowledge of how many items were used to generate them, meaning the overall mean is impossible to calculate (see the numeric check after Q31 below). Reducer D is subtle, as the individual tasks of selecting a maximum or minimum are safe for use as combiner operations. But if the goal is to determine the overall difference between the maximum and minimum value for each key, this would not work: if the combiner that received the maximum key had values clustered around it, this would generate small results, and similarly for the one receiving the minimum value. These sub-ranges have little value in isolation, and again the final reducer cannot construct the desired result.
30. What is an Uber task in YARN?
If the job is small, the application master may choose to run the tasks in the same JVM as itself, since it judges the overhead of allocating new containers and running tasks in them as outweighing the gain to be had in running them in parallel, compared to running them sequentially on one node. (This is different from MapReduce 1, where small jobs are never run on a single task tracker.) Such a job is said to be uberized, or run as an Uber task.
31. How to configure Uber tasks?
By default, a job that has less than 10 mappers, only one reducer, and an input size less than the size of one HDFS block is said to be a small job. These values may be changed for a job by setting mapreduce.job.ubertask.maxmaps, mapreduce.job.ubertask.maxreduces, and mapreduce.job.ubertask.maxbytes. It's also possible to disable Uber tasks entirely by setting mapreduce.job.ubertask.enable to false.
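The explanation for Q29 can be checked numerically: sum and max commute with partial aggregation, but a mean of partial means is generally wrong. A quick Python check with toy numbers:

```python
values = [1, 2, 3, 10]                   # all values for one key
part_a, part_b = values[:3], values[3:]  # as seen by two combiners

mean = lambda xs: sum(xs) / len(xs)

print(mean(values))                        # 4.0: the true mean
print(mean([mean(part_a), mean(part_b)]))  # 6.0: mean of means is wrong

# Sum and max commute with partial aggregation, so Reducers A and B are safe:
print(sum([sum(part_a), sum(part_b)]) == sum(values))  # True
print(max(max(part_a), max(part_b)) == max(values))    # True
```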
32. What are the ways to debug a failed MapReduce job?
Commonly there are two ways:
Using MapReduce job counters.
Using the YARN Web UI to look into syslogs for actual error messages or status.
33. What is the importance of heartbeats in the HDFS/MapReduce framework?
A heartbeat in a master/slave architecture is a signal indicating that a node is alive. A datanode sends heartbeats to the namenode, and node managers send their heartbeats to resource managers, to tell the master node that they are still alive. If the namenode or resource manager does not receive a heartbeat from a slave node, it will decide that there is some problem in the data node or node manager and that it is unable to perform the assigned task; the master (namenode or resource manager) will then reassign the same task to other live nodes.
34. Can we rename the output file?
Yes, we can rename the output file by implementing the multiple format output class.
35. What are the default input and output file formats in MapReduce jobs?
If the input or output file formats are not specified, the default file input and output formats are text files.
MAPREDUCE Questions and Answers pdf Download
0 notes
thefeatherquill · 6 years
Photo
Never thought I’d have my heart broken. Never thought I was the type. Figured I could brush it off, laugh it up and drink it away. That wasn’t the way it went. It was like a fire storm, just anguish and fury. Funny thing is, the only one I could blame was me. At least, that was how it was at first. Slowly I start to see back through the looking glass. Saw the truth for what it was…what it is. There was blame to go around for all parties. More than enough to share. It didn’t really matter though, not in the end. Never thought I’d have my heart broken. Never thought I’d get back to who I really am. Figured I knew who I was, but that wasn’t who I am. --Skald #poetry #poem #longwrite #read #bringontheink #inkinmyveins #writersunite #written #writeordie #poet #writer #literature #thefeatherquill #writerscreed #spilledink #craft
2 notes · View notes
deepblueruin · 8 years
Photo
BOMBAY TIMES, WE NEED TO TALK ABOUT YOUR SHITTY ROMANCE ADVICE.

The '90s are over (thank god). And girls need no longer wait for pooranmashi ki raat wala Valentine's Day to meet their janam janam ka saathi while sighing around a bunch of candles. But Bombay Times doesn't know that. So, like a retiree uncle who cares too much, BT continues to give bad romance advice - in this case, things to keep in mind if you are texting him back... Don't worry BT, I fixed it for you.

FIRST.
It's okay to text first, irrespective of what organ lies between your legs. Here's a rule of thumb when it comes to texting someone: Do you want to text them? Yes? Will texting them right now count as harassment? No? Then text them.

CHANGE YOUR STATUS WHENEVER THE FUCK YOU FEEL LIKE.
If a human bases his feelings for you on whether you change your WhatsApp status from "banana ice-cream is bae" to "Bhai roxxx", let him go. Can't go around trusting someone who has no appreciation for two of the finest things in life (banana ice cream and Bhai).

SAY WHATEVER YOU THINK.
Like whatever. You like making music with burp sounds? Say it. If he feels the same way, but about fart sounds, girl, you got a band.

USE HIS OWN LINES ON HIM.
But go two steps ahead and coat them in extra cheese. You don't need a man who can't stand cheese.

THE HEART EMOJI IS NOT AN ACTUAL HEART.
Throw it around as liberally as you want. But remember, consensual heart emoji exchange is everything. Ask him if he wants a heart. He does? 😘❤️💛💚💙💜💖

KEEP IT TERSE.
Why spend time typing out long texts when you could be out talking over coffee instead?

EVERY PUNCTUATION MARK COUNTS.
Otherwise a simple sentence like "Come with me" can become totally ridden with innuendo.

DO NOT REPLY TOO SOON OR TOO LATE.
No, keep an egg timer close at hand so you can put all work aside and monitor the exact moment the conversation is hard boiled enough to hit send.
3 notes · View notes
ai-news · 1 month
Link
Long-context LLMs require sufficient context windows for complex tasks, akin to human working memory. Research focuses on extending context length, enabling better handling of longer content. Zero-shot methods and fine-tuning enhance memory capacity #AI #ML #Automation
0 notes
siva3155 · 5 years
Text
300+ TOP MAPREDUCE Interview Questions and Answers
MAPREDUCE Interview Questions for Freshers and Experienced:
1. What is MapReduce?
It is a framework, or a programming model, used for processing large data sets over clusters of computers using distributed programming.

2. What are 'maps' and 'reduces'?
'Maps' and 'reduces' are the two phases of solving a query in HDFS. 'Map' is responsible for reading data from the input location and, based on the input type, generating a key-value pair, that is, an intermediate output on the local machine. 'Reduce' is responsible for processing the intermediate output received from the mapper and generating the final output.

3. What are the four basic parameters of a mapper?
The four basic parameters of a mapper are LongWritable, Text, Text, and IntWritable. The first two represent the input parameters and the last two represent the intermediate output parameters.

4. What are the four basic parameters of a reducer?
The four basic parameters of a reducer are Text, IntWritable, Text, and IntWritable. The first two represent the intermediate output parameters and the last two represent the final output parameters.

5. What do the master class and the output class do?
The master class is defined to update the master (the job tracker), and the output class is defined to write data to the output location.

6. What is the input type/format in MapReduce by default?
By default, the input type in MapReduce is 'text'.

7. Is it mandatory to set the input and output type/format in MapReduce?
No, it is not mandatory. By default, the cluster takes both the input and the output type as 'text'.

8. What does the text input format do?
In text input format, each line produces a line offset, which is a hexadecimal number. The key is the line offset and the value is the whole line of text. This is how the data gets processed by a mapper: the mapper receives the key as a LongWritable parameter and the value as a Text parameter.

9. What does the JobConf class do?
MapReduce needs to logically separate different jobs running on the same cluster. The JobConf class helps with job-level settings such as declaring a job in a real environment. It is recommended that the job name be descriptive and represent the type of job being executed.

10. What does conf.setMapperClass do?
conf.setMapperClass sets the mapper class and everything related to the map job, such as reading the data and generating a key-value pair out of the mapper.
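Questions 1-10 map directly onto the classic word-count job. The sketch below is a minimal, self-contained version of it; it uses the newer org.apache.hadoop.mapreduce API (Job) rather than the older JobConf API named in question 9, and the class and job names are illustrative, not from the original article.

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // Mapper parameters match question 3: LongWritable, Text in; Text, IntWritable out.
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            StringTokenizer it = new StringTokenizer(line.toString());
            while (it.hasMoreTokens()) {
                word.set(it.nextToken());
                context.write(word, ONE); // emit the intermediate key-value pair
            }
        }
    }

    // Reducer parameters match question 4: Text, IntWritable in; Text, IntWritable out.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count"); // descriptive job name (question 9)
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);         // question 10
        job.setCombinerClass(SumReducer.class);        // a sum is safe as a combiner
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```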
MAPREDUCE Interview Questions

11. What do sorting and shuffling do?
Sorting and shuffling are responsible for creating a unique key and a list of values. Bringing similar keys together at one location is known as sorting, and the process by which the intermediate output of the mapper is sorted and sent across to the reducers is known as shuffling.

12. What does a split do?
Before the data is transferred from its hard disk location to the map method, there is a phase called the split method. The split method pulls a block of data from HDFS into the framework. The Split class does not write anything; it reads data from the block and passes it to the mapper. By default, the split is handled by the framework, and the split size is equal to the block size; splits are used to divide a block into a bunch of splits.

13. How can we change the split size if our commodity hardware has less storage space?
If our commodity hardware has less storage space, we can change the split size by writing a custom splitter. Hadoop offers this customization feature, which can be invoked from the main method.

14. What does a MapReduce partitioner do?
A MapReduce partitioner makes sure that all the values for a single key go to the same reducer, thus allowing even distribution of the map output over the reducers. It redirects the mapper output to the reducer by determining which reducer is responsible for a particular key (see the sketch after question 21).

15. How is Hadoop different from other data processing tools?
In Hadoop, based on your requirements, you can increase or decrease the number of mappers without worrying about the volume of data to be processed. This is the beauty of parallel processing, in contrast to other available data processing tools.

16. Can we rename the output file?
Yes, we can rename the output file by implementing a multiple-outputs format class.

17. Why can we not do aggregation (addition) in a mapper? Why do we require a reducer for that?
We cannot do aggregation (addition) in a mapper because sorting is not done in a mapper; sorting happens only on the reducer side. Mapper initialization depends on each input split, so while doing aggregation we would lose the value of the previous instance: for each row, a new mapper gets initialized, and we have no track of the previous row's value.

18. What is Streaming?
Streaming is a feature of the Hadoop framework that allows us to program with MapReduce in any language that can accept standard input and produce standard output; it could be Perl, Python, or Ruby, and not necessarily Java. However, customization in MapReduce can only be done using Java and not any other programming language.

19. What is a Combiner?
A combiner is a mini-reducer that performs the local reduce task. It receives the input from the mapper on a particular node and sends the output to the reducer. Combiners help enhance the efficiency of MapReduce by reducing the quantum of data that needs to be sent to the reducers.

20. What happens in a TextInputFormat?
In TextInputFormat, each line in the text file is a record. The key is the byte offset of the line and the value is the content of the line. For instance, key: LongWritable, value: Text.

21. What do you know about KeyValueTextInputFormat?
In KeyValueTextInputFormat, each line in the text file is a record. The first separator character divides each line: everything before the separator is the key and everything after it is the value. For instance, key: Text, value: Text.
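As a concrete illustration of question 14, here is a minimal custom partitioner sketch. The class name and the partition-by-first-letter rule are invented for illustration; the only requirement the framework imposes is that getPartition returns the same partition number for every occurrence of a given key.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Sends every value for a given key to the same reducer by computing
// a deterministic partition number from the key (here: its first character).
public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {

    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        String k = key.toString();
        char first = k.isEmpty() ? '\0' : Character.toLowerCase(k.charAt(0));
        // Mask to keep the result non-negative before taking the modulus.
        return (first & Integer.MAX_VALUE) % numPartitions;
    }
}
```

It would be registered on the job with job.setPartitionerClass(FirstLetterPartitioner.class); left unset, Hadoop falls back to HashPartitioner, which hashes the entire key.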
22. What do you know about SequenceFileInputFormat?
SequenceFileInputFormat is an input format for reading sequence files. Key and value are user defined. It is a specific compressed binary file format optimized for passing data from the output of one MapReduce job to the input of another MapReduce job.

23. What do you know about NLineInputFormat?
NLineInputFormat splits 'n' lines of input as one split.

24. What is the difference between an HDFS block and an input split?
An HDFS block is the physical division of the data, and an input split is the logical division of the data.

25. After a restart of the namenode, MapReduce jobs that worked fine before the restart started failing. What could be wrong?
The cluster could be in safe mode after the restart of the namenode. The administrator needs to wait for the namenode to exit safe mode before restarting the jobs. This is a very common mistake by Hadoop administrators.

26. What do you always have to specify for a MapReduce job?
(a) The classes for the mapper and reducer. (b) The classes for the mapper, reducer, and combiner. (c) The classes for the mapper, reducer, partitioner, and combiner. (d) None; all classes have default implementations.
Correct answer: (d). Every class has a default implementation, so strictly speaking none of them must be specified.

27. How many times will a combiner be executed?
(a) At least once. (b) Zero or one times. (c) Zero, one, or many times. (d) It's configurable.
Correct answer: (c). The framework may invoke the combiner zero, one, or many times, so a combiner must never be required for correctness.

28. You have a mapper that produces an integer value for each key, and the following set of reduce operations:
Reducer A: outputs the sum of the set of integer values.
Reducer B: outputs the maximum of the set of values.
Reducer C: outputs the mean of the set of values.
Reducer D: outputs the difference between the largest and smallest values in the set.

29. Which of these reduce operations could safely be used as a combiner?
(a) All of them. (b) A and B. (c) A, B, and D. (d) C and D. (e) None of them.
Correct answer: (b).
Explanation: Reducer C cannot be used because, if such a reduction were to occur, the final reducer could receive from the combiner a series of means with no knowledge of how many items were used to generate them, making the overall mean impossible to calculate. Reducer D is subtle: the individual tasks of selecting a maximum or minimum are safe as combiner operations, but if the goal is to determine the overall difference between the maximum and minimum values for each key, this does not work. A combiner that received values clustered around the maximum would produce small differences, and similarly for one receiving values near the minimum. These sub-ranges have little value in isolation, and again the final reducer cannot construct the desired result.

30. What is an Uber task in YARN?
If a job is small, the application master may choose to run its tasks in the same JVM as itself, judging that the overhead of allocating new containers and running the tasks in them outweighs the gain of running them in parallel, compared to running them sequentially on one node. (This differs from MapReduce 1, where small jobs are never run on a single tasktracker.) Such a job is said to be uberized, or run as an Uber task.

31. How do you configure Uber tasks?
By default, a job that has fewer than 10 mappers, only one reducer, and an input size less than the size of one HDFS block is considered a small job. These values may be changed for a job by setting mapreduce.job.ubertask.maxmaps, mapreduce.job.ubertask.maxreduces, and mapreduce.job.ubertask.maxbytes. It is also possible to disable Uber tasks entirely by setting mapreduce.job.ubertask.enable to false.
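The Uber task properties from question 31 are plain configuration keys, so they can be set programmatically as well as in mapred-site.xml. A minimal sketch, assuming the numeric thresholds below are illustrative rather than recommended values:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class UberTaskConfig {
    public static Job newUberFriendlyJob() throws Exception {
        Configuration conf = new Configuration();
        // Allow uberization; set to false to disable Uber tasks entirely (question 31).
        conf.setBoolean("mapreduce.job.ubertask.enable", true);
        // Thresholds below which a job counts as "small" enough to uberize.
        conf.setInt("mapreduce.job.ubertask.maxmaps", 9);
        conf.setInt("mapreduce.job.ubertask.maxreduces", 1);
        // maxbytes defaults to one HDFS block; 128 MB here is an illustrative value.
        conf.setLong("mapreduce.job.ubertask.maxbytes", 128L * 1024 * 1024);
        return Job.getInstance(conf, "uber-friendly-job");
    }
}
```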
32. What are the ways to debug a failed MapReduce job?
Commonly there are two ways: by using MapReduce job counters, and by using the YARN Web UI to look into the syslogs for the actual error messages or status.

33. What is the importance of heartbeats in the HDFS/MapReduce framework?
A heartbeat in a master/slave architecture is a signal indicating that a node is alive. A datanode sends heartbeats to the namenode, and node managers send heartbeats to the resource manager, to tell the master node that they are still alive. If the namenode or resource manager does not receive a heartbeat from a slave node, it concludes that there is a problem with that datanode or node manager and that it cannot perform the assigned task, and the master (namenode or resource manager) reassigns the same task to other live nodes.

34. Can we rename the output file?
Yes, we can rename the output file by implementing a multiple-outputs format class (see the sketch below).

35. What are the default input and output file formats in MapReduce jobs?
If the input or output file formats are not specified, the defaults are text files.
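For questions 16 and 34, Hadoop's MultipleOutputs class lets a reducer write files under a custom base name instead of the default part-r-NNNNN. A minimal sketch, assuming a word-count-style reducer; the "results" base name is an invented example:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Writes output files named results-r-00000 and so on, instead of part-r-00000.
public class RenamingReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private MultipleOutputs<Text, IntWritable> out;

    @Override
    protected void setup(Context context) {
        out = new MultipleOutputs<>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        // The third argument is the base output path/name for the file.
        out.write(key, new IntWritable(sum), "results");
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        out.close(); // flush and close all opened outputs
    }
}
```

To avoid also producing empty default part files, the job can additionally be configured with LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class).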
0 notes
deepblueruin · 10 years
Text
I'm thinking of life with Takeshi's Castle voice-overs. Of the idea that every time you fall or bump something in an embarrassing way, a voice is narrating your every movement and enjoying your every misfortune.
This is what the gods watch for television, isn't it?
0 notes