Tumgik
#blameless postmortem
i forget if i’ve posted about this before. stop me
blameless postmortem culture has a lot to offer, but other people explain that plenty. here’s the catch: it only works if these two conditions are met:
1. everyone involved is doing their earnest best (or at least, meeting the effort expectations agreed in the team)
2. everyone involved is working toward the same set of goals
if either of these conditions is not met, you have a problem. if the root cause boils down to “jimmy didn’t want to deal with it so he didn’t”, unfortunately that’s a people problem. you may be able to engineer it a little bit, but you can never really prevent it.
if the root cause is “someone or some team was working toward a different goal from the rest of us”, that’s either a communication problem (benign) or a people problem (malicious). in the benign case you can engineer better communication models and depend on people Doing Their Best to prevent the problem. in the malicious case, you can attempt to limit the impact of a trusted adversary…but generally at great cost to productivity, which really means the adversary wins anyways.
now that i’m looking at it, this really condenses down to just one idea, since you could say that doing your best toward a counterproductive goal on purpose is simply not doing your best in context. but yeah. if your RCA reaches “so and so chose to do y instead of x” and the next “why” comes up with “because they don’t care about the success of the project”, you really can’t engineer that away.
welikeonion1 · 1 year
I like to read Faraday’s postmortem journal whenever I’m feeling sad. The “Where are you, my dear Marie?” kills me every time. And if you listen to it on Audible it’s even sadder. When he said “such elaborate ideas those mortals had!” it sounded like he was breaking down. Ik it doesn’t appear that way when you read it but it certainly sounds like it on Audible.
“I am far from perfect and far from blameless,” “I’ve made many a selfish mistake in my time.” Hmm, sounds like Faraday feels responsible for their deaths even though Curie would’ve died no matter what.
Xenocrates always would’ve been promoted and Goddard would’ve always wanted Curie out of the way so that he could run unopposed. So she would’ve died a far worse death if Faraday hadn’t intervened. At least she got to die on her own terms. I feel bad that Faraday feels guilty for something out of his control (I think Faraday would’ve definitely gone on a murder streak if she died that way and he figured out who did it). I get him feeling guilty about Citra and Rowan but he clearly grieves more for Curie so it’s sad that he feels responsible.
tech-insides · 3 months
SRE vs DevOps:
In the rapidly evolving landscape of software development and IT operations, two methodologies have emerged as pivotal to ensuring the reliability, scalability, and efficiency of systems: Site Reliability Engineering (SRE) and DevOps. While they share common goals, they approach these goals from different angles, each bringing unique principles and practices to the table. Let's delve into the distinctions and the ways they complement each other.
What is DevOps?
DevOps, a blend of "Development" and "Operations," is a set of practices aimed at bridging the gap between software development and IT operations. It emphasizes collaboration, automation, and continuous integration/continuous deployment (CI/CD) to deliver software more rapidly and reliably. Key principles of DevOps include:
Culture of Collaboration: Breaking down silos between development and operations teams to foster better communication and teamwork.
Automation: Automating repetitive tasks to reduce human error and increase efficiency.
Continuous Integration/Continuous Deployment (CI/CD): Continuously integrating code changes and deploying them to production to ensure fast and reliable delivery of features and fixes.
Monitoring and Feedback: Continuously monitoring systems and gathering feedback to improve the development process and system performance.
What is SRE?
Site Reliability Engineering (SRE), developed by Google, applies software engineering principles to IT operations. The primary goal of SRE is to create scalable and highly reliable software systems. Key principles of SRE include:
Emphasis on Reliability: Ensuring that systems are highly available, resilient, and performant.
Service Level Objectives (SLOs): Defining specific metrics to measure the performance and availability of services.
Error Budgets: Allowing a certain amount of acceptable failure to balance innovation and reliability.
Automation and Engineering: Using software engineering techniques to automate operations tasks and manage infrastructure as code.
Blameless Postmortems: Analyzing incidents without assigning blame to learn from failures and prevent them in the future.
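To make the SLO and error budget ideas above a little more concrete, here is a minimal sketch in Python. It assumes a request-based availability SLO; the `ErrorBudget` class, its field names, and the "only ship risky changes while budget remains" policy are all hypothetical illustrations, not something the post or any particular SRE tool prescribes.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Error budget for a request-based availability SLO (illustrative only)."""

    slo_target: float     # e.g. 0.999 means 99.9% of requests should succeed
    total_requests: int   # requests served in the SLO window
    failed_requests: int  # requests that violated the SLO in that window

    @property
    def allowed_failures(self) -> float:
        # The budget is the share of requests the SLO permits to fail.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def remaining(self) -> float:
        # A negative value means the budget has been overspent.
        return self.allowed_failures - self.failed_requests

    def can_ship_risky_change(self) -> bool:
        # Hypothetical policy: keep releasing only while budget remains.
        return self.remaining > 0


# Made-up numbers, purely for illustration.
budget = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
print(round(budget.allowed_failures), round(budget.remaining), budget.can_ship_risky_change())
```

The point of the sketch is only that an SLO target turns into a countable allowance of failures, which a team can then spend on releases or save for stability work.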
Key Differences Between SRE and DevOps
Origin and Focus:
DevOps: Emerged from the need to improve collaboration between development and operations teams. Focuses on the entire software lifecycle, from development to deployment and operations.
SRE: Originated at Google with a strong emphasis on reliability and scalability. Focuses primarily on the reliability and operability of services.
Approach to Reliability:
DevOps: Aims for continuous improvement and rapid delivery, sometimes at the expense of reliability.
SRE: Balances reliability and innovation using error budgets and SLOs to ensure that reliability is a top priority.
Team Structure:
DevOps: Often involves cross-functional teams where developers and operations personnel work together throughout the software development lifecycle.
SRE: Typically consists of specialized engineers who apply software engineering principles to operations tasks.
Automation and Tooling:
DevOps: Focuses on automating the software delivery pipeline, including CI/CD, testing, and deployment.
SRE: Extends automation to operational tasks, such as incident response, scaling, and system management.
How SRE and DevOps Complement Each Other
While SRE and DevOps have distinct focuses, they are not mutually exclusive. In fact, they complement each other in several ways:
Shared Goals: Both aim to improve the delivery and reliability of software systems.
Cultural Alignment: Both promote a culture of collaboration, continuous improvement, and shared responsibility.
Automation: Both emphasize the importance of automation, although in different areas (DevOps in CI/CD, SRE in operations).
Feedback Loops: Both use monitoring and feedback to drive improvements in systems and processes.
Conclusion
In the debate of SRE vs DevOps, it's important to recognize that both methodologies bring valuable practices and principles to the table. Organizations can benefit from adopting both approaches, using DevOps to streamline development and deployment processes, and SRE to ensure that these processes result in reliable and scalable systems. By understanding and leveraging the strengths of each, businesses can achieve a balanced approach to innovation and reliability, ultimately delivering better software and services to their users.
wicultyls · 8 months
DevOps Certification and Chaos Engineering: Testing System Resilience
In today's rapidly evolving technological landscape, where businesses are constantly striving for faster delivery and higher quality of software products, DevOps has emerged as a crucial methodology. DevOps practices emphasize collaboration, automation, and integration between software development and IT operations teams to deliver reliable software solutions efficiently. With the increasing adoption of DevOps, the demand for skilled professionals who can effectively implement DevOps practices is also on the rise. This has led to the proliferation of DevOps certification programs aimed at validating individuals' expertise in various aspects of DevOps.
However, while DevOps certification equips professionals with essential knowledge and skills to streamline software delivery processes, it's equally important to ensure the resilience and reliability of systems under varying conditions. This is where Chaos Engineering comes into play. Chaos Engineering is a discipline that advocates for intentionally injecting failure into systems to proactively identify weaknesses and enhance resilience. By simulating real-world failures, Chaos Engineering enables organizations to uncover vulnerabilities and build more robust systems that can withstand unexpected disruptions.
Combining DevOps certification with Chaos Engineering practices can significantly enhance an organization's ability to deliver resilient software solutions. Here's how:
Comprehensive Skillset: DevOps certification programs typically cover a broad range of topics, including continuous integration, continuous delivery, infrastructure as code, and automated testing. By adding Chaos Engineering principles to the mix, professionals gain a more comprehensive understanding of system behavior under stress and learn how to design systems that are resilient to failures.
Proactive Problem-Solving: Traditional testing approaches often focus on validating expected behavior under ideal conditions. However, in today's complex distributed systems, failures are inevitable. Chaos Engineering encourages a shift in mindset towards proactive problem-solving by deliberately introducing failures and observing system responses. This proactive approach helps teams identify weaknesses before they manifest in production, reducing the likelihood of costly outages.
Cultural Transformation: DevOps is not just about tools and practices; it's also about fostering a culture of collaboration, experimentation, and continuous improvement. Similarly, Chaos Engineering promotes a culture of resilience by encouraging teams to embrace failure as an opportunity for learning and growth. By integrating DevOps and Chaos Engineering principles, organizations can nurture a culture that values resilience as a shared responsibility across development, operations, and QA teams.
Continuous Validation: DevOps promotes the idea of continuous integration and continuous delivery (CI/CD), where code changes are automatically built, tested, and deployed. Incorporating Chaos Engineering into CI/CD pipelines allows teams to continuously validate system resilience alongside functional correctness. This ensures that resilience is not an afterthought but an integral part of the software delivery process.
Improved Incident Response: Despite best efforts, failures can still occur in production environments. However, organizations equipped with both DevOps and Chaos Engineering practices are better prepared to handle incidents effectively. DevOps principles such as blameless postmortems and automated incident response workflows, combined with Chaos Engineering's emphasis on understanding failure modes, enable teams to quickly diagnose issues, mitigate impact, and implement preventive measures.
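As a rough illustration of the "inject failure and observe the response" idea described above, the following Python sketch runs a tiny, repeatable fault-injection experiment against a retry-with-fallback function. Everything in it is a hypothetical stand-in: `FlakyDependency`, `fetch_with_retry`, and `run_experiment` are made-up names, and a real chaos experiment would target actual infrastructure rather than an in-process fake.

```python
import random


class FlakyDependency:
    """Stand-in for a downstream service; fails a fixed fraction of calls."""

    def __init__(self, failure_rate: float, seed: int = 0) -> None:
        self.failure_rate = failure_rate
        self._rng = random.Random(seed)  # seeded so the experiment is repeatable

    def fetch(self) -> str:
        if self._rng.random() < self.failure_rate:
            raise ConnectionError("injected failure")
        return "ok"


def fetch_with_retry(dep: FlakyDependency, attempts: int = 3, fallback: str = "cached") -> str:
    """Code under test: retry a few times, then degrade to a fallback value."""
    for _ in range(attempts):
        try:
            return dep.fetch()
        except ConnectionError:
            continue
    return fallback


def run_experiment(failure_rate: float, calls: int = 1000) -> float:
    """Inject failures and return the fraction of calls served without the fallback."""
    dep = FlakyDependency(failure_rate)
    served = sum(fetch_with_retry(dep) == "ok" for _ in range(calls))
    return served / calls


if __name__ == "__main__":
    for rate in (0.0, 0.3, 1.0):
        print(f"failure_rate={rate}: served fresh {run_experiment(rate):.1%}")
```

In a CI/CD pipeline, a step along these lines could fail the build when the served fraction drops below whatever threshold the team agrees on; the threshold itself is a judgment call and is deliberately left out here.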
In conclusion, while DevOps certification provides professionals with essential skills for streamlining software delivery, integrating Chaos Engineering principles into DevOps practices enhances system resilience and reliability. By embracing both disciplines, organizations can build a culture of continuous learning, experimentation, and resilience, ultimately delivering better software products that meet the demands of today's dynamic business environment.
What's the Main Objective of SRE?
Site Reliability Engineering (SRE) aims to ensure service reliability and performance. It balances rapid innovation with system stability using tools like error budgets and SLOs. SRE emphasizes automation, blameless postmortems, and continuous improvement, bridging the gap between development and operations. Embark on a transformative journey with our Site Reliability Engineer (SRE) Certification program.
jonfazzaro · 1 year
"No one asks for permission in a bottom-up culture shift, they just start to act in ways they believe are better, and others begin to follow."
bakersimmer · 3 years
Get to know me tag
I was tagged by @majospirina @justasaltyllama @simmanatti thanks! 💞
rules: answer the questions and tag 9 people you want to get to know better
favourite colour: Forest green, black, wine red
currently reading: Article about Blameless Postmortem Culture
last song you listened to: R3HAB & GATTÜSO - Creep
last series you watched: I don't care much about TV anymore.
sweet, spicy or savoury: Savoury and spicy
craving: G&T
tea or coffee: Coffee
currently working on: What's going to happen on the bridge, but my brain isn't cooperating.
Most of you have already answered these questions, I think. So I'm not going to tag anybody. 
kcwcommentary · 5 years
VLD8x09 – “Knights of Light Part 1”
The first time I watched this episode, I spent most of it being nothing but confused. This time, there is still a lot that confuses me, but I’m able to pin down what does. This episode retcons what has previously been depicted as the Lions’ consciousness. Now, the void that Shiro spent seasons three through six in is no longer the Black Lion’s consciousness. It’s just a miscellaneous realm of connectivity between the minds of everyone in the universe. Locations inside Honerva’s mind are now animated in the same style as what was previously described as the Black Lion’s consciousness. The way universal consciousness is depicted in this episode is so uncontrolled that the Paladins arrive at Honerva’s mind twice during their attempt to get to her, or, if the first mind they get to wasn’t Honerva’s, then it belonged to someone who isn’t identified. They get pulled into Honerva’s mind, but then the things characters say indicate that they’re not in Honerva’s mind. Basically, this episode’s production couldn’t keep its locations clear. Maybe that’s just a problem with writing disembodied nebulousness like this: you lose sense of the logistics of the action.
I’m also super annoyed that this episode now adds a second level of blamelessness to Honerva and Zarkon. The show has long been pushing the idea that neither of them is to blame for their actions because they were externally influenced by being poisoned by quintessence. But now, this episode says that they weren’t to blame because they were possessed by rift entities. I am not okay with a story telling me that it’s not the fault of abusive, torturous, genocidal dictators that they abuse, torture, and commit genocide.
The episode starts with Allura having another dream, floating in darkness.  There’s a voice calling her name, I think it’s Honerva? Allura wakes up, Coran and Lance by her bed in the medical bay. Coran asks her about the entity, and Allura says, “I did what needed to be done. […] This entity, it is connected to Honerva in some way. I believe we can use it.” Coran goes patriarchal, saying, “I swore to your father I would look after you, but I fear I may have let him down.” Because how dare a woman make her own decisions.
Coran says, “This is the path of darkness.” We get lots of supposedly ominous statements like this in this part of the story, but none of it really amounts to anything. It’s certainly not foreshadowing, nor is it used as set-up to be undermined. It’s not as if the story is having it seem like the rift entity is dangerous only to reveal, as I suggested last commentary as having been a potential better story, that the entity isn’t dangerous, just upset at being imprisoned and hurt by Honerva. Nothing like that happens in this story. It’s just the entity is miscellaneously dangerous, and then the danger to Allura never happens. 
I still think the way gravity works on the Atlas is odd. Shiro floats through a hallway up to a door, and once at the door, he steps down onto the floor. There’s no gravity in the center of the hallway, but there’s gravity along the sides of the hallway. It comes off to me as nonsense. Shiro opens the door and inside are the Paladins and Coran. “You wanted to see me?” Shiro asks. This again emphasizes how Shiro is not part of the team. The rest of them had this big conversation without including him in it until now.
Keith says, “We think we might have a way to find Honerva.” Allura tells Shiro, “The entity has bonded me to Honerva. The link is there whether we use it or not.” I guess this is just more miscellaneous, unexplained space magic. The entity isn’t really defined in this show. We’ve seen the entities be aggressive in the past, and we’ve seen one merge with others of its kind to fight. That was presented as being a threat (3x07 “The Legend Begins”). Now, we’ve seen it having been inside Tova, but precisely what it did in him wasn’t explained. I guess maybe the show is saying the entity is just there to let Honerva control and kill the Colony Alteans? That doesn’t make the entity really seem inherently threatening in and of itself though. That’s just using the entity as a mechanism to say this is how Honerva can control people. That’s Honerva, not the entity, being threatening.
Allura says (and Lance looks super angry while she speaks), “I believe if the Paladins connect using the shared consciousness of Voltron, we may be able to travel through the void and into Honerva’s mind.”
I have a huge problem with this. This feels like a massive retcon of what “the void” has been shown as in the past. This is connected back to 5x03 “Postmortem,” in which all five Paladins used their bayards in Voltron. In that episode, Voltron is being attacked by a plant-virus and Pidge says, “Listen, this virus is affecting Voltron on a submolecular level. To drive it out, we have to tap into the quantum energy that binds us all to Voltron.” Allura then responds, “The bayards, they amplify each Paladin’s life force. They might provide enough power to drive out the virus.” The five of them put their bayards in their respective Lions’ bayard slots, and then they all appear within “the void.” The void has, in “Postmortem” and in every instance that has dealt with Shiro and the Black Lion’s bond, like 2x07 “Space Mall” and 6x06 “All Good Things,” been about the Paladins and Voltron. The void has not been some otherworldly location separate from the Lions/Voltron. There is absolutely no reason whatsoever for the void to now be connected to Honerva. Honerva is not a Paladin. She has never been a Paladin. The psychic space of the Lions/Voltron should have absolutely nothing to do with her.
I can totally understand why people would see this episode and, on this alone, think that this is evidence of a massive change in the season’s plot really late into the season’s production. I can see how there’s a suggestion here of a revised story where originally the Paladins went into the psychic space of Voltron having something to do with Shiro and his having been stored by the Black Lion in her psychic space. If the void is now just some nebulous otherworld that does not belong to the Lions, then it completely erases the previously stated fact that the Black Lion kept Shiro in its consciousness. But the show’s use of the void in the past has been fairly explicit that that is what was happening. So, this episode contradicts severely with past episodes.
That’s not to say that the executive producers and writers of this show would have any problem with writing this story to contradict previously written story. They don’t seem to have any real desire to maintain continuity and consistency in the writing for this show. Since most of the plot of this series seems like the EPs and writers were mostly just winging it as they went along, I don’t think they cared that this violates what’s been established in the story. Sometimes, I honestly don’t think that Joaquim Dos Santos and Lauren Montgomery were even interested in telling a story. I think they just wanted to do story-less animation. So many times, this show feels like they were trying to do nothing more than to make some Voltron: Defender of the Universe AU fanart. They certainly did not construct a coherent, cohesive narrative. Maybe that’s part of what angers me so much about VLD. I’m a writer. I love storytelling. I love animation, but I come to animation as being a medium through which a story can be told. JDS and LM saw this project as animation first and foremost, and storytelling was only secondary, at best. The story of this show was only something they were forced to do in order to be allowed to make animation. I don’t think they really cared about telling a story.
Back to this episode. Pidge says, “That could in theory give us access to her physical location as well as key information on how to defeat her.” I’m still stuck on how they would see entering Voltron’s psychic space as having anything to do with Honerva. This just feels forced.
Keith says, “Honerva is capable of creating galactic komars, wormholes, Robeasts,” I know this show doesn’t do logic, but Honerva’s “galactic komar” was dependent on the Robeasts. How many Robeasts did she create from the statues at Oriande? I know, the show doesn’t show us so that it can have an endless supply for whenever the writers want to pull a new one out of the bag and not have to keep track of the logistics of them all. All of the Robeasts used in 8x06 “Genesis” should be out of the story. Oriande and the white hole went boom, so none of them there should have survived (though since Merla pops up again, I guess they did). The ones used as part of this “galactic komar” though have been found; that’s where four of the six Colony Alteans now onboard the Atlas came from. So, did the Atlas just leave those Robeasts lying on those respective planets? Are they just sitting there waiting for Honerva to reclaim them? Otherwise, how is the “galactic komar” still a threat since it was built out of tech (the mechas and the Olkari cubes) that is destroyed or no longer functioning?
Keith continues, “And now, Lotor and his mech are out there somewhere.” Again, this show presents this story as if Lotor is actually alive, but we’ll soon be shown that he’s a melted corpse. It does make the season feel like it was re-edited. If it wasn’t re-edited, then this is absolutely the creative team for this show repetitiously manipulating the audience. If they hadn’t already proven themselves to be more than willing to engage in audience manipulation, especially when it comes to Lotor’s part in the show’s story, then I might be more willing to give more weight to the possibility that this was re-edited. I can see JDS, LM, and the writers in a meeting saying to write the season’s story to make it look like Lotor is alive only to surprise twist! he’s been dead the whole time. They wouldn’t care about the inconsistency that writing this season that way would have because their goal would be to create the surprise twist! that they seem to think is how you create a story. And it would make sense that they would think that that is how you create a story if creating that story was always secondary, at best, to their goals with this project.
Keith says, “We don’t have any other leads. It might take lifetimes for another opportunity like this to come around.” Are you kidding me? Keith thinks that Honerva is going to sit out and do nothing for “lifetimes?” Remember, they’re saying they need to do this in order to find Honerva, that’s the opportunity of a lifetime: a chance to find her. But if Honerva’s not done doing what she plans to do, then she’ll show up again, thus they will have found her by her just continuing whatever it is she’s doing. This line of dialog is just not written well.
Shiro speaks. “I spent a lot of time in the infinite void.” Yes, you did, and that void was inside the Black Lion, not some otherworld that is connected to Honerva. “And if you face Honerva in the void—” They shouldn’t be able to because how is Honerva able to be in the Lions’/Voltron’s psychic space?
Lance says, “We’re messing with powers we don’t fully understand.” Unfortunately, the writers of this show don’t fully understand those powers either because they didn’t bother to define those powers.
Coran says, “It’s been a long time since it was only the seven of us in a room together.” That’s because the writers didn’t bother to continue to write you guys as the main characters.
The Paladins get in their Lions. Allura has a flash of Honerva’s face and screams. She then has a vision of Honerva and Merla floating in a hallway. It makes me think of how the Shiro-clone had visions of Honerva going to Oriande in 6x01 “Omega Shield.” Unlike with Allura, it couldn’t be that the clone had a rift entity in him. I’m still wondering why exactly the clone was able to see Honerva in that episode. The initial blast of Allura seeing an outline of Honerva’s face made it seem like Honerva was directly trying to access Allura, but then the vision of Honerva floating down a hallway would suggest Honerva was just doing her thing and Allura was eavesdropping. Even within just a couple of seconds, this show seems to contradict itself.
The show then cuts to a total tonal dissonance by having Veronica and Iverson discussing Shiro’s win at arm-wrestling while having a “robot arm.” I really like (the barely included) Curtis in the background turned in his seat and listening to this conversation while having an adorkable smile.
So, Voltron forms, the Paladins do their joint bayard use. The animation team was clearly lazy and just reused a shot from the animation in 5x03 “Postmortem.” You can tell that it’s reused animation because Keith is currently the Black Paladin. Keith, though he has the black bayard, wears red armor. In the animation of the bayards being used here in “Knights of Light Part 1,” the user of the black bayard has black armor, just like he did in “Postmortem.” So, in this particular shot here in “Knights of Light Part 1,” that’s Shiro’s arm, not Keith’s.
Everything glows and then turns dark. Then the Paladins enter the void. Allura declares that they “must travel through that light.” Generic, but okay. She says, “The entity draws me toward it.” That implies that either the entity has some agency or that for some reason the entity is unwillingly attracted toward Honerva. (I really hate that to continue to discuss this episode, I have to accept the retcon that the Lions’/Voltron’s psychic space is not the Lions’ consciousness but some otherworld that Honerva is connected to.)
The Paladins use their suits’ jetpacks to move toward the light. I know it’s a minor matter, but they’re not physically in this location right now; they’re sitting in their Lions. So, why write them to look like they’re using their jetpacks to move through space? They get to the light, which fills the screen, and then dissipates into normal looking space. Lance says, “What is this place? It’s like I can hear what the universe is thinking.” What? This show is going to write the universe itself to be sentient? I imagine that this line really is nothing but the writers thinking they’re being profound when really it’s just them being nonsensical.
Pidge invokes the Olkari having said everything is made of the same energy. Hunk replies, “So, thoughts are linked across some kind of, what, cosmic connection?” I’m so uninterested in this. The psychic space used to seem special, that it was a manifestation of the Lions’ consciousness. Now, the void is just some generic overmind of the universe. The implication of this is that the Black Lion didn’t save Shiro’s spirit at all. If the void is not some psychic space of the Lions’ consciousness but instead is just a universal overmind, then Shiro’s spirit wasn’t being stored in the Black Lion. Shiro’s spirit was just where spirits are.
Allura has another flash headache, and then everyone else does too. They see Honerva and Merla somewhere. It looks like Oriande with how the mechas are standing in the background. Cut to Voltron’s eyes glowing.
Back in the overmind. Allura explains that “that was Honerva. The entity inside of me is connected to her.” This dialog is getting repetitive. Allura says that they now have a psychic link with Honerva. “The closer we are to her, the stronger that link.” But of course, they’re not physically closer to her, so I guess this “closer” is referencing their being psychically closer to her here in the overmind? In which case, this line of dialog becomes a tautology. Lance worries that Honerva will use this link the Paladins now have with Honerva’s mind to find them. Then they will have achieved their goal; remember, they’re doing this to find where Honerva is. Also, Honerva has enough space magic already, I doubt she needs this link to find them.
It feels really weird for Keith to say, “This isn’t just on you now, Princess.” They haven’t interacted with Allura in recognition of her as a princess in a long time. It feels strange hearing him do so now. It actually sounds kind of condescending when he says it. Keith tells them all to miscellaneously focus on Honerva’s energy. They glow. The Lions’ eyes glow. Each Paladin is shown associated with their Lion. And then the Lions are flying through space.
What is happening here? I guess the Lions aren’t actually flying anywhere, that this is somehow now them in the overmind going with the Paladins through the overmind to Honerva’s mind? But then, they fly into the distance and there’s a flash of light and the Lions are back together in Voltron. So, is this actual Voltron or some Voltron in the overmind?
This is a huge part of why I hate this episode: It’s just too damn confusing. I can imagine the creative team had certain specific ideas in mind when they were doing this, but they failed to tell this story clearly, and I don’t know what they were trying to tell.
Voltron looks like it’s travelling through space. There’s even planets and moons. Because Voltron actually travels through space, there’s nothing visually that allows for this to contrast with actual space if this is supposed to be Voltron traveling through the overmind.
Hunk says, “I can feel something, like an energy inside me.” And Allura says, “It’s the entity.” The entity is now in Hunk? In all the Paladins? Keith says, “It’s like a dark realization washing over.” This line does not mean anything. It’s just miscellaneous spew that the writer thinks sounds profound and ominous. It doesn’t though. Pidge says, “It’s like we’re being pulled by a tether connected to our souls.” Again, this is nowhere near as profound as the writer thinks it is. It also places the locus of action on the entity, pulling them, rather than on the Paladins actively traversing through the overmind. It’s sort of depriving the Paladins of agency. It also doesn’t explain why the entity is being drawn toward Honerva.
They come upon what looks like a black hole. Everything goes dark, then everything looks like neurons, which I assume is supposed to be Honerva’s brain. Then Voltron looks like it comes out of a light, like it’s left the neurons behind. It’s so confusing. I’m really trying to understand what this animation is actually depicting, but I don’t really know.
Voltron flies through space some more, and then everything turns dark again. And then Voltron is flying through that darkness with little bits of quintessence? floating off of it. Then Voltron separates into little colored dots. From the wide shot, you can’t tell if they’re the Lions or the Paladins. They float through blackness with more quintessence-bits floating from a glowing event horizon. The camera reorients, and the glowing Paladins land on the black surface.
I don’t know why Voltron was involved in this traversing of the overmind. What did Voltron actually do for the Paladins in this process? And why is Voltron now not part of the process? Why did it poof into just being the Paladins in the overmind again? Also, the animation already had Voltron enter what looked like a black hole, which I assumed was Honerva’s mind, but then they left it, floated through space some more, and now have arrived at what looks like another black hole. So, why was there a first black hole if that wasn’t Honerva’s mind? Whose mind did that first black hole belong to?
They stand on a surface, and Allura says that Honerva’s mind is “on the other side of this wall.” There are streaks of black with streaks of red or orange for eyes moving under the wall. Allura says, “It feels like that these are the souls that Honerva has defeated and corrupted.” Why has the show not shown us that Honerva could corrupt souls before now? (Or is corrupting a soul what she did to the clone to be able to control him? That would make it even more offensive that the Paladins described the clone as evil.) Honerva corrupting souls just comes out of nowhere. There is nothing in the show prior to this that sets-up this reveal. And, if Honerva can explicitly corrupt someone else’s soul, then how did the writers think it’s acceptable for them to expect the audience by the end of the series to view Honerva as absolved of her horrible behavior?
The show here says Honerva can corrupt souls, it has been saying for seasons now that Honerva was corrupted by quintessence poisoning, and in this episode says that Honerva was corrupted by a rift entity. It’s all a mess that does not locate motivation for character action within the character.
Spectral hands come out of the floor-wall and grab Allura first and then all the Paladins. There are a lot of hands, so there are a lot of souls that Honerva has corrupted. The Paladins, with the exception of Keith, are pulled down into the floor-wall, into Honerva’s mind. Why wasn’t Keith pulled in?
First Pidge opens her eyes and the way the place she is looks, she’s in a greenish version of what had been depicted as the Black Lion’s consciousness in 6x06 “All Good Things” when Keith spoke to Shiro’s spirit there. But Allura just said that “on the other side of this wall” was Honerva’s mind. But the visuals of this would suggest that this is the Green Lion’s consciousness. So, how is the Green Lion’s consciousness inside Honerva’s mind?
A green-outlined shadowy person with a polearm weapon appears and attacks Pidge. Cut to a yellowy-orangey area where Hunk is being attacked by a similar figure with a staff. Lance in a red area fighting one with a sword. Allura in blue and fighting one with a bow. Keith meanwhile is still on the surface of Honerva’s mind.
Pidge says, “I can’t even feel my Lion.” I still don’t understand why the animators chose to have the environments the Paladins are fighting in resemble what “All Good Things” presented as the Black Lion’s consciousness if here the Paladins are, as Pidge says, disconnected from their Lions. I don’t know if the executive producers and animators in making this episode either didn’t remember or recognize the significance of this background from when it was used in “All Good Things,” but that’s hard for me to accept as a possibility. Or if they instead thought that assigning meaning to animation like they did for this background style from “All Good Things” just didn’t matter and that they could just port over the background style and ignore the previous meaning.
In Shiro and Keith’s conversation in “All Good Things,” Shiro said, “somehow the Black Lion retained my essence,” to which Keith asked Shiro, “Is that where we are, in the Black Lion’s consciousness?” So, this background style has most definitively been established as being that of the Lions’ consciousness. But now, Pidge says she “can’t even feel [her] Lion.” I just don’t know what to make of this episode. It seems so disconnected from what has come before. No wonder people think this is the result of shattering some original story and reconstituting the pieces into this confusing mess.
Allura’s attacker shoots an arrow at her. She holds out her hand, shadowy wisps float off her hand, and the arrow stops. Her eyes turn black. She screams an inhuman scream. I guess we’re supposed to think this is the rift entity acting through her?
But then glowing, spectral versions of the Lions show up in each of the four’s combat area. So, they’re supposed to be inside Honerva’s mind, though it looks like the visual style previously used to depict a Lion’s consciousness, Keith can’t get inside Honerva’s mind with the other Paladins, but the Lions can enter into Honerva’s mind to join their Paladins? The Lions roar, and the dark wisps are blown by the wind of the roar off of Allura’s eyes. The shadowy figures the Paladins have been fighting have their shadow blown off of them revealing them to be the past Paladins.
So, we’re supposed to understand that Honerva somehow corrupted the souls of the Paladins. How did she corrupt Alfor’s soul? When did she corrupt Alfor’s soul? We saw in “The Legend Begins” that Alfor was killed by Zarkon on a bridge with no one else around them.
Allura’s eyes go dark again and she screams her inhuman scream and attacks Blaytz. The Blue Lion roars again and everything goes white. Then we see Alfor giving the other old Paladins their armor and their bayards. This scene depicts Zarkon pre-quintessence poisoning, and he says, “With this much power, we will be unstoppable.” So, his being a dictatorial, genocidal conqueror was always a part of who he was. He didn’t become bad because of quintessence poisoning. So, the show using the poisoning to excuse his behavior is just offensive. This flashback scene of the old Paladins unintentionally emphasizes Zarkon’s lack of presence in this fight so far, emphasizes Keith’s exclusion from the fight.
Allura stops short of stabbing Blaytz. “It’s really you,” she says. Pidge says, “Your soul, Honerva must have—” and Hunk continues, “—trapped you here somehow.” Of course, that makes me ask, how did Honerva do this? Especially with Alfor? We’ve seen his death. When did Honerva trap the Paladins’ souls inside her own mind? And why has that never been part of the story until now? Honerva has been active in this show’s plot for the whole series. Why is it only now that her having done this is relevant? This has never been a part of anything in the story until now. It clearly was not planned as part of the overall story arc. This really is just coming out of nowhere. And that, beyond this part of the story being super confusing, is why I don’t like it. The idea that the current Paladins would have to fight the old Paladins is a really interesting premise, but there’s been no set-up to lead to this fight. There’s nothing to this that makes it feel like the inevitable outcome of the events prior to now.
Like too much in this show’s story, this is a set piece. It seems conceptualized wholly independently of the story and then the story is what was written to try to force connection between set pieces. It’s why the reveal of the clone and resulting battle between Keith and Shiro had a certain grandeur to it, but there was no reason ever given in the show for why Haggar had hundreds of clones of Shiro made. That fight was a set piece that was forced into the story rather than grown out of the story. Similarly, this fight with the old Paladins doesn’t come out of the story, it’s wedged into it, the story is forced to accommodate it, and that’s why it is so disconnected to what has come before. And in being so disconnected, the moment is deprived of the emotion it could and should have.
The Lions roar once again, the souls of the old Paladins and the backgrounds fracture and light pours out of it all and everything goes to white. All of the development in this conflict comes from the Lions, so if the Lions could clean the Paladins’ souls of corruption, then why have they been sitting there in this fight waiting instead of just doing so from the moment they came into this space?
Cut to a flashback where Alfor, Gyrgan, Trigel, Blaytz, and Coran discuss how Zarkon is going to come for them. Trigel wants to fight Zarkon, but Alfor doesn’t want to risk Zarkon getting the Lions. Alfor says they’ll use the other Lions to seal the Black Lion and then send the other four to where they were until they were found at the beginning of the series. Alfor tells Coran to use the Castle of Lions to take Allura and the Black Lion away. But then, Gyrgan says, “Then it is decided: We go into battle together one last time.” Of course, that doesn’t match Alfor’s having rejected fighting when Trigel suggested it first.
Cleaned of having been corrupted by Honerva, Trigel asks, “Where am I?” Hunk answers Gyrgan, “You’re in the void, just outside of Honerva’s mind.”
One, the backgrounds still look like the Black Lion’s consciousness, but Hunk doesn’t say they’re in their respective Lions’ consciousness, he says they’re “in the void.” But then he says that they’re “just outside of Honerva’s mind.” Keith is, the rest of them are not. The rest of them were dragged under the barrier into Honerva’s mind. This show really cannot keep this straight. So, which is it? They were pulled into Honerva’s mind, but it’s outside her mind. They’re in the void, but it looks like the Lions’ consciousnesses.
Trigel says she’s glad that “someone so connected to the world around her is piloting the Green Lion.” I still don’t think this show has actually shown Pidge to be connected to nature. “My race believes observation to be the most revered attribute.” As I’ve said before in my commentaries, I really don’t like when science fiction writes aliens to be monocultures. Trigel’s race doesn’t believe anything because it was made up of a bunch of different people, who like humanity, all have a bunch of different thoughts and opinions and beliefs, or at least her race should be like that if it was written realistically. Sorry, that’s just a peeve of mine.
Blaytz tells Allura, “People often overlook me because I was,” there’s a very slight hesitation in his voice, “different.” I very much imagine that this is supposed to be a reference to Blaytz being gay. Unfortunately, this phrasing sounds to me like how a straight person would write a gay person to speak. Using the word “different” as a stand-in functions like someone who doesn’t want to get personal, which you have to do to write dialog well. It seems clear to me that the word “different” was chosen to avoid actually, openly talking about Blaytz’s being gay. It makes the character’s experience generic, which causes it to lose emotion and meaning. I can imagine the writer defending this dialog by saying writing it this way gives it universality, but that’s a thought that comes out of unexamined privilege. By excluding the specifics in favor of claimed universality, you deny the cause and content of the discrimination and exclusion you claim to be rejecting, and so cease to be actually rejecting anything.
Blaytz then continues talking about the Blue Lion picking him. “But the Blue Lion recognized something in me, something others couldn’t see. It saw the greatness within that even I did not.” Imagine this dialog but instead of Blaytz talking to Allura, he was talking to Lance. It would be totally fitting in theme as another step in Lance’s struggle to recognize something great within himself that even he could not see. Lance had, until it was abandoned by the writers, serious issues with self-confidence and believing himself to have value as part of the team. Even this episode hints at that abandoned element of Lance’s character back at the beginning during the meeting with Shiro when Lance expresses pleasure that calling him “the sharpshooter” is now something the team does fondly. It would have made so much more sense for Blaytz to have said this to Lance. But instead, he says, “You, Allura, have greatness within you as well.” This isn’t something she needs him to tell her since she hasn’t wrestled with feeling fundamentally worthless, especially once she learned Altean alchemy at Oriande. She has doubted her ability to succeed, but not her inherent worth. “You’re so much like your father, and yet so different.” This line is so cliché and so meaningless.
Lance, with his long history of insecurity, gets nothing from Alfor that supports him as a person. Instead, the show returns to having patriarchy as the governing influence of Lance’s relationship with Allura. “Through the Lion’s bond, I could feel your love for my daughter,” Alfor says. “I could feel yours as well,” Lance says. Why is this show juxtaposing the love of a father for his daughter to the love a guy feels for his girlfriend? It’s just gross. It makes Lauren Montgomery’s claims that feminism has any influence whatsoever in this show seem absolutely absurd. But then, LM thought killing Allura at the end of the series was feminist. I still cannot get over how she considered having Allura reincarnated as an infant, literally infantilizing Allura, and have her then raised by Lance. It’s so creepy.
Alfor tells Lance, “We face many quests throughout the cosmos, but the most amazing journey is that of life. And the biggest question you face is who to go on that journey with. I’m glad my daughter chose you.” So, Lance receives nothing, no personal support from his Paladin-parallel. The others tell their respective Paladins something supportive of them, but all Lance gets is Alfor’s patriarchal approval of Lance as a suitor for Allura. This is such an absolute disservice to Lance’s character.
Keith remains on the surface of Honerva’s mind, unable to get through. Then suddenly the Black Lion is there. Why did it take so long for Black to show up for Keith when the other Lions had long showed up for their Paladins? Keith gets no special moment of affirmation with a previous Paladin. It feels absolutely like he was excluded, but for no reason other than they wanted to save a conflict with Zarkon until later. But if Zarkon was corrupted the same as the other old Paladins, why wouldn’t he have attacked Keith the same as the other old Paladins attacked their current counterparts? Almost as soon as the Black Lion shows up here with Keith on the outside of Honerva’s mind, the other Lions show up here too. The other Paladins, both old and new, show up here on the floor-wall to Honerva’s mind.
Allura runs to and hugs her father. Alfor says, “It is fitting that I would find what is brightest to me in the darkest place.” This feels a bit cliché. Allura says, “All that I have done, I have done to make you proud.” Trust me, I know what it feels like to want to make a parent proud, but this is so limiting. This is Allura defining herself by her father’s acceptance rather than defining herself by her own self. It’s very patriarchal to write a young female character to define herself by her father. I just don’t like it.
There’s a headache flash, and they see Honerva looking at a mecha, one that matches the one Allura saw Lotor piloting in her vision last episode when Lotor said, “Follow me!”
Alfor says to Allura, “You hold a dark entity within you! Don’t you know how dangerous that is!” Thank you so much for your patriarchal judgement, Alfor. Allura is so stupid that she couldn’t possibly know about something being dangerous unless you chastise her for it. Also, ignore the fact that this supposed dangerousness never manifests in the story.
Alfor says, “That’s what led to Honerva and Zarkon’s end!” So, not only were they poisoned by quintessence when they went into the quintessence field in “The Legend Begins,” now the show is saying that Honerva and Zarkon became who they are because they were possessed by a rift entity? The show is really doubling down on absolving Honerva and Zarkon for being abusive, murderous, genocidal dictators for 10,000 years. Not only is it supposed to be not their fault because they were poisoned, now it’s also not their fault because they were possessed. I do not find the-devil-made-me-do-it stories even remotely interesting. This is just the creative team of this show being too cowardly to actually have their villains be villains. They don’t mind retaining their proclamation that Lotor was evil all along because declaring him to be evil was their big, desired plot twist. But they were too scared to commit to having Haggar and Zarkon as villains. For so long, both characters were written as just generic, maniacal villains. Then, I think the creative team thought they were making Zarkon and Honerva be more complex villains by saying they did the horrible things they did because of an external force (quintessence). Now, the show is just creating another way with which they can absolve the characters of the horrible things they did. But it’s not like Honerva and Zarkon just said hurtful things. They abused their son. They enslaved people. They tortured people. They murdered people. They committed genocide, murdering the population of whole planets. This is not something to forgive them for. And it’s offensive that this show is telling us to do so.
Also, if Honerva is supposed to be absolved for her actions because she was being controlled by a rift entity, how is it that the show simultaneously has Honerva affecting others through the rift entity? She was trying to kill Tova last episode through the rift entity in him. So, is the rift entity controlling Honerva, or is she controlling rift entities?
Allura says, “I am not going to be afraid to use the power I have.” She says, “We need to continue,” and someone, Lance, I think, says, “But how do we get past the wall?” Well, the spirits of the old Paladins dragged everyone but Keith beneath the wall earlier, so, try that.
Allura says, “It’s like I can feel her thoughts. The way through is with the darkness.” I don’t know how these two sentences are supposedly connected. Is this the extent of what the entity is doing for Allura, just getting her inside Honerva’s mind?
Alfor says, “Honerva went mad—” it’s annoying that the show equates mental illness with dangerousness in this line of dialog “—obsessed with darkness and power.” She was obsessed with quintessence. So, is the show now calling quintessence “darkness?” I thought quintessence was supposed to be life energy.
Allura does a touch-hand-glow and the surface beneath them glows. A light line streaks through the surface and the geometrics like that which appear around wormholes and like the ones Honerva created on Oriande when she retrieved Sincline appears beneath them all. Everything starts glowing white and they disappear from the surface.
This episode is a mess. It’s ambitious, but it’s so unwieldy. It significantly contradicts previous parts of the show, and that really, really bothers me. I don’t like how it takes what felt special (the idea that the Lions had a psychic space created by their consciousness, and that that is where the Black Lion kept Shiro’s spirit after he died) and now just makes it all be some generic universal overmind.
And I really hate that this show is now saying Honerva and Zarkon’s horrible behavior is the fault of being possessed by rift entities. Before it was the result of being poisoned by quintessence. Did the show forget that they had already assigned one thing to blame for their behavior other than themselves? And that is the ultimate thing that angers me about this. Both quintessence poisoning and rift entity possession are cheats. They reveal a fear of letting villains be villains. They refuse to seriously deal with the reality that these characters are horrible people. By assigning blame to an external source, now two external sources, the show prevents the villains from having to confront their behavior, and the protagonists lose the ability to condemn the behavior as the narrative manifestation of their final confrontation with the villains. The same way that these episodes keep telling us the rift entities are a threat without ever really resolving that, the series, by locating the blame for the villains’ actions in external sources, never resolves the villains’ actions.
DevOps vs SRE : What's The Difference ?
Hi there. I'm Nagarjoon, an SRE at Cloudnow Technology. On this blog I write about how to build and operate reliable services.
Which is better, DevOps or SRE?
But first of all, maybe we should clarify some things. What do you think DevOps is? Back in the day, operators and developers had a lot of contention. Developers used to throw their code over the metaphorical wall, and operators were responsible for keeping that code running in production. Operators had little understanding of the code bases, and developers had little understanding of operational practices. But developers were concerned with shipping code, and operators were concerned with reliability. This misalignment often caused tension within the organization.
So if I understand you correctly, you're saying that the developers were responsible for features, and the operators were responsible for stability, meaning the developers wanted to move faster to get their features out faster and the operators wanted to move slower to keep things stable? I could see how that would cause a lot of tension.
Exactly. So DevOps is a set of practices and a culture designed to break down those barriers between developers, operators, and other parts of the organization. I break DevOps down into five key areas.
First, reduce organizational silos. By breaking down barriers across teams, we can increase collaboration and throughput.
Second, accept failure as normal. Computers are inherently unreliable, so we can't expect perfection. And when we introduce humans into the system, we get even more imperfection.
Third, implement gradual change. Not only are small, incremental changes easier to review, but in the event that a gradual change does introduce a bug in production, it allows us to reduce our mean time to recover, because a small change is simple to roll back.
Fourth, we need to leverage tooling and automation.
Fifth, we need to measure everything.
Measurement is a critical gauge for success. And without a way to measure if our first four pillars were successful, we would have no way of knowing if they were. So, you've been an SRE at Cloudnow Technology for over 10 years now. Do you think any of the way that I described DevOps aligns with your experience as an SRE?
It's sounding very familiar. Because, if you think about DevOps as a philosophy, SRE is a prescriptive way of accomplishing that philosophy. So if DevOps were an interface in a programming language, you might almost say that SRE is a concrete class that implements DevOps.
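To make the interface analogy concrete, here is a minimal sketch in Python. The class and method names are invented for illustration (they are not an official API of anything); the strings simply restate the five areas and the SRE practices discussed in this conversation.

```python
from abc import ABC, abstractmethod


class DevOps(ABC):
    """The five DevOps areas from the conversation, as an abstract interface."""

    @abstractmethod
    def reduce_silos(self) -> str: ...

    @abstractmethod
    def accept_failure_as_normal(self) -> str: ...

    @abstractmethod
    def implement_gradual_change(self) -> str: ...

    @abstractmethod
    def leverage_tooling_and_automation(self) -> str: ...

    @abstractmethod
    def measure_everything(self) -> str: ...


class SRE(DevOps):
    """One concrete, prescriptive way of filling in the interface."""

    def reduce_silos(self) -> str:
        return "share ownership of production with developers"

    def accept_failure_as_normal(self) -> str:
        return "blameless postmortems and an error budget"

    def implement_gradual_change(self) -> str:
        return "canary releases to a small part of the fleet"

    def leverage_tooling_and_automation(self) -> str:
        return "measure toil and automate it away"

    def measure_everything(self) -> str:
        return "measure toil, reliability, and system health"

    # A class may also carry methods that are not part of the interface.
    def define_slos(self) -> str:
        return "service level objectives"


if __name__ == "__main__":
    sre = SRE()
    print(isinstance(sre, DevOps))
    print(sre.accept_failure_as_normal())
```

The abstract base class cannot be instantiated on its own, which mirrors the point being made: DevOps says what to do, and something like SRE has to say how.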
Let's take a look at how that is. When you talked about eliminating organizational silos, what I thought about is the fact that we share ownership of production with our developers. And we use the same tooling in order to make sure everyone has the same view and same approach to working with production. When you talked about accepting accidents and failure as normal, what I thought about is the fact that, similar to many DevOps practitioners, we have blameless postmortems, where we make sure that the failures that happen in our production systems don't happen the exact same way more than once. And we accept the failures as normal by encoding a concept of an error budget of how much the system is allowed to go out of spec. And then third, we talked about making gradual changes. And when you said that, I thought about the fact that we canary things, that we roll things out to a small percentage of the fleet before we move them out for all users. And then fourth, when you talked about leveraging tooling and automation, what I thought about is the fact that we try to eliminate manual work as much as possible. So we measure how much toil we have, and then we try to automate this year's job away. And then fifth, when you talked about measuring everything, I thought about exactly that measurement of measuring the amount of toil that we have and measuring the reliability and health of our systems.
I really like that. Class SRE implements DevOps. We should get that on a shirt or something. But just like a class in a programming language, there might be additional functions or methods that don't necessarily correspond to that interface. Or the class might implement multiple interfaces. Do you think SRE is like that?
I absolutely think that's the case, because SRE doesn't do things in exactly the same way that other people who implement DevOps elsewhere might want to. So we'll talk a little bit more about those differences, such as how exactly SLOs work, which are a very specific concept that we implement in order to make SRE successful.
Great. Well, that settles it, then. It turns out that DevOps and SRE aren't two competing methods, but rather close friends designed to help break down organizational barriers to deliver better software faster. Thank you, everyone, for watching. Please be sure to check the description below for more links, and don't forget to subscribe to our channel. Stay tuned for our next blog, where we will discuss the differences between SLIs, SLOs, and SLAs.
Cloudnow Technologies provides DevOps consulting services to help clients build high-performance, scalable, and agile DevOps technology. Cloudnow Technology is ranked as a top DevOps services company in the USA.
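The error budget mentioned in the conversation is easy to show with a little arithmetic. This is a minimal sketch, assuming a request-based availability target; the function names and the 99.9% figure are illustrative and are not taken from the post.

```python
def error_budget(slo_target: float, total_requests: int) -> int:
    """Number of requests that may fail while the SLO is still met."""
    if not 0.0 <= slo_target <= 1.0:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    return round((1.0 - slo_target) * total_requests)


def budget_remaining(slo_target: float, total_requests: int,
                     failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget(slo_target, total_requests)
    if budget == 0:
        # A 100% target leaves no room for failure at all.
        return 0.0 if failed_requests == 0 else float("-inf")
    return 1.0 - failed_requests / budget


if __name__ == "__main__":
    # Assumed numbers: a 99.9% target over one million requests.
    print(error_budget(0.999, 1_000_000))
    print(budget_remaining(0.999, 1_000_000, 250))
```

A team that tracks the remaining fraction can treat it as a shared signal: plenty left means there is room to ship faster, and an overspent budget argues for slowing down and working on stability.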
ceejbot · 7 years
How to handle an outage
I got some questions on Twitter about npm's incident response process, so here I am blogging yet again. I looked up our internal docs on the topic, and was surprised to notice how terse they are.
Here's the "how to handle an outage" document that's in our operations repo:
How to handle an outage
Take a deep breath. I know we joke that things are on fire, but they're not literally on fire. People can't install javascript packages. We'll fix it. It'll be okay.
The person on PagerDuty should assume point & handle initial investigations.
The point person should make sure that the following two roles are filled: operations, which acts to resolve the outage, and communications, which shares information about the outage to the public & the rest of the company. Escalate operations to a subject matter expert if you aren't qualified to fix the problem yourself, and assume the communications role. If you are the expert, pick somebody else as communicator. (Don't try to do both during a serious outage. It's too stressful.)
In very serious outages or security incidents, there might be a third role, that of coordinator who decides what actions to take next. If you move the discussion from #ops to #incident-response, it's probably serious enough to warrant that third role. Also, if more than one person is acting in operations to fix things, choose a coordinator to avoid collisions or conflicting efforts.
If the outage is visible to users in any way, the communicator should open a statuspage.io incident immediately. Do this even if we don't know anything yet. An open incident gives Support a place to point people for more information.
The communicator should keep the incident updated as we learn more: use the "identified", "monitoring", and "resolved" statuses to let people outside know what to expect.
People who aren't actively involved in the incident should ask their questions of the communicator, not the operations person.
The operations person or people should keep the Slack channel updated with new information as they can.
Serious incidents usually warrant postmortem discussions to figure out how to improve our response next time as well as to note what we did well.
That's it
It's just enough to guide the team to doing three things:
clarify who has initial responsibility
assign further roles so we can coordinate
write a status incident so our users know what's happening
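Those three things can be sketched as a small data structure that reports which of the document's rules are not yet satisfied. This is an illustration only, not npm's tooling: the class, field, and person names are hypothetical, and the "investigating" status is an assumed starting state that the document itself does not name.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Status(Enum):
    INVESTIGATING = "investigating"  # assumed initial state, not named in the doc
    IDENTIFIED = "identified"
    MONITORING = "monitoring"
    RESOLVED = "resolved"


@dataclass
class Incident:
    point: str                                    # whoever is on call
    operations: List[str] = field(default_factory=list)
    communicator: Optional[str] = None
    coordinator: Optional[str] = None
    user_visible: bool = False
    status: Optional[Status] = None               # None: no status incident open

    def gaps(self) -> List[str]:
        """List the rules from the outage document that are not yet met."""
        found = []
        if not self.operations:
            found.append("nobody is assigned to operations")
        if self.communicator is None:
            found.append("nobody is assigned to communications")
        elif self.communicator in self.operations:
            found.append("communicator is also doing operations")
        if len(self.operations) > 1 and self.coordinator is None:
            found.append("several people in operations but no coordinator")
        if self.user_visible and self.status is None:
            found.append("user-visible outage without an open status incident")
        return found


if __name__ == "__main__":
    incident = Incident(point="ana", operations=["ana"], communicator="ana",
                        user_visible=True)
    for gap in incident.gaps():
        print(gap)
```

Nothing here enforces anything. It just makes the checklist explicit, so the point person can see at a glance what is still unassigned.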
In addition, when the outage is more than a trivial one-- e.g., AWS is down or a single point of failure has failed and cannot be restored easily--we make our response more formal. We move all communication to one place, a Slack channel set aside for incident response. In that channel, communications are more stylized. We acknowledge requests and responses, and I take pains to thank people for actions.
Why do I do this? To slow things down.
Yes, I literally slow incident response down. Usually people are stressed during incidents and they're doing things in a rushed way, because they feel the urgency of the problem. Pausing to breathe and think carefully results in better decision-making, and overall faster resolution to the incident. I also think being extra-polite in sticky situations helps us as humans cope with stress. Our users might be yelling at us, but we're not yelling at each other. Instead, we're a kind and thoughtful team, and that's being modeled by the person with the CTO title.
I also think acking communication is a good way to make sure nothing gets lost. SYN/SYN-ACK/ACK is a way of life, you know?
Postmortems follow outages
The less grim word for this is "retrospective", but I seem to be stuck on the autopsy language. This is an important step for any incident or project. Some guidelines we follow:
The postmortem must happen after things have concluded and everyone is calm again.
Focus must be on processes not people. This is what "blameless postmortem" means for us.
Sometimes people become aware during incident response that they've made mistakes or taken down production with some action. I go out of my way to make it clear that this is our collective fault. Our processes let us down by not catching the problem before it could go live. If it's got to be any one person's fault, it's mine for not putting a better safety net in place before it could happen.
"Process" is a scary word that feels heavyweight, but what it really means is "the way you normally do things". You always have process; the question is whether your process is intentionally designed or not. You can and should change your processes when you find better ways to do things. Your processes are there to help you, not to be something you serve mindlessly.
Process is a safety net. Process is what allows us fallible humans to work and make mistakes without regularly blowing things up. It's a safety latch on a gate, a pre-launch checklist, your belayer checking your harness before you start climbing the cliff.
Some examples of processes you might have:
Pull requests are reviewed by at least one other engineer before they're landed.
Deploys get tested in a staging environment before they go to production.
Deploys get tested on a canary with a portion of production traffic before they roll out fully.
Libraries must have 90% (or 100% or something else) code coverage from unit tests.
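As one illustration of the canary item in the list above, here is a minimal sketch of sending a fixed portion of traffic to a canary by hashing a request or user identifier. The function name and the choice of SHA-256 are assumptions made for the example; the post does not describe how npm actually routes canary traffic.

```python
import hashlib


def in_canary(request_id: str, canary_percent: int) -> bool:
    """Deterministically place an identifier in the canary group.

    The same identifier always gets the same answer, so a given user
    does not bounce between the canary and the main fleet.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # bucket in 0..99
    return bucket < canary_percent


if __name__ == "__main__":
    ids = [f"user-{n}" for n in range(10_000)]
    share = sum(in_canary(i, 5) for i in ids) / len(ids)
    print(f"share of traffic on the canary: {share:.1%}")
```

Because the bucket is fixed per identifier, raising the percentage only ever adds identifiers to the canary, which keeps a gradual rollout stable from the user's point of view.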
In our retrospectives, we look at our processes and where they helped us and where they let us down. Did we follow them? If not, why not? What can we change so we can avoid this category of problem next time? What can we change so we can recover even faster from an event that's out of our control?
It's most fascinating to me when we have processes that we ignore. There's always a good reason. Sometimes it's that the process is a pain to follow, or not obviously helpful.
My general maxim is that the easy thing to do should be the right thing. Yes, the sentence is ordered that way. In a pinch we'll always go with the easiest thing to do, so we better prepare hard so when that moment comes, the easy thing is a good thing and not a trap. This can take a lot of planning and hard work to set up, but the trick is that you're doing that hard work when it's not stressful, when you're at your leisure to do it right. Danger operations used to have a saying: "maximize net slack". This means you work hard right now so you can put your feet up on your desk & drink a fancy drink with an umbrella in it later. I believe in this strategy.
The CTO's retro of the incident
We kinda hashed up internal communications during this incident. We did something new in this response, something we couldn't do before: most of the engineers involved in the response got into a Zoom video chat to share screens and discuss the problem. This was great for everybody in the video meeting, but was a comms black hole for everybody not in the meeting. We needed to do a better job of disentangling the person doing comms from any responsibility for debugging, so comms becomes their only job during incidents. I'll probably write something up about that. (If indeed this blog post by itself isn't enough! It might be.)
The npm team did a fantastic job of coming together and working without drama on the problem. I was in a 4-hour meeting that could not be interrupted so my participation was limited to initial escalation and kicking off the incident response process. My participation wasn't necessary, which was perfect. The retro meeting was huge because of how many people contributed meaningfully to the recovery, even the very newest members of the team. It was a good response to a bad event, and I was thrilled to watch it happen.
Making myself unnecessary is a victory.
releaseteam · 4 years
Link
A masterpiece thread on #blameless postmortems #DevOps https://t.co/R2BKNwxHlp
— Francesco Gualazzi (@inge4pres) May 21, 2020
via: https://ift.tt/1GAs5mb
hackernewsrobot · 5 years
Text
Blameless PostMortems and a Just Culture (2012)
https://codeascraft.com/2012/05/22/blameless-postmortems/
Text
EFF is hiring an operations engineer
The Electronic Frontier Foundation (EFF) is seeking applicants for the position of Operations Engineer to join our Technical Operations team. EFF is the leading nonprofit organization defending civil liberties in the digital world. Founded in 1990, EFF champions user privacy, free expression, and innovation through impact litigation, policy analysis, grassroots activism, and technology development. We work to ensure that rights and freedoms are enhanced and protected as our use of technology grows.
EFF's Technical Operations team is the team responsible for designing and maintaining EFF's systems and networks while also providing hardware and software technical support for EFF's staff. The ideal candidate must work well with a very busy staff with varying levels of technical expertise.
Responsibilities:
Monitor and improve the performance and reliability of production GNU/Linux platforms, network infrastructure, services, and network-attached devices like PBX and conference systems. Maintain the corresponding documentation
Participate in on-call rotation and incident response efforts, perform blameless postmortem reporting
Develop and implement process improvements in the way we build, secure, manage, and maintain our infrastructure over its lifecycle from design through deployment, operation, and retirement
Respond to support requests within team-defined Service Level Agreements (SLAs). This includes working closely with other teams to design and adapt systems for changing program and operational requirements
Scale and standardize systems through automation
Communicate effectively about technical issues with non-technical parties
Minimum Qualifications:
Have a deep respect for user privacy and organizational security
5 years of system administration, site reliability engineering, or other ops experience in a unix-like environment
Excellent organizational, communication and people skills
Ability to debug and optimize code and automate routine tasks
Understanding of common algorithms, data structures, computational complexity analysis, and basic software design
Experience in one or more of the following computer languages: C, Python, Go, Bash / POSIX shell, Rust, PowerShell, Perl
Experience collaborating on engineering projects and processes
Experience operating cloud infrastructure including the configuration and maintenance of virtual servers & load balancers
Experience with TCP/IP network design, installation, and debugging
Experience with relational database administration
Preferred Qualifications:
10 years of experience operating and maintaining servers running unix-like operating systems
Experience in a large-scale or critical production service environment
Experience working in a managed cluster environment (such as Kubernetes, Mesos, Docker Swarm)
Fluency in multiple programming languages
Experience working in an environment controlled by orchestration tools like SaltStack, Puppet, Chef, Ansible, etc.
A history of open-source contributions (bug reports, pull requests, projects / packages)
Experience with network hardware and network hardware configuration orchestration
Experience with key-value or graph database administration
Experience using, designing, and implementing CI/CD pipelines
Experience with SSO, SAML, OAuth and other industry standard AuthN/AuthZ solutions
Experience with bring-your-own-device fleet management in a mixed environment (GNU/Linux, macOS, iOS, Android, etc.)
We offer an excellent benefits package including medical, dental, and vision insurance, a 403(b)(7) retirement savings program with matching, paid time off, holiday benefits, student loans assistance, housing cost assistance, parental leave, a dog-friendly workplace, and more.
As an advocacy organization, EFF is committed to being part of a diverse community. Diversity of life experiences makes a big difference in how we identify and litigate legal issues, design privacy-enhancing software, and organize our activism. To that end, we deliberately seek applicants with different perspectives, identities, and experiences to build a diverse and inclusive workplace to better inform our advocacy and defense of freedom in our digital world. EFF is an equal opportunity employer and encourages people of all races, genders, ages, abilities, orientations, ethnicities, and national origins to apply.
Interested in joining the team?
Apply here with:
A cover letter introducing yourself and telling us why you want to work at EFF.
A resume in PDF format with links to recent work.
https://www.eff.org/opportunities/jobs/operations-engineer
Text
Source Coders: Senior SRE/DevOps Engineer - Quizlet (50M active users)
Headquarters: San Francisco, CA
URL: https://sourcecoders.io/
Company hiring: Quizlet.com
Technical Recruiting partner: SourceCoders.io
Location: Onsite in San Francisco or Denver, or remote for CST- or EST-based candidates
Compensation: $120K-$200K (heavily dependent on experience and work location)
Work visas accepted: US Citizen, Green Card, H-1B transfer, TN Visa
Quizlet’s mission is to help students (and their teachers) practice and master whatever they are learning. Every month more than 50 million active learners from 130 countries practice and master more than 300 million study sets on every conceivable topic and subject. We are developing new learning experiences by modeling how students learn and drawing upon knowledge acquisition, retention, and pedagogy in cognitive science. We are always seeking to help students master any subject by optimizing study efficiency and engagement.
Want to be a go-to person for site reliability on the most-used learning platform in the U.S.? Want to work on a service that is rapidly scaling and relied upon by millions of students and teachers worldwide? Quizlet is an indispensable utility used daily by millions of students and teachers around the globe. If our site goes down, even just for a few minutes, the pain is felt intensely. Speed is crucial, and downtime is not an option as we grow: during the school year, we are in the top 20 most-visited websites in the U.S. These are challenges you will face on day one at Quizlet.
What you'll do
Engage with service owners to improve the entire service lifecycle — from inception and design, through deployment, operation, maintenance, and sunset.
Help service owners drive their services through the service lifecycle through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews.
Help service owners maintain their services once they are live by measuring and monitoring availability, latency, and overall system health.
Help scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity.
Practice and evangelize sustainable incident response and blameless postmortems.
What we are looking for
Experience in designing, analyzing and troubleshooting distributed systems serving production traffic.
Experience with algorithmic thinking, data structures, and software complexity.
Experience in writing scripts in one or more languages such as Python or Go.
Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
Ability and desire to debug and optimize code and automate routine tasks.
Experience with on-call duty: you know why it’s hard, work to improve it, and make it so well documented that every engineer wants to be on rotation.
Passion for, interest in, or experience with automation of code testing and deployment through the use of containers.
from We Work Remotely: Remote jobs in design, programming, marketing and more https://ift.tt/2SJbVVI from Work From Home YouTuber Job Board Blog https://ift.tt/3bTlGrB
noredinktech · 5 years
Text
Use Pre-Mortems to Predict and Avoid Launch Failures
Imagine this: your team has been working on a big feature for the last couple months. You're about to have the first big milestone of the project: an observation in a real classroom. So exciting! But you can't shake the feeling that things will go wrong in surprising and terrible ways… what if you forgot something? Good news: that's normal, and you're human. Yay! Bad news: your team probably has similar feelings of dread. Hmm, what to do…
How about a pre-mortem?
"A pre-what-now?", I hear you say, "Don't you mean post-mortem? And don't you usually do those after something has already gone wrong?" Well, yes! We traditionally do retros after the fact, but with a little re-framing they're also useful for talking about things that might happen.
Try this: set up some time with your team. An hour should be about right, with enough time to make changes before your milestone. Include everyone who makes sense; at NoRedInk, this meant including the engineers, product manager, and designers a couple days before our QA deadline. When you're all together, set the scene:
"OK, let's talk through possible failures while we're getting ready to ship. Imagine that it's a month after release, and our project has failed utterly. We have to talk about what happened so that we can avoid making the same mistakes again. What do we say?"
Next, give everyone 10 minutes or so to write. Privacy is important here, so don't do it in a shared document. You want people to be able to share their worries and feelings unguarded. Keep yourself open to possibilities! People should feel free to write down anything they can think of, regardless of likelihood. This is actually more important than you'd think: in a previous NoRedInk pre-mortem, we said "everything shy of natural disasters is fair game," but then our classroom observation was cancelled due to a natural disaster and we didn't have a backup plan. Whoops, lesson learned.
You should also make it clear that people should be candid. The big goal of this exercise is to avoid problems where we can, so if people hide problems they know about, you might as well not do it. Lead by example here!
It may be clear by this point that this exercise will really only work well in an established blame-free environment. If people cannot be honest about the things they're worried about, they're probably not going to speak up just because you asked them to. If you find that your organization is having trouble with blaming individuals in these kinds of scenarios, I'd recommend reading and applying blameless post-mortems.
After the writing time is up, go around and give everyone a chance to share one thing that they're most worried about. Have people raise their hands if they wrote down the same thing. Write down a summary of what the speaker said, and how many hands went up. Do several rounds here, sharing one item each time you go around. In a medium-size group (say 6 people), you're more likely to run out of time than concerns, so you will probably have to stop short of sharing everything to make time to address the concerns people shared.
Typical concerns will range from the annoying ("the user's device didn't support this feature") to human error ("we forgot to turn on the feature flag for the teacher") to the extreme ("we lost all student writing entered in the tool.") There's room for all of this here!
Next, start at the thing that the most people had written down. Remembering that we're treating this as something that already happened, how could it have been avoided? What should you have done differently to avoid this failure? Now might be a good time to apply 5 Whys or another root cause analysis technique. Make a plan to address these root concerns now!
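If you collect the written notes afterwards, the show of hands can be approximated with a quick tally. This is just a sketch under a big assumption: that two people who wrote down "the same thing" used exactly the same wording, which in a real meeting is what the hand-raising sorts out for you. The function name and the sample notes are made up for illustration.

```python
from collections import Counter


def rank_concerns(notes_by_person: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Count how many people wrote down each concern, most-shared first."""
    counts = Counter()
    for concerns in notes_by_person.values():
        # Use a set so one person repeating themselves still counts as one raised hand.
        counts.update(set(concerns))
    # Sort by count descending, then alphabetically so ties come out in a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical notes from a three-person pre-mortem.
notes = {
    "ana": ["feature flag left off", "device unsupported"],
    "ben": ["feature flag left off"],
    "cy": ["lost student writing", "feature flag left off"],
}
ranked = rank_concerns(notes)
```

Here `ranked[0]` is the concern to start with. The count only says how widely a worry is shared, not how severe it is, so a concern with a single vote (like losing student writing) may still deserve time.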
Once you either have addressed everything or run out of time, thank everyone for their honesty and adjourn. After the meeting, work through the list with whoever is responsible for setting priorities to figure out what needs to be addressed right away. At NoRedInk, that means the team lead and product manager usually sit down to figure out prioritization.
And, that's it! With this list in hand, you can put mitigations in place to avoid the errors you were about to blindly stumble into. Hooray! At NoRedInk, we were suspicious about this practice originally but gave it a try to see how it would actually work. We ended up being pleasantly surprised! This technique is not only helpful for avoiding avoidable errors, but is also cathartic for the team. There's nothing quite like hearing from someone that they have also been worried about the things you are!
Next time you're about to launch something, I hope you'll consider making pre-mortems part of your planning process.
Further reading:
- Performing a Project Premortem from the Harvard Business Review
- Blameless PostMortems and a Just Culture from Code as Craft
Brian Hicks @BrianHicks Engineer at NoRedInk
(thanks to Michael Newton, Michael Hadley, and Anita Nuthi for reviewing drafts!)