#architecture computer networking
Text
Demystifying Network Architecture: Building Blocks of Efficient Computer Networks
Introduction:
In contemporary technology, where connectivity is critical, understanding network architecture is like understanding the blueprint of the digital world.
1. Understanding Network Architecture:
Definition: Network architecture encompasses the design, layout, and structure of computer networks, specifying how devices communicate and share data with one another.
Importance: Optimized speed, improved security, and smooth data transfer all depend on an efficient network architecture.
2. Key Components of Network Architecture:
Physical Infrastructure:
Explanation of network devices: routers, switches, hubs, access points.
Overview of network cabling: Ethernet, fibre-optic, wireless connections.
Role of Network Interface Cards (NICs) in connecting devices to the network.

3. Network Topologies:
Bus Topology: Devices connected to a shared communication medium.
Star Topology: Devices connected to a central switch or hub.
Mesh Topology: Full or partial connections between all devices for redundancy.
Hybrid Topology: Combination of different basic topologies to meet specific needs.
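To make the structural differences concrete, here is a minimal, purely illustrative Python sketch (device names are invented) that models a star and a full mesh topology as adjacency lists:

    # Illustrative sketch: topologies as adjacency lists.
    def star_topology(center, devices):
        # Every device connects only to the central switch or hub.
        links = {center: list(devices)}
        for d in devices:
            links[d] = [center]
        return links

    def full_mesh_topology(devices):
        # Every device connects directly to every other device, for redundancy.
        return {d: [other for other in devices if other != d] for d in devices}

    hosts = ["pc1", "pc2", "pc3", "printer"]
    print(star_topology("switch1", hosts))
    print(full_mesh_topology(hosts))

In the star layout the central node is a single point of failure, while the mesh trades extra cabling and ports for redundancy.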
4. Network Services and Security:
DHCP (Dynamic Host Configuration Protocol): Automatic IP address assignment.
DNS (Domain Name System): Translation of domain names to IP addresses.
NAT (Network Address Translation): Mapping of private IP addresses to public IP addresses.
Firewalls: Monitoring and control of network traffic for security purposes.
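As a small illustration of one of these services in action, here is a hedged Python sketch that asks the operating system's DNS resolver to translate a hostname into IP addresses; the domain shown is just a placeholder:

    # Minimal sketch: resolving a hostname to IP addresses via the system resolver.
    import socket

    def resolve(hostname):
        # getaddrinfo returns (family, type, proto, canonname, sockaddr) tuples;
        # the first element of sockaddr is the IP address.
        results = socket.getaddrinfo(hostname, None)
        return sorted({entry[4][0] for entry in results})

    print(resolve("example.com"))  # "example.com" is a placeholder hostname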
Conclusion:
Network architecture serves as the backbone of modern computing, enabling seamless communication and data exchange. By comprehending its core components - both physical and logical - and understanding the significance of network topologies and services, individuals can navigate the digital landscape with confidence, ensuring the efficiency, reliability, and security of computer networks.
MADMAN TECHNOLOGIES is a leading IT company whose technical experts can help you build a well-designed network with a reduced risk of threats. Network security consists of the actions designed to protect the usability and integrity of the network and its data, incorporating both hardware and software technologies. Customers’ needs and concerns are the priority.
For more details or any further queries, you can contact the undersigned:
email — [email protected]
Contact no. — 9625468776
#information technology#it services#technology#it products#itservices#it solutions#it technology#video conferencing#wifi#artificial intelligence#computer networking#architecture computer networking#network architecture
Text
escapism gets a bad rap but i think we can actually use it proactively to build a better world and simply enter that world. if the escapism fantasy is realistic enough, there’s no reason why we couldn’t actually treat the escapism fantasy as a blueprint and manifest it materially. we just need resources proportional to the scale of the escapist fantasy itself
to be completely honest nobody should have to live in a world they don’t want to live in, there should be social tools to distribute realities to each according to their needs, within reason. each world could have its own constitution initialization program, a server instance
we’ll still have to all do work to maintain the infrastructure of such a system but, that could be accounted for! personally i want to live in a peaceful reality with hiking and beaches and books, and computers
i want to spend my days simply appreciating every single moment and making art. and i don’t mind helping out with IT infrastructure and some light gardening
#escapism#fantasy#science fiction#philosophy#worldbuilding#computer science#video games#life#blueprints#architecture#network architecture#game design#system sillies#magic system#it infrastructure#gardening#community building#mutual aid
Text
Enterprise-Grade Datacenter Network Solutions by Esconet
Esconet Technologies offers cutting-edge datacenter network solutions tailored for enterprises, HPC, and cloud environments. With support for high-speed Ethernet (10G to 400G), Software-Defined Networking (SDN), Infiniband, and Fibre‑Channel technologies, Esconet ensures reliable, scalable, and high-performance connectivity. Their solutions are ideal for low-latency, high-bandwidth applications and are backed by trusted OEM partnerships with Cisco, Dell, and HPE. Perfect for businesses looking to modernize and secure their datacenter infrastructure. For more details, visit the Esconet Datacenter Network page.
#Datacenter Network#Enterprise Networking#High-Speed Ethernet#Software-Defined Networking (SDN)#Infiniband Network#Fibre Channel Storage#Network Infrastructure#Data Center Solutions#10G to 400G Networking#Low Latency Networks#Esconet Technologies#Datacenter Connectivity#Leaf-Spine Architecture#Network Virtualization#High Performance Computing (HPC)
Text
Beyond the Firewall: Edge Security Meets Zero Trust for a Safer Digital Frontier.
Sanjay Kumar Mohindroo. skm.stayingalive.in Explore how Edge Security & Zero Trust Architecture with continuous verification secures distributed data and apps. Join the discussion!
Quick insights to shift your security approach. Today, data and apps live everywhere. The old wall around the network no longer holds. We must shift to a model that checks every request at…
#AI#Application Security#Continuous Verification#cyber-security#Cybersecurity#Distributed Data Protection#Edge Computing Security#Edge Security#Network Perimeter#News#Sanjay Kumar Mohindroo#security#technology#Zero Trust Architecture#Zero Trust Security
Text
Shaktiman Mall, Principal Product Manager, Aviatrix – Interview Series
New Post has been published on https://thedigitalinsider.com/shaktiman-mall-principal-product-manager-aviatrix-interview-series/
Shaktiman Mall, Principal Product Manager, Aviatrix – Interview Series
Shaktiman Mall is Principal Product Manager at Aviatrix. With more than a decade of experience designing and implementing network solutions, Mall prides himself on ingenuity, creativity, adaptability and precision. Prior to joining Aviatrix, Mall served as Senior Technical Marketing Manager at Palo Alto Networks and Principal Infrastructure Engineer at MphasiS.
Aviatrix is a company focused on simplifying cloud networking to help businesses remain agile. Their cloud networking platform is used by over 500 enterprises and is designed to provide visibility, security, and control for adapting to changing needs. The Aviatrix Certified Engineer (ACE) Program offers certification in multicloud networking and security, aimed at supporting professionals in staying current with digital transformation trends.
What initially attracted you to computer engineering and cybersecurity?
As a student, I was initially more interested in studying medicine and wanted to pursue a degree in biotechnology. However, I decided to switch to computer science after having conversations with my classmates about technological advancements over the preceding decade and emerging technologies on the horizon.
Could you describe your current role at Aviatrix and share with us what your responsibilities are and what an average day looks like?
I’ve been with Aviatrix for two years and currently serve as a principal product manager in the product organization. As a product manager, my responsibilities include building product vision, conducting market research, and consulting with the sales, marketing and support teams. These inputs combined with direct customer engagement help me define and prioritize features and bug fixes.
I also ensure that our products align with customers’ requirements. New product features should be easy to use and not overly or unnecessarily complex. In my role, I also need to be mindful of the timing for these features – can we put engineering resources toward it today, or can it wait six months? To that end, should the rollout be staggered or phased into different versions? Most importantly, what is the projected return on investment?
An average day includes meetings with engineering, project planning, customer calls, and meetings with sales and support. Those discussions allow me to get an update on upcoming features and use cases while understanding current issues and feedback to troubleshoot before a release.
What are the primary challenges IT teams face when integrating AI tools into their existing cloud infrastructure?
Based on real-world experience of integrating AI into our IT technology, I believe there are five challenges companies will encounter:
Harnessing data & integration: Data enriches AI, but when data is across different places and resources in an organization, it can be difficult to harness it properly.
Scaling: AI operations can be CPU intensive, making scaling challenging.
Training and raising awareness: A company could have the most powerful AI solution, but if employees don’t know how to use it or don’t understand it, then it will be underutilized.
Cost: For IT especially, a quality AI integration will not be cheap, and businesses must budget accordingly.
Security: Make sure that the cloud infrastructure meets security standards and regulatory requirements relevant to AI applications.
How can businesses ensure their cloud infrastructure is robust enough to support the heavy computing needs of AI applications?
There are multiple factors to running AI applications. For starters, it’s critical to find the right type and instance for scale and performance.
Also, there needs to be adequate data storage, as these applications will draw from static data available within the company and build their own database of information. Data storage can be costly, forcing businesses to assess different types of storage optimization.
Another consideration is network bandwidth. If every employee in the company uses the same AI application at once, the network bandwidth needs to scale – otherwise, the application will be so slow as to be unusable. Likewise, companies need to decide if they will use a centralized AI model where computing happens in a single place or a distributed AI model where computing happens closer to the data sources.
With the increasing adoption of AI, how can IT teams protect their systems from the heightened risk of cyberattacks?
There are two main aspects to security every IT team must consider. First, how do we protect against external risks? Second, how do we ensure data, whether it is the personally identifiable information (PII) of customers or proprietary information, remains within the company and is not exposed? Businesses must determine who can and cannot access certain data. As a product manager, I work with sensitive information and code that others are not authorized to access.
At Aviatrix, we help our customers protect against attacks, allowing them to continue adopting technologies like AI that are essential for being competitive today. Recall network bandwidth optimization: because Aviatrix acts as the data plane for our customers, we can manage the data going through their network, providing visibility and enhancing security enforcement.
Likewise, our distributed cloud firewall (DCF) solves the challenges of a distributed AI model where data gets queried in multiple places, spanning geographical boundaries with different laws and compliances. Specifically, a DCF supports a single set of security compliance enforced across the globe, ensuring the same set of security and networking architecture is supported. Our Aviatrix Networks Architecture also allows us to identify choke points, where we can dynamically update the routing table or help customers create new connections to optimize AI requirements.
How can businesses optimize their cloud spending while implementing AI technologies, and what role does the Aviatrix platform play in this?
One of the main practices that will help businesses optimize their cloud spending when implementing AI is minimizing egress spend.
Cloud network data processing and egress fees are a material component of cloud costs. They are both difficult to understand and inflexible. These cost structures not only hinder scalability and data portability for enterprises, but also provide decreasing returns to scale as cloud data volume increases, which can impact organizations’ bandwidth.
Aviatrix designed our egress solution to give the customer visibility and control. Not only do we perform enforcement on gateways through DCF, but we also do native orchestration, enforcing control at the network interface card level for significant cost savings. In fact, after crunching the numbers on egress spend, we had customers report savings between 20% and 40%.
We’re also building auto-rightsizing capabilities to automatically detect high resource utilization and automatically schedule upgrades as needed.
Lastly, we ensure optimal network performance with advanced networking capabilities like intelligent routing, traffic engineering and secure connectivity across multi-cloud environments.
How does Aviatrix CoPilot enhance operational efficiency and provide better visibility and control over AI deployments in multicloud environments?
Aviatrix CoPilot’s topology view provides real-time network latency and throughput, allowing customers to see the number of VPC/VNets. It also displays different cloud resources, accelerating problem identification. For example, if the customer sees a latency issue in a network, they will know which assets are getting affected. Also, Aviatrix CoPilot helps customers identify bottlenecks, configuration issues, and improper connections or network mapping. Furthermore, if a customer needs to scale up one of its gateways into the node to accommodate more AI capabilities, Aviatrix CoPilot can automatically detect, scale, and upgrade as necessary.
Can you explain how dynamic topology mapping and embedded security visibility in Aviatrix CoPilot assist in real-time troubleshooting of AI applications?
Aviatrix CoPilot’s dynamic topology mapping also facilitates robust troubleshooting capabilities. If a customer must troubleshoot an issue between different clouds (requiring them to understand where traffic was getting blocked), CoPilot can find it, streamlining resolution. Not only does Aviatrix CoPilot visualize network aspects, but it also provides security visualization components in the form of our own threat IQ, which performs security and vulnerability protection. We help our customers map the networking and security into one comprehensive visualization solution.
We also help with capacity planning for both cost with costIQ, and performance with auto right sizing and network optimization.
How does Aviatrix ensure data security and compliance across various cloud providers when integrating AI tools?
AWS and its AI engine, Amazon Bedrock, have different security requirements from Azure and Microsoft Copilot. Uniquely, Aviatrix can help our customers create an orchestration layer where we can automatically align security and network requirements to the CSP in question. For example, Aviatrix can automatically compartmentalize data for all CSPs irrespective of APIs or underlying architecture.
It is important to note that all of these AI engines are inside a public subnet, which means they have access to the internet, creating additional vulnerabilities because they consume proprietary data. Thankfully, our DCF can sit on a public and private subnet, ensuring security. Beyond public subnets, it can also sit across different regions and CSPs, between data centers and CSPs or VPC/VNets and even between a random site and the cloud. We establish end-to-end encryption across VPC/VNets and regions for secure transfer of data. We also have extensive auditing and logging for tasks performed on the system, as well as integrated network and policy with threat detection and deep packet inspection.
What future trends do you foresee in the intersection of AI and cloud computing, and how is Aviatrix preparing to address these trends?
I see the interaction of AI and cloud computing birthing incredible automation capabilities in key areas such as networking, security, visibility, and troubleshooting for significant cost savings and efficiency.
Such automation could also analyze the different types of data entering the network and recommend the most suitable policies or security compliances. Similarly, if a customer needed to enforce HIPAA, this solution could scan through the customer’s networks and then recommend a corresponding strategy.
Troubleshooting is a major investment because it requires a call center to assist customers. However, most of these issues don’t necessitate human intervention.
Generative AI (GenAI) will also be a game changer for cloud computing. Today, a topology is a day-zero decision – once an architecture or networking topology gets built, it is difficult to make changes. One potential use case I believe is on the horizon is a solution that could recommend an optimal topology based on certain requirements. Another problem that GenAI could solve is related to security policies, which quickly become outdated after a few years. A GenAI solution could help users routinely create new security stacks per new laws and regulations.
Aviatrix can implement the same security architecture for a datacenter with our edge solution, given that more AI will sit close to the data sources. We can help connect branches and sites to the cloud and edge with AI compute running.
We also help in B2B integration with different customers or entities in the same company with separate operating models.
AI is driving new and exciting computing trends that will impact how infrastructure is built. At Aviatrix, we’re looking forward to seizing the moment with our secure and seamless cloud networking solution.
Thank you for the great interview, readers who wish to learn more should visit Aviatrix.
#agile#ai#AI and cloud#AI and cloud computing#AI engines#AI integration#ai model#ai tools#Amazon#amp#APIs#applications#architecture#assets#automation#Aviatrix#awareness#AWS#azure#B2B#biotechnology#bug#Building#call center#certification#Cloud#cloud computing#cloud data#cloud infrastructure#cloud network
Text
https://justpaste.it/b4afa
Is your current IT infrastructure starting to feel like a pair of outdated shoes? Reliable, sure, but maybe a little worn and hindering your ability to move forward. In today's fast-paced business world, you need agility and flexibility to stay competitive. That's where hybrid cloud infrastructure comes in.
#cloud services houston#enterprise network architect services#cloud computing houston#cloud based network security#houston managed it#intelligent architecture consulting#cloud computing in houston#cybersecurity consulting#secure cloud computing
Text
Tech Breakdown: What Is a SuperNIC? Get the Inside Scoop!

The most recent development in the rapidly evolving digital realm is generative AI. SuperNIC, a relatively new term, refers to one of the revolutionary inventions that make it feasible.
What Is a SuperNIC?
In order to accelerate hyperscale AI workloads on Ethernet-based clouds, a new family of network accelerators called SuperNIC was created. With remote direct memory access (RDMA) over converged Ethernet (RoCE) technology, it offers extremely rapid network connectivity for GPU-to-GPU communication, with throughputs of up to 400Gb/s.
SuperNICs incorporate the following special qualities:
High-speed packet reordering, which ensures that data packets are received and processed in the same sequence as they were originally sent, keeping the data flow’s sequential integrity intact.
Advanced congestion management, which uses network-aware algorithms and real-time telemetry data to regulate and prevent congestion in AI networks.
Programmable computation on the input/output (I/O) path, which facilitates network architecture adaptation and extension in AI cloud data centers.
Low-profile, power-efficient architecture that effectively handles AI workloads under power-constrained budgets.
Optimization for full-stack AI, encompassing system software, communication libraries, application frameworks, networking, computing, and storage.
Recently, NVIDIA revealed the first SuperNIC in the world designed specifically for AI computing, built on the BlueField-3 networking architecture. It is a component of the NVIDIA Spectrum-X platform, which allows for smooth integration with the Ethernet switch system Spectrum-4.
The NVIDIA Spectrum-4 switch system and BlueField-3 SuperNIC work together to provide an accelerated computing fabric that is optimized for AI applications. Spectrum-X outperforms conventional Ethernet settings by continuously delivering high levels of network efficiency.
Yael Shenhav, vice president of DPU and NIC products at NVIDIA, stated, “In a world where AI is driving the next wave of technological innovation, the BlueField-3 SuperNIC is a vital cog in the machinery.” “SuperNICs are essential components for enabling the future of AI computing because they guarantee that your AI workloads are executed with efficiency and speed.”
The Changing Environment of Networking and AI
Large language models and generative AI are causing a seismic change in the area of artificial intelligence. These potent technologies have opened up new avenues and made it possible for computers to perform new functions.
GPU-accelerated computing plays a critical role in the development of AI by processing massive amounts of data, training huge AI models, and enabling real-time inference. While this increased computing capacity has created opportunities, Ethernet cloud networks have also been put to the test.
The internet’s foundational technology, traditional Ethernet, was designed to link loosely connected applications and provide wide compatibility. It was not intended for the complex computational requirements of contemporary AI workloads, which include quickly transferring large amounts of data, tightly coupled parallel processing, and unusual communication patterns, all of which call for optimized network connectivity.
Basic network interface cards (NICs) were created with interoperability, universal data transfer, and general-purpose computing in mind. They were never intended to handle the special difficulties brought on by the high processing demands of AI applications.
The necessary characteristics and capabilities for effective data transmission, low latency, and the predictable performance required for AI activities are absent from standard NICs. In contrast, SuperNICs are designed specifically for contemporary AI workloads.
Benefits of SuperNICs in AI Computing Environments
Data processing units (DPUs) offer high throughput, low-latency network connectivity, and many other sophisticated capabilities. Since their introduction in 2020, DPUs have become more and more common in cloud computing, mostly because of their ability to separate, speed up, and offload computation from data center hardware.
SuperNICs and DPUs both have many characteristics and functions in common, however SuperNICs are specially designed to speed up networks for artificial intelligence.
The performance of distributed AI training and inference communication flows is highly dependent on the availability of network capacity. Known for their elegant designs, SuperNICs scale better than DPUs and may provide an astounding 400Gb/s of network bandwidth per GPU.
When GPUs and SuperNICs are matched 1:1 in a system, AI workload efficiency may be greatly increased, resulting in higher productivity and better business outcomes.
SuperNICs are intended solely to accelerate networking for AI cloud computing. As a result, they use less processing power than a DPU, which needs a lot of processing power to offload programs from a host CPU.
Less power usage results from the decreased computation needs, which is especially important in systems with up to eight SuperNICs.
One of the SuperNIC’s other unique selling points is its specialized AI networking capabilities. It provides optimal congestion control, adaptive routing, and out-of-order packet handling when tightly connected with an AI-optimized NVIDIA Spectrum-4 switch. Ethernet AI cloud settings are accelerated by these cutting-edge technologies.
Transforming cloud computing with AI
The NVIDIA BlueField-3 SuperNIC is essential for AI-ready infrastructure because of its many advantages.
Maximum efficiency for AI workloads: The BlueField-3 SuperNIC is perfect for AI workloads since it was designed specifically for network-intensive, massively parallel computing. It guarantees bottleneck-free, efficient operation of AI activities.
Performance that is consistent and predictable: The BlueField-3 SuperNIC makes sure that each job and tenant in multi-tenant data centers, where many jobs are executed concurrently, is isolated, predictable, and unaffected by other network operations.
Secure multi-tenant cloud infrastructure: Data centers that handle sensitive data place a high premium on security. High security levels are maintained by the BlueField-3 SuperNIC, allowing different tenants to cohabit with separate data and processing.
Broad network infrastructure: The BlueField-3 SuperNIC is very versatile and can be easily adjusted to meet a wide range of different network infrastructure requirements.
Wide compatibility with server manufacturers: The BlueField-3 SuperNIC integrates easily with the majority of enterprise-class servers without using an excessive amount of power in data centers.
Text
coding is the easiest part of IT in uni...
nothing in the world makes you feel quite as stupid as coding
Note
would life as a larrow suck? like if you could choose to be isekai'd as a larrow rn would you take it up? what about the other way around, would a larrow want to be us
It doesn't really suck anymore than life as a human does, but a lot of humans would see it as bad or stressful in certain ways:
Larrow imago usually only live about 30 years, and it's not super abnormal for them to die before 20. They're also very tiny (like on average the size of a button quail or a smallish parrot) so compared to humans they seem pretty fragile.
Their society doesn't consistently exist; eggs are produced, hatched and grow up at roughly the same time, and all the larrow of a single generation usually die off entirely before new ones emerge from the ocean (with an occasional outlier). That next generation isn't exactly the same culture as before, just formed through similar needs and off of the technology left behind by the last. their whole 'rome falls every few decades' set up would probably be very offputting to most alien cultures
They have next to no health care; larrow learn medical care by themselves, for themselves, and they practice surgery and similarly extreme procedures on themselves quite regularly.
Larrow are basically fine not socializing and will sometimes go years without talking to one another; it's to a degree where even anti social humans may be stressed and lonely. They also don't really show a ton of concern for other people and animals, empathy is more of a philosophical idea than this totally innate thing.
The world they live in has very extreme storms; their average low winds would be difficult for a human to walk around in. They don't have houses but public access "storm shelters" which, from a human perspective, look woefully incompetent as they're full of holes and look more like animal nests than a "real" building
On the other hand:
Larrow are adapted to live in an environment with constantly moving air and are instinctively adverse to areas with stagnant air, as they struggle to breathe in it and it can make them really sick. Human buildings seem really gross to them in the same way rot or mold does to us
The way humans are constantly trailing each other and actively trying to initiate touching and interaction all the time feels both animal-like and weird/scammy/aggressive to them, our social behavior is their "about to get mugged" behavior
complex nest building in constant storms was like their main evolutionary pressure to Get Good with the brain power, so they're very technologically minded in a way humans just aren't. They could open up a human car or computer (or indeed a body) for the first time and understand how it worked back to front. This is all just architecture to their lizard brains. Which means humans needing to go to school to study this stuff sounds like, really stupid to a larrow.
the whole idea that humans will bribe other humans to knock them out and operate on them sounds like a horror show. What if the doctors got bored and left? What if it turned out they wanted to hurt you while you were asleep? If letting other people chop you up is a normal cultural quirk why do they keep making scary movies about it
the way humans have all these complex daily networks of giving things up and gaining them is confusing and stressful. they're kind of like that boar in this tumblr post
This is all to say many humans would see larrow as living short lonely survivalist lives in ramshackle houses in a dying culture too selfish to care about each other, where many larrow would see humans as spending most of their lives in gross little prisons being so incompetent at everything that they'll die of minor ailments like "tumors" and "internal bleeding" if other humans don't randomly take pity on them.
Not to say some people wouldn't be interested or jealous about aspects of each other's lives... "what if you could just fly alone for weeks at a time and work on the first draft of your novel" would obviously be appealing to a lot of humans, and getting to root through a world of completely alien tech and biology would make a larrow feel like it was one of these caddisflies

Text
let's talk about radiant garden!
hello and welcome back to another installment of KH3 Retry, my chaotic thought experiment where i try to fix everything i hate about the game
i've said it before and i'll say it again: radiant garden should have been the playable hub world instead of twilight town. there are so many plot threads wrapped up in this world, so many paths that cross here, and it's a shame that kh3 never bothered to explore them in any meaningful way
instead, all of the world's depth is flattened into set dressing for tedious exposition, with all the things that made it memorable either cut entirely or moved to twilight town, a poor substitute which is itself lacking in any meaningful development
so let's talk about it! i have a veritable mountain of ideas for what radiant garden could have been like in a universe where it continued to matter after bbs
take my hand
even beyond the general lack of final fantasy in kh3, which is its own can of worms, brushing the restoration committee aside and reducing all of their hard work to an unplayable HD recreation of the bbs map is downright bleak. as much as nomura wants to, you can't just sweep legacy characters under the rug and expect me to forget about them. i'm glad they at least got to appear in re:mind, but it doesn't change the fact that their absence feels like a massive, gaping hole in reality, like the universe has written them out of existence. i'm sure sora can relate
the problem is best summed up by ienzo:
yeah. that's called regression, and it sucks.
so on that note, please disregard (almost) everything that happens in radiant garden in kh3, because we are starting from scratch babeyyyy!!!
this got really long so i broke it down into sections covering different topics
-----
introduction
the town, finally livable again, looks not quite like the utopia of its past, but still beautiful, with the gardens of its namesake in full bloom and the streets filled with smiling faces
the debris has been cleared away to make room for zigzagging rows of houses and apartments, all built in a mish-mash of styles, sizes, and colors—a mosaic of the lives lived outside of this world. from a distance, the vast array of colors resembles a flowerbed, vibrant and alive
baskets of multicolored flowers hang from windows and the beginnings of vines grow around corners. now that the aqueduct system has been restored, life has really begun to flourish all around
patchwork stone walls and bridges weave through the town and line the border. outside the city walls, the water levels have risen and settled, but you can still see remnants of crumbling, moss-covered architecture poking through the surface
finally: the castle, once a pristine but imposing fortress, has been repurposed as a community center. the gates and guards have been removed so that the townspeople can visit freely, and indeed the balconies and halls are usually busy. just like the rest of town, plants bloom in abundance along its facade, nurtured by the light
the library has been reopened and other public services have moved into the castle to help with day to day life. however, some areas are closed off to the public for safety reasons
-----
characters
we'll start with cid—a brand new helipad and gummi garage have been built into one of the castle's tallest towers, and, naturally, he's in charge! now that the restoration is complete, he can focus on his true passion: flying contraptions :) he offers special blueprints for completing gummi ship challenges (including races, maybe??). he also runs a revamped gummi shop, with assistance from chip and dale
speaking of chip and dale, they've been busy. on top of inventing the gummiphone, they've also set up an inter-world network to connect the computers in disney castle and radiant garden, among other places, so they can share data, including the data from jiminy's journals
as a result, data riku gets a cameo as the equivalent of the network's clippy
over in the castle's lab, ienzo and leon are sorting through all of ansem the wise's notes for anything that might help sora or the town. they're working together, but the alliance is...uneasy. ienzo, dilan, and aeleus were, of course, with the people who kidnapped and experimented on civilians before inviting the darkness that destroyed everything. leon only agrees to their involvement on the condition that he supervises, and he always keeps his gunblade within reach
while leon manages the lab, yuffie manages aeleus and dilan as captain of the guard—or, as she calls it, Supreme Ninja Guardian. goofy congratulates her on the promotion! the two men don't particularly enjoy reporting to a teenager, but they also don't put up a fight because yuffie is actually quite reliable despite her antics, and she knows the town like the back of her hand. mainly they deal with any stray heartless that the claymore defense system doesn't catch. they feel that it's the least they can do
back in town, a new and improved shopping district has opened up, which is where you'll find aerith's gardening shop! you can trade her common cooking ingredients for specialty ones that she grows herself. when she's not running the shop, she's usually tending to the flowers around town or helping with the community garden
merlin's house hasn't changed, but it has moved, as is his tendency. it's now situated in a park on the outskirts of town, away from all the hubbub. since it's no longer being used as a base of operations, all the computer junk has been excised so he can finally have some peace and quiet. he's recently come into possession of a new project, which we'll get into later
after the events of this game, when ansem the wise has returned to radiant garden, he retires to live out the remainder of his days in peace, leaving the lab in ienzo's hands. the town has moved on without him and has no need for the rulers of its past. his former apprentices, especially ienzo, visit him from time to time, and i think he'd get on well with merlin
-----
axel and kairi
okay so axel and kairi! remember how both of them are from radiant garden? well instead of locking the two of them in a hyperbolic time chamber while the plot stalls out, how about letting them hang out here and bond over the things they have in common?
imagine axel's history with me. lea knew kairi's grandma as the kind old lady down the road who would hand out treats to all the neighborhood kids. he and isa once played a childish prank on her and got in heaps of trouble with their parents. they had to apologize to her in front of a crowd, which convinced them to never pull a stunt like that again (instead, they pivoted toward sneakier, much more dangerous stunts)
axel is also roughly the same age as leon (based on kh1 concept art and inference) so they probably went to school together, though they hung out with different crowds. leon remembers lea as an obnoxious class clown, but axel remembers squall as a broody punk. i think they'd get along now. imagine the banter
since they're not doing any dumb keyblade training
axel takes kairi on a tour of the town and shows her where her grandmother's house was. unfortunately, the lot is now empty, having been cleared of the wreckage. as tribute, kairi picks some of the nearby flowers and lays them in the place it used to stand
her conversations with axel help to clear up some of her hazy memories, which is something she's always been a little scared to do, but now something for which she's grateful. axel's just glad that he's doing something good for once
as kairi's happy memories begin to resurface, so too do the bad ones, and eventually they lead her deep within the castle to the ark where xehanort upended her life. she finds another one of xehanort's reports here with cryptic hints about what his intentions really were—something related to what he calls "the other side" of light and darkness
this concept is vaguely familiar to ienzo as something he overheard in the castle as a child, but he doesn't know any more about it. with any luck, something will turn up in ansem's notes
and then there's subject x, the girl axel and saix befriended inside the castle as children. i'll talk more about this further down
-----
gameplay
one of the defining features of the rebuilt radiant garden is that all the new architecture allows for a variety of ways to get around. you can take the stairs and bridges, of course, but you can also glide along the aqueducts, climb over rooftops, and swing across steel beams
i have a specific vision of being able to parkour your way up and down the outside of the castle on a series of jungle gym contraptions
it should also be noted that i have nothing but disdain for kh3's wall running ability, as i feel it takes all the fun out of platforming, so go ahead and pretend that doesn't exist
in addition to the gardening shop, the new shopping district houses the item, weapon, and accessory shops (manned by, who else, donald's nephews) as well as a moogle emporium for synthesis and keyblade upgrades
i'm also moving remi and the bistro here since twilight town is getting the axe. nothing else about them or the cooking minigames is changing, because they're fun and cute and i like them as is <3 i think scrooge decided to open shop here to stimulate the town's burgeoning economy. it's his way of helping
the outdoor movie theater can come too since it's related to the classic kingdom minigames. just stick it in a corner somewhere
-----
the castle
while the castle was being renovated, leon and the others uncovered even more secret passages, because this building is a lovecraftian nightmare. this is one of the areas barred off from the general public, but leon says that sora can go check it out whenever he has time. he might even join the party? 🤔
the passages lead deep into the earth and appear to be so old that ansem the wise may not have even known they existed
i've gotta tread lightly when it comes to lore that might be overturned in the future, but basically i want this to be an optional dungeon, à la cavern of remembrance, that hints at a connection to scala ad caelum and/or daybreak town. but the specifics are undecided
maybe the dark inferno boss can be moved here?? gotta think more about that one
also related to exploring the castle, i think we should get to see the chamber of repose and the prison cells connected to it, possibly by way of the new passages. both of these things play a role in the story
the chamber represents the part of xemnas that remembers being terra, which is something i want to flesh out more in this AU, to give xemnas more of an identity than master xehanort's goon. perhaps he and anti-aqua (see here for details) have a confrontation? imagine aqua discovering her armor in xemnas's secret clubhouse, imagine how conflicted she'd feel about him being her enemy
as for the prison...
-----
subject x
the prison cells once housed a girl known only by the designation of "subject x," a girl whom team nort seems very interested in these days
when subject x vanished, apprentice xehanort's experiments were brought to an abrupt halt. now, ansem SoD, ever the scientist, is spearheading the search to find the test subject that got away so he can finally complete the research he started all those years ago
saix, meanwhile, has been waiting for this opportunity since the day he joined the organization, and so volunteers to assist. if he plays his cards right, he may be able to kill two birds with one stone: find his friend, and commit subterfuge
but while ansem SoD is convinced that his old master had something to do with the girl's disappearance, saix is more perceptive. he had never trusted xigbar to begin with, but now the man is acting even more suspicious whenever the topic arises
at some point i want saix to go pester axel and try to deliver a covert message about the organization's plans, including subject x. he's a double agent, after all
axel doesn't have much reason to trust saix, but he takes the hint and goes to check the prison cell where they talked to her. what he finds is evidence that she must have been taken by someone within the castle, i.e. a keycard or something
basically i want saix and axel to have a more active role in this plot thread, seeing as it's the reason they joined the organization in the first place
unfortunately the subject x stuff can't really be resolved in this game since we still don't know her identity for sure. but since she's definitely from the union x era, i'm thinking maybe i can leave a clue in that optional dungeon, along with all the other stuff related to the age of fairytales
-----
hundred acre woods
also when those secret passages in the castle were uncovered, they found something else of interest: another volume of the winnie the pooh books, which merlin has been studying. it's in pretty bad condition, and while he's been trying to restore it, he's hit a wall, and so asks sora to check it out from the inside
inside, sora discovers that the books contain a shared universe, but the pathway to the first book is blocked due to the damage to the book's structure
it's implied that there's a whole series of these books, which merlin has been trying to collect for millennia
i'm cutting the entire plot of kh3's hundred acre woods because it goes nowhere and i hate it. what i would like to do is find a way to shoehorn in the plot of the tigger movie, but i haven't thought it through
in any case, you can count on more minigames 💃
-----
miscellaneous thoughts that didn't fit anywhere else
i wonder how riku feels about being back in the castle where he experienced his darkest moments. i go back and forth about this
in case you're wondering, my headcanon is that cloud isn't from radiant garden. i haven't decided if he's showing up in this AU, but if he does, it'll be in a different world. maybe he keeps in touch with aerith though?
with all that said, i would be down for a rinoa cameo! kh2 got my hopes up ;__;
i have an inkling of a potential tron/rinzler cameo by virtue of the bug blox appearing in san fransokyo. haven't worked through all that though. maybe the inter-world network intercepts a rogue signal that corrupts some data in the hollow bastion OS or something, idk
speaking of which, i know i also want to loop yen sid in to the network, simply because i never want to see the inside of his tower ever again. this could have been an email etc. etc. and if i have anything to say about it, it will be
i guess i could connect twilight town as well, but the problem is that nothing happens there, which is why i wanted to remove it in the first place
#hoo wee that was a lot#radiant garden is...so important to me#i spent several days on this post to ensure i wouldn't forget anything#but knowing me i'll remember something as soon as i hit post#kingdom hearts#kh3 retry#<- check out my other posts here
Text
information flow in transformers
In machine learning, the transformer architecture is a very commonly used type of neural network model. Many of the well-known neural nets introduced in the last few years use this architecture, including GPT-2, GPT-3, and GPT-4.
This post is about the way that computation is structured inside of a transformer.
Internally, these models pass information around in a constrained way that feels strange and limited at first glance.
Specifically, inside the "program" implemented by a transformer, each segment of "code" can only access a subset of the program's "state." If the program computes a value, and writes it into the state, that doesn't make the value available to any block of code that might run after the write; instead, only some operations can access the value, while others are prohibited from seeing it.
This sounds vaguely like the kind of constraint that human programmers often put on themselves: "separation of concerns," "no global variables," "your function should only take the inputs it needs," that sort of thing.
However, the apparent analogy is misleading. The transformer constraints don't look much like anything that a human programmer would write, at least under normal circumstances. And the rationale behind them is very different from "modularity" or "separation of concerns."
(Domain experts know all about this already -- this is a pedagogical post for everyone else.)
1. setting the stage
For concreteness, let's think about a transformer that is a causal language model.
So, something like GPT-3, or the model that wrote text for @nostalgebraist-autoresponder.
Roughly speaking, this model's input is a sequence of words, like ["Fido", "is", "a", "dog"].
Since the model needs to know the order the words come in, we'll include an integer offset alongside each word, specifying the position of this element in the sequence. So, in full, our example input is
[ ("Fido", 0), ("is", 1), ("a", 2), ("dog", 3), ]
The model itself -- the neural network -- can be viewed as a single long function, which operates on a single element of the sequence. Its task is to output the next element.
Let's call the function f. If f does its job perfectly, then when applied to our example sequence, we will have
f("Fido", 0) = "is" f("is", 1) = "a" f("a", 2) = "dog"
(Note: I've omitted the index from the output type, since it's always obvious what the next index is. Also, in reality the output type is a probability distribution over words, not just a word; the goal is to put high probability on the next word. I'm ignoring this to simplify exposition.)
You may have noticed something: as written, this seems impossible!
Like, how is the function supposed to know that after ("a", 2), the next word is "dog"!? The word "a" could be followed by all sorts of things.
What makes "dog" likely, in this case, is the fact that we're talking about someone named "Fido."
That information isn't contained in ("a", 2). To do the right thing here, you need info from the whole sequence thus far -- from "Fido is a", as opposed to just "a".
How can f get this information, if its input is just a single word and an index?
This is possible because f isn't a pure function. The program has an internal state, which f can access and modify.
But f doesn't just have arbitrary read/write access to the state. Its access is constrained, in a very specific sort of way.
2. transformer-style programming
Let's get more specific about the program state.
The state consists of a series of distinct "memory regions" or "blocks," which have an order assigned to them.
Let's use the notation memory_i for these. The first block is memory_0, the second is memory_1, and so on.
In practice, a small transformer might have around 10 of these blocks, while a very large one might have 100 or more.
Each block contains a separate data-storage "cell" for each offset in the sequence.
For example, memory_0 contains a cell for position 0 ("Fido" in our example text), and a cell for position 1 ("is"), and so on. Meanwhile, memory_1 contains its own, distinct cells for each of these positions. And so does memory_2, etc.
So the overall layout looks like:
memory_0: [cell 0, cell 1, ...]
memory_1: [cell 0, cell 1, ...]
[...]
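If it helps, you can picture this state as nothing more than a nested list in Python, one inner list per block and one cell per sequence position. This is just a mental model, not how real implementations store things (they use tensors):

    # Toy picture of the program state: one list per memory block,
    # one cell per sequence position.
    NUM_BLOCKS = 4   # a real model might have 10 to 100+ blocks
    SEQ_LEN = 4      # e.g. ["Fido", "is", "a", "dog"]

    state = [[None] * SEQ_LEN for _ in range(NUM_BLOCKS)]
    # state[block][position] holds whatever f wrote there.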
Our function f can interact with this program state. But it must do so in a way that conforms to a set of rules.
Here are the rules:
The function can only interact with the blocks by using a specific instruction.
This instruction is an "atomic write+read". It writes data to a block, then reads data from that block for f to use.
When the instruction writes data, it goes in the cell specified in the function offset argument. That is, the "i" in f(..., i).
When the instruction reads data, the data comes from all cells up to and including the offset argument.
The function must call the instruction exactly once for each block.
These calls must happen in order. For example, you can't do the call for memory_1 until you've done the one for memory_0.
Here's some pseudo-code, showing a generic computation of this kind:
f(x, i) {
    calculate some things using x and i;

    // next 2 lines are a single instruction
    write to memory_0 at position i;
    z0 = read from memory_0 at positions 0...i;

    calculate some things using x, i, and z0;

    // next 2 lines are a single instruction
    write to memory_1 at position i;
    z1 = read from memory_1 at positions 0...i;

    calculate some things using x, i, z0, and z1;

    [etc.]
}
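If you'd like to see this skeleton actually run, here is a hedged toy Python version of the same access pattern. The "calculations" are placeholder arithmetic, and the write+read is modeled as storing a number and then averaging everything visible so far; it is not a real transformer, it only mirrors the rules above:

    # Toy version of the access pattern: each block gets exactly one atomic
    # write+read, reads cover positions 0..i, and blocks are visited in order.
    NUM_BLOCKS = 3

    def f(x, i, memory):
        # Process one sequence element (a number x at offset i) against the shared state.
        h = float(x) + i                         # placeholder work on the input alone
        for block in range(NUM_BLOCKS):
            memory[block][i] = h                 # write to this block at position i ...
            visible = memory[block][: i + 1]     # ... then read cells 0..i back out
            h = h + sum(visible) / len(visible)  # placeholder work mixing in the read
        return h

    xs = [5, 2, 7, 1]
    memory = [[0.0] * len(xs) for _ in range(NUM_BLOCKS)]
    print([f(x, i, memory) for i, x in enumerate(xs)])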
The rules impose a tradeoff between the amount of processing required to produce a value, and how early the value can be accessed within the function body.
Consider the moment when data is written to memory_0. This happens before anything is read (even from memory_0 itself).
So the data in memory_0 has been computed only on the basis of individual inputs like ("a", 2). It can't leverage any information about multiple words and how they relate to one another.
But just after the write to memory_0, there's a read from memory_0. This read pulls in data computed by f when it ran on all the earlier words in the sequence.
If we're processing ("a", 2) in our example, then this is the point where our code is first able to access facts like "the word 'Fido' appeared earlier in the text."
However, we still know less than we might prefer.
Recall that memory_0 gets written before anything gets read. The data living there only reflects what f knows before it can see all the other words, while it still only has access to the one word that appeared in its input.
The data we've just read does not contain a holistic, "fully processed" representation of the whole sequence so far ("Fido is a"). Instead, it contains:
a representation of ("Fido", 0) alone, computed in ignorance of the rest of the text
a representation of ("is", 1) alone, computed in ignorance of the rest of the text
a representation of ("a", 2) alone, computed in ignorance of the rest of the text
Now, once we get to memory_1, we will no longer face this problem. Stuff in memory_1 gets computed with the benefit of whatever was in memory_0. The step that computes it can "see all the words at once."
Nonetheless, the whole function is affected by a generalized version of the same quirk.
All else being equal, data stored in later blocks ought to be more useful. Suppose for instance that
memory_4 gets read/written 20% of the way through the function body, and
memory_16 gets read/written 80% of the way through the function body
Here, strictly more computation can be leveraged to produce the data in memory_16. Calculations which are simple enough to fit in the program, but too complex to fit in just 20% of the program, can be stored in memory_16 but not in memory_4.
All else being equal, then, we'd prefer to read from memory_16 rather than memory_4 if possible.
But in fact, we can only read from memory_16 once -- at a point 80% of the way through the code, when the read/write happens for that block.
The general picture looks like:
The early parts of the function can see and leverage what got computed earlier in the sequence -- by the same early parts of the function. This data is relatively "weak," since not much computation went into it. But, by the same token, we have plenty of time to further process it.
The late parts of the function can see and leverage what got computed earlier in the sequence -- by the same late parts of the function. This data is relatively "strong," since lots of computation went into it. But, by the same token, we don't have much time left to further process it.
3. why?
There are multiple ways you can "run" the program specified by f.
Here's one way, which is used when generating text, and which matches popular intuitions about how language models work:
First, we run f("Fido", 0) from start to end. The function returns "is." As a side effect, it populates cell 0 of every memory block.
Next, we run f("is", 1) from start to end. The function returns "a." As a side effect, it populates cell 1 of every memory block.
Etc.
If we're running the code like this, the constraints described earlier feel weird and pointlessly restrictive.
By the time we're running f("is", 1), we've already populated some data into every memory block, all the way up to memory_16 or whatever.
This data is already there, and contains lots of useful insights.
And yet, during the function call f("is", 1), we "forget about" this data -- only to progressively remember it again, block by block. The early parts of this call have only memory_0 to play with, and then memory_1, etc. Only at the end do we allow access to the juicy, extensively processed results that occupy the final blocks.
Why? Why not just let this call read memory_16 immediately, on the first line of code? The data is sitting there, ready to be used!
Why? Because the constraint enables a second way of running this program.
The second way is equivalent to the first, in the sense of producing the same outputs. But instead of processing one word at a time, it processes a whole sequence of words, in parallel.
Here's how it works:
In parallel, run f("Fido", 0) and f("is", 1) and f("a", 2), up until the first write+read instruction. You can do this because the functions are causally independent of one another, up to this point. We now have 3 copies of f, each at the same "line of code": the first write+read instruction.
Perform the write part of the instruction for all the copies, in parallel. This populates cells 0, 1 and 2 of memory_0.
Perform the read part of the instruction for all the copies, in parallel. Each copy of f receives some of the data just written to memory_0, covering offsets up to its own. For instance, f("is", 1) gets data from cells 0 and 1.
In parallel, continue running the 3 copies of f, covering the code between the first write+read instruction and the second.
Perform the second write. This populates cells 0, 1 and 2 of memory_1.
Perform the second read.
Repeat like this until done.
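Here's the same toy run in this parallel mode. The inner per-position loops stand in for what a real implementation would do as one batched tensor operation per step:

```python
NUM_BLOCKS = 20

def run_parallel(words):
    memory = [dict() for _ in range(NUM_BLOCKS)]
    # All positions start together -- only possible because the whole sequence is known up front.
    states = [("embed", word, position) for position, word in enumerate(words)]
    for i in range(NUM_BLOCKS):
        for position, state in enumerate(states):     # write half of step i, for every position
            memory[i][position] = state
        states = [                                    # read half of step i, for every position
            ("mix", state, [memory[i][p] for p in range(position + 1)])
            for position, state in enumerate(states)
        ]
    return states                                     # one next-word guess per position

run_parallel(["Fido", "is", "a"])
```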
Observe that this mode of operation only works if you have a complete input sequence ready before you run anything.
(You can't parallelize over later positions in the sequence if you don't know, yet, what words they contain.)
So, this won't work when the model is generating text, word by word.
But it will work if you have a bunch of texts, and you want to process those texts with the model, for the sake of updating the model so it does a better job of predicting them.
This is called "training," and it's how neural nets get made in the first place. In our programming analogy, it's how the code inside the function body gets written.
The fact that we can train in parallel over the sequence is a huge deal, and probably accounts for most (or even all) of the benefit that transformers have over earlier architectures like RNNs.
Accelerators like GPUs are really good at doing the kinds of calculations that happen inside neural nets, in parallel.
So if you can make your training process more parallel, you can effectively multiply the computing power available to it, for free. (I'm omitting many caveats here -- see this great post for details.)
Transformer training isn't maximally parallel. It's still sequential in one "dimension," namely the layers, which correspond to our write+read steps here. You can't parallelize those.
But it is, at least, parallel along some dimension, namely the sequence dimension.
The older RNN architecture, by contrast, was inherently sequential along both these dimensions. Training an RNN is, effectively, a nested for loop. But training a transformer is just a regular, single for loop.
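In loop terms, the contrast looks roughly like this (a purely schematic sketch -- the arithmetic is meaningless, only the loop structure matters):

```python
num_layers, seq_len = 4, 6
inputs = list(range(seq_len))     # toy stand-ins for word embeddings

# RNN-style training: a nested loop, sequential in BOTH dimensions, because each
# position's hidden state depends on the previous position's state at the same layer.
states = inputs
for layer in range(num_layers):
    hidden, new_states = 0, []
    for position in range(seq_len):          # must run strictly in order
        hidden = hidden + states[position]   # stand-in for hidden = rnn_step(hidden, x)
        new_states.append(hidden)
    states = new_states

# Transformer-style training: only the layer loop is sequential; in a real model the
# whole inner line is one batched operation over all positions at once.
states = inputs
for layer in range(num_layers):
    states = [sum(states[: position + 1]) for position in range(seq_len)]
```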
4. tying it together
The "magical" thing about this setup is that both ways of running the model do the same thing. You are, literally, doing the same exact computation. The function can't tell whether it is being run one way or the other.
This is crucial, because we want the training process -- which uses the parallel mode -- to teach the model how to perform generation, which uses the sequential mode. Since both modes look the same from the model's perspective, this works.
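With the toy sketches from earlier, you can check this equivalence directly (this assumes run_sequential and run_parallel as defined in those sketches are in scope):

```python
words = ["Fido", "is", "a"]
assert run_sequential(words) == run_parallel(words)   # same results, different schedule
```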
This constraint -- that the code can run in parallel over the sequence, and that this must do the same thing as running it sequentially -- is the reason for everything else we noted above.
Earlier, we asked: why can't we allow later (in the sequence) invocations of f to read earlier data out of blocks like memory_16 immediately, on "the first line of code"?
And the answer is: because that would break parallelism. You'd have to run f("Fido", 0) all the way through before even starting to run f("is", 1).
By structuring the computation in this specific way, we provide the model with the benefits of recurrence -- writing things down at earlier positions, accessing them at later positions, and writing further things down which can be accessed even later -- while breaking the sequential dependencies that would ordinarily prevent a recurrent calculation from being executed in parallel.
In other words, we've found a way to create an iterative function that takes its own outputs as input -- and does so repeatedly, producing longer and longer outputs to be read off by its next invocation -- with the property that this iteration can be run in parallel.
We can run the first 10% of every iteration -- of f() and f(f()) and f(f(f())) and so on -- at the same time, before we know what will happen in the later stages of any iteration.
The call f(f()) uses all the information handed to it by f() -- eventually. But it cannot make any requests for information that would leave itself idling, waiting for f() to fully complete.
Whenever f(f()) needs a value computed by f(), it is always the value that f() -- running alongside f(f()), simultaneously -- has just written down, a mere moment ago.
No dead time, no idling, no waiting-for-the-other-guy-to-finish.
p.s.
The "memory blocks" here correspond to what are called "keys and values" in usual transformer lingo.
If you've heard the term "KV cache," it refers to the contents of the memory blocks during generation, when we're running in "sequential mode."
Usually, during generation, one keeps this state in memory and appends a new cell to each block whenever a new token is generated (and, as a result, the sequence gets longer by 1).
This is called "caching" to contrast it with the worse approach of throwing away the block contents after each generated token, and then re-generating them by running f on the whole sequence so far (not just the latest token). And then having to do that over and over, once per generated token.
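In practice this looks something like the following (a sketch using the Hugging Face transformers library as I understand it; argument names and return types vary by version). With the cache, each new step feeds the model only the newest token plus the saved keys/values, instead of re-running it over the whole prefix:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

ids = tok("Fido is a", return_tensors="pt").input_ids
with torch.no_grad():
    out = model(ids, use_cache=True)
    cache = out.past_key_values                                 # the "memory blocks" so far: keys/values per layer
    next_id = out.logits[:, -1].argmax(dim=-1, keepdim=True)    # greedy choice of the next token

    # With the cache: feed only the new token; the model appends one cell to each block.
    cached = model(next_id, past_key_values=cache, use_cache=True)

    # Without the cache: re-run the whole sequence from scratch -- same result, more work.
    full = model(torch.cat([ids, next_id], dim=1))

print(torch.allclose(cached.logits[:, -1], full.logits[:, -1], atol=1e-4))
```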
#ai tag#is there some standard CS name for the thing i'm talking about here?#i feel like there should be#but i never heard people mention it#(or at least i've never heard people mention it in a way that made the connection with transformers clear)
313 notes
·
View notes
Text
The history of computing is one of innovation followed by scale up which is then broken by a model that “scales out”—when a bigger and faster approach is replaced by a smaller and more numerous approaches. Mainframe->Mini->Micro->Mobile, Big iron->Distributed computing->Internet, Cray->HPC->Intel/CISC->ARM/RISC, OS/360->VMS->Unix->Windows NT->Linux, and on and on. You can see this at these macro levels, or you can see it at the micro level when it comes to subsystems from networking to storage to memory. The past 5 years of AI have been bigger models, more data, more compute, and so on. Why? Because I would argue the innovation was driven by the cloud hyperscale companies and they were destined to take the approach of doing more of what they already did. They viewed data for training and huge models as their way of winning and their unique architectural approach. The fact that other startups took a similar approach is just Silicon Valley at work—the people move and optimize for different things at a micro scale without considering the larger picture. See the sociological and epidemiological term small area variation. They look to do what they couldn’t do at their previous efforts or what the previous efforts might have been overlooking.
- DeepSeek Has Been Inevitable and Here's Why (History Tells Us) by Steven Sinofsky
45 notes
·
View notes
Text
History and Basics of Language Models: How Transformers Changed AI Forever - and Led to Neuro-sama
I have seen a lot of misunderstandings and myths about Neuro-sama's language model. I have decided to write a short post, going into the history of and current state of large language models and providing some explanation about how they work, and how Neuro-sama works! To begin, let's start with some history.
Before the beginning
Before the language models we are used to today, models like RNNs (Recurrent Neural Networks) and LSTMs (Long Short-Term Memory networks) were used for natural language processing, but they had a lot of limitations. Both of these architectures process words sequentially, meaning they read text one word at a time in order. This made them struggle with long sentences: they could almost forget the beginning by the time they reached the end.
Another major limitation was computational efficiency. Since RNNs and LSTMs process text one step at a time, they can't take full advantage of modern parallel computing hardware like GPUs. All these fundamental limitations meant that these models could never be nearly as smart as today's models.
The beginning of modern language models
In 2017, a paper titled "Attention is All You Need" introduced the transformer architecture. It was received positively for its innovation, but no one truly knew just how important it was going to be. This paper is what made modern language models possible.
The transformer's key innovation was the attention mechanism, which allows the model to focus on the most relevant parts of a text. Instead of processing words sequentially, transformers process all words at once, capturing relationships between words no matter how far apart they are in the text. This change made models faster, and better at understanding context.
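As a rough sketch of that core computation (a toy, single-head version with no learned weight matrices -- a real model would first project each word into separate learned queries, keys, and values, and would use many heads and layers):

```python
import numpy as np

def attention(x):
    """Toy single-head self-attention over word vectors x, shaped (seq_len, dim)."""
    seq_len, dim = x.shape
    q, k, v = x, x, x                               # stand-ins for the learned projections
    scores = q @ k.T / np.sqrt(dim)                 # how relevant is each word to each other word?
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores[mask] = -np.inf                          # GPT-style: a word only looks at itself and earlier words
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: attention weights for each position
    return weights @ v                              # each output is a weighted mix of the relevant words

x = np.random.randn(5, 16)        # five "words", each a 16-dimensional vector
print(attention(x).shape)         # (5, 16): every word processed at once, all pairs compared
```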
The full potential of transformers became clearer over the next few years as researchers scaled them up.
The Scale of Modern Language Models
A major factor in an LLM's performance is the number of parameters - which are like the model's "neurons" that store learned information. The more parameters, the more powerful the model can be. The first GPT (generative pre-trained transformer) model, GPT-1, was released in 2018 and had 117 million parameters. It was small and not very capable - but a good proof of concept. GPT-2 (2019) had 1.5 billion parameters - which was a huge leap in quality, but it was still really dumb compared to the models we are used to today. GPT-3 (2020) had 175 billion parameters, and it was really the first model that felt actually kinda smart. This model required 4.6 million dollars for training, in compute expenses alone.
Recently, models have become more efficient: smaller models can achieve similar performance to bigger models from the past. This efficiency means that smarter and smarter models can run on consumer hardware. However, training costs still remain high.
How Are Language Models Trained?
Pre-training: The model is trained on a massive dataset to predict the next token. A token is a piece of text a language model can process; it can be a word, a word fragment, or a single character. Even training relatively small models with a few billion parameters requires trillions of tokens and a lot of computational resources, which cost millions of dollars.
Post-training, including fine-tuning: After pre-training, the model can be customized for specific tasks, like answering questions, writing code, casual conversation, etc. Certain post-training methods can help improve the model's alignment with certain values or update its knowledge of specific domains. This requires far less data and computational power compared to pre-training.
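To make "predict the next token" concrete, here's a toy version of the pre-training objective (the numbers and tensors are invented; in practice the logits come from the transformer itself rather than from random noise):

```python
import torch
import torch.nn.functional as F

vocab_size, seq_len = 50_000, 8
token_ids = torch.randint(vocab_size, (seq_len,))  # a tiny slice of training text, as token ids
logits = torch.randn(seq_len - 1, vocab_size)      # the model's scores for "what comes next" at each
                                                   # position except the last (random here, for illustration)
loss = F.cross_entropy(logits, token_ids[1:])      # position i is graded on predicting token i+1
# Training repeats this over trillions of tokens, nudging the parameters to lower the loss.
```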
The Cost of Training Large Language Models
Pre-training models over a certain size requires vast amounts of computational power and high-quality data. While advancements in efficiency have made it possible to get better performance with smaller models, models can still require millions of dollars to train, even if they have far fewer parameters than GPT-3.
The Rise of Open-Source Language Models
Many language models are closed-source: you can't download or run them locally. For example, the ChatGPT models from OpenAI and the Claude models from Anthropic are all closed-source.
However, some companies release a number of their models as open-source, allowing anyone to download, run, and modify them.
While the larger models cannot be run on consumer hardware, smaller open-source models can be used on high-end consumer PCs.
An advantage of smaller models is that they have lower latency, meaning they can generate responses much faster. They are not as powerful as the largest closed-source models, but their accessibility and speed make them highly useful for some applications.
So What is Neuro-sama?
Basically no details are shared about the model by Vedal, and I will only share what can be confidently concluded and only information that wouldn't reveal any sort of "trade secret".

What can be known is that Neuro-sama would not exist without open-source large language models. Vedal can't train a model from scratch, but what Vedal can do - and what can confidently be assumed he did do - is post-train an open-source model. Post-training a model on additional data can change the way the model acts and can add some new knowledge - however, the core intelligence of Neuro-sama comes from the base model she was built on.

Since huge models can't be run on consumer hardware and would be prohibitively expensive to run through an API, we can also say that Neuro-sama is a smaller model - which has the disadvantage of being less powerful and more limited, but the advantage of low latency. Latency and cost are always going to pose some pretty strict limitations, but because LLMs just keep getting more efficient and better hardware is becoming more available, Neuro can be expected to become smarter and smarter in the future.

To end, I have to at least mention that Neuro-sama is more than just her language model, though we have only talked about the language model in this post. She can be looked at as a system of different parts. Her TTS, her VTuber avatar, her vision model, her long-term memory, even her Minecraft AI, and so on, all come together to make Neuro-sama.
Wrapping up - Thanks for Reading!
This post was meant to provide a brief introduction to language models, covering some history and explaining how Neuro-sama can work. Of course, this post is just scratching the surface, but hopefully it gave you a clearer understanding of how language models function and their history!
33 notes
·
View notes
Text
Pros and Cons of Infrastructure as a Service (IaaS) in Cloud Computing - Here are the various advantages and challenges of IaaS that you should know about before using the service model.
#cloud services houston#enterprise network architect services#cloud computing houston#cloud based network security#houston managed it#intelligent architecture consulting#cloud computing in houston#cybersecurity consulting#secure cloud computing
0 notes
Text
Surveillance capitalists discovered that the most-predictive behavioral data come from intervening in the state of play in order to nudge, coax, tune, and herd behavior toward profitable outcomes. Competitive pressures produced this shift, in which automated machine processes not only know our behavior but also shape our behavior at scale. With this reorientation from knowledge to power, it is no longer enough to automate information flows about us; the goal now is to automate us. In this phase of surveillance capitalism’s evolution, the means of production are subordinated to an increasingly complex and comprehensive “means of behavioral modification.” In this way, surveillance capitalism births a new species of power that I call instrumentarianism. Instrumentarian power knows and shapes human behavior toward others’ ends. Instead of armaments and armies, it works its will through the automated medium of an increasingly ubiquitous computational architecture of “smart” networked devices, things, and spaces.
Shoshana Zuboff, The Age of Surveillance Capitalism
47 notes
·
View notes