#MuJoCo Simulation
Text
🤖 Google’s Gemini AI Robotics Goes Offline! Now robots can run tasks without internet using Gemini AI Robotics On-Device. Fast, secure, and perfect for industries! 🔧 #GeminiAI #GoogleAI #Robotics #OfflineAI #ArtificialIntelligence #TechNews #ai #Tech #Robot #Technology #Gemini #Google
#AI in Robotics#AI Safety Principles#Apptronik Apollo#Artificial Intelligence News#Edge AI Computing#Franka FR3#Gemini 2.0 Architecture#Gemini AI#Gemini AI Robotics On-Device#Google AI 2025#Google Robotics SDK#Industrial Automation AI#MuJoCo Simulation#Offline AI Model#Robotics Without Internet
Text
Large Language Models and industrial manufacturing, a bibliography
References
Autodesk. (n.d.). Autodesk simulation. Retrieved July 14, 2023, from https://www.autodesk.com/solutions/simulation/overview
Chen, M., Tworek, J., Jun, H., Yuan, Q., de Oliveira Pinto, H. P., Kaplan, J., Edwards, H., Burda, Y., Joseph, N., Brockman, G., Ray, A., Puri, R., Krueger, G., Petrov, M., Khlaaf, H., Sastry, G., Mishkin, P., Chan, B., Gray, S., . . . Zaremba, W. (2021). Evaluating large language models trained on code. ArXiv. https://doi.org/10.48550/arXiv.2107.03374
Christiano, P. F., Leike, J., Brown, T., Martic, M., Legg, S., & Amodei, D. (2017). Deep reinforcement learning from human preferences. In I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, & R. Garnett (Eds.), Advances in neural information processing systems (Vol. 30, pp. 4299–4307). Curran Associates. https://papers.nips.cc/paper_files/paper/2017/hash/d5e2c0adad503c91f91df240d0cd4e49-Abstract.html
Dassault Systèmes. (n.d.). Dassault Systèmes simulation. Retrieved July 14, 2023, from https://www.3ds.com/products-services/simulia/overview/
Dhariwal, P., Jun, H., Payne, C., Kim, J. W., Radford, A., & Sutskever, I. (2020). Jukebox: A generative model for music. ArXiv. https://doi.org/10.48550/arXiv.2005.00341
Du, T., Inala, J. P., Pu, Y., Spielberg, A., Schulz, A., Rus, D., Solar-Lezama, A., & Matusik, W. (2018). InverseCSG: Automatic conversion of 3D models to CSG trees. ACM Transactions on Graphics (TOG), 37(6), Article 213. https://doi.org/10.1145/3272127.3275006
Du, T., Wu, K., Ma, P., Wah, S., Spielberg, A., Rus, D., & Matusik, W. (2021). DiffPD: Differentiable projective dynamics. ACM Transactions on Graphics (TOG), 41(2), Article 13. https://doi.org/10.1145/3490168
Du, Y., Li, S., Torralba, A., Tenenbaum, J. B., & Mordatch, I. (2023). Improving factuality and reasoning in language models through multiagent debate. ArXiv. https://doi.org/10.48550/arXiv.2305.14325
Erez, T., Tassa, Y., & Todorov, E. (2015). Simulation tools for model-based robotics: Comparison of Bullet, Havok, MuJoCo, ODE and PhysX. In Amato, N. (Ed.), Proceedings of the 2015 IEEE International Conference on Robotics and Automation (pp. 4397–4404). IEEE. https://doi.org/10.1109/ICRA.2015.7139807
Erps, T., Foshey, M., Luković, M. K., Shou, W., Goetzke, H. H., Dietsch, H., Stoll, K., von Vacano, B., & Matusik, W. (2021). Accelerated discovery of 3D printing materials using data-driven multiobjective optimization. Science Advances, 7(42), Article eabf7435. https://doi.org/10.1126/sciadv.abf7435
Featurescript introduction. (n.d.). Retrieved July 11, 2023, from https://cad.onshape.com/FsDoc/
Ferruz, N., Schmidt, S., & Höcker, B. (2022). ProtGPT2 is a deep unsupervised language model for protein design. Nature Communications, 13(1), Article 4348. https://doi.org/10.1038/s41467-022-32007-7
Guo, M., Thost, V., Li, B., Das, P., Chen, J., & Matusik, W. (2022). Data-efficient graph grammar learning for molecular generation. ArXiv. https://doi.org/10.48550/arXiv.2203.08031
Jiang, B., Chen, X., Liu, W., Yu, J., Yu, G., & Chen, T. (2023). MotionGPT: Human motion as a foreign language. ArXiv. https://doi.org/10.48550/arXiv.2306.14795
JSCAD user guide. (n.d.). Retrieved July 14, 2023, from https://openjscad.xyz/dokuwiki/doku.php
Kashefi, A., & Mukerji, T. (2023). ChatGPT for programming numerical methods. Journal of Machine Learning for Modeling and Computing, 4(2), 1–74. https://doi.org/10.1615/JMachLearnModelComput.2023048492
Koo, B., Hergel, J., Lefebvre, S., & Mitra, N. J. (2017). Towards zero-waste furniture design. IEEE Transactions on Visualization and Computer Graphics, 23(12), 2627–2640. https://doi.org/10.1109/TVCG.2016.2633519
Li, J., Rawn, E., Ritchie, J., Tran O’Leary, J., & Follmer, S. (2023). Beyond the artifact: Power as a lens for creativity support tools. In Follmer, S., Han, J., Steimle, J., & Riche, N. H. (Eds.), Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (Article 47). Association for Computing Machinery. https://doi.org/10.1145/3586183.3606831
Liu, R., Wu, R., Van Hoorick, B., Tokmakov, P., Zakharov, S., & Vondrick, C. (2023). Zero-1-to-3: Zero-shot one image to 3D object. ArXiv. https://doi.org/10.48550/arXiv.2303.11328
Ma, P., Du, T., Zhang, J. Z., Wu, K., Spielberg, A., Katzschmann, R. K., & Matusik, W. (2021). DiffAqua: A differentiable computational design pipeline for soft underwater swimmers with shape interpolation. ACM Transactions on Graphics (TOG), 40(4), Article 132. https://doi.org/10.1145/3450626.3459832
Makatura, L., Wang, B., Chen, Y.-L., Deng, B., Wojtan, C., Bickel, B., & Matusik, W. (2023). Procedural metamaterials: A unified procedural graph for metamaterial design. ACM Transactions on Graphics, 42(5), Article 168. https://doi.org/10.1145/3605389
Mathur, A., & Zufferey, D. (2021). Constraint synthesis for parametric CAD. In M. Okabe, S. Lee, B. Wuensche, & S. Zollmann (Eds.), Pacific Graphics 2021: The 29th Pacific Conference on Computer Graphics and Applications: Short Papers, Posters, and Work-in-Progress Papers (pp. 75–80). The Eurographics Association. https://doi.org/10.2312/pg.20211396
Mirchandani, S., Xia, F., Florence, P., Ichter, B., Driess, D., Arenas, M. G., Rao, K., Sadigh, D., & Zeng, A. (2023). Large language models as general pattern machines. ArXiv. https://doi.org/10.48550/arXiv.2307.04721
Müller, P., Wonka, P., Haegler, S., Ulmer, A., & Van Gool, L. (2006). Procedural modeling of buildings. In ACM SIGGRAPH 2006 papers (pp. 614–623). Association for Computing Machinery. https://doi.org/10.1145/1179352.1141931
Ni, B., & Buehler, M. J. (2024). MechAgents: Large language model multi-agent collaborations can solve mechanics problems, generate new data, and integrate knowledge. Extreme Mechanics Letters, 67, Article 102131. https://doi.org/10.1016/j.eml.2024.102131
O’Brien, J. F., Shen, C., & Gatchalian, C. M. (2002). Synthesizing sounds from rigid-body simulations. In Proceedings of the 2002 ACM SIGGRAPH/Eurographics Symposium on Computer Animation (pp. 175–181). Association for Computing Machinery. https://doi.org/10.1145/545261.545290
OpenAI. (2023). GPT-4 technical report. ArXiv. https://doi.org/10.48550/arXiv.2303.08774
Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P. F., Leike, J., & Lowe, R. (2022). Training language models to follow instructions with human feedback. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, & A. Oh (Eds.), Advances in Neural Information Processing Systems (Vol. 35, pp. 27730–27744). Curran Associates. https://proceedings.neurips.cc/paper_files/paper/2022/hash/b1efde53be364a73914f58805a001731-Abstract-Conference.html
Özkar, M., & Stiny, G. (2009). Shape grammars. In ACM SIGGRAPH 2009 courses (Article 22). Association for Computing Machinery. https://doi.org/10.1145/1667239.1667261
Penedo, G., Malartic, Q., Hesslow, D., Cojocaru, R., Cappelli, A., Alobeidli, H., Pannier, B., Almazrouei, E., & Launay, J. (2023). The RefinedWeb dataset for Falcon LLM: Outperforming curated corpora with web data, and web data only. ArXiv. https://doi.org/10.48550/arXiv.2306.01116
Prusinkiewicz, P., & Lindenmayer, A. (1990). The algorithmic beauty of plants. Springer Science & Business Media.
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019, February 14). Language models are unsupervised multitask learners. OpenAI. https://openai.com/index/better-language-models/
Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., Chen, M., & Sutskever, I. (2021). Zero-shot text-to-image generation. Proceedings of Machine Learning Research, 139, 8821–8831. https://proceedings.mlr.press/v139/ramesh21a.html
Repetier. (n.d.). Repetier software. Retrieved July 20, 2023, from https://www.repetier.com/
Richards, T. B. (n.d.). AutoGPT. Retrieved February 11, 2024, from https://github.com/Significant-Gravitas/AutoGPT
Rozenberg, G., & Salomaa, A. (1980). The mathematical theory of L systems. Academic Press.
Slic3r. (n.d.). Slic3r - Open source 3D printing toolbox. Retrieved July 20, 2023, from https://slic3r.org/
Stiny, G. (1980). Introduction to shape and shape grammars. Environment and Planning B: Planning and Design, 7(3), 343–351. https://doi.org/10.1068/b070343
Sullivan, D. M. (2013). Electromagnetic simulation using the FDTD method. John Wiley & Sons. https://doi.org/10.1002/9781118646700
Todorov, E., Erez, T., & Tassa, Y. (2012). MuJoCo: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems (pp. 5026–5033). IEEE. https://doi.org/10.1109/IROS.2012.6386109
Turing, A. M. (1936). On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, s2-42(1), 230–265. https://doi.org/10.1112/plms/s2-42.1.230
Willis, K. D., Pu, Y., Luo, J., Chu, H., Du, T., Lambourne, J. G., Solar-Lezama, A., & Matusik, W. (2021). Fusion 360 gallery: A dataset and environment for programmatic CAD construction from human design sequences. ACM Transactions on Graphics (TOG), 40(4), Article 54. https://doi.org/10.1145/3450626.3459818
Xu, J., Chen, T., Zlokapa, L., Foshey, M., Matusik, W., Sueda, S., & Agrawal, P. (2021). An end-to-end differentiable framework for contact-aware robot design. ArXiv. https://doi.org/10.48550/arXiv.2107.07501
Yu, L., Shi, B., Pasunuru, R., Muller, B., Golovneva, O., Wang, T., Babu, A., Tang, B., Karrer, B., Sheynin, S., Ross, C., Polyak, A., Howes, R., Sharma, V., Xu, P., Tamoyan, H., Ashual, O., Singer, U., . . . Aghajanyan, A. (2023). Scaling autoregressive multi-modal models: Pretraining and instruction tuning. ArXiv. https://doi.org/10.48550/arXiv.2309.02591
Zhang, Y., Yang, M., Baghdadi, R., Kamil, S., Shun, J., & Amarasinghe, S. (2018). GraphIt: A high-performance graph DSL. Proceedings of the ACM on Programming Languages, 2(OOPSLA), Article 121. https://doi.org/10.1145/3276491
Zhao, A., Xu, J., Konaković-Luković, M., Hughes, J., Spielberg, A., Rus, D., & Matusik, W. (2020). RoboGrammar: Graph grammar for terrain-optimized robot design. ACM Transactions on Graphics (TOG), 39(6), Article 188. https://doi.org/10.1145/3414685.3417831
Zilberstein, S. (1996). Using anytime algorithms in intelligent systems. AI Magazine, 17(3), 73–83. https://doi.org/10.1609/aimag.v17i3.1232
Quote
Large language models (LLMs) have demonstrated exciting progress in acquiring diverse new capabilities through in-context learning, ranging from logical reasoning to code-writing. Robotics researchers have also explored using LLMs to advance the capabilities of robotic control. However, since low-level robot actions are hardware-dependent and underrepresented in LLM training corpora, existing efforts in applying LLMs to robotics have largely treated LLMs as semantic planners or relied on human-engineered control primitives to interface with the robot. On the other hand, reward functions are shown to be flexible representations that can be optimized for control policies to achieve diverse tasks, while their semantic richness makes them suitable to be specified by LLMs. In this work, we introduce a new paradigm that harnesses this realization by utilizing LLMs to define reward parameters that can be optimized and accomplish a variety of robotic tasks. Using reward as the intermediate interface generated by LLMs, we can effectively bridge the gap between high-level language instructions or corrections and low-level robot actions. Meanwhile, combining this with a real-time optimizer, MuJoCo MPC, empowers an interactive behavior creation experience where users can immediately observe the results and provide feedback to the system. To systematically evaluate the performance of our proposed method, we designed a total of 17 tasks for a simulated quadruped robot and a dexterous manipulator robot. We demonstrate that our proposed method reliably tackles 90% of the designed tasks, while a baseline using primitive skills as the interface with Code-as-policies achieves 50% of the tasks. We further validated our method on a real robot arm where complex manipulation skills such as non-prehensile pushing emerge through our interactive system.
Language to Rewards for Robotic Skill Synthesis
Photo
DeepMind takes next step in robotics research - The company has acquired the rigid-body physics simulator MuJoCo and has made it freely available to the research community. https://ift.tt/3mufaPs
Text
DeepMind open-sourcing MuJoCo simulator
https://deepmind.com/blog/announcements/mujoco
Text
Facebook’s AI uses schemas to teach robots to manipulate objects in less than 10 hours of training
How might a two-armed robot go about accomplishing a task like opening a bottle? Invariably, it’ll need to hold the bottle’s base with one hand while grasping the cap with the other and twisting it off. That high-level sequence of steps is what’s known as a schema, and it’s thankfully uninfluenced by objects’ geometric and spatial states. As an added bonus, unlike reinforcement learning techniques that aim to solve tasks by learning a policy, schemas don’t require millions of examples ingested over the course of hours, weeks, or even months.
Recently, a team at Facebook AI Research sought to imbue two robotic Sawyer arms with the ability to select appropriate steps from a library to complete an objective. At each timestep, their agent had to decide which skill to use and what arguments to use for it (e.g., the location to apply force, the amount of force, or the target pose to move to). Despite the complexity involved, the team says that their approach yielded improvements in learning efficiency, such that manipulation skills could be discovered within only a few hours of training.
The team’s key insight was that for many tasks, the learning process could be split into two parts: (1) learning a task schema and (2) learning a policy that chooses appropriate parameterizations for the different skills. They assert that this approach leads to faster learning, in part because data from different versions of a given task could be used to improve shared skills. Moreover, they say it allowed for the transfer of learned schemas among related tasks.
“For example, suppose we have learned a good schema for picking up a long bar in simulation, where we have access to object poses, geometry information, [and more],” explained the coauthors of the paper detailing the work. “We can then reuse that schema for a related task such as picking up a tray in the real world from only raw camera observations, even though both the state space and the optimal parameterizations (e.g., grasp poses) differ significantly. As the schema is fixed, policy learning for this tray pickup task will be very efficient, since it only requires learning the (observation-dependent) arguments for each skill.”
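To make the schema/parameter split concrete, here is a minimal sketch of the idea (the skill names, robot interface, and argument policy are hypothetical placeholders, not Facebook's actual code): the schema fixes which skills run in what order, and the learned policy supplies only the continuous arguments for each skill from the current observation.

```python
# Hypothetical sketch of a fixed task schema with learned skill arguments.
# Skill names, the robot interface, and arg_policy are illustrative only.

SCHEMA_OPEN_BOTTLE = ["reach", "grasp", "hold", "twist", "lift"]

def execute_schema(schema, robot, arg_policy):
    """Run a fixed sequence of skills; only their arguments are learned."""
    for skill_name in schema:
        obs = robot.observe()                # raw camera / proprioceptive input
        args = arg_policy(skill_name, obs)   # e.g. grasp pose, force magnitude
        robot.run_skill(skill_name, **args)  # call into a generic skill library
```

Because the schema itself is fixed, learning reduces to fitting the argument policy, which is what makes the approach comparatively sample-efficient.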
The researchers gave the aforementioned two robotic arms a generic library of skills such as twisting, lifting, and reaching, which they had to apply to several lateral lifting, picking, opening, and rotating tasks involving varying objects, geometries, and initial poses. The schemas were learned in MuJoCo (a simulation environment) by training with low-dimensional input data like geometric and proprioceptive features (joint positions, joint velocities, end effector pose), and then transferred to visual inputs both in simulation as well as in the real world.
During experiments, the Sawyer arms — which were equipped with cameras and controlled by Facebook’s PyRobot open source robotics platform — were tasked with manipulating nine household objects (such as a rolling pin, soccer ball, glass jar, and T-wrench) that required two parallel-jaw grippers to interact with. Despite having to learn from raw visual images, they say that the system learned to manipulate most items using 2,000 skills with over 90% success in around 4-10 hours of training.
“We have studied how to leverage state-independent sequences of skills to greatly improve the sample efficiency of model-free reinforcement learning,” wrote the coauthors. “Furthermore, we have shown experimentally that transferring sequences of skills learned in simulation to real-world tasks enables us to solve sparse-reward problems from images very efficiently, making it feasible to train real robots to perform complex skills such as bimanual manipulation.”
Link
Apply deep learning to artificial intelligence and reinforcement learning using evolution strategies, A2C, and DDPG
DEEP REINFORCEMENT LEARNING
Created by Lazy Programmer Inc.
Last updated 12/2019
English
English [Auto-generated]
What you’ll learn
Understand a cutting-edge implementation of the A2C algorithm (OpenAI Baselines)
Understand and implement Evolution Strategies (ES) for AI
Understand and implement DDPG (Deep Deterministic Policy Gradient)
Description
Welcome to Cutting-Edge AI!
This is technically Deep Learning in Python part 11 of my deep learning series, and my 3rd reinforcement learning course.
Deep Reinforcement Learning is actually the combination of 2 topics: Reinforcement Learning and Deep Learning (Neural Networks).
While both of these have been around for quite some time, it’s only been recently that Deep Learning has really taken off, and along with it, Reinforcement Learning.
The maturation of deep learning has propelled advances in reinforcement learning, which has been around since the 1980s, although some aspects of it, such as the Bellman equation, have been around for much longer.
Recently, these advances have allowed us to showcase just how powerful reinforcement learning can be.
We’ve seen how AlphaZero can master the game of Go using only self-play.
This is just a few years after the original AlphaGo already beat a world champion in Go.
We’ve seen real-world robots learn how to walk, and even recover after being kicked over, despite only being trained using simulation.
Simulation is nice because it doesn’t require actual hardware, which is expensive. If your agent falls down, no real damage is done.
We’ve seen real-world robots learn hand dexterity, which is no small feat.
Walking is one thing, but that involves coarse movements. Hand dexterity is complex – you have many degrees of freedom and many of the forces involved are extremely subtle.
Imagine using your foot to do something you usually do with your hand, and you immediately understand why this would be difficult.
Last but not least – video games.
Even just considering the past few months, we’ve seen some amazing developments. AIs are now beating professional players in CS:GO and Dota 2.
So what makes this course different from the first two?
Now that we know deep learning works with reinforcement learning, the question becomes: how do we improve these algorithms?
This course is going to show you a few different ways: including the powerful A2C (Advantage Actor-Critic) algorithm, the DDPG (Deep Deterministic Policy Gradient) algorithm, and evolution strategies.
Evolution strategies is a new and fresh take on reinforcement learning that kind of throws away all the old theory in favor of a more “black box” approach, inspired by biological evolution.
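For the curious, the core idea is small enough to sketch. The snippet below shows a basic evolution-strategies update in the spirit of that black-box approach (it is not the course's code, and episode_return is an assumed function that runs one episode with a given parameter vector and returns the total reward):

```python
# Minimal sketch of one evolution-strategies (ES) parameter update.
# `episode_return` is assumed to evaluate a flat parameter vector in the environment.
import numpy as np

def es_step(theta, episode_return, pop_size=50, sigma=0.1, lr=0.01):
    """Perturb parameters with Gaussian noise and weight the noise by returns."""
    noise = np.random.randn(pop_size, theta.size)
    returns = np.array([episode_return(theta + sigma * n) for n in noise])
    advantages = (returns - returns.mean()) / (returns.std() + 1e-8)
    grad_estimate = noise.T @ advantages / (pop_size * sigma)
    return theta + lr * grad_estimate  # no backprop through the environment needed
```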
What’s also great about this new course is the variety of environments we get to look at.
First, we’re going to look at the classic Atari environments. These are important because they show that reinforcement learning agents can learn based on images alone.
Second, we’re going to look at MuJoCo, which is a physics simulator. This is the first step to building a robot that can navigate the real world and understand physics – we first have to show it can work with simulated physics.
Finally, we’re going to look at Flappy Bird, everyone’s favorite mobile game just a few years ago.
Thanks for reading, and I’ll see you in class!
Suggested prerequisites:
Calculus
Probability
Object-oriented programming
Python coding: if/else, loops, lists, dicts, sets
Numpy coding: matrix and vector operations
Linear regression
Gradient descent
Know how to build a convolutional neural network (CNN) in TensorFlow
Markov Decision Processes (MDPs)
TIPS (for getting through the course):
Watch it at 2x.
Take handwritten notes. This will drastically increase your ability to retain the information.
Write down the equations. If you don’t, I guarantee it will just look like gibberish.
Ask lots of questions on the discussion board. The more the better!
Realize that most exercises will take you days or weeks to complete.
Write code yourself, don’t just sit there and look at my code.
WHAT ORDER SHOULD I TAKE YOUR COURSES IN?:
Check out the lecture “What order should I take your courses in?” (available in the Appendix of any of my courses, including the free Numpy course)
Who this course is for:
Students and professionals who want to apply Reinforcement Learning to their work and projects
Anyone who wants to learn cutting-edge Artificial Intelligence and Reinforcement Learning
Quote
In “Language to Rewards for Robotic Skill Synthesis”, we propose an approach to enable users to teach robots novel actions through natural language input. To do so, we leverage reward functions as an interface that bridges the gap between language and low-level robot actions. We posit that reward functions provide an ideal interface for such tasks given their richness in semantics, modularity, and interpretability. They also provide a direct connection to low-level policies through black-box optimization or reinforcement learning (RL). We developed a language-to-reward system that leverages LLMs to translate natural language user instructions into reward-specifying code and then applies MuJoCo MPC to find optimal low-level robot actions that maximize the generated reward function. We demonstrate our language-to-reward system on a variety of robotic control tasks in simulation using a quadruped robot and a dexterous manipulator robot. We further validate our method on a physical robot manipulator.
Language to rewards for robotic skill synthesis – Google Research Blog
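As a rough illustration of what "reward-specifying code" can look like (every name below is hypothetical and not the interface used by the actual system), an instruction like "make the quadruped stand on its hind legs" might be translated into a few weighted reward terms that the optimizer then maximizes:

```python
# Hypothetical sketch of LLM-emitted reward terms for "stand on hind legs".
# State keys, targets, and weights are illustrative assumptions only.
import numpy as np

def reward(state: dict) -> float:
    torso_height_target = 0.6       # meters, assumed target emitted by the LLM
    torso_pitch_target = np.pi / 2  # radians, torso pointing upward

    height_err = abs(state["torso_height"] - torso_height_target)
    pitch_err = abs(state["torso_pitch"] - torso_pitch_target)
    effort = float(np.sum(np.square(state["joint_torques"])))

    # Higher is better: penalize deviation from the targets and control effort.
    return -(5.0 * height_err + 2.0 * pitch_err + 0.001 * effort)
```

A black-box optimizer or model-predictive controller then searches for low-level actions that maximize the cumulative value of a reward of this form.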
Text
MIT robot combines vision and touch to learn the game of Jenga
In the basement of MIT’s Building 3, a robot is carefully contemplating its next move. It gently pokes at a tower of blocks, looking for the best block to extract without toppling the tower, in a solitary, slow-moving, yet surprisingly agile game of Jenga.
The robot, developed by MIT engineers, is equipped with a soft-pronged gripper, a force-sensing wrist cuff, and an external camera, all of which it uses to see and feel the tower and its individual blocks.
As the robot carefully pushes against a block, a computer takes in visual and tactile feedback from its camera and cuff, and compares these measurements to moves that the robot previously made. It also considers the outcomes of those moves — specifically, whether a block, in a certain configuration and pushed with a certain amount of force, was successfully extracted or not. In real-time, the robot then “learns” whether to keep pushing or move to a new block, in order to keep the tower from falling.
Details of the Jenga-playing robot are published today in the journal Science Robotics. Alberto Rodriguez, the Walter Henry Gale Career Development Assistant Professor in the Department of Mechanical Engineering at MIT, says the robot demonstrates something that’s been tricky to attain in previous systems: the ability to quickly learn the best way to carry out a task, not just from visual cues, as it is commonly studied today, but also from tactile, physical interactions.
“Unlike in more purely cognitive tasks or games such as chess or Go, playing the game of Jenga also requires mastery of physical skills such as probing, pushing, pulling, placing, and aligning pieces. It requires interactive perception and manipulation, where you have to go and touch the tower to learn how and when to move blocks,” Rodriguez says. “This is very difficult to simulate, so the robot has to learn in the real world, by interacting with the real Jenga tower. The key challenge is to learn from a relatively small number of experiments by exploiting common sense about objects and physics.”
He says the tactile learning system the researchers have developed can be used in applications beyond Jenga, especially in tasks that need careful physical interaction, including separating recyclable objects from landfill trash and assembling consumer products.
“In a cellphone assembly line, in almost every single step, the feeling of a snap-fit, or a threaded screw, is coming from force and touch rather than vision,” Rodriguez says. “Learning models for those actions is prime real-estate for this kind of technology.”
The paper’s lead author is MIT graduate student Nima Fazeli. The team also includes Miquel Oller, Jiajun Wu, Zheng Wu, and Joshua Tenenbaum, professor of brain and cognitive sciences at MIT.
Push and pull
In the game of Jenga — Swahili for “build” — 54 rectangular blocks are stacked in 18 layers of three blocks each, with the blocks in each layer oriented perpendicular to the blocks below. The aim of the game is to carefully extract a block and place it at the top of the tower, thus building a new level, without toppling the entire structure.
To program a robot to play Jenga, traditional machine-learning schemes might require capturing everything that could possibly happen between a block, the robot, and the tower — an expensive computational task requiring data from thousands if not tens of thousands of block-extraction attempts.
Instead, Rodriguez and his colleagues looked for a more data-efficient way for a robot to learn to play Jenga, inspired by human cognition and the way we ourselves might approach the game.
The team customized an industry-standard ABB IRB 120 robotic arm, then set up a Jenga tower within the robot’s reach, and began a training period in which the robot first chose a random block and a location on the block against which to push. It then exerted a small amount of force in an attempt to push the block out of the tower.
For each block attempt, a computer recorded the associated visual and force measurements, and labeled whether each attempt was a success.
Rather than carry out tens of thousands of such attempts (which would involve reconstructing the tower almost as many times), the robot trained on just about 300, with attempts of similar measurements and outcomes grouped in clusters representing certain block behaviors. For instance, one cluster of data might represent attempts on a block that was hard to move, versus one that was easier to move, or that toppled the tower when moved. For each data cluster, the robot developed a simple model to predict a block’s behavior given its current visual and tactile measurements.
Fazeli says this clustering technique dramatically increases the efficiency with which the robot can learn to play the game, and is inspired by the natural way in which humans cluster similar behavior: “The robot builds clusters and then learns models for each of these clusters, instead of learning a model that captures absolutely everything that could happen.”
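A rough sketch of that recipe (hypothetical code, not the authors' implementation) is to group the recorded pushes by their measurement and outcome signatures, then fit one small predictor per cluster:

```python
# Hypothetical sketch: cluster block-push attempts, fit one simple model per cluster.
# The feature layout and placeholder data are illustrative, not the paper's setup.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Each row: [push force, push location, measured displacement, ...]; label = extracted?
attempts = np.random.rand(300, 4)            # stands in for ~300 recorded attempts
labels = np.random.randint(0, 2, size=300)   # stands in for success/failure labels

kmeans = KMeans(n_clusters=5, n_init=10).fit(attempts)
cluster_models = {
    c: LogisticRegression().fit(attempts[kmeans.labels_ == c],
                                labels[kmeans.labels_ == c])
    for c in range(kmeans.n_clusters)
}

def predict_success(measurement):
    """Route a new push to its cluster's model instead of one global model."""
    c = int(kmeans.predict(measurement.reshape(1, -1))[0])
    return cluster_models[c].predict_proba(measurement.reshape(1, -1))[0, 1]
```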
Stacking up
The researchers tested their approach against other state-of-the-art machine learning algorithms, in a computer simulation of the game using the simulator MuJoCo. The lessons learned in the simulator informed the researchers of the way the robot would learn in the real world.
“We provide to these algorithms the same information our system gets, to see how they learn to play Jenga at a similar level,” Oller says. “Compared with our approach, these algorithms need to explore orders of magnitude more towers to learn the game.”
Curious as to how their machine-learning approach stacks up against actual human players, the team carried out a few informal trials with several volunteers.
“We saw how many blocks a human was able to extract before the tower fell, and the difference was not that much,” Oller says.
But there is still a way to go if the researchers want to competitively pit their robot against a human player. In addition to physical interactions, Jenga requires strategy, such as extracting just the right block that will make it difficult for an opponent to pull out the next block without toppling the tower.
For now, the team is less interested in developing a robotic Jenga champion, and more focused on applying the robot’s new skills to other application domains.
“There are many tasks that we do with our hands where the feeling of doing it ‘the right way’ comes in the language of forces and tactile cues,” Rodriguez says. “For tasks like these, a similar approach to ours could figure it out.”
This research was supported, in part, by the National Science Foundation through the National Robotics Initiative.
Text
Fostering AI Research: Meet Unity at AAAI-19
Last month, several members from the AI @ Unity team were present at NeurIPS in Montreal. At the Unity booth, we had the opportunity to meet hundreds of researchers and introduce them to Artificial Intelligence and Machine Learning projects at Unity. Later this month, we’re heading to AAAI-19 (an annual AI conference) in Honolulu where we’ll be hosting a booth, and also co-organizing the AAAI-19 Workshop on Games and Simulations for Artificial Intelligence. In this blog post, we’ll provide you with a brief overview of the workshop and explain why we are eager to foster research that leverages games and simulation platforms.
If you’re attending AAAI, consider joining our workshop on January 28 – it’s packed with fantastic speakers and papers covering games and simulations for AI. Also, drop by our booth (January 29 – 31) to say hi, watch some demos, and learn about teams and projects at Unity.
A brief history of games in AI research
Games have a long history in AI research, dating back to at least 1949 when Claude Shannon (shortly after developing information entropy) got interested in writing a computer program to play the game of Chess. In his paper “Programming a Computer for Playing Chess”, Shannon writes:
“The chess machine is an ideal one to start with, since: (1) the problem is sharply defined both in allowed operations (the moves) and in the ultimate goal (checkmate); (2) it is neither so simple as to be trivial nor too difficult for satisfactory solution; (3) chess is generally considered to require “thinking” for skilful[sic] play; a solution of this problem will force us either to admit the possibility of a mechanized thinking or to further restrict our concept of “thinking”; (4) the discrete structure of chess fits well into the digital nature of modern computers.”
That was in 1949. Since then, there has been an enduring interest in creating computer programs that can play games as skillfully as human players, even beating respective world champions. Shannon inspired Arthur Samuel’s seminal work on Checkers in the 1950s and 1960s. While Samuel’s program was unable to beat expert players, it was considered a major achievement as it was the first program to effectively utilize heuristic search procedures and learning-based methods. The first success story of achieving expert-level ability was Chinook, a checkers program developed at the University of Alberta in 1989; it began beating most human players, and by 1994 the best players could at best play it to a draw. This trend continued with other 2-player board games such as Backgammon (with Gerald Tesauro’s TD-Gammon, 1992-2002) and Chess (when IBM’s Deep Blue beat Garry Kasparov, 1997), and most recently with Go. An important scientific breakthrough of the last few years was when, in 2016, DeepMind’s AlphaGo beat 18-time world champion Lee Sedol 4 to 1, the subject of the Netflix documentary, AlphaGo.
(Source) Chinook vs Marion Tinsley (1994)
The progress over the last 70 years since Claude Shannon’s paper has not been limited to solving increasingly more difficult 2-player board games but has expanded to other complex scenarios. These include 3D multiplayer games such as Starcraft II and Dota 2 and more challenging game tasks such as learning to play Doom and Atari 2600 games using only the raw screen pixel inputs instead of a hand-coded representation of the game state. In a 2015 Nature paper, DeepMind presented a deep reinforcement learning system, termed deep Q-network (DQN), that was able to achieve superhuman performance on a number of Atari 2600 games using only the raw screen pixel inputs. What was particularly remarkable was how a single system (fixed input/output spaces, algorithm, and parameters), trained independently on each game, was able to perform well on such a large number of diverse games. More recently, OpenAI developed OpenAI Five, a team of five neural networks that can compete with amateur players in Dota 2.
The effectiveness of game engines & simulation platforms
It’s not just games that have played a central role in AI development. Game engines (and other simulation platforms) themselves are now becoming a powerful tool for researchers across many disciplines such as robotics, computer vision, autonomous vehicles, and natural language understanding.
A primary reason for adopting game engines for AI research is the ability to generate large amounts of synthetic data. This is exceptionally powerful as recent advances in AI and the availability of managed hardware in the cloud (e.g. GPUs, TPUs) have resulted in algorithms that can efficiently leverage huge volumes of data. Our partnership with DeepMind is one example of a premier research lab fully investing in utilizing virtual worlds to study AI. The use of game engines is even more profound in scenarios in which data set generation in the real world is prohibitively expensive or dangerous. A second reason for adopting game engines is their rendering quality and physics fidelity which enables the study of real-world problems in a safe and controlled environment. It also enables models trained on synthetic data to be transferred to the real world with minimal changes. A common example is training self-driving cars and Baidu’s move to leverage Unity to evaluate its algorithms is representative of an ongoing shift to embrace modern game engines.
AI is dubbed the new electricity due to its potential to transform multiple industries. We foresee game engines and simulation platforms playing a very important role in that transformation. This is evident by the large number of platforms that have recently been created to study a number of research problems such as playing video games, physics-based control, locomotion, 3D pose estimation, natural language instruction following, embodied question answering, and autonomous vehicles (e.g. Arcade Learning Environment, Starcraft II Learning Environment, ViZDoom, General Video Game AI, MuJoCo, Gibson, Allen Institute AI2-Thor, Facebook House3D, Microsoft AirSim, CARLA). The list also includes our own Unity ML-Agents Toolkit which can be used to transform any Unity scene into a learning environment to train intelligent agents using deep reinforcement learning and imitation learning algorithms. Consequently, we’re eager to encourage and foster AI research that leverages games and simulation platforms.
AAAI-19 workshop overview
At AAAI, later this month, we are co-organizing the Workshop in Games and Simulations for AI with Julian Togelius (Professor at New York University) and Roozbeh Mottaghi (Research Scientist at the Allen Institute for Artificial Intelligence). The workshop will include a full day of presentations by invited speakers and authors of peer-reviewed papers. The presentations will cover a number of topics including large-scale training of deep reinforcement learning systems such as AlphaGo, high-performance rendering for learning robot dexterity, learning to map natural language to controls of a quadcopter, and using drones to protect wildlife in the African savannah. If you are attending AAAI, join us at the workshop to learn more about how games and simulations are being used to power AI research.
Aloha!
Photo
"[P] Which network architecture for A2C RL using PPO for MuJoCo control?"- Detail: I am looking to build a reinforcement learning agent using A2C with PPO optimization to solve a MuJoCo object manipulation problem (similar to OpenAI's learning dexterity https://ift.tt/2OuJnuS but solely in simulation). I am assuming all variables about the environment are known and can be passed to the network (e.g. joint torques, object position/rotation, etc.) so a pixel-based state approximation method is not required.Although I am fairly new to RL I am confident with the algorithm but not so much with which architecture(s) to choose for actor/critic. The literature seems to point towards LSTM-based models but I am not sure why a network with memory would be necessary for a task where the entire state is known and the chosen action should in theory be independent of past states. I am also not sure if I should use separate networks for actor/critic or one should suffice.Any advice on which architectures may work and why would be fantastic!. Caption by lantern_lol. Posted By: www.eurekaking.com
Text
Model-based reinforcement learning with neural network dynamics
http://bit.ly/2j6CaD4
By Anusha Nagabandi and Gregory Kahn
Enabling robots to act autonomously in the real-world is difficult. Really, really difficult. Even with expensive robots and teams of world-class researchers, robots still have difficulty autonomously navigating and interacting in complex, unstructured environments.
Fig 1. A learned neural network dynamics model enables a hexapod robot to learn to run and follow desired trajectories, using just 17 minutes of real-world experience.
Why are autonomous robots not out in the world among us? Engineering systems that can cope with all the complexities of our world is hard. From nonlinear dynamics and partial observability to unpredictable terrain and sensor malfunctions, robots are particularly susceptible to Murphy’s law: everything that can go wrong, will go wrong. Instead of fighting Murphy’s law by coding each possible scenario that our robots may encounter, we could instead choose to embrace this possibility for failure, and enable our robots to learn from it. Learning control strategies from experience is advantageous because, unlike hand-engineered controllers, learned controllers can adapt and improve with more data. Therefore, when presented with a scenario in which everything does go wrong, although the robot will still fail, the learned controller will hopefully correct its mistake the next time it is presented with a similar scenario. In order to deal with complexities of tasks in the real world, current learning-based methods often use deep neural networks, which are powerful but not data efficient: These trial-and-error based learners will most often still fail a second time, and a third time, and often thousands to millions of times. The sample inefficiency of modern deep reinforcement learning methods is one of the main bottlenecks to leveraging learning-based methods in the real-world.
We have been investigating sample-efficient learning-based approaches with neural networks for robot control. For complex and contact-rich simulated robots, as well as real-world robots (Fig. 1), our approach is able to learn locomotion skills of trajectory-following using only minutes of data collected from the robot randomly acting in the environment. In this blog post, we’ll provide an overview of our approach and results. More details can be found in our research papers listed at the bottom of this post, including this paper with code here.
Sample efficiency: model-free versus model-based
Learning robotic skills from experience typically falls under the umbrella of reinforcement learning. Reinforcement learning algorithms can generally be divided into categories: model-free, which learn a policy or value function, and model-based, which learn a dynamics model. While model-free deep reinforcement learning algorithms are capable of learning a wide range of robotic skills, they typically suffer from very high sample complexity, often requiring millions of samples to achieve good performance, and can typically only learn a single task at a time. Although some prior work has deployed these model-free algorithms for real-world manipulation tasks, the high sample complexity and inflexibility of these algorithms has hindered them from being widely used to learn locomotion skills in the real world.
Model-based reinforcement learning algorithms are generally regarded as being more sample efficient. However, to achieve good sample efficiency, these model-based algorithms have conventionally used either relatively simple function approximators, which fail to generalize well to complex tasks, or probabilistic dynamics models such as Gaussian processes, which generalize well but have difficulty with complex and high-dimensional domains, such as systems with frictional contacts that induce discontinuous dynamics. Instead, we use medium-sized neural networks to serve as function approximators that can achieve excellent sample efficiency, while still being expressive enough for generalization and application to various complex and high-dimensional locomotion tasks.
Neural Network Dynamics for Model-Based Deep Reinforcement Learning
In our work, we aim to extend the successes that deep neural network models have seen in other domains into model-based reinforcement learning. Prior efforts to combine neural networks with model-based RL in recent years have not achieved the kinds of results that are competitive with simpler models, such as Gaussian processes. For example, Gu et al. observed that even linear models achieved better performance for synthetic experience generation, while Heess et al. saw relatively modest gains from including neural network models into a model-free learning system. Our approach relies on a few crucial decisions. First, we use the learned neural network model within a model predictive control framework, in which the system can iteratively replan and correct its mistakes. Second, we use a relatively short horizon look-ahead so that we do not have to rely on the model to make very accurate predictions far into the future. These two relatively simple design decisions enable our method to perform a wide variety of locomotion tasks that have not previously been demonstrated with general-purpose model-based reinforcement learning methods that operate directly on raw state observations.
A diagram of our model-based reinforcement learning approach is shown in Fig. 2. We maintain a dataset of trajectories that we iteratively add to, and we use this dataset to train our dynamics model. The dataset is initialized with random trajectories. We then perform reinforcement learning by alternating between training a neural network dynamics model using the dataset, and using a model predictive controller (MPC) with our learned dynamics model to gather additional trajectories to aggregate onto the dataset. We discuss these two components below.
Fig 2. Overview of our model-based reinforcement learning algorithm.
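In skeletal form, the loop in Fig. 2 alternates between supervised model fitting and MPC-driven data collection. The helper functions below are assumptions standing in for the components described in the rest of this post, not functions from our released code:

```python
# Skeleton of the model-based RL loop in Fig. 2 (helper functions are assumed).
def model_based_rl(env, reward_fn, collect_random_trajectories,
                   train_dynamics_model, run_mpc_controller, num_iterations=10):
    dataset = collect_random_trajectories(env, num_steps=2000)  # random initialization
    model = None
    for _ in range(num_iterations):
        model = train_dynamics_model(dataset)                   # supervised fit
        new_data = run_mpc_controller(env, model, reward_fn, num_steps=1000)
        dataset += new_data                                     # aggregate on-policy data
    return model
```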
Dynamics Model
We parameterize our learned dynamics function as a deep neural network with weights that need to be learned. Our dynamics function takes as input the current state $s_t$ and action $a_t$, and outputs the predicted state difference $s_{t+1}-s_t$. The dynamics model itself can be trained in a supervised learning setting, where collected training data comes in pairs of inputs $(s_t,a_t)$ and corresponding output labels $s_{t+1}-s_t$.
Note that the “state” that we refer to above can vary with the agent, and it can include elements such as center of mass position, center of mass velocity, joint positions, and other measurable quantities that we choose to include.
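As an illustrative sketch (not our exact architecture or hyperparameters), such a dynamics model and its supervised training step could be written as:

```python
# Sketch: neural-network dynamics model that predicts the state difference s_{t+1} - s_t.
import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=500):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))  # predicted s_{t+1} - s_t

def train_step(model, optimizer, states, actions, next_states):
    """One supervised step on inputs (s_t, a_t) with labels s_{t+1} - s_t."""
    pred = model(states, actions)
    loss = nn.functional.mse_loss(pred, next_states - states)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```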
Controller
In order to use the learned dynamics model to accomplish a task, we need to define a reward function that encodes the task. For example, a standard “x_vel” reward could encode a task of moving forward. For the task of trajectory following, we formulate a reward function that incentivizes staying close to the trajectory as well as making forward progress along the trajectory.
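For instance, a trajectory-following reward might combine forward progress along the path with a penalty on lateral deviation, roughly as follows (an illustrative form, not the exact reward used in our experiments):

```python
# Illustrative trajectory-following reward for a 2D path segment.
import numpy as np

def trajectory_following_reward(xy_position, xy_velocity, path_start, path_end):
    """Reward forward velocity along the path and penalize distance from it."""
    path_vec = path_end - path_start
    path_dir = path_vec / (np.linalg.norm(path_vec) + 1e-8)
    to_agent = xy_position - path_start
    # Perpendicular (lateral) distance from the path line.
    lateral_error = abs(path_dir[0] * to_agent[1] - path_dir[1] * to_agent[0])
    # Velocity component along the path direction.
    forward_speed = float(np.dot(xy_velocity, path_dir))
    return forward_speed - 1.0 * lateral_error
```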
Using the learned dynamics model and task reward function, we formulate a model-based controller. At each time step, the agent plans $H$ steps into the future by randomly generating $K$ candidate action sequences, using the learned dynamics model to predict the outcome of those action sequences, and selecting the sequence corresponding to the highest cumulative reward (Fig. 3). We then execute only the first action from the action sequence, and then repeat the planning process at the next time step. This replanning makes the approach robust to inaccuracies in the learned dynamics model.
Fig 3. Illustration of the process of simulating multiple candidate action sequences using the learned dynamics model, predicting their outcome, and selecting the best one according to the reward function.
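A simplified random-shooting version of this planner is sketched below; dynamics_fn and reward_fn stand in for the learned model and the task reward:

```python
# Sketch of random-shooting MPC with a learned dynamics model (illustrative only).
import numpy as np

def mpc_action(state, dynamics_fn, reward_fn, action_dim,
               horizon=10, num_candidates=1000, action_low=-1.0, action_high=1.0):
    """Return the first action of the best of K random H-step action sequences."""
    candidates = np.random.uniform(action_low, action_high,
                                   size=(num_candidates, horizon, action_dim))
    returns = np.zeros(num_candidates)
    for k in range(num_candidates):
        s = state
        for t in range(horizon):
            a = candidates[k, t]
            returns[k] += reward_fn(s, a)
            s = dynamics_fn(s, a)      # roll the learned model forward
    best = int(np.argmax(returns))
    return candidates[best, 0]         # execute only the first action, then replan
```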
Results
We first evaluated our approach on a variety of MuJoCo agents, including the swimmer, half-cheetah, and ant. Fig. 4 shows that using our learned dynamics model and MPC controller, the agents were able to follow paths defined by a set of sparse waypoints. Furthermore, our approach used only minutes of random data to train the learned dynamics model, showing its sample efficiency.
Note that with this method, we trained the model only once, but simply by changing the reward function, we were able to apply the model at runtime to a variety of different desired trajectories, without a need for separate task-specific training.
Fig 4: Trajectory following results with ant, swimmer, and half-cheetah. The dynamics model used by each agent in order to perform these various trajectories was trained just once, using only randomly collected training data.
What aspects of our approach were important to achieve good performance? We first looked at the effect of varying the MPC planning horizon H. Fig. 5 shows that performance suffers if the horizon is too short, possibly due to unrecoverable greedy behavior. For half-cheetah, performance also suffers if the horizon is too long, due to inaccuracies in the learned dynamics model. Fig. 6 illustrates our learned dynamics model for a single 100-step prediction, showing that open-loop predictions for certain state elements eventually diverge from the ground truth. Therefore, an intermediate planning horizon is best to avoid greedy behavior while minimizing the detrimental effects of an inaccurate model.
Fig 5: Plot of task performance achieved by controllers using different horizon values for planning. Too low of a horizon is not good, and neither is too high of a horizon.
Fig 6: A 100-step forward simulation (open-loop) of the dynamics model, showing that open-loop predictions for certain state elements eventually diverge from the ground truth.
We also varied the number of initial random trajectories used to train the dynamics model. Fig. 7 shows that although a higher amount of initial training data leads to higher initial performance, data aggregation allows even low-data initialization experiment runs to reach a high final performance level. This highlights how on-policy data from reinforcement learning can improve sample efficiency.
Fig 7: Plot of task performance achieved by dynamics models that were trained using differing amounts of initial random data.
It is worth noting that the final performance of the model-based controller is still substantially lower than that of a very good model-free learner (when the model-free learner is trained with thousands of times more experience). This suboptimal performance is sometimes referred to as “model bias,” and is a known issue in model-based RL. To address this issue, we also proposed a hybrid approach that combines model-based and model-free learning to eliminate the asymptotic bias at convergence, though at the cost of additional experience. This hybrid approach, as well as additional analyses, are available in our paper.
Learning to run in the real world
Fig 8: The VelociRoACH is 10 cm in length, approximately 30 grams in weight, can move up to 27 body-lengths per second, and uses two motors to control all six legs.
Since our model-based reinforcement learning algorithm can learn locomotion gaits using orders of magnitude less experience than model-free algorithms, it is possible to evaluate it directly on a real-world robotic platform. In other work, we studied how this method can learn entirely from real-world experience, acquiring locomotion gaits for a millirobot (Fig. 8) completely from scratch.
Millirobots are a promising robotic platform for many applications due to their small size and low manufacturing costs. However, controlling these millirobots is difficult due to their underactuation, power constraints, and size. While hand-engineered controllers can sometimes control these millirobots, they often have difficulties with dynamic maneuvers and complex terrains. We therefore leveraged our model-based learning technique from above to enable the VelociRoACH millirobot to do trajectory following. Fig. 9 shows that our model-based controller can accurately follow trajectories at high speeds, after having been trained using only 17 minutes of random data.
Fig 9: The VelociRoACH following various desired trajectories, using our model-based learning approach.
To analyze the model’s generalization capabilities, we gathered data on both carpet and styrofoam terrain, and we evaluated our approach as shown in Table 1. As expected, the model-based controller performs best when executed on the same terrain that it was trained on, indicating that the model incorporates knowledge of the terrain. However, performance diminishes when the model is trained on data gathered from both terrains, which likely indicates that more work is needed to develop algorithms for learning models that are effective across various task settings. Promisingly, Table 2 shows that performance increases as more data is used to train the dynamics model, which is an encouraging indication that our approach will continue to improve over time (unlike hand-engineered solutions).
Table 1: Trajectory following costs incurred for models trained with different types of data and for trajectories executed on different surfaces.
Table 2: Trajectory following costs incurred during the use of dynamics models trained with differing amounts of data.
We hope that these results show the promise of model-based approaches for sample-efficient robot learning and encourage future research in this area.
This article was initially published on the BAIR blog, and appears here with the authors’ permission.
We would like to thank Sergey Levine and Ronald Fearing for their feedback.
This post is based on the following papers:
Neural Network Dynamics Models for Control of Under-actuated Legged Millirobots A Nagabandi, G Yang, T Asmar, G Kahn, S Levine, R Fearing Paper
Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning A Nagabandi, G Kahn, R Fearing, S Levine Paper, Website, Code
Photo
Evolution Strategies as a Scalable Alternative to Reinforcement Learning
“We compared the performance of ES and RL on two standard RL benchmarks: MuJoCo control tasks and Atari game playing. Each MuJoCo task (see examples below) contains a physically-simulated articulated figure, where the policy receives the positions of all joints and has to output the torques to apply at each joint in order to move forward. Below are some example agents trained on three MuJoCo control tasks, where the objective is to move forward”
https://blog.openai.com/evolution-strategies/