Hot Chips 2020 Live Blog: Keynote Day 2, Dan Belov of Deepmind (1:30pm PT)

AnandTech Live Blog: The newest updates are at the top.

04:55PM EDT – Building models for domains like drug discovery is tough

04:55PM EDT – Allows for simple games, but also complex visual environments

04:55PM EDT – Search trees are built over predicted, imagined states

04:54PM EDT – Model-based reinforcement learning – learning the effects of actions
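
To make "learning the effects of actions" concrete, here is a minimal sketch (ours, not from the talk) of a learned one-step dynamics model used to roll out imagined trajectories; the toy random-parameter model stands in for a trained network:

```python
import numpy as np

class LearnedModel:
    """Toy stand-in for a learned dynamics model: given a state and an
    action, predict the next state and the immediate reward."""
    def __init__(self, n_states, n_actions, rng):
        # Random 'learned' parameters; a real system would train a network.
        self.P = rng.normal(size=(n_actions, n_states, n_states))
        self.r = rng.normal(size=(n_actions, n_states))

    def step(self, state, action):
        next_state = np.tanh(self.P[action] @ state)  # predicted next state
        reward = float(self.r[action] @ state)        # predicted reward
        return next_state, reward

def imagine_rollout(model, state, actions):
    """Roll out actions entirely inside the learned model, producing the
    'imagined states' a planner searches over without touching reality."""
    total = 0.0
    for action in actions:
        state, reward = model.step(state, action)
        total += reward
    return total

rng = np.random.default_rng(0)
model = LearnedModel(n_states=4, n_actions=2, rng=rng)
start = rng.normal(size=4)
# Compare two candidate plans using only imagined experience.
for plan in ([0, 0, 1], [1, 1, 0]):
    print(plan, round(imagine_rollout(model, start, plan), 3))
```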

04:53PM EDT – Expanding the search approach beyond Go to Chess and Shogi

04:53PM EDT – Pruning the search space is critical

04:53PM EDT – Reinforcement learning helps build these training regimes and policies

04:53PM EDT – Policy network indicates which moves are best, reducing the search even further
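
For readers following along: combining a policy prior with a value estimate is the heart of AlphaGo-style move selection. A minimal PUCT-style sketch with illustrative numbers (not DeepMind's implementation):

```python
import math

def puct_score(q_value, prior, visits, parent_visits, c_puct=1.5):
    """PUCT-style score: value estimate plus a policy-weighted exploration
    bonus that shrinks as a move is visited more often."""
    exploration = c_puct * prior * math.sqrt(parent_visits) / (1 + visits)
    return q_value + exploration

# Candidate moves: (value estimate, policy prior, visit count).
moves = {"a": (0.55, 0.60, 40), "b": (0.50, 0.30, 10), "c": (0.20, 0.10, 2)}
parent_visits = sum(visits for _, _, visits in moves.values())

# Moves with low priors and low values barely get explored at all.
best = max(moves, key=lambda m: puct_score(*moves[m], parent_visits))
print("selected move:", best)
```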

04:52PM EDT – No need to analyse future board positions that are known to give losing games

04:52PM EDT – End up with a value network to explore branches in a search tree

04:52PM EDT – Value network – estimates the likelihood of winning from a given position, learned from previous games

04:51PM EDT – Very difficult to assign a point score to a given board position

04:51PM EDT – Here’s a search tree for brute-force search
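
For a sense of why brute force fails: a full game tree has b**d leaves for branching factor b and depth d, and Go offers roughly 250 legal moves per position over games around 150 moves long. A quick illustration:

```python
def count_leaves(branching, depth):
    """Count leaf positions in a full game tree by brute-force recursion."""
    if depth == 0:
        return 1
    return sum(count_leaves(branching, depth - 1) for _ in range(branching))

print(count_leaves(3, 4))  # a tiny toy tree already has 81 leaves
# At Go scale the count is astronomical: 250**150 has ~360 digits.
print(len(str(250 ** 150)), "digits in 250**150")
```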

04:51PM EDT – Simple game for structure, complex game to master

04:50PM EDT – Now for Go

04:50PM EDT – Better and more diverse data to improve

04:50PM EDT – Combined with policy optimization

04:50PM EDT – Iterative improvement in dataset quality over time
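
A minimal sketch (ours) of the loop being described: act with the current policy, keep everything collected, retrain on the growing dataset, repeat:

```python
import random

def collect(policy, n):
    """Gather episodes with the current policy; every episode is kept."""
    episodes = []
    for _ in range(n):
        x = random.random()
        episodes.append((x, policy(x)))
    return episodes

def retrain(dataset):
    """Stand-in 'training': return a policy nudged by the average outcome.
    A real system would fit a network to the accumulated dataset."""
    avg = sum(y for _, y in dataset) / len(dataset)
    return lambda x: x + 0.1 * avg

dataset, policy = [], (lambda x: x)  # start from a weak initial policy
for generation in range(3):
    dataset += collect(policy, n=100)  # the dataset only ever grows
    policy = retrain(dataset)          # policy optimisation on all data
    print(f"generation {generation}: {len(dataset)} examples stored")
```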

04:49PM EDT – Handling non-scriptable objects

04:49PM EDT – Eventually outperform humans

04:49PM EDT – Understanding failure is critical to learning good behaviour

04:49PM EDT – Need to train on clean examples but also bad data to observe failure

04:48PM EDT – Behaviour in adversarial environments

04:48PM EDT – Everything that the robot does with all this data is stored, and used for future training iterations

04:47PM EDT – Batch RL
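
Batch (offline) RL trains from a fixed store of logged transitions, with no new environment interaction. A toy fitted-Q-iteration sketch on a two-state problem (illustrative, not DeepMind's implementation):

```python
import numpy as np

# A fixed log of (state, action, reward, next_state) transitions, as if
# replayed from storage; batch RL never queries the environment again.
batch = [(0, 0, 0.0, 1), (0, 1, 1.0, 0), (1, 0, 2.0, 0), (1, 1, 0.0, 1)]

n_states, n_actions, gamma = 2, 2, 0.9
Q = np.zeros((n_states, n_actions))

# Fitted Q iteration: repeatedly regress Q toward one-step bootstrap
# targets computed entirely from the logged batch.
for _ in range(200):
    targets = Q.copy()
    for s, a, r, s_next in batch:
        targets[s, a] = r + gamma * Q[s_next].max()
    Q = targets

print(np.round(Q, 2))  # converged action values learned purely offline
```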

04:47PM EDT – In protein folding or robotics it can be difficult to decide how close you are to the goal, so instead learn models that assign the rewards

04:46PM EDT – Humans annotate random attempts to indicate where the rewards are
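
The idea in miniature: fit a reward model to the few human-annotated examples, then use it to label the rest of the stored data. A toy least-squares stand-in for the learned reward network:

```python
import numpy as np

# A handful of human-annotated (observation, sketched reward) pairs.
annotated_obs = np.array([[0.1, 0.0], [0.5, 0.2], [0.9, 0.8], [1.0, 1.0]])
sketched_reward = np.array([0.0, 0.3, 0.8, 1.0])

# Fit a linear reward model to the sketches (least squares standing in
# for a trained reward network over camera frames).
X = np.hstack([annotated_obs, np.ones((len(annotated_obs), 1))])
w, *_ = np.linalg.lstsq(X, sketched_reward, rcond=None)

def predicted_reward(obs):
    """Label an unannotated stored observation with the learned reward."""
    return float(np.append(obs, 1.0) @ w)

print(round(predicted_reward(np.array([0.7, 0.5])), 3))
```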

04:45PM EDT – Initialise with the best data possible

04:45PM EDT – Failed experiments, random policies, interferences

04:45PM EDT – Neverending storage

04:45PM EDT – Never throw any data away, no matter how bad it is
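
A minimal sketch of the append-only store being described; the class and field names are hypothetical, not DeepMind's schema:

```python
from dataclasses import dataclass, field

@dataclass
class Episode:
    policy_id: str  # which policy generated this data
    outcome: str    # "success", "failure", "random", ...
    frames: list    # observations, actions, and rewards would live here

@dataclass
class EpisodeStore:
    """Append-only: episodes are added but never deleted, so failed
    experiments and random policies stay available for later training."""
    episodes: list = field(default_factory=list)

    def add(self, episode):
        self.episodes.append(episode)

    def query(self, outcome):
        return [e for e in self.episodes if e.outcome == outcome]

store = EpisodeStore()
store.add(Episode("policy-v1", "failure", frames=[]))
store.add(Episode("random", "success", frames=[]))
print(len(store.query("failure")), "failure episode(s) retained")
```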

04:44PM EDT – Scale up reinforcement learning in robotics

04:44PM EDT – Reward Sketching – eliciting human preferences

04:43PM EDT – Train networks against future values of themselves
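
This is temporal-difference bootstrapping: the regression target for a state's value is built from the network's own estimate at the next state. A tabular TD(0) sketch:

```python
# Tabular TD(0): nudge each state's value toward a target built from the
# *current estimate* of the next state's value (bootstrapping).
gamma, alpha = 0.9, 0.1
V = {s: 0.0 for s in range(3)}

# Logged transitions (state, reward, next_state); state 2 is terminal.
trajectory = [(0, 0.0, 1), (1, 1.0, 2)]

for _ in range(200):
    for s, r, s_next in trajectory:
        target = r + gamma * V[s_next]   # target uses our own estimate
        V[s] += alpha * (target - V[s])  # move the estimate toward it

print({s: round(v, 2) for s, v in V.items()})  # V[1]≈1.0, V[0]≈0.9
```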

04:43PM EDT – Predicting which future states give the best reward

04:43PM EDT – Future rewards decay exponentially
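
The exponential decay is the discount factor gamma: a reward arriving k steps in the future is worth gamma**k today. A worked example:

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of future rewards, each scaled by gamma**k for its delay k."""
    return sum(r * gamma**k for k, r in enumerate(rewards))

# A reward of 1.0 arriving five steps from now is worth 0.9**5 ≈ 0.59 today.
print(round(discounted_return([0, 0, 0, 0, 0, 1.0]), 3))
```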

04:43PM EDT – How do you measure success in the real world?

04:42PM EDT – All about the value function

04:42PM EDT – Maximise total reward during lifetime of agent

04:41PM EDT – Make good decisions by learning from experience

04:41PM EDT – Reinforcement learning

04:41PM EDT – Networks are growing 3x per year on average

04:41PM EDT – More diverse data, bigger network, more compute, gives better results

04:40PM EDT – Iron Law of Deep Learning: More is More

04:40PM EDT – Supervised DL – inferring knowledge from observations

04:40PM EDT – Generalise to apply to new interactions

04:40PM EDT – Recipes to train programs

04:39PM EDT – Machine Learning is about creating new knowledge, using present knowledge, to solve a large diversity of novel problems

04:39PM EDT – Performing human level or better

04:39PM EDT – Task that is unlikely to be solved by random interaction

04:38PM EDT – Sequences of low level actions

04:38PM EDT – 2019 – solving puzzles in the real world

04:38PM EDT – Some of the solutions are very human like

04:38PM EDT – Took four hours of training – minimum effort for maximum gain

04:37PM EDT – Such as playing Breakout with RL

04:37PM EDT – Physically accurate simulations as required

04:37PM EDT – Easy rules to test new approaches in parallel simulations

04:37PM EDT – Research using games

04:36PM EDT – Neuro-physical phenomena

04:36PM EDT – Neuroscience can act as a catalyst

04:36PM EDT – Deepmind has a unique approach to AI

04:35PM EDT – Operates independently but is backed by Alphabet

04:35PM EDT – Research Institute inside Alphabet, 400 researchers

04:35PM EDT – Deepmind – An Apollo Program for AI

04:34PM EDT – Intro to Deepmind

04:34PM EDT – Desire to build bigger and bigger machines

04:33PM EDT – No formal training in hardware or systems – purely a software guy

04:33PM EDT – AI research at scale

04:27PM EDT – This will likely be an update on what’s going on at Deepmind (now owned by Alphabet) and what it is planning for the future of AI. We might get some insight into how the company works with other parts of Alphabet – it has been reported that Deepmind has used its algorithms to increase the efficiency of cooling in Google’s datacenters, for example.

04:27PM EDT – Deepmind is the company that created the AlphaGo program that played professional Go champion Lee Sedol in 2016, with the final score of 4-1 in favor of the artificial intelligence.

04:26PM EDT – Keynote for Day 2 of Hot Chips is from Dan Belov of Deepmind