WEBVTT

1
00:00:27.410 --> 00:00:34.620
Dilma Da Silva: I see the number of participants is still going up, so let's wait until it gets a little stable, I guess.

2
00:01:00.160 --> 00:01:02.050
Dilma Da Silva: so. Hello, everyone

3
00:01:02.230 --> 00:01:17.290
Dilma Da Silva: I'm Dilma da Silva. I am the division director for CCF, Computing and Communication Foundations, in CISE, and here at CISE we are delighted to have the opportunity to host Professor Mike Carbin.

4
00:01:17.630 --> 00:01:35.330
Dilma Da Silva: So Professor Carbin is an associate professor of electrical engineering and computer science at MIT, where he leads the Programming Systems Group. In his work he aims at improving reliability, performance, energy consumption, and resilience for computer systems,

5
00:01:35.340 --> 00:01:50.219
Dilma Da Silva: using techniques from programming languages. We are happy to say that Professor Carbin has received an NSF CAREER Award, a Sloan Foundation Research Award, and the MIT Frank E. Perkins Award for excellence in graduate advising.

6
00:01:50.230 --> 00:01:59.490
Dilma Da Silva: His work has been recognized in so many forums, including best paper awards at ICLR and ICFP.

7
00:02:07.430 --> 00:02:22.229
Dilma Da Silva: He got his undergrad in computer science from Stanford in two thousand and six, and then a PhD in electrical engineering and computer science from MIT.

8
00:02:22.240 --> 00:02:31.299
Dilma Da Silva: And he then went to Microsoft Research, where he worked on deep learning systems from two thousand and fourteen to two thousand and eighteen.

9
00:02:31.740 --> 00:02:33.390
Dilma Da Silva: So ah!

10
00:02:33.490 --> 00:02:44.129
Dilma Da Silva: We are very pleased to have you here, Professor Carbin. Mike, go ahead; we are looking forward to learning a lot about programming,

11
00:02:44.300 --> 00:02:48.000
Dilma Da Silva: uncertainty and the relationship between them.

12
00:02:48.370 --> 00:02:52.280
Michael Carbin: Great. Thank you so much for that generous introduction.

13
00:02:52.290 --> 00:02:57.149
Michael Carbin: So yeah. So today I'm going to talk about, I guess, some work that I've been working on

14
00:02:57.280 --> 00:03:00.990
Michael Carbin: since graduate school, I guess, coming on almost fifteen years now, and

15
00:03:01.000 --> 00:03:30.880
Michael Carbin: the flavor of it has changed over the years as our computing systems have changed. I mean, just thinking about the past few months of what's going on with, you know, ChatGPT and GPT-4, and just new ways of thinking about software, it seems quite fantastical in some ways. But for a long time it's sort of been what I've been expecting, because the way I think about where computation is going is that every single computation that we're dealing with and programming, or at least many of the very interesting emerging computations that we're thinking about, are what I call uncertain computations,

16
00:03:31.210 --> 00:03:42.489
Michael Carbin: and so let me try to explain that over the course of this talk. I'll try to explain this, give some examples, and also sort of organize how we might think about the future of software and important problems that we might need to be working on.

17
00:03:42.500 --> 00:03:45.369
Michael Carbin: So what is an uncertain computation?

18
00:03:45.380 --> 00:03:50.390
Michael Carbin: Well, the way I like to explain this these days is by looking at this cartoon diagram of the computing stack.

19
00:03:50.400 --> 00:04:06.929
Michael Carbin: And the way I came to this explanation was, I was teaching a computer systems engineering class a couple of years ago, and I was talking with students about, basically, trust: what parts of your system do you trust to be reliable,

20
00:04:06.940 --> 00:04:20.879
Michael Carbin: perhaps error-free, or secure. We were going through levels of the stack, and we started up top with applications. The students said, oh, applications, libraries, can't trust those: I write code, and the code I write has bugs.

21
00:04:20.890 --> 00:04:34.469
Michael Carbin: You know, compilers and runtime systems, a layer lower than that: it's not necessarily so obvious to everyone that there are bugs that we need to deal with in those systems. But there's a long history of work in my community, the programming languages community broadly,

22
00:04:34.480 --> 00:04:44.089
Michael Carbin: much of it supported wonderfully by the NSF, on actually identifying bugs in these types of systems and remedying them: how can we actually do secure computation?

23
00:04:44.100 --> 00:04:59.889
Michael Carbin: Now, below that, operating systems. I think all of us have had the unfortunate experience of dealing with an operating system bug, be it Windows or any of these things. But where things got interesting was when I asked the students about hardware systems, and they said, oh no, hardware, that's great,

24
00:04:59.900 --> 00:05:17.829
Michael Carbin: you know. And this was before, perhaps, the Spectre and Meltdown era and the security focus that we're now perhaps seeing more broadly in the community around hardware. But these, again, were undergraduate students, and they said, yes, if I have some instruction, it's one plus one, then I'm going to get two as a result. And unfortunately

25
00:05:17.840 --> 00:05:20.030
Michael Carbin: that is not always the case.

26
00:05:20.040 --> 00:05:36.960
Michael Carbin: And so this comes as a shock and a surprise, at least to people who don't really think about these issues. But in fact, right now you could go, on the machine that you're sitting at, to Intel's website and find the errata for whichever processor is sitting in your machine. Here's one from a few years ago.

27
00:05:36.970 --> 00:05:51.099
Michael Carbin: And what is it going to tell you? Actually, all the bugs, all the errors, that may happen while executing programs or software on your system. Now, some of these may be rather benign. So this first one at the top: certain cache performance-monitoring events may not be accurate.

28
00:05:51.110 --> 00:06:11.530
Michael Carbin: Unless you're doing low-level performance engineering, maybe this isn't something that is going to affect your day-to-day. But this one below here at the bottom is a little bit more ominous. It says: processor may hang or cause unpredictable system behavior. And what that means is that your system can return arbitrarily incorrect results at times.

29
00:06:11.540 --> 00:06:19.889
Michael Carbin: And this is not just in theory, under very, very rare circumstances; this actually shows up in practice.

30
00:06:19.900 --> 00:06:49.040
Michael Carbin: If you're a consumer and you're really hammering on your machine, really doing extensive computations, you can see these types of errors actually show up as corrupted computations. Now, if you go up another level and think about large organizations running supercomputing and data-center jobs, these types of errors are a primary, first-order concern in the design of these systems, because you can see that even within the course of a single day you're very likely to see some sort of corruption inside your

31
00:06:49.050 --> 00:06:50.150
computation,

32
00:06:50.200 --> 00:07:10.170
Michael Carbin: and this is because, when we look at a large distributed system, while the probability that, let's say, one processor encounters some sort of issue as it's executing may be very small, if you take a large distributed computation, with tens of thousands, hundreds of thousands of cores, the probability that one of them hits something becomes much, much higher.

33
00:07:10.180 --> 00:07:23.689
Michael Carbin: And we've started to see more attention here. There's been a long-held focus in the research community on thinking about these sorts of issues, and we're starting to see them re-emerge with different flavors and different names in the research that's been coming out of companies like Facebook over these years.

34
00:07:23.850 --> 00:07:26.889
Michael Carbin: So how do we cope with this type of problem in

35
00:07:27.030 --> 00:07:34.790
Michael Carbin: practice? The type of errors I want to talk about here are transient, non-deterministic errors, in the sense that if I have a computation and I run it once,

36
00:07:34.800 --> 00:07:36.710
Michael Carbin: and it encounters an error.

37
00:07:36.780 --> 00:07:44.689
Michael Carbin: if I start over and run that exact same computation again on that same machine, with high probability I'm going to get the correct answer.

38
00:07:44.700 --> 00:07:49.369
Michael Carbin: So these are non-deterministic errors; these are not permanent errors where we're always going to get an incorrect result.

39
00:07:49.380 --> 00:07:51.159
Michael Carbin: Well, in this instance,

40
00:07:51.170 --> 00:07:54.389
Michael Carbin: classically, what we've seen is people use something like replication,

41
00:07:54.400 --> 00:07:59.910
Michael Carbin: right? So there's been a long history of essentially taking a computation that I might have, and I duplicate it.

42
00:07:59.940 --> 00:08:03.330
Michael Carbin: I run it twice, and I check to see if the results agree.

43
00:08:03.340 --> 00:08:15.739
Michael Carbin: If the results disagree, then I'm going to go ahead and run it again. I'll keep running it and running it until I get the right result. Of course, there are some specific restrictions on the types of errors and the frequency of errors for which this is a sensible approach.
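The run-twice-and-compare scheme just described can be sketched in a few lines. This is a toy model, not the speaker's implementation: `unreliable_add`, its fault rate, and the corruption pattern are all made up for illustration.

```python
import random

rng = random.Random(0)

def unreliable_add(x, y, p_fault=0.05):
    """Toy model of a transient hardware fault: with small probability
    the operation returns a corrupted result."""
    result = x + y
    return result + 1 if rng.random() < p_fault else result

def replicated(op, *args, max_tries=100):
    """Dual redundancy: run the operation twice and accept the result
    only when both copies agree; otherwise retry. (Two identical
    corruptions would still slip through -- a known limitation.)"""
    for _ in range(max_tries):
        a, b = op(*args), op(*args)
        if a == b:
            return a
    raise RuntimeError("replicas never agreed")

print(replicated(unreliable_add, 1, 1))  # 2 (the fault-free answer)
```

Note the cost: every accepted result pays for at least two executions, which is exactly the overhead the talk contrasts with approximation.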

44
00:08:15.750 --> 00:08:24.440
Michael Carbin: This has been a classic approach, so classic that, going back in the computing literature, you can find that the first time this really came into computing was with von Neumann,

45
00:08:24.450 --> 00:08:38.140
Michael Carbin: when he was trying to construct reliable computations from unreliable components. And what was very interesting at that time, if you go read this paper, is that these unreliable components actually look a lot like neural networks; I'll come back to that.

46
00:08:39.000 --> 00:08:50.109
Michael Carbin: But I think it's reminiscent of the types of concerns that we're dealing with in modern software, where things like neural networks are starting to become part and parcel of the way that we build systems.

47
00:08:50.410 --> 00:09:00.590
Michael Carbin: So replication is a classic strategy for doing this. But I want to encourage us to think differently about this problem: how might we solve this otherwise?

48
00:09:00.870 --> 00:09:12.039
Michael Carbin: Well, one alternative approach here is: what if we just let errors happen? Right, instead of trying to correct errors, let's just let them happen, let the system keep processing, and see what could go wrong.

49
00:09:12.420 --> 00:09:22.120
Michael Carbin: Well, let's explore this. I'm sure, if someone said this to you, you'd probably have in your mind all these horrible ways in which this could go wrong. But I want to show an example.

50
00:09:22.170 --> 00:09:32.140
Michael Carbin: And this example here is a very simple method for solving systems of linear equations, called the Jacobi method, and you don't need to understand this program at all.

51
00:09:32.150 --> 00:09:44.829
Michael Carbin: But I do want to highlight a few specific features of it. One: this is an iterative algorithm, so you can see there's this while loop where the program is going to run until it converges after some number of steps.

52
00:09:44.940 --> 00:09:46.710
Michael Carbin: So that's the first one.

53
00:09:46.720 --> 00:10:03.750
Michael Carbin: The other thing I want to highlight are these operations in red. These are operations where I want to show you what happens if we inject errors into them: let these operations be corrupted, let the computation go on, and see what happens.
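For reference, the iterate-until-convergence structure being described looks like the following minimal sketch. The system, tolerance, and matrix here are hypothetical stand-ins, not the program on the slide.

```python
import numpy as np

def jacobi(A, b, tol=1e-10, max_iters=100_000):
    """Jacobi method: iterate x <- D^-1 (b - R x) until convergence,
    where D is the diagonal of A and R the off-diagonal remainder."""
    x = np.zeros_like(b)
    D = np.diag(A)
    R = A - np.diagflat(D)
    for k in range(max_iters):
        x_new = (b - R @ x) / D
        if np.linalg.norm(x_new - x) < tol:  # converged after k+1 steps
            return x_new, k + 1
        x = x_new
    return x, max_iters

# A small diagonally dominant system, so the iteration is guaranteed
# to converge (the classical sufficient condition for Jacobi).
A = np.array([[4.0, 1.0], [2.0, 5.0]])
b = np.array([1.0, 2.0])
x, steps = jacobi(A, b)
```

The while loop in the talk corresponds to the `for`/`if` pair here: keep iterating until successive estimates stop changing.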

54
00:10:04.590 --> 00:10:22.730
Michael Carbin: So this is what happens. Here is an illustration of the Jacobi method executing over some number of iterations. Iterations are on the x-axis; on the y-axis we have the difference between our current estimate of the solution to the system of linear equations and the actual solution.

55
00:10:22.740 --> 00:10:31.389
Michael Carbin: And so this blue line is if I just run the computation with no errors: we see this nice convergent behavior over time, which is very nice.

56
00:10:31.400 --> 00:10:35.280
Michael Carbin: Now, this red line here is what happens if we let errors happen.

57
00:10:35.520 --> 00:10:39.519
Michael Carbin: And in particular we're going to inject errors here at iteration ten thousand.

58
00:10:39.620 --> 00:10:44.039
Michael Carbin: And so what we see here is, initially, there's quite a significant leap up

59
00:10:44.200 --> 00:10:52.709
Michael Carbin: in error from where we were before, and in fact it looks like we've lost nearly five or six thousand iterations of progress

60
00:10:52.770 --> 00:10:54.660
Michael Carbin: by encountering this error.

61
00:10:54.670 --> 00:10:56.100
Michael Carbin: However,

62
00:10:56.110 --> 00:11:06.699
Michael Carbin: if we actually let this computation continue running, what we see is that we add only two point six percent more iterations until we converge

63
00:11:06.740 --> 00:11:26.090
Michael Carbin: to an equally accurate result. So this is interesting, because it now gives us another strategy for dealing with these types of errors. On one hand, we can do something like replication, and you might think replication would be horribly expensive; in some cases it really is. But if you scour through the literature, there are

64
00:11:26.100 --> 00:11:36.879
Michael Carbin: very clever tricks for compilation and hardware support, and you can find that maybe it gets down to fifty percent overhead, at least in latency; energy, hardware, all these other measures you may also need to keep in mind. But,

65
00:11:37.240 --> 00:11:40.890
Michael Carbin: conceptually, fifty percent overhead when we're thinking about time to execution.

66
00:11:41.100 --> 00:11:43.710
Michael Carbin: Now, approximation. On the other hand,

67
00:11:43.920 --> 00:11:47.610
Michael Carbin: what this gives us is instead two point six percent overhead.
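The error-absorption argument can be reproduced in miniature: corrupt the running estimate partway through and count the extra iterations. The system, fault magnitude, and injection point below are made up; the point is only that a convergent iteration pays a bounded price for a transient error rather than failing.

```python
import numpy as np

def jacobi_steps(A, b, fault_at=None, tol=1e-10, max_iters=100_000):
    """Run Jacobi to convergence, optionally corrupting the estimate
    at iteration `fault_at`; return the number of iterations used."""
    x = np.zeros_like(b)
    D = np.diag(A)
    R = A - np.diagflat(D)
    for k in range(max_iters):
        x = (b - R @ x) / D
        if k == fault_at:
            x = x + 100.0                    # injected transient error
        if np.linalg.norm(A @ x - b) < tol:
            return k + 1
    return max_iters

A = np.array([[4.0, 1.0], [2.0, 5.0]])       # diagonally dominant, so convergent
b = np.array([1.0, 2.0])
clean = jacobi_steps(A, b)
faulty = jacobi_steps(A, b, fault_at=clean // 2)
# faulty > clean, but the corrupted run still converges: the iteration
# contracts the injected error away at the same geometric rate.
```

This mirrors the plot in the talk: a jump in error at the fault, followed by the same convergent slope as the clean run.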

68
00:11:47.780 --> 00:11:59.229
Michael Carbin: But the difference now, and the crux here, is that our intermediate results, or our results in general, may differ from what we intended them to be by some epsilon.

69
00:12:00.310 --> 00:12:01.989
Michael Carbin: And so this Epsilon is important.

70
00:12:02.000 --> 00:12:16.389
Michael Carbin: So when I think about uncertain computations, what I think about are computations where we've composed a software system and somewhere along the way, or maybe even at the output, results are being computed with some imprecision or inaccuracy,

71
00:12:16.490 --> 00:12:34.069
Michael Carbin: and classically, as computer scientists, we have this very discrete view of the world; we don't necessarily think about these things. Of course, if we talk to our friends in numerical analysis or scientific computing, this is something they think about all the time. And over the course of time, what I've found through my research is that if we can move towards software systems

72
00:12:34.080 --> 00:12:36.750
Michael Carbin: where this epsilon is more explicit,

73
00:12:36.880 --> 00:12:49.450
Michael Carbin: we get new opportunities. We're seeing one right here with approximation: if we know about this epsilon and we can reason about this epsilon, perhaps we can build new systems that are more efficient and tolerant of things like hardware errors.

74
00:12:49.460 --> 00:12:59.010
Michael Carbin: And I think this is critically important, as we're thinking forward as a community of how computing is going to evolve, because this type of uncertainty is now absolutely everywhere.

75
00:12:59.310 --> 00:13:18.450
Michael Carbin: So it's coming up from below, from our hardware systems. We already see this with things like errors that are happening in our hardware. But even more, if you think about the trajectory of machine learning systems, they're moving towards reduced precision, where we're going to have approximations of the numerical computations that we're doing.

76
00:13:18.460 --> 00:13:32.890
Michael Carbin: We're seeing very interesting emerging computations in stochastic or analog or neuromorphic computing, or even quantum computing. These are fundamentally probabilistic or uncertain computational substrates that we can use, that we need to use,

77
00:13:32.900 --> 00:13:36.490
Michael Carbin: where the trade-off is we can get much more efficient or much more capable systems,

78
00:13:36.500 --> 00:13:40.020
Michael Carbin: but it's unavoidable that we now need to think about this uncertainty.

79
00:13:40.160 --> 00:13:52.339
Michael Carbin: The other way this is coming up is from the application, or more specifically from the environment: the fact that we're building applications that are trying to reach out and model the fundamentally uncertain physical world.

80
00:13:52.570 --> 00:14:12.490
Michael Carbin: So take, for example, this computer vision example here, something like object detection. The goal of these types of systems is to try to identify the objects, or the locations and positions and bounding boxes around objects, that we have in the scene. And unfortunately, this system only has noisy access

81
00:14:12.500 --> 00:14:28.570
Michael Carbin: to this physical world. Right? This environment is only partially observable, the data that we're getting back from sensors is noisy, and the models that we're using to construct these types of bounding boxes are, in essence, fundamentally only correct with some sort of probability, because they're typically approximations.

82
00:14:28.740 --> 00:14:40.399
Michael Carbin: And as we move across to other applications, machine learning, image processing, computer vision, quantum computing, again, we have applications where this epsilon is now fundamental.

83
00:14:40.410 --> 00:14:45.570
Michael Carbin: So uncertainty, these epsilons, are starting to show up everywhere in our computing stack,

84
00:14:45.830 --> 00:14:57.490
Michael Carbin: again, coming from below and coming from above. And this is interesting, it's amazing: it's giving us new capabilities. We're able to build software systems that, frankly, we couldn't even build before.

85
00:14:57.500 --> 00:15:12.840
Michael Carbin: Think about what neural networks are really able to do these days in processing the uncertain sensor data that's coming in. We're getting very interesting opportunities for performance, thinking again about approximate computing, or this approximation opportunity I talked about a few slides ago,

86
00:15:12.850 --> 00:15:26.420
Michael Carbin: and of course resilience as well, resilience to errors: if things go wrong, and they're only epsilon wrong, can we figure out a way to still work through those types of errors? But there's a fundamental research question here, which is: well, how do we build

87
00:15:26.800 --> 00:15:28.190
Michael Carbin: software this way?

88
00:15:28.200 --> 00:15:48.120
Michael Carbin: How can we productively build software that has all the nice properties that we think about, things like modularity and compositionality, things that we can reason about, in the presence of these epsilons? Because this is not how we formally or traditionally constructed software; it hasn't been the main focus of computer science, which again has been thinking about the deterministic and the discrete.

89
00:15:48.660 --> 00:16:07.840
Michael Carbin: So that brings me essentially to what is really the subject of today's talk, and really what I've been doing over the years: thinking about ways of building software in this new reality. From programming methodologies: how do we think about these new emerging software systems that are now mixed between classical code,

90
00:16:07.850 --> 00:16:27.250
Michael Carbin: neural networks, and now, with things like ChatGPT and GPT-4 and all the other models that are out there, perhaps natural language? There are really interesting compositions here, where ultimately, at the end of the day, we only know what some of these computations do up to some epsilon, sometimes.

91
00:16:27.500 --> 00:16:47.199
Michael Carbin: Moving beyond that, there are fundamental questions in semantics, where many of these computations are now real-valued but non-deterministic or probabilistic. As I said before, this is not where computer science has spent most of its time. Instead, we need to fundamentally think about what the semantics are of programs that actually execute with these types of data types, because it is actually fundamentally different.

92
00:16:47.740 --> 00:17:02.660
Michael Carbin: Beneath that: when you think about how we represent things like real numbers, non-determinism, and probability, how do we compute with them? What are efficient representations of these types of data types, so that we can compute on them productively in these types of applications?

93
00:17:02.800 --> 00:17:05.150
Michael Carbin: And then, finally, reasoning:

94
00:17:05.359 --> 00:17:07.529
Michael Carbin: once I have this software system.

95
00:17:07.540 --> 00:17:22.060
Michael Carbin: how do I actually now reason about what sort of guarantees I can get from it? How do I actually prove that with reasonable probability it satisfies some specification that I have in mind, or perhaps is going to be epsilon-close to the specification I have?

96
00:17:22.500 --> 00:17:34.199
Michael Carbin: So today, what I want to do is talk about maybe just a couple of these. I'm going to motivate how I've gone about looking at this problem from a very specific application:

97
00:17:34.210 --> 00:17:54.580
Michael Carbin: thinking about how we build software in this domain, with programming methodologies that are very rich and expressive, in the sense that we can take neural networks and perhaps other statistical techniques and really put them to work to solve a classic problem, one of my favorite problems in compilation: how can we build new compilers that incorporate learning

98
00:17:54.590 --> 00:17:55.890
Michael Carbin: to tackle

99
00:17:55.910 --> 00:18:13.730
Michael Carbin: the fact that our underlying hardware systems are actually very uncertain, and we need very expressive, rich, learned models of their behavior. So I'm going to do that, and then use it as an example to concretize a few of the opportunities that I see in this space.

100
00:18:14.320 --> 00:18:15.370
Michael Carbin: So

101
00:18:15.480 --> 00:18:22.340
Michael Carbin: the core of a compiler. So this is an incredibly simplified perspective, as I know we have a broad audience here.

102
00:18:22.350 --> 00:18:42.049
Michael Carbin: Some of you spend time hacking on compilers; I try to write a compiler every year, it's sort of my therapy. But for many of you this is perhaps a reasonably simple diagram, in the sense that I have source code, C, C++, Java, something, coming in on one end of the system, and then on the other side I'm going to get machine code,

103
00:18:42.060 --> 00:18:58.899
Michael Carbin: and of course this can be broken out into finer and finer granularities if you actually know how these systems are composed, which is oftentimes a front end going from source code to an intermediate representation, and then from that intermediate representation down to the machine code itself.

104
00:18:59.150 --> 00:19:06.889
Michael Carbin: Now, where I want to focus my attention is mostly on this back end, where we have this rich intermediate representation that represents the program,

105
00:19:06.900 --> 00:19:09.279
Michael Carbin: and we're figuring out how we take that

106
00:19:09.460 --> 00:19:11.419
Michael Carbin: and really map it onto the machine.

107
00:19:11.720 --> 00:19:16.630
Michael Carbin: And so this problem, this optimization problem we can think of

108
00:19:16.760 --> 00:19:19.790
Michael Carbin: as follows: So here is a simple example

109
00:19:19.800 --> 00:19:35.999
Michael Carbin: where I have this code here in the middle, where I've got three little instructions. And my goal here, one particular goal that you can have in a back end, is to figure out the optimal ordering of these instructions, so that I can get the best performance on the underlying machine.

110
00:19:36.130 --> 00:19:53.290
Michael Carbin: Now, what is an ordering overall? It's going to be some permutation of the instructions that I have, and this is going to give us essentially a transformation space: the space of valid transformations that we're going to have of this particular code. And we can represent that here as this DAG that I'm showing.

111
00:19:53.300 --> 00:19:56.520
Michael Carbin: But what this is representing is the data dependencies.

112
00:19:56.530 --> 00:20:07.849
Michael Carbin: Obviously I can't compute C until I've computed A and B, so A and B need to precede C in any particular ordering of this computation that I'm ultimately going to generate.

113
00:20:08.030 --> 00:20:15.840
Michael Carbin: So any topological sort through this graph is actually a valid ordering that we can have of this computation.
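For the three-instruction example (C depends on A and B), the valid orderings, the topological sorts of the dependency DAG, can be enumerated directly. This brute-force sketch is purely illustrative; real schedulers do not enumerate permutations.

```python
from itertools import permutations

# Dependency DAG for the example: C consumes the results of A and B.
deps = {"A": set(), "B": set(), "C": {"A", "B"}}

def valid_orders(deps):
    """All instruction orders in which every instruction appears
    after all of its dependencies (i.e. all topological sorts)."""
    orders = []
    for order in permutations(deps):
        done = set()
        ok = True
        for inst in order:
            if not deps[inst] <= done:  # a dependency hasn't run yet
                ok = False
                break
            done.add(inst)
        if ok:
            orders.append(order)
    return orders

print(valid_orders(deps))  # [('A', 'B', 'C'), ('B', 'A', 'C')]
```

Of the six permutations of three instructions, only the two topological sorts survive, exactly the two candidate schedules on the slide.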

114
00:20:15.970 --> 00:20:18.030
Michael Carbin: So that's my transformation space.

115
00:20:18.570 --> 00:20:31.959
Michael Carbin: The next critical component here is something called a performance model. I want to take any permutation of these instructions and then be able to predict what the underlying performance is going to be on the actual machine,

116
00:20:31.970 --> 00:20:55.379
Michael Carbin: and the reason I want to be able to predict is that I want to be able to explore broadly in this transformation space. The transformation space may be much larger than just the two options that we see here: it could be thousands, more than thousands, exponentially many, even, in the size of the program. And with that in mind, we want efficient ways to search it, rather than actually trying to run a potentially expensive computation on an actual machine.

117
00:20:55.390 --> 00:20:59.889
Michael Carbin: Right? So our performance model is here to actually help us out. So, for example, here are some

118
00:20:59.900 --> 00:21:06.620
Michael Carbin: examples where a performance model may say: okay, the cost of this first one is going to be three, and the cost of the second one is going to be two.

119
00:21:06.860 --> 00:21:10.360
So now, with that performance model in hand, we can search.

120
00:21:10.380 --> 00:21:22.190
Michael Carbin: There's a methodology for trying to efficiently search these types of spaces, and in this particular instance we're going to go ahead and choose this one with cost two, which we think is going to be more efficient. So this compiler back end, this is,

121
00:21:22.200 --> 00:21:29.700
Michael Carbin: in some sense, the basic strategy that we have. Now, a critical component for getting this to work is the underlying performance model:

122
00:21:30.060 --> 00:21:38.379
Michael Carbin: how do we understand how to model what the machine is doing beneath us, so that our compiler can make effective search decisions about how to generate code?
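The model-then-search loop just described can be sketched as follows. The cost numbers and the stall penalty are invented for illustration; a real back end would use a measured or learned model of the target machine rather than a hand-written table.

```python
# Hypothetical per-instruction costs and one made-up pairwise penalty
# (e.g. a stall when B is issued immediately after A).
INST_COST = {"A": 1.0, "B": 1.0, "C": 2.0}
PAIR_PENALTY = {("A", "B"): 1.0}

def predicted_cost(order):
    """Toy performance model: base cost per instruction plus
    penalties for adjacent pairs that stall the pipeline."""
    cost = sum(INST_COST[i] for i in order)
    cost += sum(PAIR_PENALTY.get(p, 0.0) for p in zip(order, order[1:]))
    return cost

# The two valid schedules from the dependency DAG.
candidates = [("A", "B", "C"), ("B", "A", "C")]
best = min(candidates, key=predicted_cost)
# predicted_cost(("A","B","C")) == 5.0; predicted_cost(("B","A","C")) == 4.0,
# so the search selects ("B", "A", "C") without ever timing real hardware.
```

The point of the model is exactly this: ranking candidate schedules cheaply, so the search never has to execute each one on the machine.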

123
00:21:38.890 --> 00:21:42.309
Michael Carbin: Now, this performance modeling problem has

124
00:21:42.320 --> 00:21:55.079
Michael Carbin: always been a very tricky problem, and it's becoming even more tricky in modern computing, because not only are we thinking about the traditional, classic processor, we're starting to get these very heterogeneous systems.

125
00:21:55.170 --> 00:21:57.390
Michael Carbin: They are spanning many different chips,

126
00:21:57.400 --> 00:22:14.879
Michael Carbin: with very divergent styles. Some are very efficient in particular dimensions of particular computations; some we really want to use for very expensive computations, such as machine learning. And we really want to be able to map an application, and parts of that application, down

127
00:22:15.090 --> 00:22:25.889
Michael Carbin: to each of the individual chips that I may have available inside my system. And so this heterogeneous mapping problem is a hard problem, and it's becoming increasingly hard.

128
00:22:26.370 --> 00:22:45.460
Michael Carbin: And so in the systems communities there is this fundamentally hard systems challenge that many people at this point are trying to tackle, which is: we want to take our program, we have some sort of hardware specification of all the components, the sea of components down here at the bottom that we may be targeting, and we need a performance model,

129
00:22:45.470 --> 00:22:53.390
Michael Carbin: and our compiler is supposed to. Then, you know, magically come in, navigate this space and generate code that appropriately targets the machines below.

130
00:22:53.720 --> 00:22:56.599
Michael Carbin: Now this is becoming increasingly harder,

131
00:22:56.840 --> 00:23:03.159
Michael Carbin: partly because of what I talked about before, in the sense that each of these components, each of these

132
00:23:03.780 --> 00:23:21.969
Michael Carbin: chips, each of these machine components, may have a different behavioral model. Again, we're moving away from traditional models of computation where I have a single input going into my processor and a single output coming out. Instead, I need to think about emerging models of computation

133
00:23:21.980 --> 00:23:25.589
Michael Carbin: where single inputs may now be turning into distributions of inputs,

134
00:23:25.600 --> 00:23:37.089
Michael Carbin: and individual devices may have very different distributions of inputs. And so now, if I really want to think about this emerging systems challenge, I need to broaden it to include this behavioral model.

135
00:23:37.420 --> 00:23:39.989
Michael Carbin: And so, given the complexity of these systems,

136
00:23:40.000 --> 00:23:44.599
Michael Carbin: and the fact that really the only effective way that we can measure

137
00:23:44.780 --> 00:23:52.269
Michael Carbin: the true behavior and performance of machines is often through observation, by running things and observing, this, in essence,

138
00:23:52.310 --> 00:24:03.419
Michael Carbin: has for a long time been viewed as a learning problem, and is increasingly being viewed as a learning problem, something where we're starting to see very effective techniques being applied.

139
00:24:04.780 --> 00:24:09.319
So what I want to talk about today is a little bit of the work that we've done on the performance modeling side

140
00:24:09.330 --> 00:24:16.910
Michael Carbin: of using machine learning to build these performance models, right, and again with an eye toward concretizing exactly

141
00:24:16.920 --> 00:24:33.280
Michael Carbin: how these sorts of exotic software systems are starting to be developed again, being motivated by the fact that well learning may be one of the most effective ways to capture the underlying uncertainty, the exact behavior of the underlying machine. And therefore, if we incorporate learning, we can get a more effective

142
00:24:33.820 --> 00:24:35.090
Michael Carbin: performance model.

143
00:24:35.100 --> 00:24:46.790
Michael Carbin: So we already got a little sneak preview of this a few slides ago, and as I mentioned, this is critical for compilation, but also other areas like performance engineering and debugging. But essentially what this is going to take is

144
00:24:46.910 --> 00:25:03.949
Michael Carbin: some piece of code, let's say x86 basic blocks, this is the domain in which we've actually validated some of our results, and we can throw it into a performance model, and it's going to give us a number, a cost. And here, in this case, what we are looking at is throughput. So the throughput

145
00:25:03.960 --> 00:25:10.669
Michael Carbin: of a sequence of instructions: the way to think about it is taking this code and just unrolling it all the way out,

146
00:25:10.680 --> 00:25:15.519
Michael Carbin: right, and then processing it in the steady state of executing that code.

147
00:25:15.530 --> 00:25:27.360
Michael Carbin: How many cycles does it take to execute an iteration of that basic block? And again, thinking about the steady state, unrolled and averaged across, perhaps, an infinite number of iterations,

148
00:25:27.450 --> 00:25:34.280
Michael Carbin: and, you know, the lower this cost, the fewer cycles it takes, the better, or the faster I can execute that code. So

149
00:25:34.670 --> 00:25:52.910
Michael Carbin: now, this problem is hard, as I alluded to. It's hard, and I want to give you some more evidence that it's hard by going through some examples here. But the reason it's hard is because this is a very complicated device, a system, that we're trying to characterize. So if you look at your modern microprocessor, what you're going to see

150
00:25:52.920 --> 00:26:07.779
Michael Carbin: is a very sophisticated type of system, and if you're building a performance model, an accurate performance model, you need to figure out or come up with some effective model of many of the components of this system, where instructions are coming in at the front end.

151
00:26:07.790 --> 00:26:27.289
Michael Carbin: They're being fetched, they're being decoded, they're being scheduled, they're being actually executed. There's also, of course, a whole memory subsystem that has to be dealt with. And if you, in principle, want to model this, you need to come up with essentially some model of the behavior of all these individual components: how they work, how they fit together, and, of course, the appropriate cost

152
00:26:27.300 --> 00:26:29.810
Michael Carbin: associated with each one of these components.

153
00:26:29.820 --> 00:26:33.789
Michael Carbin: Now, what makes this more challenging is, while we have these sorts of cartoon diagrams,

154
00:26:33.800 --> 00:26:36.389
and there's some fidelity that we have in these diagrams.

155
00:26:36.400 --> 00:26:41.989
Michael Carbin: the reality is that the true access that you're going to have to one of these machines is going to look like this,

156
00:26:42.000 --> 00:26:44.030
Michael Carbin: right? And that's because

157
00:26:44.050 --> 00:26:45.790
Michael Carbin: we don't have a full specification.

158
00:26:45.800 --> 00:26:47.470
We do have some help.

159
00:26:48.010 --> 00:27:03.280
Michael Carbin: So when I give this part of the talk in person, there's usually someone who sees the Intel manual slides and sort of puts their head down and starts weeping, because they've spent endless hours reading through this documentation trying to figure out what exactly their processor is doing.

160
00:27:03.380 --> 00:27:09.159
Michael Carbin: But it's a beast, it's a beast. It documents many, many aspects

161
00:27:09.170 --> 00:27:21.390
Michael Carbin: of the architecture: interaction specifications, optimization guidance, CPU parameters, things that can help you toward the end of building, if not a mental performance model, an actual physical performance model.

162
00:27:21.400 --> 00:27:25.790
Michael Carbin: And again, this is filled with things like these sorts of instruction tables.

163
00:27:25.800 --> 00:27:28.130
Michael Carbin: But again, this is not

164
00:27:30.000 --> 00:27:31.020
Michael Carbin: the complete picture.

165
00:27:31.030 --> 00:27:32.550
Michael Carbin: Now, if you actually look at

166
00:27:32.560 --> 00:27:48.090
Michael Carbin: these instruction tables and you go through them, what you start to see are footnotes like this. So these instruction tables are telling you various aspects of the latency and throughput of instructions on a given machine. But if you start to dig in here, you'll see comments like this: all tables in this appendix

167
00:27:48.150 --> 00:27:49.390
Michael Carbin: are estimates,

168
00:27:49.400 --> 00:28:02.919
Michael Carbin: right? The actual performance of these instructions can range from somewhat faster to significantly faster. So what that means is you're getting a peek, but you're not getting the full picture,

169
00:28:02.930 --> 00:28:14.119
Michael Carbin: and partly that's because this is a very hard problem. These are very complicated systems, and to Intel's credit, they're trying to do a very reasonable job of exposing some sort of specification that people can work with. But

170
00:28:14.360 --> 00:28:19.589
Michael Carbin: this abstraction, this single-instruction abstraction with these latencies and throughputs, is not a

171
00:28:19.600 --> 00:28:24.650
Michael Carbin: complete abstraction, and it's very hard to deliver this abstraction.

172
00:28:25.610 --> 00:28:44.610
Michael Carbin: So what this means is, if someone is sitting down trying to build this performance model, oftentimes, of course, they'll be consulting these tables. But faced with the full complexity of this processor, what they'll do is start to make some simplifications, especially when you're thinking about compilation rather than other areas of research where you need higher fidelity.

173
00:28:44.620 --> 00:28:46.450
Michael Carbin: And so what they do is they abstract.

174
00:28:47.130 --> 00:28:56.830
Michael Carbin: They'll choose a smaller subset of these components, make some assumptions about their behaviors, and then set some parameters based on what they've found, and they're going to get this.

175
00:28:57.180 --> 00:29:07.529
Michael Carbin: And so now your modern performance model, at least as represented in something like your modern compiler, LLVM, is going to look something like this: a very simplified version of the reality that we have.

176
00:29:07.700 --> 00:29:11.470
Michael Carbin: Now, here's an example, just to see how hard this is.

177
00:29:11.610 --> 00:29:14.689
Michael Carbin: I'm giving you a test. No, I don't expect an answer;

178
00:29:14.700 --> 00:29:22.659
Michael Carbin: we're gonna walk through this one. So here's the question: how many cycles does it take to compute one hundred iterations of the following instruction?

179
00:29:23.060 --> 00:29:38.189
Michael Carbin: Now, I don't expect you to understand this instruction, but we'll go ahead with the Intel manual and decode it. So if I go and look at this page in the Intel manual, what I'll find is: aha, this is actually a vector instruction, but what it's doing is essentially a bitwise logical XOR

180
00:29:38.250 --> 00:29:40.300
Michael Carbin: of these registers,

181
00:29:40.720 --> 00:29:49.190
Michael Carbin: So I can rewrite this, and now I get something that's maybe a little bit more interpretable to your average computer scientist: I've got some register r1, and I'm going to

182
00:29:49.200 --> 00:29:52.199
Michael Carbin: store into it the value of r1 XOR r2.

183
00:29:54.270 --> 00:30:07.640
Michael Carbin: And now I can start to figure out a performance model. I can go to another part of the Intel manual and look up the latency and the throughput for this instruction, again similar to those tables that we saw a few slides ago.

184
00:30:07.670 --> 00:30:20.399
Michael Carbin: So if I go and look here, what I find is that this instruction has a latency of one. What that means is, the abstraction is, it takes one cycle for the processor to execute this instruction. And then it has a throughput of

185
00:30:20.540 --> 00:30:25.279
Michael Carbin: zero point three three, or really one over three. And the way to think about this is

186
00:30:25.290 --> 00:30:26.870
Michael Carbin: that the processor

187
00:30:26.910 --> 00:30:35.190
Michael Carbin: can execute three concurrent instances of this instruction at the same time, provided there are no data dependencies between them. So it can execute

188
00:30:35.220 --> 00:30:36.890
Michael Carbin: three at the same time,

189
00:30:37.060 --> 00:30:39.020
Michael Carbin: and we'll use these numbers in a moment.

190
00:30:39.280 --> 00:30:46.790
Michael Carbin: So now, with this information, we can take a first crack at figuring out: how many cycles will it take to compute one hundred iterations

191
00:30:46.810 --> 00:30:48.230
Michael Carbin: of this instruction?

192
00:30:49.490 --> 00:30:51.030
Michael Carbin: It's!

193
00:30:57.130 --> 00:31:00.829
Michael Carbin: There's a wide variety of different answers when I do this in person,

194
00:31:00.840 --> 00:31:04.339
Michael Carbin: some of them actually pretty good, because this is a weird problem.

195
00:31:05.080 --> 00:31:18.180
Michael Carbin: But the first, and a very reasonable one here, is actually the answer that you would get from LLVM's performance model as of several years ago, when we were doing this work; after they saw our work, they changed the performance model. But

196
00:31:18.410 --> 00:31:24.980
Michael Carbin: this is the answer they'd give: one hundred cycles. And the reason why you would think that is because, well, I've got this instruction here,

197
00:31:25.210 --> 00:31:30.390
Michael Carbin: it takes one cycle, and, moreover, this instruction reads from r1 and

198
00:31:30.400 --> 00:31:35.190
Michael Carbin: then writes to r1, right? I can't begin executing my next instruction

199
00:31:35.930 --> 00:31:43.430
Michael Carbin: unless I've completed executing the previous instruction, if they're reading and writing from that same data.

200
00:31:43.440 --> 00:31:51.290
Michael Carbin: So this is a very reasonable thing to start with, just by looking at the data dependencies, right? r1 is going to flow into r1. So

201
00:31:51.410 --> 00:31:57.849
Michael Carbin: of course this is going to take one hundred cycles, because each instruction takes one cycle to execute, and there's this data dependence between them.
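The two readings of the spec-sheet numbers can be reproduced with a toy cost model (illustrative only, not LLVM's actual model): with a read-after-write dependency on r1, the chain serializes at one cycle per instruction; without it, only the reciprocal throughput of one third of a cycle binds.

```python
# Toy cost model for a single repeated instruction (illustrative, not LLVM's).
def naive_cycles(iterations, latency, throughput, has_dependency):
    if has_dependency:
        # Read-after-write on the same register: instructions serialize,
        # so each iteration costs the full latency.
        return iterations * latency
    # No dependency: limited only by reciprocal throughput
    # (1/3 of a cycle per instruction means three can execute per cycle).
    return iterations * throughput

# xor r1, r1, r2 reads and writes r1, so the naive model serializes it.
print(naive_cycles(100, latency=1, throughput=1/3, has_dependency=True))   # 100
print(naive_cycles(100, latency=1, throughput=1/3, has_dependency=False))  # ~33.3
```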

202
00:32:00.540 --> 00:32:19.970
Michael Carbin: Now, there's a trick here, right? If you look at this a little bit more carefully, and perhaps you're familiar with your bit twiddling and bit-vector arithmetic, what you'll see is that, well, this isn't really this instruction; this is really this instruction: I'm setting r1 to zero.

203
00:32:19.980 --> 00:32:20.989
Michael Carbin: You're like, Okay,

204
00:32:21.000 --> 00:32:32.489
Michael Carbin: that's interesting. Why, this is a cool optimization; I feel like my compiler should generate that. Well, not only might a compiler try to generate that, your processor will actually dynamically identify this and

205
00:32:32.500 --> 00:32:37.489
Michael Carbin: rewrite the instructions. So if you go and actually poke around through the Intel documentation,

206
00:32:37.500 --> 00:32:53.210
Michael Carbin: and you'll find that there are zero idioms: the processor itself has been manually optimized to find these sorts of zero idioms. And if you go and find this list, you'll see our instruction, that's our mnemonic that we were looking at before; we rewrote it to be simpler.

207
00:32:53.220 --> 00:33:04.540
Michael Carbin: And what we'll find is the processor finds all instances of these zero idioms, and they get removed; they have no execution. And so, great, suddenly our latency goes to zero,

208
00:33:04.890 --> 00:33:11.890
Michael Carbin: and now we can instead start thinking about throughput, right? There are no data dependencies here, and the throughput here said we can execute three at a time.

209
00:33:12.250 --> 00:33:15.999
Michael Carbin: So what that means is: now, well, maybe this is going to be thirty-three cycles.

210
00:33:16.770 --> 00:33:17.990
Michael Carbin: That's a lot better.

211
00:33:19.050 --> 00:33:20.060
Michael Carbin: But

212
00:33:20.210 --> 00:33:38.269
Michael Carbin: you do some more poking, and you actually go and measure this, and you end up with twenty-five cycles. Or, if you use Intel's proprietary performance analysis tool, IACA, which they had released at the time, you see the same twenty-five cycles. And then you start to wonder: what's going on here?

213
00:33:38.280 --> 00:33:53.759
Michael Carbin: And, in fact, what eventually ends up happening is, once you go through this renaming process, you're no longer even bottlenecked by these instruction specifications. Instead, you become bottlenecked by the front end, which can actually do four instructions per cycle. And so you get an entirely different
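The twenty-five-cycle answer falls out of "take the binding bottleneck" reasoning. A sketch with the numbers from the talk (three execution ports for this instruction, a four-wide front end; the helper and its parameter names are my own, not from any real tool):

```python
# Sketch of bottleneck reasoning: steady-state cost is set by the slowest
# (largest) applicable bound. Numbers are illustrative, from the talk.
def estimated_cycles(n_instructions, exec_per_cycle, frontend_per_cycle,
                     eliminated_at_rename):
    bounds = [n_instructions / frontend_per_cycle]  # front-end bound
    if not eliminated_at_rename:
        # Only non-eliminated instructions occupy execution ports.
        bounds.append(n_instructions / exec_per_cycle)
    return max(bounds)

# Without idiom elimination: execution ports bind -> ~33 cycles.
print(estimated_cycles(100, 3, 4, eliminated_at_rename=False))  # ~33.3
# With zero-idiom elimination: only the 4-wide front end binds -> 25 cycles.
print(estimated_cycles(100, 3, 4, eliminated_at_rename=True))   # 25.0
```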

214
00:33:53.790 --> 00:33:55.730
Michael Carbin: answer, which is ultimately what we published.

215
00:33:55.920 --> 00:33:58.120
Michael Carbin: So this is this is hard.

216
00:33:58.650 --> 00:34:00.539
Michael Carbin: This is hard, because

217
00:34:00.940 --> 00:34:04.000
Michael Carbin: how do we go from what Intel gave us?

218
00:34:04.250 --> 00:34:17.889
Michael Carbin: this very reasonable instruction specification, which, you know, an actual production compiler was using for this processor, to the actual measured performance, or the proprietary performance model into which Intel has baked all the optimizations and smarts?

219
00:34:17.900 --> 00:34:22.680
Michael Carbin: How can we cobble together all this documentation, our understanding, and measurements to get there?

220
00:34:22.730 --> 00:34:28.480
Michael Carbin: Well, in practice, what people do is this: they will build their simplified performance model,

221
00:34:28.630 --> 00:34:34.599
Michael Carbin: they will stack that against their actual processor, and they'll get a data set of examples.

222
00:34:34.850 --> 00:34:36.720
Michael Carbin: They'll pass that through both,

223
00:34:36.929 --> 00:34:39.950
Michael Carbin: Compare the difference. See where they're wrong.

224
00:34:41.560 --> 00:34:47.969
Michael Carbin: This is, by and large, what people do. If you go and look through the development process of many of these, certainly the open-source

225
00:34:48.230 --> 00:34:51.789
Michael Carbin: performance models, you'll see this iterative refinement process.

226
00:34:51.800 --> 00:34:59.349
Michael Carbin: And so, when we were doing this research at the time, we sort of looked at this and said: this looks an awful lot like machine learning.

227
00:34:59.960 --> 00:35:11.500
Michael Carbin: I've got a data set, I've got labels, there's some sort of model I'm trying to tune, I can look at the difference, and I'm refining my model. And so why don't we just go ahead and replace that with a neural network? Let's just see what happens.
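That loop, predict with the model, compare against the machine, refine the parameters, is literally gradient descent. A toy sketch with a hypothetical one-parameter cost model (cycles per instruction) and made-up measurements:

```python
# The manual refine loop, written as the learning loop it resembles.
# Hypothetical setup: a one-parameter cost model tuned against measured
# block timings by gradient descent on squared error.
blocks = [4, 7, 10]            # instruction counts of some measured blocks
measured = [2.0, 3.5, 5.0]     # "ground truth" cycles (made-up numbers)

theta = 1.0                    # initial guess: 1 cycle per instruction
lr = 0.001
for _ in range(2000):
    grad = 0.0
    for n, y in zip(blocks, measured):
        pred = theta * n                 # run the (toy) performance model
        grad += 2 * (pred - y) * n       # compare against the machine
    theta -= lr * grad                   # refine the model's parameter
print(round(theta, 3))  # converges to the least-squares fit, 0.5
```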

228
00:35:11.510 --> 00:35:30.189
Michael Carbin: And so in some work called Ithemal, this is exactly what we did, right? We came up with a neural network, we built a data set, a very extensive data set at the time, that had some specific invariants designed to capture the invariants that we have for these performance models: things like no cache misses, no preemptions.

229
00:35:30.200 --> 00:35:38.600
Michael Carbin: A very simplified setting, not the full machine, but this is, in fact, how systems like llvm-mca actually do model things themselves.

230
00:35:38.660 --> 00:35:41.259
Michael Carbin: And then, of course, we get a comprehensive dataset.

231
00:35:41.270 --> 00:35:52.160
Michael Carbin: And so we went and ran this, and we found that using this technique, just normal neural networks, we could build a system that had half of the error of llvm-mca and of IACA at that time.

232
00:35:52.170 --> 00:36:05.229
Michael Carbin: And so this was great. This was an amazing success. I think it was one of the first papers really out there, at least in the modern era, to demonstrate that we could scale up this type of deep learning to actually target this type of problem.

233
00:36:05.340 --> 00:36:18.979
Michael Carbin: And so, Ithemal. There are some details to its architecture, I don't really want to focus on those too much, there are some novelties, but I think the most important part here is really that it's entirely a learned representation.

234
00:36:18.990 --> 00:36:31.099
Michael Carbin: So unlike previous work on incorporating machine learning into these types of performance engineering situations, what Ithemal does is it takes text, tokenized text, and predicts a number.

235
00:36:31.630 --> 00:36:36.790
Michael Carbin: There's no featurization of the code. We don't need to extract data dependencies, all these sorts of things.
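As a rough illustration of "tokenized text in, number out" (this is not Ithemal's actual tokenizer or architecture, which is a hierarchical neural network; just a sketch of the kind of token stream such a model consumes):

```python
# Toy sketch: turn an x86 basic block into an opcode/operand token stream.
def tokenize_block(asm):
    tokens = []
    for line in asm.strip().splitlines():
        parts = line.replace(",", " ").split()
        opcode, operands = parts[0], parts[1:]
        tokens.append("OP:" + opcode)
        tokens.extend("ARG:" + op for op in operands)
        tokens.append("<END>")  # end-of-instruction marker
    return tokens

block = """\
vxorps ymm0, ymm0, ymm1
vaddps ymm2, ymm0, ymm3"""
print(tokenize_block(block))
```

A learned model would embed these tokens and regress a single throughput number from them, with no hand-built features in between.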

236
00:36:36.800 --> 00:36:43.359
Michael Carbin: Nothing, right? Learned representations have been the interest and flair of, let's say, the past

237
00:36:43.510 --> 00:36:55.340
Michael Carbin: you know, ten years of modern deep learning, which is that we can move away from the explicit featurization of our data and have these incredibly capable systems come in and learn purely from inputs, or text,

238
00:36:55.390 --> 00:36:57.380
Michael Carbin: to get a very effective result.

239
00:36:58.160 --> 00:37:00.609
Michael Carbin: And so, if you come back to

240
00:37:00.750 --> 00:37:17.989
Michael Carbin: this example that we have: Ithemal ends up getting pretty close to this measured version. So it actually is able to figure out this optimization that is happening behind the scenes in the processor, purely from data.

241
00:37:19.380 --> 00:37:24.370
Michael Carbin: Now, Ithemal. This has been great. It's taken our standard methodology and we've

242
00:37:25.180 --> 00:37:27.149
Michael Carbin: replaced it with a neural network.

243
00:37:27.160 --> 00:37:27.950
But,

244
00:37:28.150 --> 00:37:30.189
Michael Carbin: but not all is perfect

245
00:37:30.200 --> 00:37:46.519
Michael Carbin: right? Because one benefit of your traditional performance model is, you can in principle understand what's happening. These performance models, oftentimes when they're written, have a trace, some sort of explanation. There's a simulation

246
00:37:46.530 --> 00:38:02.269
Michael Carbin: that you can read off, that models individual stages of the processor, that gives you a hypothesis on how the system believes the underlying processor is actually behaving. And so this can be very helpful for things like performance tuning, if I'm really trying to figure out how do I optimize my system, change my code,

247
00:38:02.400 --> 00:38:06.080
Michael Carbin: based on the observations that I'm seeing from the simulation.

248
00:38:06.540 --> 00:38:11.589
Michael Carbin: So there's lots of interpretability there, and this motivated us to think a little bit differently here,

249
00:38:11.600 --> 00:38:20.020
Michael Carbin: once we had Ithemal, which was, you know: can we maybe try to combine the best of both worlds? Where, instead of entirely throwing out the

250
00:38:20.710 --> 00:38:25.039
Michael Carbin: analytical model and replacing it with a neural network, maybe what we can do is

251
00:38:25.340 --> 00:38:29.249
Michael Carbin: combine both, in more of a neurosymbolic approach.

252
00:38:30.360 --> 00:38:45.589
Michael Carbin: And so in the next line of work, called DiffTune, we did exactly that. And the way that we started out is, we said: okay, let's look at this llvm-mca simulator that we have, which had a very detailed simulation pipeline,

253
00:38:45.600 --> 00:38:47.419
Michael Carbin: but unfortunately

254
00:38:47.580 --> 00:38:51.010
Michael Carbin: had some wrong intuitions about what was actually going on.

255
00:38:51.460 --> 00:39:10.119
Michael Carbin: In particular, it's parametrized. It's parametrized by these sorts of numbers that we're seeing down here at the bottom, and it has its own hypotheses about what these numbers are. And we said: could we use machine learning to automatically figure out better sets of parameters to then plug into llvm-mca, to therefore get better accuracy, but still

256
00:39:10.460 --> 00:39:14.829
Michael Carbin: be able to interpret essentially a trace of the behavior of the underlying system.

257
00:39:16.070 --> 00:39:26.009
Michael Carbin: So we dug into llvm-mca, and, like I said, it is a fairly detailed simulation. Again, this isn't going to be, you know, architecture-research-grade,

258
00:39:26.020 --> 00:39:36.899
Michael Carbin: cycle-accurate simulation, but it's reasonable enough, as it's used inside of a compiler to be able to validate various scheduling-based optimizations that are going on inside of it.

259
00:39:37.710 --> 00:39:46.390
Michael Carbin: Now, llvm-mca is not small. It's not super massive, but not small: it's about ten thousand lines of C++.

260
00:39:46.400 --> 00:39:52.089
Michael Carbin: But where it does become big is: it has about eleven thousand integer-valued parameters

261
00:39:52.100 --> 00:39:54.440
Michael Carbin: that determine its behavior.

262
00:39:54.990 --> 00:40:06.419
Michael Carbin: And so this is the challenge, and these parameters come from it trying to plug in the pieces of this model that it has, right? So there's the dispatcher,

263
00:40:06.590 --> 00:40:18.119
Michael Carbin: right, which essentially is a very simplified model of a front end: instructions coming in, at what rate do they get decoded and dispatched up to the rest of the processing pipeline? And it's a single parameter:

264
00:40:18.420 --> 00:40:21.340
Michael Carbin: How many instructions can be dispatched per cycle.

265
00:40:22.540 --> 00:40:28.460
Michael Carbin: There's also a reorder buffer, right? So it's also modeling out-of-order execution, and

266
00:40:28.640 --> 00:40:32.590
Michael Carbin: the size of the reorder buffer: right, there's some number of instructions that it can hold.

267
00:40:32.680 --> 00:40:49.830
Michael Carbin: But where things start to get more complicated is that it starts to have per-instruction parameters, such as these: latencies and ReadAdvance cycles, which is, if I have an instruction and, for example, it has a write, how soon can that

268
00:40:49.840 --> 00:40:56.550
Michael Carbin: computation be made available to a subsequent instruction before it actually really needs to be fully retired

269
00:40:56.690 --> 00:41:13.620
Michael Carbin: And as well, there are execution units with things like port maps. So I don't expect everyone to understand all the details in here, but the real primary point is that for every single one of these individual components we have some manner of parameters, some of them global, some of them very specific to the individual instructions that we have,

270
00:41:13.700 --> 00:41:18.079
Michael Carbin: and all told, together we end up with eleven thousand parameters,
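To make the parameter space concrete, here is a hypothetical sketch of the kind of parameter table involved (names are loosely modeled on llvm-mca's scheduling models but are my own simplification; all values are made up):

```python
# Illustrative shape of simulator parameters (not real llvm-mca data).
params = {
    "DispatchWidth": 4,          # instructions dispatched per cycle
    "ReorderBufferSize": 224,    # out-of-order window, in instructions
    "PerInstruction": {
        # Thousands of per-opcode entries like these add up to the
        # ~11,000 tunable values mentioned in the talk.
        "VXORPS": {"Latency": 1, "NumMicroOps": 1, "Ports": [0, 1, 5]},
        "VADDPS": {"Latency": 4, "NumMicroOps": 1, "Ports": [0, 1]},
    },
}
# Count the tunable values in this tiny slice: 2 global scalars, plus
# Latency, NumMicroOps, and one value per port entry for each opcode.
n_values = 2 + sum(2 + len(p["Ports"])
                   for p in params["PerInstruction"].values())
print(n_values)  # 11
```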

271
00:41:18.090 --> 00:41:19.080
Michael Carbin: where

272
00:41:19.110 --> 00:41:28.689
Michael Carbin: the llvm-mca developers, unfortunately, the way they set these is as we've talked about: spending a lot of time looking at design documents, and also doing some measurement and refining.

273
00:41:29.200 --> 00:41:31.240
Michael Carbin: So, with our system DiffTune,

274
00:41:31.310 --> 00:41:37.379
Michael Carbin: When we wanted to apply learning, we thought about this really as an optimization problem

275
00:41:37.660 --> 00:41:40.789
Michael Carbin: where the way we can think about this is,

276
00:41:40.800 --> 00:41:51.240
Michael Carbin: I have some simulator, right? So f is my simulator here. It has some set of parameters theta, and on a given input x it's going to produce

277
00:41:51.450 --> 00:41:56.849
Michael Carbin: a prediction, right? Some model of the performance of that particular input, meaning a basic block,

278
00:41:57.100 --> 00:41:58.990
Michael Carbin: all right. How many cycles it's going to take.

279
00:41:59.340 --> 00:42:04.089
Michael Carbin: Now, the optimization problem is Well, if I have a data set right,

280
00:42:04.100 --> 00:42:16.789
Michael Carbin: can I essentially minimize, or find the best set of parameters that minimize, the discrepancy between this prediction that I have and the true, measured, ground-truth value that I have for the data set?

281
00:42:16.800 --> 00:42:18.380
So nothing Fancy here,
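Written out, the optimization problem being described is (my notation: $f$ the simulator, $\theta$ its parameters, $D$ a data set of basic blocks $x$ with measured timings $y$):

```latex
\theta^{*} \;=\; \arg\min_{\theta} \sum_{(x,\, y) \in D} \bigl|\, f(x;\, \theta) - y \,\bigr|
```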

282
00:42:18.390 --> 00:42:24.850
Michael Carbin: right? This is just an optimization-problem specification, thinking about: how can we

283
00:42:25.110 --> 00:42:35.589
Michael Carbin: optimize the simulator, where it's indexed, again, by these eleven thousand parameters, which together give us an absolutely massive number of possible configurations.

284
00:42:35.600 --> 00:42:54.159
Michael Carbin: So this problem, as it's posed, is unfortunately horribly intractable using search or stochastic search, as we validated using existing tools available in the community, such as OpenTuner. These existing tools that we have just simply do not scale

285
00:42:54.170 --> 00:42:55.870
Michael Carbin: to this type of problem.

286
00:42:56.350 --> 00:43:01.739
Michael Carbin: So what we said is perhaps we can build a surrogate optimization problem.

287
00:43:01.900 --> 00:43:08.090
Michael Carbin: So the key thing here is that this is exactly the same optimization problem, except you now have an f-hat here,

288
00:43:08.260 --> 00:43:11.250
Michael Carbin: where the idea is, f-hat is going to be a surrogate.

289
00:43:11.720 --> 00:43:16.879
Michael Carbin: It's going to be a neural network that approximates the behavior of the simulator.

290
00:43:16.960 --> 00:43:26.320
Michael Carbin: now conditioned on a set of parameters, right? So, given some set of parameters theta, this surrogate is going to intuit what our simulator would do

291
00:43:26.490 --> 00:43:29.049
Michael Carbin: given those parameters. Theta

292
00:43:29.210 --> 00:43:39.290
Michael Carbin: And the reason why we went with this approach is because now it's going to allow us to do gradient-based optimization: what we can do is now take gradients with respect to these parameters,

293
00:43:39.840 --> 00:43:45.900
Michael Carbin: and use standard gradient descent to try to figure out if we can find effective parameters that are going to minimize

294
00:43:46.030 --> 00:43:48.070
Michael Carbin: this problem that we have.
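A minimal sketch of the surrogate trick: a toy one-parameter discrete "simulator", a smooth surrogate standing in for the trained neural network, and gradient descent on the parameter through the surrogate. The names and numbers here are mine, not DiffTune's.

```python
import math

# Toy discrete "simulator": steady-state cycles for n instructions given an
# integer dispatch width. A stand-in for llvm-mca; numbers are illustrative.
def simulator(n, width):
    return math.ceil(n / int(width))

# Smooth surrogate trained to mimic it. Written in closed form here,
# instead of as a neural network, to keep the sketch self-contained.
def surrogate(n, width):
    return n / width

measured = 25.0   # ground-truth cycles for a hypothetical 100-instruction run
width = 8.0       # initial parameter guess
lr = 0.02
for _ in range(500):
    pred = surrogate(100, width)
    # Gradient of 0.5 * (pred - measured)^2 with respect to width,
    # taken through the smooth surrogate, not the discrete simulator.
    grad = (pred - measured) * (-100 / width ** 2)
    width -= lr * grad
print(round(width), simulator(100, round(width)))  # 4 25
```

The point of the exercise: the discrete simulator itself gives no usable gradients, but the surrogate's smooth interpolation lets plain gradient descent recover a parameter setting that, plugged back into the simulator, matches the measurement.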

295
00:43:48.470 --> 00:43:52.690
Michael Carbin: And just to put this pictorially, what we're trying to do here:

296
00:43:52.700 --> 00:43:55.520
Michael Carbin: is this is sort of a visualization of

297
00:43:55.650 --> 00:44:06.130
Michael Carbin: an instruction that we're going to have inside of our architecture, where there's a parameter that we can control, particularly the dispatch width,

298
00:44:06.660 --> 00:44:19.930
Michael Carbin: right. And for this parameter of llvm-mca, we can look at the timing, or the latency, of this particular instruction, we can look at the throughput of this particular instruction, as a function of this parameter,

299
00:44:21.060 --> 00:44:29.899
Michael Carbin: and so what we can see is, what llvm-mca is going to do is, for each individual discrete point of this parameter, it will have a prediction.

300
00:44:30.170 --> 00:44:41.929
Michael Carbin: Now, again, the problem is, this is going to be essentially a discrete optimization, and even using zeroth-order methods we were unable to find effective ways to navigate this space. But if we build a surrogate,

301
00:44:42.340 --> 00:44:50.580
Michael Carbin: What the Surrogate can do is now give us a smooth interpolation once trained between all of these points,

302
00:44:50.590 --> 00:45:05.989
Michael Carbin: and so now what it allows us to do is try to navigate through this curve to identify the exact timing, right, where this is the ground-truth timing that we're going to have for this instruction, and this is going to intersect somewhere between here and here,

303
00:45:06.000 --> 00:45:12.430
Michael Carbin: and this is going to give us some understanding of what this parameter should be, which we can now find via gradient descent.

304
00:45:16.360 --> 00:45:18.419
Michael Carbin: So this is our surrogate optimization problem.

305
00:45:18.540 --> 00:45:24.719
We're gonna have this f hat instead of our traditional simulator, our discrete simulator, which is otherwise

306
00:45:24.730 --> 00:45:42.799
Michael Carbin: not differentiable. This is a highly discrete computation that has not only discrete integers that it's operating on, but very many discrete data structures that it's using to organize the simulation. So this neural network gives us a differentiable surrogate of this otherwise discrete computation.

307
00:45:43.670 --> 00:46:02.140
Michael Carbin: Now, where does the surrogate come from? Well, what we did is we used Ithemal. So we used our previous approach, changed it ever so slightly so it can be conditioned on a set of parameters: we used a variant of the Ithemal architecture to mimic the behavior of our simulator. So now this is going to introduce a surrogate learning problem.

308
00:46:02.150 --> 00:46:04.580
Michael Carbin: Our goal here is now to find f-hat,

309
00:46:04.770 --> 00:46:13.389
Michael Carbin: again over this data set that we're going to have. This is a data set we're using for learning slightly different, enumerating over this data set over the

310
00:46:13.400 --> 00:46:18.870
Michael Carbin: parameters and also examples. What we want to do is minimize the difference between our surrogate here

311
00:46:19.770 --> 00:46:21.219
Michael Carbin: it are actual, simply

312
00:46:21.280 --> 00:46:25.050
Michael Carbin: so we are training our surrogate to exactly mimic,

313
00:46:25.090 --> 00:46:31.889
Michael Carbin: as best it can, the behavior of our simulator over a wide variety of different parameter settings.

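The surrogate-learning objective just described, minimizing the gap between the surrogate and the simulator over both parameter settings and examples, might be sketched like this. The `simulator` and the linear surrogate are toy stand-ins, not LLVM MCA or Ithemal.

```python
import random

# Train a surrogate f_hat to mimic a simulator's output across many
# (parameter setting, example) pairs, via plain SGD on squared error.

def simulator(param, x):
    return 3.0 * param + 2.0 * x   # stand-in for an opaque discrete tool

def train(data, lr=0.01, epochs=500):
    w_p, w_x = 0.0, 0.0            # weights of a linear surrogate f_hat
    for _ in range(epochs):
        for (param, x), y in data:
            pred = w_p * param + w_x * x
            err = pred - y
            w_p -= lr * err * param
            w_x -= lr * err * x
    return w_p, w_x

random.seed(0)
# Sample parameter settings and examples, recording the simulator's answer.
data = [((p, x), simulator(p, x))
        for p in [random.uniform(0, 1) for _ in range(20)]
        for x in [random.uniform(0, 1) for _ in range(5)]]
w_p, w_x = train(data)
```

Because the toy simulator is itself linear and noiseless, the surrogate's weights recover it essentially exactly; a real surrogate only approximates the simulator as best it can.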
314
00:46:32.090 --> 00:46:33.709
Michael Carbin: So the results here

315
00:46:34.210 --> 00:46:54.139
Michael Carbin: are actually quite great. So we compared to OpenTuner, again essentially at that time the state of the art that we had for zeroth-order methods. It uses a wide portfolio of random search techniques and zeroth-order methods to try to find solutions to tuning problems for systems,

316
00:46:54.320 --> 00:47:12.129
Michael Carbin: and so on the left here, what we've got is the mean absolute percentage error, so lower is going to be better. And this is validated on various different microarchitectures that we're going to have from Intel and also AMD. And so what we see is OpenTuner is by far unsuccessful on this problem.

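For reference, the mean absolute percentage error metric mentioned here is straightforward to compute; the timing values below are made up purely for illustration.

```python
def mape(predicted, actual):
    # Mean absolute percentage error: lower is better.
    return 100.0 * sum(abs(p - a) / abs(a)
                       for p, a in zip(predicted, actual)) / len(actual)

measured  = [100.0, 80.0, 50.0]   # made-up ground-truth timings
predicted = [110.0, 76.0, 50.0]   # made-up model predictions
err = mape(predicted, measured)   # 5.0 percent
```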
317
00:47:12.140 --> 00:47:31.469
Michael Carbin: Our experts: these were the default expert parameters that had been provided for us, and they had been tuned manually over the years through this open-source effort. And then, finally, we had DiffTune, which came in, in essentially all instances, with either competitive or better accuracy, and this is again learned entirely from data.

318
00:47:33.400 --> 00:47:35.689
Michael Carbin: So we go back to our example again.

319
00:47:35.700 --> 00:47:47.070
Michael Carbin: In LLVM MCA, where we started was at one hundred. We had Ithemal, which did a great job of getting much closer to the true measured performance, and even, again, to the proprietary expert models' performance,

320
00:47:47.080 --> 00:47:55.009
Michael Carbin: and then DiffTune comes in at twenty-seven. So it's actually able to come in and get very close to Ithemal.

321
00:47:55.020 --> 00:48:09.420
Michael Carbin: But again, we're recovering the interpretability now of LLVM MCA, in the sense that we can get a full trace of what we anticipate, or what LLVM MCA hallucinates, or believes, is the underlying execution behavior of a piece of code on our microarchitecture.

322
00:48:10.630 --> 00:48:12.040
Michael Carbin: So now

323
00:48:12.900 --> 00:48:20.989
Michael Carbin: Ithemal, DiffTune, and even efforts that we've had continuing on from there have been tackling this performance modeling problem with learning.

324
00:48:21.000 --> 00:48:38.510
Michael Carbin: We've had some work in behavioral modeling coming from a variety of different directions, oftentimes from our approximate computing work. I'm not going to talk about that here. But we've also made efforts in the past several years to think differently about even learning the compiler, and the compiler itself can now take

325
00:48:38.520 --> 00:48:47.389
Michael Carbin: its different parts, or these learned performance models, these learned behavioral models. And really now, can we think about automatically generating a compiler, or automatically generating a back end?

326
00:48:47.400 --> 00:48:51.920
Michael Carbin: Can we do all of this really from first principles from data?

327
00:48:52.070 --> 00:49:10.580
Michael Carbin: without necessarily the manual process that we normally have here. And there's been a very vibrant community standing up, doing work in this space right now. But there is, by and large, a grand challenge here of automatically synthesizing a compiler from data, and, of course, the hardware as well.

328
00:49:11.230 --> 00:49:15.090
Michael Carbin: Now, what I want to highlight here again is the process that we went through.

329
00:49:15.100 --> 00:49:18.809
Michael Carbin: So we had this standard development methodology for performance modeling.

330
00:49:18.990 --> 00:49:37.070
Michael Carbin: We then realized that what we're trying to do is a modeling problem, right? And in particular, we're trying to model uncertainty. We have some fundamental uncertainty about this hardware substrate, and neural networks in the last decade or so have emerged as a very effective technique for modeling uncertainty.

331
00:49:37.170 --> 00:49:39.469
Michael Carbin: So that's how we moved to Ithemal.

332
00:49:40.090 --> 00:49:54.359
Michael Carbin: Now, of course, with Ithemal we lacked the interpretability of understanding our system. While this may be great in terms of how well the model matches reality,

333
00:49:54.370 --> 00:50:02.579
Michael Carbin: It's, unfortunately, not necessarily usable for all circumstances, because in many circumstances we do want to understand what our system is doing,

334
00:50:02.740 --> 00:50:06.859
Michael Carbin: and so with that we moved to DiffTune, which is now this neurosymbolic approach.

335
00:50:07.940 --> 00:50:21.659
Michael Carbin: Now, of course, I imagine you've been paying attention to ChatGPT and everything that's been going on in the world for the past few months. And you can see now this trend of what we have with software.

336
00:50:21.670 --> 00:50:30.040
Michael Carbin: Again, where we were before: neural networks, then neurosymbolic combinations of neural networks and traditionally written components. And now,

337
00:50:30.380 --> 00:50:48.390
Michael Carbin: really a future of software that may even include large language models, instances where our program is now going to be half code, half trained neural networks. Or not half: a third code, a third trained neural networks, and maybe another third of just English. Right? You're really starting to see people

338
00:50:48.400 --> 00:50:57.120
Michael Carbin: start to report these very strange experiences of: yeah, my program is about half code and half English, and the English prompts a large language model.

339
00:50:57.240 --> 00:51:00.389
Michael Carbin: And this is a very weird future, right?

340
00:51:00.400 --> 00:51:08.289
Michael Carbin: But I personally think this future isn't new; this is where we've been moving towards for some number of years.

341
00:51:08.300 --> 00:51:15.410
Michael Carbin: Again, as I've said before, uncertainty has been creeping in downwards from our environment,

342
00:51:15.730 --> 00:51:35.330
Michael Carbin: thinking about how we're deploying systems in all these new places where we're trying to get them to model uncertainty. And machine learning has finally emerged as an effective way to model uncertainty; really, end-to-end, large-scale differentiable systems have been a huge way of thinking about how to address many of these problems,

343
00:51:36.090 --> 00:51:39.929
Michael Carbin: and we're seeing that as well with things like Eisenhower and do

344
00:51:41.010 --> 00:51:45.589
Michael Carbin: Uncertainty coming up from below, as I've talked about, right? This is the other case where we're seeing uncertainty,

345
00:51:45.600 --> 00:52:05.379
Michael Carbin: and with Ithemal and DiffTune we are modeling this uncertainty in the hardware system, as I've talked about, using these techniques, where of course this hardware system is our environment for this performance modeling. But even moving forward, you know, as I talked about before with the approximate computing example,

346
00:52:05.390 --> 00:52:19.439
Michael Carbin: our hardware systems are just going to be increasingly less precise, right? And so here we see that again with this Jacobi iterative method, where we saw: well, maybe if we recognize that we're

347
00:52:19.500 --> 00:52:31.070
Michael Carbin: fundamentally operating on an uncertain computing substrate, and perhaps architect our software with that in mind, we can get performance benefits versus a replication approach.

348
00:52:31.740 --> 00:52:43.360
Michael Carbin: Right? So for me, when I see a diagram where I'm being pushed from above and pushed from below, with uncertainty coming in from all these directions, it means to me that it was inevitable that we're going to start thinking about

349
00:52:43.370 --> 00:53:00.590
Michael Carbin: how we apply these sorts of neurosymbolic systems, systems that are themselves starting to use uncertain components, things like large language models, to really tackle the fact that modeling this type of uncertainty in our modern software systems, and where we're deploying our modern software systems, is

350
00:53:00.600 --> 00:53:05.150
Michael Carbin: really a first-order concern. And, as we're seeing, it just can no longer be ignored.

351
00:53:06.720 --> 00:53:26.210
Michael Carbin: And I think personally, there's a wide variety of different interesting research directions to take that we've been discovering, and I think that many people in the community have been working on as well, and I hope people are inspired, perhaps today to go and think about some of these problems as well, but just as a simple preview of some of the things that we've been doing.

352
00:53:26.220 --> 00:53:32.720
Michael Carbin: For example, we've been thinking about: what is a programming methodology now? We think of methodologies like

353
00:53:33.550 --> 00:53:35.689
Michael Carbin: declarative programming, or functional programming,

354
00:53:35.700 --> 00:53:37.389
or object-oriented programming.

355
00:53:37.400 --> 00:53:54.039
Michael Carbin: What are the types of programming methodologies that make sense in this space? And one thing that we've had in the previous few years, inspired by what we've done with DiffTune, is actually surrogate programming, which is starting to outline techniques and tactics for thinking about building these types of surrogates, such as what we saw with DiffTune.

356
00:53:54.800 --> 00:54:02.420
Michael Carbin: So, for example, the surrogate, just as we had talked about: the surrogate is going to be a neural network that's going to model the I/O behavior of some program.

357
00:54:03.340 --> 00:54:21.170
Michael Carbin: And what we've wanted to work towards is: can we start thinking of a theory of learning surrogates, and maybe even associated program analyses, that allow us to relate a program that we're trying to surrogatize, for some definition of the problem, to the model that we are learning?

358
00:54:22.150 --> 00:54:29.970
Michael Carbin: We've made initial progress towards this theory of learning surrogates. So, you know, programs are just functions.

359
00:54:30.180 --> 00:54:40.689
Michael Carbin: There's nothing super special about that. And the literature you actually comb through the machine learning literature. You'll find, for example, sample complexity bounds on the

360
00:54:40.700 --> 00:54:55.459
Michael Carbin: a neural network's ability to learn classes of functions. So, correspondingly, there could be opportunities here to think about which classes of programs neural networks can learn effectively, if we're thinking about using surrogates as a primary programming tool.

361
00:54:55.950 --> 00:55:02.390
Michael Carbin: So something that we've done over the past year or so has been using the intensional representation of the program, since we

362
00:55:02.400 --> 00:55:03.789
can analyze a program

363
00:55:03.800 --> 00:55:06.879
Michael Carbin: and then use that to provide better learning bounds.

364
00:55:06.940 --> 00:55:11.389
Michael Carbin: So, as a concretization here: if I want to learn a surrogate of this type of program,

365
00:55:11.400 --> 00:55:22.399
Michael Carbin: where it has a branch in it, like this large program here: if there's a branch, I can go left or I can go right, left to f and right to g. And what I want to do is again learn a surrogate here.

366
00:55:22.410 --> 00:55:26.389
Michael Carbin: But the question now is: to learn a surrogate, I need data.

367
00:55:26.400 --> 00:55:45.789
Michael Carbin: What data should I generate here? Clearly I need to generate data that's going to exercise both sides of this branch. Should I generate it uniformly? Should I weight it? Is there some input data distribution on which I'm going to deploy it? There are all sorts of strategies here, but this is going to be a fundamental question if you're trying to build a surrogate for a system.

368
00:55:45.820 --> 00:55:47.939
Michael Carbin: So what we found

369
00:55:48.260 --> 00:55:57.419
Michael Carbin: that what we can do is actually do program analysis. We can try to use program analysis to reason about the complexity of the functions

370
00:55:57.770 --> 00:56:17.599
Michael Carbin: below this branch. So, for example, a nice linear function over here versus something much more wiggly, which is going to have higher frequency. Based on our analysis, what you'll find is: when you generate inputs, what you want to do is bias your examples more towards the complex part of the program rather than the simple part.

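The biased data generation just described could be sketched as follows. The 80/20 split and the branch layout are hypothetical choices for illustration; in the actual work the weighting comes from a program analysis of each branch's complexity, which is not reproduced here.

```python
import random

# Complexity-guided input generation for training a surrogate of
#   f(x) = g(x) if x < 0 else h(x)
# Suppose an analysis judged h (the x >= 0 side) far "wigglier" than g, so we
# oversample h's side of the branch instead of sampling uniformly.

def sample_inputs(n, weight_complex=0.8, seed=0):
    rng = random.Random(seed)
    xs = []
    for _ in range(n):
        if rng.random() < weight_complex:
            xs.append(rng.uniform(0, 1))    # complex branch: oversample
        else:
            xs.append(rng.uniform(-1, 0))   # simple (e.g. linear) branch
    return xs

xs = sample_inputs(1000)
frac_complex = sum(x >= 0 for x in xs) / len(xs)  # roughly 0.8
```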
371
00:56:18.700 --> 00:56:23.429
Michael Carbin: This seems obvious in hindsight, but it's something where we can actually now start to build

372
00:56:23.490 --> 00:56:30.599
Michael Carbin: programming languages tools, program analysis tools, that automatically inform the way in which we are going to train these sorts of neural networks.

373
00:56:30.610 --> 00:56:32.399
Michael Carbin: and in preliminary results,

374
00:56:32.590 --> 00:56:48.780
Michael Carbin: where we applied this to a renderer: if you take DiffTune, which has a simulator of physical hardware systems, here we have a physical simulator of the dynamics of light. So we applied this to a renderer, and what we can see is,

375
00:56:48.790 --> 00:56:58.829
Michael Carbin: for our ground-truth image here, if we do the naive uniform sampling, which you might otherwise do, versus our guided sampling, we get a much higher quality result for the same number of samples.

376
00:57:00.850 --> 00:57:12.829
Michael Carbin: The last thing I wanted to talk about is really more open challenges. I myself have had work in the space of reasoning: how can we actually reason about the correctness of software systems built in these ways?

377
00:57:12.850 --> 00:57:16.969
Michael Carbin: But many people in the community have as well.

378
00:57:16.980 --> 00:57:26.810
Michael Carbin: And so I put this here more as, you know, inspiration for a broader discussion, which is: what does it even mean for these systems to be correct now?

379
00:57:26.930 --> 00:57:39.370
Michael Carbin: Certainly we have traditional properties like safety. Does a program satisfy basic invariants? Does it always return a result that is greater than zero? If it is a distance metric, it always needs to do this; it can never

380
00:57:39.380 --> 00:57:54.849
Michael Carbin: do something strange. If I approximate this program in some way, this is a basic property that needs to hold of the final program. Or it could be something even more intentional, where I'm never going to dereference null. These are classic types of invariants that we may want to be true of a piece of software.

381
00:57:54.860 --> 00:58:06.030
Michael Carbin: Well, with this concept of uncertainty, we really do need to rethink these types of properties, where now, perhaps, they're more probabilistic.

382
00:58:06.050 --> 00:58:08.720
Michael Carbin: Does safety hold with some probability?

383
00:58:09.270 --> 00:58:18.049
Michael Carbin: Or are my results within epsilon of some idealized result? Or are my results doing both of these things?

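One way to make such a probabilistic property concrete is a Monte Carlo estimate over an input distribution, which is one possible choice of event space. The `exact` and `approx` functions below are toy stand-ins invented for illustration.

```python
import math
import random

# Monte Carlo estimate of a probabilistic epsilon-correctness property:
#   Pr over inputs x that |approx(x) - exact(x)| <= epsilon.

def exact(x):
    return x * x

def approx(x):
    # A perturbed version whose deviation never exceeds 0.01.
    return x * x + 0.01 * math.sin(100.0 * x)

def prob_within_eps(eps, trials=20000, seed=1):
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.uniform(-1.0, 1.0)
        if abs(approx(x) - exact(x)) <= eps:
            hits += 1
    return hits / trials
```

Here `prob_within_eps(0.02)` is 1.0, since the deviation is bounded by 0.01, while a tighter epsilon like 0.005 only holds with some probability strictly between 0 and 1: the property is now quantified over the input distribution rather than universally.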
384
00:58:18.450 --> 00:58:21.390
Michael Carbin: Now there are also concepts like generalization,

385
00:58:21.450 --> 00:58:34.089
Michael Carbin: It starts to become new. You know, I think we've only informally thought about this, in some sense, with our work in testing, and even in thinking about the difference between our dev environments and production. But

386
00:58:34.100 --> 00:58:51.749
Michael Carbin: fundamentally, what we're thinking about is a generalization problem. If I have one of these systems crafted in this way, it's been fundamentally developed with the data set that we've used during training. Will this system generalize in production to a distribution of unseen inputs?

387
00:58:52.260 --> 00:58:55.189
Michael Carbin: Now, also, questions about robustness.

388
00:58:55.200 --> 00:58:59.599
Michael Carbin: Do systems' behaviors change under slight perturbations of data?

389
00:58:59.880 --> 00:59:18.560
Michael Carbin: If I were to ever so slightly regenerate my data set, am I going to get an entirely different system with entirely different behaviors? Because that's not necessarily going to be the most comfortable idea in the world. And thinking about how robust our system may be to, for example, replication: if we need to build it again, if someone is trying to replicate

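A minimal probe of this perturbation question, with a toy least-squares fit standing in for a real training pipeline (all data and noise levels are invented for illustration): retrain on a slightly perturbed copy of the data and measure how much the learned model moves.

```python
import random

# Refit the same model on a slightly perturbed data set and compare results.

def fit_slope(pts):
    # One-parameter least squares through the origin: slope = sum(xy)/sum(x^2).
    sxy = sum(x * y for x, y in pts)
    sxx = sum(x * x for x, y in pts)
    return sxy / sxx

rng = random.Random(0)
pts = [(x, 2.0 * x + rng.gauss(0, 0.1)) for x in [i / 20 for i in range(1, 21)]]
perturbed = [(x, y + rng.gauss(0, 0.01)) for x, y in pts]

drift = abs(fit_slope(pts) - fit_slope(perturbed))  # small for this model
```

For this well-conditioned toy model the drift is tiny; the uncomfortable scenario in the talk is precisely when a comparable perturbation to a neural network's training set yields a system with very different behavior.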
390
00:59:18.570 --> 00:59:20.250
Michael Carbin: the underlying system.

391
00:59:20.640 --> 00:59:26.310
Michael Carbin: Interpretability, right? Can we actually understand and triage failures of these systems? What can we do?

392
00:59:26.320 --> 00:59:30.879
Michael Carbin: Now, there's been a wide variety of different results going on in the community:

393
00:59:31.100 --> 00:59:44.060
Michael Carbin: all sorts of adversarial verification of neural networks, thinking about reinforcement learning, how you develop monitors and shields for reinforcement learning. I think a lot of this stuff is very interesting, and I think where this

394
00:59:44.070 --> 01:00:01.679
Michael Carbin: community broadly should go is how you start to really connect these to some of the real problems people are trying to solve in their day-to-day in constructing these types of software systems. And my hope is that we'll be able to make strong progress here, and maybe what it requires of us is to rethink

395
01:00:01.690 --> 01:00:19.659
Michael Carbin: how our systems behave a little bit differently. Because, for example, if we go back to our Jacobi iterative method: back in two thousand and eighteen we had this result, where we demonstrated the benefits of the fact that we were only going to incur two point six percent more overhead. But we also sat down and wrote some proofs,

396
01:00:19.670 --> 01:00:26.369
Michael Carbin: right? We actually showed that we can bound the number of additional iterations that you may incur

397
01:00:26.500 --> 01:00:35.389
Michael Carbin: by using this type of strategy for this particular problem. And so essentially you have a delta T that's going to be some function of the epsilon that you're going to have:

398
01:00:35.400 --> 01:00:37.189
Michael Carbin: how big of an error you're going to have.

399
01:00:37.200 --> 01:00:49.790
Michael Carbin: And the reason why this worked is because these systems are self-stabilizing systems. So Jacobi is very interesting: if you encounter one of these errors, Jacobi will still converge to the right answer,

400
01:00:49.800 --> 01:00:59.749
Michael Carbin: regardless of where you start. In the space of guesses, of initializations, it's always going to converge. So that's good. And so, when you have an error,

401
01:00:59.830 --> 01:01:01.770
Michael Carbin: what happens is

402
01:01:01.960 --> 01:01:06.609
Michael Carbin: essentially, you just restart right. It's like you're restarting.

403
01:01:06.860 --> 01:01:18.290
Michael Carbin: Now, fortunately, the way these errors propagate through these systems is not catastrophic. So rather than restarting from a random point, the reason why we're converging much more quickly is because we're restarting from a point

404
01:01:18.340 --> 01:01:27.519
Michael Carbin: that has still preserved some of the progress that we've made in these initial ten thousand iterations. It's not truly a restart,

405
01:01:27.860 --> 01:01:40.810
Michael Carbin: right? So starting to rethink our systems as perhaps more self-stabilizing systems, systems where we can bound the impacts of errors, the impacts of uncertainty, may lead us towards new directions in thinking about software.

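The self-stabilization argument can be seen in a few lines: run Jacobi on a small diagonally dominant system, corrupt the iterate mid-run to simulate a soft error, and keep iterating. The system and the injected error here are illustrative, not the actual experiment from the talk.

```python
# Jacobi iteration with a simulated soft error injected partway through.
# Because Jacobi converges from any starting point, the corrupted iterate
# acts like a (warm) restart rather than a failure.

def jacobi(A, b, x, steps):
    n = len(b)
    for _ in range(steps):
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return x

A = [[4.0, 1.0], [1.0, 3.0]]   # diagonally dominant: Jacobi converges
b = [1.0, 2.0]

x = jacobi(A, b, [0.0, 0.0], 30)   # make some initial progress
x[0] += 10.0                        # inject a corrupted value
x = jacobi(A, b, x, 60)             # keep iterating: the error washes out

residual = max(abs(sum(A[i][j] * x[j] for j in range(2)) - b[i])
               for i in range(2))
```

After the extra iterations the residual is back down to numerical noise: the iteration self-stabilizes, which is exactly why bounding the extra iterations (the delta T above) is possible.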
406
01:01:41.680 --> 01:01:49.339
Michael Carbin: So with that I'm just going to wrap up. My personal opinion: the landscape of software is just fundamentally changing.

407
01:01:49.480 --> 01:02:03.009
Michael Carbin: It's been changing, and I think it's finally here. I don't think you can deny the fact that virtually most programs going forward are going to be written, or programmed, using a large language model, which is generating code that,

408
01:02:03.050 --> 01:02:05.650
Michael Carbin: who knows who understands.

409
01:02:05.690 --> 01:02:08.639
Michael Carbin: So I mean, this is the future.

410
01:02:09.020 --> 01:02:23.699
Michael Carbin: Again, the future coming from below: new hardware substrates. The future coming from above: deploying these systems in very rich, expressive, partially observable environments. And again, computations with these opaque machine learning systems.

411
01:02:24.240 --> 01:02:29.990
Michael Carbin: Now, the consequence of all of this is again that there's an epsilon; there are these epsilons all along the way.

412
01:02:30.000 --> 01:02:47.870
Michael Carbin: We really, as a community, need to start thinking more carefully about epsilon correctness, probabilistic correctness, because these are the tools that we need to think about what these systems do, and it's not a way that at least a piece of me, as a classic computer scientist, has been trained to really think about what our systems are doing.

413
01:02:47.880 --> 01:02:49.600
Michael Carbin: But I think there's an opportunity,

414
01:02:49.650 --> 01:02:57.959
Michael Carbin: If we make progress here, I think we can get much more amazing, much more capable systems, but systems that we'll also still be able to understand.

415
01:02:58.110 --> 01:02:59.360
Michael Carbin: And so with that,

416
01:03:00.800 --> 01:03:02.740
Michael Carbin: Happy to take any questions in the Q&A.

417
01:03:04.610 --> 01:03:10.830
Dilma Da Silva: Thank you so much, Mike. This was, you know, an amazing

418
01:03:11.030 --> 01:03:14.089
Dilma Da Silva: set of things for all of us to think about,

419
01:03:14.100 --> 01:03:32.169
Dilma Da Silva: and really, ah, we have people in the Q&A thanking you for your talk, and things like that. I'll take the first question, one that came early in your presentation.

420
01:03:32.180 --> 01:03:36.990
Michael Carbin: Can you see the Q&A? I can, I can.

421
01:03:37.000 --> 01:03:43.759
Dilma Da Silva: I will be, you know, triaging and asking you the questions.

422
01:03:43.770 --> 01:03:47.790
Michael Carbin: Okay, yeah, go ahead. Let me see, I might be able to get there, but it might take a little time.

423
01:03:47.800 --> 01:03:54.720
Michael Carbin: Yeah, that's okay. That's okay. So

424
01:03:56.170 --> 01:04:26.000
Dilma Da Silva: So the work that you started with, you know, reminding us that uncertainty comes from everywhere, and then you started to talk about those applications, the Jacobi iteration as one example of applications that have some tolerance for faults. We had in the Q&A a comment that there was this direction of work, and I think that this may refer to the work in

425
01:04:26.010 --> 01:04:47.490
Dilma Da Silva: supercomputing. You know, because they worked so much with those models, right, those numerical analysis, iterative models, they have been looking at that. Do you think this is coming from the same perspective? Or how would you relate to when people thought, twenty years ago, about fault tolerance research?

426
01:04:47.500 --> 01:04:59.050
Michael Carbin: Oh, no, absolutely. I do think this is coming from that same vein. You know, the work that we had there in two thousand and eighteen built on some of the observations that were seen in, for example, systems like fault-tolerant

427
01:04:59.060 --> 01:05:15.690
Michael Carbin: GMRES that were going on. These were, again, systems coming from this intersection between the scientific computing, numerical analysis, and mathematics communities, as well as the high performance computing, HPC, community, thinking about how we can build fault-tolerant solvers,

428
01:05:15.700 --> 01:05:29.189
Michael Carbin: and I think in that space there are some very interesting results that came out. I'm not sure how broadly they've been adopted since, but I think my take on these is really: how do we move that to other parts of our stack?

429
01:05:29.200 --> 01:05:31.110
Michael Carbin: Right? How do we think about

430
01:05:31.590 --> 01:05:41.109
Michael Carbin: fault tolerance and robustness for more traditional applications as well, where there's not necessarily quite such a nice mathematical formulation?

431
01:05:41.230 --> 01:05:52.959
Michael Carbin: We always need to start somewhere, but I think those results of those times are very interesting and impactful, and I think that's the open question: can we generalize the observations there to other domains?

432
01:05:53.810 --> 01:05:55.290
Dilma Da Silva: Thank you.

433
01:05:55.300 --> 01:05:59.560
Dilma Da Silva: Now a question on

434
01:05:59.860 --> 01:06:07.809
Dilma Da Silva: neural network correctness. So when you say that a neural net is correct with some probability, what does that even mean?

435
01:06:07.820 --> 01:06:29.310
Dilma Da Silva: It is often the case that if you give the same input, it reproduces the same error, so the correctness probability is zero. And I think that there is a good point here from our colleague on that. But, you know, if you want to elaborate: what do you mean by neural net correctness with some probability?

436
01:06:29.700 --> 01:06:58.040
Michael Carbin: So, yeah. What is your event space? That's really the question, right? Are we talking about a probability quantified over the distribution of inputs that may be passed to the system? That's one way to think about this, because otherwise, yes, you're right: at least a neural network as constructed is deterministic. There are certainly other techniques for randomizing behaviors that you're going to have coming out of a neural network, and so that's another alternative way of instantiating this probability.

437
01:06:58.050 --> 01:07:10.730
Michael Carbin: Personally, I have played games with all sorts of different styles of these types of bounds when we were doing the approximate computing work, of: can we, for example, again, quantify

438
01:07:10.740 --> 01:07:26.440
Michael Carbin: What are we quantifying over? That's always a fundamental question. Is it over outputs? If we look at some of the work with hardware reliability, sometimes the stochasticity is going to be in the hardware substrate itself: it's subject to potential errors, and we have a distribution over those.

439
01:07:26.450 --> 01:07:34.289
Michael Carbin: But we've also looked at techniques of randomizing the algorithms themselves, and so that way we're going to have a probability associated with that.

440
01:07:35.500 --> 01:08:05.309
Dilma Da Silva: Um, we have now a question that relates to, you know, a different domain in terms of uncertainty, possibly another extreme. For example, with the space shuttle control computers: in those systems, the guest here said, they would have five independent computers, configured for this very high level of redundancy, computing the same thing. You know, the kind of thing that you mentioned as one

441
01:08:05.320 --> 01:08:19.719
Dilma Da Silva: way of doing that. And then there was another computer that was completely independent, from a different group, with different software, really meaning that it's not only replication, but there is

442
01:08:19.729 --> 01:08:34.830
Dilma Da Silva: another entity that was trying to do the same thing. So the question would be, you know: when you see your research, how do you see it fitting this domain, for example, flight computers

443
01:08:34.840 --> 01:08:49.169
Dilma Da Silva: in computer-controlled aviation? So if lives are on the line, or if it is really a space where reliability requirements are higher than for average software, how do you see

444
01:08:49.580 --> 01:08:53.539
Dilma Da Silva: this space evolving regarding handling uncertainty?

445
01:08:53.550 --> 01:08:57.990
Dilma Da Silva: I hope I made a reasonable rendition of the question for you.

446
01:08:58.000 --> 01:09:05.229
Michael Carbin: Yeah. So things like redundancy, I'm still of course supportive of these. You know, some of the idea of doing approximation rather than redundancy

447
01:09:05.390 --> 01:09:20.350
Michael Carbin: is illustrative of instances where, perhaps, we can think differently. We don't necessarily always need redundancy; and if lives are on the line, perhaps that's not something you want to do, you don't accept this at all. Now, having said that, there are other challenges in thinking in

448
01:09:20.470 --> 01:09:35.019
Michael Carbin: this type of space. So there's a line of work that my student Eric Atkinson has done, which we call belief programming, and where we actually applied this was with the Mars Polar Lander. And so the Mars Polar Lander is an interesting case study.

449
01:09:35.029 --> 01:09:43.949
Michael Carbin: So JPL actually did a report on this lander, where, unfortunately, the Mars Polar Lander crashed into the surface.

450
01:09:44.029 --> 01:09:46.289
Michael Carbin: Um! And so what happened?

451
01:09:47.290 --> 01:09:51.859
Michael Carbin: In an abstract view, the lander was coming close to the surface.

452
01:09:51.910 --> 01:10:08.190
Michael Carbin: It has a sensor, and essentially a detection system, that's trying to detect whether or not it has landed on the surface. Unfortunately, several hundred feet above the surface, it believed it had actually touched the surface, even though it had not.

453
01:10:08.200 --> 01:10:13.890
Michael Carbin: And so it turned off its engines, and therefore fell into the surface.

454
01:10:13.900 --> 01:10:23.769
Michael Carbin: Now, if you look at that report, what's interesting there is that there was a failure to account for the uncertainty in the sensors. So what happened

455
01:10:23.820 --> 01:10:38.750
Michael Carbin: um, is that the altimeter that they have, detecting the distance between the craft and the surface, could actually be influenced by deploying the landing gear; essentially, that is what produces these false positives.

456
01:10:38.760 --> 01:10:57.590
Michael Carbin: now in designing a system. This is something they knew that was a problem, or, to say a a source of uncertainty and sensor design. They had written essentially an analysis of how the system should operate, what should be done, how to account for the uncertainty the fact that these false positives would happen. But in the software implementation they actually implemented it incorrectly.

457
01:10:57.600 --> 01:10:59.900
Michael Carbin: And so this case wasn't handled appropriately.

458
01:11:00.030 --> 01:11:11.389
Michael Carbin: Now, in this line of work that we have, what we've done is we've come up with a modeling language for someone to write down this sensor uncertainty, essentially saying: okay, this sensor produces this altitude. However,

459
01:11:11.400 --> 01:11:23.989
Michael Carbin: if the if um if we deploy the landing gear, then it's possible that this sensor or this altitude or reading has been corrupted right? And so you can actually write this down essentially as a program.

460
01:11:24.000 --> 01:11:28.590
Michael Carbin: And then what we do is we have a system that can take this uncertainty model

461
01:11:28.600 --> 01:11:42.290
Michael Carbin: and actually dynamically track the uncertainty. It can tell you: is it possible, right, that there's an error with this sensor at this point in time? And if so, you can make a different decision. So what it does is actually automate some of the

462
01:11:42.300 --> 01:12:01.469
Michael Carbin: or work that they were attempting to do there in the software implementation, but had failed to implement correctly. And so, when you think about uncertainty, at least at that safety critical level. I would say there are multiple sources. Yes, there's the reliability issue thinking about hardware reliability. So the natural liability of being in a sensor based environment. And in that environment,

463
01:12:01.480 --> 01:12:13.589
Michael Carbin: in many instances there's a manual process for trying to manage that uncertainty, and we think there's a rise of new languages, things like probabilistic programming. Like I said, belief programming is what we had talked about, which

464
01:12:13.600 --> 01:12:21.190
Michael Carbin: as a similar concept of, instead of tracking probabilities of events, and states that you may be in It's going to track sets,

465
01:12:21.290 --> 01:12:30.749
Michael Carbin: and then having systems that dynamically allow you to interrogate that uncertainty to then make decisions right without you needing to encode that sort of uncertainty yourself.

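The belief-programming idea described here, tracking the set of possible states instead of a single sensor reading, can be sketched roughly as follows. This is a minimal illustration of the concept only; the interval representation and all names are assumptions, not the actual language or API from the belief-programming work.

```python
# Hypothetical sketch of "belief programming": instead of a single altitude
# reading, track the SET (here, an interval) of altitudes consistent with
# the sensor model, and only cut the engines when every possible state
# agrees the craft has touched down. Names are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Belief:
    lo: float  # lowest altitude (feet) consistent with readings so far
    hi: float  # highest altitude consistent with readings so far

def observe(belief: Belief, reading: float, gear_deploying: bool) -> Belief:
    """Fold one altimeter reading into the belief.

    While the landing gear deploys, the reading may be a false positive,
    so it adds no information; otherwise assume a +/- 5 ft error band.
    """
    if gear_deploying:
        return belief  # reading may be corrupted: keep prior uncertainty
    err = 5.0
    return Belief(max(belief.lo, reading - err), min(belief.hi, reading + err))

def safe_to_cut_engines(belief: Belief) -> bool:
    # Act only when *every* state in the belief says we are on the ground.
    return belief.hi <= 0.0

b = Belief(lo=0.0, hi=5000.0)                       # descending below 5000 ft
b = observe(b, reading=0.0, gear_deploying=True)    # spurious touchdown signal
print(safe_to_cut_engines(b))                       # False: corruption handled
b = observe(b, reading=400.0, gear_deploying=False)
print(safe_to_cut_engines(b))                       # still False at ~400 ft
```

The key design point mirrors the Mars Polar Lander story: a reading taken while the gear deploys is simply not trusted to narrow the belief, so a spurious touchdown signal cannot, by itself, shut down the engines.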
466
01:12:32.360 --> 01:12:33.190
Dilma Da Silva: It's

467
01:12:33.200 --> 01:12:44.389
Dilma Da Silva: That's quite interesting, and I think I'm going now to... I'm not sure if I'm going to ask a question or give you a top-of-my-head, you know, core

468
01:12:44.400 --> 01:12:49.209
Dilma Da Silva: reaction to this whole story. So you know. So

469
01:12:49.870 --> 01:13:00.189
Dilma Da Silva: People who have debugged GCC errors that only appear when you have a thousand cores, or something like that,

470
01:13:00.380 --> 01:13:28.590
Dilma Da Silva: have spent, you know, months and months trying to find this fault that, of course, the designers didn't think about; it was, you know, a corner case. So I'm thinking now that for my next project I may decide to use a compiler from your group that is, you know, generated. And I'm wondering, for this one, how we can explore its behavior and try to do the debugging that we did before, because we understood it exactly, line by line; we could try to follow

471
01:13:28.600 --> 01:13:51.889
Dilma Da Silva: the interpretability by doing tracing, as you mentioned in your talk. So that part, problem determination, becomes really hard. When systems became really large and distributed systems exploded, you would look at the top conferences in systems, OSDI and SOSP, and there were so many papers on helping people find problems in those systems.

472
01:13:51.900 --> 01:13:58.520
Michael Carbin: They became it right really hard. Now, when I look that if we do go, you know, Surrogate,

473
01:13:58.530 --> 01:14:28.099
Dilma Da Silva: it's part of the methodology. I'm not even saying that I heard you that it were replaced. But let's say that now the software developer developed me a new file system. Then I will have a piece that is a surrogate to capture some things, and it will have other things that may come from, you know Posix Semantics, or whatever you know, legacy you need, and i'm going to marry. The hard part is that what is this story for problem determination? Who are the people who get the call? And now there is this problem, and it's really high, and we need a swath team to fix and find it

474
01:14:28.110 --> 01:14:49.699
Dilma Da Silva: How are we going to work on it? And I wonder if, when you're thinking about programming with these models, you're thinking about that as you put pieces together: whether there are some breadcrumbs or some traces that you can give to help people find the problems, because that may be really hard. Now, another part of it

475
01:14:49.710 --> 01:14:51.870
Dilma Da Silva: is that. Uh,

476
01:14:51.880 --> 01:15:21.810
Dilma Da Silva: if you look at things like file systems. Most files are small and die soon, right? So a lot of our file systems are inefficient. If you go to the worst possible case scenario to guarantee correctness or file systems. But in your very, you know, approach. I was trying to just to find the distribution. They put data that you said, That is a hard problem how you do those Ah, ah data sets in order to get to some notion of reliability how that all fits together, since we may not

477
01:15:21.820 --> 01:15:41.119
Dilma Da Silva: see those corner cases very often. So, on reliability, I'm repeating your last slide, just saying that I think this reliability story would be a big challenge in my mind. You probably have more answers; you know, you probably don't see as many

478
01:15:41.130 --> 01:15:45.150
Dilma Da Silva: her obstacles as I see right now.

479
01:15:45.160 --> 01:15:48.789
Michael Carbin: Uh, no, I see lots of obstacles, right. I,

480
01:15:48.800 --> 01:15:56.459
Michael Carbin: as you pointed out on the dip-tune side you know this. This was part of our inspiration. How can we make this interpret? How can we still

481
01:15:56.470 --> 01:16:13.930
Michael Carbin: use the power of data, but give a description that is essentially fully interpretable? So the output of DiffTune is just a tuned version of llvm-mca, where you can still interrogate, essentially, all the decisions that are being made. So the final output is not something where you need to understand what our system is doing internally.

482
01:16:14.030 --> 01:16:17.359
Michael Carbin: right now more broadly than that

483
01:16:17.380 --> 01:16:18.910
Michael Carbin: it is.

484
01:16:19.120 --> 01:16:24.419
Michael Carbin: It's going to be a fundamentally interesting future. I have to say I I

485
01:16:25.560 --> 01:16:31.480
Michael Carbin: I think the forces of software developing are moving in a direction where

486
01:16:32.530 --> 01:16:35.999
Michael Carbin: to access new capabilities. These types of

487
01:16:36.050 --> 01:16:40.690
Michael Carbin: perhaps naturally uninterpretable components are going to be integrated. Right?

488
01:16:40.700 --> 01:16:41.780
Michael Carbin: So

489
01:16:42.100 --> 01:16:59.990
Michael Carbin: I think the challenges you're outlining are ones that we have to face and be prepared to face right? And think of ways of enabling people to lose this type of technology in ways that overall give a good understanding of the system, right? So it's. I

490
01:17:00.000 --> 01:17:05.690
Michael Carbin: I think these are all I don't know if i'm hopeful. I think it's unavoidable

491
01:17:05.700 --> 01:17:13.709
Michael Carbin: Right? I think the power of these tools put in people's hands are such that they can do things that they couldn't do before, and so they will do that.

492
01:17:14.700 --> 01:17:25.809
Michael Carbin: And so that that's where we're going to end up. And so what comment are we going to make? We can tell people not to do it. But I I don't think that's really going to help us

493
01:17:25.820 --> 01:17:51.190
Dilma Da Silva: In your talk, ah, there's a question here: there was an example of a program that was mixing code and English, you know, saying that this is becoming a common way of developing software. I think you showed an example, but it went by too fast to read. So would you tell us more about an example of the programs people are writing now, where they really have natural language as a piece of it?

494
01:17:52.010 --> 01:17:57.399
Michael Carbin: So there was an example that I was seeing where someone was trying to write...

495
01:17:57.970 --> 01:18:03.259
Michael Carbin: Essentially, After that there are a few things. So

496
01:18:03.450 --> 01:18:13.389
Michael Carbin: one example is actually a student mine who was working on a a search essentially a a semantic search, a semantic, abstract search of of art.

497
01:18:13.400 --> 01:18:32.449
Michael Carbin: So he's got this program. That's largely okay. There are a couple of keywords that I want to put together, and here is suddenly this pl: There's a prompt out to this large language model that says that I want you to, you know, search for a whole bunch of papers that, let's say, have cinema of all these words that I have here, and then bring those back to

498
01:18:32.460 --> 01:18:35.890
Michael Carbin: so cinema's of this word,

499
01:18:35.900 --> 01:18:45.779
Michael Carbin: where you know word is something that's been written. This is something that's not coming as input, but synonyms of is an algorithm that is now being interpreted by a large language model.

500
01:18:46.380 --> 01:18:54.690
Michael Carbin: I. There's we we just short of encoding all possible synonyms and

501
01:18:54.700 --> 01:19:10.660
Michael Carbin: right of the words that we have and expected's inputs. This is not something that's going to be programmed manually, but can be written in English, given an intuitive understanding, and can be interpreted by a large language model. To then go and retrieve results that are related to that.

502
01:19:10.670 --> 01:19:16.669
Michael Carbin: So these are the types of computations, I think, are

503
01:19:16.830 --> 01:19:36.489
Michael Carbin: in the short term being integrated where we can give natural language specifications to what they do. But the programming implementation is I mean it's. I guess you can go and try to develop a massive thesaurus and do this. Or perhaps there's an available source that's out there. But this can just be done automatically.

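The code-plus-English pattern Carbin describes, where ordinary code delegates one step to an English instruction, can be sketched as below. The `ask_llm` callable is a stand-in assumption for whatever large-language-model call you use; its name and signature are not a real API.

```python
# A minimal sketch of mixing code and natural language: the program logic
# is ordinary Python, but the "synonyms of" step is expressed in English
# and delegated to a model. `ask_llm` is a hypothetical stand-in.
from typing import Callable, List

def expand_keywords(keywords: List[str],
                    ask_llm: Callable[[str], str]) -> List[str]:
    """Return synonyms of the given keywords, one per reply line."""
    prompt = ("List synonyms of each of these words, one per line, "
              "with no other text: " + ", ".join(keywords))
    reply = ask_llm(prompt)  # the English sentence *is* the algorithm
    return [line.strip() for line in reply.splitlines() if line.strip()]

# Usage with a canned stand-in for the model:
fake_llm = lambda prompt: "robust\nfault-tolerant"
print(expand_keywords(["resilient"], fake_llm))  # ['robust', 'fault-tolerant']
```

Note how the surrounding code stays fully interpretable: only the synonym expansion, which would otherwise require encoding a massive thesaurus by hand, is handed to the model.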
504
01:19:36.500 --> 01:19:37.630
Michael Carbin: It's

505
01:19:38.300 --> 01:19:41.119
Dilma Da Silva: in that. That's okay. That's reasonable.

506
01:19:41.130 --> 01:19:59.479
Dilma Da Silva: Okay? I think we Ah, we don't have Ah, more questions. Ah, for the Nsf. People! We'll be talking to Mike again at three Pm. Eastern. Ah, for a little bit, If we have, you know people to discuss more. Ah, but thank you so much. Thank you for people who attended.

