WEBVTT

1
00:00:07.270 --> 00:00:08.436
Dilma Da Silva: Hi! Everyone

2
00:00:11.240 --> 00:00:16.589
Dilma Da Silva: Hi, everyone! I'm so pleased to have you here attending our CISE Distinguished Lecture.

3
00:00:17.091 --> 00:00:27.890
Dilma Da Silva: So I am Dilma da Silva. I'm the division director for Computing and Communication Foundations, and I'm, you know, the host here today.

4
00:00:28.753 --> 00:00:39.019
Dilma Da Silva: I'll give you a very brief introduction to our speaker. We think it's much better that, you know, he will tell more as he sees fit.

5
00:00:39.210 --> 00:00:51.630
Dilma Da Silva: So Sumit Gulwani is a Distinguished Scientist and Vice President at Microsoft, where he spearheads scientific advancements in AI technology and their integration into mass-market products.

6
00:00:51.770 --> 00:00:55.812
Dilma Da Silva: So Sumit, this really sounds like a very interesting job. So

7
00:00:56.490 --> 00:01:03.900
Dilma Da Silva: maybe people can even ask more about it later on, if we have time. So feel free to start. Thank you so much for

8
00:01:03.970 --> 00:01:05.239
Dilma Da Silva: being here.

9
00:01:06.130 --> 00:01:09.170
Sumit Gulwani: Yes, thank you, Dilma, for this opportunity.

10
00:01:09.180 --> 00:01:14.060
Sumit Gulwani: and good morning. Good afternoon. Good evening to everyone who is joining in.

11
00:01:16.160 --> 00:01:18.960
Sumit Gulwani: So back in 1995,

12
00:01:18.980 --> 00:01:21.639
Sumit Gulwani: when I was picking colleges.

13
00:01:22.280 --> 00:01:23.890
Sumit Gulwani: I had two options:

14
00:01:24.760 --> 00:01:28.400
Sumit Gulwani: pick electrical engineering at IIT Kanpur,

15
00:01:28.620 --> 00:01:30.670
Sumit Gulwani: which was closer to my hometown

16
00:01:31.600 --> 00:01:35.540
Sumit Gulwani: or study computer science at a different IIT.

17
00:01:36.350 --> 00:01:37.910
Sumit Gulwani: Now, I loved programming,

18
00:01:38.470 --> 00:01:42.970
Sumit Gulwani: but the desire to stay closer to my parents was stronger,

19
00:01:43.910 --> 00:01:47.439
Sumit Gulwani: and I thought I could teach myself programming.

20
00:01:47.830 --> 00:01:51.279
Sumit Gulwani: So I ended up picking electrical engineering at IIT Kanpur.

21
00:01:52.780 --> 00:01:56.870
Sumit Gulwani: So in the first year, in one of our intro programming courses,

22
00:01:57.820 --> 00:02:01.359
Sumit Gulwani: the instructor told us that we can multiply

23
00:02:01.580 --> 00:02:03.569
Sumit Gulwani: two 2-by-2 matrices

24
00:02:03.670 --> 00:02:06.570
Sumit Gulwani: using 7 multiplications instead of 8.
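
NOTE
[Editor's illustration] The 7-multiplication trick the instructor was alluding to is Strassen's construction for 2-by-2 matrices; a minimal sketch (the transcript doesn't name the algorithm, but this is the standard one):

```python
def strassen_2x2(A, B):
    """Multiply two 2x2 matrices using 7 scalar multiplications (Strassen),
    instead of the 8 that the naive formula uses."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]
```

Applied recursively to block matrices, the same trick gives the sub-cubic matrix-multiplication bound.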

25
00:02:07.640 --> 00:02:09.529
Sumit Gulwani: Now that sounded very intriguing.

26
00:02:10.289 --> 00:02:12.200
Sumit Gulwani: and I wondered, how can you do that?

27
00:02:13.030 --> 00:02:15.639
Sumit Gulwani: Now, at that time there was no Google,

28
00:02:15.790 --> 00:02:18.159
Sumit Gulwani: and much less ChatGPT.

29
00:02:19.170 --> 00:02:24.030
Sumit Gulwani: So I went back to the instructor and asked them, How is this possible?

30
00:02:25.450 --> 00:02:27.610
Sumit Gulwani: And you know what the instructor responded?

31
00:02:29.090 --> 00:02:34.090
Sumit Gulwani: So they said that this premium material is meant only for CS students.

32
00:02:36.070 --> 00:02:38.130
Sumit Gulwani: and that upset me enough

33
00:02:39.150 --> 00:02:42.419
Sumit Gulwani: to rewrite the Joint Entrance Examination

34
00:02:43.100 --> 00:02:46.820
Sumit Gulwani: and get myself admitted to study computer science at IIT Kanpur.

35
00:02:49.850 --> 00:02:51.690
Sumit Gulwani: and after that

36
00:02:51.800 --> 00:02:53.339
Sumit Gulwani: I followed it up

37
00:02:53.440 --> 00:02:57.630
Sumit Gulwani: with a PhD in computer science at UC Berkeley.

38
00:02:59.590 --> 00:03:05.069
Sumit Gulwani: And then, after I got trained as a scientist, I started my job at Microsoft Research.

39
00:03:06.130 --> 00:03:09.590
Sumit Gulwani: One of the first problems that I decided to tackle

40
00:03:10.270 --> 00:03:13.389
Sumit Gulwani: was how you can automatically synthesize

41
00:03:13.640 --> 00:03:15.750
Sumit Gulwani: such tricky programs

42
00:03:16.660 --> 00:03:21.660
Sumit Gulwani: say, by searching over the state space of all possible programs of a given size.

43
00:03:24.160 --> 00:03:27.880
Sumit Gulwani: And it took me 5 years to get my revenge

44
00:03:28.410 --> 00:03:30.069
Sumit Gulwani: when we had a paper

45
00:03:30.480 --> 00:03:32.100
Sumit Gulwani: that could do just that.

46
00:03:33.260 --> 00:03:36.190
Sumit Gulwani: And, in fact, one of the enabling technologies here was

47
00:03:36.830 --> 00:03:46.839
Sumit Gulwani: the ability to verify the correctness of such programs in the first place, and that's why the paper is titled "From Program Verification to Program Synthesis."

48
00:03:49.850 --> 00:03:53.020
Sumit Gulwani: So this paper that appeared at POPL 2010

49
00:03:53.650 --> 00:03:56.660
Sumit Gulwani: also received the Most Influential Paper Award

50
00:03:57.330 --> 00:03:58.859
Sumit Gulwani: in 2020,

51
00:04:01.660 --> 00:04:03.030
Sumit Gulwani: and, in fact,

52
00:04:03.540 --> 00:04:06.719
Sumit Gulwani: life has come full circle for me,

53
00:04:08.020 --> 00:04:09.680
Sumit Gulwani: when my alma mater

54
00:04:10.700 --> 00:04:12.569
Sumit Gulwani: told me 4 weeks ago

55
00:04:12.630 --> 00:04:15.669
Sumit Gulwani: that I'm being honored with the Distinguished Alumnus Award

56
00:04:15.870 --> 00:04:19.960
Sumit Gulwani: for all of my program synthesis work that followed since then.

57
00:04:20.839 --> 00:04:23.939
Sumit Gulwani: And this is the journey that I will share in this talk.

58
00:04:25.980 --> 00:04:30.380
Sumit Gulwani: Now, I'm so glad that ChatGPT didn't exist at that time,

59
00:04:30.510 --> 00:04:34.820
Sumit Gulwani: because today, if you were to just specify this in natural language.

60
00:04:35.020 --> 00:04:37.530
Sumit Gulwani: It will actually give you this program

61
00:04:37.940 --> 00:04:42.770
Sumit Gulwani: of how to multiply 2-by-2 matrices using 7 multiplications instead of 8.

62
00:04:43.650 --> 00:04:46.119
Sumit Gulwani: And not only this, you know, ChatGPT can

63
00:04:46.220 --> 00:04:48.700
Sumit Gulwani: give you many other programs and algorithms.

64
00:04:49.710 --> 00:04:51.599
Sumit Gulwani: In fact, it can even give you

65
00:04:51.650 --> 00:04:55.659
Sumit Gulwani: a way to multiply two 2-by-2 matrices using 6 multiplications,

66
00:04:56.000 --> 00:04:58.069
Sumit Gulwani: which is mathematically impossible.

67
00:05:00.320 --> 00:05:02.430
Sumit Gulwani: Now this is the challenge

68
00:05:02.560 --> 00:05:04.679
Sumit Gulwani: with such technologies

69
00:05:04.740 --> 00:05:06.420
Sumit Gulwani: that they can hallucinate

70
00:05:06.940 --> 00:05:10.429
Sumit Gulwani: even when they don't know how to do a certain task.

71
00:05:11.260 --> 00:05:12.890
Sumit Gulwani: And maybe this is time

72
00:05:13.090 --> 00:05:17.209
Sumit Gulwani: for us to go back to program verification and program synthesis,

73
00:05:17.400 --> 00:05:20.419
Sumit Gulwani: so that we can validate the results of

74
00:05:20.650 --> 00:05:25.349
Sumit Gulwani: these large language models. And in this talk I'm going to touch a little bit on this.

75
00:05:25.750 --> 00:05:28.970
Sumit Gulwani: But if we can do that, we not only get reliability.

76
00:05:29.290 --> 00:05:35.089
Sumit Gulwani: maybe this is the way for us to also discover new algorithms that we do not know yet.

77
00:05:38.610 --> 00:05:42.540
Sumit Gulwani: So once I was actually returning from a seminar

78
00:05:42.940 --> 00:05:45.920
Sumit Gulwani: on program synthesis. This was in 2009,

79
00:05:46.430 --> 00:05:52.500
Sumit Gulwani: After presenting this algorithm that I described to you from program verification to program synthesis.

80
00:05:54.090 --> 00:05:56.439
Sumit Gulwani: And there was a lady sitting next to me in the plane.

81
00:05:57.050 --> 00:06:04.819
Sumit Gulwani: She was very impressed to know that I have a PhD in computer science and that I work for Microsoft Research, and that I work on automated programming.

82
00:06:06.830 --> 00:06:10.979
Sumit Gulwani: So she thought that maybe I could help her with something that she was struggling with.

83
00:06:11.810 --> 00:06:13.650
Sumit Gulwani: So she opens up her laptop,

84
00:06:13.880 --> 00:06:17.650
Sumit Gulwani: fires up Excel, and shows me a column of names,

85
00:06:18.070 --> 00:06:21.679
Sumit Gulwani: and asks me, how can she reformat these names

86
00:06:22.100 --> 00:06:23.749
Sumit Gulwani: by giving me an example?

87
00:06:24.650 --> 00:06:26.150
Sumit Gulwani: Now, at that time

88
00:06:26.160 --> 00:06:29.420
Sumit Gulwani: I had no idea about the programming model underneath Excel,

89
00:06:29.790 --> 00:06:32.430
Sumit Gulwani: so I had to excuse myself out of the situation.

90
00:06:33.230 --> 00:06:35.239
Sumit Gulwani: But after I got back home

91
00:06:35.450 --> 00:06:38.130
Sumit Gulwani: I tried to look for a solution to this problem,

92
00:06:39.180 --> 00:06:41.189
Sumit Gulwani: and it is then that I observed

93
00:06:41.210 --> 00:06:46.789
Sumit Gulwani: that many, many people struggle with simple, repetitive tasks like these.

94
00:06:47.760 --> 00:06:52.339
Sumit Gulwani: and this inspired me to develop the Flash Fill feature in Excel,

95
00:06:52.930 --> 00:06:57.200
Sumit Gulwani: where, if you give one example of the transformation that you want to do

96
00:06:57.770 --> 00:06:59.640
Sumit Gulwani: and then press Ctrl+E,

97
00:07:00.120 --> 00:07:03.740
Sumit Gulwani: The system will generalize your example into a program

98
00:07:04.170 --> 00:07:06.939
Sumit Gulwani: and run that program to automate the task for you

99
00:07:08.990 --> 00:07:11.379
Sumit Gulwani: And the enabling breakthrough here was

100
00:07:11.520 --> 00:07:17.219
Sumit Gulwani: the ability to automatically synthesize such programs very efficiently.

101
00:07:17.980 --> 00:07:20.190
Sumit Gulwani: And just from one or more examples.

102
00:07:24.680 --> 00:07:28.830
Sumit Gulwani: Now let me tell you a little bit about how this algorithm works.

103
00:07:30.800 --> 00:07:36.219
Sumit Gulwani: So Flash Fill searches for programs in an underlying domain-specific language,

104
00:07:36.490 --> 00:07:40.740
Sumit Gulwani: and in this case the language consists of a typical string algebra.

105
00:07:40.880 --> 00:07:43.979
Sumit Gulwani: that is, operators like substring or concatenate.

106
00:07:46.150 --> 00:07:50.279
Sumit Gulwani: Now, one way to search for programs in a domain-specific language

107
00:07:50.390 --> 00:07:58.110
Sumit Gulwani: might be to enumerate all programs in order of increasing size and see which one fits the examples that the user has provided.

108
00:07:59.150 --> 00:08:04.640
Sumit Gulwani: But this algorithm is not going to scale, because even this DSL has too many programs to enumerate.
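
NOTE
[Editor's illustration] The enumerative baseline just described can be sketched over a toy string DSL. The operator set ("emit a constant" / "emit a positional substring") and the size bound here are illustrative assumptions, not Flash Fill's actual language:

```python
from itertools import product

def run(prog, inp):
    """Execute a toy DSL program: a tuple of 'emit' instructions, each
    either a constant string or a positional substring of the input."""
    out = []
    for op in prog:
        if op[0] == "const":
            out.append(op[1])
        else:                              # ("substr", i, j)
            out.append(inp[op[1]:op[2]])
    return "".join(out)

def enumerate_programs(examples, max_ops=2):
    """Brute-force baseline: try programs in order of increasing size and
    return the first one consistent with all input/output examples."""
    inp0 = examples[0][0]
    atoms = [("const", c) for c in ", ."] + \
            [("substr", i, j)
             for i in range(len(inp0))
             for j in range(i + 1, len(inp0) + 1)]
    for size in range(1, max_ops + 1):
        for prog in product(atoms, repeat=size):
            if all(run(prog, i) == o for i, o in examples):
                return prog
    return None

# Even this tiny DSL has O(|atoms|**size) candidates, which is why plain
# enumeration does not scale to realistic languages.
```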

109
00:08:05.810 --> 00:08:08.630
Sumit Gulwani: So the real heart of the Flash Fill algorithm

110
00:08:09.270 --> 00:08:12.250
Sumit Gulwani: is a technique to do symbolic back-propagation

111
00:08:12.870 --> 00:08:14.540
Sumit Gulwani: of examples

112
00:08:15.090 --> 00:08:16.979
Sumit Gulwani: over the structure of the grammar.

113
00:08:19.440 --> 00:08:25.219
Sumit Gulwani: And this process also sometimes leads to multiple goals at each recursive step.
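NOTE
[Editor's illustration] As a flavor of that back-propagation (a deliberately simplified sketch, not the actual Flash Fill implementation): pushing an output example backwards through a Concat or Substring node yields one subgoal per way the output could have been produced, which is exactly where the multiple goals per recursive step come from:

```python
def backprop_concat(output):
    """Push an example for Concat(e1, e2) back to subgoals for e1 and e2.

    Each split point of the output string yields one (goal_for_e1,
    goal_for_e2) pair."""
    return [(output[:k], output[k:]) for k in range(1, len(output))]

def backprop_substring(inp, output):
    """Goals for Substring(inp, i, j): every (i, j) that produces `output`."""
    return [(i, i + len(output))
            for i in range(len(inp) - len(output) + 1)
            if inp[i:i + len(output)] == output]
```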

114
00:08:26.070 --> 00:08:28.859
Sumit Gulwani: So what we can do is to explore those goals

115
00:08:28.970 --> 00:08:31.379
Sumit Gulwani: in the order of their likelihood to succeed.

116
00:08:31.630 --> 00:08:36.530
Sumit Gulwani: And this can be done using machine learning techniques, and it sometimes even gives us a 10x speedup.

117
00:08:38.110 --> 00:08:47.799
Sumit Gulwani: So at the end of this process we end up with many, many programs that satisfy the examples the user has provided, and we now need to pick one of those programs.

118
00:08:48.480 --> 00:08:50.769
Sumit Gulwani: And this is done using ranking techniques.

119
00:08:51.070 --> 00:08:53.470
Sumit Gulwani: So we can rank programs based upon

120
00:08:53.600 --> 00:08:56.039
Sumit Gulwani: how simple or small the program is.

121
00:08:56.360 --> 00:08:58.589
Sumit Gulwani: And another thing that we figured out was

122
00:08:58.850 --> 00:09:01.180
Sumit Gulwani: also to leverage output features:

123
00:09:01.370 --> 00:09:06.200
Sumit Gulwani: features of the output obtained from executing the program on other inputs.

124
00:09:07.150 --> 00:09:12.700
Sumit Gulwani: So if one program ends up generating outputs that look more or less uniform, then that program should be preferred.
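NOTE
[Editor's illustration] The ranking idea just described can be sketched as below. The scoring here (smaller program first, ties broken by whether outputs on unseen inputs have uniform length) is a hand-coded illustration; the shipped ranker is learned:

```python
def rank_programs(programs, other_inputs, size, run):
    """Order candidate programs that already satisfy the user's examples:
    prefer smaller programs, then programs whose outputs on the remaining
    (unlabelled) inputs look more uniform."""
    def uniformity(prog):
        outs = [run(prog, x) for x in other_inputs]
        lengths = {len(o) for o in outs}
        return 1.0 / len(lengths)      # all-same-length outputs score 1.0
    return sorted(programs, key=lambda p: (size(p), -uniformity(p)))
```

For instance, with two candidates that both fit the examples, the one producing same-length outputs everywhere ranks first.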

125
00:09:14.430 --> 00:09:15.750
Sumit Gulwani: Now this technique

126
00:09:15.940 --> 00:09:18.019
Sumit Gulwani: has also stood the test of time.

127
00:09:18.950 --> 00:09:23.079
Sumit Gulwani: and this POPL 2011 paper also received the Most Influential Paper Award

128
00:09:23.230 --> 00:09:25.390
Sumit Gulwani: at the POPL 2021 instance.

129
00:09:27.310 --> 00:09:31.139
Sumit Gulwani: And the idea of leveraging a domain-specific language

130
00:09:32.010 --> 00:09:33.819
Sumit Gulwani: is now also finding

131
00:09:34.130 --> 00:09:36.900
Sumit Gulwani: its use with LLMs as well.

132
00:09:37.290 --> 00:09:40.519
Sumit Gulwani: So I used DSLs for efficiency.

133
00:09:41.350 --> 00:09:44.380
Sumit Gulwani: But with LLMs, DSLs are quite important

134
00:09:44.430 --> 00:09:50.169
Sumit Gulwani: to ensure that the output from these models adheres to some syntactic or semantic constraints.

135
00:09:50.860 --> 00:09:54.390
Sumit Gulwani: In fact, in August 2024, OpenAI released an API

136
00:09:54.650 --> 00:09:56.480
Sumit Gulwani: to allow you to actually do that.

137
00:09:57.680 --> 00:09:59.830
Sumit Gulwani: And in fact, this entire paradigm

138
00:10:00.310 --> 00:10:02.839
Sumit Gulwani: of us generating lots of programs

139
00:10:03.540 --> 00:10:06.359
Sumit Gulwani: and then ranking them, using some extra signals.

140
00:10:06.470 --> 00:10:09.180
Sumit Gulwani: is also very common with LLMs today.

141
00:10:13.110 --> 00:10:15.360
Sumit Gulwani: Now, Flash Fill became very popular,

142
00:10:15.790 --> 00:10:17.759
Sumit Gulwani: and my sister,

143
00:10:18.230 --> 00:10:21.219
Sumit Gulwani: who parents a middle schooler, figured out

144
00:10:21.390 --> 00:10:24.709
Sumit Gulwani: that Flash Fill made it into middle-school computing textbooks.

145
00:10:25.950 --> 00:10:28.920
Sumit Gulwani: But you know what my most heartening accolade was?

146
00:10:30.230 --> 00:10:31.849
Sumit Gulwani: It came from my father.

147
00:10:32.910 --> 00:10:37.160
Sumit Gulwani: who told me, "Son, now I understand your research for the first time."

148
00:10:39.080 --> 00:10:41.150
Sumit Gulwani: So sadly, my father passed away

149
00:10:41.160 --> 00:10:42.649
Sumit Gulwani: a couple of years ago,

150
00:10:42.980 --> 00:10:44.890
Sumit Gulwani: but he had a big influence

151
00:10:45.280 --> 00:10:47.050
Sumit Gulwani: on my scientific journey.

152
00:10:47.990 --> 00:10:52.259
Sumit Gulwani: He always used to nag me: how does your research help common people?

153
00:10:53.340 --> 00:10:57.620
Sumit Gulwani: And that was perhaps my inspiration to get started on this line of work.

154
00:10:59.290 --> 00:11:03.490
Sumit Gulwani: Now, Flash Fill can automate a wide variety of tasks; you know, that's why it's so popular.

155
00:11:03.940 --> 00:11:10.210
Sumit Gulwani: But there are also many tasks that it cannot do such as date transformations or number transformations.

156
00:11:11.390 --> 00:11:18.359
Sumit Gulwani: But the user experience is so inviting that it doesn't prevent people from trying it on tasks that it was never meant for.

157
00:11:18.980 --> 00:11:21.740
Sumit Gulwani: So there was this tweet that went viral

158
00:11:22.490 --> 00:11:26.529
Sumit Gulwani: where someone gave an example mapping Dec to December,

159
00:11:26.910 --> 00:11:30.350
Sumit Gulwani: and then the system maps Oct to Octember.

160
00:11:31.850 --> 00:11:34.299
Sumit Gulwani: Now some people even came to my rescue

161
00:11:34.350 --> 00:11:37.779
Sumit Gulwani: by saying, maybe this is how we should have named months in the first place.

162
00:11:40.440 --> 00:11:42.700
Sumit Gulwani: I even came across this shipping label

163
00:11:43.230 --> 00:11:48.260
Sumit Gulwani: where someone likely incorporated Flash Fill as part of their process automation.

164
00:11:50.430 --> 00:11:58.220
Sumit Gulwani: But you know, the beautiful thing about this kind of feedback is that it helps inspire the next scientific problems that you might want to go solve.

165
00:11:58.620 --> 00:12:00.370
Sumit Gulwani: And this is exactly what we did.

166
00:12:01.270 --> 00:12:04.040
Sumit Gulwani: So the next version of Flash Fill

167
00:12:04.570 --> 00:12:09.080
Sumit Gulwani: that we have built actually supports such date and number transformations.

168
00:12:10.110 --> 00:12:15.220
Sumit Gulwani: For instance, look at the task here, where in column A you have a string,

169
00:12:16.880 --> 00:12:26.309
Sumit Gulwani: and the user wants to extract the year and the month out of it, and then convert it into the year quarter format, as you see on the right side in column B,

170
00:12:26.940 --> 00:12:31.830
Sumit Gulwani: and this new system can infer the right transformation from a couple of examples.

171
00:12:32.980 --> 00:12:34.490
Sumit Gulwani: and not only that.

172
00:12:34.740 --> 00:12:41.740
Sumit Gulwani: it will also give you a readable formula in various languages, such as Python, or in this case the Excel formula language.

173
00:12:42.350 --> 00:12:46.660
Sumit Gulwani: In fact, this was one of the most common asks from Flash Fill users,

174
00:12:46.870 --> 00:12:51.179
Sumit Gulwani: because they would like to get some transparency into the underlying process.

175
00:12:52.470 --> 00:12:57.610
Sumit Gulwani: So this functionality ships as the Formula by Example feature in Excel Copilot.

176
00:13:00.120 --> 00:13:03.219
Sumit Gulwani: One of the videos from an Excel influencer that I saw

177
00:13:03.610 --> 00:13:09.030
Sumit Gulwani: talked about how they find that this is 10x better than Flash Fill.

178
00:13:09.530 --> 00:13:13.879
Sumit Gulwani: And we often talk about 10x experiences, so this was very heartening to see.

179
00:13:15.830 --> 00:13:17.920
Sumit Gulwani: My favorite tagline

180
00:13:17.950 --> 00:13:20.290
Sumit Gulwani: in a video that was made for this feature

181
00:13:21.200 --> 00:13:23.220
Sumit Gulwani: is the one on the bottom right here,

182
00:13:23.930 --> 00:13:27.879
Sumit Gulwani: where someone very explicitly says that this is not ChatGPT.

183
00:13:29.080 --> 00:13:30.979
Sumit Gulwani: So how does this technology work?

184
00:13:31.400 --> 00:13:35.120
Sumit Gulwani: So what we did was we extended the Flash Fill language

185
00:13:35.790 --> 00:13:38.810
Sumit Gulwani: to include date and number transformations

186
00:13:38.910 --> 00:13:44.969
Sumit Gulwani: and many stylized ways of doing string transformations so that we can generate good readable programs.

187
00:13:45.880 --> 00:13:48.579
Sumit Gulwani: But the big challenge when you extend the DSL

188
00:13:49.280 --> 00:13:51.760
Sumit Gulwani: is that it leads to scalability issues.

189
00:13:52.060 --> 00:13:55.279
Sumit Gulwani: So we had to come up with some very interesting optimizations

190
00:13:55.300 --> 00:13:56.909
Sumit Gulwani: that combine the

191
00:13:57.150 --> 00:13:59.920
Sumit Gulwani: top-down search process that Flash Fill uses

192
00:14:00.140 --> 00:14:02.629
Sumit Gulwani: with some hints from bottom-up search as well.

193
00:14:03.820 --> 00:14:08.070
Sumit Gulwani: So if you are interested in reading a recent AI paper that is not ChatGPT,

194
00:14:08.110 --> 00:14:11.469
Sumit Gulwani: then this one, at POPL 2023, would be it.

195
00:14:13.030 --> 00:14:28.279
Sumit Gulwani: Now, one question that you might ask is: why not use ChatGPT for these kinds of tasks? And actually, it's a great suggestion, because ChatGPT has a lot of worldly knowledge, and if you tell it, you know, that Dec goes to December, it will actually tell you that Oct goes to October.

196
00:14:29.830 --> 00:14:35.869
Sumit Gulwani: and, in fact, this was the title of my entire keynote that I gave at CAV last summer,

197
00:14:36.570 --> 00:14:39.090
Sumit Gulwani: but I will just summarize it in, you know, one slide

198
00:14:40.830 --> 00:14:50.319
Sumit Gulwani: So GPT-3 can actually do syntactic transformations of the kind that you see here, where you want to extract the first 3 characters from the month name.

199
00:14:51.630 --> 00:14:55.869
Sumit Gulwani: However, its bias about real-world knowledge is so strong

200
00:14:56.340 --> 00:14:59.479
Sumit Gulwani: that if you ask it to extract the first 4 characters,

201
00:14:59.680 --> 00:15:04.430
Sumit Gulwani: it cannot learn to do that even from 4 examples; it will not get it right.

202
00:15:06.460 --> 00:15:09.499
Sumit Gulwani: But if you were to give this task to GPT-4,

203
00:15:09.670 --> 00:15:11.300
Sumit Gulwani: it will get this right.

204
00:15:12.230 --> 00:15:15.179
Sumit Gulwani: But if you ask it to extract the first 5 characters,

205
00:15:15.280 --> 00:15:16.940
Sumit Gulwani: then it fails to do that one.

206
00:15:18.460 --> 00:15:21.599
Sumit Gulwani: And maybe GPT-5 will solve this problem.

207
00:15:23.940 --> 00:15:28.229
Sumit Gulwani: Now, over the last year, you know, after giving the talk,

208
00:15:28.770 --> 00:15:31.410
Sumit Gulwani: we have been investigating more closely

209
00:15:31.710 --> 00:15:37.219
Sumit Gulwani: why it is that LLMs are not so great at programming-by-example problems, and what we can do

210
00:15:37.470 --> 00:15:41.300
Sumit Gulwani: to have them solve these problems more effectively.

211
00:15:41.600 --> 00:15:43.180
Sumit Gulwani: So this is what we figured out.

212
00:15:43.680 --> 00:15:49.559
Sumit Gulwani: Programming by example actually requires you to predict the execution semantics of a given program.

213
00:15:50.760 --> 00:15:57.100
Sumit Gulwani: And we know that LLMs are not very good at doing arithmetic operations, for instance.

214
00:15:58.060 --> 00:15:58.950
Sumit Gulwani: but

215
00:15:59.060 --> 00:16:03.120
Sumit Gulwani: the way to do arithmetic is to have them generate number sentences.

216
00:16:04.340 --> 00:16:09.340
Sumit Gulwani: and the same kind of phenomenon also extends to more complicated operations that we have in programming languages.

217
00:16:10.040 --> 00:16:11.579
Sumit Gulwani: and the trick here is

218
00:16:11.740 --> 00:16:17.190
Sumit Gulwani: to actually show the execution results to the model as it is decoding its output.

219
00:16:17.740 --> 00:16:19.930
Sumit Gulwani: And I'll show you how this works on our next slide.

220
00:16:20.940 --> 00:16:23.710
Sumit Gulwani: And the other challenge in programming by example is

221
00:16:23.790 --> 00:16:29.900
Sumit Gulwani: that once you have a tricky task where you're manipulating some dirty data, and you need 2 or more examples.

222
00:16:30.030 --> 00:16:33.010
Sumit Gulwani: then it is not really a translation kind of problem

223
00:16:33.370 --> 00:16:35.720
Sumit Gulwani: as you would normally have from natural language.

224
00:16:35.830 --> 00:16:41.800
Sumit Gulwani: It is really more of a search problem because you want to find the common logic that works for multiple examples.

225
00:16:42.390 --> 00:16:47.820
Sumit Gulwani: And here the idea is to let the LLM, you know, do the search. So let's see how we can do that.

226
00:16:49.410 --> 00:16:55.900
Sumit Gulwani: So here I have a task where I want to extract the last name of the author from some bibliographic entries.

227
00:16:56.690 --> 00:16:59.499
Sumit Gulwani: and if I give these 4 examples to GPT-4,

228
00:16:59.970 --> 00:17:07.370
Sumit Gulwani: It gives me a program which doesn't really do the right job. And, as you can see in the last column, it makes some errors.

229
00:17:08.770 --> 00:17:10.420
Sumit Gulwani: So what can we do instead?

230
00:17:11.069 --> 00:17:13.169
Sumit Gulwani: So here is the main idea.

231
00:17:13.609 --> 00:17:15.589
Sumit Gulwani: So as GPT-4

232
00:17:15.730 --> 00:17:17.560
Sumit Gulwani: is generating the program

233
00:17:17.920 --> 00:17:19.300
Sumit Gulwani: line by line.

234
00:17:19.780 --> 00:17:23.079
Sumit Gulwani: you take control back after it has generated a line,

235
00:17:23.410 --> 00:17:24.950
Sumit Gulwani: and you augment that

236
00:17:25.380 --> 00:17:28.789
Sumit Gulwani: with the execution semantics of that line

237
00:17:28.850 --> 00:17:31.659
Sumit Gulwani: on the various inputs that you have in the examples.

238
00:17:31.780 --> 00:17:35.820
Sumit Gulwani: And now GPT-4 is able to see the effect of this

239
00:17:36.100 --> 00:17:38.620
Sumit Gulwani: program sentence that it has emitted.

240
00:17:39.030 --> 00:17:41.710
Sumit Gulwani: Then you give control back to GPT-4;

241
00:17:41.870 --> 00:17:46.889
Sumit Gulwani: it will output another line, you augment it back with the execution of that line,

242
00:17:47.350 --> 00:17:53.380
Sumit Gulwani: And as this process continues, you know, you stop when you hit the outputs in the examples that you have.

243
00:17:53.890 --> 00:17:55.939
Sumit Gulwani: and this actually gives you the correct result.

244
00:17:57.250 --> 00:18:02.250
Sumit Gulwani: Now, in the very same methodology, you can also employ another extremely powerful idea.

245
00:18:03.310 --> 00:18:09.270
Sumit Gulwani: which is that instead of asking GPT-4 to give you one candidate for the next line,

246
00:18:10.070 --> 00:18:13.769
Sumit Gulwani: you ask it to generate multiple candidates for the next line.

247
00:18:14.680 --> 00:18:18.220
Sumit Gulwani: and then you can combine all of those candidates together

248
00:18:18.540 --> 00:18:20.519
Sumit Gulwani: as one program segment

249
00:18:20.980 --> 00:18:23.140
Sumit Gulwani: and add it to the program.

250
00:18:24.510 --> 00:18:29.510
Sumit Gulwani: And now the model can decide which of these lines it wants to use

251
00:18:29.580 --> 00:18:32.020
Sumit Gulwani: for its further computation downstream.

252
00:18:32.390 --> 00:18:34.419
Sumit Gulwani: So this is a very simple way

253
00:18:34.920 --> 00:18:36.560
Sumit Gulwani: to convert an LLM

254
00:18:37.060 --> 00:18:38.539
Sumit Gulwani: into a search engine.
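
NOTE
[Editor's illustration] The execution-augmented decoding loop just described can be sketched as below. `ask_llm` is a stand-in for an actual model call, and the prompt format, the single-assignment line shape, the stopping rule, and the use of Python `exec` are all illustrative assumptions, not the real system:

```python
def decode_with_execution(ask_llm, example_inputs, expected_outputs,
                          max_lines=10):
    """Line-by-line decoding where each emitted line is run on every example
    input and its concrete values are appended to the prompt, so the model
    'sees' the execution semantics before emitting the next line.

    `ask_llm(prompt) -> str` returns one line of Python of the form
    `var = expression` (a simplifying assumption for this sketch)."""
    prompt = ""
    envs = [{"x": inp} for inp in example_inputs]
    for _ in range(max_lines):
        line = ask_llm(prompt)
        for env in envs:
            exec(line, {}, env)   # fine for a trusted sketch; never for real LLM output
        var = line.split("=")[0].strip()
        values = [env[var] for env in envs]
        prompt += f"{line}   # -> {values}\n"
        if values == expected_outputs:   # stop once the examples are satisfied
            return prompt
    return prompt
```

The same loop also accommodates the multi-candidate idea: have `ask_llm` propose several next lines, append all of them (each with its execution results), and let the model pick which variable to build on downstream.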

255
00:18:41.520 --> 00:18:43.419
Sumit Gulwani: So what you've seen up till now

256
00:18:43.600 --> 00:18:47.610
Sumit Gulwani: is examples being a very effective modality

257
00:18:47.830 --> 00:18:55.480
Sumit Gulwani: for doing string transformations. So it turns out there are other kinds of tasks also which are amenable to example-based intent specification.

258
00:18:56.762 --> 00:18:59.930
Sumit Gulwani: I'll show you an instance of table extraction next.

259
00:19:01.870 --> 00:19:05.689
Sumit Gulwani: So here is a task taken from a data science class

260
00:19:06.120 --> 00:19:11.099
Sumit Gulwani: where the instructor asks the students to take a semi-structured text file that you see here

261
00:19:11.840 --> 00:19:13.589
Sumit Gulwani: and convert it into a

262
00:19:13.980 --> 00:19:15.699
Sumit Gulwani: proper CSV format,

263
00:19:16.700 --> 00:19:20.430
Sumit Gulwani: and the instructor provides the students a script to build on top of.

264
00:19:21.570 --> 00:19:28.059
Sumit Gulwani: Now let me show you how this experience can be made much more delightful using a programming-by-example interface.

265
00:19:29.100 --> 00:19:31.279
Sumit Gulwani: So this is the same file that I've loaded

266
00:19:31.380 --> 00:19:32.720
Sumit Gulwani: in my playground.

267
00:19:33.470 --> 00:19:40.270
Sumit Gulwani: And now all that I do is to give you examples of the various fields I want to extract. So let's say, I want to extract the championship name.

268
00:19:40.910 --> 00:19:43.120
Sumit Gulwani: So I select one example.

269
00:19:43.470 --> 00:19:47.090
Sumit Gulwani: and then I highlight another instance of that field.

270
00:19:47.480 --> 00:19:52.980
Sumit Gulwani: and after I provide 2 instances, the system is smart enough now to learn a program

271
00:19:53.030 --> 00:19:58.029
Sumit Gulwani: and use that program to extract other instances of that field efficiently from this file.

272
00:19:59.070 --> 00:20:01.519
Sumit Gulwani: Now suppose I want to extract a different field.

273
00:20:02.440 --> 00:20:05.110
Sumit Gulwani: say that here I give one example.

274
00:20:05.450 --> 00:20:08.969
Sumit Gulwani: and now one example suffices for the underlying system

275
00:20:09.190 --> 00:20:10.639
Sumit Gulwani: to infer my intent.

276
00:20:11.630 --> 00:20:14.080
Sumit Gulwani: Now, let's say I also want to extract the winning score.

277
00:20:14.680 --> 00:20:16.530
Sumit Gulwani: So when I give one example.

278
00:20:17.410 --> 00:20:22.259
Sumit Gulwani: the system actually makes a mistake in the 3rd record here, because it's in a different format.

279
00:20:23.430 --> 00:20:28.439
Sumit Gulwani: Now, I can fix this example. I can give the right output for the 3rd row.

280
00:20:28.540 --> 00:20:31.900
Sumit Gulwani: and the system will then converge to the intent that I have in mind.

281
00:20:32.780 --> 00:20:34.930
Sumit Gulwani: But what if this mistake

282
00:20:34.960 --> 00:20:37.730
Sumit Gulwani: was somewhere deep within your long file?

283
00:20:38.860 --> 00:20:47.370
Sumit Gulwani: In fact, if you are programming it yourself. You know all bets are off, because you might not even notice that such a mistake actually occurred, and you might get wrong results in your analysis.

284
00:20:48.350 --> 00:20:52.069
Sumit Gulwani: But in the case of programming by example, there's something very beautiful that you can do.

285
00:20:53.710 --> 00:21:01.340
Sumit Gulwani: So you take top ranked programs that a system has generated from few examples, and you run all of them in parallel

286
00:21:01.730 --> 00:21:03.000
Sumit Gulwani: and see

287
00:21:03.060 --> 00:21:07.310
Sumit Gulwani: if they have a discrepancy in their outputs on some future record.

288
00:21:07.410 --> 00:21:13.389
Sumit Gulwani: And if so, you point that out to the user. So in this case, if you take the top 2 ranked programs and run them,

289
00:21:13.580 --> 00:21:18.270
Sumit Gulwani: you'll notice that those programs actually produce different outputs on record 3.

290
00:21:18.300 --> 00:21:24.240
Sumit Gulwani: And this is exactly what you show to the user, and you ask the user: do you want this output, or the other output, or something else?

291
00:21:24.400 --> 00:21:28.369
Sumit Gulwani: And in this case, no, I want something else, which is 16-7.

292
00:21:28.500 --> 00:21:31.549
Sumit Gulwani: And when I do that I actually get the correct results.
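
The discrepancy-detection step described above can be sketched as follows; the records and the two candidate extractors are hypothetical stand-ins for the top-ranked programs a programming-by-example synthesizer would produce from the user's one example:

```python
import re

# Sketch of the "distinguishing inputs" idea: run the top-ranked
# synthesized programs in parallel and surface rows where they disagree.
# Both extractors below are illustrative, not real synthesizer output.

def extract_score_v1(record: str) -> str:
    """Candidate 1: take the first two numbers in the record."""
    nums = re.findall(r"\d+", record)
    return f"{nums[0]}-{nums[1]}"

def extract_score_v2(record: str) -> str:
    """Candidate 2: take the last two numbers in the record."""
    nums = re.findall(r"\d+", record)
    return f"{nums[-2]}-{nums[-1]}"

def distinguishing_inputs(records, candidates):
    """Rows where the candidate programs disagree; these are the rows
    shown to the user so they can pick the intended output."""
    issues = []
    for i, rec in enumerate(records):
        outputs = [c(rec) for c in candidates]
        if len(set(outputs)) > 1:
            issues.append((i, outputs))
    return issues

records = [
    "Eagles beat Giants 24 10",     # the row the user labeled: "24-10"
    "Jets beat Bills 31 14",
    "Week 9: Packers 16, Bears 7",  # a record in a different format
]
print(distinguishing_inputs(records, [extract_score_v1, extract_score_v2]))
```

On the first two records both candidates agree, so only the differently formatted third record is raised as a question for the user.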

293
00:21:32.940 --> 00:21:38.789
Sumit Gulwani: So again, this technology works exactly along the same methodology that I showed you for Flash Fill,

294
00:21:38.980 --> 00:21:41.840
Sumit Gulwani: except that the domain specific language is different here.

295
00:21:42.490 --> 00:21:44.700
Sumit Gulwani: But what the demonstration showed

296
00:21:44.890 --> 00:21:49.350
Sumit Gulwani: was the importance of helping the user identify the right examples.

297
00:21:49.520 --> 00:21:52.500
Sumit Gulwani: especially when the data might be dirty and it might be big.

298
00:21:53.750 --> 00:21:55.959
Sumit Gulwani: And this idea of asking the user.

299
00:21:56.030 --> 00:22:01.460
Sumit Gulwani: oh, on this row 37, do you mean output 1 or output 2, or do you want to provide another output,

300
00:22:01.660 --> 00:22:03.409
Sumit Gulwani: is taken from this paper

301
00:22:03.670 --> 00:22:08.980
Sumit Gulwani: on distinguishing Inputs, which also received the most influential paper award recently.

302
00:22:09.390 --> 00:22:14.169
Sumit Gulwani: and there are other ways to also help people figure out potential discrepancies,

303
00:22:14.290 --> 00:22:21.589
Sumit Gulwani: by analyzing the patterns in the input row, or even patterns in the output that is generated from picking some specific program.

304
00:22:25.450 --> 00:22:28.689
Sumit Gulwani: So this demonstration that I showed you

305
00:22:28.880 --> 00:22:30.949
Sumit Gulwani: on custom text files

306
00:22:31.310 --> 00:22:38.790
Sumit Gulwani: can also be applied to semi-structured documents of different kinds, such as web pages, PDF, JSON, and so forth.

307
00:22:39.520 --> 00:22:45.209
Sumit Gulwani: And many of these technologies actually ship as data connectors inside the Microsoft product called Power BI,

308
00:22:45.770 --> 00:22:48.810
Sumit Gulwani: and they've been ranked as top connectors in surveys.

309
00:22:49.730 --> 00:22:52.269
Sumit Gulwani: But the most interesting thing we figured out was

310
00:22:52.400 --> 00:23:02.179
Sumit Gulwani: that these technologies had very high usage during Covid times, when people would try to mash up different messy data sets to build up important dashboards.

311
00:23:06.100 --> 00:23:09.789
Sumit Gulwani: Okay, now, let me move on to a different way of expressing intent.

312
00:23:11.660 --> 00:23:16.259
Sumit Gulwani: So when we built these data connectors, these technologies for extracting tables,

313
00:23:16.970 --> 00:23:19.370
Sumit Gulwani: the product team challenged us

314
00:23:19.930 --> 00:23:26.039
Sumit Gulwani: and told us that, oh, we don't have time to build these rich user experiences that you're talking about.

315
00:23:27.540 --> 00:23:31.339
Sumit Gulwani: Can you actually do such table extraction tasks

316
00:23:31.550 --> 00:23:33.269
Sumit Gulwani: from 0 examples.

317
00:23:34.190 --> 00:23:36.459
Sumit Gulwani: And I thought they had gone crazy,

318
00:23:36.810 --> 00:23:44.649
Sumit Gulwani: because even giving 1 to 2 examples is, you know, such a tremendous boost for the user as opposed to writing the entire parsing script.

319
00:23:44.960 --> 00:23:47.380
Sumit Gulwani: But how on earth can you do it with 0 examples?

320
00:23:48.150 --> 00:24:01.179
Sumit Gulwani: But when I thought more about it, I realized that if I show you a semi-structured document, you can pretty much figure out what the various fields are. You don't need to provide me 1 or 2 examples of the various tens of fields that might be there in the document.

321
00:24:01.550 --> 00:24:04.480
Sumit Gulwani: So this is what inspired us to develop technologies

322
00:24:04.740 --> 00:24:08.630
Sumit Gulwani: that can understand the user's intent for the underlying table

323
00:24:08.870 --> 00:24:15.900
Sumit Gulwani: by just looking at the raw table and without requiring the users to give some examples. So let me show you an experience

324
00:24:16.030 --> 00:24:20.310
Sumit Gulwani: of what might be possible with these technologies.

325
00:24:20.900 --> 00:24:24.109
Sumit Gulwani: So this is a video recorded by my colleague Arjun.

326
00:24:24.970 --> 00:24:47.299
Sumit Gulwani: Here's a table in a PDF. This one is about rice production. I don't know much about agriculture, but I do know a lot about the pain of pasting PDF tables into Excel. We get all the data into one column, and it's a huge pain fixing it. Here we have Omnitable, which is a clipboard extension built on top of PROSE's ingestion technology. And look at that, Omnitable has done the right thing.

327
00:24:47.550 --> 00:25:02.479
Sumit Gulwani: And why just text from PDFs? You can directly copy PDFs from the file explorer, and Omnitable will read and parse the file. I really like this case, as it has figured out that the last bits of the page are shaped differently, and it's handled it really well.

328
00:25:02.950 --> 00:25:18.470
Sumit Gulwani: But PDFs are just the beginning. Omnitable can handle many formats: web pages, malformed CSVs, JSON files, and many other textual formats. Here is me copying a list of movies from IMDb that I want to watch later.

329
00:25:18.520 --> 00:25:27.180
Sumit Gulwani: And it's figured out all the right details automatically. And here's 1 case where I'm trying to figure out what concerts I can see this summer.

330
00:25:27.250 --> 00:25:37.739
Sumit Gulwani: and the web page here is not even formatted as a table, but Omnitable has identified some tabular data inside, and the table that was pasted is exactly what I wanted.

331
00:25:37.820 --> 00:25:47.100
Sumit Gulwani: Tables are everywhere, but copying tabular data is currently hard and convoluted. Omnitable can act as a bridge that makes this process easier.

332
00:25:48.860 --> 00:25:54.519
Sumit Gulwani: So you can see how such technologies can enable a smart copy paste experience now.

333
00:25:54.740 --> 00:26:01.040
Sumit Gulwani: And the key idea here is to actually infer and guess the examples which the user is not providing anymore

334
00:26:01.080 --> 00:26:07.400
Sumit Gulwani: and then use programming-by-example techniques. And today, if the data is quite noisy,

335
00:26:07.790 --> 00:26:09.940
Sumit Gulwani: we can actually even use LLMs,

336
00:26:10.090 --> 00:26:16.470
Sumit Gulwani: which understand the semantics of the real world, to actually infer what these examples might even be.

337
00:26:17.820 --> 00:26:23.100
Sumit Gulwani: But now let me actually show you another way where the user can express intent

338
00:26:23.722 --> 00:26:33.050
Sumit Gulwani: through temporal context. So what you saw was spatial context. But sometimes the user's intent is also hidden inside temporal context, as to what they have been doing in the recent past.

339
00:26:34.060 --> 00:26:36.170
Sumit Gulwani: So suppose I'm writing a class,

340
00:26:36.420 --> 00:26:38.699
Sumit Gulwani: and I have written this.

341
00:26:39.590 --> 00:26:47.360
Sumit Gulwani: Now I can let Copilot take over, and it will actually complete the code for me. Right? So, very impressive stuff.

342
00:26:48.790 --> 00:26:49.600
Sumit Gulwani: But

343
00:26:49.940 --> 00:26:52.999
Sumit Gulwani: what if I'm working on this code fragment?

344
00:26:53.830 --> 00:26:58.830
Sumit Gulwani: It would be hard for someone to guess what my intent here would be.

345
00:26:59.990 --> 00:27:04.070
Sumit Gulwani: However, what if you observe what I was doing

346
00:27:04.190 --> 00:27:05.690
Sumit Gulwani: a few minutes ago.

347
00:27:06.520 --> 00:27:09.479
Sumit Gulwani: So I had this other program segment.

348
00:27:10.080 --> 00:27:16.930
Sumit Gulwani: where I replaced an expression that was converting Fahrenheit into centigrade with

349
00:27:17.050 --> 00:27:20.039
Sumit Gulwani: a call to a new method that I've just defined.

350
00:27:21.170 --> 00:27:24.939
Sumit Gulwani: and then I do a similar kind of refactoring at another place.

351
00:27:25.810 --> 00:27:27.809
Sumit Gulwani: And if you saw me doing this.

352
00:27:27.900 --> 00:27:31.920
Sumit Gulwani: then you can potentially guess what I want to do in the bottom code fragment.

353
00:27:31.970 --> 00:27:33.939
Sumit Gulwani: which is to do a similar conversion.

354
00:27:35.120 --> 00:27:38.800
Sumit Gulwani: So this is called learning the user's intent from

355
00:27:38.810 --> 00:27:40.380
Sumit Gulwani: temporal context.

356
00:27:40.700 --> 00:27:48.219
Sumit Gulwani: And the way you do this is that the user provides some examples, and then the programming-by-example synthesizer will learn a transformation program

357
00:27:48.400 --> 00:27:53.400
Sumit Gulwani: for doing such code transformations. And then you apply this program to give the user what they want.
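
The learn-from-one-edit idea can be sketched roughly like this; a real programming-by-example synthesizer (such as the PROSE framework behind these features) searches a DSL of transformations and infers which subterm to generalize, whereas this toy version is told explicitly via the `var` argument, which is an assumption of the sketch:

```python
import re

# Toy sketch: generalize one observed (before, after) edit into a
# reusable rewrite rule by abstracting the variable `var`, then apply
# the learned rule at other call sites. Naive on purpose: it assumes
# `var` occurs exactly where it should be generalized.

def learn_rewrite(before: str, after: str, var: str):
    """Turn one example edit into a rewrite: occurrences of the `before`
    shape (with `var` generalized to any identifier) become `after`."""
    pattern = re.compile(re.escape(before).replace(re.escape(var), r"(\w+)"))
    replacement = after.replace(var, r"\1")
    return lambda code: pattern.sub(replacement, code)

# Observed edit: the user replaced a Fahrenheit-to-centigrade expression
# with a call to a newly defined method.
rule = learn_rewrite("(f - 32) * 5 / 9", "to_celsius(f)", "f")

# Suggest the same refactoring in a later code fragment:
print(rule("avg = (t - 32) * 5 / 9"))
```

The learned rule maps `avg = (t - 32) * 5 / 9` to `avg = to_celsius(t)`, mirroring how the observed edits generalize to the fragment the user is about to change.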

358
00:27:54.350 --> 00:27:58.220
Sumit Gulwani: So let me show you the 1st experience that we designed around such a technology.

359
00:27:59.170 --> 00:28:00.860
Sumit Gulwani: So there are 4 boxes

360
00:28:00.980 --> 00:28:02.230
Sumit Gulwani: and the user

361
00:28:02.390 --> 00:28:07.250
Sumit Gulwani: copies and pastes all the expressions they want to transform into the top left box.

362
00:28:07.870 --> 00:28:12.009
Sumit Gulwani: Then they give me an example of the transformation in the 2 bottom boxes.

363
00:28:12.030 --> 00:28:16.710
Sumit Gulwani: and then they press the magic button and all the expressions, you know, get transformed appropriately.

364
00:28:17.720 --> 00:28:22.829
Sumit Gulwani: But there's a big problem with this user experience, because it involves

365
00:28:22.870 --> 00:28:26.540
Sumit Gulwani: a lot of window switches and copy-paste.

366
00:28:27.740 --> 00:28:36.640
Sumit Gulwani: And, in fact, a user in the flow would not even think to use such a tool when they start doing the task manually.

367
00:28:37.110 --> 00:28:40.229
Sumit Gulwani: And I guarantee you that if this were the experience we released,

368
00:28:40.270 --> 00:28:43.329
Sumit Gulwani: people would not really have used this functionality.

369
00:28:44.850 --> 00:28:47.299
Sumit Gulwani: So what do we do instead?

370
00:28:49.050 --> 00:28:51.839
Sumit Gulwani: So this team T-shirt shows, you know, our solution.

371
00:28:52.190 --> 00:28:56.650
Sumit Gulwani: And the idea is to watch constantly what the user is doing.

372
00:28:58.040 --> 00:29:02.720
Sumit Gulwani: and from their potentially noisy sequence of keystrokes

373
00:29:02.960 --> 00:29:04.630
Sumit Gulwani: we try to infer

374
00:29:04.810 --> 00:29:06.790
Sumit Gulwani: any repetitive examples.

375
00:29:06.940 --> 00:29:09.330
Sumit Gulwani: And once we find those instances.

376
00:29:09.680 --> 00:29:13.910
Sumit Gulwani: then we produce suggestions for the transformation from those examples.

377
00:29:15.710 --> 00:29:18.590
Sumit Gulwani: Now, this looks a bit like Clippy,

378
00:29:19.210 --> 00:29:26.909
Sumit Gulwani: because the tool is always going to hand-raise and give you suggestions about what you want to do next. And you don't want this to become Clippy for code.

379
00:29:28.090 --> 00:29:36.570
Sumit Gulwani: So, for instance, if the user changes, you know, 2 integers to strings, you don't want to suggest to the user to replace all instances of integers with strings.

380
00:29:37.200 --> 00:29:42.600
Sumit Gulwani: People will lose trust in your feature, and will turn off your feature. In fact, this is what happened with an initial release

381
00:29:42.610 --> 00:29:45.069
Sumit Gulwani: of this functionality,

382
00:29:45.640 --> 00:29:54.730
Sumit Gulwani: and we learned the hard way that it is very important to strike the right precision-recall trade-off and do the right ranking of the underlying suggestions,

383
00:29:54.790 --> 00:30:01.549
Sumit Gulwani: and for this we used many different signals, even including where the user's cursor was.

384
00:30:02.330 --> 00:30:10.320
Sumit Gulwani: And with that we went from people writing posts on how to turn off this feature to a lot of social media love.

385
00:30:10.790 --> 00:30:16.250
Sumit Gulwani: But let me actually share with you my most heartening accolade for this functionality.

386
00:30:17.640 --> 00:30:27.099
Sumit Gulwani: So I'm going to play a video from the Ability Summit at Microsoft, where you get to hear about the pains that this developer has, the challenges that she has.

387
00:30:32.310 --> 00:30:37.169
Sumit Gulwani: One in 4 computer users will develop a chronic overuse injury.

388
00:30:37.210 --> 00:30:41.460
Sumit Gulwani: Having one hand just means I ran into it twice as fast.

389
00:30:42.220 --> 00:30:51.359
Sumit Gulwani: Suddenly every click sent pins and needles shooting up my arm. From the minute I woke up to the moment I fell asleep. If I could fall asleep

390
00:30:51.770 --> 00:30:53.799
Sumit Gulwani: the pain would not stop.

391
00:30:54.200 --> 00:31:00.079
Sumit Gulwani: I tried switching to using my shoulders and my nub more to type, and my wrist got a little bit better.

392
00:31:00.100 --> 00:31:04.989
Sumit Gulwani: But 6 months later I developed chronic tendonitis in both of my shoulders.

393
00:31:05.350 --> 00:31:11.270
Sumit Gulwani: By that point my arms were so bad I couldn't wash my own hair or hold a cup.

394
00:31:14.060 --> 00:31:20.309
Sumit Gulwani: and every doctor, I saw agreed that the root of the problem was the thing I loved the most.

395
00:31:20.850 --> 00:31:22.070
Sumit Gulwani: My computer.

396
00:31:23.010 --> 00:31:28.570
Sumit Gulwani: Computer use requires a lot of repetitive gestures, I learned,

397
00:31:28.630 --> 00:31:43.640
Sumit Gulwani: more than any one part of the human body is designed to do long term. A paraplegic who uses voice control will develop vocal strain. We've all probably met someone in our field whose hands or lower back hurt at the end of the workday.

398
00:31:43.920 --> 00:31:50.810
Sumit Gulwani: Some of it helped a little. But no matter what mouse you use, you still have to click it.

399
00:31:51.270 --> 00:32:04.699
Sumit Gulwani: Okay, now let me fast forward to the segment where she talks about a couple of features that she uses every day that have helped alleviate the pain that she goes through.

400
00:32:05.180 --> 00:32:07.170
Sumit Gulwani: But now I have to repeat this 10 more times. Ugh!

401
00:32:07.620 --> 00:32:16.729
Sumit Gulwani: But look at what happens now: I do that edit a second time, and on the 3rd time, IntelliCode is going to do something amazing.

402
00:32:18.320 --> 00:32:22.669
Sumit Gulwani: It automatically suggests that same edit on the next line.

403
00:32:23.010 --> 00:32:26.030
Sumit Gulwani: Then, if I press the preview all shortcut.

404
00:32:27.690 --> 00:32:37.780
Sumit Gulwani: are you seeing this? It applies the edit to every spot in my file that matches that pattern, and lets me take all of those edits by just pressing tab

405
00:32:37.950 --> 00:32:46.870
Sumit Gulwani: For me, this is huge. It took fewer gestures to edit 10 lines than it would take to just get the cursor onto the next line without IntelliCode.

406
00:32:47.270 --> 00:32:56.080
Sumit Gulwani: And while this is great for me because typing is physically difficult, if you have a focus-based disability like ADHD, a tool like this could help you stay on track.

407
00:32:56.680 --> 00:32:59.889
Sumit Gulwani: Another spot us developers get bogged down in is...

408
00:33:00.260 --> 00:33:04.959
Sumit Gulwani: So I hope, you know, this inspired you to consider developing AI technologies

409
00:33:05.050 --> 00:33:10.480
Sumit Gulwani: that also enhance accessibility, as opposed to only focusing on productivity.

410
00:33:12.007 --> 00:33:19.200
Sumit Gulwani: Now let me move on to the part of my talk where we will talk about a different way that the user expresses intent,

411
00:33:19.670 --> 00:33:22.669
Sumit Gulwani: not through examples, not through natural language.

412
00:33:22.850 --> 00:33:25.969
Sumit Gulwani: but by doing the task themselves.

413
00:33:26.440 --> 00:33:28.850
Sumit Gulwani: and then expecting AI

414
00:33:28.920 --> 00:33:33.550
Sumit Gulwani: to assist them to fix what might have potentially gone wrong.

415
00:33:35.860 --> 00:33:40.989
Sumit Gulwani: So one common complaint that we often see

416
00:33:41.250 --> 00:33:43.080
Sumit Gulwani: on Excel help forums

417
00:33:43.520 --> 00:33:47.589
Sumit Gulwani: is that people will share the formulas that they have

418
00:33:48.170 --> 00:33:51.129
Sumit Gulwani: and the errors that it leads to

419
00:33:51.420 --> 00:33:54.489
Sumit Gulwani: when they try to run the formula,

420
00:33:54.760 --> 00:33:57.389
Sumit Gulwani: and then they request an expert on the help forum

421
00:33:57.740 --> 00:33:59.539
Sumit Gulwani: to help them fix this formula.

422
00:34:00.060 --> 00:34:04.979
Sumit Gulwani: Now, in many cases the formula is just a little bit, you know, away from being correct.

423
00:34:05.200 --> 00:34:06.899
Sumit Gulwani: A small edit distance away.

424
00:34:07.700 --> 00:34:12.169
Sumit Gulwani: But the syntax is often very cryptic, and the compiler error message, not very helpful.

425
00:34:12.480 --> 00:34:18.490
Sumit Gulwani: However, the expert is often able to look at this and provide the right fix, and we call this the problem of last mile repair.

426
00:34:19.270 --> 00:34:26.099
Sumit Gulwani: So I thought, if the expert can do such fixes, maybe we can also develop an AI technology to do these fixes.

427
00:34:27.139 --> 00:34:33.049
Sumit Gulwani: So in this case, the fix is, you know, just inserting these 2 characters that you see at the bottom.

428
00:34:35.270 --> 00:34:37.490
Sumit Gulwani: So here is what our workflow looks like.

429
00:34:37.989 --> 00:34:40.689
Sumit Gulwani: So, given the buggy formula,

430
00:34:41.460 --> 00:34:49.659
Sumit Gulwani: We do some enumeration of all potential local repairs, small local repairs to the buggy formula

431
00:34:50.270 --> 00:34:52.329
Sumit Gulwani: and to speed up this process

432
00:34:52.480 --> 00:34:55.289
Sumit Gulwani: we leverage a neural module

433
00:34:55.480 --> 00:34:58.749
Sumit Gulwani: which predicts where the error might be in the first place.

434
00:34:59.630 --> 00:35:05.280
Sumit Gulwani: And then, once we get a bunch of potential repairs, we use another ML module

435
00:35:05.400 --> 00:35:10.400
Sumit Gulwani: to rank these repairs, and then we pick the top one and show it to the user.
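
A toy version of this enumerate-and-rank loop, with Python arithmetic standing in for the formula language and single-character insertions standing in for the space of small local repairs; the real system uses a neural error localizer and an ML ranker instead of brute force:

```python
# Sketch of last-mile repair: enumerate small local edits to a buggy
# formula and keep the ones that parse. Everything here is illustrative;
# in particular, "parses" stands in for a full formula checker.

CANDIDATE_CHARS = "()+-*/,0123456789"

def parses(formula: str) -> bool:
    """Does the candidate formula compile as an expression?"""
    try:
        compile(formula, "<formula>", "eval")
        return True
    except SyntaxError:
        return False

def last_mile_repairs(buggy: str, max_results: int = 5):
    """Enumerate single-character insertions that make the formula parse.
    A ranking model would then pick the most likely intended repair."""
    fixes = []
    for pos in range(len(buggy) + 1):
        for ch in CANDIDATE_CHARS:
            candidate = buggy[:pos] + ch + buggy[pos:]
            if parses(candidate) and candidate not in fixes:
                fixes.append(candidate)
    return fixes[:max_results]

# A formula with a missing closing parenthesis:
print(last_mile_repairs("(1 + 2 * 3"))
```

Note that several candidate repairs parse (for example closing the parenthesis after `2` or at the very end), which is exactly why the ranking step matters: syntax alone does not determine the user's intent.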

436
00:35:13.710 --> 00:35:15.910
Sumit Gulwani: Now let me talk about

437
00:35:16.520 --> 00:35:19.370
Sumit Gulwani: a very important application domain

438
00:35:19.830 --> 00:35:21.490
Sumit Gulwani: for program repair.

439
00:35:21.790 --> 00:35:26.120
Sumit Gulwani: And this is an area that has not yet been disrupted by AI,

440
00:35:27.530 --> 00:35:29.140
Sumit Gulwani: the area of education.

441
00:35:31.880 --> 00:35:37.340
Sumit Gulwani: So the picture that you see here is not a stock picture; it was actually taken by a colleague.

442
00:35:38.170 --> 00:35:42.289
Sumit Gulwani: These are the students waiting in the corridor for the office hours

443
00:35:42.440 --> 00:35:44.450
Sumit Gulwani: of an Intro programming course.

444
00:35:45.180 --> 00:35:47.349
Sumit Gulwani: and the line extends around the corner.

445
00:35:48.150 --> 00:35:50.429
Sumit Gulwani: and you know what the last student is thinking.

446
00:35:50.850 --> 00:35:54.680
Sumit Gulwani: Will I get a few minutes with the TA to discuss the doubts that I have?

447
00:35:55.510 --> 00:36:02.640
Sumit Gulwani: I think the big challenge with education has been that we are still stuck with this centuries old model, where one teacher is teaching

448
00:36:02.650 --> 00:36:06.929
Sumit Gulwani: 30 students in a classroom, and students are not able to get personalized feedback.

449
00:36:08.420 --> 00:36:12.249
Sumit Gulwani: In fact, the demand for programming courses is 5 times more.

450
00:36:12.640 --> 00:36:16.049
Sumit Gulwani: than the availability, often due to lack of TAs.

451
00:36:16.290 --> 00:36:22.440
Sumit Gulwani: Another sad statistic that we see is the 80% dropout amongst US minority students,

452
00:36:22.810 --> 00:36:25.429
Sumit Gulwani: likely, because the tech stack is not adequate.

453
00:36:26.610 --> 00:36:29.580
Sumit Gulwani: What students need is not a sample solution.

454
00:36:30.120 --> 00:36:38.910
Sumit Gulwani: but feedback on how their broken solution, which reflects their thought process, can be extended into a correct solution.

455
00:36:40.180 --> 00:36:43.430
Sumit Gulwani: And I think we have a huge opportunity at our hands right now

456
00:36:43.700 --> 00:36:51.339
Sumit Gulwani: to develop a technology that can give feedback to students just like a human TA would give, by progressively revealing different kinds of hints.

457
00:36:51.790 --> 00:36:55.019
Sumit Gulwani: We have decades of work in program analysis and verification.

458
00:36:55.420 --> 00:37:01.900
Sumit Gulwani: And now we're seeing the disruption that large language models have brought with respect to how good they are at programming.

459
00:37:02.420 --> 00:37:05.280
Sumit Gulwani: And we have the potential to fine tune these models

460
00:37:05.620 --> 00:37:07.930
Sumit Gulwani: with students' learning journeys,

461
00:37:08.420 --> 00:37:15.539
Sumit Gulwani: because students all across the world, year after year, are struggling with similar kinds of confusions. And this data can be collected

462
00:37:15.650 --> 00:37:17.650
Sumit Gulwani: to actually fine tune these models.

463
00:37:19.350 --> 00:37:22.230
Sumit Gulwani: So let me show you some work that we have done in this space.

464
00:37:22.690 --> 00:37:24.230
Sumit Gulwani: So what you see here

465
00:37:24.460 --> 00:37:32.230
Sumit Gulwani: is an incorrect program for the palindrome problem from a student. It has all different kinds of errors: syntax errors, semantic errors.

466
00:37:32.310 --> 00:37:36.179
Sumit Gulwani: and we're able to fix all of them, as you see in the second column.

467
00:37:37.740 --> 00:37:48.560
Sumit Gulwani: And this fix does not happen in just one call to the LLM; we do these fixes in an iterative manner, just like how the student would have probably done it themselves.

468
00:37:48.970 --> 00:37:51.130
Sumit Gulwani: So 1st we fix syntax errors.

469
00:37:51.190 --> 00:37:52.810
Sumit Gulwani: and we do that one by one.

470
00:37:52.840 --> 00:37:57.620
Sumit Gulwani: because fixing a syntax error can lead to change in the next set of errors that you might have to fix.

471
00:37:59.220 --> 00:38:04.110
Sumit Gulwani: And then, once the program can execute, we look at the semantic discrepancy

472
00:38:04.700 --> 00:38:07.049
Sumit Gulwani: on some test cases that might be available

473
00:38:07.760 --> 00:38:09.560
Sumit Gulwani: and ask the LLM to

474
00:38:09.690 --> 00:38:11.310
Sumit Gulwani: consider making

475
00:38:11.450 --> 00:38:13.789
Sumit Gulwani: similar changes in the student's program.

476
00:38:16.330 --> 00:38:22.809
Sumit Gulwani: And then once we get multiple candidates from the LLM, we pick the one that is closest to the student's solution.

477
00:38:23.470 --> 00:38:28.799
Sumit Gulwani: So this kind of process can actually lead to fixing the student's code.
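
The closest-candidate selection step can be sketched as below; the candidate fixes are hypothetical LLM outputs, and `difflib` similarity stands in for whatever distance metric the real system uses:

```python
import difflib

# Sketch of the final selection step: among several candidate fixes,
# prefer the one closest to the student's own code, so the feedback
# preserves their thought process rather than replacing it wholesale.

def closest_fix(student_code: str, candidates: list[str]) -> str:
    """Pick the candidate most similar to the student's submission."""
    return max(
        candidates,
        key=lambda c: difflib.SequenceMatcher(None, student_code, c).ratio(),
    )

student = "def is_pal(s):\n    return s == s.reverse()"
candidates = [
    # Minimal change: fix the reversal expression.
    "def is_pal(s):\n    return s == s[::-1]",
    # Correct but structurally very different rewrite.
    "def is_pal(s):\n    return all(s[i] == s[-1 - i] for i in range(len(s) // 2))",
]
print(closest_fix(student, candidates))
```

Both candidates are correct, but the first preserves the student's structure, so it is the one selected.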

478
00:38:29.720 --> 00:38:34.819
Sumit Gulwani: But the interesting thing is that the students do not really need a fix to their submitted code.

479
00:38:35.270 --> 00:38:37.280
Sumit Gulwani: What they need are hints

480
00:38:37.460 --> 00:38:41.590
Sumit Gulwani: that will help increase their learning, enhance their learning experiences.

481
00:38:42.830 --> 00:38:52.510
Sumit Gulwani: So let me show you the work that we did recently in order to provide some good hints to students, and this will also be a good exercise into showing some tricks around prompt engineering.

482
00:38:54.540 --> 00:38:56.659
Sumit Gulwani: So we asked the LLM to give a

483
00:38:56.670 --> 00:38:59.289
Sumit Gulwani: short hint to the student.

484
00:39:00.190 --> 00:39:03.399
Sumit Gulwani: and we describe what a hint should be: it should not be too detailed,

485
00:39:04.180 --> 00:39:07.830
Sumit Gulwani: so that the student is able to think.

486
00:39:08.600 --> 00:39:11.600
Sumit Gulwani: and it should not be too abstract, so that it is constructive.

487
00:39:13.940 --> 00:39:14.700
Sumit Gulwani: but

488
00:39:14.940 --> 00:39:22.169
Sumit Gulwani: just asking the LLM to provide a hint to a student's buggy solution is not very effective. So, for instance, this is a broken

489
00:39:22.410 --> 00:39:25.290
Sumit Gulwani: program from the student which has this error.

490
00:39:25.660 --> 00:39:32.110
Sumit Gulwani: and if you ask GPT-4 to generate a hint here, it will actually give you something that is not very meaningful.

491
00:39:33.070 --> 00:39:39.390
Sumit Gulwani: So one trick is to enhance the prompt here by asking the model to do some chain-of-thought reasoning,

492
00:39:40.030 --> 00:39:46.790
Sumit Gulwani: and the way you do that is by asking the model to actually first describe the bug in detail and the fix,

493
00:39:47.060 --> 00:39:51.159
Sumit Gulwani: and use that to actually generate this hint, which is not very concealing

494
00:39:53.403 --> 00:39:57.630
Sumit Gulwani: but even before generating the fix

495
00:39:58.366 --> 00:40:03.500
Sumit Gulwani: we pass in more information, such as the failing test case.

496
00:40:04.330 --> 00:40:09.929
Sumit Gulwani: and probably even a fixed version of the buggy program that was computed using the technology that I showed you earlier.

497
00:40:10.910 --> 00:40:13.510
Sumit Gulwani: So with this kind of prompt augmentation.

498
00:40:13.800 --> 00:40:17.009
Sumit Gulwani: the tool is now able to generate the right hint.
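
A sketch of what such an augmented prompt might look like; the wording, field names, and example programs here are illustrative, not the actual production prompt:

```python
# Sketch of the prompt augmentation described above: chain-of-thought
# instructions first, then the failing test case and a machine-repaired
# version of the program as extra context for the hint generator.

def build_hint_prompt(buggy_code: str, failing_test: str, repaired_code: str) -> str:
    """Assemble an augmented hint-generation prompt (illustrative shape)."""
    return "\n".join([
        "You are a programming tutor.",
        "First, describe the bug in detail and the fix (chain of thought).",
        "Then give the student ONE short hint: concrete enough to act on,",
        "but not so detailed that it gives the fix away.",
        "",
        "Student's buggy program:",
        buggy_code,
        "",
        "Failing test case:",
        failing_test,
        "",
        "Automatically repaired version (for reference only):",
        repaired_code,
    ])

prompt = build_hint_prompt(
    "def is_pal(s): return s == s.reverse()",
    "is_pal('aba') raised AttributeError",
    "def is_pal(s): return s == s[::-1]",
)
print(prompt)
```

The repaired version passed in here is exactly the kind of output the last-mile repair technology from earlier in the talk can supply.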

499
00:40:18.170 --> 00:40:24.600
Sumit Gulwani: And now, if you look at the results, we went from 66% precision to 82% precision.

500
00:40:25.740 --> 00:40:30.570
Sumit Gulwani: But this is still not a good number for this tool to be deployed in classrooms.

501
00:40:30.610 --> 00:40:38.650
Sumit Gulwani: because you would likely not send her child to a teacher, that is only 82% correct. You want the accuracy to be much higher in nineties.

502
00:40:39.140 --> 00:40:44.710
Sumit Gulwani: So let me show you what we did in order to increase the precision, and the trick is to actually reduce the recall.

503
00:40:45.390 --> 00:40:48.629
Sumit Gulwani: and the idea is to understand

504
00:40:49.700 --> 00:40:51.359
Sumit Gulwani: what might be needed

505
00:40:51.750 --> 00:40:55.770
Sumit Gulwani: to prevent bad hints from being shown to the student.

506
00:40:57.980 --> 00:41:00.089
Sumit Gulwani: So our key insight there is

507
00:41:00.660 --> 00:41:03.350
Sumit Gulwani: that we ask the model

508
00:41:03.910 --> 00:41:15.239
Sumit Gulwani: to take the feedback that was generated by the model and use that feedback to fix the buggy program. So in some sense, the model is going to be simulating the student.

509
00:41:18.540 --> 00:41:25.300
Sumit Gulwani: And since we are focusing on only one bug, we ask the model to resolve the remaining bugs by itself,

510
00:41:25.790 --> 00:41:27.750
Sumit Gulwani: so that we can then check

511
00:41:28.090 --> 00:41:29.490
Sumit Gulwani: whether the

512
00:41:29.590 --> 00:41:34.910
Sumit Gulwani: fix that the system did by looking at that explanation was actually a correct fix or not.

513
00:41:38.280 --> 00:41:45.589
Sumit Gulwani: However, the model might be too smart, and might be able to still generate the right fix even if the explanation is bad,

514
00:41:47.030 --> 00:41:50.750
Sumit Gulwani: and the trick here is to provide an escape mechanism to the model.

515
00:41:51.650 --> 00:41:52.540
Sumit Gulwani: to

516
00:41:52.760 --> 00:42:00.720
Sumit Gulwani: point out if the explanation is indeed bad as opposed to just trying to look at that explanation and still trying to fix the program.

517
00:42:04.060 --> 00:42:05.670
Sumit Gulwani: And then we use a

518
00:42:05.920 --> 00:42:11.559
Sumit Gulwani: less powerful model for doing this. So hints are generated using GPT-4,

519
00:42:11.820 --> 00:42:14.709
Sumit Gulwani: But whether that hint was good enough or not

520
00:42:14.960 --> 00:42:17.820
Sumit Gulwani: is tested using GPT-3.5.

521
00:42:20.550 --> 00:42:23.789
Sumit Gulwani: and since the process might be stochastic.

522
00:42:23.960 --> 00:42:26.820
Sumit Gulwani: What we do is we run n different trials.

523
00:42:27.420 --> 00:42:29.429
Sumit Gulwani: and then we say, a hint is good

524
00:42:29.920 --> 00:42:34.229
Sumit Gulwani: only if a certain fraction of these trials lead to success.

525
00:42:36.070 --> 00:42:40.299
Sumit Gulwani: And furthermore, we also see what the effect is

526
00:42:40.320 --> 00:42:43.760
Sumit Gulwani: if the hint is not shown to GPT-3.5,

527
00:42:43.860 --> 00:42:45.830
Sumit Gulwani: and in this case we require

528
00:42:46.140 --> 00:42:51.719
Sumit Gulwani: that the delta between when the hint is shown and when it is not shown is significant.

529
00:42:53.340 --> 00:42:55.499
Sumit Gulwani: And once you go through this workflow

530
00:42:55.690 --> 00:43:04.299
Sumit Gulwani: we find that the performance really boosts up a lot, to close to 95% precision, at the cost of reducing recall.
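
The validation loop can be sketched as follows, with a simulated success probability standing in for GPT-3.5 actually attempting the fix; the thresholds and trial counts are illustrative, not the values used in the real system:

```python
import random

# Toy sketch of hint validation: accept a hint only if a (simulated)
# weaker model fixes the program in enough of n trials WITH the hint,
# and the success rate WITHOUT the hint is significantly lower.
# `try_fix` stands in for a call to the weaker model.

def hint_is_good(try_fix, n=10, accept=0.6, min_delta=0.3, seed=0) -> bool:
    """Run n trials with and without the hint; require a high success
    rate with the hint and a significant delta over the no-hint runs."""
    rng = random.Random(seed)  # fixed seed: the process is stochastic
    with_hint = sum(try_fix(hint=True, rng=rng) for _ in range(n)) / n
    without = sum(try_fix(hint=False, rng=rng) for _ in range(n)) / n
    return with_hint >= accept and (with_hint - without) >= min_delta

# Simulated student-model: succeeds 90% of the time with a good hint,
# only 20% of the time without it.
def simulated_fix(hint, rng):
    return rng.random() < (0.9 if hint else 0.2)

print(hint_is_good(simulated_fix))
```

A hint that barely changes the model's success rate fails the delta check and is withheld from the student, which is how the workflow trades recall for precision.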

531
00:43:04.990 --> 00:43:10.250
Sumit Gulwani: And here's a table that shows the importance of each and every trick that I showed you.

532
00:43:10.630 --> 00:43:15.479
Sumit Gulwani: So if you don't use one of those tricks, you know, the performance will actually drop down.

533
00:43:19.740 --> 00:43:23.679
Sumit Gulwani: Okay. Now, let me talk about the last application

534
00:43:23.950 --> 00:43:25.640
Sumit Gulwani: in the space of repair,

535
00:43:25.750 --> 00:43:28.630
Sumit Gulwani: and this is for helping developers

536
00:43:28.810 --> 00:43:30.450
Sumit Gulwani: debug their code.

537
00:43:32.210 --> 00:43:36.419
Sumit Gulwani: Debugging is one of the most time consuming tasks for developers.

538
00:43:36.830 --> 00:43:38.420
Sumit Gulwani: So over the last year

539
00:43:38.680 --> 00:43:43.850
Sumit Gulwani: we have brought in co-pilot support to assist developers with debugging their code.

540
00:43:43.880 --> 00:43:46.679
Sumit Gulwani: So now, when you get a runtime exception in your code

541
00:43:46.760 --> 00:43:51.760
Sumit Gulwani: you will see this option to ask Copilot where copilot will give you an explanation

542
00:43:51.820 --> 00:43:55.060
Sumit Gulwani: of the potential error in your code, and also suggest a fix.

543
00:43:56.090 --> 00:44:01.490
Sumit Gulwani: So let me show you what was the 1st solution that we built for such a functionality.

544
00:44:02.500 --> 00:44:06.460
Sumit Gulwani: So like everyone else, we focused on doing

545
00:44:06.910 --> 00:44:08.719
Sumit Gulwani: prompt engineering

546
00:44:09.140 --> 00:44:16.320
Sumit Gulwani: and putting in all kinds of debugging information as context for the LLM.

547
00:44:19.230 --> 00:44:25.539
Sumit Gulwani: And we noticed that there was only a 25% success rate in our user study.

548
00:44:26.590 --> 00:44:28.390
Sumit Gulwani: And the reason for this was

549
00:44:28.610 --> 00:44:30.900
Sumit Gulwani: that Copilot was

550
00:44:32.010 --> 00:44:40.149
Sumit Gulwani: trying to leap towards providing a fix to the user, even when it did not know what the real error was.

551
00:44:40.960 --> 00:44:49.349
Sumit Gulwani: so it would make wrong assumptions, and would try to quickly close the conversation with the user by saying: this is how you should actually fix it.

552
00:44:51.390 --> 00:44:56.390
Sumit Gulwani: And this again gets back to the problem of hallucinations. If the model does not know something.

553
00:44:56.720 --> 00:44:59.589
Sumit Gulwani: it has been trained to still respond.

554
00:45:00.390 --> 00:45:04.340
Sumit Gulwani: So there's 1 very simple but powerful idea that you can use

555
00:45:04.620 --> 00:45:08.509
Sumit Gulwani: to prevent this from happening, or at least lower the chances of it happening.

556
00:45:10.030 --> 00:45:12.230
Sumit Gulwani: So instead of asking the model.

557
00:45:12.580 --> 00:45:14.319
Sumit Gulwani: Give me a fix

558
00:45:14.570 --> 00:45:16.760
Sumit Gulwani: to this buggy program.

559
00:45:17.190 --> 00:45:18.700
Sumit Gulwani: You ask the model.

560
00:45:19.620 --> 00:45:21.270
Sumit Gulwani: Do you know

561
00:45:21.650 --> 00:45:26.680
Sumit Gulwani: the answer to this challenge? Do you know what the issue might be.

562
00:45:26.820 --> 00:45:28.730
Sumit Gulwani: And do you know the fix to this problem?

563
00:45:29.440 --> 00:45:31.109
Sumit Gulwani: And if the model says

564
00:45:31.870 --> 00:45:36.900
Sumit Gulwani: No, I do not need any more investigation. I know how to solve it.

565
00:45:37.130 --> 00:45:39.830
Sumit Gulwani: Then you let it get back to the user.

566
00:45:40.130 --> 00:45:41.609
Sumit Gulwani: But if the model says.

567
00:45:41.710 --> 00:45:44.390
Sumit Gulwani: Yes, I need to do some more investigation.

568
00:45:44.670 --> 00:45:49.650
Sumit Gulwani: then you ask the model, do you need to involve the user and have some interaction with the user?

569
00:45:50.320 --> 00:45:55.549
Sumit Gulwani: And if the model says No, then let the model figure out this more information from the context

570
00:45:55.670 --> 00:45:57.590
Sumit Gulwani: and repeat this process.

571
00:45:57.920 --> 00:46:05.160
Sumit Gulwani: Or you go to the user and ask some questions of the user in a collaborative way to help the user figure out this issue.
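The investigate-and-respond loop described above might look roughly like this sketch; `llm` and `ask_user` are hypothetical stand-ins, not the actual Copilot interfaces:

```python
def investigate_and_respond(llm, ask_user, max_rounds=5):
    """Sketch of the investigate-and-respond agentic pattern: only
    propose a fix once the model says it knows the issue; otherwise
    investigate further, involving the user when needed."""
    context = []
    for _ in range(max_rounds):
        if llm.knows_fix(context):                 # "Do you know the fix?"
            return llm.propose_fix(context)        # respond to the user
        if llm.needs_user_input(context):          # involve the user?
            context.append(ask_user(llm.question_for_user(context)))
        else:
            context.append(llm.investigate(context))  # dig into the context
    return llm.propose_fix(context)                # best effort after budget
```

The key design choice is that proposing a fix is gated behind the model's own self-assessment, rather than being the default response.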

572
00:46:06.460 --> 00:46:13.119
Sumit Gulwani: So with this kind of agentic workflow, we were able to get much better performance for debugging tasks.

573
00:46:13.810 --> 00:46:19.740
Sumit Gulwani: So this Gantt diagram shows the effect of using this agentic pattern

574
00:46:19.870 --> 00:46:22.829
Sumit Gulwani: versus using the vanilla copilot chat experience

575
00:46:23.460 --> 00:46:26.129
Sumit Gulwani: on various tasks that we give to the developer.

576
00:46:27.920 --> 00:46:32.330
Sumit Gulwani: So on the right side. What you would notice is that there are more solid bars

577
00:46:32.770 --> 00:46:37.600
Sumit Gulwani: as opposed to slanted bars, indicating more use of

578
00:46:38.020 --> 00:46:40.720
Sumit Gulwani: these technologies by the users.

579
00:46:41.930 --> 00:46:45.090
Sumit Gulwani: And another interesting thing that you'll notice on the right side

580
00:46:45.690 --> 00:46:49.200
Sumit Gulwani: is that the blue bars occur before the green bars.

581
00:46:49.620 --> 00:46:52.049
Sumit Gulwani: meaning that the tool is

582
00:46:52.220 --> 00:46:56.969
Sumit Gulwani: 1st trying to localize the problems, working with the developer

583
00:46:57.390 --> 00:46:59.639
Sumit Gulwani: and then trying to provide a fix

584
00:46:59.970 --> 00:47:05.280
Sumit Gulwani: as opposed to just leaping to a fix without even figuring out properly what might be wrong.

585
00:47:06.750 --> 00:47:13.760
Sumit Gulwani: And this leads to 88% success, which is big, you know, compared to the baseline of 25%.

586
00:47:13.890 --> 00:47:20.419
Sumit Gulwani: And here are some verbatims, and my favorite is the second one that we got from one of the users in the user study.

587
00:47:20.830 --> 00:47:28.219
Sumit Gulwani: where the user says that now they feel more empowered to be able to resolve a similar issue on their own.

588
00:47:28.390 --> 00:47:33.390
Sumit Gulwani: So you can see that these kinds of experiences can also aid learning for people.

589
00:47:35.150 --> 00:47:37.579
Sumit Gulwani: Okay, so this is the last segment of my talk.

590
00:47:37.740 --> 00:47:47.050
Sumit Gulwani: where I will talk about the most common form of intent, which is natural language. It's very versatile, and today Llms. Can even interpret it.

591
00:47:48.700 --> 00:47:56.360
Sumit Gulwani: However, if this natural language utterance is about a task that the LLM does not know how to do well, probably because it has not seen much training data.

592
00:47:57.400 --> 00:48:01.330
Sumit Gulwani: Then one fascinating capability that these large language models offer

593
00:48:01.380 --> 00:48:04.070
Sumit Gulwani: is their ability for in-context learning,

594
00:48:04.730 --> 00:48:06.730
Sumit Gulwani: and this can be facilitated

595
00:48:06.750 --> 00:48:08.760
Sumit Gulwani: by giving a good example

596
00:48:08.790 --> 00:48:11.560
Sumit Gulwani: or some good part of the documentation to the model.

597
00:48:12.300 --> 00:48:17.530
Sumit Gulwani: So this is a space where my team has been doing some work recently, and I'll share some insights from there.

598
00:48:19.760 --> 00:48:26.950
Sumit Gulwani: So suppose I want to convert this natural language utterance on the left side into the SQL code fragment on the right side.

599
00:48:28.490 --> 00:48:33.410
Sumit Gulwani: Now, when we were trying this task using GPT-3, we figured out it could not do it,

600
00:48:33.830 --> 00:48:38.709
Sumit Gulwani: so the trick was, what kind of example can we pick from our example bank.

601
00:48:38.760 --> 00:48:42.569
Sumit Gulwani: so that the model can actually do this translation?

602
00:48:44.190 --> 00:48:49.649
Sumit Gulwani: And the most intuitive idea is to pick an example from the example bank.

603
00:48:50.100 --> 00:48:52.110
Sumit Gulwani: whose natural language

604
00:48:52.430 --> 00:48:56.879
Sumit Gulwani: part of the input is very similar to the natural language that the user has provided.

605
00:48:58.280 --> 00:49:04.839
Sumit Gulwani: But if you do that in this case you will notice that the model actually gives you the wrong output

606
00:49:04.860 --> 00:49:07.650
Sumit Gulwani: because it refers to a column that does not exist.

607
00:49:09.150 --> 00:49:13.340
Sumit Gulwani: Instead, there is another example in our database in our example bank.

608
00:49:13.970 --> 00:49:15.100
Sumit Gulwani: where

609
00:49:15.500 --> 00:49:20.650
Sumit Gulwani: the code part of the example, the output part of the example, is actually very similar

610
00:49:20.670 --> 00:49:23.180
Sumit Gulwani: to the output that we desire on the right side.

611
00:49:24.150 --> 00:49:28.910
Sumit Gulwani: And if we pass this example instead, then the model can do this task.

612
00:49:29.740 --> 00:49:35.149
Sumit Gulwani: And now the challenge is, how on earth will you figure out that you should pick this example

613
00:49:35.900 --> 00:49:38.699
Sumit Gulwani: at the bottom, as opposed to the one at the top.

614
00:49:39.650 --> 00:49:41.229
Sumit Gulwani: and the trick there is

615
00:49:41.240 --> 00:49:46.220
Sumit Gulwani: to simply fine-tune the model, that is, fine-tune this Sentence-BERT model,

616
00:49:47.080 --> 00:49:52.260
Sumit Gulwani: to redefine the similarity of 2 utterances

617
00:49:52.910 --> 00:49:56.610
Sumit Gulwani: to match the similarity of their respective code.

618
00:49:58.910 --> 00:50:01.110
Sumit Gulwani: and to make this process efficient.

619
00:50:01.420 --> 00:50:04.610
Sumit Gulwani: you want to only pick those examples

620
00:50:04.740 --> 00:50:11.479
Sumit Gulwani: where natural language similarity is high, but code. Similarity is low, because those are the things you want to unlearn.

621
00:50:11.770 --> 00:50:15.480
Sumit Gulwani: or it is the other way around. And these are the things that you want to learn.

622
00:50:15.730 --> 00:50:20.520
Sumit Gulwani: as opposed to picking out all quadratic pairs of examples which might be very expensive.
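The contrastive pair selection just described could look roughly like this sketch; `nl_sim` and `code_sim` are assumed similarity functions, and the thresholds are illustrative:

```python
def select_training_pairs(examples, nl_sim, code_sim, hi=0.8, lo=0.3):
    """Pick only the informative pairs for fine-tuning the embedding
    model: NL-similar but code-dissimilar pairs (targets to unlearn)
    and NL-dissimilar but code-similar pairs (targets to learn),
    rather than training on all O(n^2) pairs."""
    pairs = []
    for i in range(len(examples)):
        for j in range(i + 1, len(examples)):
            s_nl = nl_sim(examples[i], examples[j])
            s_code = code_sim(examples[i], examples[j])
            if s_nl >= hi and s_code <= lo:
                pairs.append((i, j, 0.0))  # unlearn: push apart
            elif s_nl <= lo and s_code >= hi:
                pairs.append((i, j, 1.0))  # learn: pull together
    return pairs
```

Note the sketch still scans candidate pairs; in practice one would prune candidates (for example with nearest-neighbor search) to avoid the quadratic scan, but the filtering criterion is the point here.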

623
00:50:21.460 --> 00:50:27.059
Sumit Gulwani: And then one other optimization that we did later on was to replace the Sentence-BERT model

624
00:50:27.430 --> 00:50:29.290
Sumit Gulwani: by the

625
00:50:29.320 --> 00:50:32.109
Sumit Gulwani: embedding layer of these large language models,

626
00:50:32.750 --> 00:50:42.380
Sumit Gulwani: which allows you to even handle out-of-distribution language, because these models are very strong in their language-understanding abilities, and you simply fine-tune by adding a few dense layers

627
00:50:42.510 --> 00:50:44.200
Sumit Gulwani: at the end of these embeddings.

628
00:50:46.670 --> 00:50:51.539
Sumit Gulwani: Now, one interesting optimization relates to, how do we define code similarity?

629
00:50:52.550 --> 00:50:56.639
Sumit Gulwani: So in this scenario, I want to map this natural language utterance

630
00:50:57.160 --> 00:51:00.910
Sumit Gulwani: to this expression that you see on the right side

631
00:51:01.556 --> 00:51:07.290
Sumit Gulwani: but this one is actually incorrect because it is not using the right regular expression syntax. What we really want

632
00:51:07.480 --> 00:51:09.370
Sumit Gulwani: is the one that you see at the bottom

633
00:51:09.770 --> 00:51:11.129
Sumit Gulwani: where you have dot-dot.

634
00:51:12.090 --> 00:51:14.840
Sumit Gulwani: Now, I might have two examples in my example bank.

635
00:51:16.110 --> 00:51:17.689
Sumit Gulwani: The one at the top

636
00:51:18.870 --> 00:51:21.509
Sumit Gulwani: is actually very close

637
00:51:21.520 --> 00:51:24.780
Sumit Gulwani: to the intended output that you see on the bottom right.

638
00:51:25.270 --> 00:51:28.730
Sumit Gulwani: You just have to replace one comma by a dot-dot.

639
00:51:30.270 --> 00:51:35.640
Sumit Gulwani: And if you define code similarity based on edit distance. This is the example you will likely end up picking.

640
00:51:35.920 --> 00:51:37.579
Sumit Gulwani: but then it will not really help.

641
00:51:38.760 --> 00:51:41.270
Sumit Gulwani: Instead, if you pick the example at the bottom.

642
00:51:42.010 --> 00:51:44.139
Sumit Gulwani: then it works out.

643
00:51:46.000 --> 00:51:48.800
Sumit Gulwani: But now, to pick the example at the bottom.

644
00:51:49.070 --> 00:51:54.970
Sumit Gulwani: The important thing is that code similarity should not be defined based on edit distance, but based on some properties,

645
00:51:55.150 --> 00:51:57.960
Sumit Gulwani: such as what method, name is being used.

646
00:51:58.140 --> 00:52:01.330
Sumit Gulwani: or whether or not a regular expression is being used.

647
00:52:02.620 --> 00:52:05.989
Sumit Gulwani: and the advantage of using properties for defining

648
00:52:06.510 --> 00:52:10.640
Sumit Gulwani: code similarity is that it makes the example selection also very granular.

649
00:52:10.950 --> 00:52:15.099
Sumit Gulwani: So you can keep picking multiple examples until they cover all the properties

650
00:52:15.190 --> 00:52:17.370
Sumit Gulwani: that the system predicts.
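A property-based notion of code similarity, and the greedy coverage-style selection it enables, might be sketched as follows; the property extractors here are simplistic assumptions, not the talk's actual feature set:

```python
import re

def code_properties(code):
    """Extract coarse properties of a code snippet: which methods are
    called, and whether a regular expression appears (illustrative)."""
    props = {"calls:" + m for m in re.findall(r"\.(\w+)\(", code)}
    if re.search(r"\bre\.|[Rr]egex", code):
        props.add("uses:regex")
    return props

def select_examples(predicted_props, example_bank):
    """Greedily pick examples until all predicted properties are covered."""
    chosen, uncovered = [], set(predicted_props)
    for ex in example_bank:
        overlap = code_properties(ex) & uncovered
        if overlap:
            chosen.append(ex)
            uncovered -= overlap
        if not uncovered:
            break
    return chosen
```

Because selection is driven by properties rather than whole-snippet edit distance, multiple examples can be accumulated until together they cover everything the system predicts is needed.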

651
00:52:19.630 --> 00:52:22.040
Sumit Gulwani: and then I'll share one final optimization here.

652
00:52:22.270 --> 00:52:27.289
Sumit Gulwani: So in this case, if you want to go from the natural language on the left to the code snippet on the right side.

653
00:52:27.990 --> 00:52:30.799
Sumit Gulwani: picking the example that I have in my example bank.

654
00:52:31.270 --> 00:52:34.560
Sumit Gulwani: The best example still does not help me.

655
00:52:35.080 --> 00:52:37.609
Sumit Gulwani: because the example does not really tell me

656
00:52:37.630 --> 00:52:39.839
Sumit Gulwani: that the 3rd argument

657
00:52:40.110 --> 00:52:43.319
Sumit Gulwani: actually corresponds to the count.

658
00:52:45.083 --> 00:52:49.909
Sumit Gulwani: But I know which method is important. So now I can go and look into the documentation

659
00:52:50.220 --> 00:52:55.339
Sumit Gulwani: to get more information about this method, which actually also talks about the 3rd argument.

660
00:52:55.630 --> 00:53:01.270
Sumit Gulwani: And when I additionally pass in this documentation to the model, it is actually able to do the right job.
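Augmenting the prompt with retrieved documentation, as just described, amounts to something like this sketch; the prompt format and field names are invented for illustration:

```python
def build_prompt(utterance, example, docs):
    """Assemble a few-shot prompt from the selected example plus the
    documentation retrieved for the method it uses."""
    return (
        "Translate natural language to code.\n\n"
        "Documentation:\n" + docs + "\n\n"
        "NL: " + example["nl"] + "\n"
        "Code: " + example["code"] + "\n\n"
        "NL: " + utterance + "\n"
        "Code:"
    )
```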

661
00:53:04.250 --> 00:53:20.839
Sumit Gulwani: So I have one other trick for enhancing prompt engineering. But I guess I'm running out of time, so I'll skip this part. But at a very high level. This is about using the model itself to refine its own prompt instructions

662
00:53:21.570 --> 00:53:25.610
Sumit Gulwani: after having it to self-reflect on its failures.

663
00:53:30.530 --> 00:53:38.400
Sumit Gulwani: Now we have seen natural language as a very important modality for describing users' intent,

664
00:53:38.600 --> 00:53:42.349
Sumit Gulwani: but it turns out that natural language is also an interesting modality for output,

665
00:53:42.390 --> 00:53:44.569
Sumit Gulwani: because, you know, one very common

666
00:53:44.760 --> 00:53:50.120
Sumit Gulwani: use of large language models is not so much generating programs, but explaining programs.

667
00:53:50.900 --> 00:53:53.239
Sumit Gulwani: We also saw the use case of hint generation.

668
00:53:53.280 --> 00:53:58.440
Sumit Gulwani: And then one important question that arises is, what constitutes a good explanation or hint.

669
00:54:00.240 --> 00:54:05.950
Sumit Gulwani: In fact, natural language is also an important modality for conversing with the user to resolve the ambiguity.

670
00:54:06.210 --> 00:54:08.340
Sumit Gulwani: and here it is important to understand

671
00:54:08.830 --> 00:54:14.569
Sumit Gulwani: what constitutes a good conversation with the user. Because if you can do that, then we can also try to improve that quality.

672
00:54:15.460 --> 00:54:22.089
Sumit Gulwani: And here our key insight was to look into principles of effective communication and social sciences, literature.

673
00:54:22.310 --> 00:54:26.729
Sumit Gulwani: and we found these Gricean maxims of communication.

674
00:54:27.940 --> 00:54:33.200
Sumit Gulwani: and when we encode this kind of information in the prompts. We actually see

675
00:54:33.380 --> 00:54:43.710
Sumit Gulwani: that the quality of communication with the user goes up quite a bit. And it is these kinds of maxims that you can also use to evaluate the quality of the communication.

676
00:54:46.110 --> 00:54:48.420
Sumit Gulwani: And, in fact, you know one of the most

677
00:54:49.250 --> 00:54:53.619
Sumit Gulwani: interesting aspects of using natural language as conversation

678
00:54:54.670 --> 00:54:57.799
Sumit Gulwani: is to design these multi-agent workflows

679
00:54:58.840 --> 00:55:00.530
Sumit Gulwani: where this conversation,

680
00:55:00.680 --> 00:55:06.390
Sumit Gulwani: or this natural language, is used to describe a plan as to how to solve a given problem.

681
00:55:07.150 --> 00:55:10.949
Sumit Gulwani: and even to generate a review of the output of an Llm.

682
00:55:11.760 --> 00:55:20.449
Sumit Gulwani: This is the most happening area in the world of AI these days, you know, I did not have time to get into this. This is a completely separate topic of its own accord.

683
00:55:22.573 --> 00:55:23.416
Sumit Gulwani: But

684
00:55:24.320 --> 00:55:32.730
Sumit Gulwani: the most interesting aspect about this is that now you're not writing one prompt. What you're doing is writing multiple different prompts,

685
00:55:32.950 --> 00:55:37.030
Sumit Gulwani: one for planning, one for reviewing and so forth.
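A minimal sketch of such a multi-prompt, multi-agent workflow, with separate planner, executor, and reviewer roles; all the callables here are hypothetical stand-ins for differently-prompted agents:

```python
def plan_review_loop(planner, executor, reviewer, task, max_iters=3):
    """Run a plan -> execute -> review loop, revising the plan from the
    reviewer's feedback until the reviewer accepts the output."""
    plan = planner(task, None)
    output = None
    for _ in range(max_iters):
        output = executor(task, plan)
        accepted, feedback = reviewer(task, output)
        if accepted:
            return output
        plan = planner(task, feedback)  # revise using the review
    return output  # best effort after the iteration budget
```

Each role would be backed by its own prompt, which is exactly the "multiple different prompts" point above.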

686
00:55:39.420 --> 00:55:45.640
Sumit Gulwani: And now, if prompting becomes our new programming paradigm, or at least our AI programming paradigm,

687
00:55:46.050 --> 00:56:01.310
Sumit Gulwani: Then we saw some instances of how to actually construct such prompts in the first place. But one big opportunity that we have is also in how to maintain these prompts, and this is the opportunity to define the software engineering equivalent in this new programming paradigm:

688
00:56:01.990 --> 00:56:11.879
Sumit Gulwani: for instance, what parts of a prompt are really leading to a given buggy behavior, or if I were to change this part of the prompt which tests are likely to break.

689
00:56:13.690 --> 00:56:15.860
Sumit Gulwani: Okay. So now let me conclude.

690
00:56:15.930 --> 00:56:20.999
Sumit Gulwani: So the number one idea I wanted to share in this talk was around building neurosymbolic systems.

691
00:56:21.150 --> 00:56:30.060
Sumit Gulwani: Now, the symbolic, logical reasoning based techniques are the ones that can give you validation and correctness, but the big challenge there is that they require very domain specific investments.

692
00:56:30.200 --> 00:56:33.589
Sumit Gulwani: On the other hand, Llms have very broad applicability.

693
00:56:33.610 --> 00:56:36.040
Sumit Gulwani: and I believe their combination can be potent.

694
00:56:36.930 --> 00:56:47.050
Sumit Gulwani: Now, one way to combine them is to use the neural model first, for instance, to figure out the examples, and then use the symbolic model to generate the program from those examples.

695
00:56:47.240 --> 00:56:49.540
Sumit Gulwani: or you can even use them the other way around.

696
00:56:50.670 --> 00:56:53.940
Sumit Gulwani: But you can also combine them in more intricate ways.

697
00:56:54.600 --> 00:56:58.890
Sumit Gulwani: So I showed you how the programming-by-example architecture of Flash Fill works,

698
00:56:59.020 --> 00:57:05.760
Sumit Gulwani: where neural models can actually be used to provide hints about which path in the search to explore,

699
00:57:05.950 --> 00:57:08.159
Sumit Gulwani: or even for ranking.

700
00:57:09.220 --> 00:57:17.380
Sumit Gulwani: and it can even be the other way around, where the symbolic model actually provides hints to the neural model, which we saw in the execution-guided search.

701
00:57:19.770 --> 00:57:25.209
Sumit Gulwani: In fact, there is a talk that I would highly recommend you watch. It was given by Noam Brown at UW

702
00:57:25.230 --> 00:57:27.049
Sumit Gulwani: a few days ago.

703
00:57:27.150 --> 00:57:31.259
Sumit Gulwani: where he talks about the importance of search at inference time.

704
00:57:32.120 --> 00:57:35.790
Sumit Gulwani: and our ability to generate such trajectories

705
00:57:36.270 --> 00:57:40.319
Sumit Gulwani: can actually be used to fine tune the underlying models in the 1st place.

706
00:57:41.780 --> 00:57:53.309
Sumit Gulwani: and then the second final idea I wanted to leave you with is that there are many different forms of expressing intent which might be more natural than natural language in respective settings.

707
00:57:53.480 --> 00:57:55.750
Sumit Gulwani: you know, examples being one of them.

708
00:57:55.780 --> 00:58:00.449
Sumit Gulwani: and sometimes the user's intent is also hidden inside that temporal context as I showed you.

709
00:58:00.980 --> 00:58:08.990
Sumit Gulwani: And user interactivity is key to enabling reliability and correctness for sophisticated long running tasks.

710
00:58:09.250 --> 00:58:12.550
Sumit Gulwani: And here I showed you this, investigate and respond agentic pattern

711
00:58:12.830 --> 00:58:17.430
Sumit Gulwani: which really captures when the LLM should be talking to the user,

712
00:58:17.460 --> 00:58:22.809
Sumit Gulwani: and the existing maxims of communication, which talk about how to interact with the user.

713
00:58:23.670 --> 00:58:29.530
Sumit Gulwani: So I will stop here. This is my final summary slide, and would be happy to take on any questions.

714
00:58:32.020 --> 00:58:55.510
Dilma Da Silva: Thank you so much, Sumit. I certainly super enjoyed your talk, and I am so jealous that you found a way of explaining to your father what you did. This is so powerful to all of us. With my PhD thesis in operating systems, I really struggled to find a way of communicating it.

715
00:58:55.510 --> 00:59:05.130
Dilma Da Silva: And I was so very inspired by the origin story for Flash Fill, you know, you're trying to solve a problem from a real user. So thank you, thank you for that.

716
00:59:05.706 --> 00:59:10.723
Dilma Da Silva: We have here a few questions. The first one will exercise your

717
00:59:11.120 --> 00:59:24.560
Dilma Da Silva: guessing of context, because the question came in the middle of the talk, though when I noticed it could have been a few minutes after. It was when you were talking about

718
00:59:24.560 --> 00:59:45.060
Dilma Da Silva: repetitive tasks that users do, and learning from them, and you motivated all of us to look at accessibility as another dimension that we should consider. So it was in that part of the talk, and the question, which I'll read, is: how could this technology be transferable to emergency services?

719
00:59:45.070 --> 00:59:54.060
Dilma Da Silva: So if something comes to mind, you tell us; otherwise we can just wait, or the person can ask again, now a little more generically. But anything

720
00:59:54.130 --> 00:59:56.700
Dilma Da Silva: about applying

721
00:59:56.740 --> 01:00:03.320
Dilma Da Silva: these ideas of reducing repetitive work to emergency services? Anything come to mind?

722
01:00:04.680 --> 01:00:14.980
Sumit Gulwani: Yes. So I would say that temporal context is one aspect that is today missing from the data that these large language models have been trained on.

723
01:00:15.080 --> 01:00:21.490
Sumit Gulwani: So these large language models have been trained on a snapshot of the world that exists at a given point in time,

724
01:00:22.310 --> 01:00:26.380
Sumit Gulwani: and there is a lot of value in leveraging the signals

725
01:00:26.600 --> 01:00:28.499
Sumit Gulwani: that have happened.

726
01:00:29.399 --> 01:00:29.910
Sumit Gulwani: Before

727
01:00:30.180 --> 01:00:32.839
Sumit Gulwani: now these signals can be of 2 different kinds.

728
01:00:32.960 --> 01:00:38.969
Sumit Gulwani: One is for the very specific instance that you are trying to solve.

729
01:00:39.460 --> 01:00:44.080
Sumit Gulwani: where, as I showed, if you want to edit code,

730
01:00:44.620 --> 01:00:49.449
Sumit Gulwani: it would be good to see what other edits you have done in the very recent past.

731
01:00:50.570 --> 01:00:53.499
Sumit Gulwani: But this kind of temporal context

732
01:00:53.600 --> 01:00:56.319
Sumit Gulwani: scales beyond that as well.

733
01:00:56.560 --> 01:00:59.510
Sumit Gulwani: And this gets you into the category of personalization

734
01:01:00.260 --> 01:01:03.639
Sumit Gulwani: of the AI model.

735
01:01:03.920 --> 01:01:05.770
Sumit Gulwani: knowing more about you.

736
01:01:06.170 --> 01:01:08.300
Sumit Gulwani: how you have done things in the past.

737
01:01:08.420 --> 01:01:11.509
Sumit Gulwani: maybe even how other users, you know, have done things in the past,

738
01:01:12.020 --> 01:01:17.079
Sumit Gulwani: and that, I also believe, is going to be one of the next big AI frontiers.

739
01:01:17.950 --> 01:01:23.369
Sumit Gulwani: So I don't know the details. You know, of the application that you have in mind.

740
01:01:24.382 --> 01:01:26.910
Sumit Gulwani: But I'm absolutely confident

741
01:01:27.010 --> 01:01:30.139
Sumit Gulwani: that temporal context is the key.

742
01:01:30.570 --> 01:01:37.219
Sumit Gulwani: either for making the current instance more effective, or for this space of personalization,

743
01:01:38.500 --> 01:01:42.130
Sumit Gulwani: and our opportunity to get there

744
01:01:42.380 --> 01:01:47.609
Sumit Gulwani: would be to collect this kind of data which has this temporal aspect

745
01:01:48.200 --> 01:01:55.169
Sumit Gulwani: and then use it to train foundational models with this kind of data, or to fine tune existing models.

746
01:01:56.150 --> 01:01:57.250
Dilma Da Silva: Okay, very good.

747
01:01:57.711 --> 01:02:13.980
Dilma Da Silva: The second question came when you were talking about Visual Studio supporting debugging, helping the user to debug. So it was about Copilot providing fixes for the users. And the question is

748
01:02:14.100 --> 01:02:15.700
Dilma Da Silva: whether

749
01:02:16.600 --> 01:02:30.340
Dilma Da Silva: these techniques can be applied across many domains of programming languages: procedural, functional, object-oriented, scripting, and logic programming languages, for instance.

750
01:02:31.131 --> 01:02:37.449
Dilma Da Silva: The person who asked the question has been converting Python and Perl code into Rust:

751
01:02:37.830 --> 01:02:43.990
Dilma Da Silva: are similar constructs available in Rust for each of these language types, for that kind of transformation?

752
01:02:44.870 --> 01:02:48.169
Sumit Gulwani: Yes, so I'll break down that question into 2 parts

753
01:02:48.538 --> 01:02:57.339
Sumit Gulwani: one is, what would it take to do a more effective job on different kinds of debugging problems?

754
01:02:57.410 --> 01:03:03.139
Sumit Gulwani: And the second question would be about the specific migration scenario that you might be interested in.

755
01:03:03.850 --> 01:03:08.759
Sumit Gulwani: So on the 1st part what I focused on was

756
01:03:08.780 --> 01:03:10.590
Sumit Gulwani: the importance of

757
01:03:10.630 --> 01:03:19.289
Sumit Gulwani: letting the Llm. Figure out more information if it believes that it does not have the right information to answer the question.

758
01:03:20.250 --> 01:03:22.680
Sumit Gulwani: and when it figures out more information.

759
01:03:22.870 --> 01:03:26.890
Sumit Gulwani: it can either use the tools that are made available to it.

760
01:03:27.280 --> 01:03:32.350
Sumit Gulwani: such as look at the results of a program analysis, or look at the errors

761
01:03:32.900 --> 01:03:34.810
Sumit Gulwani: in a given watch window

762
01:03:35.994 --> 01:03:40.725
Sumit Gulwani: or it can go back to the user and ask them some

763
01:03:41.300 --> 01:03:46.550
Sumit Gulwani: more details that are not evident to the Llm. Even from the context.

764
01:03:47.620 --> 01:03:52.340
Sumit Gulwani: In fact, you know, another analogy that I have is what happens in a courtroom.

765
01:03:53.020 --> 01:03:57.040
Sumit Gulwani: when the judge is presented with the various facts. The judge does not

766
01:03:57.110 --> 01:03:59.180
Sumit Gulwani: announce a decision immediately.

767
01:03:59.690 --> 01:04:01.700
Sumit Gulwani: The judge investigates

768
01:04:02.190 --> 01:04:14.220
Sumit Gulwani: When they get one piece of information, then, based upon that, they might actually investigate even more and figure out more questions, until they feel confident that they have all the right information to be able to pass judgment.

769
01:04:14.480 --> 01:04:22.569
Sumit Gulwani: And it's very important to be able to make the retrieval of this information iterative and also involve the user in doing this right? So this is what I focused on.

770
01:04:22.690 --> 01:04:28.949
Sumit Gulwani: Now, what I did not talk about was that, of course, you know, there are some other domain specific aspects that are very important.

771
01:04:29.410 --> 01:04:33.880
Sumit Gulwani: So one aspect was, how do users themselves

772
01:04:34.120 --> 01:04:35.320
Sumit Gulwani: go about

773
01:04:36.030 --> 01:04:40.570
Sumit Gulwani: doing this kind of debugging and fixes, and that workflow that they use

774
01:04:40.760 --> 01:04:43.780
Sumit Gulwani: is also important to capture in this process.

775
01:04:43.850 --> 01:04:52.989
Sumit Gulwani: So one thing that we captured at a very high level was that in order to suggest a fix, you first need to localize. So if you look at the entire literature

776
01:04:53.020 --> 01:05:14.659
Sumit Gulwani: on program repair, you will see that one of the first pillars there is that you need to first understand what the issue is, then localize where the bug might be, and then suggest a repair. So we did incorporate this kind of agentic workflow into our agent architecture. And now you can start making it more specific. So if you're looking at functional languages or declarative languages, and so on,

777
01:05:14.660 --> 01:05:35.329
Sumit Gulwani: it would be good to do a case study, or gain some understanding of how typical developers go about fixing these bugs, and see if you can encode that process inside this agentic architecture that I showed you, right? So I talked in very general terms, but there's definitely opportunity for us to specialize these agentic architectures

778
01:05:35.590 --> 01:05:39.120
Sumit Gulwani: to incorporate the workflows that humans have. So that's 1

779
01:05:39.360 --> 01:05:48.609
Sumit Gulwani: But the second part, you know, that I would make out of your question is about the very specific application of migration, such as wanting to convert C to Rust,

780
01:05:48.920 --> 01:06:01.709
Sumit Gulwani: or wanting to migrate your code from an old version of Java to a new one, or from an old version of .NET to a new version of .NET. These are very sophisticated, longer-running tasks that

781
01:06:01.770 --> 01:06:04.830
Sumit Gulwani: might take months, you know, for people to actually do.

782
01:06:05.180 --> 01:06:07.799
Sumit Gulwani: In fact, you know, a lot of the world today runs on COBOL,

783
01:06:07.820 --> 01:06:22.079
Sumit Gulwani: because that's verified software, right? You know our airplanes, and so on. It would take quite an effort to modernize these things. And now, with the power of Llms, we can start, you know, dreaming and thinking about this kind of software debt that we have in the society.

784
01:06:22.970 --> 01:06:24.015
Sumit Gulwani: And

785
01:06:25.080 --> 01:06:43.319
Sumit Gulwani: And the most important thing I would say here is involving the user. When you have a sophisticated, longer-running task, errors at each step continue to get multiplied, and you will end up with completely wrong translations, or the system might get stuck on a completely wrong path.

786
01:06:43.520 --> 01:07:01.789
Sumit Gulwani: So it is very important to involve the user in the process. In these agentic workflows, it is very important to be able to backtrack, to employ some kind of search, and it is very important to have some kind of validation and verification loops as well. This is probably the place where you can bring in some of our work in the space of program analysis and verification.

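The validate-and-backtrack loop described above can be sketched as follows. This is a minimal illustration, not any specific product's implementation; `propose_fix` and `validate` are hypothetical stubs standing in for an LLM call and a test/verification step.

```python
# Sketch of an agentic repair loop with validation and backtracking.
# propose_fix stands in for an LLM-backed repair suggestion; here it is
# a stub that cycles through candidate patches so the control flow runs.

def propose_fix(bug, attempt):
    # Placeholder for an LLM call that proposes a candidate patch.
    candidates = ["patch_a", "patch_b", "patch_c"]
    return candidates[attempt % len(candidates)]

def validate(patch):
    # Placeholder for running tests or program analysis on the patch.
    return patch == "patch_c"

def repair_with_backtracking(bug, max_attempts=5):
    """Try candidate fixes, validating each; backtrack on failure."""
    for attempt in range(max_attempts):
        patch = propose_fix(bug, attempt)
        if validate(patch):
            return patch   # verified fix found
    return None            # no verified fix; escalate to the user

print(repair_with_backtracking("null deref in parser"))  # patch_c
```

Returning `None` rather than an unverified patch is the point of the loop: when verification keeps failing, the system defers to the user instead of committing a wrong translation.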
787
01:07:02.810 --> 01:07:06.739
Sumit Gulwani: This is also the place where

788
01:07:07.240 --> 01:07:29.310
Sumit Gulwani: this logical reasoning based techniques can actually be combined. You know, with this with these neural techniques. So this is this is probably the most happening thing that is in the world of AI today is multi agents. And if you're interested to learn more, you can actually watch out for the leaderboard on the suite bench benchmark. So that's 1 place where many organizations are competing against trying to solve these sophisticated software engineering tasks.

789
01:07:31.160 --> 01:07:37.720
Dilma Da Silva: So the next question comes from Greg Hager. He's the lead here at CISE. In other words, my boss.

790
01:07:37.790 --> 01:07:47.649
Dilma Da Silva: and Greg says very interesting work and examples. One question he has is how this will evolve as language models evolve. For example.

791
01:07:47.730 --> 01:08:05.470
Dilma Da Silva: an updated Llm. May produce different outputs, for example, than its potentially weaker predecessor. Do you have any ideas? How we, managing versioning of hints and prompts in the underlying Llm. Based technology itself.

792
01:08:06.390 --> 01:08:09.079
Sumit Gulwani: So that's a great question.

793
01:08:09.980 --> 01:08:11.010
Sumit Gulwani: And

794
01:08:11.200 --> 01:08:20.000
Sumit Gulwani: at a philosophical level, what I'm continuing to tell myself and make my choices as I pick up problems to solve

795
01:08:20.370 --> 01:08:26.189
Sumit Gulwani: is to be very intentional about picking problems where you're not facing these headwinds.

796
01:08:26.210 --> 01:08:39.289
Sumit Gulwani: You know, you think that is a problem with the models. You work on some problem only to figure out that your work is no longer relevant, because new version of the model, you know, does not need that it already, you know, solves that problem.

797
01:08:39.550 --> 01:08:58.769
Sumit Gulwani: You know, you want to be picking up problems where you have tailwinds. So maybe you have a solution that does not work very well today. But your approach likely stands a test of time. And then, as models evolve, you know, your solution just automatically keeps getting better. So it's easier said than done. Because this is the constant, you know, worry on my mind as we

798
01:08:58.800 --> 01:09:01.880
Sumit Gulwani: as we continue to do our own research

799
01:09:02.140 --> 01:09:06.940
Sumit Gulwani: in this, in this space. Now, now, more directly answering your question

800
01:09:07.362 --> 01:09:14.609
Sumit Gulwani: one of the problems that we have inside releasing these technologies inside real products is exactly around this model migration.

801
01:09:14.710 --> 01:09:16.789
Sumit Gulwani: So we spend a lot of time.

802
01:09:16.800 --> 01:09:21.640
Sumit Gulwani: please, inside Microsoft to deal with issues like, you know, responsible. AI

803
01:09:22.936 --> 01:09:27.040
Sumit Gulwani: making sure there are no security issues and and so forth.

804
01:09:27.880 --> 01:09:35.170
Sumit Gulwani: And when. And we also want to make sure that we are able to use the best of the models. So when new models come in, we want to migrate to those models.

805
01:09:36.260 --> 01:09:42.260
Sumit Gulwani: and invariably what happens is that all of your benchmarking that you have

806
01:09:42.359 --> 01:09:44.219
Sumit Gulwani: suddenly breaks down

807
01:09:44.250 --> 01:09:46.230
Sumit Gulwani: as soon as the model switch happens.

808
01:09:46.420 --> 01:09:53.649
Sumit Gulwani: And this is a place where you need to. Now rethink your prompts. So today I would say the prompting is still a very flaky business.

809
01:09:54.483 --> 01:09:58.689
Sumit Gulwani: I touched a little bit upon it when I talked about prompt maintenance.

810
01:09:58.740 --> 01:10:06.180
Sumit Gulwani: and one of the places where it hits us hard is actually, when model migration happens when you have to migrate to a different model.

811
01:10:06.500 --> 01:10:08.200
Sumit Gulwani: it might be a more powerful model.

812
01:10:08.380 --> 01:10:13.089
Sumit Gulwani: But the way you wrote the prompt may not be very effective, and you might get worse performance.

813
01:10:13.300 --> 01:10:19.529
Sumit Gulwani: So now you are in a situation where you have to figure out how to update the prompt so that you get better performance with the new model.

814
01:10:19.820 --> 01:10:25.370
Sumit Gulwani: And this is the opportunity for us to build, you know, some good software engineering practices here

815
01:10:26.270 --> 01:10:28.650
Sumit Gulwani: you might have a huge benchmark.

816
01:10:28.840 --> 01:10:35.530
Sumit Gulwani: but then, now, if you make a small change in the prompt, if you have to run through your entire benchmark, it's very expensive, also very time consuming.

817
01:10:35.720 --> 01:10:46.819
Sumit Gulwani: How can we get some kind of quick feedback as to what part of the prompt. Should you change in order to fix this regression issue, or if you were to change this part of the prompt, what tests might possibly break.

818
01:10:47.240 --> 01:10:54.279
Sumit Gulwani: And one interesting idea in this front might be to look at the attention mechanism of these models.

819
01:10:54.580 --> 01:10:56.359
Sumit Gulwani: So if you can get heat, map

820
01:10:56.470 --> 01:11:01.550
Sumit Gulwani: of what aspects of the prompt is the model paying more attention to

821
01:11:01.810 --> 01:11:10.989
Sumit Gulwani: when it tries to do a certain kind of task, then it might give you some hints on which parts of the tests are more sensitive, to which parts of the prompt

822
01:11:11.200 --> 01:11:16.609
Sumit Gulwani: and this kind of stuff might help you. Be do more agile software engineering.

823
01:11:17.120 --> 01:11:17.820
Sumit Gulwani: Not

824
01:11:18.305 --> 01:11:33.605
Sumit Gulwani: maybe some of these problems will go away once we get to even better and better models where these models are not as flaky with respect to how we write prompts. But today this is not the case. And I say that this is a big opportunity for us to think about our software engineering in this new world of

825
01:11:35.150 --> 01:11:46.519
Dilma Da Silva: Okay, thank you. We have comments from the audience about this being a brilliant session, excellent responses, and an insightful presentation. We also have a question:

826
01:11:46.580 --> 01:11:54.539
Dilma Da Silva: how can we ensure fairness of metadata for any model? Are we following fair principles.

827
01:11:56.300 --> 01:11:59.419
Sumit Gulwani: Sorry, Dilma, I did not get the question. Fairness of what?

828
01:11:59.610 --> 01:12:01.330
Dilma Da Silva: Fairness of metadata.

829
01:12:02.580 --> 01:12:04.800
Sumit Gulwani: I see. Yes.

830
01:12:04.800 --> 01:12:13.120
Dilma Da Silva: So how can we ensure fairness of metadata for any model? And then, are we following fair principles?

831
01:12:15.360 --> 01:12:19.740
Sumit Gulwani: So I feel that fairness can be built at different layers.

832
01:12:20.190 --> 01:12:29.299
Sumit Gulwani: So one is that these innovations, as they keep happening around in the world. You know, companies like, you know, big companies like open AI are constantly watching.

833
01:12:29.818 --> 01:12:37.570
Sumit Gulwani: Which of these techniques is successful, and it becomes actually part of the model itself. So that is one layer that you that you actually get

834
01:12:38.430 --> 01:12:44.699
Sumit Gulwani: the second layer where you can incorporate any of these responsible AI concerns

835
01:12:44.840 --> 01:12:52.920
Sumit Gulwani: would be in trying to develop some kind of filter when the users query comes in. So that would be the second layer of defense that you can put in.

836
01:12:52.920 --> 01:12:53.510
Dilma Da Silva: Hmm.

837
01:12:53.800 --> 01:12:58.290
Sumit Gulwani: And then the third layer of defense would be once the output is actually generated.

838
01:12:59.110 --> 01:13:07.400
Sumit Gulwani: So again, the answer to this, you know, will be very specific for different kinds of domains where you're looking for this responsible AI aspects which I believe are extremely important.

839
01:13:07.900 --> 01:13:15.770
Sumit Gulwani: And but generally answer that I can give is that there are 3 different layers where you can, where where this can be affected. Right? One is probably out of your control.

840
01:13:15.840 --> 01:13:16.889
Sumit Gulwani: which is

841
01:13:16.900 --> 01:13:22.220
Sumit Gulwani: the the companies organizations that actually develop these models. So they take care of it.

842
01:13:22.840 --> 01:13:41.950
Sumit Gulwani: Thing that are in your control is to put a filter when the users input comes in and the query comes in, and then you can decide, how you want to transform, if at all, before passing into the model, and then the 3rd layer of defense would be when the output comes out, and then you figuring out whether it meets the responsibility, criterion or not.

843
01:13:42.100 --> 01:14:03.479
Sumit Gulwani: Now to evaluate whether it meets the responsible AI criteria or not to develop these filters that I'm talking about, especially in the Pre. Pre. When the users query comes in, or or when the output from the model comes in, you can itself use Llm. Also, right? So there's a lot of work that is going on in in even train these Llms. To be able to pass a judgment on on responsible AI aspects.

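The three layers of defense described above can be sketched as a simple pipeline. This is only an illustration; the blocklist check, `call_model`, and `output_filter` are hypothetical stubs standing in for real classifiers or LLM-based judges.

```python
# Sketch of layered responsible-AI defenses: an input filter on the
# user's query, the model call (whose own alignment is the first layer),
# and an output filter judging the generated response before release.

BLOCKLIST = {"disallowed_topic"}  # stand-in for a real input classifier

def input_filter(query):
    # Second layer of defense: screen the incoming query.
    return not any(term in query for term in BLOCKLIST)

def call_model(query):
    # First layer lives inside the model itself; this stub echoes a reply.
    return f"answer to: {query}"

def output_filter(response):
    # Third layer of defense: judge the output before it reaches the user.
    return not any(term in response for term in BLOCKLIST)

def answer(query):
    if not input_filter(query):
        return "[query refused]"
    response = call_model(query)
    if not output_filter(response):
        return "[response withheld]"
    return response

print(answer("how do loops work"))  # answer to: how do loops work
print(answer("disallowed_topic"))   # [query refused]
```

In a real system each stub would itself be a trained classifier or an LLM judge, as the talk notes, but the control flow of the three layers is the same.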
844
01:14:04.540 --> 01:14:22.150
Dilma Da Silva: Okay, we now have only one minute, but there is a very interesting question. I'll read it; you probably won't have time to really go into detail. It is about incorporating AI models into code, where the code itself may be stochastic.

845
01:14:22.250 --> 01:14:33.600
Dilma Da Silva: So when you're doing that in the programming process. So one example, you're using a computer visual model in an application with the computer, visual model itself may not always provide correct output.

846
01:14:33.760 --> 01:14:38.850
Dilma Da Silva: and which itself may be evolving its capabilities over time.

847
01:14:40.270 --> 01:14:44.859
Sumit Gulwani: Yes, a very fascinating observation. And what I will say is that

848
01:14:45.560 --> 01:14:54.140
Sumit Gulwani: in this new world of AI programming, the best perspective that you can take is that you have a very powerful piece of technology at hand, which is these, you know, large language models.

849
01:14:54.270 --> 01:15:01.560
Sumit Gulwani: but they are not a very reliable form of computation, so very powerful, but not 100% reliable and approximately correct.

850
01:15:01.610 --> 01:15:06.680
Sumit Gulwani: And this is the place where I feel neurosymbolic methods hold the key to doing things.

851
01:15:07.100 --> 01:15:14.120
Sumit Gulwani: So if you are able to evaluate the quality of the output of a given call to the large image model.

852
01:15:14.160 --> 01:15:19.839
Sumit Gulwani: if you're able to reflect on it, and this can become the basis of doing some kind of search and backtracking.

853
01:15:20.730 --> 01:15:41.429
Sumit Gulwani: So in domains where you have the generative verification gap. No, this is a slide that I showed you the Youtube slide. The talk from this Openai researcher that was given at Uw actually goes into this phenomenon that whenever you have a domain where validation is relatively easy compared to generation.

854
01:15:41.530 --> 01:15:52.169
Sumit Gulwani: you have a very interesting situation to do, search and backtracking and reflection on, whether it was correct or not, but neurosymbolic solutions, you know, would be the key there, and search and backtracking.

855
01:15:52.680 --> 01:16:02.690
Dilma Da Silva: Very good. Thank you so much for the fascinating talk; we're really very grateful. For the NSF people around in the talk, we will have the opportunity to talk more to

856
01:16:02.820 --> 01:16:09.399
Dilma Da Silva: Dr. Sumit Gawani, in 15 min, but for the rest of the audience, thank you for being here.

857
01:16:09.500 --> 01:16:16.100
Dilma Da Silva: Thank you so much sumit, and you should be hearing the big claps coming, you know, from all people watching. Thank you.

858
01:16:16.100 --> 01:16:21.219
Sumit Gulwani: And thank you, everyone, for the wonderful questions. I really enjoyed this interaction, and thanks, Dilma, for the opportunity.

859
01:16:21.220 --> 01:16:23.399
Dilma Da Silva: Thank you. Thank you. Everyone. Bye.

860
01:16:24.010 --> 01:16:24.670
Sumit Gulwani: Bye.

