克里斯Bledsoe:At Workday, we understand the importance of a thriving developer ecosystem, which is why continuing to deliver innovative capabilities for those building on the Workday platform is a key priority. On this episode of the Workday Podcast, we're going to talk about how we're doing that. I'm your host, Chris Bledsoe, and I'm joined by Shane Luke, Vice President of Machine Learning at Workday, who's going to share insight into how Workday is delivering innovation for developers with machine learning. Thanks for joining me today, Shane. Before we dive in, can you please share a bit about your background and your current role at Workday?
Shane Luke:是的,你的赌注。我的背景技术y for my entire career. and I've done technology development across a number of different domains from hardware, low-level software, all the way up to high-level applications. I actually started my career in the gaming industry, building really high-performance games, tier-one console games, which was super fun, online platforms there. It was awesome. I actually took a little hiatus from my career to go back and do a master's degree. And that was actually in AI. When I did it, it was really because I thought, hey, I'm at the point where I won't be able to just go do things for intellectual sake anymore. I won't be able to just go have fun learning. I'm going to be learning on the job, but I'm going to be working. So at the time, I never thought it would be something that I could actually earn an honest living at. And so it's kind of a thrill for me that that's what I do. The last 7 or 8 years of my career have been focused very much on artificial intelligence and in particular machine learning to achieve artificial intelligence. So that's a little bit about me. I've worked in large companies and startups. I did something similar to what I'm doing at Workday at Nike prior to this. I was in a couple of startups that had acquisitions of various levels of success. But, I'm here, not on a yacht in the Mediterranean, so, you know, I'm still working and really enjoying applying AI and machine learning at Workday.
克里斯Bledsoe:Well, that's fantastic, okay, I just got to ask. So you said you did gaming. What kind of games? Would we know of any? Or would any of our audience practically go, hey, that was a wicked cool game?
Shane Luke:Well, you would know them because I mostly worked on sports games in EA, so EA Sports titles, which were super fun. I played basketball in college, and I worked on basketball. NBA Live was my first game that I worked on. So you probably heard of that.
克里斯Bledsoe:Yeah.
Shane Luke:I worked on the FIFA franchise for a little bit and FIFA Street, and World Cup 2006. And so, yeah, I did a lot in games. And I actually moved from doing games directly as the online gaming revolution took hold, which was sort of like a subset of this whole, like, dot com Internet explosion. I led a team that built parts of the online platform for EA. And so that's what I did in gaming, and it was super fun. You would have heard of those games whether you play sports games or not, so yeah.
克里斯Bledsoe:That's fun. That's really cool. So one of the things that comes up is that how does Workday's platform-first approach to AI and ML differ from the traditional approach of using ML as a feature?
Shane Luke:Yeah, for sure. So I would contrast the way I would think about ML and AI as an academic with what we do inside Workday. And I think most companies do it a little bit more like what you did in a research setting, which is you have a problem, you think about how you might solve that problem with data, then you spend a lot of time in the research component solving that problem with a pretty fixed dataset. You know, we do training. We call it training and tests. So we have a dataset. We split it. We do training with some. Then we test it with others. And when that works, you're like, okay, I've done something good with my machine learning. Let me go apply that. I'll make a product out of it. Well, what you've done there probably only does that one thing well. It might not even do it well when you put it into a real product in the real world because you might get data that's not clean or is out of what we call out of distribution with, with the data you trained on. But if it does, it probably does that one thing well. We knew from the get-go at Workday that it wasn't really going to work to expand this and make it a core technology that we could use if we did it that way. We'd do some good. We could deliver features to customers, and we could keep adding incrementally to the list of things that we had, but we weren't gonna add exponentially to that list. And so the purpose of the platform approach is that. The implementation of the platform approach is kind of hard. You would always say, hey, of course, hey, Chris, would you rather have a single feature or would you rather have a platform? You're gonna say I'd rather have a platform, of course. But machine learning is not easy to build into a platform that really generalizes well. It's one of the reasons why you see such excitement when you see, like, these large language models doing really general things well. Because it usually doesn't. You can see that in, you know, self-driving, and the joke is it's always ten years away, right?
Shane Luke:And so we had to kinda try to find clever ways to be able to generalize machine learning with all the sorts of variation that you'd get across customers. Right? We have this unified data model, which is awesome. So that's a great starting point for us to do this. But customers use everything in a different way. They come from different industries, different geographies. They have vastly different employee counts across different verticals. And so a lot of the research that we've done was to abstract a lot of those differences away into what we call ML core components. And these things are core parts of our platform. They're built off of the Workday data model, but they're created by us to run in parallel to it, and we do machine learning on those. So they're learned data representations that we build that are meant to generalize across broad swaths of the product. And the investments we made in that, it took a lot of research. The tools have evolved a lot over this time too, so we're faster at it now than we were then. But we've been doing that for the last two and a half, three years, and now it's starting to pay off. And that's really at the core of our platform. The other part of the platform is the automation that we've built. It's nice to think, hey, we're going to have hundreds of thousands of features. I want that. We're going to do that. That's my plan, I'll do it or die trying. But you need software to manage that. If we needed to scale the people that were touching it, all parts of that along with thousands of features, well, that would be very difficult to do. So we had to, in parallel, build up an automation platform that really handles all of this, things like monitoring them, deploying them, rolling them back if they're not working, all that kind of stuff. And so that's, that's really what we've done with the platform.
克里斯Bledsoe:Yeah, one of the things that's always struck me is that our customers have their own set of data that they want to do this type of analysis on, right?
Shane Luke: Right.
克里斯Bledsoe: But the challenge is that's just your data, right?
Shane Luke:Right.
克里斯Bledsoe:And you probably want, like you said, a large data model to be able to figure that out and understand it and actually drive information and decisions and insight that you wouldn't be able to see. Have we looked at stuff like that, like, whether, you know, combining data from multiple obfuscates so that people don't know what it is so that they could do those kinds of questions and that kind of understanding?
Shane Luke:是的,我们有,我们在相对的早期阶段of this. The nice thing about early stages in this space is that they don't last as long as the early stages did in previous generations of technology, right? Early stage doesn't mean, hey, let's put this in our release that's in, you know, the late '20s. Like, we're gonna do this pretty quickly. One of the ideas around mixing data with this obfuscation away is the idea of a data clean room where you can bring data in, but the people who are using it for various insights, they can get aggregations, but they can't get down into the details. So data clean rooms are a fairly well-known space. It's difficult to do well. It takes some effort. But, it's something that we could do. Aside from that, the way we built the platform, you can get a lot of this without having to have a clean room where a customer is interacting with it. Because, you know, we deliver global models that can be trained across many customers without commingling data. So we can get the benefit of a lot of these global models. In some cases, you don't need customer data to do this. in Skills Cloud, we train Skills Cloud as a global thing. It's the set of skills in the world. We take that to be sort of true of the world, not only Workday. And then we can tune it down to a particular customer. So we start with this big global model, and then we go down to the customer level, you know, what really matters to you. Somebody from my team was talking about how you could essentially map skills for your organization against Skills Cloud skills and see the relationships between those. So it's an example of getting that benefit.
克里斯Bledsoe:Well, I really like it too, because part of the machine learning APIs that we're exposing through Workday Extend and other technologies is super helpful. And I think that people need to be inspired. Right?
Shane Luke:Right.
克里斯Bledsoe:'Cause sometimes people are like, I know I have a problem that I need to solve, but I don't know how to go about it in the way that it's going to be efficient and quick and fast, right?
Shane Luke:Right.
克里斯Bledsoe:So I guess my question is more around when you've talked earlier about our structure and how we're doing this. So I understand there's, like, four key architectural AI and ML differentiators for developers.
Shane Luke: Right.
克里斯Bledsoe:And can you maybe amplify a little bit upon that?
Shane Luke:Yeah, so one is the data. We've talked a lot about that. So I won't dwell on it, but the-- I think the fact that we have really good data quality, and data quality scales ML better than data quantity. And it's kind of a myth that quantity is the, is the be-all-end-all for machine learning. Yes, quantity with quality, yes, that's better. But things are not always equal. And in the real world, you don't always have data quality. We have that. We also have a lot of quantity, but the quality is really the differentiator. So that's one. And in addition to that, for data, the level of stewardship that we offer, we manage your data with high-reliability software. You know, machine learning can be scary. And so one big differentiator is, I think we can make it not so scary because your data is very secure. We manage it with high-reliability software, like I said, that is gonna adhere to the same level of standards you were used to when you decided to go, you know, into the cloud with Workday. So that's one big thing. Another one is the core components I'd mentioned. So most companies just don't have that. You know, they have approached MLS features, at least in the enterprise space. And it's easier to do that. The enterprise space has a lot of variance in how customers configure and use your software. And so it makes that challenging, so we have that as well. Federated learning is one that we didn't touch on earlier, and the idea of federated learning is that you can essentially move this learning, the training of models, right?
克里斯Bledsoe: Mm-hmm.
Shane Luke:You hear a lot about machine learning models learning or training, kinda synonymous. Typically that's done in a central environment, a large central environment. And we have those. We have two of them, one in Europe and one in the US. But we have a model as well now where we can actually train online right alongside your tenant. And so we could have a global model, let's say, from a number of customers who have contributed data to our central environments and are maybe in early adopter programs and innovative customers. For a customer that might be constrained around where their data can go, we can actually deploy that model locally for them, train it locally on their data, and then that cycle can repeat. And so being able to have this distributed system that can bring your data to where it has to be for you to be compliant with, maybe, government regulations or your own internal policies, whatever it might be, that's another differentiator that we've built into the platform over time.
克里斯Bledsoe:That's really interesting. So are you seeing in the marketplace interest in terms of sharing these m-machine learning data models with one another? Right? So let's say I'm in the same industry, right, whether I'm in entertainment or whether I'm in packaging goods and stuff like that, I would think there'd be a lot of overlap where it's like, hey, we could learn together and, and cr-create a value in doing that.
Shane Luke:Yeah, I think there is. And, there's a little bit of both, candidly. You know, there are, there are customers who are like, hmm, what we have, benefiting that, you know, bright competitor, and I won't name any names there, but you could probably guess at what some of those might be. but in general, there is, I think, also a desire to say, hey, like, it benefits everyone if we are getting better-trained models that make better predictions. And, you know, you getting better predictions and me getting better predictions are both good. And so yeah, there is. There is a lot of sensitivity around that because you have to be able to do that without the possibility of having co-mingling of data, a data breach, or a data leak. And so we're pretty cautious about how we do it. But the federated learning paradigm I mentioned is really good for that. So that's one nice way to be able to get the benefit of training amongst many different clients without, without commingling of data.
克里斯Bledsoe:Fantastic. Can you give us examples of how Workday supports the developer ecosystem using AI and ML capabilities? And why is this such a unique opportunity for developers today?
Shane Luke:Yeah, so the biggest way we do it is with Extend. We're all very excited because there are a bunch of services that are new, and they're across a broader range of capabilities than we've ever had before. And that's a relatively small subset of the capabilities we actually have in-house, you know.
Shane Luke:Mm-hmm.
Shane Luke:You know, we're stepping this in phases, obviously. That's going to grow. And so by far, the biggest play we have today is in exposing capabilities in Extend. And the thought there is that it's really good to make use of what we have. For the most part, some customers might want to do completely their own machine learning. And that's totally fine. Right? We have some very technically sophisticated customers that are going to do that. A lot of our customers don't, but they want the capabilities. And the things that they might want aren't, you know, exactly what we would want to vertically integrate in a product. And they may apply only to a few customers, or the details of what you need might vary across customers, and th-that's a great way for the developer community to bridge that gap. Right? We see that across lots of other vibrant developer communities. And so we're empowering them that way. The other thing that we're considering doing, candidly, going a level down from application developers to people who actually work in machine learning, is becoming much more vocal and active in the open-source community. We leverage open-source, of course. We can contribute back when, when we do that. but we'd like to be more active contributors in that. And so that enables developers to be able to make use of if we solve a problem that other people could benefit from, some of those things we can open-source and allow them to be able to do it. We haven't done it yet. We're, we're thinking about it, and it's kind of a new muscle for us, but that's another way we're thinking about empowering the developer community.
克里斯Bledsoe:So if you're a developer and you're like, you know, I've heard about this AI/ML, and I'm excited to use the APIs that we're providing as a part of our solutions deck, what's a good way for somebody to learn and grow in this? Right? Because we hear stuff about generative AI and generative GPT and all that. Well I wanna learn, where is this headed? Or what can I learn, and what would be a good way to do that?
Shane Luke:确定。所以我认为不同的人。这取决于一个little bit on your learning style, but I'll give some general principles that I think are really useful. One is it's a little bit hard to filter the signal from the noise. So you could hear a lot of people who claim to be experts telling you things, whether it's in a YouTube video, a Medium article, podcast, whatever it might be, right? so put some effort into that and, you know, whether it's using trusted networks, whether it's using, sort of community monitoring these things with likes and comments and stuff like that, but the amount of material that's available at every level, like down to videos that are, like, code-level videos. Andrej Karpathy is a, kind of a famous guy in the AI community, has a video where he essentially goes through, like, a line-by-line implementation of a model like the GPT models.
克里斯Bledsoe:Hmm.
Shane Luke:所以这可能是太深对有些人来说,但是we're talking about the developer community here, and we're talking about people that might use APIs. So you can go down to that level, right, and see an expert actually who is a researcher at OpenAI and worked on it actually talking through how this is implemented in Python. So, you have access to resources like that. And I think th-the, you know, picking the level that matters for you is really important. The other thing is that, more and more quickly, we're seeing applications released that are leveraging these capabilities. So if you want to be at a higher level where you're not model-building, you're looking at more of like, what should I do with it; where is this going from interaction paradigms or applications in traditional apps that are now going to leverage these capabilities? I think just keeping your eyes wide open for all of these new things as they come out and actually using them and playing with them is a really good way to just learn, like, what would I want to do. The APIs themselves are quite easy to use. You know, anybody who's a, you know, competent developer can read the documentation and go use them in Extend. And, and where, and where that's not true, I'm sure the Extend team wants the feedback about that, and they'll make them better. So you know, I think that that's really powerful. I mean, for example, I think this is a good way to make an example. So generative AI bursts on the scene, everybody's excited. Our team's excited. We all work in AI and machine learning all day, every day, but people are still really excited about this. And people stood up demos in about a day, working demos that were getting real outputs with a UI on top of them. We did a quick hackathon. It was a two-day hackathon. I think we had 35 demos after that. And these are prototypes that really work. You know, it's real data going in, real data coming back out. So yeah, I think all those resources like the speed with which you can get up to speed on them are really good. And the other thing is nobody knows exactly where it's going. Right?
克里斯Bledsoe: Right.
Shane Luke: So we'll see.
克里斯Bledsoe:So one of the things that seems to come up often is trustworthiness. Right?
Shane Luke:Yeah.
克里斯Bledsoe:How do you ensure that the content that you're building and what you're getting out of the AI model is trustworthy and valuable?
Shane Luke:So this is an open area of research, frankly. And I think that that's one important thing. It's one of the reasons why we wouldn't just kind of willy-nilly release--we could do that. I mean, we have that capability in-house. We could absolutely put a chat window in there that you could interact with large language models, put a prompt in, and get something back. But we wouldn't do that quite yet because we have to be at a point where we have to mitigate those things. So one is it's going to take some time because prompt injection, which is this idea that sort of an adversarial or skilled user on the other end could put an input into, into a prompt that would sort of game the output of the LLM to make it do something that the developer didn't intend. That is a fairly hard problem to mitigate, and there are different solutions to it, but people are smart, and they quickly adapt any solution that's out there, and they know what's there.
Shane Luke:这很像安全,那么,哟u have adaptive adversaries on the other side. On the content moderation side, those tools are all for the output, so for the LLM not to put out a toxic output or what-whatever a case might-- hallucination, whatever the case might be. That's improving. There are anti-hallucination technologies that are rapidly evolving there. So that's improving as well. But it's developing. So I think that one of the things is we do need to let that develop. Just like, you know, people are excited about the capabilities, so there's a lot of focus on that. Now that we can see some of the downsides of the capability, more focus is going on mitigating those, and we want to get the benefit of that. It takes time and it takes effort and it takes people thinking about the problem and developing software to manage the problem. I don't think it's a huge problem, right? So I think it's a very important problem, but the frequency with which it would be something, like, really bad is not that high. So it's, that's the type of thing, it's like a high risk, in terms of the effect that would happen, but it's not that high of a risk in terms of it actually occurring. Right? And there are some existing ways to mitigate it. So that's, that's important. Certainly testing is important as well. I think when you think about testing, testing to me is the last stop. So the first stop is design. So you want to design the system for success. You want to design it so that it's unlikely to produce an output you don't want, and then you make sure it doesn't with testing. And that is something we've put a lot of focus and emphasis on inside Workday. It is a developing area for everyone that's doing this, whether it's in consumer, whether it's, you know, in enterprise, no matter who it is. and regulation around it is also coming around and, to a degree, have to be a bit reactive to that. I mean, we're helping to form it, so we are leading out, but whatever decisions get made, there are things that we do have to take account of when we're developing product.
克里斯Bledsoe:So does that affect the biases here, right? 'Cause that's what you typically hear about in the news. Like, what are the biases that are being introduced [inaudible] these models that you're creating?
Shane Luke:是啊。的事就是偏见mething that is very subjective in a lot of ways. There are cases where it's not subjective is the extremely clear case. Right? If you take the boundary case. way out, well, everybody knows, you know, okay, well that-- clearly that was not-- and I won't even go into what those could be, but something that's just obviously a red flag. And if you take the very benign cases where it doesn't seem like it's doing anything, it's not dealing with people or anything controversial, well, okay, that's, th-- but it's kind of the fuzzy area in the middle that's getting worked out a little bit right now in terms of what's acceptable and what isn't. We have some standards that come from outside of the area, things in employment law, for instance, that help guide us, you know, in things like, disparate impact and in employment selection and things like that. but I do think it's more fuzzy than it is clear, and a lot of what's happening will help clear that up. Again, going back to this idea of, like, design and then testing, design is the most important thing. People are smart, no matter how smart ChatGPT seems, it's not smart. People are smart. The people that made it are very smart. And the people who make the products inside Workday are very smart. And they're good at designing systems that are unlikely to produce this. And you do still have to check, but, I think that the most important thing is we model these out, if we think anything is going to be risky, with smart people who have done this before and have deep understanding. They all have PhDs and have studied it, and that's what they do. And then we design them for success. We design them to be as safe as possible, and then we check them to ensure that. Yeah.
克里斯Bledsoe: That's great. Finally, can you discuss the Workday Extend AI and ML services and APIs that are on the roadmap for GA in the near future and how these capabilities will be surfaced for our developers?
Shane Luke:Yeah, so we have APIs that are across a number of domains. And I don't know what the timelines are going to be exactly. They were discussed, and that's up to the Extend team to do. But the domains that we're looking at are things like document intelligence. So this is a great one in the business space, right? Whether you're in HR or whether you're in finance, there are lots of documents, and those documents have important information for your business, and they can be difficult, a-and burdensome or lab-labor-intensive in order to process and manage. So one is there. And you can imagine things like, so we do today, OCR to take an unstructured thing like an image or, or a PDF or whatever and turn it into structured text that could go into your Workday business flows. We do insights off of documents. We're w-working on content summarization as well if you wanted to summarize a whole bunch of contracts, for instance. And that's a great use case for a large language model. So that's one big domain. And we have a few that are in, you know, they're here at the conference this week that are going to be on the GA path at some point, and in particular, in OCR and insights and documents. anomaly detection is another area. And I think it's a labs API that we have now. So it's early-stage. That's an important one, especially in Financials. So if you're looking at planning or financials, detecting anomalies, we have a product today that's vertically integrated, a couple of them that do anomaly detection, Expense Protect. Journal Insights is another one. Skills is already there. It was there last year. That's going to be on a march to GA for sure. Lots of clear applications and skills that are really powerful. Forecasting, that's another one, and that's, that's one that's great. I mean, it's business planning in both HR and finance. You might be planning, in the demo that Vivek showed, it was vacation hours, you know, what's that going to look like, right? It could be your forecast for sales or your forecast for revenue. So those are the areas that we have them in. I think, when I'm talking to the Extend team and what our plans are, we want to be fairly flexible about what the GA path looks like, just because we are in an experimental phase. And events like this are great for all of us to see what people do with them, you know. Where does it break? Where are there features that we could add? Where is it super successful and we should double or triple down on or accelerate? So, that's kind of what the path looks like.
克里斯Bledsoe:这太棒了。So is there anything else that you think would be valuable for our developers? Right? So if somebody who's not only familiar with this and they're looking at these APIs and trying to identify how to leverage this in building out applications, is there anything else you think that would be useful for our developers to know and be aware of?
Shane Luke:The one thing I would say, and I say this because I have some experience with folks maybe being a bit, like, trepidatious about using, ML 'cause it's maybe like a black box or it doesn't seem, you know, doesn't hit home, they're not sure how to use it, is the best way to get into it is to just try it. Right? Called the API. This is for developers. Called the API.
克里斯Bledsoe:Yeah.
Shane Luke:See what data you get back. What could I do with that data? Right? Look at an example, if there's a reference app that's out there, what they did with it, and really start to think about it even if it looks really far away from your use case, right? An ML service, for instance, that does anomaly detection, you might be looking at a financial application, but there might be an application in something completely different than that, that, that's actually relevant to you, maybe booking of PTO or something like that, whatever it might be. So, you really can think about this very broadly. And it's very approachable because at the end of the day, when you're working with ML services like this, you don't need to really know anything about machine learning. Intentionally, that's been abstracted away to just give you a capability that you can put into your applications, a predictive capability.
克里斯Bledsoe:这太棒了。非常好,谢谢你。我意图lly appreciate your time today. you've been listening to the Workday Podcast with our guest Shane Luke. If you enjoyed what you've heard today, be sure to follow us wherever you listen to your favorite podcasts. And remember, you can find all of our entire catalog at workday.com/podcasts. I'm your host Chris Bledsoe, and I hope you have a great work day. Thank you so much for joining us.
Shane Luke:Thanks for having me.