Disclaimer: The purpose of this blog post is to share my thoughts and concerns regarding the policy framework for Artificial Intelligence (AI). I have more questions than solid recommendations, and this may be due to gaps in my knowledge of the subject area or a complete failure to understand the framework and its implications. I am keen to hear your views on the subject.
With a self-driving Uber car killing an Arizona pedestrian, Elaine Herzberg, a lot of new questions and concerns surfaced in both social and print media: What was the cause of the accident? Who should be held responsible for it? How should insurance work in such instances? While legal and insurance companies still look confused over the issue and law-enforcement colleagues keep their silence on the traffic and overall safety situation, the so-called "Data Scientists/AI junkies" are struggling to hide behind the curtains, as fingers are being pointed at them for having built the autonomous car's models in the first place.
After this incident, AI policy for autonomous vehicles has gained eyeballs and all the right attention; however, AI is not just about these sexy driverless cars. AI is now fully embedded in our lives: from ordering a pizza to booking a cab or a flight, every interaction is AI-enabled. Whether we are talking to Siri or Alexa, or buying a favourite product from our favourite store, all decisions have AI as a common denominator. It is defining and redefining our behaviours and our interactions with things, people and places; it is defining a new behavioural economics.
Amazon Alexa has perfect ears and a friendly voice for my 6-year-old. He consumes pretty much whatever Alexa has to offer; whether it is his favourite music, a joke or a story, he is dependent on his new-found buddy. As parents, we are generally anxious and interested to know what kind of friends our kids hang out with, and which social or environmental factors influence or shape their behaviour. What will happen if Alexa starts playing content that is inappropriate for my 6-year-old? The memories of the "Blue Whale" game are still fresh for most of us as parents.
I am not scared of technology, and my intent is not to scare you either. The point is that we need to think seriously about AI and its implications, so that we can frame a policy which prepares us and guides our way in this "new behaviour space".
Data is the lifeblood of businesses and the key ingredient in AI recipes. Rather than staying glued to old ways of thinking, we need to look at data with a whole new perspective, one that enables us to build a strong foundation for our AI policy framework.
Interestingly enough, we are seeing localization of data emerging as a big-ticket item on the CIO agenda, especially in the BFSI domain. The Reserve Bank of India's mandate on data privacy and localization has further increased the seriousness of, and spend on, this regime. Is that a sufficient measure from a policy perspective? I am not debating the physical storage of the data or its proximity to the origin of its generation or ownership. However, looking at it purely through the lens of physical location presents a very myopic view of the problem statement. Whether the data is physically stored in Singapore-, US- or India-based data centres, the question still remains: how vulnerable is our data, both at rest and in motion?
With the adoption of Generative Adversarial Networks (GANs), a lot of data is now fabricated and generated by machines using pre-trained models and transfer learning techniques. These models, which are trained globally and backed by strong support from a community of researchers, universities, governments and programmers, are leveraged to solve business problems across the globe. In such cases, what kind of localization policy should we adopt when the knowledge sits within the connected world?
To put things in perspective, let's try to understand the scenario using one example from the real world. I am a consultant based in India, hired by a client to build a model which can classify rooftops and then calculate their size in order to estimate the cost of fabricating solar panels for their end clients. To calculate rooftop size, I am using a CNN-based pre-trained model from YOLO. The underlying model has been trained for rooftop identification using data captured by researchers in the US and other parts of the world. Will leveraging this pre-trained model and transferring its learning be allowed under a data localization act?
In transfer learning, typically only the weights of the last few hidden layers are adjusted to meet the requirements of the use case. Who should own the intellectual property rights to that model? Will it be the person adjusting the last few layers, or the company or team which built the model in the first place? How can we bring a "rights management" discipline to the world of Artificial Intelligence?
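To make the ownership question concrete, here is a minimal transfer-learning sketch in Keras. It assumes a generic ImageNet-trained backbone rather than the actual YOLO rooftop pipeline, and the dataset variables are hypothetical placeholders: the globally trained weights stay frozen, and only the small head added locally is trained on the client's data.

```python
# Minimal transfer-learning sketch: freeze a globally pre-trained backbone and
# retrain only the last few layers on locally collected rooftop data.
# The dataset variables below are hypothetical placeholders.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import MobileNetV2

# Backbone trained on data gathered outside India (ImageNet).
base = MobileNetV2(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # the "globally trained" knowledge stays untouched

# Only these locally added layers are trained on the client's data.
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # rooftop vs. non-rooftop
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# rooftop_images / rooftop_labels: hypothetical locally collected training data.
# model.fit(rooftop_images, rooftop_labels, epochs=5)
```

Everything inside the frozen backbone carries the "foreign" knowledge; only the two small Dense layers encode anything learned from local data. That split is exactly what makes ownership and localization so hard to assign.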
In a world where Corporate Darwinism has moved into the digital netspace, enterprises are witnessing the emergence of self-serviced, long-tailed business environments where different constituencies of shared interest co-exist and flourish across the world. What will that mean under a localization act?
Let's zoom into the data which is stored in the so-called "local data stores or data centres" and review it through a data security lens. When we talk about data security, we generally talk about two key aspects: a) data or information security at rest, and b) data or information security in motion.
Logical access management processes and access controls, along with data tokenization routines, make sure that the right data is viewed by people with the right level of access, using the right tools and applications. They also ensure that there is a proper lineage for all data elements. Can these principles be applied to deep neural nets, where the features are derived by the network itself rather than extracted or identified by human agents? For these models to work, they need a lot of fine-grained, feature-rich data as input. Sometimes it becomes very hard to describe what goes on under the hood of all the hidden layers; if that is the case, how will our traditional lineage tools produce lineage maps and graphs?
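For a conventional record store, tokenization and lineage are easy to reason about. Here is a minimal sketch, with hypothetical field names, of what a tokenization routine does: sensitive values are swapped for opaque tokens, and only the token vault can map them back.

```python
# Minimal sketch of data tokenization for a conventional record store:
# sensitive values are swapped for opaque tokens, and only the token vault
# can map them back. Field names below are hypothetical.
import secrets

token_vault = {}  # token -> original value; in practice a secured service

def tokenize(value: str) -> str:
    token = secrets.token_hex(8)
    token_vault[token] = value
    return token

record = {"name": "A. Kumar", "aadhaar": "1234-5678-9012", "city": "Pune"}
safe_record = {k: (tokenize(v) if k in {"name", "aadhaar"} else v)
               for k, v in record.items()}
print(safe_record)  # downstream models and analysts only ever see the tokens
```

Every field here is explicit, so a lineage tool can trace exactly which token came from which column. Once a deep net derives its own internal features from raw pixels or text, there is no equivalent table to point the lineage tool at.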
Now let's look at data in motion. Traditional approaches rely on secured data transmission between source and destination; the existing view treats computing nodes as endpoints and tries to secure those. With IoT and connected objects, including people, how can trust be established for secure transmission? How can I be sure that my data is transmitted securely over the wire and across all connected devices? Even before that, do I know who is transmitting what information? How can I be certain that my personal data, which I have authorized Model A to use, is not transferred to or put to some other use by Model B, which acts as a downstream of Model A? The Facebook and Cambridge Analytica case is a perfect example of this.
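The conventional endpoint-to-endpoint view looks something like the sketch below: verify the remote server's TLS certificate, then transmit. The URL and payload are placeholders.

```python
# Conventional "data in motion" protection: verify the remote endpoint's TLS
# certificate before transmitting anything. The URL and payload are placeholders.
import json
import ssl
import urllib.request

context = ssl.create_default_context()  # verifies the certificate chain and hostname

payload = json.dumps({"device_id": "sensor-42", "reading": 21.7}).encode("utf-8")
request = urllib.request.Request(
    "https://example.com/ingest",  # placeholder endpoint
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request, context=context) as response:
    print(response.status)
```

This protects one hop between two known endpoints. It says nothing about what Model B does with the data after Model A has legitimately received it, which is precisely the gap the Cambridge Analytica episode exposed.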
Let's dive deeper into these models and see what goes on under the hood. I might use a neural net to compose a song or produce music. Such an ML model takes lyrics from over a million songs across the globe and then tries to create new lyrics based on the knowledge it has captured by identifying patterns across the corpus I provided. This type of music and composition is already becoming mainstream. In such a scenario, who should get the royalty for the composition: all million lyricists, the neural net model, or the client who asked me to produce the lyrics? Again, how will "rights management" work in these scenarios?
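As a toy stand-in for the neural model, here is a tiny word-level Markov generator; the corpus is a placeholder, but the learn-patterns-then-generate loop is the same idea in miniature.

```python
# Toy stand-in for a lyric-generating model: learn word-transition patterns
# from a corpus and sample new lines from them. In practice this would be a
# neural language model trained on a million-plus songs; the corpus here is
# a placeholder.
import random
from collections import defaultdict

corpus = [
    "love me tender love me true",
    "true love waits for no one",
    "no one knows the trouble I see",
]

transitions = defaultdict(list)
for line in corpus:
    words = line.split()
    for current_word, next_word in zip(words, words[1:]):
        transitions[current_word].append(next_word)

def generate(seed: str, length: int = 8) -> str:
    words = [seed]
    for _ in range(length - 1):
        choices = transitions.get(words[-1])
        if not choices:
            break
        words.append(random.choice(choices))
    return " ".join(words)

print(generate("love"))  # every generated word traces back to someone's lyric
```

Every word the generator emits traces back statistically to someone's lyric, yet no single song is copied, which is exactly why royalty attribution is so slippery.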
Models like these are increasingly federated: they sit on edge computing devices and either access a copy of the model from a central repository or download it and run it locally. These are a few of the open questions that still need to be addressed by copyright and intellectual property law.
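A minimal federated-averaging sketch, with entirely synthetic numbers, shows why this setup blurs ownership further: the data never leaves the edge devices, and only weight updates travel back to the central repository.

```python
# Minimal federated-averaging sketch: each edge device fits the shared model
# on its own local data, and only the resulting weights travel back; the
# central repository averages them. The data never leaves the device.
import numpy as np

global_weights = np.zeros(3)  # the model held in the central repository

def local_update(weights: np.ndarray, local_data: np.ndarray) -> np.ndarray:
    # Stand-in for on-device training: nudge weights toward the local data mean.
    return weights + 0.1 * (local_data.mean(axis=0) - weights)

device_datasets = [np.random.rand(20, 3) for _ in range(5)]  # 5 edge devices

for round_num in range(10):
    updates = [local_update(global_weights, data) for data in device_datasets]
    global_weights = np.mean(updates, axis=0)  # only weights are aggregated

print(global_weights)
```

The resulting global weights are a blend of every device's private data, owned cleanly by no single contributor.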
The AI industry in India is in its infancy. While a few players are trying to define standards, there is still a huge gap between the information or knowledge that already exists and what is within the reach of a firm operating in this industry. Information capture, knowledge codification and information dissemination are key areas where the Government needs to build a policy framework, so that it can communicate with the industry and its players and reduce information asymmetries in the AI marketplace.
The Indian government has published two AI roadmaps: the Report of the Task Force on Artificial Intelligence, by the AI Task Force constituted by the Ministry of Commerce and Industry, and the National Strategy for Artificial Intelligence, by NITI Aayog. The NITI Aayog strategy proposes the creation of a National AI Marketplace comprising a data marketplace, a data annotation marketplace, and a deployable model/solutions marketplace. The question is: how can we build a federated data lake where companies, researchers and developers can access the data and use it to drive innovation and economic growth? How can we bring about transparency and regulate the development of algorithms so that they are traceable, auditable and explainable?
India has still not completed its digitization journey; the data fabric and online registries for common services such as medical care, health, education and welfare are missing. How can we build the foundation for a strong data-driven economy and decision-making process? Without solid intellectual property laws and adequate spending on research and development, can AI, or an AI policy, be of any use?