Fellow Mac Developers,
Maybe it’s time (pun intended) we came up with more creative menubar icons?
In a previous post we looked at some of the capabilities of Siri, Apple's intelligent personal assistant. Now let's venture a little deeper into how integration with third-party services might work.
I have to point something out that should be obvious: I have no inside knowledge of Apple's product plans. This note is purely based on the publicly demonstrated features of Siri sprinkled with a dose of speculation.
Although I had hoped Apple would offer deeper Siri integration with apps in iOS 6, what they ended up showing during the WWDC 2012 keynote presentation was far more incremental: new back-end data services (sports scores, movie reviews, and local search) and the ability to launch apps by name.
(Image via Apple)
In the previous section we ended by noting which services Siri currently integrates with and how. But how might third parties, at some point, integrate into Siri as first-class participants?
Before we step into the deep end we should distinguish between several categories of services that Siri currently supports:
For any service to tie in with Siri there are two ways to do the integration tango:
This means Apple would have to publish a set of web APIs and data protocols and validate that third parties have implemented them correctly. What's more, the third parties may have to provide a guarantee of service, since to the end user any service failure would likely be blamed on Siri (at least, until it starts quoting Bart Simpson).
But Siri is a little different. Just following an interface specification may not be enough. It needs to categorize requests into sub-domains so it can narrowly focus its analytic engine and quickly return a relevant response. To do so it first needs to be told of a sub-domain’s existence. This means maintaining some sort of registry for such sub-domains.
For Siri to be able to answer questions having to do with, say, real estate, an app or web service needs to tell Siri that henceforth any real-estate related questions should be sent its way. Furthermore, it needs to provide a way to disambiguate between similar-sounding requests: say, something in one's calendar vs. a movie showtime. So Siri will have to crack open its magic bottle a little and show developers how they can plug themselves into its categorization engine.
In the Android world we see a precedent for this, where applications can register themselves to handle Intents (for example, sharing pictures). At runtime an Android app can request that an activity be located that can handle a specific task. The system brings up a list of registered candidates, asks the user to pick one, then hands over a context to the chosen app.
This registry is fairly static and limited to what’s currently installed on the device. There has been an effort to define Web Intents but that’s more of a browser feature than a dynamic, network-aware version of Intents.
It should be noted that Google Now, under the latest Jelly Bean version of Android, has opted for a different, more Siri-like approach: queries are sent to a server and are limited to a pre-defined set of services. We may have to wait for Android Key Lime Pie or Lemon Chiffon Cake (or something) before individual apps are allowed to integrate with Google Now.
Siri has proven adept at remembering necessary contextual data. Tell it to send a message to your 'wife' and it will ask you who your wife is. Once you select an entry in your address book it remembers that relationship and can henceforth handle a task with the word 'wife' in it without a hitch.
What this means, however, is that if a similar user-friendly tack is taken, the first app to plant its flag in a specific domain will likely be selected as the 'default' choice for that domain and reap the benefits till the end of time. It will be interesting to see if Apple will allow applications to openly declare domains of their own or create a sort of fixed taxonomy (for example, SIC codes) within which apps must identify themselves. It'll also be interesting to see if the App Store review process will vet apps for category 'poaching' or even classic cybersquatting.
It's likely such a categorization will be implemented as application metadata. In the case of web services it will likely appear as HTML metadata or a fixed-name plist file that Siri will look for. On the client side it'll likely show up the same way Apple requires apps to declare entitlements today: in the application plist.
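Purely as a sketch of what that plist declaration might look like, here's one way an app could surface its domains to the system. Every key and type name here (SiriDomains, SiriSampleUtterances, SiriDomainDeclaration) is invented for illustration; nothing like it exists in today's SDK.

```swift
import Foundation

// Hypothetical only: these Info.plist keys are made up for this sketch.
//
//   <key>SiriDomains</key>
//   <array><string>real-estate</string></array>
//   <key>SiriSampleUtterances</key>
//   <array><string>What are homes selling for in Noe Valley?</string></array>

struct SiriDomainDeclaration {
    let domains: [String]            // sub-domains the app claims to handle
    let sampleUtterances: [String]   // phrases to help disambiguate similar requests

    /// Reads the (hypothetical) declaration out of the app's Info.plist,
    /// much the way entitlements and URL schemes are declared today.
    static func fromMainBundle() -> SiriDomainDeclaration? {
        let info = Bundle.main.infoDictionary ?? [:]
        guard let domains = info["SiriDomains"] as? [String] else { return nil }
        let phrases = info["SiriSampleUtterances"] as? [String] ?? []
        return SiriDomainDeclaration(domains: domains, sampleUtterances: phrases)
    }
}
```

The interesting question is whether those domain strings would be free-form or drawn from a fixed, Apple-maintained taxonomy, which ties right back to the poaching concern above.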
So for a web server to become Siri compliant, it's likely it will need to:
Currently Siri is integrated with a few back-end services hand-picked by Apple and baked into iOS. To truly open up third-party access, iOS would have to allow these services to be registered dynamically, perhaps even distributed through the App Store as operating-system extensions. This could potentially give rise to for-pay Siri add-ons tied into subscription services.
’nuff said. Let’s move on to…
What if you have an app that lets you compose and send custom postcards, or order coffee online, or look up obscure sports scores? How could apps like this be driven and queried by Siri?
To see an example of this, let's look at how Siri integrates with the weather app when we ask it a weather-related question ("How hot will it be tomorrow around here?").
To answer this question it has to go through a number of steps:
Weather is one of those services where the search can be performed directly from Siri’s back-end service or initiated on the device — either through the existing stock weather app or a code framework with a private API.
Imagine where your app might fit in if the user were to make a similar voice inquiry intended for it.
For steps #1, #2, #5, and #6 to work, Siri has to have a deep understanding of your app's capabilities and tight integration with its inner workings. It also needs to be able to perform voice-to-text conversion, categorize the domain, and determine if the request is best handled by your app before actually launching it. What's more, it would need to send the request to your app without invoking your user interface!
In other words, for Siri to work with third-party apps, apps would have to provide a sort of functional map or dictionary of services, along with a way for Siri to interact with them in a headless fashion.
The Mac has had a mechanism for exactly this kind of headless access for a long time. It's called AppleScript and it's been an integral part of Mac desktop apps since the early 1990s and System 7.
For desktop apps to be scriptable they have to publish a dictionary of actions they can perform in a separate resource that the operating system recognizes without having to invoke the actual app. The dictionary lists the set of classes, actions, and attributes accessible via scripting.
For Siri to access the internals of an app a similar technique could be used. However, the app would have to provide additional data to help Siri match natural-language requests phrased in the user's own words.
The dictionary would have to exist outside the app so the OS could quickly match the user query to an app's capabilities without having to launch the app beforehand. It would likely be ingested by the OS whenever a new app is installed on the device, just as the application plist is handled today. Siri in iOS 6 will have the ability to launch an app by name; it will likely obtain the app name via the existing Launch Services registry on the device.
But a dictionary alone is not enough. Siri would need to interact with an app at a granular level. iOS and Objective-C already provide a mechanism to allow a class to implement a declared interface. It’s called a protocol and iOS is packed full of them. For an app to become Siri compliant it would have to not only publish its dictionary but also expose classes that implement a pre-defined protocol so its individual services can be accessed by Siri.
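Purely as a sketch, and assuming Apple went down this road, such a protocol might look something like the following. Every name in it (AssistantRequest, AssistantServiceProvider, and so on) is invented here; Apple has published nothing of the sort.

```swift
import Foundation

// Hypothetical sketch: a protocol an app might adopt so Siri could invoke it
// headlessly. None of these names exist in the real SDK.

/// A single request routed to the app: the matched domain, the action Siri
/// inferred, and the slots it managed to fill from the conversation.
struct AssistantRequest {
    let domain: String               // e.g. "postcards"
    let action: String               // e.g. "send"
    let slots: [String: String]      // e.g. ["recipient": "John Smith"]
}

enum AssistantResponse {
    case fulfilled(summary: String)                 // spoken/displayed confirmation
    case needsSlot(named: String, prompt: String)   // ask the user a follow-up
    case failed(reason: String)
}

/// What a "Siri-compliant" app might have to implement, alongside the
/// dictionary of actions it publishes in its metadata.
protocol AssistantServiceProvider {
    /// Can this app handle the request at all? Called before launch, against
    /// the published dictionary, so it must be cheap.
    static func canHandle(_ request: AssistantRequest) -> Bool

    /// Perform the request with no UI, returning either a result or a
    /// follow-up question for Siri to relay to the user.
    func handle(_ request: AssistantRequest,
                completion: @escaping (AssistantResponse) -> Void)
}
```

The needsSlot case is the interesting part: it's what would let Siri keep its conversational back-and-forth going on behalf of a third-party app instead of simply launching it.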
The implications of making apps scriptable would be profound for the developer community. To be clear, I don't think the scripting mechanism will be AppleScript as it exists today. The driver of this integration mechanism will be Siri, so any type of automation will have to allow apps to function as first-class citizens in the Siri playground. This means natural language support, intelligent categorization, and contextual data: capabilities beyond what regular AppleScript can provide. But given Apple's extensive experience with scripting, there is a well-trodden path for them to follow.
Once an app can be driven by Siri there's no reason why it couldn't also be driven by an actual automated script or a tool like Automator. This would allow power users to create complex workflows that cross multiple apps, as they can do today under Mac OS. If these workflows could themselves be driven by Siri, then a raft of mundane, repetitive tasks could be managed easily via Siri's voice-driven interface.
Let’s also consider that if apps are made scriptable it may be possible for them to invoke each other via a common interface. This sort of side-access and sharing between apps is currently available on the Mac but is sorely lacking in iOS. If allowed, it would open up a whole world of new possibilities.
During a conversation over beers with a fellow developer it was pointed out that Apple has no incentive to open up Siri to third parties. Given the competitive comparisons drawn between Google Now and Siri, I'm hopeful Apple will move more quickly now.
See… it shouldn’t be long
“The way [advertising] works is, we give away the product for free, then lure advertisers with the promise of connecting them to millions of people who hate to pay for things.”
Long before there was the iPhone and Android, there was PalmOS. It had a simple interface, could browse the web and access email, and you could download and install apps developed by third parties. But then something happened, and Palm started resting on its laurels. The Treo was a big hit, but there was a long stretch of time where each new Treo was only slightly different from the last one.
Many users and developers faithfully waited for Palm to come up with the successor to the Treo line. But alas, it took years and years and then Steve showed the iPhone and everything changed. Most of us moved on to the iPhone and Android and never looked back.
When webOS and the Pre finally came out, it was too little, too late. I faithfully signed up for their developer kit and went to a couple of their developer-only events down in Sunnyvale, but I came away thinking they had missed the boat. At the last developer day I figured they might shower those die-hards who had bothered to show up with free developer devices, to let them go out and evangelize the platform. But even then they didn’t get it. They raffled off a single tablet and most everyone left after lunch.
Eventually the company got sold to HP and then slowly fell apart (ignominiously, with the $99 fire sale).
The Verge has posted a detailed post-mortem of the history of webOS. It’s an entertaining read, in a rubber-necking roadside accident sort of way. But I can’t help but feel sad at how it all rolled out.
An interesting quote confirmed what most people suspected: Apple's huge cash hoard is helping it get first priority on the best parts, so any other device manufacturer not called Samsung is going to get left using second-rate parts:
At that time, Apple was almost singlehandedly dominating the smartphone supply chain and it took an enormous commitment — the kind of commitment that only a giant like HP could offer — to tip the scale. “We told HP we needed better displays [for the Pre 3]. They’d come back and say, ‘Apple bought them all. Our suppliers tell us we need to build them a factory if we want the displays’ and they weren’t willing to put the billion dollars upfront to do that,” one source said. “The same thing happened with cameras. We’d pick a part, turns out Apple picked the same part. We were screwed left and right.” Without HP’s full financial support to buy its way into relevance, Palm was essentially left to pick from the corporate parts bin — a problem that would strike particularly hard later on with the TouchPad.
From a technical point of view, I think it was an interesting experiment using WebKit for the entire UI. I personally think they could have taken an alternate path, but I applaud them for having a go at it. Someone had to try it and see if it would work.
The Enyo interface, though, was an abomination. It was an abstraction on top of two levels of abstraction, and the fact that a developer had to manually tweak JSON (!) made it obvious it was a design cul-de-sac.
It’s sad to see Palm end up this way after what surely must have been heroic amounts of engineering effort. I still own and wear a tattered Handspring t-shirt and at the bottom of my filing cabinet is an old Palm III, a color Handspring Visor, and a bunch of Treos I really should get rid of, but for some reason can’t.
Apple's Siri personal assistant was one of the main advertised features of the iPhone 4S. It is a marriage of speaker-independent voice-recognition technology and domain-aware AI. What sets it apart is how it's integrated with back-end and on-device services and how it maintains context across multiple requests, so it feels like you're having a conversation with an intelligent assistant instead of barking orders at a robot minion.
Once you get past the immediate thrill of having your phone understand your commands, talk back, and crack jokes, you quickly realize how much more powerful this all could be if Siri could do more than the relatively limited range of tasks it can currently perform.
That is, until you discover that Siri is a walled garden.
The entire iOS platform is predicated on the premise that third parties can add new features to the stock system provided by Apple. But here we have a seemingly closed system in which third parties cannot participate. As usual, Apple is mum about whether a Siri SDK will ever be released.
But the question of how third-party apps could participate in Siri has been nagging at me since I first got my hands on an iPhone 4S and started playing with it. As I thought about it more I realized that for a third-party app to play in the Siri playground, there would have to be some profound changes in the way we develop iOS apps.
To appreciate the scope of what Siri could do, we should walk through a few use cases of its current capabilities.
Let's take the case of creating an appointment. We invoke Siri by holding down the home button, which brings up a standard modal interface and turns on the microphone. We speak the words "make lunch appointment with John tomorrow at noon." Siri goes into processing mode, then generates a voice-synthesized message asking which "John" you would like to schedule a meeting with. The list of possible participants has been garnered from the address book. You respond with, say, "John Smith." Siri then displays a mini-calendar (which closely resembles a line entry in the standard calendar), makes note of any conflicts, then asks you to confirm. You respond "yes" and the event is entered into your calendar (and distributed out via iCloud or Exchange).
Let’s break down what Siri just did:
First, let's consider whether Siri does voice-to-text recognition on the device itself or uses server resources. This is easy to verify: put the phone into airplane mode and Siri can no longer operate, so it's likely it uses the server to do some (or all) of its recognition.
But how did Siri recognize that this was a calendar-related request (vs. a navigation one)? It appears that once the voice data has been converted to text form, the system categorizes the request into domains. By categorizing into well-defined domains the range of actions can be quickly narrowed. Having support for a sufficiently rich set of primitives within each domain creates the illusion of freedom to ask for anything. But of course, there are limits. Use an unusual word or phrase and it soon becomes obvious the vocabulary range has boundaries.
Once the action has been determined (e.g. schedule an event), a particular set of slots needs to be filled (who, when, where, etc.), and for each slot Siri apparently maintains a context and engages in a dialog until a minimum viable data set has been obtained. For example, if we ask it to 'make an appointment with John Smith' Siri will ask 'OK… when's the appointment?' and if we say 'tomorrow' it will come back and ask 'What time is your appointment?' Once you've narrowed it down to 'noon' it can proceed to make the appointment with the calendar service.
By all appearances it maintains a set of questions that require an answer in order for it to accomplish a task and it iterates until those answers are obtained. Note that in this context a place was not required but if offered it is used to fill out the location field of a calendar entry. Some slots are required (who and when) but others are optional (where). It should also be noted that in its current state, Siri can easily be confused if given superfluous data:
Make lunch appointment with John Smith at noon tomorrow in San Francisco wearing a purple suit.
The actual event title is recognized as "Purple suite with John Smith," with the location marked as "San Francisco wearing."
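One way to picture that slot mechanism, and this is purely a guess at what's happening behind the curtain, is a small list of required and optional slots per action, with the dialog looping until the required ones are filled. The types below are invented for illustration:

```swift
// Illustrative guess at how slot filling might be modeled; not Apple's actual design.

/// One piece of information an action needs before it can be executed.
struct Slot {
    let name: String            // "who", "when", "where"
    let required: Bool
    var value: String?          // filled in as the conversation progresses
    let followUpPrompt: String  // what Siri asks if the slot is still empty
}

struct PendingAction {
    let domain = "calendar"
    let action = "createEvent"
    var slots = [
        Slot(name: "who",   required: true,  value: nil, followUpPrompt: "Who is the appointment with?"),
        Slot(name: "when",  required: true,  value: nil, followUpPrompt: "When is your appointment?"),
        Slot(name: "where", required: false, value: nil, followUpPrompt: "Where is it?")
    ]

    /// The next question Siri would ask, or nil once every required slot is filled.
    var nextPrompt: String? {
        slots.first(where: { $0.required && $0.value == nil })?.followUpPrompt
    }
}
```

Something like this would also be consistent with the 'purple suit' example above going sideways: there's no slot for extra qualifiers, so the leftover words get stuffed into the title and location fields.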
Once all the required slots have been filled, the actual act of creating a calendar entry is pretty straightforward, since iOS offers an API for accessing the calendar database.
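For what it's worth, that last step is the one piece we can already see in public API form. Here's roughly what it looks like with EventKit; everything upstream of this call (how Siri would hand over the filled slots) is guesswork.

```swift
import EventKit

// The final "create the event" step, using the public EventKit framework.
// How Siri would arrive at the attendee and date values is speculation.
func createLunchAppointment(with attendee: String, at date: Date, store: EKEventStore) {
    store.requestAccess(to: .event) { granted, _ in
        guard granted else { return }
        let event = EKEvent(eventStore: store)
        event.title = "Lunch with \(attendee)"
        event.startDate = date
        event.endDate = date.addingTimeInterval(60 * 60)   // one-hour lunch
        event.calendar = store.defaultCalendarForNewEvents
        try? store.save(event, span: .thisEvent)            // syncs out via iCloud/Exchange
    }
}
```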
But Siri is not limited to just calendar events. As of the first release, the list of services available via Siri includes:
Several of these services (e.g. weather, stocks, Find My Friends, notes, and WolframAlpha) do not advertise a publicly accessible API. This means that for Siri to work with them, each service had to be modified to provide an integration point for Siri.
It is the nature of this integration point that would be of interest to software developers, and in the next post we'll explore it a little more to see how Apple might integrate with third parties if it chose to open up Siri to other apps.
Clayton Miller makes a good case that every OS tries to 'own' a shape as a means of creating a strong visual identity. A parallel can be drawn to how online services become strongly associated with certain characters on the keyboard.
Old timers may remember that in the early days of UUCP the ! (exclamation mark, or 'bang') was used to designate the address route, the so-called bang path, from a sender to a receiver. It was common practice to tell someone you could be reached at site!foovax!machine!username.
The @ (at-sign) superseded the bang path and was for years associated with an email address (i.e. user@domain). Then Twitter came along and adopted it as a designator for an individual (i.e. @twitterhandle). Along the way Twitter also turned the # (hash sign) into a common designator for a topic, the hashtag.
Google's Plus service has just opened its doors, and you can guess at the scope of Google's ambitions by the way they're using one of the most common punctuation marks: the + (plus sign). This is used throughout the service as a designator for a user profile. In a note posted to the Plus service, Andy Hertzfeld references a number of his colleagues who also worked on the service.
Notice how links to an individual's profile are shown. There is ubiquitous use of the + character throughout other parts of the service as well:
Even the URL for the service gets in on the act: http://www.google.com/+
It's obvious this is a conscious choice on Google's part to create a strong association and claim the character for itself. You'll know the grab has been successful when +First Lastname starts popping up on business cards the same way @username started showing up, with no explanation needed.
To complete the process Google will have to take two more steps:
It looks like we're in the early stages of the battle of the profiles. Trying to own the + sign is a smart move. Watch for other services as they try to mark their territory on the keyboard.
I’ve got dibs on?ramin.
2010 was a pretty incredible year, both professionally and personally. My iPhone app ended up winning Macworld Best of Show a few weeks after release. I was part of the team that developed the iPad Slot Machine app, which won the "Coolest App" award at the first iPadDevCamp (that's my Tinker Toy iPad stand pictured in this article, btw).
The best event by far, however, was our small family expanding by a happy, healthy, rambunctious +1.
The rest of the year was mostly a blur of WWDC, iOSDevcamps, and building cool iOS projects for clients. I did end up making time for a side project and I hope it will turn into a main project in the coming year. It’s not quite ready yet but I’ve shown it privately here in Silicon Valley and had a lot of positive feedback.
I don’t mean to build it up too much but it’s my biggest project yet, going back and revisiting some of the original ideas behind web technology and relating it to the mobile world. I’m calling it Transom. There’s still some work left but I hope to be able to share it openly in 2012.
As far as New Year's resolutions go, I'd like to make time to post more often, not only on coding matters but also on broader topics: technology trends, their impact on our lives, and subjects I think need to be talked about. I'm excited about the coming year, about life and tech and society, and hope to do a decent job of conveying that enthusiasm.
See you on the flip side and Happy New Year!
I'm not an antenna engineer, but it seems like the simplest solution would be to move the 'gap' between the two antennas to a spot unlikely to be covered by casual touch. This would work for both left- and right-handed users.
From a hand-placement point of view, the ideal spot would be in the middle of the cable connector, but that could introduce structural issues with the plug.
It's time, once again, for the annual love-fest that is WWDC, which starts next week. I've been to each one since the iPhone launch (I know, that makes me a relative newbie), but having spent a good chunk of the past two decades living in San Francisco, I figured I'd combine tech tips for first-time attendees with social things to do for out-of-towners.
OK, so much for the conference itself. What about the social life?
Here's a good reference for organized after-session gatherings: http://wwdcparties.com. On Tuesday between 7:30 and 10:30 pm, however, a lot of people will likely head over to the Apple Design Awards ceremony. A large number will stick around for the Stump the Experts session. On Thursday night Apple throws a bash in Yerba Buena Gardens. You'll need your WWDC badge to get in, and if you're of drinking age you'll need to get a wristband at Moscone before heading over. Pace yourself on the booze if you plan on hitting any other places afterward. You get all the food and beer you can consume plus a (surprise) live musical act. And there are still Friday's sessions left.
If you get tired of eating pre-packaged sandwiches and are hankering for something slightly different, here are a few places within easy walking distance of Moscone. However, there's no guarantee they'll get you in and out in time for the afternoon sessions:
The swill they serve at WWDC may be fine for getting you over your previous night’s hangover, but you owe it to yourself to get some decent coffee while in town. Good places within walking distance:
For some reason, after a day of technical brain-melding a lot of people are extra-primed to kick back and partake of adult beverages. Go figure. If that's what you're looking to do, here are a number of local hangouts where you are sure to bump into fellow WWDC attendees:
Want to pretend you’re a local? Some are a bit of a walk, but if you get in with a group of fellow developers and want to get out of the Folsom/Howard zone, here are a few places to try (in no particular order):
Obviously I’m leaving a lot out, but this should give you a decent starting point. Feel free to post any corrections or favorites in the comments. Hope to see you all at WWDC. If anyone wants to look me up and say hi, I’m @raminf on Twitter.
Update: Added a few more places — for those in a hurry to get back to hacking and getting their mind blown.