Talk Funnel

Ramin Firoozye's (occasional) Public Whisperings

iWork and free apps

It’s the pre-Holiday/post-WWDC season and iOS and Mavericks updates have arrived with a flurry of new announcements including free versions of the iWork apps. Now you can get an OS upgrade AND a decent productivity suite — for free.

One of Apple’s big selling points is the ability to seamlessly exchange and edit files across iOS and Macs, as well as support for live multi-person web-based editing. This pitches them squarely against Microsoft Office, Office 365, and Google Docs.

To get to this point they had to do a total re-write of existing iWork apps so they would have feature parity across iOS, Mac, and the web. And by rewriting they ended up leaving some features out of the already capable desktop version.

As you might imagine this didn’t go over so well with some highly vocal people. Apple, feeling the heat, uncharacteristically put out a press release promising future enhancements — pretty much taking a page from the Final Cut Pro X playbook.

Merging the source code between iOS and Mac means they get feature parity. Doing so means they can have the same document accessible across Mac, iOS, and the web. However, it means they had to leave out certain features, notably AppleScript.

AppleScript has been an omnipresent power-user feature that has been around for quite some time. Taking it out of the flagship office suite can mean either:

  1. Apple is dropping support for scripting and automation, or
  2. We’ll be getting AppleScript across all platforms — specifically, iOS.

I’m betting on #2.

Doing so offers several benefits:

  • Apple will get even more parity between their two platforms.
  • iOS apps can be automated.
  • Nobody else supports tight scripting of the mobile workflow.
  • If developers change their apps to support scripting you would get (practically for free) Siri integration with installed apps!

If it was me I’d put AppleScript on iOS support high up on the priority list of features to announce in the next WWDC. Along with a new iWork update. And Siri Everywhere™.

Pushing out iWork in its current half-baked form worked well with the ‘all OS and apps are free’ story. That sticks it to Microsoft in a big way, makes Macs and iOS devices all the more attractive to the BYOD crowd, and lets Apple have a horse in the race against Google Apps.

But I believe the end-goal should be scripting on iOS and deep Siri integration.

It would open up a whole new way of interacting with our mobile devices.

Menubar Icon Madness

Menubarclocks

Fellow Mac Developers,

Maybe it’s time (pun intended) we came up with more creative menubar icons?

Siri and third-party apps: Part II

[ In a previous post we looked at some of the capabilities of Siri, Apple's intelligent personal assistant. Now let's venture a little deeper into how integration with third-party services might work.

I have to point something out that should be obvious: I have no inside knowledge of Apple's product plans. This note is purely based on the publicly demonstrated features of Siri sprinkled with a dose of speculation.

Although I had hoped Apple would offer deeper Siri integration with apps in iOS6, what they ended up showing during the WWDC 2012 keynote presentation was far more incremental: new back-end data services (sports scores, movie reviews, and local search) and the ability to launch apps by name. ]

Apple Siri (Image via Apple)

In the previous section we ended by noting which services Siri currently integrates with and how. But how might third-parties, at some point, integrate into Siri as first-class participants?

Before we step into the deep end we should distinguish between several categories of services that Siri currently supports:

  • On device with a public API (i.e. calendar, alarms, location-based services).
     
  • On device without a public API (notes, reminders).
  • Those that require access to server-side features (Yelp, WolframAlpha, sports scores).

First… 

The Server-side

For any service to tie-in with Siri there are two ways to do the integration tango:

  1. Comply with an interface that Siri expects (for example, a set of RESTful actions and returned JSON data).
  2. Offer a proxy service or glue handlers to bridge the gap between what Siri expects and what the service/app offers.

This means Apple would have to publish a set of web APIs and data protocols and validate that third parties have implemented them correctly. What’s more the third parties may have to provide a guarantee of service since to the end-user any service failure would likely be blamed on Siri (at least, until it starts quoting Bart Simpson).
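To make the first option a little more concrete, here is a minimal sketch of what a service answering a normalized query with JSON might look like. Everything in it is an assumption made for illustration: the field names (`domain`, `metric`, `status`, `answer`) and the `handle_query` function are hypothetical, since Apple has published no such interface.

```python
import json

# Hypothetical request/response shapes: none of these field names come from Apple.
def handle_query(raw_request: str) -> str:
    """Answer a normalized JSON query with a JSON response (illustrative only)."""
    request = json.loads(raw_request)
    if request.get("domain") != "weather":
        return json.dumps({"status": "unsupported_domain"})
    # A real service would look the value up; this one is canned.
    return json.dumps({
        "status": "ok",
        "domain": "weather",
        "answer": {"metric": request.get("metric"), "value": 72, "unit": "F"},
    })

reply = json.loads(handle_query(json.dumps(
    {"domain": "weather", "metric": "high", "date": "tomorrow"})))
```

A proxy or glue handler (the second option) would sit in front of an existing service and translate to and from shapes like these.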

But Siri is a little different. Just following an interface specification may not be enough. It needs to categorize requests into sub-domains so it can narrowly focus its analytic engine and quickly return a relevant response. To do so it first needs to be told of a sub-domain’s existence. This means maintaining some sort of registry for such sub-domains.

For Siri to be able to answer questions having to do with say, real estate, an app or web-service needs to tell Siri that henceforth any real-estate related questions should be sent its way. Furthermore, it needs to provide a way to disambiguate between similar sounding requests, say, something in one’s calendar vs. a movie show time. So Siri will have to crack open its magic bottle a little and show developers how they can plug themselves into its categorization engine.
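As a rough sketch of the registry idea, assume each app registers a sub-domain along with a handful of keywords, and that queries are routed to whichever sub-domain matches the most words. The `DomainRegistry` class, the app names, and the keyword-counting rule are all made up for illustration; whatever Siri actually does is surely far more sophisticated.

```python
from typing import Dict, Iterable, Optional, Set

class DomainRegistry:
    """Hypothetical registry mapping sub-domains to the apps that handle them."""

    def __init__(self) -> None:
        self._keywords: Dict[str, Set[str]] = {}
        self._handlers: Dict[str, str] = {}

    def register(self, domain: str, handler: str, keywords: Iterable[str]) -> None:
        self._keywords[domain] = {k.lower() for k in keywords}
        self._handlers[domain] = handler

    def route(self, query: str) -> Optional[str]:
        """Return the handler whose keywords overlap the query the most, if any."""
        words = set(query.lower().split())
        best, best_hits = None, 0
        for domain, keywords in self._keywords.items():
            hits = len(words & keywords)
            if hits > best_hits:
                best, best_hits = domain, hits
        return self._handlers[best] if best else None

registry = DomainRegistry()
registry.register("real_estate", "HouseFinder", ["house", "rent", "listing"])
registry.register("movies", "ShowTimes", ["movie", "showing", "showtime"])
```

Even a toy like this shows why disambiguation matters: a query that hits two sub-domains equally needs some tie-breaking rule, and the sketch simply keeps the first one registered.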

In the Android world we see a precedent for this where applications can register themselves as able to handle certain Intents (for example, sharing pictures). At runtime an Android app can request that an activity be located that can handle a specific task. The system brings up a list of registered candidates and asks the user to pick one then hands over a context to the chosen app:

Android intents

This registry is fairly static and limited to what’s currently installed on the device. There has been an effort to define Web Intents but that’s more of a browser feature than a dynamic, network-aware version of Intents.

It should be noted that Google Now under the latest Jelly Bean version of Android has opted for a different, more Siri-like approach: queries are sent to a server and are limited to a pre-defined set of services. We may have to wait for Android Key Lime Pie or Lemon Chiffon Cake (or something) before individual apps are allowed to integrate with Google Now.

Context is King

Siri has proven adept at remembering necessary contextual data. Tell it to send a message to your ‘wife’ and it will ask you who your wife is. Once you select an entry in your addressbook it remembers that relationship and can henceforth handle a task with the word ‘wife’ in it without a hitch.

What this means, however, is that if a similar user-friendly tack is taken, the first app to plant its flag in a specific domain will likely be selected as the ‘default’ choice for that domain and reap the benefits till the end of time. It will be interesting to see if Apple will allow for applications to openly declare domains of their own or create a sort of fixed taxonomy (for example, SIC codes) within which apps must identify themselves. It’ll also be interesting to see if the appstore review process will vet apps for category ‘poaching’ or even classic cybersquatting.

It’s likely such a categorization will be implemented as application meta-data. In the case of web-services it will likely appear as HTML metadata, or a fixed-name plist file that Siri will look for. On the client side it’ll likely show up the same way Apple requires apps to declare entitlements today — in the application plist.
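Purely as speculation, such a fixed-name plist might carry little more than a list of claimed sub-domains and an endpoint. The sketch below writes and reads one back with Python's standard `plistlib`. The keys `SiriDomains` and `SiriEndpoint` and the URL are invented for this example and do not correspond to any real Apple key.

```python
import plistlib

# Hypothetical metadata: these keys are invented for illustration.
manifest = {
    "SiriDomains": ["real_estate"],
    "SiriEndpoint": "https://example.com/siri",
}

# Serialize to the XML plist format, then read it back the way a crawler might.
data = plistlib.dumps(manifest)
loaded = plistlib.loads(data)
```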

So for a web server to become Siri compliant, it’s likely it will need to:

  1. Respond to a pre-defined Siri web interface and data exchange format.
     
  2. Provide metadata to help Siri categorize the service into specific subdomains, and
     
  3. Optionally: offer some sort of Quality of Service guarantee by showing that they can handle the potential load.

Currently Siri is integrated with a few back-end services hand-picked by Apple and baked into iOS. To truly open up third-party access iOS would have to allow these services to be registered dynamically, perhaps even registered and made available through the app store as operating system extensions. This could potentially give rise to for-pay Siri add-ons tied into subscription services. 

’nuff said. Let’s move on to…

The Client Side

What if you have an app that lets you compose and send custom postcards, or order coffee online, or look up obscure sports scores? How could apps like this be driven and queried by Siri?

To see an example of this let’s look at how Siri integrates with the weather app. When we ask it a weather related question (“How hot will it be tomorrow around here?”):

Siri example - How hot will it be tomorrow around here

To answer this question it has to go through a number of steps:

  1. Translate the voice query to text.
     
  2. Categorize the query as weather related (“hot”).
     
  3. Find our current location via location services (“around here”).
     
  4. Determine when “tomorrow” might be, given the user’s current locale.
     
  5. Relay a search to the weather server for data for the particular date and location. It understood ‘how hot’ as looking for the ‘high’ temperature.
     
  6. Present the data in a custom card to the user. Note that it responded with both a text response as well as a visual one — with ‘tomorrow’ highlighted in the results.
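Steps 2 through 5 above can be sketched in a few lines, assuming step 1 has already produced text. This is a deliberately naive illustration using substring checks; the function name `parse_weather_query` and the returned field names are my own, and Siri's real categorization is certainly not this simple.

```python
from datetime import date, timedelta

def parse_weather_query(text: str, today: date) -> dict:
    """Turn recognized text into a normalized weather query (toy version)."""
    lowered = text.lower()
    query = {"domain": None, "metric": None, "date": today, "place": None}
    # Step 2: categorize the query as weather related.
    if any(word in lowered for word in ("hot", "cold", "weather")):
        query["domain"] = "weather"
    # Step 5 (in part): 'how hot' is read as asking for the high temperature.
    if "how hot" in lowered:
        query["metric"] = "high"
    # Step 4: resolve 'tomorrow' relative to the user's current date.
    if "tomorrow" in lowered:
        query["date"] = today + timedelta(days=1)
    # Step 3: 'around here' means the device has to supply a location.
    if "around here" in lowered:
        query["place"] = "current_location"
    return query

q = parse_weather_query("How hot will it be tomorrow around here?", date(2012, 6, 11))
```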

Weather is one of those services where the search can be performed directly from Siri’s back-end service or initiated on the device — either through the existing stock weather app or a code framework with a private API.

Imagine where your app might fit in if the user were to make a similar voice inquiry intended for it.

  1. Siri might need to be given hints to perform speaker-independent voice-to-text analysis. It appears that at least part of this conversion is done on the server (turn networking off and Siri stops working) so wherever the actual text conversion code resides will need to have access to these hints (or voice templates).
     
  2. To categorize the text of the request as belonging to your subdomain Siri would need to know about that subdomain and its association to your app.
     
  3. Location would be obtained via on-device location services.
     
  4. Time would be obtained via on-device time and locale settings.
     
  5. The request would have to be relayed to your app and/or back-end service in some sort of normalized query form, with the data expected back in a similar form.
     
  6. The result could be presented to the user either via spoken text or shown on a graphic card. If it is to be presented as text there has to be enough context so Siri can answer follow-up questions (“How hot will it be tomorrow?” “How about the following week?”). If it is to be shown in a visual card format then the result has to be converted into a UIView either by Siri or your code.

For steps #1, #2, #5, and #6 to work, Siri has to have a deep understanding of your app’s capabilities and tight integration with its inner workings. It also needs to be able to perform voice to text conversion, categorize the domain, and determine if it’s best handled by your app before actually launching your app. What’s more it would need to send the request to your app without invoking your user-interface!

In other words for Siri to work with third party apps, apps would have to provide a sort of functional map or dictionary of services and to provide a way for Siri to interact with them in a headless fashion.

Sound familiar?

It’s called AppleScript and it’s been an integral part of Mac desktop apps since the early 1990s and System 7.

AppleScript editor

For desktop apps to be scriptable they have to publish a dictionary of actions they can perform in a separate resource that the operating system recognizes without having to invoke the actual app. The dictionary lists the set of classes, actions, and attributes accessible via scripting. 
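To illustrate the idea (not the actual AppleScript dictionary format), here is a toy dictionary expressed as plain data, plus a lookup that never touches the app itself. The app name `PostcardMaker`, the structure of the dictionary, and the `supports` helper are all hypothetical.

```python
# Hypothetical scripting dictionary: classes, their attributes, and actions.
POSTCARD_DICTIONARY = {
    "app": "PostcardMaker",
    "classes": {"postcard": {"attributes": ["recipient", "message"]}},
    "actions": {"send": {"applies_to": "postcard", "required": ["recipient"]}},
}

def supports(dictionary: dict, action: str, cls: str) -> bool:
    """Check whether an app declares an action for a class, without launching it."""
    entry = dictionary["actions"].get(action)
    return entry is not None and entry["applies_to"] == cls

can_send = supports(POSTCARD_DICTIONARY, "send", "postcard")
```

The point is that the operating system only needs the data, not a running app, to decide whether a request could be handled.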

For Siri to access the internals of an app a similar technique could be used. However, there would have to be additional data provided by the app to help Siri better match the natural language needs in the user’s language.

The dictionary would have to exist outside the app so the OS could match the user query to an app’s capabilities quickly without having to launch the app beforehand. It would likely be ingested by the OS whenever a new app is installed on the device just as the application plist is handled today. Siri in iOS6 will have the ability to launch an app by name. It is likely it obtained the app name via the existing launch services registry on the device.

But a dictionary alone is not enough. Siri would need to interact with an app at a granular level. iOS and Objective-C already provide a mechanism to allow a class to implement a declared interface. It’s called a protocol and iOS is packed full of them. For an app to become Siri compliant it would have to not only publish its dictionary but also expose classes that implement a pre-defined protocol so its individual services can be accessed by Siri. 
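The same idea can be sketched outside Objective-C. Below, Python's `typing.Protocol` stands in for an Objective-C protocol: any class with a matching `handle_request` method conforms. The protocol name `VoiceDrivable`, the method signature, and the `CoffeeOrders` class are assumptions for illustration, not anything Apple has announced.

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class VoiceDrivable(Protocol):
    """Hypothetical interface an app would implement to be driven headlessly."""

    def handle_request(self, action: str, slots: dict) -> dict: ...

class CoffeeOrders:
    """An app-side class that conforms without showing any user interface."""

    def handle_request(self, action: str, slots: dict) -> dict:
        if action != "order":
            return {"status": "unsupported"}
        spoken = f"Ordering a {slots['size']} {slots['drink']}."
        return {"status": "ok", "spoken": spoken}

app = CoffeeOrders()
result = app.handle_request("order", {"size": "large", "drink": "latte"})
```

Note that the runtime check only verifies that the method exists, so a real system would still need to validate what comes back.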

The implication of making apps scriptable will be profound for the developer community. To be clear I don’t think the scripting mechanism will be AppleScript as it exists today. The driver of this integration mechanism will be Siri so any type of automation will have to allow apps to function as first-class citizens in the Siri playground. This means natural language support, intelligent categorization, and contextual data — capabilities beyond what regular AppleScript can provide. But given Apple’s extensive experience with scripting there is a well-trodden path for them to follow.

Once an app can be driven by Siri there’s no reason why it couldn’t also be driven by an actual automated script or a tool like Automator. This would allow power-users to create complex workflows that cross multiple apps as they can do today under Mac OS. If these workflows themselves could be drivable by Siri then a raft of mundane, repetitive tasks could be managed easily via Siri’s voice-driven interface.

Let’s also consider that if apps are made scriptable it may be possible for them to invoke each other via a common interface. This sort of side-access and sharing between apps is currently available on the Mac but is sorely lacking in iOS. If allowed, it would open up a whole world of new possibilities.

During a conversation over beers with a fellow developer it was pointed out that Apple has no incentive to open up Siri to third-parties. Given the competitive comparisons drawn between Google Now and Siri I’m hopeful Apple will move more quickly now.

When will Apple open Siri to third-party apps

See… it shouldn’t be long :-)

In-app advertising

“The way [advertising] works is, we give away the product for free, then lure advertisers with the promise of connecting them to millions of people who hate to pay for things.”

From PROSPECTUS FOR SILICON VALLEY’S NEXT HOT TECH IPO, WHERE NOTHING COULD POSSIBLY GO WRONG.

 

WebOS postmortem

Long before there was the iPhone and Android, there was PalmOS. It had a simple interface, could browse the web and access email, and you could download and install apps developed by third-parties. But then something happened, and Palm started resting on its laurels. The Treo was a big hit, but there was a long stretch of time where each new Treo was only very slightly different than the last one.

Many users and developers faithfully waited for Palm to come up with the successor to the Treo line. But alas, it took years and years and then Steve showed the iPhone and everything changed. Most of us moved on to the iPhone and Android and never looked back.

When webOS and the Pre finally came out, it was too little, too late. I faithfully signed up for their developer kit and went to a couple of their developer-only events down in Sunnyvale, but I came away thinking they had missed the boat. At the last developer day I figured they might shower those die-hards who had bothered to show up with free developer devices, to let them go out and evangelize the platform. But even then they didn’t get it. They raffled off a single tablet and most everyone left after lunch.

Eventually the company got sold to HP and then slowly fell apart (ignominiously with the $99 fire sale).

The Verge has posted a detailed post-mortem of the history of webOS. It’s an entertaining read, in a rubber-necking roadside accident sort of way. But I can’t help but feel sad at how it all rolled out.

An interesting quote confirmed what most people suspected, that Apple’s huge cash hoard is helping it get first priority on the best parts so any other device manufacturer not called Samsung is going to get left using second-rate parts:

At that time, Apple was almost singlehandedly dominating the smartphone supply chain and it took an enormous commitment — the kind of commitment that only a giant like HP could offer — to tip the scale. “We told HP we needed better displays [for the Pre 3]. They’d come back and say, ‘Apple bought them all. Our suppliers tell us we need to build them a factory if we want the displays’ and they weren’t willing to put the billion dollars upfront to do that,” one source said. “The same thing happened with cameras. We’d pick a part, turns out Apple picked the same part. We were screwed left and right.” Without HP’s full financial support to buy its way into relevance, Palm was essentially left to pick from the corporate parts bin — a problem that would strike particularly hard later on with the TouchPad.

From a technical point of view, I think it was an interesting experiment using WebKit for the entire UI. I personally think they could have taken an alternate path ;-) but I applaud them for having a go at it. Someone had to try it and see if it would work.

The Enyo interface, though, was an abomination. It was an abstraction on top of two levels of abstraction and the fact that a developer had to manually tweak JSON (!) made it obvious it was a design cul-de-sac.

It’s sad to see Palm end up this way after what surely must have been heroic amounts of engineering effort. I still own and wear a tattered Handspring t-shirt and at the bottom of my filing cabinet is an old Palm III, a color Handspring Visor, and a bunch of Treos I really should get rid of, but for some reason can’t.

Siri and third-party apps – Part I

Siri on iPhone 4S

Apple’s Siri personal assistant was one of the main advertised features of the iPhone 4S. It is a marriage of speaker-independent voice-recognition technology and domain-aware AI. What sets it apart is how it’s integrated with back-end and on-device services and how it maintains context across multiple requests so it feels like you’re having a conversation with an intelligent assistant instead of barking orders at a robot minion.

Once you get past the immediate thrill of having your phone understand your commands, talk back, and crack jokes, you quickly realize how much more powerful this all could be if Siri could do more than the relatively limited range of tasks it can currently perform.

That is, until you discover that Siri is a walled garden.

The entire iOS platform is predicated on the premise that third parties can add new features to the stock system provided by Apple. But here we have a seemingly closed system in which third-parties cannot participate. As usual, Apple is mum about whether a Siri SDK will ever be released.

But the question of how third-party apps could participate in Siri has been nagging at me since I first got my hands on an iPhone 4S and started playing with it. As I thought about it more I realized that for a third-party app to play in the Siri playground, there would have to be some profound changes in the way we develop iOS apps.

To first realize the scope of what Siri could do we should walk through a few use-cases of its current capabilities.


Let’s take the case of creating an appointment. We invoke Siri by holding down the home button which brings up a standard modal interface and turns on the microphone. We speak the words “make lunch appointment with John tomorrow at noon.” Siri goes into processing mode, then generates a voice-synthesized message asking which “John” you would like to schedule a meeting with. The list of possible participants has been garnered from the addressbook. You respond by saying, say, “John Smith”. Siri then displays a mini-calendar (which closely resembles a line entry in the standard calendar), makes note of any other conflicts, then asks for you to confirm. You respond ‘yes’ and the event has been entered into your calendar (and distributed out via iCloud or Exchange).

Let’s break down what Siri just did:

  1. It put up a system-modal dialog.
  2. It captured some audio for processing.
  3. It processed the audio and converted it to text (often without previous training to adapt to a specific person’s voice).
  4. It managed to elicit enough context from the text to determine the request had to do with a calendar action.
  5. It extracted the name of an individual out of the text and matched it against the addressbook.
  6. It determined that the request was not specific enough and responded with a request for clarification. This request contained more data (list of other Johns) based on a search of the addressbook.
  7. It converted the request to a synthesized voice.
  8. The response was again recorded and processed for text recognition. It should be noted that one of the marvels of Siri is that it carries on a dialog in which each individual response is considered in the context of previous requests. When Siri offers a list of Johns, your response is ‘John Smith.’ Taken out of context, it would not make any sense, but Siri maintained context from a previous request and handled the response properly.
  9. Once confirmed, the event was scheduled with the calendar service.
  10. The system modal dialog can now be dismissed.
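Steps 5 through 8 are where context does the heavy lifting. Here is a small sketch of that disambiguation loop, assuming the address book is just a list of full names. The helper names `find_candidates` and `resolve_reply` are mine, and matching on the first word of a name is a simplification.

```python
from typing import List, Optional

def find_candidates(first_name: str, address_book: List[str]) -> List[str]:
    """Step 5: match a spoken first name against the address book."""
    wanted = first_name.lower()
    return [name for name in address_book if name.lower().split()[0] == wanted]

def resolve_reply(reply: str, candidates: List[str]) -> Optional[str]:
    """Step 8: interpret the follow-up only in the context of the offered list."""
    matches = [name for name in candidates if name.lower() == reply.lower()]
    return matches[0] if len(matches) == 1 else None

book = ["John Smith", "John Doe", "Jane Roe"]
candidates = find_candidates("John", book)     # more than one John, so ask
chosen = resolve_reply("john smith", candidates)
```

A reply that names someone outside the offered list resolves to nothing, which is the cue to ask again.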

First, let’s consider whether Siri does voice-to-text recognition on the device itself or uses server resources. This is easy to verify. Put the phone into airplane mode and Siri can no longer operate so it’s likely it uses the server to do some (or all) of its recognition.

But how did Siri recognize that this was a calendar-related request (vs. a navigation one)? It appears that once the voice data has been converted to text form, the system categorizes the request into domains. By categorizing into well-defined domains the range of actions can be quickly narrowed. Having support for a sufficiently rich set of primitives within each domain creates the illusion of freedom to ask for anything. But of course, there are limits. Use an unusual word or phrase and it soon becomes obvious the vocabulary range has boundaries.

Once the action has been determined (e.g. schedule event) a particular set of slots need to be filled (who, when, where, etc.) and for each slot Siri apparently maintains a context and engages in a dialog until a minimum viable data set has been properly obtained. For example, if we ask it to ‘make an appointment with John Smith’ Siri will ask ‘OK… when’s the appointment?’ and if we say ‘tomorrow’ it will come back and ask ‘What time is your appointment?’ Once you’ve narrowed it down to ‘noon’ it can proceed to make the appointment with the calendar service.

By all appearances it maintains a set of questions that require an answer in order for it to accomplish a task and it iterates until those answers are obtained. Note that in this context a place was not required but if offered it is used to fill out the location field of a calendar entry. Some slots are required (who and when) but others are optional (where). It should also be noted that in its current state, Siri can easily be confused if given superfluous data:

Make lunch appointment with John Smith at noon tomorrow in San Francisco wearing a purple suit.

The actual event is recognized as Purple suite with John Smith with the location marked as San Francisco wearing.

Once all the required slots have been filled the actual act of creating a calendar entry is pretty straightforward since iOS offers an internal API for accessing the calendar database.
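The slot-filling behavior described above can be approximated with a short loop: keep asking until every required slot has a value, and treat optional slots as a bonus. This is only my guess at the shape of it. The slot names follow the who/when/where description, while `REQUIRED`, `PROMPTS`, and `next_prompt` are invented for the sketch.

```python
from typing import Optional

REQUIRED = ("who", "when")   # must be filled before the event can be created
OPTIONAL = ("where",)        # used if offered, never asked for
PROMPTS = {
    "who": "Who is the appointment with?",
    "when": "When's the appointment?",
}

def next_prompt(slots: dict) -> Optional[str]:
    """Return the next question to ask, or None once the required slots are filled."""
    for name in REQUIRED:
        if not slots.get(name):
            return PROMPTS[name]
    return None

question = next_prompt({"who": "John Smith"})
```

Once `next_prompt` returns `None`, the filled slots can be handed to the calendar service.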

But Siri is not limited to just calendar events. As of first release the list of services available via Siri includes:

  • Phone
  • iPod/Music
  • Messages
  • Calendar
  • Reminders
  • Maps and Directions
  • Email
  • Weather
  • Stocks
  • Clock, Alarm, and Timer
  • Address Book
  • Find My Friends
  • Notes
  • Web Search
  • Search via WolframAlpha
  • Nearby services and reviews via Yelp

Several of these services (e.g. weather, stocks, Find My Friends, notes, and WolframAlpha) do not advertise a publicly accessible API. This means that for Siri to work with them, the service had to be modified so it could have an integration point with Siri.

It is the nature of this integration point which would be of interest to software developers and in the next post we’ll explore it a little more to see how Apple might integrate with third-parties if it chose to open up Siri to other apps.

 

User Interface Rorschach Test

Job’s done. Fix it or leave it alone?

 

Bathroom

Image via Reddit.

Owning a Character

Clayton Miller makes a good case that every OS tries to ‘own’ a shape as a means of creating a strong visual identity. A parallel can be drawn to how online services become strongly associated with certain letters of the alphabet.

Old timers may remember that in the early days of UUCP the ! exclamation mark (also called the bang path) was used to designate the address route from a sender to a receiver. It was common practice to tell someone you could be reached at site!foovax!machine!username.
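For anyone who never had to read one, a bang path is just a list of relay hops followed by the user name. A tiny sketch (the function name `parse_bang_path` is mine):

```python
from typing import List, Tuple

def parse_bang_path(address: str) -> Tuple[List[str], str]:
    """Split a UUCP bang path into its relay hops and the final user name."""
    *hops, user = address.split("!")
    return hops, user

hops, user = parse_bang_path("site!foovax!machine!username")
```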

The @ (at-sign) superseded the bang path and was for years associated with an email address (i.e. user@domain). Then Twitter came along and adopted it as a designator for an individual (i.e. @twitterhandle). Along the way Twitter also turned the # (hashtag) into a common designator for a topic abbreviation.

Google’s Plus service has just opened its doors and you can guess at the scope of Google’s ambitions by the way they’re using one of the most common punctuation marks: the + (plus sign). This is used throughout the service as a designator for a user profile. In a note posted to the Plus service Andy Hertzfeld references a number of his colleagues that also worked on the Plus service.

Pluspost

Notice how links to an individual’s profile are shown. There is ubiquitous use of the + character through other parts of the service as well:

Plus Header

Even the URL for the service gets in on the act: http://www.google.com/+

It’s obvious this is a conscious choice on Google’s part to create a strong association and help adopt the letter for itself. You’ll know the grab has been successful when + First Lastname starts popping up on business cards the same way @username started showing up with no explanation needed.

To complete the process Google will have to take two more steps:

  1. Allow a Google profile to have a real name instead of the current numeric designator. My Google profile is: https://plus.google.com/100474999684775059354. It should be https://google.com/+/ramin firoozye or at the very least https://plus.google.com/ramin firoozye. Naturally, there will be contention for common names, but that’s an unavoidable problem on just about every service. Both LinkedIn and Facebook support shortened URLs with names pointing to public profiles.
  2. Modify the Google search engine so names prefixed with a + are automatically looked up in the Google Plus or Profile service. The Google search box is everywhere. It should be easy to look someone up by simply entering +name in the search box. This might be a little tough since + is also a boolean operator in the Google search world. But they could make an exception for a single-word search if it starts with a +.
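The single-word exception suggested in the second step is easy to state in code. This is only a sketch of the rule as I've described it, not of anything Google does, and `is_profile_lookup` is a made-up name.

```python
def is_profile_lookup(query: str) -> bool:
    """Treat a query as a profile lookup only if it is one word starting with '+'."""
    terms = query.split()
    return len(terms) == 1 and terms[0].startswith("+") and len(terms[0]) > 1

lookup = is_profile_lookup("+ramin")
```

Anything with more than one term keeps the usual meaning of + as a search operator.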

It looks like we’re in the early stages of the battle of the profiles. Trying to own the + sign is a smart move. Watch for other services as they try to mark their territory on the keyboard.

I’ve got dibs on ?ramin.

2011 New Year’s Resolutions

2010 was a pretty incredible year both professionally and personally. My iPhone app ended up winning Macworld Best of Show a few weeks after release. I was part of the team that developed the iPad Slot Machine app which won “Coolest App” award at the first iPadDevCamp (that’s my Tinker Toy iPad stand pictured in this article, btw).

The best event by far, however, was our small family expanding by a happy, healthy, rambunctious +1.

The rest of the year was mostly a blur of WWDC, iOSDevcamps, and building cool iOS projects for clients. I did end up making time for a side project and I hope it will turn into a main project in the coming year. It’s not quite ready yet but I’ve shown it privately here in Silicon Valley and had a lot of positive feedback.

I don’t mean to build it up too much but it’s my biggest project yet, going back and revisiting some of the original ideas behind web technology and relating it to the mobile world. I’m calling it Transom. There’s still some work left but I hope to be able to share it openly in 2012.

As far as New Year’s resolutions go, I’d like to make time to post more often and not only on coding matters but also broader topics of technology trends, its impact on our lives, and about subjects I think need to be talked about. I’m excited about the coming year — about life and tech and society — and hope to do a decent job of conveying that enthusiasm.

See you on the flip side and Happy New Year!

Simple Solution to iPhone 4 Antenna Problem

Before:

iphone4-before.png

After:

iphone4-after.png

I’m not an antenna engineer but it seems like the simplest solution would be to move the ‘gap’ between the two antennae where it’s unlikely to be covered through casual touch. This would work for both left and right-handed users.

From a hand-placement point-of-view the ideal spot would be in the middle of the cable connector but that could introduce structural issues with the plug.