Lessons learned at Google Cloud Next 17

March 15, 2017

The Google Cloud Next 17 conference took place at the Moscone Center in San Francisco on Wednesday through Friday of last week. An overwhelming number of sessions, technologies, and product releases were presented; a Google blog article lists 100 announcements from the conference across the Google Cloud offerings. The schedule is available on the conference website and videos of the keynotes and sessions are on the Google Cloud YouTube channel. While I have experience with several of Amazon’s cloud technologies, acquired in the course of previous projects to construct distributed systems, I have had limited exposure to the Google cloud technologies. Research IT sent me to the Next conference in order to assess Google’s offerings in this space.

I planned my agenda around the technologies most applicable in a research environment. I also outlined a list of questions for Google engineers based on my experience prototyping with App Engine, Earth Engine, and Apache Beam, as well as using the Drive API in Jupyter notebooks. Given all that was on offer, I am sure that I only took in a fraction of the available information. This article highlights some of the ideas and capabilities that I found most interesting and most likely to address challenges faced by our research partners.

Google announced a number of advancements to the G Suite Drive capabilities, which will soon be available to Berkeley users via the campus G Suite for Education license. Team Drive is a new product that enables data to be owned by, and shared among, a team instead of an individual team member. Permissions to files owned by the team can be refined to restrict editing, commenting, or deleting. File Stream is a new product that makes files available on demand rather than mirroring them to local disk storage -- sidestepping a common problem with the mirroring model, namely that it can unexpectedly or unintentionally consume large amounts of disk space on a user’s personal computer. Finally, Google announced that it had acquired AppBridge, which provides tools to facilitate migrating multiple types of content to Google Drive. Both File Stream and the AppBridge products are currently under evaluation by the bConnected team for rollout to the Berkeley campus.

The theme of day three was open source software, and the keynote highlighted the success of both Kubernetes and TensorFlow across a wide range of use cases, along with Google’s many other open source contributions. I attended the presentation titled Microservices and Kubernetes: New functionality to assemble and operate applications at a higher level, which covered publishing, discovery, and provisioning of microservices using a service catalog. Research IT will be considering how the microservices pattern could provide common capabilities to UC Berkeley researchers. In any case, Kubernetes, gRPC, and the Go programming language, all open source technologies, are a powerful mix.
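To make the discovery side of that pattern a little more concrete, here is a minimal sketch (not the Service Catalog API itself) that lists the Services registered in a Kubernetes cluster using the official kubernetes Python client; the namespace and the local kubeconfig are assumptions on my part.

```python
# Sketch: enumerate the Services a cluster exposes, i.e. the stable names and
# ports that other microservices can discover and call.
from kubernetes import client, config

config.load_kube_config()   # assumes a local kubeconfig; use load_incluster_config() inside a pod
v1 = client.CoreV1Api()

for svc in v1.list_namespaced_service(namespace="default").items:   # "default" namespace is an assumption
    ports = ", ".join(str(p.port) for p in (svc.spec.ports or []))
    print("%s  cluster-ip=%s  ports=%s" % (svc.metadata.name, svc.spec.cluster_ip, ports))
```

A service catalog builds on this same idea, adding a broker layer so that services can be published and provisioned on demand rather than merely looked up.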

One of the topics that interested me most was Apache Beam, the open source version of Google Dataflow. Beam is a unified programming model for creating data processing pipelines in either batch or streaming mode. Workflows created in Beam can run on Apache Flink, Apache Spark, and Google Cloud Dataflow, and Beam provides both Java and Python SDKs. An in-depth and very interesting description of the challenges that Dataflow was created to solve can be found in Tyler Akidau’s two-part article, “The World Beyond Batch” (parts 1 and 2). A number of the sessions I attended included demonstrations of data management and processing capabilities, usually by a partner company, employing either Dataflow or Beam to construct data pipelines.
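As a small illustration, here is a minimal word-count pipeline written with the Beam Python SDK; the input and output file names are hypothetical, and the pipeline runs locally on the DirectRunner, though the same code can target Dataflow, Flink, or Spark by changing the runner.

```python
# Minimal Beam batch pipeline: read lines of text, count words, write results.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

with beam.Pipeline(options=PipelineOptions(runner="DirectRunner")) as p:
    (p
     | "Read"   >> beam.io.ReadFromText("input.txt")        # hypothetical input file
     | "Split"  >> beam.FlatMap(lambda line: line.split())
     | "Pair"   >> beam.Map(lambda word: (word, 1))
     | "Count"  >> beam.CombinePerKey(sum)
     | "Format" >> beam.Map(lambda kv: "%s: %d" % kv)
     | "Write"  >> beam.io.WriteToText("word_counts"))      # hypothetical output prefix
```

The same transforms apply to unbounded sources; moving between batch and streaming is largely a matter of the input transform and the windowing applied to it, which is exactly the unification Akidau’s articles describe.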

A number of attendees gathered to talk with Apache Beam’s Frances Perry following her presentation, Apache Beam: Portable and Parallel Data Processing. One of the others in the group, I discovered, was Jon Dugan from ESnet. The ESnet software engineering team is already using Beam for data analysis, and I’m looking forward to learning more about their progress with this technology. I talked with Google engineers, such as Preston Holmes, about the challenges of processing large sets of images efficiently and will consult with them further on the specifics of several BRC projects on our near-term horizon.

There were also a number of well-attended presentations on Google’s Internet of Things (IoT) technologies, Weave and Android Things. Companies have developed innovative approaches for connecting devices to Google’s analytics and machine learning capabilities, including Noa, which gave a demo of tracking bikes on the Google campus using a small “donut” attached to the wheel. Research IT may soon have the opportunity to support research teams that need to employ similar devices, mobile apps, or wearables to collect data for their studies.
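One common way to feed device data into Google’s analytics stack is to publish readings to Cloud Pub/Sub and process them downstream with a Dataflow/Beam pipeline; Pub/Sub is my assumption here rather than something these sessions prescribed, and the project, topic, and payload below are hypothetical.

```python
# Sketch: publish one device reading to a Pub/Sub topic for downstream analysis.
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-research-project", "device-telemetry")  # hypothetical names

future = publisher.publish(
    topic_path,
    data=b'{"device_id": "bike-42", "lat": 37.87, "lon": -122.26}',  # hypothetical payload
    device_id="bike-42",          # optional string attributes travel with the message
)
print(future.result())            # message ID returned once the publish is acknowledged
```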

While some of the information presented at any conference is sales-oriented or at an introductory level, the availability of many Google engineers provided a great opportunity to dig into the details of the company’s cloud technologies.  I was also energized by the opportunity to connect with other engineers from commercial companies and educational institutions to exchange ideas and observations on cloud technologies. Research IT and I thank UC Berkeley CTO Bill Allison for arranging my conference pass.