2019-01-24

Open Source Year In Review - Facebook Code

At Facebook, we believe in the value of open source technology to achieve a shared goal of improving tools and frameworks used by the entire community. To continue our work toward that goal, we released 153 new open source projects in 2018. Our active portfolio (after removing or archiving outdated repos) contains a total of 474 projects. Collectively, these projects had more than 94,000 commits this year, nearly 28,000 of which came from our more than 2,700 external contributors. This healthy and vibrant ecosystem has grown to more than 1.03 million followers, including 257,000 new followers this year.

PyTorch, our open source Python-based deep learning platform, announced its 1.0 stable release with new capabilities and partners. PyTorch is now the second-fastest-growing open source project on GitHub. We also released a pair of kernel libraries (QNNPACK and FBGEMM) that make it easier for mobile devices and servers to run the latest AI models, and PyText, a framework that accelerates NLP development.

PyTorch also provided the foundation for Horizon, the first open source end-to-end platform that uses applied reinforcement learning (RL) to optimize systems in large-scale production environments. We also expanded ONNX to support additional AI tools. And Glow, our compiler for neural network hardware accelerators, used the power of community to gain industry partnerships for supporting it in future silicon products.

Facebook AI Research (FAIR) released its object detection framework, Detectron, as well as Mask R-CNN2Go, a computer vision model optimized for embedded and mobile devices. Through the Open Compute Project (OCP), we open-sourced the specifications for two AI-based server designs, Big Sur and Big Basin. Other projects, including TensorComprehensions, DensePose, Translate, and TorchCraftAI, were released as part of our open frameworks effort around artificial intelligence.

In addition to our work on machine learning, our work on development tools, mobile, networking, data infrastructure, virtual reality, and other pillars was well represented this year. We open-sourced Flipper, our new, extensible debugging tool for iOS and Android. For Python developers, we released a type checker called Pyre and a code refactoring tool called Bowler. In networking, we released Katran, a scalable network load balancer, and Fizz, our C++14 implementation of the TLS 1.3 standard. We also released LogDevice, our distributed data store for sequential data, and the XAR system for self-contained executables.

Docusaurus, released in December 2017, gained a lot of traction as a premier tool to help open source projects build websites and documentation. In just over a year, the number of projects using Docusaurus has grown to more than 55, including React 360 (rebranded this year from ReactVR); Profilo (our high-throughput, mobile-first performance tracing library); and Spectrum, a cross-platform image transcoding library.

In November, in partnership with the founders of GraphQL, the Linux Foundation, and key participants in the community, we kicked off the process of forming the GraphQL Foundation.

Our open source program would be nothing without the tools that allow our project owners to deploy their projects quickly and with high quality. The Facebook Open Source Tooling team continued to build and improve upon these tools. ShipIt allows project owners to both easily export code changes from our internal codebase to GitHub and import pull requests from the community. This provides consistency through automated synchronization. The team also developed new tools for community interaction, incorporation of GitHub issues into our internal workflows, and compliance of repo content across our entire project portfolio.

Looking ahead to 2019, we will continue our commitment to open source and to bringing innovative technology to the community.

This series of posts looks back on the engineering work and new technologies we released in 2018. Read yesterday's post about our work in Data Centers and check back tomorrow to learn about our work in Artificial Intelligence.

Data Centers Year In Review - Facebook Code

RACHEL PETERSON

For 2018, we knew we needed to continue scaling our data center capacity while simultaneously making our infrastructure even more powerful and efficient. Over the past year, we've expanded to a total of 15 data center locations, with new centers announced in Newton County, Georgia; Eagle Mountain, Utah; Huntsville, Alabama; and Singapore. We also announced expansions at our existing sites in Papillion, Nebraska; Henrico, Virginia; Luleå, Sweden; and Prineville, Oregon. And we announced that the Clonee Data Center is now up and running in Ireland.

In August, we shared our commitment to purchase 100 percent renewable energy and reduce our greenhouse gas emissions by 75 percent by 2020. Rising to that challenge drove creative design and technology solutions that helped us adapt to a variety of environments and scale to meet demand. To address the issue of scale, we designed the Fabric Aggregator, a distributed network system made up of simple building blocks. This system allows us to accommodate larger regions and varied traffic patterns, and adds the flexibility to adapt to future growth.

Earlier this year, we announced the StatePoint Liquid Cooling (SPLC) system, a new evaporative cooling system that uses water instead of air to cool data halls. The SPLC system allows us to build highly energy- and water-efficient data centers in places where direct cooling is not feasible.

Schematic of the SPLC cooling scheme for a data center: The SPLC units are deployed on the rooftop. These SPLC units produce cold water, which is then supplied to the fan-coil wall (FCW) unit. These FCW units use the cold water supplied by the SPLC units to cool the servers. The hot water from these FCW units is returned to SPLC units, where it will be cooled and recycled through the system.

In September, we broke ground on our data center in Singapore, our first custom-built center in Asia. Building for a location like Singapore presented new opportunities for design innovation and efficiency. To conserve space, the building will be our first multistory data center (11 stories). The building's façade is made of a perforated lightweight material that allows for airflow to help cool the building. It will also be one of the first to incorporate the abovementioned StatePoint Liquid Cooling system.

Singapore Data Center design

In addition to the solutions we've been able to implement this year, we've also open-sourced StateService, our state machine as a service, and shared new details and learnings about FBOSS, our open switching system.

Looking ahead to 2019, we will continue to focus on increasing our efficiency and reducing our environmental footprint.

This series of posts looks back on the engineering work and new technologies we developed in 2018. Check back tomorrow to read about our Open Source efforts and the next day to learn about our work in Artificial Intelligence.

AI Year In Review - Facebook Code

At Facebook, we think that artificial intelligence that learns in new, more efficient ways – much like humans do – can play an important role in bringing people together. That core belief helps drive our AI strategy, focusing our investments in long-term research related to systems that learn using real-world data, inspiring our engineers to share cutting-edge tools and platforms with the wider AI community, and ultimately demonstrating new ways to use the technology to benefit the world.

In 2018, we made important progress in all these areas. We presented new research highlighting the long-term feasibility and immediate benefits of working with less supervised data, in projects that ranged from improved image recognition to expanding the number of languages that our services can understand and translate. We released a number of platforms and tools to help others transition their AI research into production applications, including updating our popular open source PyTorch deep learning framework with a new, more versatile 1.0 version that includes additional support and entry points for newcomers. And in addition to publishing a wide range of public research papers and related models and data sets, we showed that AI has the potential to improve lives, by assisting with MRI scans, disaster relief efforts and tools to help prevent suicides. Here are some highlights of our efforts in AI throughout the year.

Advancing AI learning through semi-supervised and unsupervised training

One of the founding goals of the Facebook AI Research (FAIR) group is to work toward the development of systems with human-level intelligence. Achieving this milestone will take many more years of research, but we believe that our efforts in 2018 helped demonstrate a path toward AI that's more versatile, by learning from data that's less curated for the purposes of training. While most current AI systems use supervised learning to understand a specific task, the need for large numbers of labeled samples restricts the number of tasks they can learn, and limits the technology's long-term potential. That's why we're exploring multiple approaches to reducing the amount of supervision necessary for training, including projects that show the benefits of learning from semi-supervised and even unsupervised data.

For example, to increase the number of languages that our systems can potentially translate or understand, we demonstrated a new method for training neural machine translation (NMT) models on unsupervised data, with performance comparable to that of systems trained on supervised data. Our system's accuracy was a substantial improvement over that of previous unsupervised approaches. By reducing the field's reliance on large corpora of labeled training data, it opens the door to translating more languages, including low-resource languages such as Urdu, for which existing data sets are limited in comparison with English.

Two-dimensional word embeddings in several languages can be aligned via a simple rotation.
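
The rotation mentioned in this caption can be illustrated with orthogonal Procrustes alignment, a standard way to find the best rotation between two sets of paired points. This is only a minimal sketch under stated assumptions: the toy 2-D vectors, the function name `fit_rotation`, and the use of known word pairs are all illustrative and are not taken from the research described here, which addresses the harder unsupervised case where such pairs are not given.

```python
import numpy as np


def fit_rotation(source: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Return the orthogonal matrix W minimizing ||source @ W - target||_F."""
    # Orthogonal Procrustes: take the SVD of source^T @ target
    # and discard the singular values.
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


# Toy data (hypothetical): five 2-D "word vectors" in one language ...
rng = np.random.default_rng(0)
source_vectors = rng.normal(size=(5, 2))

# ... and the same words in a second space that is rotated by 40 degrees.
angle = np.deg2rad(40.0)
true_rotation = np.array([[np.cos(angle), -np.sin(angle)],
                          [np.sin(angle), np.cos(angle)]])
target_vectors = source_vectors @ true_rotation

# Recover the rotation from the paired points and apply it.
learned_rotation = fit_rotation(source_vectors, target_vectors)
aligned_vectors = source_vectors @ learned_rotation
```

In this noise-free toy setup the learned matrix matches the rotation that generated the data; with real embeddings the alignment would only be approximate.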

Another project worked entirely with low-resource languages, with multiple approaches for circumventing the relative scarcity of labeled training data. This work included using multilingual modeling to leverage the similarities between dialects within a given language group, such as Belarusian and Ukrainian. This was applied research, and the range of techniques the team employed added 24 more languages to our automatic translation services this year. Also, in a collaboration with NYU, we added 14 languages to the existing MultiNLI data set, which is widely used for natural language understanding (NLU) research but was previously English-only. Among the languages in our updated XNLI data set are two low-resource languages (Swahili and Urdu), and our approach contributes to the overall adoption of cross-lingual language understanding, which reduces the need for supervised training data.

We also demonstrated variations on data supervision, such as a new method of combining supervised and unsupervised data (a process called omni-supervised learning) through data distillation. And for our study of hashtag-based image recognition, we developed a creative use of existing, non-traditional labels to generate large training sets of what was essentially self-labeled data, including a set of 3.5 billion public Instagram images. That project proposed that user-supplied hashtags could act as data labels, turning existing images into weakly supervised training examples. Our results not only showed that using billions of data points could be highly effective for image-based tasks, but also allowed us to break a notable record, beating the previous state-of-the-art image recognition model's accuracy score on the ImageNet benchmark by 1 percent.

Hashtags can help computer vision systems go beyond general classification terms in order to recognize specific subcategories and additional elements in an image.
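
To make the idea of hashtags as weak labels concrete, here is a small sketch of how captions might be turned into noisy training labels. Everything in it is an assumption for illustration: the `HASHTAG_TO_LABEL` vocabulary, the `weak_labels` helper, and the sample captions are hypothetical and do not describe the actual pipeline used in the study.

```python
import re
from collections import Counter

# Hypothetical mapping from hashtag spellings to canonical class names.
HASHTAG_TO_LABEL = {
    "dog": "dog",
    "doggo": "dog",
    "goldenretriever": "golden_retriever",
    "cat": "cat",
}

HASHTAG_PATTERN = re.compile(r"#(\w+)")


def weak_labels(caption: str) -> list[str]:
    """Return the sorted set of canonical labels implied by a caption's hashtags."""
    tags = (tag.lower() for tag in HASHTAG_PATTERN.findall(caption))
    # Hashtags outside the vocabulary are ignored rather than treated as labels.
    return sorted({HASHTAG_TO_LABEL[tag] for tag in tags if tag in HASHTAG_TO_LABEL})


# Made-up captions standing in for user-supplied text attached to images.
captions = [
    "Morning walk #Doggo #goldenretriever #sunrise",
    "Nap time #cat",
    "No tags here",
]
training_examples = [(caption, weak_labels(caption)) for caption in captions]
label_counts = Counter(label for _, labels in training_examples for label in labels)
```

Merging spelling variants into one class and keeping the finer-grained `golden_retriever` label alongside `dog` mirrors the point of the caption: hashtags can carry subcategory information that generic class labels lack.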

Accelerating the transition from AI research to production

AI has become foundational to nearly every product and service at Facebook, and that diversity of applications is reflected in the broad range of AI-based platforms and tools that our engineers are building and enhancing. But a common theme developed over the course of our platform work in 2018: turning the AI techniques we're researching into AI systems we can deploy.

Since our release of PyTorch in 2017, the deep learning framework has been widely adopted by the AI community, and it's currently the second-fastest-growing open source project on GitHub. PyTorch's user-friendly interface and flexible programming environment made it a versatile resource for rapid iteration in AI development. And its open design has ensured that the framework would continue to grow and improve, thanks to codebase contributions and feedback. For 2018, we wanted to give the PyTorch community a more unified set of tools, with a focus on turning their AI experiments into production-ready applications. That meant enough of an overhaul to justify a new version: PyTorch 1.0.

We announced the updated framework at our F8 conference in May, detailing how it integrates the modular, production-oriented capabilities of Caffe2 and the newly expanded ONNX to streamline the entire AI development pipeline, from prototyping systems to deploying them. In October, we released the PyTorch 1.0 developer preview at the first PyTorch Developers Conference, where we presented the framework's rapidly growing partner and platform ecosystem. Google, Microsoft, NVIDIA, Tesla, and many other technology providers discussed their current and planned integration with PyTorch 1.0 at that event, and both fast.ai and Udacity have created courses that use the new version to teach deep learning.

We completed the rollout of PyTorch 1.0 earlier this month, with a full release that includes all the new features we've been working on, such as a hybrid front end for transitioning seamlessly between eager and graph execution modes, revamped distributed training, and a pure C++ front end for high-performance research. We've also released tools and platforms this year that extend PyTorch's core capabilities, including a pair of kernel libraries (QNNPACK and FBGEMM) that make it easier for mobile devices and servers to run the latest AI models, and PyText, a framework that accelerates natural language processing (NLP) development.

PyTorch also provided the foundation for Horizon, the first open source end-to-end platform that uses applied reinforcement learning (RL) to optimize systems in large-scale production environments. Horizon takes RL's much researched but rarely deployed decision-based approach and adapts it for use with applications whose data sets might include billions of records. After deploying the platform internally at Facebook, in use cases such as optimizing streaming video quality and improving M suggestions in Messenger, we open-sourced Horizon, making this unprecedented bridge between RL research and production-based RL available for anyone to download.

A high-level diagram showing the feedback loop for Horizon. First, we preprocess some data that the existing system has logged. Then, we train a model and analyze the counterfactual policy results in an offline setting. Finally, we deploy the model to a group of people and measure the true policy. The data from the new model feeds back into the next iteration, and most teams deploy a new model daily.
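
The offline step in this loop, estimating how a new policy would have performed using only logged data, can be sketched with inverse propensity scoring (IPS), one common counterfactual estimator. This is a hedged illustration: the text here does not say which estimator Horizon uses, and the function name `ips_estimate` and the sample numbers are hypothetical.

```python
def ips_estimate(logged_rewards, logging_probs, new_policy_probs):
    """Estimate a new policy's average reward from logged data via IPS.

    Each logged reward is reweighted by how much more (or less) likely the
    new policy is to take the logged action than the logging policy was.
    """
    if not (len(logged_rewards) == len(logging_probs) == len(new_policy_probs)):
        raise ValueError("all inputs must have the same length")
    if not logged_rewards:
        raise ValueError("at least one logged interaction is required")
    weighted = [
        reward * new_prob / log_prob
        for reward, log_prob, new_prob in zip(
            logged_rewards, logging_probs, new_policy_probs
        )
    ]
    return sum(weighted) / len(weighted)


# Made-up log: the existing system picked each action with probability 0.5.
rewards = [1.0, 0.0, 1.0, 0.0]
logging_probabilities = [0.5, 0.5, 0.5, 0.5]
# The candidate policy favors the actions that happened to be rewarded.
candidate_probabilities = [0.8, 0.2, 0.8, 0.2]

estimated_reward = ips_estimate(rewards, logging_probabilities, candidate_probabilities)
observed_reward = sum(rewards) / len(rewards)
```

Comparing `estimated_reward` with `observed_reward` gives an offline signal of whether the candidate is worth deploying; the online measurement described in the caption remains the ground truth.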

We also released Glow, an open source, community-driven framework that enables hardware acceleration for machine learning (ML). Glow works with a range of different compilers, hardware platforms and deep learning frameworks, including PyTorch, and is now supported by a partner ecosystem that includes Cadence, Esperanto, Intel, Marvell, and Qualcomm Technologies Inc. And to further encourage the use of ML throughout the industry, we released a new ML-optimized server design, called Big Basin v2, as part of the Open Compute Project. We've added the new, modular hardware to our data center fleet, and the specs for Big Basin v2 are available for anyone to download at the OCP Marketplace.

2018 marked the transition of Oculus Research into Facebook Reality Labs, and new explorations of the overlap between AI and AR/VR research. And as part of our ongoing effort to open source as many of our AI-related tools as possible, we've released the data and models for our DeepFocus project, which uses deep learning algorithms to render realistic retinal blur in VR. The first system to accomplish this in real time at the image quality necessary for advanced VR headsets, DeepFocus is a novel application of deep learning in AR/VR work, using an entirely novel network structure that's applicable to our own Half Dome prototype headset, as well as to other classes of promising head-mounted displays.

In the year ahead, we hope to get more feedback about all of these releases. And we'll continue to build and open-source tools that support PyTorch 1.0's mission to help the entire developer community get cutting-edge AI systems out of labs and research papers and into production.

Building AI that benefits everyone

We have a long track record of working on technologies that deliver the benefits of AI very broadly, such as creating systems that generate audio descriptions of photos for the visually impaired. This past year, we continued to deploy AI-based features aimed at benefiting the world, including an expansion of our existing suicide prevention tools that use text classification to identify posts with language expressing suicidal thoughts. This system uses separate text classifiers to analyze the text of posts and comments, and then, if appropriate, sends them to our Community Operations team for review. This system leverages our established text-understanding models and cross-lingual capabilities to increase the number of people we can connect with support services.

We also released a method for using AI to quickly and accurately help pinpoint the areas most severely affected by a disaster without having to wait for manually annotated data. This approach, which was developed in a collaboration with CrowdAI, has the potential to get aid and rescuers to victims with greater speed and efficiency. In the future, this technique could also be used to help quantify the damage from large-scale disasters such as forest fires, floods, and earthquakes.

We deployed an ML system called Rosetta that extracts text from more than a billion public images and video frames every day, and uses a text recognition model to understand the context of the text and the image together. Rosetta works on a range of languages, and it helps us understand the content of memes and videos, including for the purposes of automatically identifying policy-violating content.

Two-step model architecture used for Rosetta's text extraction.

And 2018 marked the beginning of fastMRI, a long-term collaboration with NYU School of Medicine to improve diagnostic imaging technology, starting with accelerating MRI scans by as much as 10x. Current scans can take an hour or more, which makes them infeasible for some patients and conditions, and this joint research project is intended to increase the availability of this potentially life-saving diagnostic tool by using deep learning to generate images from less raw scanner data. The goal of fastMRI isn't to develop a proprietary process but to accelerate the field's understanding of this technique, and our partnership has already produced the largest-ever collection of fully sampled MRI raw data for research purposes (fully anonymized and released by NYU School of Medicine), as well as open source models to help the wider research community get started on this task. We also launched an online leaderboard, where others can post and compare their results.

(L) Raw MRI data before it's converted to an image. To capture full sets of raw data for a diagnostic study, MRI scans can be very time-consuming. (R) MRI image of the knee reconstructed from fully sampled raw data.
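
The relationship between raw scanner data and the final image can be sketched with a 2-D Fourier transform, which is the usual simplified model of MRI acquisition. The example below is an illustration under assumptions, not the fastMRI method: the toy image, the every-fourth-row sampling pattern, and all variable names are hypothetical. It shows only why collecting less raw data is hard, since a plain inverse transform of undersampled data yields a visibly wrong image; that gap is what a learned reconstruction would have to close.

```python
import numpy as np

# Toy 64x64 "anatomy": a bright rectangle on a dark background.
image = np.zeros((64, 64))
image[20:44, 24:40] = 1.0

# The scanner measures spatial frequencies (k-space), modeled here as a 2-D FFT.
kspace = np.fft.fft2(image)

# Fully sampled raw data reconstructs the image with an inverse FFT.
full_reconstruction = np.fft.ifft2(kspace).real

# Keep only every 4th row of k-space, standing in for a 4x shorter scan.
mask = np.zeros((64, 1))
mask[::4] = 1.0
undersampled_reconstruction = np.fft.ifft2(kspace * mask).real

# Worst-case pixel error of each reconstruction against the true image.
full_error = np.abs(full_reconstruction - image).max()
undersampled_error = np.abs(undersampled_reconstruction - image).max()
```

Here `full_error` is at the level of floating-point noise, while `undersampled_error` is large because the skipped rows cause the rectangle to fold over onto itself (aliasing).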

In 2018, we also published blog posts detailing our work in other areas. These included ways to use AI to improve our systems (Getafix, predictive test selection, SapFix, Sapienz, and Spiral) and enhance our products (SLAM and AI in Marketplace), as well as other research efforts (wav2letter++, combining multiple word representations, multilingual embeddings, and audio processing).

We're excited by the progress we've made in 2018 in our pillars — conducting fundamental research, deploying cutting-edge applications, and sharing new ways to use AI to help others — and we look forward to building on those efforts in the coming year.

 

This series of posts looks back on the engineering work and new technologies we developed in 2018. Read the previous posts about our work in Data Centers and our Open Source efforts.

 

2019-01-21

SanSha YongLe Blue Hole

SanSha YongLe Looong Palace

The SanSha YongLe Blue Hole, a remarkable geographical formation known as the "Eye of the South China Sea," is located at JinQing Island, Yongle Atoll, XiSha Islands, SanSha City, China.

Its coordinates are 16°31'30" north latitude, 111°46'05" east longitude.

At 300.89 meters deep, it is the deepest known underwater sinkhole in the world.

The world's blue holes ranked by depth: SanSha Yongle Blue Hole (300.89 m); Dean's Blue Hole, Long Island, Bahamas (202 m); Dahab Blue Hole, Egypt (130 m); Great Blue Hole, Belize (123 m); Gozo Blue Hole, Malta (60 m).

2019-01-13

The Growing Popularity of Chinese Social Media Outside China Poses New Risks in the West

January 11, 2019 1:00 PM
Photo Credit: PIIE/William Melancon

Chinese technology firms have developed a variety of social media outlets in the last few years. Some are wildly popular at home, while failing to attract foreign users. This indifference outside of China changed in 2018, as a Chinese app similar to Instagram made the first significant foray into the Western market. It even became popular among the US armed forces.

The overseas penetration of Chinese social media poses a substantial security problem, however. Social apps gather a lot of data on users; if this information is sent to China, it can be easily accessed by the government and leveraged, say, to make Beijing's surveillance software better at recognizing Western faces, or at extracting intelligence on Western military activities. US and EU authorities have not paid sufficient attention to these risks.

THE RISE OF TIKTOK

TikTok is an app for sharing short videos. It is mostly used to film oneself lip-synching to recorded music or performing a task—such as a dance—set by a celebrity. Its core demographic consists of teenagers and young adults, who share their efforts in the hope of getting positive comments and possibly achieving fame. It is owned by Chinese unicorn ByteDance, sometimes referred to as the world's most valuable startup and also known for its popular news aggregator Toutiao.

According to market research firm SensorTower, by October 2018 TikTok commanded around 30 percent of monthly downloads of social media platforms on iPhones in the United States, surpassing Facebook's Instagram and Google's YouTube. At the end of the year, it charted at number six[1] in Google's worldwide ranking of apps for the Android mobile operating system, ahead of Netflix and Amazon Shopping.

ByteDance reported 400 million monthly active users of TikTok in China in November. While the company never released figures on international users, they can be estimated at around 200 million. In the United States, the app has been downloaded 80 million times[2] since it was first published globally in 2017. The numbers are not huge in absolute terms, at least by the standards of an industry where leaders enjoy billion-strong installed bases.[3] What is unprecedented for a Chinese app is the cross-border reach.

TikTok features several different subcommunities of users. In the United States, one of the most active consists of young military service members. They upload videos of themselves, often executing fitness exercises, in uniform and with ID tags in plain sight. The filming happens inside military facilities, and sometimes in what looks like a war theater. TikTok, like the vast majority of social apps, collects location data.[4]

CHINA, PRIVACY, AND SECURITY

On both sides of the Atlantic, Facebook is under increasing pressure for failing to protect user security and privacy. The controversy surrounding its handling of that pressure has contributed to a 22 percent fall in share prices in 2018. Can a Chinese company do better in this field, and can foreign users effectively monitor its behavior? At least in the world as it is now, no and no.

There is growing concern over online privacy in China in the wake of widespread identity theft. A stringent new standard for personal data protection came into force in May 2018. Mounting concern does not provide an adequate guarantee against abuses, however. Uncertainty abounds over how the new standard will be implemented. More important, even if more limits are imposed on what private companies can do with user data, there is no reason to believe that government access will be curtailed.

Chinese authorities have ample leeway to request information from the private sector, on broadly defined public safety and security grounds, which include "stability maintenance," another name for suppression of dissent. There have been multiple reports of dubiously motivated data access, some even involving Chinese operations of American companies.

TikTok's privacy policy for both the United States and the European Union states that data may be transferred to China. This transfer is legal, as long as users give their consent. ByteDance emphasizes its commitment to privacy and security. The company may well have the best interests of its users at heart, but once the information is beyond the Great Firewall, there is no telling what happens to it.

A DANGEROUS LACK OF ATTENTION

It would be unfair to say that Western authorities took no steps at all to address the implications of data transfers to China. In early 2018, the Committee on Foreign Investment in the United States (CFIUS) vetoed Alibaba's acquisition of money transfer service MoneyGram over data security concerns. Later in the year, the US Foreign Investment Risk Review Modernization Act (FIRRMA) explicitly mandated CFIUS to consider in its reviews "the extent to which a […] transaction is likely to expose […] personally identifiable information […] or other sensitive data of United States citizens to access by a foreign government […] that may exploit that information in a manner that threatens national security." The EU General Data Protection Regulation (GDPR), which came into force in May 2018, makes international flows of personal data conditional on the presence of certain privacy safeguards.

Notwithstanding these provisions, social apps primarily meant for entertainment are still not attracting the same level of scrutiny as innovations whose connection with national security is more obvious, such as quantum computing. When TikTok attracted public criticism it was because of sketchy child protection standards, not for its Chinese connection. GDPR did not stop its spread in the European Union. Along the same lines, blackmail and influence concerns raised by intelligence experts over the acquisition of gay dating app Grindr by China's Kunlun Group went unheard.

Ignoring the reach of these apps may prove to be a fatal mistake. The pervasiveness of social platforms and the depth of user information they collect make them very powerful tools for both espionage and manipulation of public opinion. TikTok per se may never expand its reach beyond teenagers, but it is only a matter of time before a Chinese app with broader appeal hits the US and EU markets. If widely adopted, such an app could become a Huawei-sized problem in terms of the access to the West potentially afforded to Chinese security services.

As grave as this threat is, it is not even the only one. The social media market shows network effects that precipitate winner-takes-all dynamics. If China keeps banning Western platforms while pushing for the internationalization of its own, it stands a chance of achieving global primacy. This ascendance could lead to an advantage in other fields, as social media are generally a part of broader ecosystems where personal data powers multiple products and services, including the development of artificial intelligence (AI) models. Innocuous-looking apps like TikTok could be among the Trojan horses of the AI race—China should not be allowed to wheel them around the world while killing competition at home.

NOTES

1. Google Play Store ranking of free Android apps. Accessed on December 31, 2018. Charts change daily.

2. No data on active monthly users is available for the United States. Data on the number of downloads overestimate the number of monthly active users, as some may download the app and never open it, use it very infrequently, or download it more than once on different smartphones.

3. In 2018, Facebook boasted 2.3 billion monthly active users spread all over the world except for China, where it is banned. Chinese market leader WeChat had over one billion users.

4. According to the privacy policy valid in the United States, current as of January 2019, "when you use the Platform on a mobile device, we will process information about your location, including location information based on your SIM card, IP address or mobile device location settings, and if activated on your mobile device, by use of a Global Positioning System (GPS). […] If you do not wish to share your location with us, you can switch off GPS functionality on your mobile device."

*******************************************************************************************************************

YES WE SCAN

The US government has been defending its so-called PRISM program, arguing it has helped prevent all persons' privacy.

FVEY: YES ONLY WE CAN

Five Eyes intelligence alliance: Each member of the coalition is responsible for intelligence gathering and analysis in specific parts of the world. Britain monitors Europe, western Russia, the Middle East, and Hong Kong. The United States monitors the Middle East, along with China, Russia, Africa, and the Caribbean. Australia is responsible for South Asia and East Asia, and New Zealand for the South Pacific and Southeast Asia. Canada monitors Russia, China, and parts of Latin America.