Does online porn detection rely on human "porn reviewers"? Tech giants have adopted new methods

Author (作者): aaa
Published (发表于): 2017/1/11 11:36:02

Does online porn detection rely on human "porn reviewers"? Tech giants have adopted new methods - online porn detection, Green Dam - IT News

Competition in the AI-based porn detection market is growing increasingly fierce. Teams such as Tupu Technology (图普科技), Alibaba Green Network (阿里绿网), and Tencent Wanxiang Youtu (腾讯万象优图) already hold a large share of the market. Against this backdrop, many companies are trying to claim a piece of this red ocean by offering more comprehensive services.

Which aspects does live-stream porn detection generally examine?

Typically, live-stream porn detection identifies pornographic content automatically through capabilities such as video screenshots, image recognition, automated speech review, bullet-comment (danmaku) monitoring, and keyword extraction. Before the image recognition service is formally delivered, the live-streaming platform is first invited to run a trial, during which platform-specific feature data is collected, such as the different stream backgrounds, ambient light levels, and topics. This data is used to train a customized model, so each live-streaming platform receives its own tailored image recognition service.

The review of live video content can proceed in the following steps: detect whether human physical features are present in the image and count the people; identify the gender and age range of the people in the image; identify skin tone and the degree to which the body is exposed; identify body contours and analyze actions and behavior; beyond image recognition, extract key features from the audio to judge whether sensitive information is present; and analyze the bullet-comment text in real time to judge whether the current video contains violations, adjusting the image sampling frequency dynamically.

For image recognition, the interval at which key frames are captured from the video can be set by the customer, anywhere from 1 second to several tens of seconds. For example, a key frame can be captured for recognition every 5 seconds by default, and when a suspected-violation alert occurs the sampling rate can be adjusted dynamically, speeding up to one frame per second.
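The dynamic adjustment described above can be sketched in a few lines of Python. This is a minimal illustration, not the vendor's implementation: the 5-second default and the 1-second alert interval come from the text, while the function names and the `alert_flags` input (standing in for the recognition model's per-frame result) are assumptions made for the example.

```python
DEFAULT_INTERVAL_S = 5.0  # from the text: one key frame every 5 seconds by default
ALERT_INTERVAL_S = 1.0    # from the text: one key frame per second after a suspected alert


def next_sampling_interval(suspected_alert: bool,
                           default_s: float = DEFAULT_INTERVAL_S,
                           alert_s: float = ALERT_INTERVAL_S) -> float:
    """Return how many seconds to wait before capturing the next key frame."""
    return alert_s if suspected_alert else default_s


def capture_times(alert_flags, start_s: float = 0.0):
    """Yield the timestamps (in seconds) at which key frames would be captured.

    alert_flags[i] is True when frame i raised a suspected alert. It is a
    hypothetical stand-in for the output of the recognition model.
    """
    t = start_s
    for flagged in alert_flags:
        yield t
        t += next_sampling_interval(flagged)


if __name__ == "__main__":
    # Second and third frames raise an alert, so sampling tightens to 1 second.
    print(list(capture_times([False, True, True, False])))
```

With the flags shown, frames are captured at 0, 5, 6, and 7 seconds: the interval drops to one second while the alert persists and returns to five seconds afterwards.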

You just mentioned extracting key features from audio. Could you go into more detail?

Audio analysis mainly covers the following areas:

Using voiceprint recognition, determine whether the streamer currently in the room is the registered streamer, verifying the streamer's identity.

Run keyword searches on the streamer's speech to check for prohibited or sensitive words.

Analyze specific segments of continuous speech to check for objectionable content.

Count how often spoken advertisements are broadcast, to analyze advertising effectiveness.
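As a rough illustration of the keyword search step, the sketch below checks a transcript against a word list. It assumes the speech has already been converted to text by a separate speech recognition step, which is not shown; the function name and the placeholder word list are hypothetical. Substring matching is used rather than whitespace tokenization because Chinese text is not space-delimited.

```python
from typing import Iterable, List


def find_sensitive_words(transcript: str, sensitive_words: Iterable[str]) -> List[str]:
    """Return the sensitive words that occur in the transcript, sorted alphabetically.

    Matching is case-insensitive and substring-based, so it also works for
    languages written without spaces between words.
    """
    text = transcript.lower()
    return sorted({w for w in sensitive_words if w and w.lower() in text})


if __name__ == "__main__":
    # Placeholder entries; a real list would be supplied by the platform.
    word_list = ["banned_a", "banned_b"]
    print(find_sensitive_words("hello BANNED_A and more", word_list))
```

Plain substring matching is the simplest possible choice and will over-match in practice (a listed word inside a longer harmless word still counts), so a production system would likely need something more careful.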

However, whether to use dual-channel video and audio detection is the customer's decision. For show-style live streams, image detection usually meets the vast majority of needs, while audio detection may be better suited to platforms where speech is the main content. Combining the two greatly improves recognition accuracy and lowers the false positive rate, but the cost rises accordingly, so customers can choose according to their business needs.

What are the current accuracy, false positive rate, and recall, roughly? Is there a manual review?

At present, the accuracy of pornographic image detection for live-streaming platforms is above 99%, the false positive rate is below 1%, and the share of images that the customer needs to review manually does not exceed 3%. A manual review service is not normally provided, but suspected images are flagged and the customer is prompted to review them manually. The manually reviewed data is then collected for iterative training, which continually improves recognition accuracy.
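For readers unfamiliar with the three metrics named in the question, the sketch below computes them from confusion-matrix counts using the standard textbook definitions. The article does not say how its own figures were measured, so treating them as these definitions is an assumption, and the counts in the example are invented purely for illustration.

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, false positive rate, and recall from confusion counts.

    tp: violating images correctly flagged
    fp: normal images wrongly flagged
    tn: normal images correctly passed
    fn: violating images that were missed
    """
    total = tp + fp + tn + fn
    if total == 0 or (fp + tn) == 0 or (tp + fn) == 0:
        raise ValueError("need at least one positive and one negative sample")
    return {
        "accuracy": (tp + tn) / total,
        "false_positive_rate": fp / (fp + tn),
        "recall": tp / (tp + fn),
    }


if __name__ == "__main__":
    # Invented counts for illustration only; not data from the article.
    for name, value in detection_metrics(tp=95, fp=5, tn=895, fn=5).items():
        print(f"{name}: {value:.4f}")
```

Note that a high accuracy figure can coexist with a noticeably lower recall when violating images are rare, which is why the three numbers are usually reported together.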

Because live streaming is real-time, the demands on image recognition speed are very high. Does that also mean very high demands on computing power? How is this handled?

Online live video is strongly real-time, so server-side image recognition has to be very fast. Besides fairly high bandwidth requirements, the recognition servers need powerful GPU compute. This is especially true in the model training stage, where deep learning algorithms are applied and a powerful GPU cluster is indispensable. In addition, based on the properties of the fully connected layer, the restriction on training image size has been removed, which quickly raises the algorithm's processing speed. When capturing video frames, the sampling frequency can also be adjusted dynamically: normally one frame every few seconds, with the rate increased once sensitive content appears, so that pornographic content is recognized and an alert raised more promptly.

How much data does model training require? What typically affects detection accuracy?

Taking Jixianyuan (极限元) as an example, the base data set contains tens of millions of images, and another 20,000 positive and negative sample images of various kinds are added every day for iterative training, continually fine-tuning to optimize recognition accuracy. The base model is trained once a week, and an incremental iterative training run takes place every 1-2 days.

As for what affects detection accuracy, the main factor is still a shortage of data: when the samples do not fully cover the application scenarios, the trained model produces false positives, false negatives, or misidentifications. As deep learning algorithms mature, the diversity and specialization of data sources have instead become the top priority in building a model.

In addition, streamers sometimes deliberately try to interfere with detection, for example by covering sensitive body parts or using picture-in-picture, which also affects the machine's judgment to some extent.

Can the machine act automatically: block, delete, ban a stream, and so on?

The pornographic image detection service is deployed in the cloud and has no network path of its own to the customer's stream-room management system, so it cannot automatically block, delete, or suspend a stream room's activity. However, if the customer chooses a private-cloud deployment and authorizes the recognition server to access the stream-room management system, then operations such as deleting or stopping an offending stream room can be carried out.

How much does AI-based detection reduce costs compared with manual review?

Take a small or mid-sized platform with 100,000 hours of live streaming per month. With traditional content review, a 100-person content management team costs around 800,000 per month. With AI-based content monitoring, staffing can be cut to about 10 people, and the combined cost is only between 100,000 and 200,000, greatly reducing labor and management costs. There are further savings on monitoring equipment, office space, and so on.
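The arithmetic behind this comparison can be worked through as follows. The figures are the ones quoted above (a 100-person team at about 800,000 per month versus about 10 people and a combined cost of 100,000 to 200,000); the currency is not stated in the source, and the function and constant names are only illustrative.

```python
# Figures quoted in the article; the currency is not stated in the source.
MANUAL_TEAM_SIZE = 100
MANUAL_MONTHLY_COST = 800_000
AI_TEAM_SIZE = 10
AI_MONTHLY_COST_RANGE = (100_000, 200_000)


def saving_fraction(before: float, after: float) -> float:
    """Fraction of the original amount that is saved."""
    return (before - after) / before


def monthly_saving_range():
    """Return (smallest, largest) saving fraction over the quoted AI cost range."""
    low_cost, high_cost = AI_MONTHLY_COST_RANGE
    return (saving_fraction(MANUAL_MONTHLY_COST, high_cost),
            saving_fraction(MANUAL_MONTHLY_COST, low_cost))


if __name__ == "__main__":
    smallest, largest = monthly_saving_range()
    print(f"manual cost per reviewer: {MANUAL_MONTHLY_COST / MANUAL_TEAM_SIZE:.0f}")
    print(f"headcount reduction: {saving_fraction(MANUAL_TEAM_SIZE, AI_TEAM_SIZE):.0%}")
    print(f"monthly cost saving: {smallest:.1%} to {largest:.1%}")
```

Under these figures the monthly saving works out to between 75% and 87.5% of the original cost, before counting the equipment and office savings mentioned above.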

How do you draw and handle the line between pornographic and non-pornographic content?

First, when a classification model like this is built, people manually label the large image data set, which introduces some subjective judgment error, though within the range of common understanding. Besides "pornographic" and "normal", the recognition results include a "suspected" category, also called "sexy". All of these are assigned by matching against the approximate value the machine produces after recognition.
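One way to picture the three-way result is as thresholds on a model score. The sketch below assumes the "approximate value" is a score between 0 and 1, which the article does not confirm, and both cut-off values are invented for illustration. The "suspected" band is the one that would be flagged for the customer's manual review.

```python
def label_for_score(score: float,
                    porn_threshold: float = 0.9,
                    suspect_threshold: float = 0.5) -> str:
    """Map a model score in [0, 1] to one of three labels.

    Both thresholds are hypothetical; a real system would tune them
    on labeled data for each platform.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= porn_threshold:
        return "pornographic"
    if score >= suspect_threshold:
        return "suspected"  # the article also calls this category "sexy"
    return "normal"


if __name__ == "__main__":
    for s in (0.95, 0.60, 0.10):
        print(s, label_for_score(s))
```

Moving the lower threshold trades manual review workload against missed violations: lowering it sends more images to human reviewers, raising it lets more borderline images pass as normal.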

网络鉴黄靠“鉴黄师”？科技巨头们已采用新方法 - 网络鉴黄,绿坝 - IT资讯

人工智能鉴黄市场竞争愈发激烈，目前图普科技、阿里绿网、腾讯万象优图等团队已占据大量市场份额，在此环境下，不少公司试图通过提供更全面的服务从这片红海中分一杯羹。

直播鉴黄一般从哪些方面进行鉴定？

通常情况下，直播鉴黄通过视频截图、图像识别、语音技审、弹幕监控、关键字抽取等能力智能识别色情内容。在向客户正式提供图像识别服务前，会先邀请直播平台用户进行体验测试，收集一些直播平台专属特征数据，比如不同的直播背景、环境光线强度、话题内容等，进行定制化的训练模型，不同的直播平台将获得定制化的专属图像识别服务。

其中视频直播内容的审查鉴定可以从以下几个步骤：识别图像中是否存在人物体征，统计人数；识别图像中人物的性别、年龄区间；识别人物的肤色、肢体器官暴露程度；识别人物的肢体轮廓，分析动作行为；除了图像识别之外，还可以从音频信息中提取关键特征，判断是否存在敏感信息；实时分析弹幕文本内容，判断当前视频是否存在违规行为，动态调节图像采集频率。

在图像识别方面，其中每分钟视频采集关键帧的频率可以由客户设定，从1秒到几十秒均可。例如可以默认5秒采集一次关键帧用于识别，也可以在出现疑似告警时动态调节采集频率，加快至每秒一张。

您刚提到音频关键特征提取，这个可以深入讲讲吗?

音频分析主要有以下几个方面：

通过声纹识别技术，判断当前直播间的主播是否为注册主播本人，对主播身份进行识别。

对主播的语音内容进行关键词检索，是否存在禁语、敏感词。

对特定的连续语音数据段进行识别，是否存在不良信息。

对口播广告的播出频次进行统计，分析广告投放效果。

不过视频、音频双通道检测的方案由用户来决策，秀场直播通常用图像检测就可以满足绝大部分需求，音频检测可能更适用于语音内容为主的直播平台。两者结合起来会大大提高识别准确率、降低误报率，但成本也会相应提高，所以用户可以根据业务需求进行选择。

目前的准确率、误报率、召回率大概是多少?是否会进行人工复审?

目前直播平台涉黄图像检测的准确率高达99%以上，误报率低于1%，需要客户进行人工复核的比例不超过3%。通常情况下不提供人工复审的服务，但是会对疑似的图像进行标注并提醒用户进行人工复核。人工复核后的数据会被收集起来进行迭代训练，这样可以不断提升识别的准确率。

直播的实时性、对于机器的图片识别处理速度要求特别高，对于机器的计算能力会不会特别高?采用什么样的方式进行处理?

网络视频直播实时性强，对服务端图像识别处理的速度要求特别高，除了对带宽有较高的要求外，还需要识别服务器拥有强大的GPU运算能力，尤其是应用深度机器学习算法进行模型训练阶段，强大的GPU集群服务器是不可或缺的，并基于全链接层的特性去除了对训练图像大小的限制，快速提升算法处理速度。此外在采集视频图片时也可以采用动态调节采集频率的办法，通常情况下几秒一帧，出现敏感信息后加快采集频率，可以更及时的识别涉黄信息并提出告警。

模型训练所需的数据的量有多大?一般什么原因会影响鉴定准确率？

以极限元为例，基础数据集有几千万张图片，此外每天还会追加两万张各类正、负样本图片，用于迭代训练，不断微调优化识别准确率。每周会进行一次基础模型训练，每1-2天会进行一次增量模型迭代训练。

至于鉴定准确率影响层面，主要还是数据量的匮乏，样本对应用场景的覆盖不全面导致训练出的模型存在误报、漏报或者识别错误，随着深度机器学习算法的日趋成熟，数据来源的多样性、专业性反而成为模型构造的重中之重。

此外，主播刻意进行一些干扰检测的手段，比如遮挡敏感部位、画中画等等，也会一定程度上影响到机器的识别判断。

机器能不能自动处理:屏蔽、删除、禁播等？

涉黄图片检测服务部署在云端，本身没有网络路径可以接触到用户的直播间管理系统，因此无法自动屏蔽、删除、暂停直播间的活动。但是如果用户选择私有云的部署方式，并授权识别服务器可以访问直播间管理系统，那么对涉黄直播间的删、停等操作是可以实现的。

智能鉴黄相对于人工鉴黄、成本下降多少?

以一家月直播10万小时的中小直播平台为例，如果采用传统的内容审核技术，100人的内容管理团队每月所花费的成本在80万上下。如果借助人工智能进行内容监控，人力投入可以削减到10人左右，综合投入不过10万到20万之间，将大大降低人力成本和管理费用。此外还有因此而节省的监视设备费、办公场地费等等。

色情和非色情的界限怎么把握、拿捏?

首先，在建立这样一个分类模型时，会有人工对图像大数据进行标注，存在一定主观判断误差，但也在大众理解的范围内。识别结果除了色情和正常外，还存在一个疑似或者称之为性感的类别，这些都是根据机器识别后的近似值进行匹配。