AdRelated Problem Formulation

Two Approaches to tackle the problem


IE + IR

Similarity Problem

How Search Engines Work?

Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).

Crawl the web using spiders

Index the crawled pages

Rank the results

Post results enrichment

Crawl the web using spiders

- Scale

- Scheduling

- Cache invalidation

Index the crawled pages

Speed of retrieval

Inverted Index
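To make the idea concrete, here is a minimal sketch of an inverted index, assuming whitespace tokenization and a tiny in-memory collection. The function names (`build_inverted_index`, `boolean_and`) and the sample documents are hypothetical, chosen only for illustration. The index maps each term to the documents that contain it, which is what makes retrieval fast: a query only touches the posting lists of its own terms.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def boolean_and(index, *terms):
    """Return the ids of documents that contain every query term."""
    postings = [set(index.get(t.lower(), [])) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []

# Hypothetical sample collection.
docs = {
    1: "cheap white dress shoes",
    2: "white summer dress",
    3: "running shoes",
}
index = build_inverted_index(docs)
white_and_shoes = boolean_and(index, "white", "shoes")
```

The `boolean_and` helper is also the simplest form of Boolean retrieval: a document either matches all terms or it does not, with no ordering among the matches.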

Rank the results

Boolean ranking

TF-IDF

PageRank

Boolean Retrieval

TF-IDF

PageRank
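As a rough illustration of PageRank, the sketch below runs the usual power iteration over a small link graph. The damping factor of 0.85, the fixed iteration count, the handling of pages without outgoing links, and the four-page graph are all assumptions made for this example, not details taken from the slides.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Page without outgoing links: spread its rank over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank

# Hypothetical link graph: "c" is linked to by every other page.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
```

In this toy graph, page "c" ends up with the highest score because it collects links from all other pages, while "d", which nobody links to, keeps only the baseline share.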

Post results enrichment

Did you mean

Snippets

How Search Engines rank results?

Unlocking the black box

The Metrics

The Basic Approach

The Signals

What is next?

The Metrics

Precision: How many selected items are relevant?

Recall: How many relevant items are selected?

Examples

Precision is 1 but recall is poor

- A system that returns only one result, and that result is an exact match

Recall is 1 but precision is poor

- A system that returns all books!
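The two extreme systems above can be checked with a few lines of code. This is a minimal sketch: the catalogue of 100 books and the four relevant titles are made-up numbers, and `precision_recall` is a hypothetical helper name.

```python
def precision_recall(selected, relevant):
    """Precision = hits / selected items; recall = hits / relevant items."""
    selected, relevant = set(selected), set(relevant)
    hits = len(selected & relevant)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical catalogue of 100 books, 4 of which are relevant to the query.
all_books = {f"book{i}" for i in range(1, 101)}
relevant = {"book1", "book2", "book3", "book4"}

# System 1: returns a single exact match.
one_exact_match = precision_recall({"book1"}, relevant)

# System 2: returns every book in the catalogue.
return_everything = precision_recall(all_books, relevant)
```

Under these assumed numbers, the first system has precision 1 and recall 0.25, while the second has recall 1 and precision 0.04.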

Challenges with Precision and Recall

Every result must be judged as either relevant or non-relevant!

How do we know all the relevant items?

Precision and Recall Tuning

Product Search

Knowledge Discovery

The Metrics (Assumptions)

Highly relevant documents are more useful when appearing earlier in a search engine result list (have higher ranks)

Highly relevant documents are more useful than marginally relevant documents, which are in turn more useful than non-relevant documents.

Normalized Discounted Cumulative Gain (NDCG)

Cumulative Gain: Σ gain(i), where gain(i) = relevance grade of result i

Discounted: the gain of an earlier result (smaller i) counts for more than the gain of a later result.

Normalized: to deal with result lists of different lengths
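The three parts of NDCG can be sketched as follows. This version assumes the gain is the relevance grade itself and the discount is 1 / log2(i + 1); other formulations use 2^grade - 1 as the gain. It also assumes the ideal ordering is built from the returned list only, which is a common simplification.

```python
import math

def dcg(grades):
    """Discounted cumulative gain: sum of grade / log2(position + 1)."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(grades, start=1))

def ndcg(grades):
    """DCG divided by the DCG of the same grades in ideal (descending) order."""
    ideal = dcg(sorted(grades, reverse=True))
    return dcg(grades) / ideal if ideal > 0 else 0.0

# Hypothetical relevance grades (3 = highly relevant, 0 = non-relevant),
# listed in the order the engine returned the results.
good_ranking = ndcg([3, 2, 1, 0])
bad_ranking = ndcg([0, 1, 2, 3])
```

A list that already has its best results first scores 1, and any ordering that pushes highly relevant results down scores lower. Because the score is a ratio to the ideal ordering, lists of different lengths become comparable.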

Basic Formula

TF-IDF

Term Frequency

The importance (raw frequency) of the term inside the document

Inverse Document Frequency

The importance of the term inside the corpus (the dictionary)

How well this term differentiates the document from the other documents
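A minimal sketch of the basic formula, assuming raw term frequency, the unsmoothed variant idf = log(N / df), and whitespace tokenization. Production systems typically add smoothing and length normalization; the function name and sample documents here are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term frequency inside this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

# Hypothetical three-document corpus.
docs = ["white dress shoes", "white dress", "white running shoes shoes"]
weights = tf_idf(docs)
```

In this corpus "white" appears in every document, so its weight is 0 everywhere: it does nothing to tell the documents apart. "running" appears in only one document and gets the highest weight there, even though "shoes" occurs twice in the same document.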

The Signals

Site-Level Signals

Authority/Trust

Classifications

Internal link ratios

Localization

Domain history

Page-Level Signals

Metadata

Classifications (and Localization)

Entities

Authority/trust (external links)

Semantic signals

Linguistic indicators

Prominence factors (bold, headings, italics, lists, etc.)

The promise

Introduction to Knowledge Graphs

How Google met your AdRelated!

Flashback

Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).

Keyword Search

The promise

Introduction to Knowledge Graphs


Graph

Directed vs Undirected

Links/Edges/Relations

Nodes/Vertices/Entities

Weights

Attributes?

The simplest unit in the graph is the triplet.

Meet the Triplet
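One common way to picture a triplet is as a (subject, predicate, object) record, where the subject and object are nodes and the predicate is the labeled edge between them. The sketch below stores a few such records in a plain list and matches them against a pattern. The `Triple` type, the `match` helper, and the predicate names are hypothetical, chosen only for illustration.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# A few illustrative facts; predicate names are made up for this example.
graph = [
    Triple("Cairo", "capital_of", "Egypt"),
    Triple("Egypt", "located_in", "Africa"),
    Triple("Cairo", "instance_of", "City"),
]

def match(graph, subject=None, predicate=None, obj=None):
    """Return the triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.object == obj)
    ]

about_cairo = match(graph, subject="Cairo")
capitals = match(graph, predicate="capital_of")
```

Asking "everything known about Cairo" or "all capital_of relations" becomes a pattern match over triples rather than a keyword lookup.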

How to Create a Knowledge Graph

Expert Developed

Community Developed

Extracting from a Corpus

Extracting from Free Text (Web)

Infer new relations
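One simple way to infer new relations is to apply a rule over existing triples. The sketch below assumes a single hand-written rule, transitivity of a chosen predicate, and plain (subject, predicate, object) tuples. Real systems use richer rule languages or learned models; the function name and facts here are illustrative only.

```python
def infer_transitive(triples, predicate):
    """Add (a, predicate, c) whenever (a, predicate, b) and (b, predicate, c) hold."""
    facts = set(triples)
    changed = True
    while changed:  # repeat until no new fact can be derived
        changed = False
        for (a, p1, b) in list(facts):
            for (b2, p2, c) in list(facts):
                if p1 == p2 == predicate and b == b2 and a != c:
                    new_fact = (a, predicate, c)
                    if new_fact not in facts:
                        facts.add(new_fact)
                        changed = True
    return facts

# Illustrative facts; predicate names are made up for this example.
known = {
    ("Giza", "located_in", "Egypt"),
    ("Egypt", "located_in", "Africa"),
    ("Giza", "instance_of", "City"),
}
inferred = infer_transitive(known, "located_in") - known
```

Here the only new fact is that Giza is located in Africa, which was never stated explicitly but follows from the two `located_in` triples.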

How Google met your AdRelated!

Query Understanding

Language Identification

Char Filters

- papa vs papá

- أحمد vs احمد
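The two pairs above can be unified with a small character filter. This sketch assumes that folding is wanted at all: it lowercases the text, maps Arabic alef variants to a bare alef, and strips combining marks after Unicode decomposition. The folding table is a small assumed subset, and `char_filter` is a hypothetical name.

```python
import unicodedata

# Assumed folding table for Arabic alef variants (a small subset).
ARABIC_FOLDS = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda -> bare alef
})

def char_filter(text):
    """Lowercase, fold alef variants, and strip combining marks (accents)."""
    text = text.lower().translate(ARABIC_FOLDS)
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)

folded_spanish = char_filter("papá")
folded_arabic = char_filter("\u0623\u062d\u0645\u062f")  # the hamza spelling from the slide
```

Folding is a trade-off: it helps users who type without accents or hamza, but it can also merge words whose meanings differ, so some engines index both the original and the folded form.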

Tokenization

Hyphen

Twenty-five vs California-based



Non-English

- Fruchtsalat vs Frucht salat
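The hyphen problem can be sketched with a tokenizer that splits hyphenated words unless they appear in a list of terms to keep whole. The list here is a hypothetical stand-in for whatever dictionary or model a real system would use. Splitting compounds such as Fruchtsalat needs a language-specific decompounder and is not covered by this sketch.

```python
import re

# A word, optionally followed by more hyphen-joined words.
TOKEN_RE = re.compile(r"\w+(?:-\w+)*")

def tokenize(text, keep_hyphenated=frozenset()):
    """Lowercase and tokenize; split hyphenated words unless listed in keep_hyphenated."""
    tokens = []
    for m in TOKEN_RE.finditer(text.lower()):
        word = m.group()
        if "-" in word and word not in keep_hyphenated:
            tokens.extend(word.split("-"))
        else:
            tokens.append(word)
    return tokens

KEEP = frozenset({"twenty-five"})  # hypothetical list of terms to keep whole
tokens = tokenize("Twenty-five California-based startups", KEEP)
```

With this assumed list, "twenty-five" stays a single token while "california-based" is split into "california" and "based", so a query for "california" can still match.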

Spelling Correction
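A minimal sketch of dictionary-based spelling correction, assuming a known vocabulary and Levenshtein edit distance as the similarity measure. Real "Did you mean" features also draw on query logs and word frequencies; the function names, vocabulary, and distance threshold below are illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character from a
                            curr[j - 1] + 1,      # insert a character into a
                            prev[j - 1] + cost))  # substitute (or keep) a character
        prev = curr
    return prev[-1]

def did_you_mean(word, vocabulary, max_distance=2):
    """Return the closest vocabulary word within max_distance, else None."""
    if not vocabulary:
        return None
    best = min(vocabulary, key=lambda v: (edit_distance(word, v), v))
    return best if edit_distance(word, best) <= max_distance else None

vocabulary = ["shoes", "dress", "white", "shirt"]  # hypothetical vocabulary
suggestion = did_you_mean("shoos", vocabulary)
```

The threshold keeps the corrector from "fixing" words that are simply not in the vocabulary, in which case it returns `None` instead of a far-fetched suggestion.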

Stemming and Lemmatization

Query Rewriting

- Query Expansion

- Query Relaxation

- Query Segmentation

- Query Scoping

Query Segmentation

white dress shoes > white "dress shoes"

Query Scoping

white dress shoes > white "dress shoes"

> white dress "category:shoes"
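The segmentation and scoping rewrites above can be sketched with two small lookups. Both tables are hypothetical: a real system would derive phrases from query logs or a catalogue, and categories from a taxonomy or classifier. The output format simply mirrors the examples on the slides.

```python
KNOWN_PHRASES = {("dress", "shoes")}            # hypothetical phrase dictionary
CATEGORY_TERMS = {"shoes": '"category:shoes"'}  # hypothetical term-to-field mapping

def segment(query):
    """Quote adjacent terms that form a known phrase."""
    terms = query.lower().split()
    out, i = [], 0
    while i < len(terms):
        if i + 1 < len(terms) and (terms[i], terms[i + 1]) in KNOWN_PHRASES:
            out.append(f'"{terms[i]} {terms[i + 1]}"')
            i += 2
        else:
            out.append(terms[i])
            i += 1
    return " ".join(out)

def scope(query):
    """Replace terms that name a category with a fielded filter."""
    return " ".join(CATEGORY_TERMS.get(t, t) for t in query.lower().split())

segmented = segment("white dress shoes")
scoped = scope("white dress shoes")
```

Segmentation keeps "dress shoes" together so that the engine does not match a white dress, while scoping restricts the match to a structured field instead of free text.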

Query Rewriting

Named Entity Recognition (NER) 

Semantic Parsing

Demos

New presentation

by ih99f

Public - 6/10/16, 12:30 PM