~amirouche/dwmt

65e50828187bb2cf827590aefe12e230fe700cf6 — Amirouche 1 year, 11 months ago e0b9fa6
wip
5 files changed, 456 insertions(+), 48 deletions(-)

A data/artrm.txt
M dwmt.scm
M dwmt/aho-corasick.scm
A sisi.scm
M test.scm
A data/artrm.txt => data/artrm.txt +286 -0
@@ -0,0 +1,286 @@
retail news update - taming the data deluge
retail news update
by quicksoft services
about us
retail
home
rss
»
recent posts
the latest retail news update! https://t.co/azbqtcxy6z #retailing #retailtech @quicksoft
the latest retail news update! https://t.co/gvmi5vdptp #retail #restaurantsreopen @quicksoft
the latest retail news update! https://t.co/pymoorcfqn thanks to @neilretail @retailcouncil #retailmatters @quicksoft
the latest retail news update! https://t.co/nxexxfaib3 thanks to @retaildesignbg #taxonliquor #excisedepartment @quicksoft
the latest retail news update! https://t.co/cjhbmk17je #retail #podcast @quicksoft
retail news categories
android (6)
apparel / garment (7)
barcode (17)
barcode software (5)
beauty salon (2)
beverages (1)
biometrics (12)
book store (2)
brand management (5)
category management (14)
chain stores (136)
convenience store (31)
crm (19)
department store (8)
discount stores (30)
drug stores (3)
entertainment (24)
erp system (6)
food court (10)
franchise (3)
gift voucher (1)
govt policy & taxation (7)
grocery stores (11)
hardware (5)
health care (10)
internet / mobile (85)
kirana (4)
leisure & lifestyle (34)
luxury stores (24)
mall management (9)
marketing (97)
mini markets (4)
mini markets (1)
mobile payments (7)
mobile stores (4)
modern retail (14)
my free thoughts (4470)
news (678)
news & articles (729)
online shopping (40)
optical stores (1)
packaging (2)
petrol-pumps (1)
pharmacies (2)
pos software (8)
product launch (40)
real estate (4)
restaurant (11)
retail (845)
retail formats (141)
retail management (94)
retail software (12)
retail technology (34)
retail verticals (48)
rfid (32)
seminars & events (10)
shelf management (5)
shop-in-shop (11)
smaller format superstores (10)
software (15)
store layout (6)
supermarket/hypermarket (26)
supply chain mgt (28)
tips (3)
vending machine (2)
vmi (10)
window dressing (2)
archives
july 2020
june 2020
may 2020
april 2020
march 2020
february 2020
january 2020
december 2019
november 2019
october 2019
september 2019
august 2019
july 2019
june 2019
may 2019
april 2019
march 2019
february 2019
january 2019
december 2018
november 2018
october 2018
september 2018
august 2018
july 2018
june 2018
may 2018
april 2018
march 2018
february 2018
january 2018
december 2017
november 2017
october 2017
september 2017
august 2017
july 2017
june 2017
may 2017
april 2017
march 2017
february 2017
january 2017
december 2016
november 2016
october 2016
september 2016
august 2016
july 2016
june 2016
may 2016
april 2016
march 2016
february 2016
january 2016
december 2015
november 2015
october 2015
september 2015
august 2015
july 2015
june 2015
may 2015
april 2015
march 2015
february 2015
january 2015
december 2014
november 2014
october 2014
september 2014
august 2014
july 2014
june 2014
may 2014
april 2014
march 2014
february 2014
january 2014
april 2013
march 2013
september 2012
august 2012
july 2012
may 2012
april 2012
march 2012
february 2012
january 2012
december 2011
november 2011
october 2011
september 2011
july 2011
june 2011
november 2010
july 2010
april 2010
july 2009
june 2009
may 2009
april 2009
march 2009
february 2009
january 2009
december 2008
november 2008
october 2008
september 2008
august 2008
july 2008
june 2008
may 2008
april 2008
march 2008
february 2008
january 2008
december 2007
november 2007
october 2007
september 2007
meta
log in
entries rss
comments rss
wordpress.org
bookmark retail news blog by artrm
del.icio.us
blink
stumble
furl it
digg
newsvine
simpy
spurl
newsgator
yahoo
reddit
technorati
blogcatalog
search engine submission - addme
‹ prev	next ›	
apr26
taming the data deluge
by admin on april 26th, 2012 at 2:34 pm
posted in: brand management, internet / mobile, marketing, modern retail, news & articles, retail formats, retail management, retail technology
marketers and consumers struggle with the volume of data the world now generates. david benady asks how the two sides can jointly control the tide, including the advent of brand ‘data stores’.
data is inundating the economy, overwhelming consumers and businesses with swathes of information that they struggle to comprehend. the overload is set to spiral as social media, mobile and geo-location technologies spew forth yet more reams of data.
with billions of web searches made every month, more than 20,000 new books published weekly and more texts sent daily than there are people on earth, data is increasing exponentially. the number of exabytes (eb – equal to 1bn gb) of information created in 2011 hit 1750, double the 2009 figure, according to idc estimates. there is twice as much data as storage capacity.
this torrent of data makes it hard for marketers to ensure their brand messages are heard above the noise. consumers have become reluctant to open the floodgates to receiving more irrelevant information, and some are wary of providing personal details.
research company tns has analysed the way in which consumers ‘eat’ at this table of information and created five consumer segments based on their readiness to absorb data. it calls the data deluge ‘information obesity’, and looks at the way people create their own ‘eating plans’.
you are what you ‘eat’
‘fast foodies’, it says, consume the easiest, lightest data they can find. ‘supplementers’ devour as much information as they can. ‘carnivores’ consume only meaty chunks – whole books and in-depth research. ‘fussy eaters’ are loath to consume information from any source, while ‘balanced dieters’ never consume too much information; what they do take comes from a variety of sources.
tns marketing sciences director russell bradshaw says these ‘eating plans’ are a good way for marketers to target resistant consumers. ‘by understanding the predominant “eating plans” that exist among their brand franchises, brand managers and chief marketing officers have a tool for maximising the reach, resonance and values of their campaigns,’ he says.
tns analysis suggests that ‘carnivores’ are more likely to shop at marks & spencer, while ‘fussy eaters’ tend to stock up at asda. this gives m&s leeway to bolster its communications, giving customers big, meaty chunks of information they can savour slowly. asda, meanwhile, would do well to deliver information in bursts and offer online nuggets such as tweets to appeal to voucher-hungry customers.
marketers acknowledge that segmenting consumers by their propensity to consume information can be useful, but many see it as an add-on to the already tough task of identifying relevant audiences.
david torres, global manager of chemicals technology at shell research, says that shell intends to embed the tns eating plans into its work, adding that brands need to search the data they have for clear and relevant insights.
meanwhile, stephanie maurel, head of retention at sport england, says the ‘eating plans’ could be useful if blended with other tools. ‘the tns data obesity segmentation makes a lot of sense and rings true anecdotally. it is a great idea to segment by the information consumers are prepared to receive, although perhaps this is an extra step to be added to current tools,’ she adds.
maurel’s role at sport england is to use data to help various sports’ governing bodies to increase participation and attendance, a challenge for smaller sports, such as hockey. one solution is to take data from grassroots sources, such as social media, and integrate it with information from elite sports events.
while small sports may be unsophisticated when it comes to data collection, maurel says some governing bodies are using real-time data to build their popularity.
british cycling, for example, gets feedback from locally organised sky ride mass-cycling events and feeds it through to its board meetings. this, in turn, helps it shape the way in which sky rides are organised.
for many brands, the uk’s data-chain is dominated by retailers. they control the all-important information about sales, which they then sell back to brandowners. nonetheless, retailers, too, are suffering from information overload, according to chris osborne, retail principal at software supplier sap. a recent survey by sap found that more than half of retailers believe they have more information than they can handle. ‘structured’ data – such as till receipts showing items purchased, times of day, quantities and prices – has been around for decades. osborne advocates combining this information with ‘unstructured’ data – such as the random chat of social media – as the next great challenge for brands and retailers.
the prize will be to build a total view of each customer’s likes, behaviour and loyalty, and target offers accordingly. a crucial step is ensuring both types of data are gathered and acted upon in real-time.
osborne believes the development that will enable this is ‘in-memory’ data analytics, where the data is stored in the computer’s memory for quick retrieval, rather than on a conventional database where it is stored on a hard disk, making it harder to access and wasting capacity.
he envisages a two-track economy where success will depend on efficient use of data. ‘the retailers that win out will be the ones that are very careful about how they use data and don’t swamp consumers with irrelevant offers,’ adds osborne. ‘retailers that create competitive advantage are (also) careful about how often they communicate with consumers.’
useful data vs ‘noise’
given the retailers’ iron grip on data, some brands have turned to comparison website mysupermarket.co.uk to gain access to information about their own performance through mini-shops on the site. reckitt benckiser, kellogg, danone and nivea are among those to have created such stores.
james foord, vice-president of business development at mysupermarket.co.uk, says brands are only just beginning to grasp the distinction between ‘data noise’ and what is useful. the site allows brand-owners to create a direct relationship with consumers and thus control their data. brands can analyse the battle between their products and stores’ own-label versions, for example – data retailers rarely release. ‘this is the tip of the iceberg of what is possible. brand stores will open up a whole new level of insight that has real value,’ adds foord.
the battle for data control is about more than simply capturing as much information as possible and keying it into a database. finding ‘smart’ data can save time and money in research and bring significant benefits for brands. the challenge is to find the pieces of information that help a brand locate its best customers and give insights into their motivation for buying a product.
mike dodds, chief executive of integrated agency proximity, recalls a cat-food brand’s crm programme in which customers were questioned about their behaviour. the question that delivered the best data was: ‘do you celebrate your cat’s birthday?’ the responses helped the brand discover the most involved and valuable customers.
a potential barrier to the development of data-driven marketing will be consumers’ attitudes to privacy and control of their personal details. the online giants, such as google, facebook and twitter, have built their businesses on getting users to give up their data in return for ‘free’ services. if the public refuse to play, this could put a spoke in the wheel of the data economy.
chris combemale, executive director at the direct marketing association, says brands have to be upfront about privacy and make their policies simple and readable: ‘if you can’t put the policy on one page and make it clear, you have an issue.’ he also warns brands to avoid being ‘creepy’ online – by serving ads based on details consumers thought were private – which, he argues, can make digital marketing appear intrusive.
modern marketing is essentially a battle for data. however, consumers themselves have the ultimate weapon: to switch off and stop sharing their information.
technology was supposed to make life easier, but, in reality, it has made the world far more complex. the task of creating marketing campaigns that get heard above the din will only get harder still in a society deluged with data.

marketing © brand republic
└ tags: asda, brand, branding, carnivores, crm, customer behaviour, data, data noise, data-chain, exabytes, facebook, fast foodies, geo-location, google, insight, loyalty, marks & spencer, mysupermarket.co.uk, retailers, sap, social media, supplementers, twitter, voucher-hungry customers
comment ¬
cancel reply
you must be logged in to post a comment.
pos solution for retail verticals
retail links
rss news
recent comments
http://www.floodmapdesktop.com/media/louisvuitton+5508.asp on reliance mutual fund sells 5% stake to eton park
gucci outlet online boxing day sale on home
baby gucci boxing day deals on home
mulberry bayswater coral on home
jordan shoes for toddlers black friday sale on home
blogroll
artrm
blog on petrolpump – artrmpp
chain store news
documentation
plugins
portal on petrolpump
quicksoft services
suggest ideas
support forum
themes
wordpress blog
wordpress planet
©2007-2020 retail news update | powered by wordpress with easel | subscribe: rss | back to top ↑

M dwmt.scm => dwmt.scm +29 -14
@@ -1,6 +1,7 @@
 (import (only (chezscheme) import time))
 (import (scheme base))
 (import (scheme list))
+(import (scheme fixnum))
 (import (scheme char))
 (import (scheme file))
 (import (scheme write))


@@ -141,17 +142,17 @@
 (define filename "data/CC-MAIN-20200702045758-20200702075758-00039.warc.wet")

 (define (generator->string* generator)
-  (utf8->string (apply bytevector (generator->list generator))))
+  (string->list (string-downcase (utf8->string (apply bytevector (generator->list generator))))))

 (define (warc-record-generator generator)
   (call-with-values (lambda () (warc-record-read wet-generator))
     (lambda (headers body)
       (generator-consume body)))
-  
+
   (lambda ()
     (call-with-values (lambda () (warc-record-read wet-generator))
       (lambda (headers body)
-        (generator->string* body)))))
+        (values headers (generator->string* body))))))

 (define wet-generator (file-generator filename))



@@ -159,6 +160,10 @@

 (define ac (make-aho-corasick))

+(define count* 0)
+
+(define total* 0)
+
 (let loop ((keywords '("search"
                        "engine"
                        "algorithm"


@@ -172,17 +177,27 @@
     (loop (cdr keywords))))

 (time (aho-corasick-compile! ac))
+(time (aho-corasick-compile-2! ac))

 ;; (aho-corasick-debug ac)

-(let loop ()
-  (guard (ex (else (pk 'finished)))
-    (let ((body (warc-record-reader)))
-      (when body
-        (let ((matches (delete-duplicates (aho-corasick-match ac (string->list body)))))
-          (when #t
-            #t))))
-    (loop)))
-
-
-(pk (hash-table->alist timing))
+(pk 'searching2)
+
+
+(time
+ (let loop ()
+   (call-with-values warc-record-reader
+     (lambda (h body)
+       (let ((matches (delete-duplicates
+                       (aho-corasick-match ac body))))
+         ;; (unless #f ;; (null? matches)
+         ;;   (display total*) (display " ")
+         ;;   (display (headers-ref h (string-downcase "WARC-Target-URI")))
+         ;;   (display " ") (display matches) (newline)))))
+         (when (fx<? 4 (length matches))
+           (display (headers-ref h (string-downcase "WARC-Target-URI")))
+           (newline)
+           (set! count* (fx+ count* 1))))))
+   (pk total* count*)
+   (set! total* (fx+ total* 1))
+   (loop)))

M dwmt/aho-corasick.scm => dwmt/aho-corasick.scm +96 -34
@@ -3,6 +3,7 @@
   (export make-aho-corasick
           aho-corasick-add!
           aho-corasick-compile!
+          aho-corasick-compile-2!
           aho-corasick-debug
           aho-corasick-match)



@@ -11,10 +12,12 @@
           (scheme set)
           (scheme write)
           (scheme time)
+          (scheme process-context)
           (scheme comparator)
           (scheme hash-table)
           (scheme fixnum)
-          (dwmt siset))
+          (dwmt siset)
+          (only (chezscheme) pretty-print eval))

   (define-syntax define-syntax-rule
     (syntax-rules ()


@@ -52,7 +55,7 @@
         uid)))

   (define-record-type <state>
-    (%make-state uid char success? transitions parent match longest-strict-suffix)
+    (%make-state uid char success? transitions parent match longest-strict-suffix next)
     state?
     (uid state-uid)
     (char state-char)


@@ -60,7 +63,8 @@
     (transitions state-transitions)
     (parent state-parent)
     (match state-match state-match!)
-    (longest-strict-suffix state-longest-strict-suffix state-longest-strict-suffix!))
+    (longest-strict-suffix state-longest-strict-suffix state-longest-strict-suffix!)
+    (next state-next state-next!))

   (define (make-state char parent)
     (%make-state (uid-generator)


@@ -69,22 +73,26 @@
                  (make-transitions)
                  parent
                  #f
+                 #f
                  #f))

   (define-record-type <aho-corasick>
-    (%make-aho-corasick root size finalized?)
+    (%make-aho-corasick root size states finalized?)
     aho-corasick?
     (root aho-corasick-root)
     (size aho-corasick-size aho-corasick-size!)
+    (states aho-corasick-states aho-corasick-states!)
     (finalized? aho-corasick-finalized? aho-corasick-finalized?!))

   (define (make-aho-corasick)
-    (%make-aho-corasick (make-state #f #f) 0 #f))
+    (%make-aho-corasick (make-state #f #f) 1 #f #f))

   (define(aho-corasick-add! aho-corasick word)

     (define (make-new-state char parent)
       (define new (make-state char parent))
+      (aho-corasick-size! aho-corasick
+                          (fx+ (aho-corasick-size aho-corasick) 1))

       (hash-table-set! (state-transitions parent) char new)



@@ -97,8 +105,6 @@
                (state (aho-corasick-root aho-corasick)))
       (if (null? chars)
           (begin
-            (aho-corasick-size! aho-corasick
-                                (fx+ (aho-corasick-size aho-corasick) 1))
             (state-success?! state #t)
             (state-match! state word))
           (let ((state (hash-table-ref (state-transitions state)


@@ -145,6 +151,73 @@
     (display "\n}")
     (newline))

+
+  (define (aho-corasick-compile-2! aho-corasick)
+    (define root (aho-corasick-root aho-corasick))
+
+    (define (make-state-comparator)
+      (make-comparator state? eq? #f (lambda (x) (number-hash (state-uid x)))))
+
+    (define todo (make-siset (make-state-comparator)
+                             ;; XXX: magic number ahead?
+                             ;; TODO: document magic number!
+                             (max (round (/ (aho-corasick-size aho-corasick) 30)) 1)
+                             root))
+
+    (define (add! state)
+      (unless (vector-ref (aho-corasick-states aho-corasick) (state-uid state))
+        (siset-add! todo state)))
+
+    (define (next-add! state)
+      (unless (state-next state)
+        (siset-add! todo state)))
+
+    (define (compute-next state)
+      `(lambda (char fallback)
+         (case char
+           ,@(map (lambda (char* state*) `((,char*) ,(state-uid state*)))
+                  (hash-table-keys (state-transitions state))
+                  (hash-table-values (state-transitions state)))
+           (else (fallback char #f)))))
+
+    (aho-corasick-states! aho-corasick
+                          (make-vector
+                           (aho-corasick-size aho-corasick) #f))
+
+    (let loop ()
+      (unless (siset-empty? todo)
+        (let ((state (siset-pop! todo)))
+          (vector-set! (aho-corasick-states aho-corasick) (state-uid state) state)
+          (hash-table-for-each (lambda (_ child) (add! child))
+                               (state-transitions state)))
+        (loop)))
+
+    (set! todo (make-siset (make-state-comparator)
+                           ;; XXX: magic number ahead?
+                           ;; TODO: document magic number!
+                           (max (round (/ (aho-corasick-size aho-corasick) 30)) 1)))
+
+    (state-next! root
+                 (eval `(lambda (char _)
+                          (case char
+                            ,@(map (lambda (char* state*)
+                                     `((,char*) ,(state-uid state*)))
+                                   (hash-table-keys (state-transitions root))
+                                   (hash-table-values (state-transitions root)))
+                            (else 0)))))
+
+    (for-each next-add!
+              (hash-table-values (state-transitions root)))
+
+    (let loop ()
+      (unless (siset-empty? todo)
+        (let ((state (siset-pop! todo)))
+          (unless (state-next state)
+            (state-next! state (eval (compute-next state)))
+            (for-each next-add!
+                      (hash-table-values (state-transitions state)))))
+        (loop))))
+
   (define (aho-corasick-compile! aho-corasick)
     (define root (aho-corasick-root aho-corasick))



@@ -191,28 +264,16 @@
         (siset-add! todo child)
         (search-longest-strict-suffix! child)))

-    (define (siset-add!* s a)
-      (siset-add! s a))
-
-    (define (siset-contains?* s a)
-      (siset-contains? s a))
-
-    (define (siset-pop!* s)
-      (siset-pop! s))
-
-    (define (siset-empty?* s)
-      (siset-empty? s))
-    
     (when (aho-corasick-finalized? aho-corasick)
       (error 'aho-corasick "aho-corasick is already finalized!"))

     (state-longest-strict-suffix! root root)
-
+   
     (let loop ()
-      (unless (siset-empty?* todo)
-        (let ((state (siset-pop!* todo)))
-          (unless (siset-contains?* done (state-uid state))
-            (siset-add!* done (state-uid state))
+      (unless (siset-empty? todo)
+        (let ((state (siset-pop! todo)))
+          (unless (siset-contains? done (state-uid state))
+            (siset-add! done (state-uid state))
             (hash-table-for-each proc
                                   (state-transitions state))))
         (loop)))


@@ -233,21 +294,22 @@

     (define root (aho-corasick-root aho-corasick))

+    (define fallback (state-next root))
+
+    (define (dg o)
+      (unless (fx=? o 0)
+        (pk o)))
+    
     (unless (aho-corasick-finalized? aho-corasick)
       (error 'aho-corasick "aho-corasick is not finalized"))
-
+    
     (let loop ((state root)
                (chars word)
                (out '()))
       (if (null? chars)
           out
-          (let ((state* (hash-table-ref (state-transitions state)
-                                        (car chars)
-                                        (lambda ()
-                                          (hash-table-ref/default
-                                           (state-transitions root)
-                                           (car chars)
-                                           root)))))
-            (loop state*
+          (let* ((uid ((state-next state) (car chars) fallback))
+                 (next (vector-ref (aho-corasick-states aho-corasick) uid)))
+            (loop next
                   (cdr chars)
-                  (append (lookup-matches state*) out)))))))
+                  (append (lookup-matches next) out)))))))

A sisi.scm => sisi.scm +44 -0
@@ -0,0 +1,44 @@
(import (only (chezscheme) import))
(import (scheme base))
(import (scheme list))
(import (scheme time))
(import (scheme char))
(import (scheme file))
(import (scheme write))
(import (scheme fixnum))
(import (dwmt aho-corasick))
(import (only (chezscheme) time))

(define x (make-aho-corasick))


(define (read-file filename)
  (call-with-input-file filename
    (lambda (port)
      (let loop ((char (read-char port))
                 (out '()))
        (if (eof-object? char)
            (reverse out)
            (loop (read-char port) (cons (char-downcase char) out)))))))

(define text (read-file "data/artrm.txt"))

(let loop ((keywords '("search"
                       "engine"
                       "algorithm"
                       "engineer"
                       "software"
                       "library"
                       "program"
                       "technology")))
  (unless (null? keywords)
    (aho-corasick-add! x (car keywords))
    (loop (cdr keywords))))

(time (aho-corasick-compile! x))
(time (aho-corasick-compile-2! x))

;; (display (aho-corasick-debug x)) (newline)

(pk 'match)
(pk (time (delete-duplicates (aho-corasick-match x text))))

M test.scm => test.scm +1 -0
@@ -28,6 +28,7 @@


 (time (aho-corasick-compile! x))
+(time (aho-corasick-compile-2! x))

 ;; (display (aho-corasick-debug x)) (newline)