Digi Squeeb

Wednesday, 16 November 2011

The Web on the Move, Part 2

Taking what we had learned from Monday's lesson, our computer lab exercise asked us to design a mobile device app that fulfils our needs as City students. There was a lot of discussion and good ideas flying around; a lot of looking at our own smartphones and seeing how they presented their apps and such. There was also much checking of City's Moodle system and its many flaws and drawbacks.

I'd initially been a bit worried about this exercise, not knowing exactly what I could bring to it; but the conversation and ideas were so stimulating, I soon became very excited about designing my own app. As soon as I got home, I started making a mockup of a City Moodle app on Photoshop.

A City Moodle app mockup. Does not discriminate between smartphones (but only because I don't actually have an iPhone).

Mail - Instant access to your City email account.
Discussion - Feed of the latest posts on the Moodle discussion boards with the option of instant posting/replying.
Compass - GPS or wi-fi based showing your campus location. It allows you to key in your classes and get room directions.
Bluetooth - Allows you to instantly download lecture notes and slides so that you can read them on the move.
Scanner - Liam's brilliant idea! See a book in the bookshop, and scan the barcode to check its availability at City's library.
Library - Access to your library account and the library catalogue; allows you to reserve and renew books whilst on the move.

On top of this there is a drop-down menu which allows you to access your course details whenever you want. The drop-down menu unfolds over the length of the screen, and folds when you touch outside the body of the menu. I'm sure there's a lot more that could be integrated into this. I think it would be a great idea to have such an app for City students; the exercise made me realise just how useful it could really be.

The Web on the Move, Part 1

So I guess this lesson succeeded in that it made me upgrade my ancient (i.e. 2 year old) phone and get a new touch-screen smartphone! So unfair that I have to wait hours for it to charge before I use it! T_T

Frankly, I was surprised I managed to hold out for so long. Smartphones are now pretty ubiquitous, and being a bit of a gadget fan, the temptation was certainly there to capitulate to the web as mobile platform. So why are smartphones so popular?

Having a computer in your pocket whilst on the move is certainly a big pro. The main pro of mobile devices is that they are context aware. They know where they are, and can provide lots of information about their location, e.g. through an in-built GPS system. The cons are obviously the limited screen and keyboard size (not so much of a problem for someone with tiny hands like me, and I like bite-sized phones, but hey...). Other cons like limited connectivity and battery life are Moore's Law problems, and will probably get solved if you sit around for another couple of years (which is what I ended up doing before I got my new phone, lol!).

The main problem with mobile devices is that they don't really fulfil users' information needs. It is the current technology that defines our needs. For example, with my old phone, I knew there was no point in trying to connect to City's Moodle system, because my phone simply could not handle it. It would have been mighty useful to have access to Moodle like I would have had on a smartphone, but it was impossible and so I would just wait until I had access to my laptop.

Looking at it from another angle, technology can also provide users with things they never knew they wanted. Just now I saw an advert for the new iPhone 4S, with voice control technology. Want to know if you need an umbrella when you go out tonight? Just ask your phone. It'll give you an immediate answer. But who would have thought you'd even want this before it became reality?

My laptop has in-built voice recognition technology. Wow, I thought! I can really do with this. After setting it up and using it for a day to get it to learn my speech patterns, guess what - I never used it again. Maybe I didn't need it so much after all.

Wednesday, 9 November 2011

Personalisation and increased functionality on Web 2.0

A couple of weeks back we discussed the idea of the internet as a platform being one of the defining features of Web 2.0. This week we learned more about how the ‘internet as platform’ functions – namely through the use of web services and APIs.

Web services allow the personalisation of information that is passed through the web to the user. Essentially it is ‘middleware’, re-processing machine-readable data from the server in order to present it in a form that is uniquely tailored to the client. As the client, we neither have to see nor understand how that data is re-processed in order to gain access to it. It is delivered to us in a simple and convenient form by the web service without needing any significant programming or computing knowledge on our part.

For example, I have 2 laptops – one for home use and one for Uni work. How can I have access to my web browsing data from just one machine when I am away from the other? Google allows me to sync my browser data via my Google account; so when I use the Chrome browser from anywhere, I still have access to my bookmarks, passwords, usernames and other saved data.

XML is the ‘language’ of web services, similar to HTML yet much more flexible. Learning to write HTML back in the late 90s, I discovered that it was basically rigid – HTML ‘tags’ or commands are prescribed and can only be read by certain programs. XML is similar in that it uses such tags and commands, but these can be created by the user and provide unseen, machine-readable metadata. It is also readable by a wide range of programs. For example, object data for the game the Sims is encoded in XML, and can be manually manipulated in order to create custom content.
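To make the idea of user-defined tags a bit more concrete, here is a minimal sketch in Python. The tag names (`catalogue`, `object`, `name`, `price`, `category`) and the data are invented for illustration; this is not the format the Sims actually uses. It only shows that a program can read whatever tags the author chose to create.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML: the tag names below are made up for this example,
# which is the point. In XML the author decides what the tags are called.
CATALOGUE = """
<catalogue>
  <object id="chair01">
    <name>Wooden Chair</name>
    <price currency="simoleons">80</price>
    <category>seating</category>
  </object>
  <object id="lamp07">
    <name>Floor Lamp</name>
    <price currency="simoleons">120</price>
    <category>lighting</category>
  </object>
</catalogue>
"""

def load_objects(xml_text):
    """Parse the XML and return a list of plain dictionaries."""
    root = ET.fromstring(xml_text)
    objects = []
    for node in root.findall("object"):
        objects.append({
            "id": node.get("id"),                  # attribute on the tag
            "name": node.findtext("name"),         # text inside a child tag
            "price": int(node.findtext("price")),
            "category": node.findtext("category"),
        })
    return objects

objects = load_objects(CATALOGUE)
for obj in objects:
    print(obj["id"], obj["name"], obj["price"])
```

Changing a price or adding another `<object>` block by hand is all it would take to alter what the program reads, which is roughly how manual editing of such files works.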

Nowadays, much programming is done through the use of APIs (Application Programming Interfaces), which essentially hide complex internal structures from the user of the API, thus making them more user-friendly. APIs can be used by other programmers and developers to create their own widgets, gadgets, apps and countless other useful little gizmos which can personalise your data. Hence we get Twitter feed apps for our iPhones and Google Map sat nav on our tablets.

In the ‘real world’, APIs have really brought out the functionality of my website. I am able to embed a slideshow of all my creations onto my website, so users can immediately see what I have to offer. I also have a Revolver Maps gadget embedded into my page, so that I can keep track of where my visitors come from, and who is currently browsing my site. I also have a link to an RSS feed, and a Google +1 gadget added so that people can recommend and keep up to date with my site. The power of the API and the web service lies in the fact that they are so accessible, so flexible, and able to tailor information to both you and your audience. Ten years ago, setting up a sophisticated visitor counter on your website would’ve probably required some technical know-how. Nowadays, thanks to APIs, it’s all there at the click of a button.

Sunday, 30 October 2011

DITA Assignment, Part One - Web 1.0 - Data Retrieval vs Information Retrieval

This short essay seeks to answer a seemingly innocuous question: what are the differences between data retrieval (DR) and information retrieval (IR)? To the layman, the difference between the two concepts may seem hazy. Yet the two are inherently different.

First, however, it is important to be clear on what data and information actually are. Data may be described as “a term for quantitative or numerically encoded information”, whilst information is “data that has been processed into a meaningful form” (Feather & Sturges, 2003).

Data is usually stored in a database, a “systematically ordered collection of information” (Feather & Sturges, 2003). Retrieving data from the database requires the use of a query language, such as SQL. This is a “structured way for retrieving search requests”, using artificial language commands (Feather & Sturges, 2003).

According to Baeza-Yates and Ribeiro-Neto (1999) a “data retrieval language aims at retrieving all objects which satisfy clearly defined conditions such as those in a regular expression or in a relational algebra expression. Thus, for a data retrieval system, a single erroneous object among a thousand retrieved objects means total failure.”

To clarify, database queries are structured as follows:-

select ColumnA from TableB where CriteriaC_is_met

Any error in this structure - however minor - will result in the failure of the search, i.e. no matches. (For more examples of SQL search queries, see here.)
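As a rough illustration of how strict this is, the sketch below runs three queries against a tiny in-memory SQLite database using Python's built-in `sqlite3` module. The table name `titles`, its columns and its two rows are assumptions made up for the example. A well-formed query returns exactly the rows that meet the condition; a query with a wrong value returns nothing; and a query with a misspelled keyword is rejected outright.

```python
import sqlite3

# In-memory database with a hypothetical table; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE titles (title TEXT, pubdate INTEGER)")
conn.executemany(
    "INSERT INTO titles VALUES (?, ?)",
    [("Modern Information Retrieval", 1999),
     ("Information Architecture", 2007)],
)

def run_query(sql):
    """Run a query and return its rows, or the error message if it fails."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.Error as exc:
        return "error: " + str(exc)

# Well-formed query: every row that meets the condition, and nothing else.
good = run_query("SELECT title FROM titles WHERE pubdate = 1999")

# Correct syntax but a value that matches nothing: an empty result.
no_match = run_query("SELECT title FROM titles WHERE pubdate = 1998")

# Misspelled keyword: the database refuses to run the query at all.
broken = run_query("SELEKT title FROM titles WHERE pubdate = 1999")

print(good)
print(no_match)
print(broken)
```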

Information, however, is largely unstructured, existing in a number of formats and indexed in different ways. Consequently information retrieval is based upon user information needs, and these are naturally subjective (Rosenfeld & Morville, 2007). This means two things: –

search queries will be based on those user needs and;
search results will either be relevant or not.

To take point one, information queries may be divided into different types:- navigational (searching for a website); transactional (searching for a service); or informational (searching for information on a certain subject) (MacFarlane, 2011). The user may know exactly what they want to find; then again, they may not. This ‘anomalous state of knowledge’ (ASK) informs the type of search query the user makes. Where IR departs from DR is that IR search queries may take on different forms, for example, natural language and Boolean queries. (For a table outlining the differences between IR and DR, refer to Appendix A).

From personal study using various search queries on two different search engines (Google and Bing), natural language queries generally return relevant results, although using quotation marks and deleting stop words will narrow the search and increase precision. Boolean operators also returned different results, as both search engines interpreted search queries in different ways (see Appendix B for the results of the above study).

Depending on the type of information required (e.g. transactional or informational), it is likely that search queries will return different results. For example, Anne is doing a project on the Captain Swing Riots of the 19th century. She wants as much information as possible, and decides to use two different search engines and compare their results. In both Google and Bing she types in the natural language query ‘Who is Captain Swing?’ (minus quotation marks). Google’s results are all relevant. Bing’s top rated result is also relevant, but all the following results are irrelevant (returning information on a band called ‘Captain Swing’). Curious, Anne then deletes the stop words from her previous query, and types “Captain Swing” into both search engines (quotation marks included). This time four of Google’s top five results are relevant; one of Bing’s top five results is relevant. Therefore, of the two search engines, Google has satisfied her needs more effectively.

Later, while using the natural language query ‘what are Jerusalem artichokes and how do I cook them?’, Anne discovers that many of the results are about growing artichokes. This time she uses another strategy to narrow down her search – Boolean operators. She types in ‘Jerusalem artichokes AND cook NOT grow’. This is effective in the Bing search engine, but not in the Google search engine. She later discovers that Google accepts other forms of Boolean operators, and that by typing ‘Jerusalem artichokes + cook – grow’, she will again find more relevant results.
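The logic behind Anne's Boolean query can be sketched in a few lines of Python. This is only a toy model over four invented documents, not how Google or Bing actually process queries: AND means every listed word must appear, and NOT means none of the excluded words may appear.

```python
# Toy document collection, invented for this example.
DOCUMENTS = {
    1: "how to cook jerusalem artichokes in butter",
    2: "how to grow jerusalem artichokes in your garden",
    3: "grow and cook your own jerusalem artichokes",
    4: "how to cook globe artichokes",
}

def boolean_search(documents, must_have, must_not_have=()):
    """Return ids of documents containing every 'must_have' word (AND)
    and none of the 'must_not_have' words (NOT)."""
    hits = []
    for doc_id, text in documents.items():
        words = set(text.split())
        has_all = all(w in words for w in must_have)
        has_banned = any(w in words for w in must_not_have)
        if has_all and not has_banned:
            hits.append(doc_id)
    return sorted(hits)

# 'Jerusalem artichokes AND cook'
with_and = boolean_search(DOCUMENTS, ["jerusalem", "artichokes", "cook"])

# 'Jerusalem artichokes AND cook NOT grow'
with_not = boolean_search(DOCUMENTS, ["jerusalem", "artichokes", "cook"], ["grow"])

print(with_and)  # [1, 3]
print(with_not)  # [1]
```

Note how NOT also throws out document 3, which covers both growing and cooking. That rigidity is the price of a narrower result list.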

As can be seen, natural language queries deal in a certain amount of ambiguity, and may not necessarily provide appropriate results. With data retrieval, a search provides either a match or no match. With information retrieval, a search must fulfil the user’s need. In short, it must be relevant.

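There are two ways of judging relevance – binary judgement (where something is relevant or it is not), or graded judgement (when some results are more relevant than others). User satisfaction in IR may be evaluated by calculating the recall or the precision of the search results, where:-

recall = number of relevant documents retrieved / total number of relevant documents in the collection
precision = number of relevant documents retrieved / total number of documents retrieved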

It is important to note that there is generally an inverse relationship between recall and precision - as one increases, the other tends to decrease.
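A short worked example may help here. The sketch below uses the standard definitions (precision is the share of retrieved documents that are relevant; recall is the share of all relevant documents that were retrieved). The document ids and counts are hypothetical figures chosen to show the trade-off, not results from the study in Appendix B.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)

def recall(retrieved, relevant):
    """Fraction of all relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)

# Hypothetical figures: 10 relevant documents exist in the collection (ids 1-10).
relevant_docs = set(range(1, 11))

# A narrow search returns 5 results, 4 of them relevant.
narrow = [1, 2, 3, 4, 50]

# A broad search returns 20 results, 8 of them relevant.
broad = [1, 2, 3, 4, 5, 6, 7, 8] + list(range(50, 62))

print(precision(narrow, relevant_docs), recall(narrow, relevant_docs))  # 0.8 0.4
print(precision(broad, relevant_docs), recall(broad, relevant_docs))    # 0.4 0.8
```

In this made-up case the broader search finds more of the relevant documents but buries them among more irrelevant ones.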

There are drawbacks to different methods of IR. Boolean operators are not intuitive, but rigid; a search on teaching French in schools may equally return results on teaching in French schools (Feather & Sturges, 2003). Likewise, natural language queries may result in low-precision results due to irrelevant documents that contain high levels of keywords “by chance or out of context” (Lee, Seo, Jeon and Rim, 2011). Deleting stop words and adding quotation marks decrease recall. There are many ways in which user needs may not be satisfied, and there is no ‘right way’ of improving search results. This is simply because it is the user’s needs that determine the type of search query used.

It is therefore important that the information to be searched is appropriately managed. For example, is it in the correct format? Should it be searched through keywords or keyphrases? What about conflating words, including synonyms, and ignoring stop words? These methods are all vital in making information more accessible to the user (MacFarlane, Butterworth and Krause, 2011).
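As a rough sketch of what these steps might look like in practice, the Python below turns free text into index terms by dropping stop words, mapping a synonym onto one preferred term, and crudely conflating word forms. The word lists are tiny and hand-made, and the suffix stripping is a stand-in for a real stemmer; all of it is assumed for illustration only.

```python
# Tiny, hand-made word lists for illustration; real systems use far larger ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "to", "who", "what"}
SYNONYMS = {"soccer": "football"}   # map variants onto one preferred term
SUFFIXES = ("ing", "s")             # very crude conflation, not a real stemmer

def conflate(word):
    """Strip one common suffix so that related word forms share a term."""
    for suffix in SUFFIXES:
        # Keep at least three letters so short words like 'swing' survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def index_terms(text):
    """Turn free text into a list of normalised index terms."""
    terms = []
    for raw in text.lower().split():
        word = raw.strip(".,?!'\"")
        if not word or word in STOP_WORDS:
            continue
        word = SYNONYMS.get(word, word)
        terms.append(conflate(word))
    return terms

print(index_terms("Who is Captain Swing?"))               # ['captain', 'swing']
print(index_terms("The best soccer sites in the world"))  # ['best', 'football', 'site', 'world']
```

If both the documents and the queries go through the same function, a search for 'soccer sites' and a page about 'football site' end up sharing terms.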

To conclude, data and information retrieval could not be more different. Data has the advantage of not being subject-based. A database is built with its own well-defined semantics. It is the opposite for IR. There are no well-defined semantics, and so the IR system has to interpret the semantic content of the documents and bring together what it deems relevant. Reaching this goal appears to be a two-way street. The information in the document itself must be well-managed by the creator; the user must also use an appropriate IR method according to his or her own information needs. Likewise, the evaluation of search results will be determined subjectively by the user, according to those needs.

Blog URL:- http://digisqueeb.blogspot.com

References

Baeza-Yates, R. and Ribeiro-Neto, B. (1999). Modern Information Retrieval. [online] Boston, Massachusetts: Addison Wesley Longman Inc. Available at: http://people.ischool.berkeley.edu/~hearst/irbook/index.html [Accessed: 22 October 2011].

Feather, J. and Sturges, R. P. eds. (2003). International Encyclopedia of Information and Library Science. 2nd ed. London: Routledge.

Karlgren, J. (2004). Information retrieval: introduction. [online] Available at: http://www.sics.se/~jussi/Undervisning/IRI_vt04/Overview.html [Accessed: 23 October 2011].

Lee, J., Seo, J., Jeon, J. and Rim, H. (2011). ‘Sentence-based relevance flow analysis for high accuracy retrieval.’ Journal of the American Society for Information Science & Technology [e-journal] 62 (9), pp. 1666-1675. Available through: JSTOR [Accessed: 25 October 2011].

MacFarlane, A. (2011). Lecture 04: Information Retrieval, INM348 Digital Information Technologies and Architectures. City University London [unpublished].

MacFarlane, A., Butterworth, R. and Krause, A. (2011). Lecture 03: Structuring and querying information stored in databases. INM348 Digital Information Technologies and Architectures. City University London [unpublished].

Rosenfeld, L. and Morville, P. (2007). Information Architecture for the World Wide Web. 3rd ed. Cambridge: O'Reilly.

Appendix A

The following table by The Swedish Institute of Computer Science (SICS) clearly summarises the difference between data and information retrieval. [Accessed: 23 October 2011]

Information vs Data Retrieval

	DR	IR
Matching	Exact match	Partial match
Model	Deterministic	Probabilistic
Query language	Artificial	Natural (... well)
Query specification	Complete	Incomplete
Items wanted	Matching	Relevant
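
The first row of the table (exact match versus partial match) can be illustrated with a small Python sketch. The three documents are invented, and the word-overlap score is a deliberately crude stand-in for the probabilistic models real IR systems use. The point is only that one approach answers yes or no, while the other ranks.

```python
# Invented documents for illustration.
DOCUMENTS = {
    "doc1": "captain swing riots of 1830",
    "doc2": "captain swing the band tour dates",
    "doc3": "agricultural riots in nineteenth century england",
}

def exact_match(documents, query):
    """Data-retrieval style: a document either equals the query or it does not."""
    return [doc_id for doc_id, text in documents.items() if text == query]

def partial_match(documents, query):
    """Information-retrieval style: rank documents by shared query words."""
    query_words = set(query.split())
    scored = []
    for doc_id, text in documents.items():
        overlap = len(query_words & set(text.split()))
        if overlap:
            scored.append((overlap, doc_id))
    # Highest overlap first; ties broken alphabetically by id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]

query = "captain swing riots"
print(exact_match(DOCUMENTS, query))    # []
print(partial_match(DOCUMENTS, query))  # ['doc1', 'doc2', 'doc3']
```

The exact match finds nothing because no document is word-for-word identical to the query, whereas the partial match returns all three in order of likely relevance and leaves the final judgement to the user.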

Appendix B

The results of an exercise calculating the precision of various search results from Google and Bing. The original spreadsheet may be viewed at http://www.student.city.ac.uk/~abkb824/Exercises.xlsx

Wednesday, 26 October 2011

Web 2.0 - The internet as platform

This week's lecture introduced the idea of Web 2.0.

Web 2.0 is the idea of the internet as a platform, rather than a computer as a platform. It is the web that can be written to as well as read. Back in the day, if you wanted to put something on the web, it involved learning HTML and writing up a web-page manually. Waaaaaay back in 1997, I had to go to my college library, get out a book on HTML, learn the basics, sit down in Notepad and type out my website. Yawn. And more often than not, it looked pretty pants.

Now fast-forward to 2011. Why the hell would you want to bother with typing out pages of HTML just to make a website? Google Sites has pre-made templates for you already. Dreamweaver can write all the code for you. Your thoughts can be put up on the web in a matter of minutes if you have a blog. Seconds if you're on Facebook or Twitter. And whatever type of computer platform you have, everything looks and works virtually the same.

This is the crux of Web 2.0. Everyone can publish on it without having any technical skill at all. Web 2.0 effectively harnesses network effects: its services get better the more people use them. Social networks are at the heart of Web 2.0, and Web 2.0 can be said to be so successful because, through sites like Facebook, it mirrors the social networks and interactions in our everyday lives.

Interaction is basically at the heart of Web 2.0. And this interaction isn't only of the purely social kind. We all have the ability to become pseudo-experts by contributing to Wikipedia. We can make Amazon better by leaving reviews. We can create our own tags on Flickr and Delicious - indeed, in the ocean of un-indexed junk out there, folksonomies are becoming one of the most effective ways of organising web information.

The World Wide Web (version 2.0).

But all this inevitably comes at a price. Some of the issues we discussed are mapped out here:-

Does having the ability to log every mundane event in your everyday life with ease create propensity to narcissism?
Is Wikipedia as reliable as, say, the Encyclopaedia Britannica, and does it promote a culture of amateurism?
Does the internet, as a platform for freedom of speech, promote a 'safe' environment for people to act in an offensive and derogatory manner?
Would it be fair to say that important events somehow become trivialised by the hype of the web?
What does the privacy settings change on Facebook in February 2010 have to say about the public nature of the data we may unwittingly put on the web?
How do we deal with the ephemeral nature of the web?

What do you think?

Wednesday, 19 October 2011

Information Retrieval versus Data Retrieval

This week's exercises finally put last week's mind-mushing exercises into perspective.

The purpose was to draw a line between data retrieval (what we did last week) and information retrieval (what we did this week). Data retrieval is the kind of thing we do when we query a database. With information retrieval, the results are 'subjectively' relevant - I have a huge heap of documents, and I need to decide which are relevant to my needs and which are not.

I am a prolific user of Google. I use it at least a dozen times a day, if not more. It is easy to think from the perspective of a user. There is something I want. I type it in the search engine, and I hope I find what I'm looking for. Sometimes, a lot of frustration ensues.

But why is this? Why is it sometimes so difficult to find what you're looking for?

More often than not, the reason is that people don't index their 'documents' effectively. They lose sight of what it's like to be a user themselves.

As the owner of a website (or two), this is pretty pertinent for me. It is all too easy to speed through the creation of a website summary, keywords or tags. Usually you just want to get the creation bit out of the way and get going. But do those keywords fulfil users' needs? For example, you run a football website. 'The best football site in the world', you may profess. But what if an American fan is searching for a football site? What if he uses the keyword 'soccer'? Your site may very well be the best in the world, but you're immediately cutting off a large proportion of your potential audience simply by not giving enough thought to your indexing terms.

All sites have different indexing needs, and these should be made with the user in mind. A Shakespeare website may want to index its documents by phrases. For example, a user may be searching by a particular line or quote, e.g. "To be or not to be." Therefore, indexing will have to be tailored to users' needs.

In the computer lab, our task was to query two search engines - Google and Bing. Easy, I thought. Perhaps too easy. But of course, I was wrong.

We had to use different search models in order to get to different types of information. And depending on the type of information we had to find, certain search models worked better.

For example -

Natural language queries. I don't use these very often. But they often turned up very useful results, particularly for informational, exploratory searches, when the information wanted was explicit. (Finding out about the Civil War Levellers confused the search a bit, since 'levellers' could be any number of things.)
Quotation marks - I use these most often. They're very useful for finding documents that contain certain words or short phrases. When a natural language query failed, adding quotation marks and deleting stop words usually helped narrow the search.
Boolean operators. Something I NEVER use. I'd always thought of them as kind of antiquated and redundant. So they proved to be - sometimes. I discovered that they simply do not work with Google. However, they were compatible with Bing (Bing automatically uses the AND operator anyway). An example - searching for Jerusalem artichokes and how to cook them. For some reason, the search turned up quite a bit on growing Jerusalem artichokes. Finally I found a use for the NOT operator. With Google, this simply turned up more hits on growing artichokes. With Bing, it seemed to work as intended.

Another part of the exercise was to calculate the precision of each search engine's results. What did I discover?

Why, that Google has a higher precision rate than Bing. Of course. ;)

Addicted to Google and the "Church of Search".

Sunday, 16 October 2011

DITA exercises #3 - the AFTERMATH.

I finally got some feedback about my SQL exercises from session 3 of DITA.

What I have learned from the feedback is that there are many little 'pointers' that help to 'tighten up' the commands and thus return more reliable data from your query.

For example:-

Using the = sign with > or < in order to actually include the number typed in the query, rather than just those numbers lesser or greater than it. (e.g. using >= 1980 as opposed to > 1980; the former includes 1980 in the search).
The use of % to make sure you get as many returns on your query as possible (e.g. using "%Prentice Hall%" as opposed to "Prentice Hall"; the former includes all matches including the name Prentice Hall, not just matches comprising ONLY the name Prentice Hall).
Keep in mind the difference between numbers and characters - in SQL, 0028007484 is treated as a number, whilst 0-0280074-8-4 is treated as a string of characters. Therefore a plain numeric = comparison will work with the former but not the latter, which has to be written in quotation marks as a string; the 'like' command can then be used to match it.
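
Here is a minimal sketch of the three pointers above, run against an in-memory SQLite database from Python. The `books` table, its columns and its rows are all made up for the example (only the hyphenated ISBN is taken from the exercise), and other database systems may behave slightly differently from SQLite.

```python
import sqlite3

# Hypothetical table and rows, made up to mirror the three pointers above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books (title TEXT, publisher TEXT, year INTEGER, isbn TEXT)"
)
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?, ?)",
    [
        ("Book A", "Prentice Hall", 1980, "0-0280074-8-4"),
        ("Book B", "Prentice Hall International", 1985, "0-1300000-1-1"),
        ("Book C", "Routledge", 1979, "0-4150000-2-2"),
    ],
)

def titles(sql, params=()):
    """Run a query and return the matching titles in alphabetical order."""
    return sorted(row[0] for row in conn.execute(sql, params))

# > excludes 1980 itself; >= includes it.
after_1980 = titles("SELECT title FROM books WHERE year > 1980")
from_1980 = titles("SELECT title FROM books WHERE year >= 1980")

# Without % only the bare name matches; with % any publisher containing it does.
exact_name = titles(
    "SELECT title FROM books WHERE publisher LIKE ?", ("Prentice Hall",)
)
any_name = titles(
    "SELECT title FROM books WHERE publisher LIKE ?", ("%Prentice Hall%",)
)

# The hyphenated ISBN is stored as text and matched as a character string.
by_isbn = titles("SELECT title FROM books WHERE isbn LIKE ?", ("0-0280074-8-4",))

print(after_1980, from_1980)  # ['Book B'] ['Book A', 'Book B']
print(exact_name, any_name)   # ['Book A'] ['Book A', 'Book B']
print(by_isbn)                # ['Book A']
```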

Helen, a fellow student, had an excellent way of explaining how queries should be arranged:-

"The columns of data you want to obtain from the database (SELECT)
The table or tables that this data is sitting in (FROM)
The clauses that limit this data to exactly what you are interested in and no more (WHERE)

i.e.
select columnA, columnB, columnC
from tablenameXYZ
where criteria1_is_met and criteria2_is_met"

I also found that having a diagram of the database's structure was very useful, as it helped me to visualise where and how I could retrieve the data.