I am having performance issues with my relational database when I scale up the data I use in it. I have tried analyzing and tuning my queries, but I am wondering whether the model itself is the issue. I have talked to a few people, and when I tell them I am dealing with millions of rows of data they all just say 'go the NoSQL route'. I know very little about NoSQL, but my needs seem to fit a relational model best, so any advice I can glean from here would be much appreciated.
My application is a web app that allows users to upload research papers and publications and search them using predefined keywords. Papers can be grouped in libraries, and each paper contains a collection of sections. The system will allow the user to view:
1. For each paper, the number of sections and words in the paper, and the number of matches that paper has against the keyword list.
2. For each section, the number of words, and the text of all sentences that contain matches against the keyword list.
3. For each library, the number of papers in the library, and the number of matches of all papers in the library against the keyword list.
The application is effectively a 'keyword' mining tool for papers, and really allows the user to ask questions such as 'Show me the sentences from Section 1 that match keywords "XXX, YYY"'.
I have structured my relational model so that I have the following tables (interesting columns are listed):
Paper: Title, WordCount, LibraryId
Section: Title, WordCount, PaperId
Sentence: WordCount, Text, SectionId
KeyWordHit: KeyWord, SentenceId
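To make the model concrete, here is a minimal sketch of those tables using Python's built-in sqlite3 module, together with the 'sentences from Section 1 that match keywords' question written as a query. The `Id` key columns, the column types, the `Library` table, and the helper name `sentences_matching` are assumptions added for illustration; my real database engine and DDL differ in detail.

```python
import sqlite3

# Hypothetical DDL: the Id keys, column types, and the Library table are
# assumptions; only the table and column names listed above come from the model.
SCHEMA = """
CREATE TABLE Library   (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Paper     (Id INTEGER PRIMARY KEY, Title TEXT, WordCount INTEGER,
                        LibraryId INTEGER REFERENCES Library(Id));
CREATE TABLE Section   (Id INTEGER PRIMARY KEY, Title TEXT, WordCount INTEGER,
                        PaperId INTEGER REFERENCES Paper(Id));
CREATE TABLE Sentence  (Id INTEGER PRIMARY KEY, WordCount INTEGER, Text TEXT,
                        SectionId INTEGER REFERENCES Section(Id));
CREATE TABLE KeyWordHit(Id INTEGER PRIMARY KEY, KeyWord TEXT,
                        SentenceId INTEGER REFERENCES Sentence(Id));
"""

# Tiny made-up data set: one paper, one section, three sentences.
SAMPLE = """
INSERT INTO Library VALUES (1, 'Demo library');
INSERT INTO Paper   VALUES (1, 'Demo paper', 12, 1);
INSERT INTO Section VALUES (1, 'Section 1', 12, 1);
INSERT INTO Sentence VALUES (1, 5, 'Alpha mentions XXX and YYY.', 1),
                            (2, 4, 'Beta mentions ZZZ only.', 1),
                            (3, 3, 'Gamma mentions nothing.', 1);
INSERT INTO KeyWordHit (KeyWord, SentenceId)
VALUES ('XXX', 1), ('YYY', 1), ('ZZZ', 2);
"""

def sentences_matching(conn, section_id, keywords):
    """Return distinct (sentence id, text) rows in one section that hit any keyword."""
    placeholders = ",".join("?" for _ in keywords)
    sql = f"""
        SELECT DISTINCT s.Id, s.Text
        FROM Sentence s
        JOIN KeyWordHit k ON k.SentenceId = s.Id
        WHERE s.SectionId = ? AND k.KeyWord IN ({placeholders})
        ORDER BY s.Id
    """
    return conn.execute(sql, [section_id, *keywords]).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executescript(SAMPLE)
print(sentences_matching(conn, 1, ["XXX", "YYY"]))
```

The `DISTINCT` is there because a sentence that hits two keywords has two `KeyWordHit` rows and would otherwise be returned twice.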
This all works fine at small scale, but when I start testing at large scale, as in thousands of users, each with hundreds of books, the queries really start to slow down.
I really want to prioritize read time over write time, but no matter what I do with my indexing, disk reads always end up taking a long time when I retrieve the sentences that contain specific keywords in a book or over a whole library.
So my question is: is there a better way to model this using either relational or other non-SQL datastore technologies?
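The library-wide read is the one that hurts most, because it has to walk from keyword hits through sentences and sections up to papers. A rough sketch of its shape, again with sqlite3 and a trimmed-down schema, is below. The index names, the choice of indexed columns, and the helper name `library_sentences` are illustrative assumptions, not a record of the exact indexes I have tried.

```python
import sqlite3

# Trimmed, hypothetical schema: only the key columns needed for this read path.
# Every index here is an illustrative assumption.
SETUP = """
CREATE TABLE Paper     (Id INTEGER PRIMARY KEY, LibraryId INTEGER);
CREATE TABLE Section   (Id INTEGER PRIMARY KEY, PaperId INTEGER);
CREATE TABLE Sentence  (Id INTEGER PRIMARY KEY, Text TEXT, SectionId INTEGER);
CREATE TABLE KeyWordHit(KeyWord TEXT, SentenceId INTEGER);

CREATE INDEX idx_paper_library    ON Paper(LibraryId);
CREATE INDEX idx_section_paper    ON Section(PaperId);
CREATE INDEX idx_sentence_section ON Sentence(SectionId);
CREATE INDEX idx_hit_keyword      ON KeyWordHit(KeyWord, SentenceId);

INSERT INTO Paper    VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO Section  VALUES (1, 1), (2, 2), (3, 3);
INSERT INTO Sentence VALUES (1, 'XXX appears here.', 1),
                            (2, 'YYY appears here.', 2),
                            (3, 'XXX in another library.', 3);
INSERT INTO KeyWordHit VALUES ('XXX', 1), ('YYY', 2), ('XXX', 3);
"""

def library_sentences(conn, library_id, keywords):
    """Return (paper id, sentence id, text) for every keyword hit in one library."""
    placeholders = ",".join("?" for _ in keywords)
    sql = f"""
        SELECT DISTINCT p.Id, s.Id, s.Text
        FROM KeyWordHit k
        JOIN Sentence s   ON s.Id   = k.SentenceId
        JOIN Section  sec ON sec.Id = s.SectionId
        JOIN Paper    p   ON p.Id   = sec.PaperId
        WHERE p.LibraryId = ? AND k.KeyWord IN ({placeholders})
        ORDER BY p.Id, s.Id
    """
    return conn.execute(sql, [library_id, *keywords]).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
print(library_sentences(conn, 1, ["XXX", "YYY"]))
```

Note that the library filter sits three joins away from the keyword filter, so neither condition alone narrows the rows that have to be read from `Sentence`.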