It seems that you're using an outdated browser. Some things may not work as they should (or don't work at all).
We suggest you upgrade newer and better browser like: Chrome, Firefox, Internet Explorer or Opera

×
avatar
Wishbone: I know. He's new at this, I'm trying not to overload him with information all at once.
True, but having the comparisons with shortened table names would cause fewer long term issues, especially if he's usually using quotes to identify his tables suggesting he has odd usages of names for his tables.
avatar
MichaelFurlong: snip
I think others have got this pretty much handled, and there do seem to be a good handful of programmers in the gog forums, but for future reference and in case you haven't been acquainted yet one of the standard go-to resources for coding questions in general is this place:

http://stackoverflow.com/

or its various sub-domain sites in the stackexchange network. In particular, this one is focused on databases:

http://dba.stackexchange.com/
Post edited August 28, 2015 by the.kuribo
I'll be back tomorrow. I'm going to set up some test-data which I can share. I'll also play around with what's been suggested so far. Thank you.

BTW Gar = Lepisosteus Oculatus, the Gar fish. I'm trying to manipulate the genomes of three species. The other two are the chicken and human genomes. I'm trying to create some scripts to quickly compare gene locations between species to measure/analyse changes in various species genomes over time.

Currently I do hand analysis using excel files, but I want to automate it.
Not sure if it's just a copy-paste mistake or if what you're using cares about it at all, but SQL statements like to be ended with a semicolon.

SELECT field FROM table;
avatar
popperik: Not sure if it's just a copy-paste mistake or if what you're using cares about it at all, but SQL statements like to be ended with a semicolon.

SELECT field FROM table;
I usually work with MSSQL, and that doesn't care. I don't know about OpenOffice Base.
I created a "Test" file consisting of a small subset of the rows in my dataset. I used to have all three species in one large table, but I was advised to split each species into a separate table so that I could retrieve data from each table independently. However I still have a problem with information from one table influencing the other.

Each table possesses a key which allows me to keep information consistent between tables. There are many columns but I have hidden them for convenience. The First two columns come from Gar, the 3rd and 4th from Chi, and the 5th and 6th from Hum.

I only want to return a row from a table if the orthology confidence has a value of 1.
As you can see in the picture, my current script/command(?) returns a row with a confidence value of 0 in chicken if the confidence value in Human is 1, and vice versa. I don't want the results of one table to influence the results of the other.

Idealy I want to be able to process the Chi table and add it to Gar. Then process the Hum table and add it to Gar. That way they won't interfere with each other. I've looked online for many hours and haven't found a solution.
Attachments:
test1.png (218 Kb)
Post edited August 28, 2015 by MichaelFurlong
avatar
MichaelFurlong: I created a "Test" file consisting of a small subset of the rows in my dataset. I used to have all three species in one large table, but I was advised to split each species into a separate table so that I could retrieve data from each table independently. However I still have a problem with information from one table influencing the other.

Each table possesses a key which allows me to keep information consistent between tables. There are many columns but I have hidden them for convenience. The First two columns come from Gar, the 3rd and 4th from Chi, and the 5th and 6th from Hum.

I only want to return a row from a table if the orthology confidence has a value of 1.
As you can see in the picture, my current script/command(?) returns a row with a confidence value of 0 in chicken if the confidence value in Human is 1, and vice versa. I don't want the results of one table to influence the results of the other.

Idealy I want to be able to process the Chi table and add it to Gar. Then process the Hum table and add it to Gar. That way they won't interfere with each other. I've looked online for many hours and haven't found a solution.
You are inner joining all three tables together. This means that any row you get will contain data from all three tables.

Try this:
SELECT
"Gar"."Gar Key",
"Gar"."Ensembl Gene ID",
'Chicken' AS Species,
"Chi"."Orthology confidence [0 low, 1 high]" AS Confidence,
Chi.chifield2 AS CommonField2 (essentially, whatever other fields you want)
FROM Chi
JOIN Gar on Chi.var = Gar.var
WHERE Chi.Orthology = 1
UNION
SELECT
"Gar"."Gar Key",
"Gar"."Ensembl Gene ID",
'Human' AS Species,
"Hum"."Orthology confidence [0 low, 1 high]" AS Confidence,
Hum.humfield2 AS CommonField2 (same list of fields as the Chi table, with the same names)
FROM Hum
JOIN Gar on Hum.var = Gar.var
WHERE Hum.Orthology = 1
ORDER BY 2 (this will use the second column for the ordering)

Seriously though, a database is not a spreadsheet. You really ought to change the column names to something more sensible, or writing queries is going to be hell for you. Don't use whitespaces in column names, ever.

Also, if all three tables contain the same data, they should be one table. I don't know what you mean by "retrieve data from each table independently". That's what selection criteria are for. And you can just as easily join a table with itself in a query as you can join it with another table.
If you use SQLfiddles for explaining us the data and the queries, helping gets easier.
avatar
The-Business: If you use SQLfiddles for explaining us the data and the queries, helping gets easier.
Huh, never heard of it. Looks like an excellent tool. Thanks :-)
avatar
MichaelFurlong: Each table possesses a key which allows me to keep information consistent between tables. There are many columns but I have hidden them for convenience. The First two columns come from Gar, the 3rd and 4th from Chi, and the 5th and 6th from Hum.
Although you shouldn't hide them, actually giving us a dump of what the table structure looks like might help. So far my impression is the chicken key, human key, and Gar key are the same... And confidence is... what?? Probably an output field when doing comparisons as an output field, so they are probably nil when they start.

Although overall i'm getting the feeling you really don't understand how the databases are suppose to work... Which is understandable.
avatar
Wishbone: Snip
I've tried what you advised, as well as trying hybridised versions. I either get syntax errors or I get an error telling me that it isn't a retrieval function. I've decided to show my original data in hopes it may simplify things, as I think I've made this more complicated than it has to be.

Some things I've heard in this thread conflict with advise I've already taken in processing my data, so I'll start fresh. I should have done this from my first post, I'm sorry for making this more difficult for the people trying to help me.

I've attached two files:
garchihum: My original data. Key columns were added by myself, in addition to colouration.
garchihum 2: I have removed the keys and whitespace. I've also removed capitals and shortened names to aid in script/command(?) writing.

Every row represents a gene in the gar fish, the reference genome. Each gene is matched to corresponding genes in two other species, chicken and human (in the same row). I have coloured the information relating to each species in three colours. Yellow for Gar, brown for chicken, and green for human.

I have created tables in openoffice base, and I am using sql to write scripts/commands(?) to retrieve only select data from each species.

Right now I am able to select all rows where the confidence of chicken or human are 1 AND I am able to select all rows where the confidence of chicken and human are 1.

However, I want to be able to select gar and chicken data if chicken confidence is 1 and exclude the human data when human confidence is 0, or vice versa.
Unfortunately, whether I use a single table or multiple tables, I am in a position where I either get low confidence data from one species if the other species is high confidence OR I am getting data from neither species if one of them have low confidence data.

If anyone wants to be particularly helpful, I can invite them into a dropbox folder with some sample data.
Attachments:
Post edited August 29, 2015 by MichaelFurlong
avatar
MichaelFurlong: <snip>
Right now I am able to select all rows where the confidence of chicken or human are 1 AND I am able to select all rows where the confidence of chicken and human are 1.

However, I want to be able to select gar and chicken data if chicken confidence is 1 and exclude the human data when human confidence is 0, or vice versa.
Unfortunately, whether I use a single table or multiple tables, I am in a position where I either get low confidence data from one species if the other species is high confidence OR I am getting data from neither species if one of them have low confidence data.
Glancing at the data, i assume the keys as before was to ensure uniqueness, but that's not relevant here... So in the end you have species, genome name, start and finish, and a confidence level...

In this case it would make more sense to add a field to specify if they are human, gar or chi. Recommended field would be misc and the values would be:
0 = low
1 = high
10 = human
11 = gar
12 = chicken

As for how to use it, you'd replace the values from one table when you are accessing it, which automatically converts between it's more readable value type and then removes the need to include the explanation in the table name.

my MySQL is a little rusty so...

[code] Select T.* FROM Table T LEFT JOIN Misc M on T.conf=M.value and T.species=M.value WHERE ... [/code]

If you're doing multiple tables and only interested in getting if the confidence is 1, then you're better off doing 3 calls one for each table, as there's no reason to merge these unless they somehow hold the data you're looking for. None of the other data is really relational in a way that would be easy to do using a database.
Post edited September 01, 2015 by rtcvb32