Very well said, David! Data and conclusions are two sides of the same coin in science. Collecting data for the sake of collecting data leads to nothing, and thinking up hypotheses on the basis of other hypotheses (=guesses) might be OK for philosophy, but it does not advance the natural sciences either. At least in some areas of our science, I recognize a tendency towards data worship: the idea that, if we just collect enough data and then apply whatever statistical tricks to them (often without much consideration of whether they make sense, as long as the P-value is right), they will tell us "the truth". However, there are a number of problems with this approach, and that is where methodology comes in: you need a null hypothesis to test, and you need the right analytical procedures to carry out that test. It is in this second step where things get tricky: there are often several procedures that can be applied and that will give you some result, but which is best suited to the data you have? That question in turn depends on the problem you want to solve (=the null hypothesis you want to test), so data and hypothesis cannot be separated.
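The "statistical tricks" worry can be made concrete with a small simulation (a sketch only; the numbers are illustrative assumptions, not drawn from any real study): if a researcher tries many independent analytical procedures on pure noise and reports whichever yields the smallest P-value, the chance of a "significant" result far exceeds the nominal 5% threshold.

```python
import random

random.seed(42)

ALPHA = 0.05          # nominal significance level per test
N_TESTS = 20          # hypothetical number of alternative procedures tried
N_EXPERIMENTS = 10_000

# Under a true null hypothesis, a valid test yields a P-value uniformly
# distributed on [0, 1]. "Trying many procedures and keeping the best
# P-value" is modeled by drawing N_TESTS uniform P-values and checking
# whether the smallest one crosses the significance threshold.
false_positives = 0
for _ in range(N_EXPERIMENTS):
    p_values = [random.random() for _ in range(N_TESTS)]
    if min(p_values) < ALPHA:
        false_positives += 1

observed_rate = false_positives / N_EXPERIMENTS
expected_rate = 1 - (1 - ALPHA) ** N_TESTS  # ~0.64 for 20 tests

print(f"nominal alpha per test:      {ALPHA}")
print(f"family-wise error, expected: {expected_rate:.2f}")
print(f"family-wise error, observed: {observed_rate:.2f}")
```

With 20 procedure choices, roughly two out of three noise-only datasets will produce at least one P < 0.05, which is exactly why the choice of procedure must be driven by the hypothesis rather than by the result it delivers.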
This orientation towards large amounts of data and statistical analysis can also stifle scientific progress: I have had reviews telling me that the data were insufficient to formulate any hypothesis. So what are you to do? Collect fossils for another 20 years or so, until you feel you have enough data? In my opinion, it is better to formulate a hypothesis on the basis of sparse data that can then be tested against additional data than to just collect data in a hypothesis-free fashion (how could you even tell whether the data are significant?). Furthermore, formulating hypotheses always has the positive potential to make other people think about the problem, perhaps because they dislike your hypothesis and want to prove it wrong - if they succeed, scientific progress has been made!
I think the positions of both hypothetical scientists are off track. On the one hand, data should not be "guesses" or otherwise indefensible, but on the other hand there is no such thing as perfect data. All data have errors, uncertainties, and sampling imperfections. In hypothetico-deductive science, data do not prove conclusions; they falsify them. The preferred hypothesis is the one that best explains the observed data, taking their flaws and uncertainties into account. If Scientist 1 has accounted for the uncertainties in her data, then her conclusions are likely to be both sound and testable. If, however, Scientist 1's data truly are "guesses", then the data are probably arising from the hypothesis rather than testing it. Scientist 2, regardless of the quality of the data, will never make a contribution, because data can always be improved ad infinitum.
I would add that, in my view, the strongest test of a hypothesis comes from new data, not from reanalysis of published data. Reanalysis of old data may expose analytical errors, but it perpetuates errors in the data as well as short-sightedness and limitations in the way the data were assembled. Except in the trivial case where the original author mis-analyzed them, existing data will support existing conclusions, whereas new data from new sources may reveal that the original data were unrepresentative or were not the best way of testing the hypothesis. I am, of course, not referring to cases where data are repurposed to address a completely new question in a new way: a study of the functional morphology of a fossil that was collected for biostratigraphy is unlikely to perpetuate errors arising from uneven temporal sampling, whereas a reanalysis of the biostratigraphic zonation based on the original collection will perpetuate unrecognized biases in a way that recollecting would not. The position of Scientist 2, which equates the amalgamation of data with the advance of science, risks stifling scientific progress by channeling effort into reanalysis of the same data rather than creative thinking about how to test questions in new ways.
I therefore favor Scientist 3, who attacks a problem in a new way by collecting new data that focus on the crux of the problem. Scientist 3 carefully examines the biases and uncertainties in her data, but is unafraid to draw forward-thinking, testable conclusions that attempt to explain those data and to generalize from them to the extent their flaws allow. Scientist 3 is an excellent scientist because she has generated new data, new ways of thinking, new tests, and new ideas. Most importantly, she has paved the way for Scientist 4 to prove her wrong, if he chooses to adopt an equally scientific approach.