Discussion:
Is my almost decade old ATI Radeon 4870 video card finally failing?
Ant
2018-08-04 03:51:53 UTC
Recently, I turned my old full-tower PC off for a few hours, once last
month (overnight) and again earlier today, due to the hot summer weather.
My room temperature was over 80 degrees F.

After I powered on my PC, its updated 64-bit W7's boot splash screen
showed a partial white flash for a second and then corruption. I think
it got to its login screen and then showed a corrupted blue screen that
was unreadable, with more white blocks. See https://i.imgur.com/87PHHDL.jpg
and https://i.imgur.com/V9kmhTo.jpg for my blurry iPhone 4S photographs
of it. Also, it was able to make a memory dump, shown below, which I
opened once I got back in after pressing the PC's reset button (not
power off and on):

Microsoft (R) Windows Debugger Version 6.12.0002.633 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.

Loading Dump File [C:\Windows\MEMORY.DMP]
Kernel Summary Dump File: Only kernel address space is available

Symbol search path is:
srv*c:\symbols*https://msdl.microsoft.com/download/symbols
Executable search path is:
Windows 7 Kernel Version 7601 (Service Pack 1) MP (8 procs) Free x64
Product: WinNt, suite: TerminalServer SingleUserTS Personal
Built by: 7601.24150.amd64fre.win7sp1_ldr_escrow.180528-1700
Machine Name:
Kernel base = 0xfffff800`03013000 PsLoadedModuleList = 0xfffff800`03252c90
Debug session time: Sat Jul 7 10:08:48.150 2018 (UTC - 7:00)
System Uptime: 0 days 0:00:44.040
Loading Kernel Symbols
...............................................................
................................................................
................
Loading User Symbols

Loading unloaded module list
....
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck 116, {fffffa8009143010, fffff88004837efc, 0, 2}

*** ERROR: Module load completed but symbols could not be loaded for
atikmpag.sys
Probably caused by : atikmpag.sys ( atikmpag+8efc )

Followup: MachineOwner
---------

6: kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

VIDEO_TDR_FAILURE (116)
Attempt to reset the display driver and recover from timeout failed.
Arguments:
Arg1: fffffa8009143010, Optional pointer to internal TDR recovery
context (TDR_RECOVERY_CONTEXT).
Arg2: fffff88004837efc, The pointer into responsible device driver
module (e.g. owner tag).
Arg3: 0000000000000000, Optional error code (NTSTATUS) of the last
failed operation.
Arg4: 0000000000000002, Optional internal context dependent data.

Debugging Details:
------------------


FAULTING_IP:
atikmpag+8efc
fffff880`04837efc 4055 push rbp

DEFAULT_BUCKET_ID: GRAPHICS_DRIVER_TDR_FAULT

BUGCHECK_STR: 0x116

PROCESS_NAME: System

CURRENT_IRQL: 0

STACK_TEXT:
fffff880`079fe9c8 fffff880`048eb388 : 00000000`00000116
fffffa80`09143010 fffff880`04837efc 00000000`00000000 : nt!KeBugCheckEx
fffff880`079fe9d0 fffff880`048eb092 : fffff880`04837efc
fffffa80`09143010 fffffa80`07c76d50 fffffa80`07c76010 :
dxgkrnl!TdrBugcheckOnTimeout+0xec
fffff880`079fea10 fffff880`0560ffcb : fffffa80`09143010
00000000`00000000 fffffa80`07c76d50 fffffa80`07c76010 :
dxgkrnl!TdrIsRecoveryRequired+0x1a2
fffff880`079fea40 fffff880`05639d45 : 00000000`ffffffff
00000000`0000083b 00000000`00000000 00000000`00000002 :
dxgmms1!VidSchiReportHwHang+0x40b
fffff880`079feb20 fffff880`0563848b : 00000000`00000102
00000000`00000000 00000000`0000083b 00000000`00000000 :
dxgmms1!VidSchiCheckHwProgress+0x71
fffff880`079feb50 fffff880`0560b2f2 : ffffffff`ff676980
fffffa80`07c76010 00000000`00000000 00000000`00000000 :
dxgmms1!VidSchiWaitForSchedulerEvents+0x1fb
fffff880`079febf0 fffff880`0563804a : 00000000`00000000
00000000`0000000f 00000000`00000080 fffffa80`077fe748 :
dxgmms1!VidSchiScheduleCommandToRun+0x1da
fffff880`079fed00 fffff800`0335bbe0 : 00000000`059c8b64
fffff800`031fd180 00000000`00000001 fffffa80`075d1060 :
dxgmms1!VidSchiWorkerThread+0xba
fffff880`079fed40 fffff800`030bd8c6 : fffff800`031fd180
fffffa80`075d1060 fffff800`0320d1c0 00000000`00000000 :
nt!PspSystemThreadStartup+0x194
fffff880`079fed80 00000000`00000000 : fffff880`079ff000
fffff880`079f9000 fffff880`044a2c20 00000000`00000000 :
nt!KiStartSystemThread+0x16


STACK_COMMAND: .bugcheck ; kb

FOLLOWUP_IP:
atikmpag+8efc
fffff880`04837efc 4055 push rbp

SYMBOL_NAME: atikmpag+8efc

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: atikmpag

IMAGE_NAME: atikmpag.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 517f30ef

FAILURE_BUCKET_ID: X64_0x116_IMAGE_atikmpag.sys

BUCKET_ID: X64_0x116_IMAGE_atikmpag.sys

Followup: MachineOwner
---------

6: kd> lmvm atikmpag
start end module name
fffff880`0482f000 fffff880`0488e000 atikmpag (no symbols)
Loaded symbol image file: atikmpag.sys
Image path: \SystemRoot\system32\DRIVERS\atikmpag.sys
Image name: atikmpag.sys
Timestamp: Mon Apr 29 19:48:15 2013 (517F30EF)
CheckSum: 000613E7
ImageSize: 0005F000
File version: 8.14.1.6264
Product version: 8.14.1.6264
File flags: 8 (Mask 3F) Private
File OS: 40004 NT Win32
File type: 3.4 Driver
File date: 00000000.00000000
Translations: 0409.04b0
CompanyName: Advanced Micro Devices, Inc.
ProductName: AMD driver
InternalName: atikmpag.sys
OriginalFilename: atikmpag.sys
ProductVersion: 8.14.01.6264
FileVersion: 8.14.01.6264
FileDescription: AMD multi-vendor Miniport Driver
LegalCopyright: Copyright (C) 2007 Advanced Micro Devices, Inc.
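
For anyone who wants to cross-check the numbers in the dump above, here is a minimal Python sketch. It only re-derives values already printed by the debugger: the bugcheck code and arguments, the offset of Arg2 inside atikmpag.sys, and the driver build date from DEBUG_FLR_IMAGE_TIMESTAMP. The helper name parse_bugcheck and the constant names are made up for illustration; nothing here talks to WinDbg or reads a dump file.

```python
import re
from datetime import datetime, timezone

# Values copied by hand from the debugger output above.
BUGCHECK_LINE = "BugCheck 116, {fffffa8009143010, fffff88004837efc, 0, 2}"
MODULE_START = 0xFFFFF8800482F000   # atikmpag start address from lmvm
IMAGE_TIMESTAMP = 0x517F30EF        # DEBUG_FLR_IMAGE_TIMESTAMP

def parse_bugcheck(line):
    """Return (code, [arg1..arg4]) as integers; every field is hexadecimal."""
    match = re.match(r"BugCheck ([0-9A-Fa-f]+), \{(.*)\}", line)
    if match is None:
        raise ValueError("not a BugCheck line: " + line)
    code = int(match.group(1), 16)
    args = [int(field.strip(), 16) for field in match.group(2).split(",")]
    return code, args

code, args = parse_bugcheck(BUGCHECK_LINE)

# Arg2 points into the responsible driver; subtract the module base
# to get the offset the debugger reports after the module name.
offset = args[1] - MODULE_START

# The image timestamp is seconds since the Unix epoch, in UTC.
built = datetime.fromtimestamp(IMAGE_TIMESTAMP, tz=timezone.utc)

print(f"bugcheck 0x{code:X}, Arg2 = atikmpag+{offset:x}")
print("driver built (UTC):", built.isoformat())
```

The offset agrees with the "atikmpag+8efc" the debugger printed, and the build date lands on 2013-04-30 in UTC, which is the same moment as the "Mon Apr 29 19:48:15 2013" shown by lmvm in the UTC-7 session. So the driver in use is roughly five years older than the crash.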


It can't be heat from the hardware itself, since I had just turned it on
after several hours (five and ten) powered off. Maybe it was too hot in
my room, which was over 80 degrees F without AC. Maybe the video card is
failing due to bad caps? Maybe it needs its dust cleaned out? See
http://zimage.com/~ant/antfarm/about/MyComputerStuff.txt for the
detailed specifications of my primary computer.

I hope someone can answer soon. Thank you for reading my long post. ;)
--
"You're kissing an ant hill." --Mike Nelson
Note: A fixed width font (Courier, Monospace, etc.) is required to see
this signature correctly.
/\___/\ If crediting, then use Ant nickname and URL/link.
/ /\ /\ \ Axe ANT from its address if e-mailing privately.
| |o o| | http://antfarm.ma.cx / http://antfarm.home.dhs.org
\ _ /
( )
Paul
2018-08-04 04:52:03 UTC
Post by Ant
Recently, I turned off my old full-tower PC off for a few hours last
month (overnight) and earlier today due to the hot summer weather. My
room temperature was like over 80F degrees.
After I powered on my PC, its updated 64-bit W7's boot up splash screen
showed a partial white flash for a second and then corruptions. I think
it got to its login screen and then with a corrupted blue screen which
was unreadable with more white blocks. https://i.imgur.com/87PHHDL.jpg
and https://i.imgur.com/V9kmhTo.jpg for my blurry iPhone 4S photographs
of it. Also, it was able to make a memory dump as shown below when I was
able to get back in after pressing the PC's reset button (not power off
Microsoft (R) Windows Debugger Version 6.12.0002.633 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.
Loading Dump File [C:\Windows\MEMORY.DMP]
Kernel Summary Dump File: Only kernel address space is available
srv*c:\symbols*https://msdl.microsoft.com/download/symbols
atikmpag+8efc
fffff880`04837efc 4055 push rbp
DEFAULT_BUCKET_ID: GRAPHICS_DRIVER_TDR_FAULT
"TDR stands for Timeout Detection and Recovery. This is a feature
of the Windows operating system which detects response problems
from a graphics card, and recovers to a functional desktop by
resetting the card."

Is the fan on the video card still working ?

Make sure the card is not overheating.

With the power off, take your finger and *gently*
verify the fan blade does not suffer from excessive
friction.

When the temperature was getting slightly higher than
normal on my low-end NVidia card, I took the heatsink
off and re-applied AS3 paste. I also carefully cleaned
the heatsink fins, and fan blades, of dust. Since I've
destroyed a cooling fan while cleaning it, I'm no longer
aggressive when cleaning. *Don't* lift up or
press down on the fan too much. That's the mistake
I made.

*******

NVidia had an issue with cracked solder balls on
the GPU FBGA part of the design.

ATI has had trouble too, but to a lesser extent.
They didn't get the same press coverage for it.

The ball count on GPUs is quite high, and underfill
next to the solder balls is needed to equalize stresses.
If the polymer isn't carefully selected, the service
life of the solder balls is reduced, and one could
crack, causing an intermittent fault.

*******

I'd make damn sure the fan is spinning...

It could be temperature that the GPU sees,
and not just the room temperature (ambient).

If the heatsink falls off a processor or GPU,
the chip temperature shoots up to 200C
pretty well instantly. Also, if a GPU fan stops
turning because you ripped the power connector
off the video card PCB, the chip heats up slower,
but the plastic fan body will start to melt.
For long term reliability, the silicon chip
should stay below 135C (that's what the head of
our little fab told me when I asked one day).
If you keep the chip below 135C, it'll last
for at least 100,000 hours (that's an
arbitrary definition for reliability purposes,
and does not mean very much to end-users).
You hope it lasts longer than that.
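
To put those figures side by side, here is a rough Python sketch of the arithmetic. It uses only the numbers quoted in this thread (room readings of 80 to almost 90 F, the 135C limit, 100,000 hours). The conversion to years assumes the machine runs around the clock, which is an assumption for illustration and not something anyone measured here.

```python
def fahrenheit_to_celsius(deg_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32) * 5 / 9

ROOM_TEMPS_F = (80, 90)   # room readings mentioned in this thread
CHIP_LIMIT_C = 135        # long-term chip limit quoted above
RATED_HOURS = 100_000     # service-life figure quoted above

room_temps_c = [fahrenheit_to_celsius(f) for f in ROOM_TEMPS_F]

# Assumption: the PC is powered 24 hours a day, 365 days a year.
rated_years = RATED_HOURS / (24 * 365)

for deg_f, deg_c in zip(ROOM_TEMPS_F, room_temps_c):
    margin = CHIP_LIMIT_C - deg_c
    print(f"{deg_f} F = {deg_c:.1f} C, {margin:.1f} C below the chip limit")
print(f"{RATED_HOURS} hours is about {rated_years:.1f} years of continuous use")
```

The room air is a long way below the limit, so ambient temperature alone says little. What matters is how far above ambient the GPU sits once the heatsink and fan are doing (or not doing) their job.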

To view the fan while it operates, I use an
LED flashlight and a dental mirror, as it's not
always possible to get my head oriented next
to the computer case to check.

If the GPU has thermal damage, or a cracked
ball on the bottom of the GPU, then the
symptoms you see may require replacement
of the video card.

*******

Just about any DRAM chip can fail at any time.
Sometimes a video card failure is traceable to
a single bad DRAM chip on the video card surface.

I'm not aware of any "diagnostic on a floppy"
available to detect this. NVidia or ATI likely
have BIST collars around each independent
DRAM interface, and if they wanted to, they
could make a red LED light up if the hardware
tests bad. But nobody wants to spend an extra
$0.10 for a LED.

Paul
Ant
2018-08-04 18:17:47 UTC
Post by Paul
Post by Ant
Recently, I turned off my old full-tower PC off for a few hours last
month (overnight) and earlier today due to the hot summer weather. My
room temperature was like over 80F degrees.
After I powered on my PC, its updated 64-bit W7's boot up splash screen
showed a partial white flash for a second and then corruptions. I think
it got to its login screen and then with a corrupted blue screen which
was unreadable with more white blocks. https://i.imgur.com/87PHHDL.jpg
and https://i.imgur.com/V9kmhTo.jpg for my blurry iPhone 4S photographs
of it. Also, it was able to make a memory dump as shown below when I was
able to get back in after pressing the PC's reset button (not power off
Microsoft (R) Windows Debugger Version 6.12.0002.633 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.
Loading Dump File [C:\Windows\MEMORY.DMP]
Kernel Summary Dump File: Only kernel address space is available
srv*c:\symbols*https://msdl.microsoft.com/download/symbols
atikmpag+8efc
fffff880`04837efc 4055 push rbp
DEFAULT_BUCKET_ID: GRAPHICS_DRIVER_TDR_FAULT
"TDR stands for Timeout Detection and Recovery. This is a feature
of the Windows operating system which detects response problems
from a graphics card, and recovers to a functional desktop by
resetting the card."
Is the fan on the video card still working ?
Yes, right now. I can feel hot air being blown out the back of my case
from the video card's exhaust port. I still need to check whether the fan
spins up right after I power on my PC, though.
Post by Paul
Make sure the card is not overheating.
It shouldn't be overheating right after powering on the PC when it was
off for several hours though, right?
Post by Paul
With the power off, take your finger and *gently*
verify the fan blade does not suffer from excessive
friction.
Do I still need to do this if the fan spins? I can set it to 100%
without any problems in ATI's CCC. Automatic fan speed is currently at
34%. If I manually force 50% or 100%, it works (loudly) too. So, the fan
is working.
Post by Paul
When the temperature was getting slightly higher than
normal on my low-end NVidia card, I took the heatsink
off and re-applied AS3 paste. I also carefully cleaned
the heatsink fins, and fan blades, of dust. Since I've
destroyed a cooling fan while cleaning it, I'm no longer
aggressive when cleaning. *Don't* lift up or
press down on the fan too much. That's the mistake
I made.
*******
NVidia had an issue with cracked solder balls on
the GPU FBGA part of the design.
ATI has had trouble too, but to a lesser extent.
They didn't get the same press coverage for it.
The ball count on GPUs is quite high, and underfill
next to the solder balls is needed to equalize stresses.
If the polymer isn't carefully selected, the service
life of the solder balls is reduced, and one could
crack, causing an intermittent fault.
*******
I'd make damn sure the fan is spinning...
It could be temperature that the GPU sees,
and not just the room temperature (ambient).
I was checking the three thermometers in my room. They ranged from 80 to
almost 90 degrees F.
Post by Paul
If the heatsink falls off a processor or GPU,
the chip temperature shoots up to 200C
pretty well instantly. Also, if a GPU fan stops
turning because you ripped the power connector
off the video card PCB, the chip heats up slower,
but the plastic fan body will start to melt.
For long term reliability, the silicon chip
should stay below 135C (that's what the head of
our little fab told me when I asked one day).
If you keep the chip below 135C, it'll last
for at least 100,000 hours (that's an
arbitrary definition for reliability purposes,
and does not mean very much to end-users).
You hope it lasts longer than that.
To view the fan while it operates, I use a
LED flashlight and a dental mirror. As it's not
always possible to get my head oriented next
to the computer case to check.
Or just feel the hot air coming out of the video card's exhaust port?
;)
Post by Paul
If the GPU has thermal damage, or a cracked
ball on the bottom of the GPU, then the
symptoms you see may require replacement
of the video card.
I am planning to replace this decade old video card if it gets worse.
Post by Paul
Just about any DRAM chip can fail at any time.
Sometimes a video card failure is traceable to
a single bad DRAM chip on the video card surface.
...

I doubt it is this because it only happens when my PC boots up. I don't
see any problems while using Windows. I can go many months of uptime,
including soft reboots, without pressing the reset or power buttons.
--
Note: A fixed width font (Courier, Monospace, etc.) is required to see this signature correctly.
/\___/\Ant(Dude) @ http://antfarm.home.dhs.org / http://antfarm.ma.cx
/ /\ /\ \ Please nuke ANT if replying by e-mail privately. If credit-
| |o o| | ing, then please kindly use Ant nickname and URL/link.
\ _ /
( )
Ant
2018-08-05 03:41:17 UTC
Oh, I forgot to mention this too. I also had a power outage lasting a
few seconds a couple of days ago in the early morning, during which my PC
went off and then back on (it has no UPS battery backup). It had no
problems! The issue only happened when the PC was manually turned off
for hours and then manually powered on.
--
Quote of the Week: "The ants are a people not strong, yet they prepare their meat in the summer." --Proverbs 30:25 (Bible)
Note: A fixed width font (Courier, Monospace, etc.) is required to see this signature correctly.
/\___/\Ant(Dude) @ http://antfarm.home.dhs.org / http://antfarm.ma.cx
/ /\ /\ \ Please nuke ANT if replying by e-mail privately. If credit-
| |o o| | ing, then please kindly use Ant nickname and URL/link.
\ _ /
( )
pk121
2018-08-05 13:35:43 UTC
You need to replace the atikmpag.sys file.
If you google your problem you will find many answers on how to do it, for example:
https://answers.microsoft.com/en-us/windows/forum/windows_7-system/bsod-windows-7-atikmpagsys/bbf8278c-9ec8-4c98-a1be-7ff6aa62e51b
pk121
Paul
2018-08-05 15:39:06 UTC
Bugcheck 116 is mentioned here, too.

https://answers.microsoft.com/en-us/windows/forum/windows_10-update/bcc116-bcc117-your-video-driver-has-stopped/fba02c53-7796-4166-83a2-9c44865c13af

One commenter changed drivers.

An old card like that really shouldn't be getting "new code"
in the latest driver. As you go from one driver version
to the next, the code for it should be mostly the same.

On a newer card, playing driver roulette is worth it.
On an older card, not so much. On a brand new card,
the driver may not even be "finished" in a sense.
New hardware blocks might not even be in the command
set yet, and older ones (kept for compatibility)
are still being used.

With an older card, the driver should be pretty mature.

Paul
Ant
2018-08-08 16:21:30 UTC
...
Post by Ant
Oh, I forgot to mention this too. I also had a few seconds power outage
a couple days ago in the early morning that my PC went off and then on
(no UPS' batter for it). It had no problems! The issue only happened
when the PC was manually turned off for hours and then manually powered
on.
FYI. Earlier, I turned off my PC for about 40 minutes for kicks, and
then turned it on. Also, the video card's fan does seem to spin when
powering on the PC, since I felt its cool air coming out in my low-80s F
room. No problems there. I'll need to try again, but for hours. Maybe
I'll retest tonight when I don't need to use this old PC.
--
Quote of the Week: "You feel the faint grit of ants beneath your shoes,
but keep on walking because in this world you have to decide what you're
willing to kill." --Tony Hoagland from "Candlelight"
Note: A fixed width font (Courier, Monospace, etc.) is required to see this signature correctly.
/\___/\Ant(Dude) @ http://antfarm.home.dhs.org / http://antfarm.ma.cx
/ /\ /\ \ Please nuke ANT if replying by e-mail privately. If credit-
| |o o| | ing, then please kindly use Ant nickname and URL/link.
\ _ /
( )
Ant
2018-08-09 14:50:53 UTC
Post by Ant
...
Post by Ant
Oh, I forgot to mention this too. I also had a few seconds power outage
a couple days ago in the early morning that my PC went off and then on
(no UPS' batter for it). It had no problems! The issue only happened
when the PC was manually turned off for hours and then manually powered
on.
FYI. Earlier, I turned off my PC for about 40 minutes earlier for kicks,
and then turned it on. Also, video card's fan does seem to spin when
powering on the PC too since I felt its cool air coming out in my low
80F degrees room. No problems there. I'll need to try again, but for
hours. Maybe I'll retest tonight when I don't need to use this old PC.
No problems from this morning's five-hour manual power off and on. As I
expected, this one will be hard to reproduce. Hmm...
--
"... I should hope not. My question is do any of these little piss ants
know where you keep 'em?" --Henry R. "Hank" Schrader from Breaking Bad
S1E6 (Crazy Handful of Nothin')
Note: A fixed width font (Courier, Monospace, etc.) is required to see
this signature correctly.
/\___/\ If crediting, then use Ant nickname and URL/link.
/ /\ /\ \ Axe ANT from its address if e-mailing privately.
| |o o| | http://antfarm.ma.cx / http://antfarm.home.dhs.org
\ _ /
( )